JSON data format for MCQ data bank

Prahlad Yeri

I'm creating a data bank of MCQ (Multi Choice Questions) and their answers so that an app can be built around it. Regarding the actual storage format, I have two ideas: 1. An array of objects with keys (`q` for question, `a` for choice-a, etc.). 2. An array of arrays. The first one is obviously more readable. Here is a brief sample of what I have so far: ```json { "data": [ { "q": "What kind of language is Python?", "a": "Compiled", "b": "Interpreted", "c": "Parsed", "d": "Elaborated", "r": "b" }, { "q": "Who invented Python?", "a": "Rasmus Lerdorf", "b": "Guido Van Rossum", "c": "Bill Gates", "d": "Linus Torvalds", "r": "b" } ] } ``` The app will read the q key to print the question, then present the four options (a, b, c and d). And the last key (r) will store the right answer. This is very much readable when viewed as a JSON file also. However, what I am thinking is that once the data-bank grows in size into hundreds/thousands of QA, a lot of space will be wasted by those keys (q,a,b,etc.) isn't it? In this case, an array like this is more efficient from storage perspective: ```json { "data": [ [ "What kind of language is Python?", "Compiled", "Interpreted", "Parsed", "Elaborated", 1 ], [ "Who invented Python?", "Rasmus Lerdorf", "Guido Van Rossum", "Bill Gates", "Linus Torvalds", 1 ] ] } ``` In this case, each array will have 6 items viz. the question, four choices and finally the index of the correct choice (1==Interpreted, etc.). Which of these two formats is better? Feel free to suggest any third format which is even better than these two.

Mike Hankey

I like the 1st better. The 2nd, for me anyway, would be harder to keep track of square brackets.

If you can't find time to do it right the first time, how are you going to find time to do it again? PartsBin an Electronics Part Organizer - Release Version 1.4.0 (Many new features) JaxCoder.com Latest Article: EventAggregator

PIEBALDconsult

Neither.

Mircea Neacsu

First, the question should probably be asked in "Design&Architecture[^]" forum. Second, my preference would be for something like:

{
"data": [
{ "q": "question",
"a": ["answer1", "answer2",...],
"ok": 1
},
...
]
}

--- Edit: My suggestion allows you to have a variable number of answers to each question. Not sure if it's important or not for your application.

Mircea

Jon McKee

Neither. Unless your married to the idea of encoding the indexing, I'd go with something like this personally based on the info given:

{
data: [
{
"question": "What kind of language is Python?",
"answers:": [
{ "answer": "Compiled", "correct": false },
{ "answer": "Interpreted", "correct": true },
{ "answer": "Parsed", "correct": false },
{ "answer": "Elaborated", "correct": true }
]
},
{
"question": "Who invented Python?",
"answers": [
{ "answer": "Rasmus Lerdorf", "correct": true },
{ "answer": "Guido Van Rossum", "correct": false },
{ "answer": "Bill Gates", "correct": false },
{ "answer": "Linus Torvalds", "correct": false }
]
}
]
}

Handles questions with multiple correct answers, and both questions and answers are easily expandable without breaking backwards compatibility. The biggest issue with your options is that the moment the requirements change (and requirements /always/ change) the format is going to get mangled in a non-backwards-compatible way. Your current format cleverly avoids objects/properties to save space by treating the head and tail of each array uniquely. What happens when requirements dictate that relationship can no longer hold? For example, questions with multiple correct answers. Cleverness is an avoidable dependency, so unless space is that critical of an issue I would tend to go with the more flexible option that gives me less headaches down the road.

PIEBALDconsult

Of the options provided so far, that gets my vote. And I'm too lazy to suggest my own.

Jeremy Falcon

Just to second what @Mircea-Neacsu said... you're effectively asking about the difference between using a hashmap/associative array to index or using a numerical array to index. Is this a NoSQL database we're talking about here? If so, the size difference is negligible. If it's a SQL database, don't use be using a JSON format at all... just don't. Given the size difference will be negligible even with a million records in a NoSQL database, the real issue becomes readability... especially when you using self-referential indices like that. There are exceptions to every rule, but the general rule of thumb is, if you need to have 1-N elements with zero predictability on how they are indexed then use an indexed array. If you can predict it (as in 1-26 letters of the alphabet) and especially when your data relies on it being predictable as in your example then use a hashmap/associative array. As in your case. Also, there's a more appropiate forum for this.

Jeremy Falcon

jmaida

#1

"A little time, a little trouble, your better day" Badfinger