Searching for the right database

Kogs 79

Hi there, as the title already say, I'm searching for the right database for my use case which should work with the following data: - 2 key/value "tables" One which holds a string as key and a normal number as value, the second table should hold also a string as key and a string as value. The first table should hold billions of string/number values, the second one should hold millions of string/string values. So there should be stored a huge amount of data. The only operations I need to do are the following - constantly add new entries in both tables - before adding a new entry, check if the key (string) is already stored - search for entries which share the same key in both tables. In a relational DB I would execute this statement (select * from tab1, tab2, where tab1.key = tab2.key) -> This search should be as fast as possible. I'm experienced in any kind of relational DB like oracle, mssql, mysql, postgresql.... As key/value database I only used redis so far. I think my use case is not good for relational databases. Some NoSQL databases meight be a better choice. Redis is not good as it is an in memory DB with data size limits of the physically memory. As I'm using lot of data I need something else. Which database would you recommend in my case? I'm excited to hear your suggestions. Thank you for your help and have a nice day, Kogs

Lost User

How about SQLite.

Lost User

Quote:

I think my use case is not good for relational databases

Why you are thinking a relational db is wrong here? For me it looks like a rdb fits very good. For example:

Quote:

before adding a new entry, check if the key (string) is already stored

For this you can define the key as Primary or at least Unique. So, before insert you do not Need to check whether it is allready in. Simply insert, the db will tell you then, whether it was allready in. The Advantages: No multiply checks for unique values(one from you explicit, one from DB while checking constraints) and last but not least, no race condition. [Edit] One Thing more, MS SQL with its "Clusterd Index" fits Performance whise perfect for key/value pair. There is only one Thing: "billions of string/number values" looks like you can not go for the free Version because of restricted db size.

Bernhard Hiller

Some points to consider: - are your keys case-sensitive? In MS SQL, string comparison is normally not case-sensitive, while with postgres or Oracle it is. - what is the relation ship between the two tables? Do I understand you correctly that there are keys which exist in Table1 only, and other keys which exist in Table2 only? - do you need some kind of reporting? I.e. how many different keys can be found for a value for keys exiting in both or only one table. Complex aggregation queries work fastest in MS SQL or Oracle, while mysql copes with simple aggregations only (but with two tables, that should still be ok). - What would you do if a key to be added already exists? Update the record or throw an exception?

Kogs 79

Thanks for your reply. - the keys are all in uppercase in both tables, so a case sensitive match works perfect. - The first table is the leading data store, the second table is more or less a lookup table. I want to check which entries in the lookup table exists also in the data store table. - I don't need to search for any values, I need to search only for keys. - If a key in the first table already exists, I need to update the value (increase the value) - If a key in the second table already exists, nothing happens, just continue with the next one. But inserting doublicated entries in the secend table are extremly unlikely, so there is no need to check them. It's no problem if for some reason there are really doublicates, to add them twice. But if I use a rdbms and define the key as PK, this check is done automatically. I will give it a try with a rdbms (not MS SQL because I want it to be available also on Linux systems). But I'm wondering if a simple NoSQL DB (simple key/value) store meight not be faster in this case. I don't have much experience with NoSQL dbs, but I think they exist for a reason. And I want a really lightning fast solution ;-) But if you think a normal sql db can be as fast (or faster) than I give it a try.

Kogs 79

Thanks for your suggestion. I did some research and it looks that SQLite can handle big data very well. So this could work for me. Nevertheless I think simple key/values store databases meight be faster. But I can also be wrong as I don't have much experience with NoSQL DBs.

Kogs 79

Thanks for your answer. I just thought that a key/values store meight be a better solution. They are usually designed to be fast with key/values... But I'm not very experienced with NoSQL DBs, so I meight be wrong with this. If I use a relational db I know how to use PK. For the first table I still need to read it first, because if the key already exists, I also need to update the value of this entry. For the second table it would work just to let the DB check the constraint if the entry already exists or not. MS SQL is not my first choice, because I also want it to work on Linux systems as well.

Jorgen Andersson

Your first point isn't a dealbreaker, it's easy enough to change collation in Sql Server or NLS-settings on an Oracle db.

Wrong is evil and must be defeated. - Jeff Ello