Looking for advice on key/value storage options
-
I need to store lots and lots and lots of timeseries data. Each series is keyed by a unique key, and once written it will almost never be updated. I'm literally after a key/value storage system that's persisted. I also don't want to use Azure. No offense, Microsoft, but... So my thoughts were Redis (not the safest), PostgreSQL (bit overkill), Cassandra (seems to like writes better than reads and my use case is the opposite), or MongoDB. Did I mention I want this to cost less than a coffee a week? A good coffee, but I'm not paying $50 a month for this. Total data will be < 1 TB. Most likely this will be running on a Windows or Linux box against a .NET 5 app.
cheers Chris Maunder
-
Chris Maunder wrote:
and once written it will almost never be updated
I love specs like this - NOT.
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP
-
Have you considered good old Berkeley DB? [Oracle Berkeley DB](https://www.oracle.com/database/technologies/related/berkeleydb.html) It's not a DBMS, so might not fit your use case.
Keep Calm and Carry On
-
I can promise you that X will never, ever, ever happen. Except sometimes randomly when I need it to happen.
cheers Chris Maunder
-
I heard the name Cassandra was a direct commentary on Oracle (Cassandra being a cursed Oracle, and all that)
cheers Chris Maunder
-
A data volume with no "record counts" makes it hard to visualize a solution; so does not knowing what needs to be done with the "time series data" subsequently, or how much of it there is for a given key. I once used a database table to index a file system of postal carrier address labels (images), mostly as an audit trail. The label key was a GUID.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food
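For anyone wanting to picture the "database table indexing a file system" approach, here is a minimal sketch in Python. It is an illustration only: SQLite, the `blobs` table, the two-character directory sharding and the `open_index`/`put`/`get` names are all assumptions made for the example, not details from the post above.

```python
import sqlite3
import uuid
from pathlib import Path


def open_index(db_path):
    """Open (or create) the index table that maps a key to a file path."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "key TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def put(conn, root, data, key=None):
    """Write data to a file under root and record it in the index."""
    key = key or str(uuid.uuid4())  # hypothetical default: a GUID key
    # Shard by the first two characters so no single directory gets huge.
    path = Path(root) / key[:2] / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO blobs (key, path) VALUES (?, ?)",
            (key, str(path)),
        )
    return key


def get(conn, key):
    """Look the key up in the index and return the file's bytes."""
    row = conn.execute(
        "SELECT path FROM blobs WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return Path(row[0]).read_bytes()
```

The database stays tiny because it only holds keys and paths; the bulk data lives in ordinary files, which keeps backups and hosting cheap.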
-
Imagine you load a file that has 3 sets of timeseries data: attributeA, attributeB and attributeC. Each timeseries is an array of time/value pairs. There will be between 1,000 and 15,000 time/value pairs in each series, so maybe 8 KB to 120 KB in each series. I will never query the data in the array. I will only ever return it as a chunk of data (meaning I could compress it into a BLOB for higher storage efficiency, at a tradeoff in load speed, if I needed to). A classic database such as MySQL or SQL Server seems massive overkill for this.
cheers Chris Maunder
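To make the sizes and the "compress it into a BLOB" idea concrete, here is a rough sketch in Python. It assumes each pair is a 4-byte Unix timestamp plus a 4-byte float (8 bytes per pair), which is what the 8 KB and 120 KB figures imply; the real layout may differ, and `pack_series`, `unpack_series` and `raw_size` are hypothetical names for this example.

```python
import struct
import zlib

# Assumed layout: 4-byte unsigned timestamp + 4-byte float, little-endian.
PAIR = struct.Struct("<If")


def raw_size(pair_count):
    """Uncompressed size in bytes of a series with pair_count pairs."""
    return pair_count * PAIR.size


def pack_series(pairs):
    """Pack (time, value) pairs into a single compressed blob."""
    raw = b"".join(PAIR.pack(t, v) for t, v in pairs)
    return zlib.compress(raw)


def unpack_series(blob):
    """Inverse of pack_series: return the list of (time, value) pairs."""
    raw = zlib.decompress(blob)
    return list(PAIR.iter_unpack(raw))
```

Under that assumption, `raw_size(1000)` is 8,000 bytes and `raw_size(15000)` is 120,000 bytes. The resulting blob can go into any store that takes an opaque value under a key. Note that a 4-byte float cannot hold every value exactly, so an 8-byte double (`"<Id"`) may be the safer choice if precision matters.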