What is the right database technology for the simple BI tool use case outlined below?
-
Reaching out to the community to pressure test our internal thinking.
We are building a simplified business intelligence platform that will aggregate metrics (e.g. traffic, backlinks) and text lists (e.g. search keywords, technologies used) from several data providers.
The data will be somewhat loosely structured and may change over time, with vendors potentially changing their response formats.
Long-term data volume may be on the order of 100,000 rows x 25 input vectors.
Data would be updated and read continuously, but not at massive concurrent volume.
We'd expect to need some ETL transformations on the data gathered from partners on its way to the UI (e.g. show trending information over the past five captured data points).
We'd want to archive every single data snapshot (i.e. version it) rather than storing only the most current data point; a rough sketch of what we mean is at the end of this post.
The persistence technology should be readily available through AWS.
Our assumption is that our requirements lend themselves best to DynamoDB (vs. Amazon Neptune, Redshift, or Aurora).
Is that fair to assume? Is there any other information I can provide to elicit input from this community?
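For concreteness, here is the kind of item layout we have in mind for the versioning requirement, sketched with boto3 against a hypothetical DynamoDB table (the table name, key names and the trend query are assumptions on our part, not a settled design):

# Sketch only: one DynamoDB item per captured snapshot, keyed by
# (domain, captured_at) so every version is archived, never overwritten.
# Table and attribute names are placeholders.
from decimal import Decimal
from datetime import datetime, timezone

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("bi_snapshots")  # hypothetical table name

def store_snapshot(domain: str, metrics: dict, keywords: list[str]) -> None:
    """Archive one provider snapshot as its own item (never an update)."""
    table.put_item(
        Item={
            "domain": domain,                                       # partition key
            "captured_at": datetime.now(timezone.utc).isoformat(),  # sort key
            "metrics": {k: Decimal(str(v)) for k, v in metrics.items()},
            "keywords": keywords,
        }
    )

def last_five_snapshots(domain: str) -> list[dict]:
    """Fetch the five most recent snapshots, e.g. to feed the trend view."""
    resp = table.query(
        KeyConditionExpression=Key("domain").eq(domain),
        ScanIndexForward=False,  # newest first by sort key
        Limit=5,
    )
    return resp["Items"]

The last_five_snapshots query is what we imagine feeding the "trend over the past five captured data points" view mentioned above.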
-
Member 14070096 wrote:
Is that fair to assume
No, it is an assumption. What would be fair is to evaluate each option on its merits and score it accordingly. My guess is that any NoSQL database would do.
Member 14070096 wrote:
The data will be somewhat loosely structured and may change over time with vendors potentially changing their response formats.
That's wrong; your format should depend on the data that you want to collect, not on the format of the various data sources.
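For illustration, that might mean pinning down an internal record like the one below and mapping every vendor response onto it; the field names here are made up, not taken from the poster's data:

# Illustration only: the stored record is defined by what you want to
# collect, not by any one vendor's response shape. Field names invented.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Snapshot:
    domain: str
    captured_at: datetime
    traffic: Optional[int] = None
    backlinks: Optional[int] = None
    keywords: list[str] = field(default_factory=list)
    technologies: list[str] = field(default_factory=list)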
Bastard Programmer from Hell :suss: If you can't read my code, try converting it here. "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
-
Why a NoSQL database? I would have thought that a relational DB would serve the purpose better.
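For what it's worth, the snapshot/versioning and "last five points" requirements map onto a relational model quite naturally; a rough, self-contained sketch (sqlite3 standing in for Aurora, all names invented):

# Rough sketch of the relational take: every capture is a new row, and the
# "last five points" trend is a plain ORDER BY / LIMIT query.
# sqlite3 is used only to keep the example self-contained; on AWS this
# would be Aurora (PostgreSQL/MySQL). Table and column names are illustrative.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE snapshot (
        id          INTEGER PRIMARY KEY,
        domain      TEXT NOT NULL,
        captured_at TEXT NOT NULL,   -- ISO-8601 timestamp
        metrics     TEXT NOT NULL,   -- JSON blob for the loosely structured part
        UNIQUE (domain, captured_at)
    )
    """
)

def add_snapshot(domain: str, captured_at: str, metrics: dict) -> None:
    conn.execute(
        "INSERT INTO snapshot (domain, captured_at, metrics) VALUES (?, ?, ?)",
        (domain, captured_at, json.dumps(metrics)),
    )

def last_five(domain: str) -> list[dict]:
    rows = conn.execute(
        "SELECT captured_at, metrics FROM snapshot "
        "WHERE domain = ? ORDER BY captured_at DESC LIMIT 5",
        (domain,),
    ).fetchall()
    return [{"captured_at": ts, **json.loads(m)} for ts, m in rows]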
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP
-
You will HAVE to have an ETL layer between your various sources and your database (assuming it is a relational DB). You need to get all your sources into a single format and deal with changing source structures, which will mean recoding the ETL to suit.
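In practice that layer often ends up as one small adapter per vendor feeding a single internal shape, so a vendor format change only means recoding that vendor's adapter. A rough sketch, with all vendor and field names invented:

# Sketch of a per-vendor ETL layer: each adapter maps one provider's raw
# payload onto the same internal dict. When a vendor changes its response
# format, only that vendor's adapter needs recoding. All names are invented.

def from_vendor_a(raw: dict) -> dict:
    return {
        "traffic": int(raw["visits"]),
        "backlinks": int(raw["links"]["total"]),
        "keywords": raw.get("search_terms", []),
    }

def from_vendor_b(raw: dict) -> dict:
    return {
        "traffic": int(raw["metrics"]["traffic"]),
        "backlinks": int(raw["metrics"]["backlinks"]),
        "keywords": [k["term"] for k in raw.get("keywords", [])],
    }

ADAPTERS = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}

def normalize(vendor: str, raw: dict) -> dict:
    """Single entry point the loader calls; raises KeyError for unknown vendors."""
    return ADAPTERS[vendor](raw)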
-
Mycroft Holmes wrote:
Why a NoSQL database
Good question; his example, DynamoDB, is NoSQL, but..
Mycroft Holmes wrote:
I would have thought that a relational DB would serve the purpose better.
..is probably true :thumbsup: