How to store data to be able to report on it.
-
I'll try to keep the example simple. I am wondering if we should store data differently so that reporting might be easier. For example, let's say we log when a user logs in and out. A normal sql table might look something like this:
user_id
action
log_date
1
Login
2021-01-01 8:00 AM
1
Logout
2021-01-01 5:00 PM
1
Login
2021-01-02 8:00 AM
1
Logout
2021-01-02 10:00 AM
1
Login
2021-01-02 12:00 PM
1
Logout
2021-01-01 5:00 PM
So on and so forth. Let's say the business needs a report of how long each user is logged in each day. I'm pretty sure I could figure out some sql that uses row_number and partitioning and subtract previous value to end up with the difference between each login and logout event and sum for each day. (Ignore night shifts that might start in the evening and run over to the next day.) However, we don't want to be writing reports but using an ad-hoc report engine, such as PowerBi. 1) Can a reporting engine take the table of data and allow the user to run a report that shows how long each user is logged in each day? 2) As developers, should we store the data differently to make those types of reports easier? Maybe have a separate summary table? I know PowerBi pretty well but I don't know how to do a report of how long each user is logged in based on just this table of data. Should we as developers store the data differently, or is it up to the report (BI) side to manipulate the raw data? Thoughts and opinions?
-
I'll try to keep the example simple. I am wondering if we should store data differently so that reporting might be easier. For example, let's say we log when a user logs in and out. A normal sql table might look something like this:
user_id
action
log_date
1
Login
2021-01-01 8:00 AM
1
Logout
2021-01-01 5:00 PM
1
Login
2021-01-02 8:00 AM
1
Logout
2021-01-02 10:00 AM
1
Login
2021-01-02 12:00 PM
1
Logout
2021-01-01 5:00 PM
So on and so forth. Let's say the business needs a report of how long each user is logged in each day. I'm pretty sure I could figure out some sql that uses row_number and partitioning and subtract previous value to end up with the difference between each login and logout event and sum for each day. (Ignore night shifts that might start in the evening and run over to the next day.) However, we don't want to be writing reports but using an ad-hoc report engine, such as PowerBi. 1) Can a reporting engine take the table of data and allow the user to run a report that shows how long each user is logged in each day? 2) As developers, should we store the data differently to make those types of reports easier? Maybe have a separate summary table? I know PowerBi pretty well but I don't know how to do a report of how long each user is logged in based on just this table of data. Should we as developers store the data differently, or is it up to the report (BI) side to manipulate the raw data? Thoughts and opinions?
It's the difference between an "operational" system versus an "informational" system. Operational / transaction systems are optimized for day-to-day usage. An informational system (e.g. a "data warehouse" / DW) is designed specifically for info retrieval and could include redundant data to facilitate querying. The data in the info system is extracted from the operational systems. The operational system may span a year, more or less. The DW could span decades.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food
-
It's the difference between an "operational" system versus an "informational" system. Operational / transaction systems are optimized for day-to-day usage. An informational system (e.g. a "data warehouse" / DW) is designed specifically for info retrieval and could include redundant data to facilitate querying. The data in the info system is extracted from the operational systems. The operational system may span a year, more or less. The DW could span decades.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food
-
If I understand you correctly, keep logging it the way we are but we may need a separate system (data warehouse) to make reporting easier. Thanks for the feedback.
Correct. You may also not want "ad hoc users" hammering the operational system with inefficient queries that impact overall performance of the day-to-day.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food
-
If I understand you correctly, keep logging it the way we are but we may need a separate system (data warehouse) to make reporting easier. Thanks for the feedback.
And now it becomes a design issue, do you store the DW data in a denormalised structure (this is standard practice for a DW) do you then write 2 distinct reports targeting each system. How often do you transfer data to the DW? How often do you purge data from the production system?
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP