Design an algorithm for generic database analytics
-
Hi all,

I would like to build an algorithm that takes a SQL database as input (for example SQL Server or MySQL) and returns the most important tables/views/fields in it. Here, "important" is defined as follows:

1. Large tables (where "large" is not an absolute value, but relative to the other tables)
2. Tables with many linked tables attached (for example orderHeaders, orderDetails, etc.)
3. Fields with large variance (a field whose values are all null or 0 is NOT important)
4. Frequently changed fields (if users updated a table's records yesterday and the day before, it is important)
5. Fields with meaningful text inside
6. More properties I'm not thinking of right now...

Ideally, I would like to tweak the weight of each property above and get different results. How would you approach and design this kind of algorithm? It involves data modeling, analytics and AI all together... Are there any known services or APIs I can use to shorten development time?

Thank you all in advance,
Michael
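[Editor's note: the tunable weighting described above can be sketched directly. This is a minimal illustration, not a finished design; all table names, metric names, and values below are made up. Each table gets one raw metric per property, metrics are normalized 0..1 relative to the other tables (so "large" stays relative, not absolute), and a weight vector decides how much each property matters.]

```python
def normalize(values):
    """Scale raw metric values to 0..1 relative to the max across tables."""
    top = max(values.values()) or 1
    return {name: v / top for name, v in values.items()}

def rank_tables(metrics, weights):
    """metrics: {property: {table: raw_value}}; weights: {property: float}.
    Returns (table, score) pairs sorted by weighted importance, highest first."""
    scores = {}
    for prop, raw in metrics.items():
        for table, v in normalize(raw).items():
            scores[table] = scores.get(table, 0.0) + weights.get(prop, 0.0) * v
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical raw metrics gathered by a separate collection layer.
metrics = {
    "row_count":      {"orderDetails": 500_000, "orderHeaders": 80_000, "countries": 250},
    "fk_links":       {"orderDetails": 2, "orderHeaders": 5, "countries": 0},
    "update_recency": {"orderDetails": 1.0, "orderHeaders": 0.9, "countries": 0.0},
}
weights = {"row_count": 0.4, "fk_links": 0.4, "update_recency": 0.2}
ranking = rank_tables(metrics, weights)
print(ranking)  # orderDetails first, countries last with these weights
```

Changing the `weights` dictionary re-ranks the same metrics, which is exactly the "tweak the importance of each property" behaviour asked for.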
-
Michael Sterling wrote:
1. Large tables (whereas Large is not not an absolute value , but rather larger in comparison to other tables)
Like the list of country-codes, with their names? :D
Michael Sterling wrote:
2. Tables with many linked tables attached (for example orderHeaders orderDetails etc.)
Like the list of country-codes?
Michael Sterling wrote:
3. Fields with large variance (fields that all values within are null or 0 NOT important)
Tables with lots of null values are badly designed.
Michael Sterling wrote:
4. Frequently changed fields ( If users updated table records yesterday and a day before it is important)
Do you keep such information?
Michael Sterling wrote:
5. Fields with Meaningful text inside
I would love that; often you see a dot (.) for a required field in the db.
Michael Sterling wrote:
6. more properties I'm not thinking about right now....
Then they should not be on the list. If I order a car, I specify what I need, without adding a note that more is coming that I don't know about yet. Focus on what you do know; you can extend it later.
Michael Sterling wrote:
How would you approach and design this kind of algorithm ? since it involves data modeling , analytics and AI all together...
I'd not use AI just to be able to say there is AI in there. Start by finding a way to get the amount of daily traffic for a record, and work from there :thumbsup:
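[Editor's note: point 3 above can be checked directly from the data without any AI. A null ratio plus a distinct-value count per column is enough to flag "all null or one constant value" fields. A minimal sketch using SQLite from the Python standard library; the `orders` table and its columns are made up for illustration.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, legacy_flag INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "open", None), (2, "closed", None), (3, "open", None)])

def column_stats(conn, table, column):
    """Return (null_ratio, distinct_count) for one column."""
    total, = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    nulls, = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()
    distinct, = conn.execute(
        f"SELECT COUNT(DISTINCT {column}) FROM {table}").fetchone()
    return nulls / total, distinct

# legacy_flag is all NULL -> null ratio 1.0, zero distinct values: not important.
print(column_stats(conn, "orders", "legacy_flag"))  # (1.0, 0)
print(column_stats(conn, "orders", "status"))       # (0.0, 2)
```

As noted in the thread, running this over every column of a large table is expensive, so it would belong in the lowest-priority pass, possibly over a sample of rows.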
Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
-
I thought this was a design & architecture forum... If you have nothing smart to say, better not say it at all.
-
The trouble here is that you are going to be querying the database's system catalog, and that changes with each database type. I would also rank the importance of the tests, to reduce the amount of work your system has to do. Assuming SQL Server:

1. You can query the system to get the physical size and/or the row count of each table using SMO.
2. Query the system views to get the FK count (and possibly the links to the "one" side tables).
3. As Eddy said, all-zero or all-null fields indicate a badly designed database. This is going to be costly to query against the large tables (and if it is the small tables, you REALLY have a problem).
4. Again, Eddy has it right: do you store/audit the change information?
5. Sounds like 3 all over again; extend the definition of "meaningful".
6. In your dreams. If you can design something for requirements you don't know yet, you are better than the rest of us.

What you are proposing is a rules engine. They are excellent while the rule count is small; once it grows too large, the entire thing becomes unsupportable. There are MANY commercial rules engines out there.

I would approach this by demanding the business case for such a tool: what are the benefits, and who is going to support, extend and pay for it? When that is not forthcoming, I would shelve the entire thing.

I can't see how AI would help here (my AI knowledge is zero), but it would be a major exercise for each database you are going to support. I would also break it into two major projects: the database-querying project, which will need to be extended for each database you support, and the analytical project, which should be generic, accepting data from all the database types.
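[Editor's note: points 1 and 2 are catalog queries. On SQL Server that would be SMO or views such as sys.foreign_keys; the sketch below uses SQLite from the Python standard library purely as a portable stand-in to show the shape of the metadata-collection layer. The orderHeaders/orderDetails schema is made up.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderHeaders (id INTEGER PRIMARY KEY);
    CREATE TABLE orderDetails (
        id INTEGER PRIMARY KEY,
        header_id INTEGER REFERENCES orderHeaders(id)
    );
""")

def table_metrics(conn):
    """Per-table row count and outgoing foreign-key count, read from the catalog."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    out = {}
    for t in tables:
        rows, = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()
        fks = len(conn.execute(f"PRAGMA foreign_key_list({t})").fetchall())
        out[t] = {"rows": rows, "fk_out": fks}
    return out

print(table_metrics(conn))
# e.g. {'orderHeaders': {'rows': 0, 'fk_out': 0}, 'orderDetails': {'rows': 0, 'fk_out': 1}}
```

This matches the two-project split suggested above: only this collection function changes per database engine, while the analytical layer consumes the generic `{table: metrics}` output.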
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP
-
Just track the I/O: reads, writes, bytes transferred, by "application area", etc. Tables with a lot of reads relative to other tables would seem more important. Unless the systems are inefficient; then the numbers are still significant, but in the other direction. So any "recommendations" would "depend". There is not much chance of something smart without a lot of custom (client) code.
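[Editor's note: on SQL Server, per-table read/write counters of the kind described here can be derived from sys.dm_db_index_usage_stats (user_seeks + user_scans as reads, user_updates as writes). The aggregation itself is engine-independent; the counter values below are invented for illustration.]

```python
# Hypothetical I/O counters collected per table.
io = {
    "orderDetails": {"reads": 90_000, "writes": 4_000},
    "orderHeaders": {"reads": 30_000, "writes": 2_000},
    "auditLog":     {"reads": 200,    "writes": 50_000},
}

# Read share relative to the whole database, so importance stays relative.
total_reads = sum(c["reads"] for c in io.values())
read_share = {t: c["reads"] / total_reads for t, c in io.items()}
by_reads = sorted(read_share, key=read_share.get, reverse=True)
print(by_reads)  # most-read tables first
```

A write-heavy, rarely read table such as the hypothetical auditLog ranks low on reads, which is exactly the "depends on direction" caveat above.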
The Master said, 'Am I indeed possessed of knowledge? I am not knowing. But if a mean person, who appears quite empty-like, ask anything of me, I set it forth from one end to the other, and exhaust it.' ― Confucian Analects
-
Thanks for your answer. I like your thought process. BTW, what commercial rules engines are you referring to?
-
rules engines - Google Search[^] There is not a single one I could recommend, including the ones I have written.
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP