Code Project
Home › General Programming › Design and Architecture

Design an algorithm for generic database analytics

Design and Architecture
Tags: database, mysql, sql-server, design, sysadmin
8 Posts, 3 Posters
Michael Sterling (#1)

    Hi all, I would like to build an algorithm that takes a SQL database as input (for example SQL Server or MySQL) and returns the most important tables/views/fields in it. Here, "important" is described as follows:

    1. Large tables (where "large" is not an absolute value, but relative to the other tables)
    2. Tables with many linked tables attached (for example OrderHeaders, OrderDetails, etc.)
    3. Fields with large variance (fields whose values are all null or 0 are NOT important)
    4. Frequently changed fields (if users updated a table's records yesterday and the day before, it is important)
    5. Fields with meaningful text inside
    6. More properties I'm not thinking about right now...

    Ideally, I would like to tweak the importance of each property above and get different results. How would you approach and design this kind of algorithm? It involves data modeling, analytics, and AI all together... Are there any known services or APIs I can use to shorten development time? Thank you all in advance, Michael
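    The tweakable-weights idea can be sketched in a few lines: normalize each metric across all tables (so "large" is relative, as requested), then combine the normalized metrics with user-supplied weights. This is only a minimal sketch; all table names, metric names, and weight values below are hypothetical.

    ```python
    # Sketch of a weighted "importance" score. Assumes the raw metrics
    # (row count, FK count, update frequency, ...) have already been
    # collected into a dict per table by some other component.

    def normalize(values):
        """Scale a list of numbers into the 0..1 range (relative, not absolute)."""
        lo, hi = min(values), max(values)
        span = hi - lo
        return [0.0 if span == 0 else (v - lo) / span for v in values]

    def rank_tables(metrics, weights):
        """metrics: {table: {metric_name: raw_value}}; weights: {metric_name: w}.
        Returns (table, score) pairs sorted by weighted score, highest first."""
        tables = list(metrics)
        names = list(weights)
        # Normalize each metric column across all tables.
        norm = {m: normalize([metrics[t][m] for t in tables]) for m in names}
        scores = {
            t: sum(weights[m] * norm[m][i] for m in names)
            for i, t in enumerate(tables)
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    metrics = {
        "OrderHeaders": {"rows": 1_000_000, "fk_links": 6, "daily_updates": 500},
        "OrderDetails": {"rows": 5_000_000, "fk_links": 2, "daily_updates": 2000},
        "CountryCodes": {"rows": 250,       "fk_links": 8, "daily_updates": 0},
    }
    weights = {"rows": 0.5, "fk_links": 0.3, "daily_updates": 0.2}
    ranking = rank_tables(metrics, weights)
    ```

    Changing the weights dict is the "tweak the importance of each property" knob: the same metrics produce a different ranking under a different weighting.
    
    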

Lost User (#2)

      Michael Sterling wrote:

      1. Large tables (where "large" is not an absolute value, but relative to the other tables)

      Like the list of country codes, with their names? :D

      Michael Sterling wrote:

      2. Tables with many linked tables attached (for example OrderHeaders, OrderDetails, etc.)

      Like the list of country codes?

      Michael Sterling wrote:

      3. Fields with large variance (fields whose values are all null or 0 are NOT important)

      Tables with lots of null values are badly designed.

      Michael Sterling wrote:

      4. Frequently changed fields (if users updated a table's records yesterday and the day before, it is important)

      Do you keep such information?

      Michael Sterling wrote:

      5. Fields with meaningful text inside

      I would love that; often you see a dot (.) for a required field in the db.

      Michael Sterling wrote:

      6. More properties I'm not thinking about right now...

      Then they should not be on the list. If I order a car, I quote what I need, without adding a note that there is more coming that I don't know about yet. Focus on what you do know; you can extend it later.

      Michael Sterling wrote:

      How would you approach and design this kind of algorithm? It involves data modeling, analytics, and AI all together...

      I would not use AI just to be able to say there is AI in there. Start by finding a way to get the amount of daily traffic per record, and work from there :thumbsup:

      Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

Michael Sterling (#3)

        I thought this was a design & architecture forum... If you have nothing smart to say, better not to say it at all.

Lost User (#4)

          Noted, and with pleasure :)


Mycroft Holmes (#5)

            The trouble here is that you are going to be querying the database architecture, and that will change between database types. I would also rank the importance of the tests to reduce the amount of work your system is going to have to do. Assuming SQL Server:

            1. You can query the system to get the physical size and/or the row count of each table using SMO.
            2. Query the system views to get the FK count (and possibly the links to the "one"-side tables).
            3. As Eddy said, all-0 or all-null fields indicate a badly designed database. This is going to be costly to query against the large tables (if it is the small tables, you REALLY have a problem).
            4. Again, Eddy has it right: do you store/audit the change information?
            5. Sounds like 3 all over again; you would have to extend the definition of "meaningful".
            6. In your dreams. If you can design something for forward requirements, you are better than the rest of us.

            What you are proposing is a rules engine, and they are excellent while the rule count is small; once it grows too large, the entire thing becomes unsupportable. There are MANY commercial rules engines out there. I would approach this by demanding the business case for such a tool: what are the benefits, and who is going to support, extend, and pay for it? When that is not forthcoming, I would shelve the entire thing. I can't see how AI would help here (my AI knowledge is zero), but it would be a major exercise for each database you are going to support. I would also break it into two major projects: the database querying project, which will need to be extended with each database you support, and the analytical project, which should be generic, accepting data from all the database types.
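            Steps 1 and 2 (table sizes and FK counts) can be sketched portably. The example below uses SQLite's catalog purely as a stand-in so it is self-contained and runnable; on SQL Server you would read sys.tables / sys.foreign_keys (or use SMO as suggested above), and on MySQL the information_schema views. The schema and table names are made up for illustration.

            ```python
            # Sketch of gathering per-table row counts and outgoing FK counts
            # from a database catalog, using SQLite as a portable stand-in.
            import sqlite3

            def table_stats(conn):
                """Return {table: {"rows": n, "fk_out": n}} from the catalog."""
                cur = conn.cursor()
                tables = [r[0] for r in cur.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'")]
                stats = {}
                for t in tables:
                    rows = cur.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                    # One row per foreign-key column on this table.
                    fks = cur.execute(f'PRAGMA foreign_key_list("{t}")').fetchall()
                    stats[t] = {"rows": rows, "fk_out": len(fks)}
                return stats

            conn = sqlite3.connect(":memory:")
            conn.executescript("""
                CREATE TABLE OrderHeaders (id INTEGER PRIMARY KEY);
                CREATE TABLE OrderDetails (
                    id INTEGER PRIMARY KEY,
                    header_id INTEGER REFERENCES OrderHeaders(id));
                INSERT INTO OrderHeaders VALUES (1), (2);
                INSERT INTO OrderDetails VALUES (1, 1), (2, 1), (3, 2);
            """)
            stats = table_stats(conn)
            ```

            Note that the full COUNT(*) per table is exactly the kind of cost flagged in point 3; on SQL Server, reading row counts from the catalog (e.g. partition statistics) avoids scanning the data.
            
            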

            Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP

Lost User (#6)

              Just track the I/O: reads, writes, bytes transferred, by "application area", etc. Tables with a lot of reads relative to other tables would seem more important. Unless the systems are inefficient; then the numbers are still significant, but in the other direction. So any "recommendations" would "depend". Not much chance of something smart without a lot of custom (client) code.

              The Master said, 'Am I indeed possessed of knowledge? I am not knowing. But if a mean person, who appears quite empty-like, ask anything of me, I set it forth from one end to the other, and exhaust it.' ― Confucian Analects


Michael Sterling (#7)

                Thanks for your answer. I like your thought process. BTW, what commercial rules engines are you referring to?

Mycroft Holmes (#8)

                  Rules engines - Google Search[^]. There is not a single one I could recommend, including the ones I have written.

