Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
CODE PROJECT For Those Who Code
  • Home
  • Articles
  • FAQ
Community
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Question about simple pattern recognition (hopefully the solution is simple too...)

Question about simple pattern recognition (hopefully the solution is simple too...)

Scheduled Pinned Locked Moved Database
regexdatabasealgorithmstutorialquestion
7 Posts 4 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Offline
    J Offline
    James Shao
    wrote on last edited by
    #1

    Hi guys, I know strictly speaking this is not a database question, but since this has to do with massive data manipulation, and that from past experiences I know there are some really intelligent and creative people here, hopefully bouncing some ideas here could shed some light on a possible solution.   ;P Conceptually, I need to compare the pattern of object A, B, C, D, E, F.   Each object has 5 attributes (a, b, c, d, e), the value of which can range from 0 to 9.   Object     A     B     C     D     E     F a     6     9     1     9     9     3 b     8     3     5     1     1     3 c     7     0     8     3     5     8 d     7     5     6     5     5     2 e     8     9     2     2     8     0 The goal is to find out which objects have the highest match of attribute values.   For example, in the above table, D & E have 3 matches, B & D have 2 matches, C & D have 1 match...etc.   Of course a simple loop comparing each object's pattern to another would suffice if there's only 6 objects and 5 attributes like the above.   But my eventual goal is to categorize some 10000 objects with 20 attributes.   :rolleyes: For some reason I thought there must be an elegant solution to this, possibly involving some ingenious/creative data sorting using arrays.   But again I've been in programming long enough to know simple concepts sometimes require unbelievably complex programmming. I'd greatly appreciate any suggestions or ideas.   Many thanks again.   :)

    M I 2 Replies Last reply
    0
    • J James Shao

      Hi guys, I know strictly speaking this is not a database question, but since this has to do with massive data manipulation, and that from past experiences I know there are some really intelligent and creative people here, hopefully bouncing some ideas here could shed some light on a possible solution.   ;P Conceptually, I need to compare the pattern of object A, B, C, D, E, F.   Each object has 5 attributes (a, b, c, d, e), the value of which can range from 0 to 9.   Object     A     B     C     D     E     F a     6     9     1     9     9     3 b     8     3     5     1     1     3 c     7     0     8     3     5     8 d     7     5     6     5     5     2 e     8     9     2     2     8     0 The goal is to find out which objects have the highest match of attribute values.   For example, in the above table, D & E have 3 matches, B & D have 2 matches, C & D have 1 match...etc.   Of course a simple loop comparing each object's pattern to another would suffice if there's only 6 objects and 5 attributes like the above.   But my eventual goal is to categorize some 10000 objects with 20 attributes.   :rolleyes: For some reason I thought there must be an elegant solution to this, possibly involving some ingenious/creative data sorting using arrays.   But again I've been in programming long enough to know simple concepts sometimes require unbelievably complex programmming. I'd greatly appreciate any suggestions or ideas.   Many thanks again.   :)

      M Offline
      M Offline
      Mycroft Holmes
      wrote on last edited by
      #2

      E has either 2 or 4 matches, otherwise I have missed your logic. Are you counting each col that matches eg A has 1 match with 3 cols, is it one or 3. Do the values of the match impact on your requirement (3x9 is worth more than 3x5) or (4x2 < 2x5) I can think of a few way, all simple and reiterative so I'll not waste your time.

      Never underestimate the power of human stupidity RAH

      L J 2 Replies Last reply
      0
      • M Mycroft Holmes

        E has either 2 or 4 matches, otherwise I have missed your logic. Are you counting each col that matches eg A has 1 match with 3 cols, is it one or 3. Do the values of the match impact on your requirement (3x9 is worth more than 3x5) or (4x2 < 2x5) I can think of a few way, all simple and reiterative so I'll not waste your time.

        Never underestimate the power of human stupidity RAH

        L Offline
        L Offline
        Luc Pattyn
        wrote on last edited by
        #3

        seems to me he is comparing each possible pair of two columns, so for N columns, that's N*(N-1)/2 column comparions, where identical field values constitute a match. And he is contemplating N=1000 :omg: and it looks like a huge job, possibly not a good approach to reach whatever the higher goal might be. :)

        Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


        I only read code that is properly formatted, adding PRE tags is the easiest way to obtain that.


        1 Reply Last reply
        0
        • J James Shao

          Hi guys, I know strictly speaking this is not a database question, but since this has to do with massive data manipulation, and that from past experiences I know there are some really intelligent and creative people here, hopefully bouncing some ideas here could shed some light on a possible solution.   ;P Conceptually, I need to compare the pattern of object A, B, C, D, E, F.   Each object has 5 attributes (a, b, c, d, e), the value of which can range from 0 to 9.   Object     A     B     C     D     E     F a     6     9     1     9     9     3 b     8     3     5     1     1     3 c     7     0     8     3     5     8 d     7     5     6     5     5     2 e     8     9     2     2     8     0 The goal is to find out which objects have the highest match of attribute values.   For example, in the above table, D & E have 3 matches, B & D have 2 matches, C & D have 1 match...etc.   Of course a simple loop comparing each object's pattern to another would suffice if there's only 6 objects and 5 attributes like the above.   But my eventual goal is to categorize some 10000 objects with 20 attributes.   :rolleyes: For some reason I thought there must be an elegant solution to this, possibly involving some ingenious/creative data sorting using arrays.   But again I've been in programming long enough to know simple concepts sometimes require unbelievably complex programmming. I'd greatly appreciate any suggestions or ideas.   Many thanks again.   :)

          I Offline
          I Offline
          i j russell
          wrote on last edited by
          #4

          You could do what you want using Sql.

          USE tempdb
          GO

          CREATE TABLE #Data
          (
          ObjectId INT NOT NULL,
          AttributeId TINYINT NOT NULL,
          Value TINYINT NOT NULL
          )
          GO

          --Object A B C D E F
          --a 6 9 1 9 9 3
          --b 8 3 5 1 1 3
          --c 7 0 8 3 5 8
          --d 7 5 6 5 5 2
          --e 8 9 2 2 8 0

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 1, 6);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 2, 8);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 3, 7);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 4, 7);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 5, 8);

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 1, 9);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 2, 3);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 3, 0);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 4, 5);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 5, 9);

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 1, 1);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 2, 5);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 3, 8);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 4, 6);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 5, 2);

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 1, 9);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 2, 1);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 3, 3);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 4, 5);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 5, 2);

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 1, 9);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 2, 1);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 3, 5);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 4, 5);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 5, 8);

          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 1, 3);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 2, 3);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 3, 8);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 4, 2);
          INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 5, 0);
          GO

          SELECT SourceId,
          MatchId,
          COUNT(*) AS TotalMatches
          FROM
          (
          SELECT

          J 1 Reply Last reply
          0
          • M Mycroft Holmes

            E has either 2 or 4 matches, otherwise I have missed your logic. Are you counting each col that matches eg A has 1 match with 3 cols, is it one or 3. Do the values of the match impact on your requirement (3x9 is worth more than 3x5) or (4x2 < 2x5) I can think of a few way, all simple and reiterative so I'll not waste your time.

            Never underestimate the power of human stupidity RAH

            J Offline
            J Offline
            James Shao
            wrote on last edited by
            #5

            Thank you Holmes for the reply.   :) In essence I am trying to compare one column's pattern against each other, so if column A and column B have three attriutes that are the same, then that's 3 matches for that pair. Luc is right in saying there'll be n(n+1)/2 comparisons, given n columns.   Given roughly 10000 columns, that's about roughly 50 million comparisons.   However, that is assuming that during each comparison loop you can only compare 1 versus 1 at a time.   Now I am trying to figure out whether there's an ingenious way to solve this problem efficiently (1 hour of process time is okay by me). Ultimately, the purpose of this exercise is to write a code that can categorize various random patterns into groups.   So for example, after this is done, you can observe and say, "Group 1 has higher attribute a, lower attribute b, lower attribute c", and "Group 2 has below average attrite a, but very high attribute d"...etc.   :-D

            1 Reply Last reply
            0
            • I i j russell

              You could do what you want using Sql.

              USE tempdb
              GO

              CREATE TABLE #Data
              (
              ObjectId INT NOT NULL,
              AttributeId TINYINT NOT NULL,
              Value TINYINT NOT NULL
              )
              GO

              --Object A B C D E F
              --a 6 9 1 9 9 3
              --b 8 3 5 1 1 3
              --c 7 0 8 3 5 8
              --d 7 5 6 5 5 2
              --e 8 9 2 2 8 0

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 1, 6);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 2, 8);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 3, 7);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 4, 7);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (1, 5, 8);

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 1, 9);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 2, 3);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 3, 0);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 4, 5);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (2, 5, 9);

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 1, 1);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 2, 5);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 3, 8);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 4, 6);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (3, 5, 2);

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 1, 9);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 2, 1);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 3, 3);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 4, 5);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (4, 5, 2);

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 1, 9);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 2, 1);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 3, 5);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 4, 5);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (5, 5, 8);

              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 1, 3);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 2, 3);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 3, 8);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 4, 2);
              INSERT INTO #Data (ObjectId, AttributeId, Value) VALUES (6, 5, 0);
              GO

              SELECT SourceId,
              MatchId,
              COUNT(*) AS TotalMatches
              FROM
              (
              SELECT

              J Offline
              J Offline
              James Shao
              wrote on last edited by
              #6

              Wow it really worked!   Thank you so much for the help Russell.   :-D This result looks awesome: Object   Object   #ofMatches 4     5     3 2     4     2 2     5     2 3     4     1 1     5     1 2     6     1 3     6     1 Just curious, did it take you a long time to figure out how to do this?

              I 1 Reply Last reply
              0
              • J James Shao

                Wow it really worked!   Thank you so much for the help Russell.   :-D This result looks awesome: Object   Object   #ofMatches 4     5     3 2     4     2 2     5     2 3     4     1 1     5     1 2     6     1 3     6     1 Just curious, did it take you a long time to figure out how to do this?

                I Offline
                I Offline
                i j russell
                wrote on last edited by
                #7

                You're welcome. It took me more time to type in the sample code than it did to work out how to to it and run it. :-\ The Entity-Attribute-Value (EAV) pattern is one that is well known in database design. Many people have tried to use it instead of the standard normalized approach. It is not generally something that I would recommend but it suits your requirements perfectly. One thing to remember is to have a Clustered Index on the ObjectId and AttributeId to prevent table scans. This isn't really an issue with small data sets but will become an issue when you have more objects and attributes. Ian

                1 Reply Last reply
                0
                Reply
                • Reply as topic
                Log in to reply
                • Oldest to Newest
                • Newest to Oldest
                • Most Votes


                • Login

                • Don't have an account? Register

                • Login or register to search.
                • First post
                  Last post
                0
                • Categories
                • Recent
                • Tags
                • Popular
                • World
                • Users
                • Groups