Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Visual Basic
  4. Algorithm for matching database records?

Algorithm for matching database records?

Scheduled Pinned Locked Moved Visual Basic
csharpdatabasealgorithmshelpquestion
3 Posts 2 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • W Offline
    W Offline
    Wombaticus
    wrote on last edited by
    #1

    Apologoes if this the wrong forum – could be in “Database”, or even C# - it’s a bit of a cross-over... The scenario: I have a database table holding records of items that are flagged according to various key terms held in another table – eg: keywords table: ID – integer sKeyword – string There are about a dozen of these, which won’t change, with ID’s from 1 to 12. items table: ID – integer iKeywords – Double (other fields) The iKeywords number is calculated simply as the sum of 2 ^ keywords.ID for all relevant keywords. A record may be associated with multiple keywords. Thus it is easy to select records matching keywords criteria, and updating records is similarly easy. The problem: Give the iKeyword value of a particular item, find other items which are matched with some or all of the keywords of this record, ordered by the number of matching keywords. The solution...? Apart from an obvious brute-force approach, I am sure there must be an elegant algorithm (c# or vb .NET preferably) that can do it.... even better a really smart SQL statement, but that’s probably hoping for too much. Or maybe I should be approaching the database structure differently?

    Richard DeemingR 1 Reply Last reply
    0
    • W Wombaticus

      Apologoes if this the wrong forum – could be in “Database”, or even C# - it’s a bit of a cross-over... The scenario: I have a database table holding records of items that are flagged according to various key terms held in another table – eg: keywords table: ID – integer sKeyword – string There are about a dozen of these, which won’t change, with ID’s from 1 to 12. items table: ID – integer iKeywords – Double (other fields) The iKeywords number is calculated simply as the sum of 2 ^ keywords.ID for all relevant keywords. A record may be associated with multiple keywords. Thus it is easy to select records matching keywords criteria, and updating records is similarly easy. The problem: Give the iKeyword value of a particular item, find other items which are matched with some or all of the keywords of this record, ordered by the number of matching keywords. The solution...? Apart from an obvious brute-force approach, I am sure there must be an elegant algorithm (c# or vb .NET preferably) that can do it.... even better a really smart SQL statement, but that’s probably hoping for too much. Or maybe I should be approaching the database structure differently?

      Richard DeemingR Offline
      Richard DeemingR Offline
      Richard Deeming
      wrote on last edited by
      #2

      Why is your iKeywords field is a double if you only have 12 keywords and they're not going to change? A 16-bit integer (smallint) would be sufficient. The design feels wrong; the standard approach would be to have a many-to-many relationship between the keywords and the items:

      Keywords: KeywordID [PK], Keyword
      Items: ItemID [PK], ...
      ItemKeywords: ItemID [PK, FK], KeywordID [PK, FK]

      This would lead to a fairly simple solution:

      SELECT
      B.ItemID,
      Count(1) As MatchedKeywordCount
      FROM
      ItemKeywords As A
      INNER JOIN ItemKeywords As B
      ON A.KeywordID = B.KeywordID
      WHERE
      A.ItemID = @ItemToMatch
      And
      B.ItemID != @ItemToMatch
      GROUP BY
      B.ItemID
      ORDER BY
      MatchedKeywordCount DESC
      ;

      With your current implementation (updated to use an integer type for iKeywords), you can identify matching products using bitwise-AND[^]:

      SELECT
      ...
      FROM
      Items
      WHERE
      (iKeywords & @KeywordsToMatch) != 0
      ;

      Counting the number of matching keywords is slightly more complicated. For example:

      WITH N0 (X) As -- 2 rows
      (
      SELECT 1
      UNION ALL
      SELECT 1
      ),
      N1 (X) As -- 4 rows
      (
      SELECT 1
      FROM N0 A CROSS JOIN N0 B
      ),
      N2 (X) As -- 16 rows
      (
      SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
      FROM N1 A CROSS JOIN N1 B
      ),
      PowersOfTwo (Value) As
      (
      SELECT Power(2, X)
      FROM N2
      ),
      MatchingItems As
      (
      SELECT
      I.ItemID,
      I.iKeywords & @KeywordsToMatch As MatchedKeywords
      FROM
      Items As I
      WHERE
      (I.iKeywords & @KeywordsToMatch) != 0
      )
      SELECT
      I.ItemID,
      Count(1) As MatchedKeywordCount
      FROM
      MatchingItems As I
      INNER JOIN PowersOfTwo As P
      ON (I.MatchedKeywords & P.Value) != 0
      GROUP BY
      I.ItemID
      ;


      "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

      "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

      W 1 Reply Last reply
      0
      • Richard DeemingR Richard Deeming

        Why is your iKeywords field is a double if you only have 12 keywords and they're not going to change? A 16-bit integer (smallint) would be sufficient. The design feels wrong; the standard approach would be to have a many-to-many relationship between the keywords and the items:

        Keywords: KeywordID [PK], Keyword
        Items: ItemID [PK], ...
        ItemKeywords: ItemID [PK, FK], KeywordID [PK, FK]

        This would lead to a fairly simple solution:

        SELECT
        B.ItemID,
        Count(1) As MatchedKeywordCount
        FROM
        ItemKeywords As A
        INNER JOIN ItemKeywords As B
        ON A.KeywordID = B.KeywordID
        WHERE
        A.ItemID = @ItemToMatch
        And
        B.ItemID != @ItemToMatch
        GROUP BY
        B.ItemID
        ORDER BY
        MatchedKeywordCount DESC
        ;

        With your current implementation (updated to use an integer type for iKeywords), you can identify matching products using bitwise-AND[^]:

        SELECT
        ...
        FROM
        Items
        WHERE
        (iKeywords & @KeywordsToMatch) != 0
        ;

        Counting the number of matching keywords is slightly more complicated. For example:

        WITH N0 (X) As -- 2 rows
        (
        SELECT 1
        UNION ALL
        SELECT 1
        ),
        N1 (X) As -- 4 rows
        (
        SELECT 1
        FROM N0 A CROSS JOIN N0 B
        ),
        N2 (X) As -- 16 rows
        (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
        FROM N1 A CROSS JOIN N1 B
        ),
        PowersOfTwo (Value) As
        (
        SELECT Power(2, X)
        FROM N2
        ),
        MatchingItems As
        (
        SELECT
        I.ItemID,
        I.iKeywords & @KeywordsToMatch As MatchedKeywords
        FROM
        Items As I
        WHERE
        (I.iKeywords & @KeywordsToMatch) != 0
        )
        SELECT
        I.ItemID,
        Count(1) As MatchedKeywordCount
        FROM
        MatchingItems As I
        INNER JOIN PowersOfTwo As P
        ON (I.MatchedKeywords & P.Value) != 0
        GROUP BY
        I.ItemID
        ;


        "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

        W Offline
        W Offline
        Wombaticus
        wrote on last edited by
        #3

        Many thanks for this - should be enough to get me going! I used a double only because when I began I wasns't sure of the exact keyword list, so played safe in case there were more. I agree the design "looks wrong" as it is clearly limited to only a relatively small number of keywords. I'll have a play with doing it right!

        1 Reply Last reply
        0
        Reply
        • Reply as topic
        Log in to reply
        • Oldest to Newest
        • Newest to Oldest
        • Most Votes


        • Login

        • Don't have an account? Register

        • Login or register to search.
        • First post
          Last post
        0
        • Categories
        • Recent
        • Tags
        • Popular
        • World
        • Users
        • Groups