Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. On Strategies to Spell Check the Structure of a Database.

On Strategies to Spell Check the Structure of a Database.

Scheduled Pinned Locked Moved Database
databasehelpquestionlounge
13 Posts 5 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • B Brady Kelly

    This week I very quickly put together a simply database of some fifty tables and about three hundred columns all told, with every table having at least one foreign key. Then I noticed some spelling mistakes and or typos in some table and column names, and this inspired me to develop a tool that can perform a spelling check on all identifiers used in all database objects. This is no mean feat. My rought draft strategy is to 1. Build a list of all identifiers in the database. To start with, I will limit this to table, column, view, and key and index names. 2. Use patterns and heuristics to split identifiers into word parts and ignore parts such as 'FK_' etc. 3. Check spellings of split identifiers, and those not split, and mark possible errors. 4. Where an error is e.g. found and corrected in a column name, correct all other occurrences of that column name in the DB. I know none of these is simple, but I think this could make an interesting hobby project and an article or two, if I start simple and incrementally make the app more intelligent. What warnings, criticisms, suggestions, or general comments do you have?

    S Offline
    S Offline
    Simon_Whale
    wrote on last edited by
    #3

    I would be very happy to read such an article.

    Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch

    1 Reply Last reply
    0
    • B Brady Kelly

      This week I very quickly put together a simply database of some fifty tables and about three hundred columns all told, with every table having at least one foreign key. Then I noticed some spelling mistakes and or typos in some table and column names, and this inspired me to develop a tool that can perform a spelling check on all identifiers used in all database objects. This is no mean feat. My rought draft strategy is to 1. Build a list of all identifiers in the database. To start with, I will limit this to table, column, view, and key and index names. 2. Use patterns and heuristics to split identifiers into word parts and ignore parts such as 'FK_' etc. 3. Check spellings of split identifiers, and those not split, and mark possible errors. 4. Where an error is e.g. found and corrected in a column name, correct all other occurrences of that column name in the DB. I know none of these is simple, but I think this could make an interesting hobby project and an article or two, if I start simple and incrementally make the app more intelligent. What warnings, criticisms, suggestions, or general comments do you have?

      M Offline
      M Offline
      Mycroft Holmes
      wrote on last edited by
      #4

      Are you going to look into only key fields or all field names, for us dyslexics who can spell ModifiedBy in multiple ways. All fields would actually be simpler, build a distinct from sysobjects, then do some matching and soundex comparisons might be just as useful. It is going to totally fai on some naming conventions but for us who use sensible naming it could be good.

      Never underestimate the power of human stupidity RAH

      B 1 Reply Last reply
      0
      • M Mycroft Holmes

        Are you going to look into only key fields or all field names, for us dyslexics who can spell ModifiedBy in multiple ways. All fields would actually be simpler, build a distinct from sysobjects, then do some matching and soundex comparisons might be just as useful. It is going to totally fai on some naming conventions but for us who use sensible naming it could be good.

        Never underestimate the power of human stupidity RAH

        B Offline
        B Offline
        Brady Kelly
        wrote on last edited by
        #5

        I'm going to do all columns. My naming conventions nearlly always use full names, e.g.

        SecondaryEducation ExamNumber
        SecondaryEducation HighestLevelId
        SecondaryEducation HighestLevelOther
        SecondaryEducation SchoolName
        SecondaryEducation EnglishProficiencyLevelId
        SecondaryEducation EnglishProficiencyInstitution
        SecondaryEducationLevel Id
        SecondaryEducationLevel Description

        I'll be using a set of algorithms to split names in various ways, hopefully into separate English words, as my example above would do quite nicely. Then I'll have a configurable set of standard non- or semi-English tokens that can be checked for deviation from their definitions, and good old ignore lists.

        M P 2 Replies Last reply
        0
        • B Brady Kelly

          I'm going to do all columns. My naming conventions nearlly always use full names, e.g.

          SecondaryEducation ExamNumber
          SecondaryEducation HighestLevelId
          SecondaryEducation HighestLevelOther
          SecondaryEducation SchoolName
          SecondaryEducation EnglishProficiencyLevelId
          SecondaryEducation EnglishProficiencyInstitution
          SecondaryEducationLevel Id
          SecondaryEducationLevel Description

          I'll be using a set of algorithms to split names in various ways, hopefully into separate English words, as my example above would do quite nicely. Then I'll have a configurable set of standard non- or semi-English tokens that can be checked for deviation from their definitions, and good old ignore lists.

          M Offline
          M Offline
          Mycroft Holmes
          wrote on last edited by
          #6

          I have some vb code around somewhere that split words fairly reliably, used to make DGV headers readable before MS built it into their DGV.

          Never underestimate the power of human stupidity RAH

          B 1 Reply Last reply
          0
          • M Mycroft Holmes

            I have some vb code around somewhere that split words fairly reliably, used to make DGV headers readable before MS built it into their DGV.

            Never underestimate the power of human stupidity RAH

            B Offline
            B Offline
            Brady Kelly
            wrote on last edited by
            #7

            Maybe you should article it. There, how's that for a really dirty verbing? :suss:

            M 1 Reply Last reply
            0
            • B Brady Kelly

              Maybe you should article it. There, how's that for a really dirty verbing? :suss:

              M Offline
              M Offline
              Mycroft Holmes
              wrote on last edited by
              #8

              That entire phrase belongs in the nasty buzzword forum. It would barely make a tip IIRC. I'll try and dig it up if you feel you have a need.

              Never underestimate the power of human stupidity RAH

              B 1 Reply Last reply
              0
              • M Mycroft Holmes

                That entire phrase belongs in the nasty buzzword forum. It would barely make a tip IIRC. I'll try and dig it up if you feel you have a need.

                Never underestimate the power of human stupidity RAH

                B Offline
                B Offline
                Brady Kelly
                wrote on last edited by
                #9

                I've found quite a list of splitting algorithms, so no need for you to go digging for it now, but thanks anyway.

                1 Reply Last reply
                0
                • B Brady Kelly

                  I'm going to do all columns. My naming conventions nearlly always use full names, e.g.

                  SecondaryEducation ExamNumber
                  SecondaryEducation HighestLevelId
                  SecondaryEducation HighestLevelOther
                  SecondaryEducation SchoolName
                  SecondaryEducation EnglishProficiencyLevelId
                  SecondaryEducation EnglishProficiencyInstitution
                  SecondaryEducationLevel Id
                  SecondaryEducationLevel Description

                  I'll be using a set of algorithms to split names in various ways, hopefully into separate English words, as my example above would do quite nicely. Then I'll have a configurable set of standard non- or semi-English tokens that can be checked for deviation from their definitions, and good old ignore lists.

                  P Offline
                  P Offline
                  PIEBALDconsult
                  wrote on last edited by
                  #10

                  Brady Kelly wrote:

                  into separate English words

                  I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled. On one project I decided that (by golly) I was going to spell "threshold" with three Hs (threshhold), as it's pronounced! I later changed it back, but before doing so I'd want the dozen "threshhold"s to outweigh an outlying "threshold". Likewise, an incorrectly named item could be a valid word.

                  B 1 Reply Last reply
                  0
                  • P PIEBALDconsult

                    Brady Kelly wrote:

                    into separate English words

                    I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled. On one project I decided that (by golly) I was going to spell "threshold" with three Hs (threshhold), as it's pronounced! I later changed it back, but before doing so I'd want the dozen "threshhold"s to outweigh an outlying "threshold". Likewise, an incorrectly named item could be a valid word.

                    B Offline
                    B Offline
                    Brady Kelly
                    wrote on last edited by
                    #11

                    PIEBALDconsult wrote:

                    I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled.

                    True, but e.g. using the designer to add foreign keys, where a misspelled column or table name is then used in the key name and DDL.

                    PIEBALDconsult wrote:

                    Likewise, an incorrectly named item could be a valid word.

                    I plan to present a list of potential misspellings, with options to add to dictionary, ignore, propagate correction, etc. eventually.

                    P 1 Reply Last reply
                    0
                    • B Brady Kelly

                      PIEBALDconsult wrote:

                      I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled.

                      True, but e.g. using the designer to add foreign keys, where a misspelled column or table name is then used in the key name and DDL.

                      PIEBALDconsult wrote:

                      Likewise, an incorrectly named item could be a valid word.

                      I plan to present a list of potential misspellings, with options to add to dictionary, ignore, propagate correction, etc. eventually.

                      P Offline
                      P Offline
                      PIEBALDconsult
                      wrote on last edited by
                      #12

                      Aside: I just found a table named ReportGalley rather than ReportGallery ! :laugh: (The team lead was in the Navy, maybe this is something he cooked up, but it certainly is a mess.)

                      B 1 Reply Last reply
                      0
                      • P PIEBALDconsult

                        Aside: I just found a table named ReportGalley rather than ReportGallery ! :laugh: (The team lead was in the Navy, maybe this is something he cooked up, but it certainly is a mess.)

                        B Offline
                        B Offline
                        Brady Kelly
                        wrote on last edited by
                        #13

                        Groan! :laugh:

                        1 Reply Last reply
                        0
                        Reply
                        • Reply as topic
                        Log in to reply
                        • Oldest to Newest
                        • Newest to Oldest
                        • Most Votes


                        • Login

                        • Don't have an account? Register

                        • Login or register to search.
                        • First post
                          Last post
                        0
                        • Categories
                        • Recent
                        • Tags
                        • Popular
                        • World
                        • Users
                        • Groups