Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. The Lounge
  3. Bless SQL

Bless SQL

Scheduled Pinned Locked Moved The Lounge
databaselinux
38 Posts 13 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • N NeverJustHere

    It's a shame you're targeting MS SQL, it is weak in many areas compared to other SQL offerings. Postgres supports regex natively, as well as having a bunch more capability with string manipulation functions. Is there an option for you to use .Net assemblies in MS SQL, or perhaps the R or Python environments? It seems MS is ignoring extending the base SQL capability, but offering these extensions. Regardless, kudos for the tenacity required to do what your doing. How are you finding the performance of the regex proc's in SQL? Also, remember SQL is designed way back, before syntax highlighting/intellisense and many other modern expectations that would influence its design were around. It's time to create SQLNext, and I've seen the odd concept put forward, but it would be really hard to shift the masses now. Perhaps it'd best be done with a translation from SQLNext to the various SQL variants. But ideally would have to exist closer to the database than the client application and eventually become a native database feature. Unfortunately, would be easiest to place in the client application - I'd hate to get into any DB Driver level stuff. Remember ODBC - yuck.

    H Offline
    H Offline
    honey the codewitch
    wrote on last edited by
    #11

    I intend to target others. I had to start somewhere, and I already had MSSQL, it's fairly popular, I'm familiar with it, and its many limitations, which actually made it a good place to start. After all, I stuck as close to the SQL spec as a could, so I can port it to other databases, and I will be adding some sort of "flavor" switch to the app to specify which SQL to target. It's impossible to target SQL92 for this (i think) or I would.

    Real programmers use butterflies

    1 Reply Last reply
    0
    • N NeverJustHere

      It's a shame you're targeting MS SQL, it is weak in many areas compared to other SQL offerings. Postgres supports regex natively, as well as having a bunch more capability with string manipulation functions. Is there an option for you to use .Net assemblies in MS SQL, or perhaps the R or Python environments? It seems MS is ignoring extending the base SQL capability, but offering these extensions. Regardless, kudos for the tenacity required to do what your doing. How are you finding the performance of the regex proc's in SQL? Also, remember SQL is designed way back, before syntax highlighting/intellisense and many other modern expectations that would influence its design were around. It's time to create SQLNext, and I've seen the odd concept put forward, but it would be really hard to shift the masses now. Perhaps it'd best be done with a translation from SQLNext to the various SQL variants. But ideally would have to exist closer to the database than the client application and eventually become a native database feature. Unfortunately, would be easiest to place in the client application - I'd hate to get into any DB Driver level stuff. Remember ODBC - yuck.

      H Offline
      H Offline
      honey the codewitch
      wrote on last edited by
      #12

      I didn't even respond to the rest of your questions! Whoops, let me try again. I haven't tested the performance specifically. I wasn't impressed with the execution plan for the table version though. The compiled ones should perform "okay" once MSSQL gets done putting them into native code, but it won't be near as fast as outside of a DB. I wouldn't use .NET's engine because it's not available everywhere. With Reggie you write the spec once, and it runs identically across any supported target. That's important for business schemas or even just application behavior consistency, if you're doing like, triple tier validation for example. Also I don't think you can use the CLR in the database without turning it on explicitly, and I could see a lot of people (if not most) leaving it off. The takeaway here is Reggie will probably run where a .NET solution wouldn't, and it won't perform as well in the DB, unless you take the C# code reggie generates and run *that* in the database, which i suppose is doable as well. So for a DB, this is best used for validation during updates that aren't intended to be super fast, full metal writes, though you can probably use this modestly in such a scenario for normalizing snatches of JSON and such. I think I may solve my SQL problems with having structured chunks of sql that can render to multiple targets (like "advance the input position and move along a transition, with capture") I've never heard of SQLNext.

      Real programmers use butterflies

      1 Reply Last reply
      0
      • H honey the codewitch

        There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc. Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database. Particularly these days, it's popular to store denormalized JSON in the database, for better or worse. With Norm and Reggie, you can target parsing code to SQL that will allow you to validate or normalize your string data at the database level. The reason is not for performance, but for security and data integrity. You wouldn't want to run something like this as part of a query. You primarily use it on data update where performance can be less of an issue depending on the scenario.

        Real programmers use butterflies

        P Offline
        P Offline
        PIEBALDconsult
        wrote on last edited by
        #13

        But you should do that up-front, before it hits the database. Or, more likely, when populating a landing/staging table. With what I'm doing now (mostly ETL), we land the data and then go through doing stuff like that, making IP addresses binary, splitting FQDNs into their parts, checking dates, etc. One of the things I challenge you to do in SQL is take big-endian DNS names and reformat them as little-endian FQDNs. Examples: .com.acme.southwest.anvils -- anvils.southwest.acme.com. .com.widgetco.sales -- sales.widgetco.com. This is something I need to do and I use a CLR function to do it. It's more complex than these examples. (This is not a programming question.)

        H B 2 Replies Last reply
        0
        • P PIEBALDconsult

          But you should do that up-front, before it hits the database. Or, more likely, when populating a landing/staging table. With what I'm doing now (mostly ETL), we land the data and then go through doing stuff like that, making IP addresses binary, splitting FQDNs into their parts, checking dates, etc. One of the things I challenge you to do in SQL is take big-endian DNS names and reformat them as little-endian FQDNs. Examples: .com.acme.southwest.anvils -- anvils.southwest.acme.com. .com.widgetco.sales -- sales.widgetco.com. This is something I need to do and I use a CLR function to do it. It's more complex than these examples. (This is not a programming question.)

          H Offline
          H Offline
          honey the codewitch
          wrote on last edited by
          #14

          Of course you "should" - there are a lot of things that people "should" and "shouldn't" do that don't actually happen the way they "should" in the real world. The result is almost always a database that accepts denormalized data for some of its content. Over a long enough timeline / large enough project this is an inevitability. That means in an enterprise sized application of any significant age, you're dealing with denormalized data. Furthermore, in some cases, with an RDBMS it is more efficient to accept denormalized data and normalize it in a stored proc to get results based on that. A good example is passing an array of integer cache ids so you only get particular rows from the database (such as updating a stale list) - there is simply no good (standard SQL) way to pass a small array of ints to the database, so 9 times out of 10, you'll see a procedure like this that takes a string or even a varbinary that has the data encoded it in. This is far more efficient than updating some kind of local session table you use to talk to your routines with (the other way to get a list of rows by ids) With a tool like this you can harden that, and the bottleneck is network, not DB CPU, and not row locks (unless you're using the table based version, which I actually don't recommend) I'm not sure about reformatting your domain names. They're easy enough to parse, but I don't know enough about the allowable formats to discern the logic for it without delving into RFCs.

          Real programmers use butterflies

          1 Reply Last reply
          0
          • H honey the codewitch

            I feel like Structured Query Language was designed by someone who felt like everything should be *work* or it's not worth the effort. The syntax is inexplicable. Not so much designed by committee as designed by a committee on a bender in 'Vegas. It's impossible to remember it without using it every day, and that possibility makes me want to bash my head bloody on the nearest hard surface. Plus just doing simple things you'd be able to expect to do in other languages like type conversion or heaven forbid, string manipulation is like pulling teeth. Now I know there are reasons for *some* of this, but most of it just seems like it was a result of throwing things at a wall to see what sticks.

            Real programmers use butterflies

            Kornfeld Eliyahu PeterK Offline
            Kornfeld Eliyahu PeterK Offline
            Kornfeld Eliyahu Peter
            wrote on last edited by
            #15

            honey the codewitch wrote:

            It's impossible to remember it without using it every day

            I USE SQL every day a week (Ok 5 out of 7)... Still there are things I have to look-up...

            "The only place where Success comes before Work is in the dictionary." Vidal Sassoon, 1928 - 2012

            "It never ceases to amaze me that a spacecraft launched in 1977 can be fixed remotely from Earth." ― Brian Cox

            1 Reply Last reply
            0
            • H honey the codewitch

              My next article will give SQL people fits, I'm sure.

              Real programmers use butterflies

              J Offline
              J Offline
              Jorgen Andersson
              wrote on last edited by
              #16

              Unless it gives you the fits first. Doing advanced string manipulation in SQL-Server is a sure sign of masochism. :laugh:

              Wrong is evil and must be defeated. - Jeff Ello

              H 1 Reply Last reply
              0
              • J Jorgen Andersson

                Unless it gives you the fits first. Doing advanced string manipulation in SQL-Server is a sure sign of masochism. :laugh:

                Wrong is evil and must be defeated. - Jeff Ello

                H Offline
                H Offline
                honey the codewitch
                wrote on last edited by
                #17

                That's why I'm writing a tool to generate SQL for the regex matching. I at least only have to write it once. =)

                Real programmers use butterflies

                J 1 Reply Last reply
                0
                • H honey the codewitch

                  That's why I'm writing a tool to generate SQL for the regex matching. I at least only have to write it once. =)

                  Real programmers use butterflies

                  J Offline
                  J Offline
                  Jorgen Andersson
                  wrote on last edited by
                  #18

                  In many cases you won't need to write it at all. Oracle, Postgres, DB2 and even MySQL have full support for Regex. For the big ones it's basically SQL-Server that's the exception. And for SQL-Server I wrote a CLR-function to take care of that. (REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR like on most other implementations) And yes, you can write CLR also for Linux, since Linux work fine with .Net Core.

                  Wrong is evil and must be defeated. - Jeff Ello

                  H 1 Reply Last reply
                  0
                  • H honey the codewitch

                    I feel like Structured Query Language was designed by someone who felt like everything should be *work* or it's not worth the effort. The syntax is inexplicable. Not so much designed by committee as designed by a committee on a bender in 'Vegas. It's impossible to remember it without using it every day, and that possibility makes me want to bash my head bloody on the nearest hard surface. Plus just doing simple things you'd be able to expect to do in other languages like type conversion or heaven forbid, string manipulation is like pulling teeth. Now I know there are reasons for *some* of this, but most of it just seems like it was a result of throwing things at a wall to see what sticks.

                    Real programmers use butterflies

                    M Offline
                    M Offline
                    Member_15329613
                    wrote on last edited by
                    #19

                    honey the codewitch wrote:

                    The syntax is inexplicable

                    :confused:

                    honey the codewitch wrote:

                    It's impossible to remember it without using it every day,

                    As a long time web developer I feel that's true for most all languages. As a web developer I do html, css, javascript, jquery, xml, json, c#, and t-sql. I often have to google for syntax. But SQL is the one I probably look up least often.

                    H 1 Reply Last reply
                    0
                    • P PIEBALDconsult

                      But you should do that up-front, before it hits the database. Or, more likely, when populating a landing/staging table. With what I'm doing now (mostly ETL), we land the data and then go through doing stuff like that, making IP addresses binary, splitting FQDNs into their parts, checking dates, etc. One of the things I challenge you to do in SQL is take big-endian DNS names and reformat them as little-endian FQDNs. Examples: .com.acme.southwest.anvils -- anvils.southwest.acme.com. .com.widgetco.sales -- sales.widgetco.com. This is something I need to do and I use a CLR function to do it. It's more complex than these examples. (This is not a programming question.)

                      B Offline
                      B Offline
                      BryanFazekas
                      wrote on last edited by
                      #20

                      PIEBALDconsult wrote:

                      But you should do that up-front, before it hits the database.

                      True, we should. But in the real world, this doesn't necessarily happen. Like when dealing with a legacy Oracle system written in C++ (appears to be VS97 or VS6), where the DB design defies rational logic. Or any logic. My queries have as many as 20 nested REGEXP_REPLACE statements to make sense out of the mess that some fields present. Ugly as sin, but it gets the job done. I'm counting the days until the replacement system goes online!

                      1 Reply Last reply
                      0
                      • H honey the codewitch

                        There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc. Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database. Particularly these days, it's popular to store denormalized JSON in the database, for better or worse. With Norm and Reggie, you can target parsing code to SQL that will allow you to validate or normalize your string data at the database level. The reason is not for performance, but for security and data integrity. You wouldn't want to run something like this as part of a query. You primarily use it on data update where performance can be less of an issue depending on the scenario.

                        Real programmers use butterflies

                        L Offline
                        L Offline
                        Lost User
                        wrote on last edited by
                        #21

                        honey the codewitch wrote:

                        There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc.

                        In the real world, normalization means something different. I go to BCNF.

                        honey the codewitch wrote:

                        Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database.

                        Means you storing it wrong. There's no format in storage. "Normalization" might be different from your idea. Go look it up :) --edit I didn't see your name; I didn't intend to challenge you. Yet, explain me, if that is normal (ization) to you?

                        Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                        H 1 Reply Last reply
                        0
                        • H honey the codewitch

                          There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc. Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database. Particularly these days, it's popular to store denormalized JSON in the database, for better or worse. With Norm and Reggie, you can target parsing code to SQL that will allow you to validate or normalize your string data at the database level. The reason is not for performance, but for security and data integrity. You wouldn't want to run something like this as part of a query. You primarily use it on data update where performance can be less of an issue depending on the scenario.

                          Real programmers use butterflies

                          R Offline
                          R Offline
                          realJSOP
                          wrote on last edited by
                          #22

                          Avoid using user-defined sql functions in a sql query. It absolutly destroys the performance of the query. Just get the data and return it to the calling app, and let the app do the validation/correction (C#/C++ is much better for that than is SQL). If you want to get really fancy, you can perform the validation, and re-save the corrected data so that the next time you do the query, it will return fewer invalid records.

                          ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                          -----
                          You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                          -----
                          When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                          H 1 Reply Last reply
                          0
                          • H honey the codewitch

                            I feel like Structured Query Language was designed by someone who felt like everything should be *work* or it's not worth the effort. The syntax is inexplicable. Not so much designed by committee as designed by a committee on a bender in 'Vegas. It's impossible to remember it without using it every day, and that possibility makes me want to bash my head bloody on the nearest hard surface. Plus just doing simple things you'd be able to expect to do in other languages like type conversion or heaven forbid, string manipulation is like pulling teeth. Now I know there are reasons for *some* of this, but most of it just seems like it was a result of throwing things at a wall to see what sticks.

                            Real programmers use butterflies

                            M Offline
                            M Offline
                            Mike Hankey
                            wrote on last edited by
                            #23

                            I'm thinking that the people that designed COBOL had a hand in SQL?

                            The less you need, the more you have. Even a blind squirrel gets a nut...occasionally. JaxCoder.com

                            1 Reply Last reply
                            0
                            • L Lost User

                              honey the codewitch wrote:

                              There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc.

                              In the real world, normalization means something different. I go to BCNF.

                              honey the codewitch wrote:

                              Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database.

                              Means you storing it wrong. There's no format in storage. "Normalization" might be different from your idea. Go look it up :) --edit I didn't see your name; I didn't intend to challenge you. Yet, explain me, if that is normal (ization) to you?

                              Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                              H Offline
                              H Offline
                              honey the codewitch
                              wrote on last edited by
                              #24

                              According to a cursory google search: > Normalization or normalisation refers to a process that makes something more normal or regular. That's what I'm referring to. Normal, as in regular, as in consistently structured, as in i can look impose a structure on it. Further googling it sounds like there's a technical use for it that has to do with databases specifically, and how to impose some sort of notion of a "standard" order over the mess that is an RDBMS. In practice I can see why nobody cares. Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.

                              Real programmers use butterflies

                              L 1 Reply Last reply
                              0
                              • R realJSOP

                                Avoid using user-defined sql functions in a sql query. It absolutly destroys the performance of the query. Just get the data and return it to the calling app, and let the app do the validation/correction (C#/C++ is much better for that than is SQL). If you want to get really fancy, you can perform the validation, and re-save the corrected data so that the next time you do the query, it will return fewer invalid records.

                                ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                                -----
                                You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                                -----
                                When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                                H Offline
                                H Offline
                                honey the codewitch
                                wrote on last edited by
                                #25

                                I didn't implement them as functions, but I could have. They are stored procedures in order to discourage their use in views and such. This isn't for performance. This is for data integrity. Often those two things are somewhat at odds, meaning there are many situations where you have to rob Peter to pay Paul.

                                Real programmers use butterflies

                                1 Reply Last reply
                                0
                                • M Member_15329613

                                  honey the codewitch wrote:

                                  The syntax is inexplicable

                                  :confused:

                                  honey the codewitch wrote:

                                  It's impossible to remember it without using it every day,

                                  As a long time web developer I feel that's true for most all languages. As a web developer I do html, css, javascript, jquery, xml, json, c#, and t-sql. I often have to google for syntax. But SQL is the one I probably look up least often.

                                  H Offline
                                  H Offline
                                  honey the codewitch
                                  wrote on last edited by
                                  #26

                                  The web is even worse. And it's not true of most languages. Duck typing is for people that like spending all day debugging a typo. Hard types are for closers. Compilers keep people honest.

                                  Real programmers use butterflies

                                  1 Reply Last reply
                                  0
                                  • J Jorgen Andersson

                                    In many cases you won't need to write it at all. Oracle, Postgres, DB2 and even MySQL have full support for Regex. For the big ones it's basically SQL-Server that's the exception. And for SQL-Server I wrote a CLR-function to take care of that. (REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR like on most other implementations) And yes, you can write CLR also for Linux, since Linux work fine with .Net Core.

                                    Wrong is evil and must be defeated. - Jeff Ello

                                    H Offline
                                    H Offline
                                    honey the codewitch
                                    wrote on last edited by
                                    #27

                                    But can they tokenize? Most regular expression engines leave that out. Also, is their regex a different flavor for every DB vendor? If so, they may as well not have it at all, because for real world regular expressions translating one form of one to another reliably is a bear. The code is nasty. I've tried. It's honestly easier just to generate matching code that is consistent across platforms.

                                    Real programmers use butterflies

                                    J 1 Reply Last reply
                                    0
                                    • H honey the codewitch

                                      But can they tokenize? Most regular expression engines leave that out. Also, is their regex a different flavor for every DB vendor? If so, they may as well not have it at all, because for real world regular expressions translating one form of one to another reliably is a bear. The code is nasty. I've tried. It's honestly easier just to generate matching code that is consistent across platforms.

                                      Real programmers use butterflies

                                      J Offline
                                      J Offline
                                      Jorgen Andersson
                                      wrote on last edited by
                                      #28

                                      Do I need it?

                                      Wrong is evil and must be defeated. - Jeff Ello

                                      H 1 Reply Last reply
                                      0
                                      • J Jorgen Andersson

                                        Do I need it?

                                        Wrong is evil and must be defeated. - Jeff Ello

                                        H Offline
                                        H Offline
                                        honey the codewitch
                                        wrote on last edited by
                                        #29

                                        It depends on what you are doing. For column validation in most cases you wouldn't need it. However, if you intended to say, store chunks of JSON in your database (a surprisingly common practice these days) you could use a tokenizer (along with a parser generated by Norm targeting SQL) to validate or even normalize the JSON content so that invalid JSON does not get stored in the database but rather is rejected. Typically this is done in the middleware, but such validation should in theory (where possible) take place in each of the "three tiers", which would include the database, for maximum assurance of data integrity.. Another use for it might be to allow you to search long structured text for particular things, like search an HTML document for all <B> tags and yield those as a table, which could be useful for things like CMS systems.

                                        Real programmers use butterflies

                                        1 Reply Last reply
                                        0
                                        • H honey the codewitch

                                          According to a cursory google search: > Normalization or normalisation refers to a process that makes something more normal or regular. That's what I'm referring to. Normal, as in regular, as in consistently structured, as in i can look impose a structure on it. Further googling it sounds like there's a technical use for it that has to do with databases specifically, and how to impose some sort of notion of a "standard" order over the mess that is an RDBMS. In practice I can see why nobody cares. Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.

                                          Real programmers use butterflies

                                          L Offline
                                          L Offline
                                          Lost User
                                          wrote on last edited by
                                          #30

                                          honey the codewitch wrote:

                                          Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.

                                          The fact that you had to look it up says enough :) And, well, I do.

                                          Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                                          H 1 Reply Last reply
                                          0
                                          Reply
                                          • Reply as topic
                                          Log in to reply
                                          • Oldest to Newest
                                          • Newest to Oldest
                                          • Most Votes


                                          • Login

                                          • Don't have an account? Register

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular
                                          • World
                                          • Users
                                          • Groups