Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Regular Expressions
  4. MySQL Select query that chooses something dot something between certain underscores

MySQL Select query that chooses something dot something between certain underscores

Scheduled Pinned Locked Moved Regular Expressions
databasemysqlcomregexquestion
16 Posts 3 Posters 21 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • L Offline
    L Offline
    Lost User
    wrote on last edited by
    #1

    I have a MySQL column with underscores and between those underscores a domain name (which is anything with a dot, except number.number), and I like to use REGEXP_SUBSTR to extract that domain, but only if there are at least 2 preceding underscores before it. Examples: 1. Something_Else_Type_XY_Z_26.04.23_website.com => website.com 2. Something_Else_Type_XY_Z_website.com_26.04.23 => website.com 3. Something_Else_Type_XY_Z_26.04.23_website.com_Comment => website.com 4. Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having 2 preceding underscores first) 5. Something_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having something dot something that isn't number.number) 6. Something_Else_Type_XY_dfd6869_3_21.12.22_website.com_ZU => website.com Which regex could do so instead of ... in SELECT REGEXP_SUBSTR(full_name, ...) as domain_name?

    R J 2 Replies Last reply
    0
    • L Lost User

      I have a MySQL column with underscores and between those underscores a domain name (which is anything with a dot, except number.number), and I like to use REGEXP_SUBSTR to extract that domain, but only if there are at least 2 preceding underscores before it. Examples: 1. Something_Else_Type_XY_Z_26.04.23_website.com => website.com 2. Something_Else_Type_XY_Z_website.com_26.04.23 => website.com 3. Something_Else_Type_XY_Z_26.04.23_website.com_Comment => website.com 4. Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having 2 preceding underscores first) 5. Something_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having something dot something that isn't number.number) 6. Something_Else_Type_XY_dfd6869_3_21.12.22_website.com_ZU => website.com Which regex could do so instead of ... in SELECT REGEXP_SUBSTR(full_name, ...) as domain_name?

      R Offline
      R Offline
      RedDk
      wrote on last edited by
      #2

      One of these then: \b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b ^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$ \b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b \b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b \b((?=[a-z0-9-]{1,63}\.)[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b Footnote: these make no sense of leading BLANK as your requirement states but I throw them out there anyway for your testing pleasure.

      L 1 Reply Last reply
      0
      • L Lost User

        I have a MySQL column with underscores and between those underscores a domain name (which is anything with a dot, except number.number), and I like to use REGEXP_SUBSTR to extract that domain, but only if there are at least 2 preceding underscores before it. Examples: 1. Something_Else_Type_XY_Z_26.04.23_website.com => website.com 2. Something_Else_Type_XY_Z_website.com_26.04.23 => website.com 3. Something_Else_Type_XY_Z_26.04.23_website.com_Comment => website.com 4. Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having 2 preceding underscores first) 5. Something_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having something dot something that isn't number.number) 6. Something_Else_Type_XY_dfd6869_3_21.12.22_website.com_ZU => website.com Which regex could do so instead of ... in SELECT REGEXP_SUBSTR(full_name, ...) as domain_name?

        J Offline
        J Offline
        jschell
        wrote on last edited by
        #3

        You should keep in mind that this is going to be really slow when compared to other types of searches. So if you are using large volumes of data and/or many searches you should keep that in mind. You should also keep in mind that there are valid domain names that will have more than one dot (period).

        1 Reply Last reply
        0
        • R RedDk

          One of these then: \b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b ^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$ \b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b \b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b \b((?=[a-z0-9-]{1,63}\.)[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b Footnote: these make no sense of leading BLANK as your requirement states but I throw them out there anyway for your testing pleasure.

          L Offline
          L Offline
          Lost User
          wrote on last edited by
          #4

          Unfortunately none of them worked at all. REGEXP_SUBSTR(field_name, '^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$') immediately timed out (#3699 - Timeout exceeded in regular expression match), and the rest just returned NULL for everything. Any idea why?

          R 1 Reply Last reply
          0
          • L Lost User

            Unfortunately none of them worked at all. REGEXP_SUBSTR(field_name, '^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$') immediately timed out (#3699 - Timeout exceeded in regular expression match), and the rest just returned NULL for everything. Any idea why?

            R Offline
            R Offline
            RedDk
            wrote on last edited by
            #5

            Well, first of all those regex were std::regx not MySQL. What exactly have timing out and error #3699 to do with this "SELECT" query? I'm not sure why none of them work because I'm not sure how you're leveraging this database. REGEX_SUBSTR? Is that a function? Probably Oracle, he said. I know nothing about Oracle, sorry.

            L 1 Reply Last reply
            0
            • R RedDk

              Well, first of all those regex were std::regx not MySQL. What exactly have timing out and error #3699 to do with this "SELECT" query? I'm not sure why none of them work because I'm not sure how you're leveraging this database. REGEX_SUBSTR? Is that a function? Probably Oracle, he said. I know nothing about Oracle, sorry.

              L Offline
              L Offline
              Lost User
              wrote on last edited by
              #6

              There's nothing about Oracle here other than the company name. It's just a standard MySQL command, see https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr[^] and MySQL regexp_substr() function - javatpoint[^]. All I asked was which input to type there to get the domain name from a column that puts the domain between underscores (or just end of line) and after at least 2 underscores (examples in the OP).

              L 1 Reply Last reply
              0
              • L Lost User

                There's nothing about Oracle here other than the company name. It's just a standard MySQL command, see https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr[^] and MySQL regexp_substr() function - javatpoint[^]. All I asked was which input to type there to get the domain name from a column that puts the domain between underscores (or just end of line) and after at least 2 underscores (examples in the OP).

                L Offline
                L Offline
                Lost User
                wrote on last edited by
                #7

                I just tried a quick test at RegExr: Learn, Build, & Test RegEx[^] and the following regex finds all the web addresses in the set below (shown in bold). The only thing I cannot correct is that it includes the leading underscore. However it may be a starting point to help you (I have not actually tried it with MySQL).

                _[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z].

                Something_Else_Type_XY_Z_26.04.23**_website.com**
                Something_Else_Type_XY_Z**_website.co.uk**_26.04.23
                Something_Else_Type_XY_Z_26.04.23**_website.com**_Comment
                Something**_website.com**_Else_Type_XY_Z_26.04.23_Comment
                Something_Else_Type_XY_Z_26.04.23_Comment
                Something_Else_Type_XY_dfd6869_3_21.12.22**_website.com**_ZU

                L 1 Reply Last reply
                0
                • L Lost User

                  I just tried a quick test at RegExr: Learn, Build, & Test RegEx[^] and the following regex finds all the web addresses in the set below (shown in bold). The only thing I cannot correct is that it includes the leading underscore. However it may be a starting point to help you (I have not actually tried it with MySQL).

                  _[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z].

                  Something_Else_Type_XY_Z_26.04.23**_website.com**
                  Something_Else_Type_XY_Z**_website.co.uk**_26.04.23
                  Something_Else_Type_XY_Z_26.04.23**_website.com**_Comment
                  Something**_website.com**_Else_Type_XY_Z_26.04.23_Comment
                  Something_Else_Type_XY_Z_26.04.23_Comment
                  Something_Else_Type_XY_dfd6869_3_21.12.22**_website.com**_ZU

                  L Offline
                  L Offline
                  Lost User
                  wrote on last edited by
                  #8

                  Unfortunately REGEXP_SUBSTR(field_name, '_[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z]') just gives me the first 2 letters of the field_name ("So" in this case).

                  L R 2 Replies Last reply
                  0
                  • L Lost User

                    Unfortunately REGEXP_SUBSTR(field_name, '_[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z]') just gives me the first 2 letters of the field_name ("So" in this case).

                    L Offline
                    L Offline
                    Lost User
                    wrote on last edited by
                    #9

                    Yes, I got similar results, which does not make a lot of sense. However, on reading the documentation (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr[^]) closely I discovered why. Hidden down near the bottom of the page is the following:

                    Quote:

                    To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:

                    So I tried with that, and a slight modification to my original pattern, and successfully extracted the fields as shown in the following results:

                    mysql> SELECT REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') from Books;
                    +---------------------------------------------------+
                    | REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') |
                    +---------------------------------------------------+
                    | NULL |
                    | NULL |
                    | NULL |
                    | NULL |
                    | _website.com |
                    | NULL |
                    | _website.com |
                    | _website.com |
                    | _website.co.uk |
                    | _website.com |
                    | NULL |
                    | NULL |
                    +---------------------------------------------------+

                    The NULL entries are records that do not contain a web address. [EDIT] I just tried without the leading underscore in the regular expression and added some numbers into the website names and got the following:

                    mysql> SELECT REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') from Books;
                    +-------------------------------------------------------------+
                    | REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') |
                    +-----------------------------------------------

                    L R 2 Replies Last reply
                    0
                    • L Lost User

                      Unfortunately REGEXP_SUBSTR(field_name, '_[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z]') just gives me the first 2 letters of the field_name ("So" in this case).

                      R Offline
                      R Offline
                      RedDk
                      wrote on last edited by
                      #10

                      Ok, (didn't read the doc link but thanks for that). Using your new regex:

                      _[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z]

                      on the sample of your OP:

                      1. Something_Else_Type_XY_Z_26.04.23_website.com => _website.com
                      2. Something_Else_Type_XY_Z_website.com_26.04.23 => _website.com
                      3. Something_Else_Type_XY_Z_26.04.23_website.com_Comment => _website.com
                      4. Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL
                      5. Something_Else_Type_XY_Z_26.04.23_Comment => NULL
                      6. Something_Else_Type_XY_dfd6869_3_21.12.22_website.com_ZU = _website.com

                      Which isn't quite what you wanted (because of that leading "_"). And as for RM's take on it, as-written (without any _[a-zA-Z]*\.) THAT string returns no matches here either (I use a tool called RegexBuddy and after yesterday's experience, with MySql selected as the input language not std::regex). So although I can confirm today's discovery is close it'll take me some more sleuthing to run down that leading character ascii 95 ... {EDIT] I'm seeing a warning when I try to tackle that underscore using "shorthand character classes" \w to the tune of "MySql doesn't support blah blah blah" so ... this could be a while. [END EDIT]

                      L 1 Reply Last reply
                      0
                      • R RedDk

                        Ok, (didn't read the doc link but thanks for that). Using your new regex:

                        _[a-zA-Z]*\.[a-zA-Z]*|\.[a-zA-Z]

                        on the sample of your OP:

                        1. Something_Else_Type_XY_Z_26.04.23_website.com => _website.com
                        2. Something_Else_Type_XY_Z_website.com_26.04.23 => _website.com
                        3. Something_Else_Type_XY_Z_26.04.23_website.com_Comment => _website.com
                        4. Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL
                        5. Something_Else_Type_XY_Z_26.04.23_Comment => NULL
                        6. Something_Else_Type_XY_dfd6869_3_21.12.22_website.com_ZU = _website.com

                        Which isn't quite what you wanted (because of that leading "_"). And as for RM's take on it, as-written (without any _[a-zA-Z]*\.) THAT string returns no matches here either (I use a tool called RegexBuddy and after yesterday's experience, with MySql selected as the input language not std::regex). So although I can confirm today's discovery is close it'll take me some more sleuthing to run down that leading character ascii 95 ... {EDIT] I'm seeing a warning when I try to tackle that underscore using "shorthand character classes" \w to the tune of "MySql doesn't support blah blah blah" so ... this could be a while. [END EDIT]

                        L Offline
                        L Offline
                        Lost User
                        wrote on last edited by
                        #11

                        See my update to yesterdays answer above.

                        1 Reply Last reply
                        0
                        • L Lost User

                          Yes, I got similar results, which does not make a lot of sense. However, on reading the documentation (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr[^]) closely I discovered why. Hidden down near the bottom of the page is the following:

                          Quote:

                          To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:

                          So I tried with that, and a slight modification to my original pattern, and successfully extracted the fields as shown in the following results:

                          mysql> SELECT REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') from Books;
                          +---------------------------------------------------+
                          | REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') |
                          +---------------------------------------------------+
                          | NULL |
                          | NULL |
                          | NULL |
                          | NULL |
                          | _website.com |
                          | NULL |
                          | _website.com |
                          | _website.com |
                          | _website.co.uk |
                          | _website.com |
                          | NULL |
                          | NULL |
                          +---------------------------------------------------+

                          The NULL entries are records that do not contain a web address. [EDIT] I just tried without the leading underscore in the regular expression and added some numbers into the website names and got the following:

                          mysql> SELECT REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') from Books;
                          +-------------------------------------------------------------+
                          | REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') |
                          +-----------------------------------------------

                          L Offline
                          L Offline
                          Lost User
                          wrote on last edited by
                          #12

                          Did you test according to my examples? What about:

                          Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having 2 preceding underscores first)

                          Your regex returns website.com there instead of NULL.

                          L 1 Reply Last reply
                          0
                          • L Lost User

                            Did you test according to my examples? What about:

                            Something_website.com_Else_Type_XY_Z_26.04.23_Comment => NULL (due to not having 2 preceding underscores first)

                            Your regex returns website.com there instead of NULL.

                            L Offline
                            L Offline
                            Lost User
                            wrote on last edited by
                            #13

                            No, I am afraid I did not work that one out; I leave that as an exercise for you.

                            1 Reply Last reply
                            0
                            • L Lost User

                              Yes, I got similar results, which does not make a lot of sense. However, on reading the documentation (https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr[^]) closely I discovered why. Hidden down near the bottom of the page is the following:

                              Quote:

                              To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:

                              So I tried with that, and a slight modification to my original pattern, and successfully extracted the fields as shown in the following results:

                              mysql> SELECT REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') from Books;
                              +---------------------------------------------------+
                              | REGEXP_SUBSTR(Title, '_[a-zA-Z]+(\\.[a-zA-Z]+)+') |
                              +---------------------------------------------------+
                              | NULL |
                              | NULL |
                              | NULL |
                              | NULL |
                              | _website.com |
                              | NULL |
                              | _website.com |
                              | _website.com |
                              | _website.co.uk |
                              | _website.com |
                              | NULL |
                              | NULL |
                              +---------------------------------------------------+

                              The NULL entries are records that do not contain a web address. [EDIT] I just tried without the leading underscore in the regular expression and added some numbers into the website names and got the following:

                              mysql> SELECT REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') from Books;
                              +-------------------------------------------------------------+
                              | REGEXP_SUBSTR(Title, '[a-zA-Z][[:alnum:]]+(\\.[a-zA-Z]+)+') |
                              +-----------------------------------------------

                              R Offline
                              R Offline
                              RedDk
                              wrote on last edited by
                              #14

                              Concur here, using RegexBuddy and this string: [a-zA-Z][[:alnum:]]*\.[a-zA-Z]*|\.[a-zA-Z] Nothing but net (colloquial expression eliding formal disdain without turning away in disgust from self-realized talentlessness). Thanks for this.

                              L 1 Reply Last reply
                              0
                              • R RedDk

                                Concur here, using RegexBuddy and this string: [a-zA-Z][[:alnum:]]*\.[a-zA-Z]*|\.[a-zA-Z] Nothing but net (colloquial expression eliding formal disdain without turning away in disgust from self-realized talentlessness). Thanks for this.

                                L Offline
                                L Offline
                                Lost User
                                wrote on last edited by
                                #15

                                MySQL's RE engine has one or two quirks of its own(after all, why make it easy?). The only part I could not figure out is how to ensure at least two underscores in the earlier part of the string. I bet @OriginalGriff or @RichardDeeming could do it with their eyes closed.

                                R 1 Reply Last reply
                                0
                                • L Lost User

                                  MySQL's RE engine has one or two quirks of its own(after all, why make it easy?). The only part I could not figure out is how to ensure at least two underscores in the earlier part of the string. I bet @OriginalGriff or @RichardDeeming could do it with their eyes closed.

                                  R Offline
                                  R Offline
                                  RedDk
                                  wrote on last edited by
                                  #16

                                  Using RegexBuddy (yeah, I guess I am falling into a hole here) the POSIX notation is perfectly acceptable, hence your [:alnum:]. I can substitute [:alpha:] there as well and I suspect others in the list are good actors. My brain infarct occurs though when I use Herr Gevertz' table of translations and try to replace the POSIX with his ASCII ([a-zA-Z0-9]) or for that matter UNICODE ([\p{L}\p{Nl}\p{Nd}]), neither of which work in his interface. Grand tool, RegexBuddy ... it's one of those feature-rich interfaces which swim with example/sample in the help volume. My only complaint is that searching through documentation for words has no highlight of the found set so I've got to print out the page and use my pdf search to locate all instances!

                                  1 Reply Last reply
                                  0
                                  Reply
                                  • Reply as topic
                                  Log in to reply
                                  • Oldest to Newest
                                  • Newest to Oldest
                                  • Most Votes


                                  • Login

                                  • Don't have an account? Register

                                  • Login or register to search.
                                  • First post
                                    Last post
                                  0
                                  • Categories
                                  • Recent
                                  • Tags
                                  • Popular
                                  • World
                                  • Users
                                  • Groups