Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Algorithms
  4. Wildcard Matching Routine

Wildcard Matching Routine

Scheduled Pinned Locked Moved Algorithms
regexquestion
14 Posts 7 Posters 34 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • Richard Andrew x64R Offline
    Richard Andrew x64R Offline
    Richard Andrew x64
    wrote on last edited by
    #1

    I'm writing a small routine to match strings using simple wildcards, namely "?" and "*". So I have this dilemma: Take the wildcard string "abc*xyz". In your opinion, do you think that string should match only strings that begin with "abc" and end with "xyz", with any number of characters between? Or, should that string match anything that begins with "abc" and the "xyz" at the end is irrelevant because the star matches anything that comes after "abc"? The second case would be easy to code, and I imagine the first case would be more difficult. Which way would YOU expect the matching to work?

    The difficult we do right away... ...the impossible takes slightly longer.

    L K E 3 Replies Last reply
    0
    • Richard Andrew x64R Richard Andrew x64

      I'm writing a small routine to match strings using simple wildcards, namely "?" and "*". So I have this dilemma: Take the wildcard string "abc*xyz". In your opinion, do you think that string should match only strings that begin with "abc" and end with "xyz", with any number of characters between? Or, should that string match anything that begins with "abc" and the "xyz" at the end is irrelevant because the star matches anything that comes after "abc"? The second case would be easy to code, and I imagine the first case would be more difficult. Which way would YOU expect the matching to work?

      The difficult we do right away... ...the impossible takes slightly longer.

      L Offline
      L Offline
      Lost User
      wrote on last edited by
      #2

      Coding an ending mask (xyz) and then having it ignored seems illogical. To me, * represent 0 or one or many; ? represents any one match.

      "Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

      Richard Andrew x64R A 2 Replies Last reply
      0
      • L Lost User

        Coding an ending mask (xyz) and then having it ignored seems illogical. To me, * represent 0 or one or many; ? represents any one match.

        "Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

        Richard Andrew x64R Offline
        Richard Andrew x64R Offline
        Richard Andrew x64
        wrote on last edited by
        #3

        Thanks for your opinion. I tend to agree. Happy holidays!

        The difficult we do right away... ...the impossible takes slightly longer.

        L 1 Reply Last reply
        0
        • Richard Andrew x64R Richard Andrew x64

          I'm writing a small routine to match strings using simple wildcards, namely "?" and "*". So I have this dilemma: Take the wildcard string "abc*xyz". In your opinion, do you think that string should match only strings that begin with "abc" and end with "xyz", with any number of characters between? Or, should that string match anything that begins with "abc" and the "xyz" at the end is irrelevant because the star matches anything that comes after "abc"? The second case would be easy to code, and I imagine the first case would be more difficult. Which way would YOU expect the matching to work?

          The difficult we do right away... ...the impossible takes slightly longer.

          K Offline
          K Offline
          k5054
          wrote on last edited by
          #4

          Traditionally, the * wildcard in a regular expression is "zero or more instances of the preceding character", so the regex "abc*def" would match "abcdef", "abccccdef" and even "abdef" (0 c's), but not "abccxccdef". It might even match "Hello abcccdef there", depending on whether the regex is considered to be anchored or not. In a traditional regex the '.' wildcard is to match any character. However, if you're writing your own regex parser, you're free to place any meaning on the wildcard characters you want. The functionality you seem to be trying to reproduce seems very like unix file globbing. If that's what you're trying to do - and you are on a unix like system, you probably have access to glob(3), which does all the heavy lifting for you. There's probably similar functionality for windows. But maybe you're writing a regex parser for your own purposes. In which case you're free to make up the rules you want. Either way, I'd expect any literals in the regex to be present in any matches found.

          Keep Calm and Carry On

          Richard Andrew x64R 1 Reply Last reply
          0
          • Richard Andrew x64R Richard Andrew x64

            Thanks for your opinion. I tend to agree. Happy holidays!

            The difficult we do right away... ...the impossible takes slightly longer.

            L Offline
            L Offline
            Lost User
            wrote on last edited by
            #5

            You too! And best in the new year.

            "Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

            1 Reply Last reply
            0
            • K k5054

              Traditionally, the * wildcard in a regular expression is "zero or more instances of the preceding character", so the regex "abc*def" would match "abcdef", "abccccdef" and even "abdef" (0 c's), but not "abccxccdef". It might even match "Hello abcccdef there", depending on whether the regex is considered to be anchored or not. In a traditional regex the '.' wildcard is to match any character. However, if you're writing your own regex parser, you're free to place any meaning on the wildcard characters you want. The functionality you seem to be trying to reproduce seems very like unix file globbing. If that's what you're trying to do - and you are on a unix like system, you probably have access to glob(3), which does all the heavy lifting for you. There's probably similar functionality for windows. But maybe you're writing a regex parser for your own purposes. In which case you're free to make up the rules you want. Either way, I'd expect any literals in the regex to be present in any matches found.

              Keep Calm and Carry On

              Richard Andrew x64R Offline
              Richard Andrew x64R Offline
              Richard Andrew x64
              wrote on last edited by
              #6

              Thank you for your input. I'm actually not going for regular expressions, just simple wildcard matching like in MS-DOS. Happy holidays!

              The difficult we do right away... ...the impossible takes slightly longer.

              1 Reply Last reply
              0
              • L Lost User

                Coding an ending mask (xyz) and then having it ignored seems illogical. To me, * represent 0 or one or many; ? represents any one match.

                "Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

                A Offline
                A Offline
                arshad hussain 2022
                wrote on last edited by
                #7

                Given a text and a wildcard pattern, implement wildcard pattern matching algorithm that finds if wildcard pattern is matched with text. The matching should cover the entire text (not partial text).
                The wildcard pattern can include the characters ‘?’ and ‘*’by clicking on [](<u>https://codeprozone.com</u>)codeprozone.
                ‘?’ – matches any single character
                ‘*’ – Matches any sequence of characters (including the empty sequence)

                1 Reply Last reply
                0
                • Richard Andrew x64R Richard Andrew x64

                  I'm writing a small routine to match strings using simple wildcards, namely "?" and "*". So I have this dilemma: Take the wildcard string "abc*xyz". In your opinion, do you think that string should match only strings that begin with "abc" and end with "xyz", with any number of characters between? Or, should that string match anything that begins with "abc" and the "xyz" at the end is irrelevant because the star matches anything that comes after "abc"? The second case would be easy to code, and I imagine the first case would be more difficult. Which way would YOU expect the matching to work?

                  The difficult we do right away... ...the impossible takes slightly longer.

                  E Offline
                  E Offline
                  englebart
                  wrote on last edited by
                  #8

                  String must start with abc and end with xyz I also think that abc*xyz should match these: abcxyxyz abcxxyz abcxyz abcxxxxyyyyxyz Add them to your unit tests. The nice thing with this algorithm is that there is not a lot of state to track. Very simple to implement in any language.

                  T 1 Reply Last reply
                  0
                  • E englebart

                    String must start with abc and end with xyz I also think that abc*xyz should match these: abcxyxyz abcxxyz abcxyz abcxxxxyyyyxyz Add them to your unit tests. The nice thing with this algorithm is that there is not a lot of state to track. Very simple to implement in any language.

                    T Offline
                    T Offline
                    trønderen
                    wrote on last edited by
                    #9

                    Slightly more complex if you want to extend it to a path of (back)slash separated directory levels with a directory name '**' indicating zero or more directory levels. I find this so useful that I would recommend that you include it from the very beginning. (Assuming, of course, that your wildcard routine is intended for file names, or similarly structured name strings.)

                    E 1 Reply Last reply
                    0
                    • T trønderen

                      Slightly more complex if you want to extend it to a path of (back)slash separated directory levels with a directory name '**' indicating zero or more directory levels. I find this so useful that I would recommend that you include it from the very beginning. (Assuming, of course, that your wildcard routine is intended for file names, or similarly structured name strings.)

                      E Offline
                      E Offline
                      englebart
                      wrote on last edited by
                      #10

                      If * matched any character, then it would already match a slash. You would have to make * not match a slash to need ** as another symbol. I have seen and used the ** in a lot of utilities including Ant xml file sets. I have never had to write the logic for it, though.

                      L T 2 Replies Last reply
                      0
                      • E englebart

                        If * matched any character, then it would already match a slash. You would have to make * not match a slash to need ** as another symbol. I have seen and used the ** in a lot of utilities including Ant xml file sets. I have never had to write the logic for it, though.

                        L Offline
                        L Offline
                        Lost User
                        wrote on last edited by
                        #11

                        Try MSDOS 5.0. That's what I expect. Nothing more. Nothing less.

                        Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                        1 Reply Last reply
                        0
                        • E englebart

                          If * matched any character, then it would already match a slash. You would have to make * not match a slash to need ** as another symbol. I have seen and used the ** in a lot of utilities including Ant xml file sets. I have never had to write the logic for it, though.

                          T Offline
                          T Offline
                          trønderen
                          wrote on last edited by
                          #12

                          If you write a matching routine for file/path names, you should at least as an option treat the path separators differently. For a general matching routine, the (set of) separator character(s) should be a parameter, so the same routine can be used in different contexts, e.g. different file systems. I guess that writing a match for ** that doesn't use recursion would require more effort and the code would be more difficult to comprehend than to do it recursively. I wouldn't ever consider flattening that recursive matching routine I use in my code. (But of course, like in all recursion, I take care to reduce the stack frame to a minimum.)

                          E 1 Reply Last reply
                          0
                          • T trønderen

                            If you write a matching routine for file/path names, you should at least as an option treat the path separators differently. For a general matching routine, the (set of) separator character(s) should be a parameter, so the same routine can be used in different contexts, e.g. different file systems. I guess that writing a match for ** that doesn't use recursion would require more effort and the code would be more difficult to comprehend than to do it recursively. I wouldn't ever consider flattening that recursive matching routine I use in my code. (But of course, like in all recursion, I take care to reduce the stack frame to a minimum.)

                            E Offline
                            E Offline
                            englebart
                            wrote on last edited by
                            #13

                            The original post does not mention file system anywhere. I was thinking more in terms of a pure programming exercise. For traversing directory structures, I use a Visitor pattern with methods of beginDir/endDir/foundFile. This allows easy reuse of the recursive algorithm across a dozen utilities that process file trees. Keeps the stack lean for the recursion, any bloat ends up in the Visitor’s heap memory. (Including a few utilities that needed their own stack data structure to perform their job)

                            K 1 Reply Last reply
                            0
                            • E englebart

                              The original post does not mention file system anywhere. I was thinking more in terms of a pure programming exercise. For traversing directory structures, I use a Visitor pattern with methods of beginDir/endDir/foundFile. This allows easy reuse of the recursive algorithm across a dozen utilities that process file trees. Keeps the stack lean for the recursion, any bloat ends up in the Visitor’s heap memory. (Including a few utilities that needed their own stack data structure to perform their job)

                              K Offline
                              K Offline
                              kalberts
                              wrote on last edited by
                              #14

                              englebart wrote:

                              The original post does not mention file system anywhere.

                              Sure. That is why I explicitly point out that my comments are valid if the application is a file system directory search. That doesn't mean "if and only if", though. I think my remarks are valid even for other cases where the string somehow expresses a hierarchical structure or classification. It would be equally valid for, say, the military way (at least in Norway) of referring to items in ways such as "pants under long white", with space as the level separator. You may of course argue "But when there is no structure at all, then this talk about separators is irrelevant" - which is perfectly true. I just took it one step further, generalising it a little bit so that it also included those case where a hierarchical structure is implied. In quite a lot of the cases you encoounter in software systems, that is indeed the case!

                              1 Reply Last reply
                              0
                              Reply
                              • Reply as topic
                              Log in to reply
                              • Oldest to Newest
                              • Newest to Oldest
                              • Most Votes


                              • Login

                              • Don't have an account? Register

                              • Login or register to search.
                              • First post
                                Last post
                              0
                              • Categories
                              • Recent
                              • Tags
                              • Popular
                              • World
                              • Users
                              • Groups