Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. C#
  4. Inverse regex matching

Inverse regex matching

Scheduled Pinned Locked Moved C#
regexcsharpperlcomhelp
5 Posts 2 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • W Offline
    W Offline
    WillemM
    wrote on last edited by
    #1

    This one is tricky, that's for sure. I'm looking for a way to generate strings based on a regular expression. I tried several searches on google and the only stuff that turned up was a library called Grail+ and some perl module genex. Before I'm going to hack my way into integrating perl into my C# application I want to make sure that I haven't missed a library that does what I need. Is there anyone who has done this before or knows a library which can help me with this?

    WM. What about weapons of mass-construction? "What? Its an Apple MacBook Pro. They are sexy!" - Paul Watson My blog

    S 1 Reply Last reply
    0
    • W WillemM

      This one is tricky, that's for sure. I'm looking for a way to generate strings based on a regular expression. I tried several searches on google and the only stuff that turned up was a library called Grail+ and some perl module genex. Before I'm going to hack my way into integrating perl into my C# application I want to make sure that I haven't missed a library that does what I need. Is there anyone who has done this before or knows a library which can help me with this?

      WM. What about weapons of mass-construction? "What? Its an Apple MacBook Pro. They are sexy!" - Paul Watson My blog

      S Offline
      S Offline
      Skippums
      wrote on last edited by
      #2

      Are you just trying to generate random strings that match the regex? What could this possibly be used for? I doubt such functionality is already in the .Net framework, but I am not certain. You could create your own state machine based on a regex, then getting a random string is done trivially by randomly choosing paths from one state to another until you arrive at one of the success states (unless you want to bound or specify the resulting string's length). Anyway, good luck, and let us know if you find a solution elsewhere.

      Sounds like somebody's got a case of the Mondays -Jeff

      W 1 Reply Last reply
      0
      • S Skippums

        Are you just trying to generate random strings that match the regex? What could this possibly be used for? I doubt such functionality is already in the .Net framework, but I am not certain. You could create your own state machine based on a regex, then getting a random string is done trivially by randomly choosing paths from one state to another until you arrive at one of the success states (unless you want to bound or specify the resulting string's length). Anyway, good luck, and let us know if you find a solution elsewhere.

        Sounds like somebody's got a case of the Mondays -Jeff

        W Offline
        W Offline
        WillemM
        wrote on last edited by
        #3

        Indeed, I want to generate random strings that match the regex. I'm generating random data for a database which has some fields that need to match to a specific pattern. I'm pretty sure .NET doesn't have the functionality I'm looking for. I think creating my own state machine is indeed the best way to go around this, however it's not going to be easy and there is a problem with the max length of strings. I need to find a way to limit the length of the data generated by the * and + operator otherwise this is impossible to do. Thanks nonetheless

        WM. What about weapons of mass-construction? "What? Its an Apple MacBook Pro. They are sexy!" - Paul Watson My blog

        S 1 Reply Last reply
        0
        • W WillemM

          Indeed, I want to generate random strings that match the regex. I'm generating random data for a database which has some fields that need to match to a specific pattern. I'm pretty sure .NET doesn't have the functionality I'm looking for. I think creating my own state machine is indeed the best way to go around this, however it's not going to be easy and there is a problem with the max length of strings. I need to find a way to limit the length of the data generated by the * and + operator otherwise this is impossible to do. Thanks nonetheless

          WM. What about weapons of mass-construction? "What? Its an Apple MacBook Pro. They are sexy!" - Paul Watson My blog

          S Offline
          S Offline
          Skippums
          wrote on last edited by
          #4

          Here is how I would implement it, if I were going to do what you describe (where 'n' is the desired length of the result):

          1. Build a directed graph of your regex with nodes==state and edges==allowed state transitions
          2. Build a structure containing the number of steps that can be made from
          any one state to any other that is <= the length of your result (2d matrix of lists)
          3. Randomly choose an acceptance state (named 'as') where your connectivity matrix
          indicates you can go from the start state to the final state in exactly n transitions.
          4. Iterate through the following (initialize cur = startStateNode) until your string is found:
          a. Set 'n' to 'n-1'
          b. Out of all of the nodes reachable from 'cur' in one move,
          eliminate those that can not reach 'as' in 'n' moves
          c. Choose one of the resulting nodes in the collection found in (b)

          Once you finish with this process, you will have a randomly generated string matching your regex. Note, however, that strings produced do NOT have equal probability of being chosen. For example, if your regex is, say "(Becky|a\w*)", then "Becky" will result half of the time, whereas words beginning with the letter "a" (this set is clearly much larger) would take the other half. Choosing one at random with equal probability would require you to find ALL matching strings of your given length first, then choosing one of those strings at random (or, you could use graph theory and weight each choice by the number of possible ways to arrive at the destination in the required number of moves). Whatever you decide, I hope you can see that this is clearly not a trivial computation, and it will require a significant (IMO) source of man-hours to code. Good luck,

          Sounds like somebody's got a case of the Mondays -Jeff

          W 1 Reply Last reply
          0
          • S Skippums

            Here is how I would implement it, if I were going to do what you describe (where 'n' is the desired length of the result):

            1. Build a directed graph of your regex with nodes==state and edges==allowed state transitions
            2. Build a structure containing the number of steps that can be made from
            any one state to any other that is <= the length of your result (2d matrix of lists)
            3. Randomly choose an acceptance state (named 'as') where your connectivity matrix
            indicates you can go from the start state to the final state in exactly n transitions.
            4. Iterate through the following (initialize cur = startStateNode) until your string is found:
            a. Set 'n' to 'n-1'
            b. Out of all of the nodes reachable from 'cur' in one move,
            eliminate those that can not reach 'as' in 'n' moves
            c. Choose one of the resulting nodes in the collection found in (b)

            Once you finish with this process, you will have a randomly generated string matching your regex. Note, however, that strings produced do NOT have equal probability of being chosen. For example, if your regex is, say "(Becky|a\w*)", then "Becky" will result half of the time, whereas words beginning with the letter "a" (this set is clearly much larger) would take the other half. Choosing one at random with equal probability would require you to find ALL matching strings of your given length first, then choosing one of those strings at random (or, you could use graph theory and weight each choice by the number of possible ways to arrive at the destination in the required number of moves). Whatever you decide, I hope you can see that this is clearly not a trivial computation, and it will require a significant (IMO) source of man-hours to code. Good luck,

            Sounds like somebody's got a case of the Mondays -Jeff

            W Offline
            W Offline
            WillemM
            wrote on last edited by
            #5

            Thanks for the outline, it got me thinking about how regex parsers are actually build. I found some old material on how to write your own regex parser and I'm going to use that for my library. My plan is to implement a partial regex parser. This parser will be constructing an NFA of the regex expression and convert it to a DFA. After that I'm going to determine which parts of the DFA will have a fixed length, those parts will need to be generated first. When these parts are generated I will be filling up the rest of the string with the other variable length parts, starting with the operator + (You need at least one if these) and after that the * operator.

            WM. What about weapons of mass-construction? "What? Its an Apple MacBook Pro. They are sexy!" - Paul Watson My blog

            1 Reply Last reply
            0
            Reply
            • Reply as topic
            Log in to reply
            • Oldest to Newest
            • Newest to Oldest
            • Most Votes


            • Login

            • Don't have an account? Register

            • Login or register to search.
            • First post
              Last post
            0
            • Categories
            • Recent
            • Tags
            • Popular
            • World
            • Users
            • Groups