RegEx to match formula groups [modified]

    Bjorn T J M Spruit wrote (#1):

    Forgive me if my question is unclear; I'll clarify it as best I can. What I'm trying to do is tokenize nested formulas by using RegEx. Consider the following formula in LaTeX:

    LaTeX Example 1: text before \frac{ \frac{ SubPart 1 }{ SubPart 2 } }{ Part 2 } text after

    What I need is a RegEx that can tokenize this LaTeX formula into the following token list:

    Token[0] = text before
    Token[1] = \frac{ \frac{ SubPart 1 }{ SubPart 2 } }{ Part 2 }
    Token[2] = text after

    Looking at Token[1], it has a nested fraction (\frac{...}) in its top part. That's the way I need it to be in order to build an object tree. In a sense, the fraction in the top part is a child of its parent fraction. Consider the following formula in LaTeX:

    LaTeX Example 2: text before \frac{ \left ( SubPart 1 + SubPart 2 \right ) }{ Part 2 } text after

    This should result in the following token list:

    Token[0] = text before
    Token[1] = \frac{ \left ( SubPart 1 + SubPart 2 \right ) }{ Part 2 }
    Token[2] = text after

    Again, looking at Token[1], it has a nested subformula in it. In this case, the subformula is a child of the Fracture object, associated with its top part. Final example (and then I assume you catch my drift):

    LaTeX Example 3: text before \left ( \frac{ SubPart 1 }{ SubPart 2 } \right ) + X text after

    This should result in the following token list:

    Token[0] = text before
    Token[1] = \left ( \frac{ SubPart 1 }{ SubPart 2 } \right )
    Token[2] = + X text after

    The nested formulas and subformulas will be processed when the parent formula is being constructed, so we need not worry about that part here. What I need is a RegEx that can handle formulas nested to the nth degree.

    Right now I do this by stepping through the string and checking whether the substring at the current position matches a particular item. When it does, I add 1 to a level counter and run an inner loop to find the closing "}" that brings the level back to 0. This works fine, but I've been criticized for using this method instead of obtaining the tokens with a RegEx. If anyone has a suggestion for how I should construct my regex, then please let me know. I've gotten as far as:

    Fracture : (\\frac{[\W].*?((}\s)|(}$)))*
    Subformula : (\\left \([\W].*?\\right \))*

    But as you would guess, this fails miserably when it has to deal with nested or multidimensional formulas. Any help and/or insight on the matter would be greatly appreciated.
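The level-counter scan described in the post can be sketched roughly as follows. This is not the poster's code: it is a minimal illustration written in Python to keep it short, and the same loops carry over to C#. The names `tokenize`, `_end_of_braces`, `_end_of_frac` and `_end_of_left` are hypothetical. It assumes braces are never escaped (no `\{`), that every `\frac` has exactly two brace groups, and that the delimiter after `\right` is a single character.

```python
import re

# Start of a top-level group: "\frac{" or "\left" (\b keeps \leftarrow out).
OPENER = re.compile(r"\\frac\s*\{|\\left\b")
LEFT_RIGHT = re.compile(r"\\(left|right)\b")
NEXT_BRACE = re.compile(r"\s*\{")
NEXT_DELIM = re.compile(r"\s*\S")


def _end_of_braces(text, start):
    """Return the index just past the '}' that closes the '{' at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")


def _end_of_frac(text, start):
    """Return the index just past the second brace group of the \\frac at start."""
    top_end = _end_of_braces(text, text.index("{", start))
    bottom = NEXT_BRACE.match(text, top_end)
    if bottom is None:
        raise ValueError("\\frac without a second brace group")
    return _end_of_braces(text, bottom.end() - 1)


def _end_of_left(text, start):
    """Return the index just past the delimiter of the matching \\right."""
    depth = 0
    for m in LEFT_RIGHT.finditer(text, start):
        depth += 1 if m.group(1) == "left" else -1
        if depth == 0:
            delim = NEXT_DELIM.match(text, m.end())
            if delim is None:
                raise ValueError("\\right without a delimiter")
            return delim.end()
    raise ValueError("unbalanced \\left / \\right")


def tokenize(text):
    """Split text into plain-text tokens and top-level \\frac / \\left groups."""
    tokens = []
    pos = 0
    while True:
        m = OPENER.search(text, pos)
        if m is None:
            break
        if m.group().startswith("\\frac"):
            end = _end_of_frac(text, m.start())
        else:
            end = _end_of_left(text, m.start())
        plain = text[pos:m.start()].strip()
        if plain:
            tokens.append(plain)
        tokens.append(text[m.start():end])  # nested groups stay inside the token
        pos = end
    tail = text[pos:].strip()
    if tail:
        tokens.append(tail)
    return tokens


print(tokenize(r"text before \left ( \frac{ SubPart 1 }{ SubPart 2 } \right ) + X text after"))
```

Under those assumptions the three examples from the post each come out as three tokens, with the nested group left intact inside Token[1]. Here the regexes only locate where a group starts; the counters do the nesting work.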

      PIEBALDconsult wrote (#2):

      For what purpose? Do you have example input and output?

        Bjorn T J M Spruit wrote (#3):

        The purpose is to create an object tree of the formula. The examples may be found in my previous post. These are simplified examples, mind you, but they are representative of the challenge. Simply put:

        Parent object is Formula. Formula has a collection of child objects of the base type FormulaItem.
        FormulaItem[0] could be of type SubFormula (\left (...\right )). FormulaItem[0], being of the type SubFormula, has one collection of child objects of the base type FormulaItem.
        FormulaItem[0].FormulaItem[0] could be of type Fracture (\frac{...}). FormulaItem[0].FormulaItem[0], being of the type Fracture, has two collections (top-part and bottom-part) of the base type FormulaItem.
        Etc...

        I hope to have clarified the "why" in this.
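The object tree described in this post might be modelled along these lines. It is only a sketch in Python, not the poster's actual classes. `Formula`, `FormulaItem`, `SubFormula` and `Fracture` are the names from the post; the `Text` leaf, the field names `items`, `top` and `bottom`, and the `to_latex` method are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


class FormulaItem:
    """Base type for everything that can appear inside a formula."""

    def to_latex(self) -> str:
        raise NotImplementedError


def _join(items: List[FormulaItem]) -> str:
    """Render a collection of items separated by single spaces."""
    return " ".join(item.to_latex() for item in items)


@dataclass
class Text(FormulaItem):
    """Hypothetical leaf item holding plain text such as 'I' or '+ X'."""
    value: str

    def to_latex(self) -> str:
        return self.value


@dataclass
class SubFormula(FormulaItem):
    """\\left ( ... \\right ): one collection of child items."""
    items: List[FormulaItem] = field(default_factory=list)

    def to_latex(self) -> str:
        return r"\left ( " + _join(self.items) + r" \right )"


@dataclass
class Fracture(FormulaItem):
    """\\frac{...}{...}: two collections, top part and bottom part."""
    top: List[FormulaItem] = field(default_factory=list)
    bottom: List[FormulaItem] = field(default_factory=list)

    def to_latex(self) -> str:
        return r"\frac{ " + _join(self.top) + " }{ " + _join(self.bottom) + " }"


@dataclass
class Formula:
    """Root object: a collection of FormulaItem children."""
    items: List[FormulaItem] = field(default_factory=list)

    def to_latex(self) -> str:
        return _join(self.items)


# FormulaItem[0] is a SubFormula whose FormulaItem[0] is a Fracture.
formula = Formula([SubFormula([Fracture([Text("I")], [Text("II")])])])
print(formula.to_latex())
```

Rendering the tree back to LaTeX is a cheap way to check that a parser built the nesting it was meant to build.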

          OriginalGriff wrote (#4):

          I'm not going to try and work it out myself, but you may find this handy: Expresso - examines and generates regular expressions. Best bit is it breaks it down and explains it in English!

          No trees were harmed in the sending of this message; however, a significant number of electrons were slightly inconvenienced. This message is made of fully recyclable Zeros and Ones

          "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
          "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt

            PIEBALDconsult wrote (#5):

            Björn T.J.M. Spruit wrote:

            in my previous post.

            I didn't see it and I'm not going to go look for it.

            Björn T.J.M. Spruit wrote:

            I hope to have clarified the "why" in this.

            Nope.

            modified on Tuesday, November 3, 2009 6:21 PM

              Bjorn T J M Spruit wrote (#6):

              Hmmm, not very friendly then. :wtf: No worries, I'm always optimistic and don't mind a 'challenge' when I come across one. I'll give you a more extended LaTeX formula example:

              \left ( Availability~ \right ) \times \left ( Performance~ \right ) \times \left ( Quality~ \right ) \times 100 ~ =~ \left (\frac{ I}{II} \right ) \times \left (\frac{ III}{ \left ( IV \times I \right ) } \right ) \times \left (\frac{ V}{III} \right ) \times 100

              This is an example of how it's actually being used at this very moment. I'm not sure what more information you need to clarify the "why". "Nope" isn't a very articulate way of asking me for the information you require, so please let me know what you need from me beyond what I've already told you.

              Just to recap: I need to turn a LaTeX formula into objects. All I need is a regex that can work with nested and multi-dimensional formulas, as explained in the previous posts. The reason I'm exploring this is that I've been criticized for not using regex tokenization to get the formula elements. If anybody knows how to tokenize a string that is a formula with nested and/or multi-dimensional elements, please let me know; otherwise I'll set this aside as: "Not a viable option, can't be done within a reasonable amount of time."

              modified on Wednesday, November 4, 2009 12:25 PM

                Bjorn T J M Spruit wrote (#7):

                Thank you for your advice. The application is a good one and I would certainly recommend it to anyone who is going to work with regular expressions. Limited, controlled recursion to a finite depth can be tackled with regular expressions, though it makes the expressions cumbersome. Unlimited, intuitive recursion with regular expressions can't be done in .NET as of yet: http://badassery.blogspot.com/2006/03/regex-recursion-without-balancing.html I'm still investigating this and will post my findings if there's anything interesting to report.
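To illustrate the point about finite recursion, here is one way a regex for a fixed maximum nesting depth could be generated instead of written by hand. It is a sketch in Python, not taken from the linked article. The helper names `braces_pattern` and `frac_pattern` are hypothetical, and braces are assumed never to be escaped.

```python
import re


def braces_pattern(max_depth: int) -> str:
    """Regex source for one {...} group nested at most max_depth levels deep."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    pattern = r"\{[^{}]*\}"  # depth 1: no braces inside
    for _ in range(max_depth - 1):
        # Each extra level allows the previous pattern to appear inside the braces.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return pattern


def frac_pattern(max_depth: int) -> str:
    """Regex source for \\frac followed by two brace groups of bounded depth."""
    group = braces_pattern(max_depth)
    return r"\\frac\s*" + group + r"\s*" + group


example = r"text before \frac{ \frac{ SubPart 1 }{ SubPart 2 } }{ Part 2 } text after"

# Depth 2 is enough for the outer fraction of this example.
print(re.search(frac_pattern(2), example).group())

# Depth 1 cannot span the nested braces, so only the inner fraction matches.
print(re.search(frac_pattern(1), example).group())
```

The generated pattern grows with every level and silently matches only an inner fraction once the input nests deeper than the limit, which is the cumbersome part. Separately, .NET's regex engine offers balancing group definitions, which can track nesting depth without recursion; they are not used in this sketch.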
