regex help

  • vivasaayi wrote:

    The string you presented is valid XML apart from the missing root element. If performance is not an issue, first add a root element; you can then use XmlDocument or XmlReader to extract the information you need.

    uglyeyes wrote (#4):

    No, only the text I provided is valid. The content of the entire CSV file has text like ----1---- my contents.. ----2---- my content2.. ... ... so could you please help me extract the data I want? I have this regex that seems to work to get the text between the div tags:

    (?<=\<div class=""middlead""\>).*?(?=\</div\>)

    But I need to get the description of apple too. I tried to use the code below:

    using System.IO;
    using System.Text.RegularExpressions;

    string fName = @"data.txt"; // path to the input text file
    string allRead;
    using (StreamReader testTxt = new StreamReader(fName))
    {
        allRead = testTxt.ReadToEnd(); // read the whole file; the reader is closed on exit
    }

    //Regex rx = new Regex(@"(?<=\<div class=""middlead""\>).*?(?=\</div\>)", RegexOptions.Singleline);
    Regex rx1 = new Regex(@"(?<=\<p\>&nbsp;&nbsp;&nbsp;\</p\>).*?(?=\</p\>)", RegexOptions.Singleline);

    //MatchCollection matches = rx.Matches(allRead);
    MatchCollection matches1 = rx1.Matches(allRead);

    using (StreamWriter sw = new StreamWriter(@"realdata.txt"))
    {
        int count = 0;
        foreach (Match match in matches1)
        {
            sw.WriteLine(count.ToString()); // running index of the match
            sw.WriteLine(match.Value);      // matched text, written once per match
            count++;
        }
    }
    

    But somehow regex rx1 is not giving only the text that I want; it is doing greedy matching and tries to match everything that has

    Could you please help me with how to extract the descriptions of those products?

      realJSOP wrote (#5):

      Once again, html IS xml. Just use Linq-To-XML to parse it - using regex is a WASTE OF TIME. It's easy - really. All you have to do is man-up and do some frakking research. It's just a few lines of code.

      .45 ACP - because shooting twice is just silly
      -----
      "Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
      -----
      "The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001

        uglyeyes wrote (#6):

        Please note that the description content doesn't have an ID, so I can't really use the DOM. Any suggestions as to how to get the text inside an element with no id?
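A minimal sketch of the LINQ to XML route suggested earlier in the thread, for the case where the target element has no id: it selects by position instead, taking the last <p> sibling before each <div class="central">. The class name, method name, and sample fragment are made up for illustration. It assumes the markup is well-formed once wrapped in a root element; the real pages discussed here contain unclosed INPUT tags, so they would likely need cleaning up (or an HTML-aware parser) first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XmlDescriptionReader
{
    // Returns the text of the last <p> sibling that precedes each <div class="central">.
    public static List<string> ReadDescriptions(string fragment)
    {
        // &nbsp; is an HTML entity that an XML parser does not know, so swap it
        // for the equivalent numeric character reference and add a root element.
        string xml = "<root>" + fragment.Replace("&nbsp;", "&#160;") + "</root>";
        XElement root = XElement.Parse(xml);

        return root.Descendants("div")
            .Where(div => (string)div.Attribute("class") == "central")
            .Select(div => div.ElementsBeforeSelf("p").LastOrDefault())
            .Where(p => p != null)
            .Select(p => p.Value.Trim())
            .ToList();
    }

    static void Main()
    {
        // Hypothetical, well-formed stand-in for the real markup.
        string fragment =
            "<p>&nbsp;&nbsp;&nbsp;</p><p>red delicious.</p>" +
            "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p></div>" +
            "<p>&nbsp;&nbsp;&nbsp;</p><p>riped pear.</p>" +
            "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p></div>";

        foreach (string description in ReadDescriptions(fragment))
            Console.WriteLine(description); // red delicious. / riped pear.
    }
}
```

If the input cannot be made well-formed, XElement.Parse throws an XmlException, which is a reasonable signal to fall back to a different approach.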

          basantakumar wrote (#7):

          Hi, please use the regex pattern below to get your result:

          .*?>([a-zA-Z0-9].*?)<

          The first capture group of each match will return the collection: apple, red delicious, banana, riped banana, chives, fresh green chives. Please let me know if you have any questions.
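As a rough illustration of how that pattern behaves in C#, the sketch below collects group 1 from every match. The input string is a hypothetical stand-in (the original sample is not shown in this part of the thread), and the class and method names are invented. Note that the pattern picks up any run of text that starts with a letter or digit right after a tag, so on a real page it may also return text that is not a product description.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagTextDemo
{
    // Collects the text that follows a '>' and starts with a letter or digit,
    // up to the next '<' (capture group 1 of the suggested pattern).
    public static List<string> TextAfterTags(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, @".*?>([a-zA-Z0-9].*?)<"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    static void Main()
    {
        // Hypothetical input, loosely modelled on the markup discussed in the thread.
        string html =
            "<div class=\"middlead\">apple</div>" +
            "<p>&nbsp;&nbsp;&nbsp;</p><p>red delicious</p>";

        foreach (string text in TextAfterTags(html))
            Console.WriteLine(text); // apple, then red delicious
    }
}
```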

            uglyeyes wrote (#8):

            Hi, thanks, but the text I provided was just an example. The real text looks something like the sample below, and I need a regex for it:

            <p>&nbsp;&nbsp;&nbsp;</p>
            <p>red delicious.</p>
            <div class="central"><p>&nbsp;&nbsp;&nbsp;</p>
            <FORM action="product.asp" method="post" >
            <INPUT type="hidden" name="ss" value="3">
            <INPUT type="hidden" name="xx" value="xx">
            <INPUT type="hidden" name="yy" value="yy">
            <input type="image" src="./main/apple.png" value="Click here" />
            </form><p>&nbsp;&nbsp;&nbsp;</p>
            <p>&nbsp;&nbsp;&nbsp;</p>
            <p>riped pear.</p>
            <div class="central"><p>&nbsp;&nbsp;&nbsp;</p>
            <FORM action="product.asp" method="post" >
            <INPUT type="hidden" name="ss" value="3">
            <INPUT type="hidden" name="xx" value="xx">
            <INPUT type="hidden" name="yy" value="yy">
            <input type="image" src="./main/pear.png" value="Click here" />
            </form><p>&nbsp;&nbsp;&nbsp;</p>

            This one is not working for me, as it gets more text than I need:

            (?<=\s\s\<p\>&nbsp;&nbsp;&nbsp;\</p\>\n\t\t\s\s\<p\>).*?(?=\</p\>\n\t\t\s\s\<div class=""central""\>)

            modified on Wednesday, January 27, 2010 6:14 PM

              uglyeyes wrote (#9):

              Not sure why my regex below fails in the Visual Studio editor:

              (?<=\<p\>&nbsp;&nbsp;&nbsp;\</p\>\n\t+:b+).*(\n\t+:b+\<div class="central"\>)

              Please note that \t+ and :b+ are added because there are exactly 2 tabs and 2 spaces between the matching text. If I only use <p\>&nbsp;&nbsp;&nbsp;\</p\>\n\t+:b+ it highlights the text preceding the text I want to match. Not sure why selecting between groups is not working in Visual Studio. I am running out of ideas, please help.

                uglyeyes wrote (#10):

                I tested with RegexBuddy: for the text "apple", my regex (?<=a).*?(?=e) returns "ppl". Now I want to get the text in between using the regex below:

                (?<=\<p\>&nbsp;&nbsp;&nbsp;\</p\>).*?(?=\<div class="central"\>)

                <p>&nbsp;&nbsp;&nbsp;</p>
                <p>red delicious.</p>
                <div class="central"><p>&nbsp;&nbsp;&nbsp;</p>
                <FORM action="x.asp" method="post" >
                <INPUT type="hidden" name="oradvertiser" value="3">
                <INPUT type="hidden" name="xx" value="test">
                <INPUT type="hidden" name="xy" value="test">
                <input type="image" src="./main/apple.png" value="Click here" onmouseout="this.style.border='5px solid silver';" />
                </form><p>&nbsp;&nbsp;&nbsp;</p>
                <p>&nbsp;&nbsp;&nbsp;</p>
                <p>riped pear.</p>
                <div class="central"><p>&nbsp;&nbsp;&nbsp;</p>
                <FORM action="x.asp" method="post" >
                <INPUT type="hidden" name="or" value="3">
                <INPUT type="hidden" name="xx" value="test">
                <INPUT type="hidden" name="xy" value="test">
                <input type="image" src="./main/pear.png" value="Click here" onmouseout="this.style.border='5px solid silver';" />
                </form><p>&nbsp;&nbsp;&nbsp;</p>
                <p>dummy text</p>

                But it's not giving me "red delicious" and "riped pear". Could you please help?
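One likely reason the lookaround version misbehaves can be sketched as follows, assuming the elements in the real file sit on separate lines (the line breaks in the sample string here are a guess, and the class and method names are invented). Without RegexOptions.Singleline, "." cannot cross a newline, so nothing between the two lookarounds can match; with Singleline, the match spans the line breaks but also includes the surrounding <p> tags, and later matches swallow whole FORM blocks. The \< and \> escapes are dropped below, which should not change what the pattern matches in .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Shortened, hypothetical version of the markup posted in the thread.
    public static readonly string Sample = string.Join("\n", new[]
    {
        "<p>&nbsp;&nbsp;&nbsp;</p>",
        "<p>red delicious.</p>",
        "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p>",
        "<FORM action=\"x.asp\" method=\"post\" >",
        "</form><p>&nbsp;&nbsp;&nbsp;</p>",
        "<p>&nbsp;&nbsp;&nbsp;</p>",
        "<p>riped pear.</p>",
        "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p>",
        "<FORM action=\"x.asp\" method=\"post\" >",
        "</form><p>&nbsp;&nbsp;&nbsp;</p>",
        "<p>dummy text</p>"
    });

    const string Pattern =
        @"(?<=<p>&nbsp;&nbsp;&nbsp;</p>).*?(?=<div class=""central"">)";

    // Runs the lookaround pattern with the given options and returns the matched text.
    public static List<string> Run(RegexOptions options)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(Sample, Pattern, options))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        Console.WriteLine(Run(RegexOptions.None).Count);       // 0: "." stops at newlines
        Console.WriteLine(Run(RegexOptions.Singleline).Count); // 2: but tags are included
        Console.WriteLine(Run(RegexOptions.Singleline)[0]);    // "\n<p>red delicious.</p>\n"
    }
}
```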

                  uglyeyes wrote (#11):

                  This works:

                  <p>&nbsp;&nbsp;&nbsp;</p>\s+<p>(?<content>.*?)</p>\s+<div class="central">
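A minimal sketch of using that pattern from C#, reading the description out of the named group "content". The sample string is a shortened, hypothetical version of the markup posted earlier, with line breaks assumed between elements; the class and method names are invented. The pattern consumes the delimiters instead of using lookarounds, \s+ absorbs the newline and indentation between tags, and because "." does not cross newlines by default, the capture stays inside a single <p>.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class DescriptionExtractor
{
    static readonly Regex DescriptionPattern = new Regex(
        @"<p>&nbsp;&nbsp;&nbsp;</p>\s+<p>(?<content>.*?)</p>\s+<div class=""central"">");

    // Returns the text of each <p> that sits between the spacer paragraph
    // and the following <div class="central">.
    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in DescriptionPattern.Matches(html))
            results.Add(m.Groups["content"].Value);
        return results;
    }

    static void Main()
    {
        // Shortened, hypothetical version of the markup from the thread.
        string html = string.Join("\n", new[]
        {
            "<p>&nbsp;&nbsp;&nbsp;</p>",
            "<p>red delicious.</p>",
            "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p>",
            "<FORM action=\"x.asp\" method=\"post\" >",
            "</form><p>&nbsp;&nbsp;&nbsp;</p>",
            "<p>&nbsp;&nbsp;&nbsp;</p>",
            "<p>riped pear.</p>",
            "<div class=\"central\"><p>&nbsp;&nbsp;&nbsp;</p>",
            "<FORM action=\"x.asp\" method=\"post\" >",
            "</form><p>&nbsp;&nbsp;&nbsp;</p>",
            "<p>dummy text</p>"
        });

        foreach (string description in Extract(html))
            Console.WriteLine(description); // red delicious. / riped pear.
    }
}
```

The trailing "dummy text" paragraph is skipped because no <div class="central"> follows it.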

                    Ravi Sant wrote (#12):

                    good :thumbsup:

                    ♫ 99 little bugs in the code, 99 bugs in the code We fix a bug, compile it again 101 little bugs in the code ♫

                      ahmed_elshiwy wrote (#13):

                      Try calling labelname.Refresh() after the line where you change the Text property of the label.
