regex help [modified]
-
Hi, I need to strip text out of a big file. I need to turn the input below
<pre>
<div class="a">apple</div>
<p> </p>
<p>red delicious</p>
<div class="b">banana</div>
<p> </p>
<p>riped banana</p>
<div class="c">chives</div>
<p> </p>
<p>fresh green chives</p>
</pre>
into the output below
'apple', 'red delicious'
'banana', 'riped banana'
'chives', 'fresh green chives'
so that I can enter each pair into a database. I would really appreciate it if you could provide a regex that does this. Thanks for your help!
modified on Tuesday, January 26, 2010 11:43 PM
-
You can edit (i.e. modify) an earlier message if you decide you need to fix an error, improve formatting, or add information. Now please go and delete your other messages before someone attempts to read and answer them. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
I only read code that is properly formatted, adding PRE tags is the easiest way to obtain that.
[The QA section does it automatically now, I hope we soon get it on regular forums as well]
-
At the very least, you could use LINQ to XML to do this. Don't use regex to parse HTML. The class you're going to want to look at is XElement.
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001
-
Hello, this is a one-off process and the HTML is stored in a CSV file. I crawl a site and store each web page in the CSV; the URLs look like www.mysite.com.au/product/product.asp?id={0}. So all the HTML for each product page ends up in one CSV file. Now I want to delete all the text except the parts I want. Could you please help me achieve this with regex?
-
But it's not a valid HTML or XML file. It's a CSV file, and it would take a lot of work to get HTML or XML validation to pass. Any other ideas or suggestions?
-
It doesn't have to be an XML/HTML file. It just needs to be a properly formatted XML string. Trust me - regex is not the answer.
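For illustration only, here is a minimal C# sketch of the "properly formatted XML string" idea; it is not code from the thread. It assumes the fragment from the question is already well-formed XML, and it wraps the fragment in a made-up <root> element because XElement.Parse needs a single root. The names FragmentExtractor and Extract are hypothetical. Crawled HTML that is not well-formed (unclosed tags, entities such as &nbsp;) would make XElement.Parse throw an XmlException and would need cleaning up first.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class FragmentExtractor
{
    // Returns (name, description) pairs from a run of sibling <div>/<p> elements.
    public static List<(string Name, string Description)> Extract(string fragment)
    {
        // Assumption: the fragment is well-formed apart from lacking a single
        // root element, so a hypothetical <root> wrapper is added here.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        var pairs = new List<(string Name, string Description)>();
        string current = null;

        foreach (XElement el in root.Elements())
        {
            string text = el.Value.Trim();
            if (el.Name.LocalName == "div")
            {
                // A <div> starts a new item.
                current = text;
            }
            else if (el.Name.LocalName == "p" && text.Length > 0 && current != null)
            {
                // The first non-empty <p> after a <div> is taken as its description.
                pairs.Add((current, text));
                current = null;
            }
        }
        return pairs;
    }

    static void Main()
    {
        string fragment =
            "<div class=\"a\">apple</div> <p> </p> <p>red delicious</p> " +
            "<div class=\"b\">banana</div> <p> </p> <p>riped banana</p> " +
            "<div class=\"c\">chives</div> <p> </p> <p>fresh green chives</p>";

        foreach (var (name, description) in Extract(fragment))
        {
            Console.WriteLine($"'{name}', '{description}'");
        }
    }
}
```

Run against the sample from the question, this should print one line per pair in the quoted form the question asks for. Empty <p> </p> elements are skipped because their trimmed text is empty.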
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001