How to parse an HTML-like file?
-
Hello, I need to parse a text file and pick out certain tags. It needs to be written in C. Do you think I'd be better off using an XML parser (if so, which one should I use? I should mention that I don't have much experience with XML), or should I do it sequentially with something like
if( strcmp(line,"<cover..............)???
Thanks in advance, Paolo
-
The easiest approach will be to process the input char by char. Use a state machine, switching states when you reach a '<' or a '>', and adding the finished tags or between-tag content to arrays as appropriate. As a matter of fact, you should have state changes for spaces and '='s inside the tags as well.
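A minimal sketch of that char-by-char state machine, under some assumptions: the names `tokenize`, `struct token`, `MAX_TOKENS` and `MAX_LEN` are made up for illustration, only two states are used (inside a tag, outside a tag), and the extra states for spaces and '=' inside tags are left out to keep it short. Over-long tokens are silently truncated, and an unterminated tag at end of input is still stored as a tag.

```c
#include <string.h>

#define MAX_TOKENS 32
#define MAX_LEN 128

enum state { IN_TEXT, IN_TAG };

struct token {
    int is_tag;            /* 1 for a tag, 0 for between-tag content */
    char text[MAX_LEN];    /* tag body without '<' and '>', or the text run */
};

/* Splits input into tags and text runs; returns the number of tokens stored. */
static int tokenize(const char *input, struct token *out, int max_tokens)
{
    enum state st = IN_TEXT;
    int count = 0;
    size_t len = 0;
    char buf[MAX_LEN];

    for (const char *p = input; ; p++) {
        char c = *p;
        /* A token ends at end of input, at '<' in text, or at '>' in a tag. */
        int boundary = (c == '\0') ||
                       (st == IN_TEXT && c == '<') ||
                       (st == IN_TAG && c == '>');

        if (boundary) {
            /* Store the finished token; empty text runs are skipped. */
            int closed_tag = (st == IN_TAG && c == '>');
            if ((len > 0 || closed_tag) && count < max_tokens) {
                buf[len] = '\0';
                out[count].is_tag = (st == IN_TAG);
                strcpy(out[count].text, buf);
                count++;
            }
            len = 0;
            if (c == '\0')
                break;
            st = (c == '<') ? IN_TAG : IN_TEXT;   /* switch state */
        } else if (len < MAX_LEN - 1) {
            buf[len++] = c;                       /* accumulate current token */
        }
    }
    return count;
}
```

Calling `tokenize("<b>hi</b>", toks, MAX_TOKENS)` with `struct token toks[MAX_TOKENS]` fills three entries: the tag `b`, the text `hi`, and the tag `/b`. Splitting a tag body into a name and attributes would be the next set of states to add.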
-
A SAX parser would probably be the best solution for you. They're fast and they don't require lots of memory. Parsing XML with a SAX parser is kind of like recursive descent parsing, if I'm not mistaken. When the parser finds an element <img, it'll call your callback, notifying you of that. Then when it finds href="img", it'll call your callback notifying you of that. So, basically, you need two (possibly three) callbacks: one notifying you that <img has begun, one that tells you href="img" was found, and possibly one that says </img> was reached. Here's one such parser: libxml2. It's licensed under the MIT License, so there's no problem using it in a closed source/commercial application.
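To make the callback idea concrete, here is a small hand-rolled sketch in plain C. It is not libxml2's API: `struct handlers`, `parse_events`, `take`, `struct trace` and the `on_*` functions are all hypothetical names invented for this illustration. libxml2's real SAX interface differs (for instance, it hands attributes over together with the start-element callback), so treat this only as a picture of the event-driven style. Text content, comments, and end events for self-closing tags are ignored here.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler table; libxml2's real SAX struct is different. */
struct handlers {
    void (*start)(void *ctx, const char *name);
    void (*attr)(void *ctx, const char *name, const char *value);
    void (*end)(void *ctx, const char *name);
};

/* Copies characters from *p into buf until one of `stops` or end of input. */
static void take(const char **p, const char *stops, char *buf, size_t cap)
{
    size_t n = 0;
    while (**p && !strchr(stops, **p)) {
        if (n < cap - 1)
            buf[n++] = **p;
        (*p)++;
    }
    buf[n] = '\0';
}

/* Scans the input and fires a callback for each tag start, attribute, and tag end. */
static void parse_events(const char *p, const struct handlers *h, void *ctx)
{
    char name[64], value[64];

    while ((p = strchr(p, '<')) != NULL) {
        p++;                                      /* skip '<' */
        if (*p == '/') {                          /* closing tag: </name> */
            p++;
            take(&p, " \t\n>", name, sizeof name);
            h->end(ctx, name);
            continue;
        }
        take(&p, " \t\n/>", name, sizeof name);   /* opening tag name */
        h->start(ctx, name);
        for (;;) {                                /* attributes: name="value" */
            while (isspace((unsigned char)*p))
                p++;
            if (*p == '\0' || *p == '>')
                break;
            if (*p == '/') {                      /* slash of a self-closing tag */
                p++;
                continue;
            }
            take(&p, "= \t\n>", name, sizeof name);
            value[0] = '\0';
            if (*p == '=') {
                p++;
                if (*p == '"') {                  /* quoted value */
                    p++;
                    take(&p, "\"", value, sizeof value);
                    if (*p == '"')
                        p++;
                } else {                          /* unquoted value */
                    take(&p, " \t\n>", value, sizeof value);
                }
            }
            h->attr(ctx, name, value);
        }
    }
}

/* Example handlers: append a readable trace of the events to a buffer. */
struct trace { char buf[256]; };

static void add(struct trace *t, const char *kind, const char *a, const char *b)
{
    size_t used = strlen(t->buf);
    snprintf(t->buf + used, sizeof t->buf - used, "%s(%s%s%s) ",
             kind, a, b[0] ? "=" : "", b);
}

static void on_start(void *ctx, const char *name) { add(ctx, "start", name, ""); }
static void on_attr(void *ctx, const char *n, const char *v) { add(ctx, "attr", n, v); }
static void on_end(void *ctx, const char *name) { add(ctx, "end", name, ""); }
```

With `struct handlers h = { on_start, on_attr, on_end };` and an empty `struct trace t`, calling `parse_events("<img href=\"img\"></img>", &h, &t)` leaves `start(img) attr(href=img) end(img) ` in `t.buf`, which mirrors the three notifications described above. A real SAX parser does the scanning for you, so only the handler functions are your code.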