extracting data from .TXT file

benjamin yap

Hi guys, I have a .txt file that contains HTML code like this:

<td class="ticker_name"><a href="http://finance.yahoo.com/q;_ylt=Al7he7j4xkWAM99.PPB7UVFO7sMF;_ylu=X3oDMTE5cnE2OWVuBHBvcwMzBHNlYwNtYXJrZXRTdW1tYXJ5SW5kaWNlcwRzbGsDbmFzZGFx?s=^IXIC" >Nasdaq</a></td><td>2,147.35</td><td class="ticker_down">-31.65</td><td class="right_cell ticker_down">-1.45%

How do I extract the values, for example 2,147.35 and -31.65, for NASDAQ?

Lost User

You could write your own parser or adapt from this article.
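If you go the write-your-own route, a very small hand-rolled parser might be enough for markup as regular as the sample. The sketch below is only that: cellTexts is a made-up name, and it assumes that cells are not nested and that no attribute value contains a '<' or '>' character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the visible text of every <td> cell in 'html',
// with any nested tags (such as <a ...>) stripped out.
// Assumes cells are not nested and attribute values contain no '<' or '>'.
std::vector<std::string> cellTexts(const std::string &html)
{
    std::vector<std::string> cells;
    std::string::size_type pos = 0;

    while ((pos = html.find("<td", pos)) != std::string::npos)
    {
        // Skip the rest of the opening tag, including its attributes.
        std::string::size_type start = html.find('>', pos);
        if (start == std::string::npos)
            break;
        ++start;

        // The cell ends at the next </td>, or at end of input if it is missing.
        std::string::size_type end = html.find("</td>", start);
        if (end == std::string::npos)
            end = html.size();

        // Copy the cell contents, dropping everything between '<' and '>'.
        std::string text;
        bool inTag = false;
        for (std::string::size_type i = start; i < end; ++i)
        {
            if (html[i] == '<')
                inTag = true;
            else if (html[i] == '>')
                inTag = false;
            else if (!inTag)
                text += html[i];
        }
        cells.push_back(text);
        pos = end;
    }
    return cells;
}
```

Called on the row from the question, this should give one string per cell, so the two values wanted would be the entries right after the "Nasdaq" entry.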


enhzflep

My immediate approach would be to:

(a) Scan for and replace all commas with nothing, i.e. "," --> ""
(b) Scan for the text ixic">
(c) If the string is not found, exit the loop: jump to (g)
(d) Advance the returned pointer by the length of the search string (6 bytes)
(e) Do an sscanf, asking for a float
(f) Return to (b)
(g) ...

Perhaps a little something like this?

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <string>

using namespace std;

// Replaces every occurrence of 'search' in 'subject' with 'replace' (in place).
string& str_replace(const string &search, const string &replace, string &subject)
{
    string buffer;

    int sealeng = search.length();
    int strleng = subject.length();

    if (sealeng == 0)
        return subject; // no change

    for (int i = 0, j = 0; i < strleng; j = 0)
    {
        // Count how many characters of 'search' match at position i.
        while (i+j < strleng && j < sealeng && subject[i+j] == search[j])
            j++;
        if (j == sealeng) // found 'search'
        {
            buffer.append(replace);
            i += sealeng;
        }
        else
        {
            buffer.append(&subject[i++], 1);
        }
    }
    subject = buffer;
    return subject;
}

int main()
{
    FILE *fp;
    const char *filename = "infile.html";
    char *htmlStr, *pos1;
    float retrievedNum;
    long fileSize;
    // Marker expected immediately before each number; adjust it to match your file.
    string findMe = "ixic\">";

    fp = fopen(filename, "rb");
    if (fp == NULL)
    {
        printf("Could not open %s\n", filename);
        return 1;
    }

    // Read the whole file into a null-terminated buffer.
    fseek(fp, 0, SEEK_END);
    fileSize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    htmlStr = new char[fileSize+1];
    size_t bytesRead = fread(htmlStr, sizeof(char), fileSize, fp);
    htmlStr[bytesRead] = 0;
    fclose(fp);

    // Strip the commas so that "2,147.35" can be scanned as 2147.35.
    string tmpS = htmlStr;
    string find = ",";
    string replace = "";
    tmpS = str_replace(find, replace, tmpS);

    printf("%s\n\n", tmpS.c_str());
    // Copying back is safe: removing characters can only shorten the string.
    strcpy(htmlStr, tmpS.c_str());
    pos1 = htmlStr;

    while ((pos1 = strstr(pos1, findMe.c_str())) != NULL)
    {
        pos1 += findMe.length();
        if (sscanf(pos1, "%f", &retrievedNum) == 1)
            printf("Retrieved: %f\n", retrievedNum);
    }

    delete[] htmlStr;
    return 0;
}

Yeah, the code's not winning any beauty pageants. :-O

Moak

benjamin yap wrote:

how do i extract the value example 2,147.35 and -31.65 of NASDAQ

Have a look at regular expressions (regex). There are various libraries for C++; see Boost or the CodeProject articles on the subject. You could scan through your HTML input line by line and use a regular expression to test for and extract the information you want. Hope this helps, M
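As a rough illustration of the regex idea, here is a sketch that uses std::regex from the C++11 standard library instead of Boost, which assumes a compiler with C++11 support. The function name numericCells and the pattern are my own guesses based on the sample row, so treat them as a starting point only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every <td> cell whose content is
// just a signed number, optionally with thousands separators.
std::vector<std::string> numericCells(const std::string &html)
{
    // <td ...>, optional sign, digits and commas, optional decimals, </td>
    static const std::regex cell(
        R"(<td[^>]*>\s*([-+]?[0-9][0-9,]*(?:\.[0-9]+)?)\s*</td>)");

    std::vector<std::string> found;
    for (std::sregex_iterator it(html.begin(), html.end(), cell), end;
         it != end; ++it)
    {
        found.push_back((*it)[1].str()); // capture group 1 is the number text
    }
    return found;
}
```

Cells holding anything other than a bare number, such as the link cell or the percentage cell, do not match the pattern and are skipped. The captured strings still contain their commas, so those need removing before converting to a numeric type.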
