Parsing website data: intermittent rubbish characters retrieved
-
Hi, I am writing a program to parse data from a website. To do that, I first need to download the file.

Step 1: download the file
CString Data;
//CString Buffer;
DeleteUrlCacheEntry(url); // delete the old stupid cache
HINTERNET IntOpen = ::InternetOpen("Sample", LOCAL_INTERNET_ACCESS, NULL, 0, 0);
HINTERNET handle = ::InternetOpenUrl(IntOpen, url, NULL, NULL, NULL, NULL);
HANDLE hFile = ::CreateFile("c:\\index.txt", GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
char Buffer[1024];
DWORD dwRead = 0;
while (::InternetReadFile(handle, Buffer, sizeof(Buffer), &dwRead) == TRUE)
{
    if (dwRead == 0)
        break;
    DWORD dwWrite = 0;
    ::WriteFile(hFile, Buffer, dwRead, &dwWrite, NULL);
    Data += Buffer;
}
::CloseHandle(hFile);
::InternetCloseHandle(handle);
The CString "Data" now contains the web page as plain text.

Step 2: parse the data using brackets. Because a lot of the data sits within <> brackets, the brackets can be used to locate the desired data.

// This function looks for the text in "item", skips "bracket_distance" number of <>, then returns the result.
// e.g. "dsfsd<><><><>6.35<>", item = dsfsd, bracket_distance = 4
CString Mydialog::Parse_Backets(CString file_string, CString item, int bracket_distance)
{
    file_string.ReleaseBuffer();
    int start_index;
    int end_index;
    start_index = file_string.Find(item);
    if (start_index == -1)
    {
        CString error_string = "Error";
        error_flag = 1;
        return error_string;
    }
    for (int i = 0; i < bracket_distance; i++)
    {
        start_index = file_string.Find(">", start_index) + 1;
    }
    end_index = file_string.Find("<", start_index) - 1;
    file_string = file_string.Mid(start_index, end_index - start_index + 1);
    return file_string;
}
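The bracket-skipping idea described above can be tried outside MFC with a small sketch. This is not the poster's function: ParseBrackets, its parameters, and the use of std::string in place of CString are assumptions made for illustration, and unlike the original it reports failure when a bracket cannot be found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical portable counterpart of Parse_Backets: finds `item`, skips
// `bracketDistance` closing '>' characters after it, and stores the text up
// to the next '<' in `result`. Returns false if any search fails.
bool ParseBrackets(const std::string& text, const std::string& item,
                   int bracketDistance, std::string& result)
{
    assert(bracketDistance >= 0);

    std::size_t start = text.find(item);
    if (start == std::string::npos)
        return false;                      // item not present

    for (int i = 0; i < bracketDistance; ++i)
    {
        std::size_t close = text.find('>', start);
        if (close == std::string::npos)
            return false;                  // fewer brackets than requested
        start = close + 1;                 // continue just past this '>'
    }

    std::size_t end = text.find('<', start);
    if (end == std::string::npos)
        return false;                      // no opening bracket after the value

    result = text.substr(start, end - start);
    return true;
}
```

With the example from the comment, ParseBrackets("dsfsd<><><><>6.35<>", "dsfsd", 4, out) leaves "6.35" in out.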
Now the problem is that once in a while I get rubbish characters. For example, where the actual value shown when I browse to the website is 0.55, I get 0.aj5m5, or even 0.1595.

The website is http://stquote.sgx.com/live/st/STStock.asp?stk=G

Does anyone know how to solve this problem?

Using:
- MFC
- VC6.0
-
There is no indication in the documentation for InternetReadFile that the buffer will be null-terminated on return. Therefore your line Data+=Buffer; will inevitably add extra rubbish characters to Data.
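The effect can be reproduced without WinINet in a small sketch. Everything here is an assumption made for illustration: AppendChunks is a hypothetical helper, std::string stands in for CString, and the buffer keeps one untouched zero byte at its end so that the faulty path stays well-defined instead of running off the array. Appending an explicit byte count, shown as the alternative, is a different technique from the one in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: feeds `source` through a small reused buffer in
// chunks of at most `chunk` bytes, the way a read loop would.
// useLength == false: each chunk is appended as a C string (the pattern
// in the question). useLength == true: exactly `n` bytes are appended.
std::string AppendChunks(const std::string& source, std::size_t chunk, bool useLength)
{
    // Zero-filled; the last byte is never overwritten and acts as a guard
    // terminator so the C-string append cannot read past the array.
    char buffer[16 + 1] = {0};
    assert(chunk > 0 && chunk <= 16);

    std::string data;
    for (std::size_t offset = 0; offset < source.size(); )
    {
        std::size_t n = std::min(chunk, source.size() - offset);
        std::memcpy(buffer, source.data() + offset, n); // no '\0' written, like a raw read
        offset += n;

        if (useLength)
            data.append(buffer, n);  // only the bytes just read
        else
            data += buffer;          // everything up to the first '\0', stale bytes included
    }
    return data;
}
```

Feeding "0.55</td>" through in chunks of 4 gives "0.55</td>" with the length-based append, but "0.55</td>/td" with the C-string append: the last, shorter chunk drags along bytes left over from the previous one.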
-
awah wrote:
How do I solve this problem?

By ensuring that Buffer is terminated. There are two ways of doing this. Either call ZeroMemory() or memset() before calling WriteFile(), or terminate Buffer at the point equal to the return value of WriteFile().
-
The first sentence is correct. The next two are incorrect for the code in the original question. The problem arises from the combination of the locally declared char Buffer[1024]; and the call InternetReadFile(handle, Buffer, sizeof(Buffer), &dwRead). As written, the buffer cannot be terminated without potentially losing characters that were read, because InternetReadFile may have filled it completely. Terminating it correctly for later appending to a CString would require Buffer[dwRead] = '\0';, but for most iterations of the loop dwRead = 1024, so that write lands one byte past the end of the array and would either overwrite one of the other local variables or corrupt the stack in some other way. The correct way is to call InternetReadFile(handle, Buffer, sizeof(Buffer)-1, &dwRead) and then set Buffer[dwRead] = '\0'; inside the while loop.
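The loop shape suggested here can be sketched in portable C++. FakeReadFile is a hypothetical stand-in for InternetReadFile (it only copies from a string), DownloadAll is an invented name, and std::string replaces CString; the point is just the sizeof(buffer) - 1 request followed by the in-bounds terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for InternetReadFile: copies up to `capacity` bytes
// of `source`, starting at `offset`, into `buffer` and reports the count in
// `bytesRead`. It never writes a terminating '\0'.
bool FakeReadFile(const std::string& source, std::size_t& offset,
                  char* buffer, std::size_t capacity, std::size_t& bytesRead)
{
    assert(buffer != nullptr);
    bytesRead = std::min(capacity, source.size() - offset);
    std::memcpy(buffer, source.data() + offset, bytesRead);
    offset += bytesRead;
    return true;
}

// Reads `source` in chunks, reserving one byte of the buffer for the
// terminator, and appends each terminated chunk to the result.
std::string DownloadAll(const std::string& source)
{
    std::string data;
    std::size_t offset = 0;
    std::size_t bytesRead = 0;
    char buffer[8];                            // deliberately tiny to force many chunks
    std::memset(buffer, '#', sizeof(buffer));  // simulate stale, non-zero contents

    while (FakeReadFile(source, offset, buffer, sizeof(buffer) - 1, bytesRead))
    {
        if (bytesRead == 0)
            break;                             // end of data
        buffer[bytesRead] = '\0';              // in bounds: bytesRead <= sizeof(buffer) - 1
        data += buffer;                        // now a properly terminated C string
    }
    return data;
}
```

One caveat under the same assumptions: appending as a C string still stops at any '\0' byte inside the downloaded data, which should not matter for a plain HTML page but would for binary content.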