Parsing website data: intermittent rubbish characters retrieved

    awah wrote (#1):

    Hi, I am writing a program to parse data from a website. To do that, I first need to download the page.

    Step 1: download the file

        CString Data;
        //CString Buffer;
        DeleteUrlCacheEntry(url); // delete the old stupid cache
        HINTERNET IntOpen = ::InternetOpen("Sample", LOCAL_INTERNET_ACCESS, NULL, 0, 0);
        HINTERNET handle = ::InternetOpenUrl(IntOpen, url, NULL, NULL, NULL, NULL);
        HANDLE hFile = ::CreateFile("c:\\index.txt", GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        char Buffer[1024];
        DWORD dwRead = 0;
        while(::InternetReadFile(handle, Buffer, sizeof(Buffer), &dwRead) == TRUE)
        {
            if (dwRead == 0)
                break;
            DWORD dwWrite = 0;
            ::WriteFile(hFile, Buffer, dwRead, &dwWrite, NULL);
            Data += Buffer;
        }
        ::CloseHandle(hFile);
        ::InternetCloseHandle(handle);

    The CString "Data" now contains the page as plain text.

    Step 2: parse the data using brackets

    Because a lot of the data is within <> brackets, the brackets can be used to locate the desired value.

        // this function looks for the text and removes "bracket_distance" number of <>, then returns the result
        // e.g. "dsfsd<><><><>6.35<>", item = dsfsd, bracket_distance = 4
        CString Mydialog::Parse_Backets(CString file_string, CString item, int bracket_distance)
        {
            file_string.ReleaseBuffer();
            int start_index;
            int end_index;
            start_index = file_string.Find(item);
            if(start_index == -1)
            {
                CString error_string = "Error";
                error_flag = 1;
                return error_string;
            }
            for(int i = 0; i < bracket_distance; i++)
            {
                start_index = file_string.Find(">", start_index) + 1;
            }
            end_index = file_string.Find("<", start_index) - 1;
            file_string = file_string.Mid(start_index, end_index - start_index + 1);
            return file_string;
        }

    Now the problem is that once in a while I get rubbish characters. For example, when the actual value shown in the browser is 0.55, I get 0.aj5m5, or even 0.1595.

    The website is http://stquote.sgx.com/live/st/STStock.asp?stk=G

    Does anyone know how to solve this problem?

    Using:
    - MFC
    - VC6.0
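For readers without MFC at hand, the bracket-skipping idea in step 2 can be sketched in portable C++. This is an illustration only, not the poster's code: std::string stands in for CString, the name parse_brackets and its bool/out-parameter interface are hypothetical, and unlike the original it stops when a bracket is missing instead of continuing with an index derived from -1.

```cpp
#include <cassert>
#include <string>

// Hypothetical portable counterpart of Parse_Backets.
// Finds `item`, skips `bracket_distance` '>' characters after it, and stores
// the text up to the next '<' in `result`. Returns false if anything is missing.
bool parse_brackets(const std::string& text, const std::string& item,
                    int bracket_distance, std::string& result) {
    assert(bracket_distance >= 0);

    std::string::size_type start = text.find(item);
    if (start == std::string::npos) {
        return false;  // item not present
    }

    for (int i = 0; i < bracket_distance; ++i) {
        const std::string::size_type gt = text.find('>', start);
        if (gt == std::string::npos) {
            return false;  // fewer closing brackets than requested
        }
        start = gt + 1;  // continue just past this '>'
    }

    const std::string::size_type end = text.find('<', start);
    if (end == std::string::npos) {
        return false;  // value is not followed by another tag
    }

    result = text.substr(start, end - start);
    return true;
}
```

With the example from the comment in the post, parse_brackets("dsfsd<><><><>6.35<>", "dsfsd", 4, value) returns true and leaves 6.35 in value.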


      Roger Broomfield wrote (#2):

      There is no indication in the documentation for InternetReadFile that the buffer will be null-terminated on return. Appending it as a string therefore does not stop at the bytes actually read, and your line Data+=Buffer; will inevitably add extra rubbish characters to Data.
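The effect can be reproduced without WinINet. The following is a minimal, portable sketch under stated assumptions: std::string stands in for CString, memcpy stands in for InternetReadFile (neither writes a terminator), the buffer is shrunk to 8 bytes, and a guard byte is added so that the demonstration itself stays well-defined. The function name append_as_c_string is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Simulates two successive reads into the same buffer and appends the buffer
// as a C string after each one, the way `Data += Buffer` does.
std::string append_as_c_string(const std::string& first, const std::string& second) {
    char buffer[9] = {0};  // 8 data bytes + 1 guard '\0' (the real code has no guard)
    const std::size_t capacity = sizeof(buffer) - 1;

    std::string data;
    const std::string chunks[2] = {first, second};
    for (const std::string& chunk : chunks) {
        const std::size_t n = std::min(chunk.size(), capacity);
        assert(n <= capacity);
        std::memcpy(buffer, chunk.data(), n);  // like the read: no '\0' is written
        data += buffer;                        // scans to the first '\0', ignoring n
    }
    return data;
}
```

Calling append_as_c_string("<b>0.55<", "/b>") returns "<b>0.55</b>0.55<": the second, shorter read leaves five stale bytes from the first read in the buffer, and they are appended as well. In the code from the question there is no guard byte, so the scan can also run past the end of the array.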


        awah wrote (#3):

        Yes, how do I solve this problem? It doesn't put in any null character.


          David Crow wrote (#4):

          awah wrote:

          how do i solve this problem?

          By ensuring that Buffer is terminated. There are two ways of doing this. Either call ZeroMemory() or memset() before calling WriteFile(), or terminate Buffer at the point equal to the return value of WriteFile().
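One way to read the zero-fill suggestion, as a portable sketch rather than WinINet code: clear the buffer before each read and request one byte less than the buffer holds, so that at least the last byte is still '\0' afterwards. The helper read_chunk is a hypothetical stand-in for InternetReadFile, std::string stands in for CString, and the 8-byte buffer is only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for one InternetReadFile call: copies up to `capacity`
// bytes of `source`, starting at `offset`, and returns how many were copied.
// Like the real call, it writes no terminating '\0'.
std::size_t read_chunk(const std::string& source, std::size_t& offset,
                       char* buffer, std::size_t capacity) {
    const std::size_t n = std::min(capacity, source.size() - offset);
    std::memcpy(buffer, source.data() + offset, n);
    offset += n;
    return n;
}

// Zero-fill variant: the buffer is cleared before every read.
std::string download_zero_filled(const std::string& source) {
    std::string data;
    char buffer[8];
    std::size_t offset = 0;
    for (;;) {
        std::memset(buffer, 0, sizeof(buffer));  // every byte starts as '\0'
        const std::size_t n = read_chunk(source, offset, buffer, sizeof(buffer) - 1);
        if (n == 0) {
            break;  // nothing left to read
        }
        assert(buffer[sizeof(buffer) - 1] == '\0');  // last byte is never overwritten
        data += buffer;  // stops at the first untouched '\0'
    }
    return data;
}
```

This relies on the payload containing no '\0' bytes of its own; binary data would be cut short at the first one.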


          "A good athlete is the result of a good and worthy opponent." - David Crow

          "To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne


            Roger Broomfield wrote (#5):

            The first sentence is correct. The second two are incorrect for the code in the original question. The problem occurs because of the locally declared char Buffer[1024]; combined with the call InternetReadFile(handle, Buffer, sizeof(Buffer), &dwRead). The given Buffer can't be terminated without potentially losing characters that were read, because InternetReadFile may have filled it completely. Correctly terminating the Buffer before appending it to a CString would require Buffer[dwRead] = '\0'; but for most iterations of the loop dwRead = 1024, so that write would land one byte past the end of the array and either overwrite one of the other local variables or corrupt the stack in some other way. The correct way is to call InternetReadFile(handle, Buffer, sizeof(Buffer)-1, &dwRead) and then set Buffer[dwRead] = '\0'; inside the while loop.
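The loop described above can be sketched portably so that it runs without WinINet. Assumptions: read_chunk is a hypothetical stand-in for InternetReadFile that writes no terminator, std::string stands in for CString, and the buffer is 8 bytes instead of 1024 so that several completely filled reads occur.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for one InternetReadFile call: copies up to `capacity`
// bytes of `source`, starting at `offset`, and returns how many were copied.
std::size_t read_chunk(const std::string& source, std::size_t& offset,
                       char* buffer, std::size_t capacity) {
    const std::size_t n = std::min(capacity, source.size() - offset);
    std::memcpy(buffer, source.data() + offset, n);
    offset += n;
    return n;
}

// Reads at most sizeof(buffer) - 1 bytes per call, then terminates the buffer
// at the number of bytes actually read before appending it.
std::string download_terminated(const std::string& source) {
    std::string data;
    char buffer[8];
    std::memset(buffer, '#', sizeof(buffer));  // simulate stale stack contents
    std::size_t offset = 0;
    for (;;) {
        const std::size_t n = read_chunk(source, offset, buffer, sizeof(buffer) - 1);
        if (n == 0) {
            break;  // nothing left to read
        }
        assert(n < sizeof(buffer));  // so buffer[n] is inside the array
        buffer[n] = '\0';
        data += buffer;
    }
    return data;
}
```

Because each read is capped at one byte less than the array size, the terminator always has a slot of its own, even when a read fills every byte it was offered.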
