
How to speed up copy of .txt files into arrays?

    Arris74 wrote:

    Thanks. CFile::Seek() doesn't help because I do not actually know at which line to start and finish. I just know a starting and ending date and time, so I need to read each string and compare it with a variable. Do you have a code sample showing how to use CStdioFile and CMemFile together? Which functions are used for parsing? I only saw CStdioFile::ReadString(), which cannot be used with CMemFile.

    David Crow wrote (#7):

    Arris7 wrote:

    CFile::Seek() doesn't help because actually I do not know at which line to start and finish. I just know a starting and ending date and time. So I need to read the String and to compare it with a variable.

    Fair enough, but you can still do it via a simple calculation, rather than using strtok() to find each line. strtok() is slowing you down as it has to examine each character to find the one you want. Since each line of the file is 18-19 characters in length, just compare the first 13 of those. Something like:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // sample data: every record here is exactly 18 characters, newline included
        const char *szBuffer = "20000103\t1658\t351\n"
                               "20000103\t1659\t352\n"
                               "20000103\t1700\t350\n"
                               "20000103\t1701\t352\n"
                               "20000103\t1702\t355\n"
                               "20000104\t0900\t354\n"
                               "20000104\t0901\t352\n"
                               "20000104\t0902\t350\n";
        const char *p = szBuffer;

        while (*p != '\0')
        {
            // the date, tab, and time occupy the first 13 characters of a record
            if (strncmp(p, "20000104\t0900", 13) == 0)
            {
                printf("Found it!\n");
                break;
            }

            // advance to the next 'line'
            p += 18;
        }

        return 0;
    }


    "Approved Workmen Are Not Ashamed" - 2 Timothy 2:15

    "Judge not by the eye but by the heart." - Native American Proverb

      led mike wrote:

      Arris7 wrote:

      actually I do not know at which line to start and finish. I just know a starting and ending date and time. So I need to read the String and to compare it with a variable.

      That contradicts your first post:

      Arris7 wrote:

      For example I just need to start at line 250000 and stop at line 3000000

      So which is it?

      led mike

      Arris74 wrote (#8):

      Sorry, I was not clear in my first post. Actually, I only know the date and time at which to start. For example, the starting date and time might be located at line 250000, and I have to find that line.


        Arris74 wrote (#9):

        Many thanks, it sounds great. I'm going to try it and let you know the results.


          PJ Arends wrote (#10):

          For very large files (millions of lines) this will still take a while. Since the file is sorted by date and time, I would use your original idea of using CFile::Seek to do a binary search of the file. A binary search will be slower if the required data is right at the start of the file, but a heck of a lot faster if the data is anywhere else.


          You may be right
          I may be crazy
          -- Billy Joel --

          Within you lies the power for good, use it!!!

          Within you lies the power for good; Use it!


            David Crow wrote (#11):

            PJ Arends wrote:

            Seeing how the file is sorted by date and time...

            Was that a guarantee? If so, then a binary search via Seek() is the way to go.


            "Approved Workmen Are Not Ashamed" - 2 Timothy 2:15

            "Judge not by the eye but by the heart." - Native American Proverb


              led mike wrote (#12):

              If it is not sorted, a sorted solution would likely be optimal for 4 million records: either sort the original file, or create an index file or a memory-based index. Perhaps a database should be considered.

              led mike


                Arris74 wrote (#13):

                Many thanks for your solutions. I think that your idea of using strncmp(p, "20000104\t0900", 13) == 0 in a binary search is a good answer. I'll try it right now. Thanks again.
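For anyone reading along, here is a rough sketch of what combining the strncmp() comparison with a binary search could look like once the file is already in a memory buffer. It assumes every record is exactly 18 bytes wide and sorted by date and time, which the thread does not guarantee (lines were described as 18-19 characters), and the name find_first_at_or_after is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RECORD_LEN 18 /* assumption: "YYYYMMDD\tHHMM\tNNN\n" is always 18 bytes */
#define KEY_LEN    13 /* "YYYYMMDD\tHHMM" */

/* Hypothetical helper: returns the index of the first record whose
   date/time key is at or after `key`, or `count` if every record is
   earlier. The records in `buf` must be sorted in ascending order. */
size_t find_first_at_or_after(const char *buf, size_t count, const char *key)
{
    size_t lo = 0, hi = count;

    assert(buf != NULL && key != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        /* Zero-padded dates and times sort the same way as text and as time. */
        if (strncmp(buf + mid * RECORD_LEN, key, KEY_LEN) < 0)
            lo = mid + 1; /* record at mid is too early: search the upper half */
        else
            hi = mid;     /* record at mid could be the answer: keep it in range */
    }
    return lo;
}
```

Calling it once with the starting date and time and once with the ending date and time would give the range of records to copy. Because it returns the first record at or after the key, it also works when the exact timestamp is not present in the file.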


                  Arris74 wrote (#14):

                  Thanks for this solution. Yes, the file is sorted by date and time, so a binary search is a good answer to my question. Thanks again.


                    malaugh wrote (#15):

                    Are the dates in order? If the file starts with the earliest date and finishes with the latest date, then use the following method:

                    1) fseek to the middle of the file.
                    2) If the date there is later than the one you want, fseek to one quarter of the way through the file; if it is earlier, fseek to three quarters of the way through the file.
                    3) Repeat, halving the remaining range each time.

                    It is like guessing a number. If you ask someone to select a number between 0 and 15, the quickest way to find the number is to ask: is it less than 8? If yes, then ask: is it less than 4? If no, then ask: is it less than 6? I'm sure you get the idea. It's a common technique.
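The fseek steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file is sorted, every line starts with a 13-character "YYYYMMDD\tHHMM" key, and the stream was opened in binary mode so offsets are plain byte counts. Since lines may be 18 or 19 characters long, it searches over byte offsets and resynchronises to the next line start after each seek instead of multiplying by a fixed record width. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 13 /* "YYYYMMDD\tHHMM" */

/* Offset of the first line that starts at or after pos, or size if none. */
static long line_start_at_or_after(FILE *fp, long pos, long size)
{
    int c;

    if (pos <= 0)
        return 0;
    if (fseek(fp, pos - 1, SEEK_SET) != 0)
        return size;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            return ftell(fp); /* the byte after a newline begins a line */
    return size;
}

/* Nonzero if the line at `start` sorts at or after key (or start is at EOF). */
static int line_is_at_or_after(FILE *fp, long start, long size, const char *key)
{
    char buf[KEY_LEN];

    if (start >= size)
        return 1;
    if (fseek(fp, start, SEEK_SET) != 0 || fread(buf, 1, KEY_LEN, fp) != KEY_LEN)
        return 1; /* short or unreadable line: treat it as past the end */
    return memcmp(buf, key, KEY_LEN) >= 0;
}

/* Hypothetical helper: byte offset of the first line whose key is at or
   after `key` (which must hold at least KEY_LEN characters), or -1. */
long seek_first_at_or_after(FILE *fp, const char *key)
{
    long lo = 0, hi, size, start;

    assert(fp != NULL && key != NULL);
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return -1;
    hi = size;

    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        long s = line_start_at_or_after(fp, mid, size);

        if (line_is_at_or_after(fp, s, size, key))
            hi = mid;     /* answer starts at or before s */
        else
            lo = mid + 1; /* everything up to mid is too early */
    }
    start = line_start_at_or_after(fp, lo, size);
    return start < size ? start : -1;
}
```

Each step reads at most one line's worth of bytes, so even a file with millions of lines needs only a couple of dozen seeks. After the call, an fseek to the returned offset positions the stream at the first line to copy.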


                      Arris74 wrote (#16):

                      Thanks. Yes, the dates are in order. The method you suggest is a binary search. I think it is the best solution for my problem.
