How can I make this program parallel?

User 10344714 wrote (#1):

foreach (var fname in fnames)
{
    // Open file, read in file contents,
    // process each line one at a time,
    // split into words, add word to index if not already present,
    // increment occurrence count for this file and record line number.
    string[] lines = File.ReadAllLines(fname);

    for (int i = 0; i < lines.Length; i++)
    {
        string[] wordsInLineRaw = Regex.Split(lines[i], @"\W+");

        string[] wordsInLine = new string[wordsInLineRaw.Length];

        for (int p = 0; p < wordsInLineRaw.Length; p++)
        {
            wordsInLine[p] = wordsInLineRaw[p].ToLower();
        }

        // Process each word found in the line
        for (int j = 0; j < wordsInLine.Length; j++)
        {
            string w = wordsInLine[j];

            // If the word is in the stop list, skip it
            if (stopWords.Contains(w)) continue;

            // If the word is not in the index, add it
            if (!index.ContainsKey(w))
            {
                if (worker.CancellationPending)
                {
                    //e.Cancel();
                }

                index.Add(w, new IndexEntry());
            }

            // Update info about the word
            index[w].UpdateInfo(fname, i);
        }
    }
}
    
Phil Martin wrote (#2):

There are a number of ways you can approach this. The simplest is to execute the body of your outer loop for multiple files at the same time. You can use the Task Parallel Library that is part of .NET for this: the Task classes let you run code in parallel and wait for the results at the end. If you choose to do that, the problem becomes "How do I merge the results of the concurrent indexes into one big index?" Some possible solutions are:

      • Use a lock to protect access to the index
      • Create separate indexes and merge them at the end.

I would go with the second, because it uses less shared state and has less chance of me introducing a difficult-to-find bug.

var indexFilesTasks = (from fname in fnames
                       select Task.Run(() => IndexSingleFile(fname, stopWords, () => worker.CancellationPending)))
                      .ToArray();

Task.WaitAll(indexFilesTasks);

index = MergeIndexes(from t in indexFilesTasks
                     select t.Result);

The IndexSingleFile() method contains all the code from your foreach (var fname in fnames) loop. The MergeIndexes() method takes multiple Dictionary<string, IndexEntry> structures and merges them together. Below is the code I used for these. Note this isn't the "best" way, just one way of approaching the problem.

Dictionary<string, IndexEntry> IndexSingleFile(string filename, HashSet<string> stopWords, Func<bool> isCancelling)
{
    var index = new Dictionary<string, IndexEntry>();

    // Open file, read in file contents,
    // process each line one at a time,
    // split into words, add word to index if not already present,
    // increment occurrence count for this file and record line number.
    string[] lines = File.ReadAllLines(filename);

    for (int i = 0; i < lines.Length; i++)
    {
        string[] wordsInLineRaw = Regex.Split(lines[i], @"\W+");

        string[] wordsInLine = new string[wordsInLineRaw.Length];

        for (int p = 0; p < wordsInLineRaw.Length; p++)
        {
            wordsInLine[p] = wordsInLineRaw[p].ToLower();
        }

        // Process each word found in the line
        for (int j = 0; j < wordsInLine.Length; j++)
        {
            string w = wordsInLine[j];

            // If the word is in the stop list, skip it
            if (stopWords.Contains(w)) continue;

            // If the word is not in the index, add it
            if (!index.ContainsKey(w))
            {
                // Assumption: if cancellation has been requested, stop early and
                // return whatever has been indexed so far
                if (isCancelling()) return index;

                index.Add(w, new IndexEntry());
            }

            // Update info about the word
            index[w].UpdateInfo(filename, i);
        }
    }

    return index;
}

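A minimal sketch of a MergeIndexes() helper along the lines described above, assuming a hypothetical IndexEntry.MergeFrom() method that folds one entry's per-file counts and line numbers into another (the IndexEntry type itself is not shown in this thread):

static Dictionary<string, IndexEntry> MergeIndexes(IEnumerable<Dictionary<string, IndexEntry>> indexes)
{
    var merged = new Dictionary<string, IndexEntry>();

    foreach (var index in indexes)
    {
        foreach (var pair in index)
        {
            if (merged.TryGetValue(pair.Key, out IndexEntry existing))
            {
                // The word already appeared in another file's index:
                // combine the per-file data (MergeFrom is a hypothetical method on IndexEntry)
                existing.MergeFrom(pair.Value);
            }
            else
            {
                merged.Add(pair.Key, pair.Value);
            }
        }
    }

    return merged;
}
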
Bernhard Hiller wrote (#3):

        Phil Martin wrote:

        The simplest is to read in multiple files at the same time

Wow, where do you get such multi-tasking harddisks from? :-D No, seriously: that step is likely to be the slowest of all, so hardly anything can be gained from parallelizing it. Parallelizing might only help when processing the file contents takes a relatively long time, and in that case I'd uncouple reading and processing (i.e. read a file into memory and then hand it to the processing function, starting the next file read at that moment - note: this may require a lot of memory). The locks on the index may cause some extra overhead, i.e. slow things down again.
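
A minimal sketch of that kind of decoupling, assuming a single reader task handing whole files to the processing side through a BlockingCollection (the names fnames, stopWords, index, and IndexEntry are reused from the code above; ProcessLines is a hypothetical helper containing the per-line indexing loop from the original post):

// using System.Collections.Concurrent; using System.Threading.Tasks;
// The bounded capacity lets the reader stay at most a couple of files ahead of the processor.
var fileContents = new BlockingCollection<(string Name, string[] Lines)>(boundedCapacity: 2);

var reader = Task.Run(() =>
{
    foreach (var fname in fnames)
    {
        // Read the next file while the consumer is still processing the previous one
        fileContents.Add((fname, File.ReadAllLines(fname)));
    }
    fileContents.CompleteAdding();
});

foreach (var file in fileContents.GetConsumingEnumerable())
{
    // Hypothetical helper: the word-splitting / index-updating loop from the original post
    ProcessLines(file.Name, file.Lines, stopWords, index);
}

reader.Wait();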

Rob Philpott wrote (#4):

I'm not sure Phil meant to read the files in parallel and then continue processing them on a single thread. By parallelizing at the top level, the thread that reads a file goes on to process it, so I'd say his assertion was correct, but probably better stated as 'read in and process multiple files at the same time'. It's an odd thing to do anyway. If I were to try to improve the performance of this code, the first thing I'd do is get rid of the ReadAllLines and switch to streams before even thinking of going parallel.
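
A minimal sketch of that change, assuming the same fnames, stopWords, index, and IndexEntry as the original post; File.ReadLines streams the file lazily instead of loading every line into an array first:

foreach (var fname in fnames)
{
    int lineNumber = 0;

    // ReadLines yields one line at a time instead of materializing the whole file
    foreach (string line in File.ReadLines(fname))
    {
        foreach (string raw in Regex.Split(line, @"\W+"))
        {
            string w = raw.ToLower();
            if (stopWords.Contains(w)) continue;

            if (!index.ContainsKey(w))
                index.Add(w, new IndexEntry());

            index[w].UpdateInfo(fname, lineNumber);
        }

        lineNumber++;
    }
}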

          Regards, Rob Philpott.

Phil Martin wrote (#5):

Thanks for clarifying that for me, Rob. And I agree with you; there are many other things I'd do first and differently. I was aiming for minimum effort on the developer's part with minimum side effects. I'll edit the post and try to be a little less ambiguous about my intent.

Phil Martin wrote (#6):

Ha! I wish I had multitasking hard disks! My choice of wording was poor, and I hope the code clarified the intent, i.e. choosing the simplest point in the original code to parallelize. Will the changes be optimal? No. Will they be parallel, with a significant and measurable difference? Yes. And yes, there are many other options, but I think this was one of the simpler approaches to understand. If they already understood how to set up a dataflow to achieve the task, I'm pretty sure they would have already done it that way.

SledgeHammer01 wrote (#7):

The minimum change would have been to just use Parallel.For on the outer loop and switch to a ConcurrentDictionary, or use locks around the regular Dictionary. Your overall solution is overly complicated (no offense). Any time you save by parallelizing the main loop will be destroyed (and then some) by the merging loop.
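
A minimal sketch of that minimum-change approach, assuming the same fnames, stopWords, and IndexEntry as the original post; note that IndexEntry.UpdateInfo still needs its own synchronization (a lock per entry here), since ConcurrentDictionary only protects the dictionary itself:

// using System.Collections.Concurrent;
var index = new ConcurrentDictionary<string, IndexEntry>();

Parallel.ForEach(fnames, fname =>
{
    string[] lines = File.ReadAllLines(fname);

    for (int i = 0; i < lines.Length; i++)
    {
        foreach (string raw in Regex.Split(lines[i], @"\W+"))
        {
            string w = raw.ToLower();
            if (stopWords.Contains(w)) continue;

            // GetOrAdd is thread-safe; only one IndexEntry ends up in the dictionary per word
            IndexEntry entry = index.GetOrAdd(w, _ => new IndexEntry());

            // Protect the entry itself, since several files may update the same word at once
            lock (entry)
            {
                entry.UpdateInfo(fname, i);
            }
        }
    }
});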

Phil Martin wrote (#8):

Thanks for taking the time to reply. And I agree, Parallel.For would have been the least invasive. And no offense taken!
