Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Other Discussions
  3. Clever Code
  4. Challenge: the fastest way to filter doubled items from a list.

Challenge: the fastest way to filter doubled items from a list.

Scheduled Pinned Locked Moved Clever Code
learningcsharpjavahelpquestion
31 Posts 24 Posters 4 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • 0 0bx

    To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

    private List LoadUniqueList(List doubledList)
    {
    List UniqueList = new List();
    foreach (string item in doubledList)
    {
    bool x = true;
    foreach (string compare in UniqueList)
    {
    if (item == compare)
    {
    x = false;
    }
    if (!x) break;
    }
    if (x) UniqueList.Add(item);
    }
    return UniqueList;
    }

    Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

    Giraffes are not real.

    M Offline
    M Offline
    M Hussain
    wrote on last edited by
    #16

    I bet it will be faster than everything.

    UniqueList.Add((from string itm in UniqueList
    from string compareItem in doubledList
    where itm == compareItem
    select compareItem).ToString());

    1 Reply Last reply
    0
    • 0 0bx

      To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

      private List LoadUniqueList(List doubledList)
      {
      List UniqueList = new List();
      foreach (string item in doubledList)
      {
      bool x = true;
      foreach (string compare in UniqueList)
      {
      if (item == compare)
      {
      x = false;
      }
      if (!x) break;
      }
      if (x) UniqueList.Add(item);
      }
      return UniqueList;
      }

      Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

      Giraffes are not real.

      L Offline
      L Offline
      Lilith C
      wrote on last edited by
      #17

      Usually I sort the list then iterate through the list, outputting each change. When a change is found its value is used to compare for the next change detection. This code outputs to a string.

      iList.Sort();
      txtOutput.Text = "";
      foreach (string str in iList)
      {
      if (str != lastEntry)
      {
      txtOutput.Text += (str + "\r\n");
      lastEntry = str;
      }
      }

      And, yes, I know that StringBuilder might be more appropriate for some lists.

      I'm not a programmer but I play one at the office

      1 Reply Last reply
      0
      • P Philippe Mori

        Such an algorithm is 0(n²) thus for large collection, it will be very slow. Each time, you double the collection size, it is 4 times slower. Using an HastSet or a SortedSet or a Dictionary would be much faster for large collections. The order would then be about O(n) with hashing or O(n log(n)) with Dictionary (binary search). Another alternative would be to sort the list if the order does not matters. It is then easier to skip skip or remove duplicates.

        Philippe Mori

        C Offline
        C Offline
        Climate Turnip
        wrote on last edited by
        #18

        Its always dangerous to assume that all performance issues can be assessed using big O notation as an evaluation method, particularly when not comparing apples with apples in terms of implementation specifics. Hashing resolves down to numeric comparison as opposed to String evaluation but involves a hashing function overhead, these are important factors, there are also allocation issues particular to any implementation. Trivialising performance evaluation in this way where growth rate is a factor is always a dangerous proposition.

        1 Reply Last reply
        0
        • 0 0bx

          To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

          private List LoadUniqueList(List doubledList)
          {
          List UniqueList = new List();
          foreach (string item in doubledList)
          {
          bool x = true;
          foreach (string compare in UniqueList)
          {
          if (item == compare)
          {
          x = false;
          }
          if (!x) break;
          }
          if (x) UniqueList.Add(item);
          }
          return UniqueList;
          }

          Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

          Giraffes are not real.

          B Offline
          B Offline
          Brian L Parker
          wrote on last edited by
          #19

          public List<String> RemoveDoubles(List<string> doublelist) { return doublelist.Distinct().ToList(); }

          1 Reply Last reply
          0
          • 0 0bx

            To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

            private List LoadUniqueList(List doubledList)
            {
            List UniqueList = new List();
            foreach (string item in doubledList)
            {
            bool x = true;
            foreach (string compare in UniqueList)
            {
            if (item == compare)
            {
            x = false;
            }
            if (!x) break;
            }
            if (x) UniqueList.Add(item);
            }
            return UniqueList;
            }

            Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

            Giraffes are not real.

            K Offline
            K Offline
            KP Lee
            wrote on last edited by
            #20

            You've been given several options to use pre-existing code. Makes it easier, doesn't necessarily make it faster. I can certainly see how to make your code slightly faster:

                    foreach (string item in doubledList)
                   {
                        bool x = true;
                        foreach (string compare in UniqueList)
                        {
                            if (item == compare)
                            {
                                x = false;
                                break;
                            }
                        }
                        if (x) UniqueList.Add(item);
                    }
            

            There is one less if statement needed. The break command knows it is breaking the inner foreach, not the if statement it is in. If you have 100 items with one duplicate, the first outer loop saves nothing, after that, you've removed an if check for every inner loop executed. If the first and last are duplicates you've saved 4800+ if checks. (First outer loop never executes an inner loop, every loop after that executes the inner 1 less than the outer ones taken except the last loop that takes just one inner loop. My math may be wrong, but I'm sure that saves over 4800 if tests. 98/2=49 The last full inner loop will be for 98 times, the prior loop would be 97 and the first inner loop is 1 added together is 98, then 96+2 is 98, etc for 49 times 98 inner loops were taken. 49*98=4802 plus 1 for the 100th outer loop.) Maybe a sorted unique list could be faster? I'll think about that. Is a sorted list OK?

            1 Reply Last reply
            0
            • 0 0bx

              To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

              private List LoadUniqueList(List doubledList)
              {
              List UniqueList = new List();
              foreach (string item in doubledList)
              {
              bool x = true;
              foreach (string compare in UniqueList)
              {
              if (item == compare)
              {
              x = false;
              }
              if (!x) break;
              }
              if (x) UniqueList.Add(item);
              }
              return UniqueList;
              }

              Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

              Giraffes are not real.

              H Offline
              H Offline
              Harley L Pebley
              wrote on last edited by
              #21

              Here are the results of a performance test of all the ways mentioned (at the time of this writing) in this thread: skylark-software.com[^]

              T 1 Reply Last reply
              0
              • H Harley L Pebley

                Here are the results of a performance test of all the ways mentioned (at the time of this writing) in this thread: skylark-software.com[^]

                T Offline
                T Offline
                TG_Cid
                wrote on last edited by
                #22

                Good job, i dont know why anyone would downvote your post, but i think is nice to see the comparative. And surprisingly(at least to me) the most simply answers were the best.(i thought the sorted list would be the best)

                H 1 Reply Last reply
                0
                • T TG_Cid

                  Good job, i dont know why anyone would downvote your post, but i think is nice to see the comparative. And surprisingly(at least to me) the most simply answers were the best.(i thought the sorted list would be the best)

                  H Offline
                  H Offline
                  Harley L Pebley
                  wrote on last edited by
                  #23

                  Thanks. The down vote is strange to me too. Perhaps it's because I didn't post it as a CodeProject article. :confused: As I mentioned, I expected Distinct() to be about same as the sorted list and was surprised it was so much faster; apparently it's using a HashSet (or equivalent) internally. But I was really surprised at the difference in speed between the sorted list and the hash set. I thought they'd be closer to each other. I'm guessing it's probably a difference related to inserts.

                  L 1 Reply Last reply
                  0
                  • P PIEBALDconsult

                    Exactly. Iterate the input List, add new items to the HashSet and the result List.

                    Z Offline
                    Z Offline
                    Zhang Lu1011
                    wrote on last edited by
                    #24

                    this problem is complex!

                    1 Reply Last reply
                    0
                    • P PIEBALDconsult

                      Exactly. Iterate the input List, add new items to the HashSet and the result List.

                      J Offline
                      J Offline
                      Jason McBurney
                      wrote on last edited by
                      #25

                      Agreed, the example given will run in N^2 time, and your solution will run in N time.

                      You can only be young once. But you can always be immature. - Dave Barry

                      B D 2 Replies Last reply
                      0
                      • J Jason McBurney

                        Agreed, the example given will run in N^2 time, and your solution will run in N time.

                        You can only be young once. But you can always be immature. - Dave Barry

                        B Offline
                        B Offline
                        Bernhard Hiller
                        wrote on last edited by
                        #26

                        Do not forget that HashSet must somehow check the constraint that there must not be duplicates. Also the time needed for that operation must be taken into account. And it will increase with the size of the Hashset.

                        1 Reply Last reply
                        0
                        • H Harley L Pebley

                          Thanks. The down vote is strange to me too. Perhaps it's because I didn't post it as a CodeProject article. :confused: As I mentioned, I expected Distinct() to be about same as the sorted list and was surprised it was so much faster; apparently it's using a HashSet (or equivalent) internally. But I was really surprised at the difference in speed between the sorted list and the hash set. I thought they'd be closer to each other. I'm guessing it's probably a difference related to inserts.

                          L Offline
                          L Offline
                          Lutoslaw
                          wrote on last edited by
                          #27

                          Harley L. Pebley wrote:

                          it's using a HashSet (or equivalent) internally

                          It uses it's own Set type which basically is a minimal implementation of a HashSet. So, I am suprised that Distinct is slower than a HashSet, because it uses a dedicated implementation instead of a general solution.

                          Greetings - Jacek

                          1 Reply Last reply
                          0
                          • 0 0bx

                            To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

                            private List LoadUniqueList(List doubledList)
                            {
                            List UniqueList = new List();
                            foreach (string item in doubledList)
                            {
                            bool x = true;
                            foreach (string compare in UniqueList)
                            {
                            if (item == compare)
                            {
                            x = false;
                            }
                            if (!x) break;
                            }
                            if (x) UniqueList.Add(item);
                            }
                            return UniqueList;
                            }

                            Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

                            Giraffes are not real.

                            I Offline
                            I Offline
                            I explore code
                            wrote on last edited by
                            #28

                            so i know this thread is centuries old, but i decided to comment anyway. I haven't looked at what others have posted yet so heres my unadulterated take:

                            for (int i = 0; i < doubled.Count; i++)
                            {
                            if (!uniques.Contains(doubled[i]))
                            {
                            uniques.Add(doubled[i]);
                            }
                            }

                            just one loop, although i know contains() is basically an O(n) method but the time difference doesnt seem much between your code and mine. But mine is less number of lines ;)

                            1 Reply Last reply
                            0
                            • 0 0bx

                              To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

                              private List LoadUniqueList(List doubledList)
                              {
                              List UniqueList = new List();
                              foreach (string item in doubledList)
                              {
                              bool x = true;
                              foreach (string compare in UniqueList)
                              {
                              if (item == compare)
                              {
                              x = false;
                              }
                              if (!x) break;
                              }
                              if (x) UniqueList.Add(item);
                              }
                              return UniqueList;
                              }

                              Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

                              Giraffes are not real.

                              V Offline
                              V Offline
                              VallarasuS
                              wrote on last edited by
                              #29

                              call this guy from Fsharp.Core!! >> http://msdn.microsoft.com/en-us/library/ee353861.aspx[^]

                              Regards Vallarasu S | FSharpMe.blogspot.com

                              1 Reply Last reply
                              0
                              • 0 0bx

                                To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once(UniqueList). My solution is this (C#):

                                private List LoadUniqueList(List doubledList)
                                {
                                List UniqueList = new List();
                                foreach (string item in doubledList)
                                {
                                bool x = true;
                                foreach (string compare in UniqueList)
                                {
                                if (item == compare)
                                {
                                x = false;
                                }
                                if (!x) break;
                                }
                                if (x) UniqueList.Add(item);
                                }
                                return UniqueList;
                                }

                                Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me of course. Also, do you think it's a good idea to use this part of the forum similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.

                                Giraffes are not real.

                                K Offline
                                K Offline
                                Kostya Kovalskyy
                                wrote on last edited by
                                #30

                                You can put the strings that repeat often in the beginning of your unique string, so it will not have to iterate that many times through the loop. So it will be ordered by the amount it repeats instead of the order of original string. For example, for string string: one three two two three three unique string will be unique: three two one instead of unique: one three two because there is most threes in the string. But this will not always optimize, because there may be no most occurring string if they are completely at randomized. So for example, it would not work for random computer generated strings, but work for predictable user info. Edit: you can also order them by there length, for example if you have strings: reallyLongStringWhichIsReallyLong anotherString reallyLongStringWhichIsReallyLong shortString shortString ... Even through shortString and reallyLongString occur the same amount of times, it takes longer for computer to check if reallyLongString is in the front, so it would be more efficient to move this in the back of unique string. Also, how long will it take your function in C# to check this strings: "stuff", "str", "str", "morestuff", "str", "not unique", "evenMoreStuff", "stuff", "unique", "morestuff", "not unique"? In C++, it takes 0.124 seconds when I print results to the screen and 0.074 when I do not. in C++ with gcc:

                                //0.124 and 0.074
                                #include
                                #include
                                using namespace std;

                                vector getUnique(vector listStr) {
                                vector::iterator listIt;

                                vector uniqueStr;
                                vector::iterator uniqueIt;

                                for (listIt = listStr.begin(); listIt < listStr.end(); listIt++) {
                                bool exists = false;
                                for (uniqueIt = uniqueStr.begin(); uniqueIt < uniqueStr.end(); uniqueIt++) {
                                if (*listIt == *uniqueIt) {
                                exists = true;
                                break;
                                }
                                }
                                if (!exists) uniqueStr.push_back(*listIt);
                                }
                                cout << "list: ";
                                for (listIt = listStr.begin(); listIt < listStr.end(); listIt++) cout << " " << *listIt;
                                cout << "\n\nunique list: ";
                                for (uniqueIt = uniqueStr.begin(); uniqueIt < uniqueStr.end(); uniqueIt++) cout << " " << *uniqueIt;
                                return uniqueStr;
                                }

                                int main() {
                                vector str = {"stuff", "str", "str", "morestuff", "str", "not unique", "evenMoreStuff", "stuff", "unique", "morestuff", "not unique"};

                                getUnique(str);
                                return 0;
                                }

                                1 Reply Last reply
                                0
                                • J Jason McBurney

                                  Agreed, the example given will run in N^2 time, and your solution will run in N time.

                                  You can only be young once. But you can always be immature. - Dave Barry

                                  D Offline
                                  D Offline
                                  dojohansen
                                  wrote on last edited by
                                  #31

                                  That doesn't mean it's always faster - it just means it is asymptotically faster. It always annoyed me that my algorithms class never dealt with this issue, being exclusively interested in the theoretical framework where "best" only meant "best for infinite-size problem" and not at all interested in real-world performance, where infinite input isn't even possible. :)

                                  1 Reply Last reply
                                  0
                                  Reply
                                  • Reply as topic
                                  Log in to reply
                                  • Oldest to Newest
                                  • Newest to Oldest
                                  • Most Votes


                                  • Login

                                  • Don't have an account? Register

                                  • Login or register to search.
                                  • First post
                                    Last post
                                  0
                                  • Categories
                                  • Recent
                                  • Tags
                                  • Popular
                                  • World
                                  • Users
                                  • Groups