Code Project

Removing duplicates / extracting unique vals from String array

    csrss wrote (#1):

    Alright, so I've got an array like this:

    String[] Arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A"};

    I am trying to extract the unique values (i.e. remove duplicates) from this array, so that in the end the output will be: A, B, C, D, N, V, W. Now here is my function:

    public static String[] Extract(String[] List)
    {
        String[] Output = null;
        String Symbols = String.Empty;
        String SymbolsHere = String.Empty;

        foreach (String Text in List)
        {
            if (Text.Contains(","))
            {
                String[] Vals = Text.Split(new char[] { ',' });
                foreach (String Val in Vals)
                {
                    if (SymbolsHere.IndexOf(Val) == -1)
                    {
                        // MessageBox.Show(SymbolsHere, Val);
                        SymbolsHere += Val + " ";
                        Symbols += Val + ", ";
                    }
                }
            }
            else
            {
                if (SymbolsHere.IndexOf(Text) == -1)
                {
                    Symbols += Text + ", ";
                    SymbolsHere += Text + " ";
                }
            }
        }
        // MessageBox.Show(SymbolsHere);
        MessageBox.Show(Symbols);
        return Output;
    }
    

    Well, this almost works, but the last message box shows "A, B, C, A, D, N, V, W" instead of "A, B, C, D, N, V, W". As you can see, one extra "A" gets into the output and I have no idea why. :confused: Take a look at this part:

    if (SymbolsHere.IndexOf(Val) == -1)
    {
        MessageBox.Show(SymbolsHere, Val);
        SymbolsHere += Val + " ";
        Symbols += Val + ", ";
    }

    So: if a symbol has not been found in the container of already-collected symbols, show the container itself and the new symbol. But at this point the app pops up a message box with the collected symbols "A B C" and a new symbol that supposedly is not in the collection: "A". That's absurd, A is already there. :confused: What is wrong with this code? Thanks

    011011010110000101100011011010000110100101101110 0110010101110011


      Luc Pattyn wrote (#2):

      Not sure what your specs are. If by symbol you mean a single character, then you are doing it wrong, as Split() will return lots of 2-character strings (such as " A") since your input has spaces; you may want to add a Trim() somewhere. If by symbol you mean arbitrary multi-character strings ("words"), then IndexOf() is not the right method, as it also reports partial matches (e.g. "Y+" is found in "XY+="). :)

      Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
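A minimal sketch reproducing both of Luc's points with the question's own data, written with C# 9 top-level statements. The variable names are illustrative, and `symbolsHere` is rebuilt by hand to match what the original loop stores after processing the first element "A, B, C".

```csharp
using System;

// Split on ',' alone keeps the space that follows each comma.
string[] parts = "B, A, C".Split(',');
foreach (string p in parts)
    Console.WriteLine($"[{p}] length {p.Length}");   // [B] length 1, [ A] length 2, [ C] length 2

// What the original loop has collected in SymbolsHere after the first element "A, B, C":
string symbolsHere = "A" + " " + " B" + " " + " C" + " ";   // "A  B  C "

// " A" (with its leading space) does not occur anywhere in "A  B  C ", so it is treated as new.
Console.WriteLine(symbolsHere.IndexOf(" A"));   // -1
Console.WriteLine(symbolsHere.IndexOf("A"));    // 0

// IndexOf also reports partial matches, which is wrong for multi-character symbols.
Console.WriteLine("XY+=".IndexOf("Y+"));        // 1
```

This is exactly the second "A" in the question: the first "A" is stored without a space, and the later " A" never matches it.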


        dan sh wrote (#3):

        The first time "A" is added to the output string, it is the first character of the array element. The second time, it is not "A" but " A" (with a leading space) that gets added, and " A" does not occur in the string collected so far. You should use Trim() to get rid of the extra characters. Also, you can use LINQ to make your code more compact. Here is something quick I came up with (not the most efficient way, though):

        // requires: using System.Collections.Generic; using System.Linq;
        String[] Arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

        string uniqueCharacters = string.Empty;

        foreach (string str in Arr)
        {
            List<string> values = str.Split(',').Distinct().ToList();
            // append the trimmed value so no leading space ends up in the output
            values.ForEach(x => uniqueCharacters = !uniqueCharacters.Contains(x.Trim()) ? uniqueCharacters + "," + x.Trim() : uniqueCharacters);
        }

        uniqueCharacters = uniqueCharacters.Remove(0, 1); // drop the leading ','
        

        Consider the size of the array: if it is very large, you may want to use a StringBuilder instead of plain string concatenation.
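A minimal sketch of the StringBuilder variant suggested above, written with C# 9 top-level statements. `BuildUniqueList` is a hypothetical helper name. As an assumption of this sketch, it tracks seen tokens in a `HashSet<string>` (the class PIEBALDconsult mentions later in the thread) instead of searching the output string, which also avoids the partial-match problem Luc described.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

string BuildUniqueList(string[] items)
{
    var seen = new HashSet<string>();   // exact whole-token lookups, so "N" never matches inside "NN"
    var sb = new StringBuilder();       // appends in place instead of copying the string each time
    foreach (string item in items)
    {
        foreach (string raw in item.Split(','))
        {
            string token = raw.Trim();  // " A" -> "A"
            // Add returns false for a token already seen; skip empties and duplicates.
            if (token.Length == 0 || !seen.Add(token))
                continue;
            if (sb.Length > 0)
                sb.Append(", ");        // separator only between tokens, so nothing to strip at the end
            sb.Append(token);
        }
    }
    return sb.ToString();
}

string unique = BuildUniqueList(arr);
Console.WriteLine(unique);   // A, B, C, D, N, V, W
```

Tokens come out in first-seen order because they are appended as they are encountered; the set is only used as a filter.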

        "Your code will never work, Luc's always will.", Richard MacCutchan[^]


          csrss wrote (#4):

          Thank you! It was all about spaces. Trim() solved the issue.
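For reference, a sketch of how the original Extract might look with Trim() applied, written with C# 9 top-level statements and a local function. Beyond the Trim() fix, two changes are assumptions of this sketch rather than anything posted in the thread: values are matched with surrounding spaces (" X ") to avoid IndexOf partial matches, assuming symbols contain no spaces themselves, and the result is returned as an array instead of null.

```csharp
using System;

string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

string[] Extract(string[] list)
{
    string symbols = string.Empty;     // comma-separated result, as in the original
    string symbolsHere = " ";          // collected symbols, each followed by one space
    foreach (string text in list)
    {
        // Split also works on entries without a comma, so the original if/else is not needed.
        foreach (string raw in text.Split(','))
        {
            string val = raw.Trim();   // the actual fix: drop the space after each comma
            if (val.Length == 0)
                continue;
            // Search for " X " rather than "X" so that, e.g., "N" is not found inside "NN".
            if (symbolsHere.IndexOf(" " + val + " ") == -1)
            {
                symbolsHere += val + " ";
                symbols += val + ", ";
            }
        }
    }
    // Return the symbols as an array instead of the original null.
    return symbols.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
}

Console.WriteLine(string.Join(", ", Extract(arr)));   // A, B, C, D, N, V, W
```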


            csrss wrote (#5):

            Thanks, yes, that was it :)


              PIEBALDconsult wrote (#6):

              Also, if you are on .NET 3.5 or newer, take a look at System.Collections.Generic.HashSet<T>.
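A minimal sketch of HashSet<T> applied to the question's data, written with C# 9 top-level statements. It relies on `HashSet<T>.Add` returning false (and leaving the set unchanged) for a value that is already present. Because a HashSet does not promise any enumeration order, the sketch sorts the result before display; sorting ordinally is an assumption about the wanted output, which here happens to match the order in the question.

```csharp
using System;
using System.Collections.Generic;

string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

var unique = new HashSet<string>();
foreach (string item in arr)
    foreach (string raw in item.Split(','))
        unique.Add(raw.Trim());   // returns false (and changes nothing) if the value is already present

bool addedAgain = unique.Add("A");   // false: "A" is already in the set

// HashSet<T> does not promise any enumeration order, so sort before displaying.
string[] result = new string[unique.Count];
unique.CopyTo(result);
Array.Sort(result, StringComparer.Ordinal);

Console.WriteLine(string.Join(", ", result));   // A, B, C, D, N, V, W
Console.WriteLine(addedAgain);                  // False
```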


                DaveyM69 wrote (#7):

                Combining the excellent answers you already have into a one-liner using LINQ (if it is available in the .NET version you are using):

                string[] vals = text.Split(',').Select(t => t.Trim()).Distinct().ToArray();

                Edit: This only ensures uniqueness within each group separated by ','. The easiest way to get unique values across all groups may be to join all the string array elements into one string and perform this operation on the full string, e.g.:

                string[] vals = string.Join(",", arr) // join to make one string
                .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // split to array
                .Select(t => t.Trim()) // trim each entry
                .Distinct() // ensure unique
                .ToArray(); // back to array

                Not sure how efficient this is, but it works!

                Dave
                Binging is like googling, it just feels dirtier. Please take your VB.NET out of our nice case sensitive forum. Astonish us. Be exceptional. (Pete O'Hanlon)
                BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)

                modified on Saturday, January 22, 2011 12:27 PM
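A possible variant of the joined one-liner above, written with C# 9 top-level statements: instead of joining everything into one string first, this sketch swaps in `SelectMany`, which splits each element separately and flattens the pieces into a single sequence. The extra `Where` filter for empty tokens is an addition of this sketch, so that whitespace-only pieces are dropped after trimming.

```csharp
using System;
using System.Linq;

string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

string[] vals = arr
    .SelectMany(s => s.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)) // tokens from every element
    .Select(t => t.Trim())       // " A" -> "A"
    .Where(t => t.Length > 0)    // drop pieces that were only whitespace
    .Distinct()                  // remove duplicates across all elements
    .ToArray();

// Contains A, B, C, D, N, V, W; Distinct's documentation does not promise a particular order.
Console.WriteLine(string.Join(", ", vals));
```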


                  Mirko1980 wrote (#8):

                  Talking about efficiency, I read somewhere on MSDN that, for generating a list of unique elements, the most efficient way is to use a HashSet (as PIEBALDconsult already said) instead of the Distinct extension method. Something like this:

                  var query = string.Join(",", arr) // join to make one string
                      .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // split to array
                      .Select(t => t.Trim()); // trim each entry

                  var set = new HashSet<string>();

                  foreach (string item in query)
                      set.Add(item); // add items to the hash set; duplicates are ignored

                  string[] vals = set.ToArray(); // back to array (ToArray comes from System.Linq)
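A short sketch, written with C# 9 top-level statements, showing that the explicit foreach/Add loop can be replaced by the `HashSet<T>(IEnumerable<T>)` constructor, and checking that the result holds the same elements as the `Distinct()` version. It does not measure speed, so it neither confirms nor refutes the efficiency claim above; note that `Enumerable.Distinct` is itself built on a hash-based set in the .NET reference source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

IEnumerable<string> query = string.Join(",", arr)
    .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(t => t.Trim());

// The constructor adds every element and silently skips duplicates.
var set = new HashSet<string>(query);

// Same elements as the Distinct() version, ignoring order.
string[] viaDistinct = query.Distinct().ToArray();
Console.WriteLine(set.SetEquals(viaDistinct));   // True
Console.WriteLine(set.Count);                    // 7
```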
