Removing duplicates / extracting unique vals from String array
-
Alright, so I have an array like this:
String[] Arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };
I am trying to extract the unique values (remove duplicates) from this array, so that in the end the output is: A, B, C, D, N, V, W
Now here is my function:
public static String[] Extract(String[] List)
{
    String[] Output = null;
    String Symbols = String.Empty;
    String SymbolsHere = String.Empty;
    foreach (String Text in List)
    {
        if (Text.Contains(","))
        {
            String[] Vals = Text.Split(new char[] { ',' });
            foreach (String Val in Vals)
            {
                if (SymbolsHere.IndexOf(Val) == -1)
                {
                    // MessageBox.Show(SymbolsHere, Val);
                    SymbolsHere += Val + " ";
                    Symbols += Val + ", ";
                }
            }
        }
        else
        {
            if (SymbolsHere.IndexOf(Text) == -1)
            {
                Symbols += Text + ", ";
                SymbolsHere += Text + " ";
            }
        }
    }
    // MessageBox.Show(SymbolsHere);
    MessageBox.Show(Symbols);
    return Output;
}
Well, this almost works, but the last message box shows "A, B, C, A, D, N, V, W" instead of "A, B, C, D, N, V, W". As you can see, one extra "A" gets into the output and I have no idea why. Take a look at this part:
if (SymbolsHere.IndexOf(Val) == -1)
{
    MessageBox.Show(SymbolsHere, Val);
    SymbolsHere += Val + " ";
    Symbols += Val + ", ";
}
So, if the symbol has not been found in the container of already collected symbols, show the container itself and the new symbol. But at this moment the app pops up a message box with the collection "A B C" and a new symbol that supposedly is not in our collection: A. It's absurd, A is already there. What is wrong with this code? Thanks
011011010110000101100011011010000110100101101110 0110010101110011
-
Not sure what your specs are. If by symbol you mean a single character, then you are doing it wrong: Split() will return lots of 2-character strings, since your input has a space after each comma. You may want to add a Trim() somewhere. If by symbol you mean arbitrary multi-character strings ("words"), then IndexOf() is not the right method, as it also reports partial matches (e.g. "Y+" is found in "XY+="). :)
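To make both points concrete, here is a minimal sketch of how the function could look with a Trim() and an exact-match check. The class name UniqueSymbols and the use of a List<string> as the container are my own assumptions for illustration, not something from the original post.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueSymbols
{
    // Hypothetical rewrite of Extract: trims each piece and compares whole values.
    public static string[] Extract(string[] list)
    {
        var found = new List<string>();
        foreach (string text in list)
        {
            foreach (string piece in text.Split(','))
            {
                string val = piece.Trim(); // " A" becomes "A"
                // List<string>.Contains compares whole strings, so no partial matches.
                if (val.Length > 0 && !found.Contains(val))
                    found.Add(val);
            }
        }
        return found.ToArray();
    }

    public static void Main()
    {
        string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };

        // Why the original check fails: the piece is " A" (leading space),
        // and that exact substring does not occur in "A  B  C ".
        Console.WriteLine("A  B  C ".IndexOf(" A", StringComparison.Ordinal)); // -1

        Console.WriteLine(string.Join(", ", Extract(arr))); // A, B, C, D, N, V, W
    }
}
```

The linear Contains() scan is fine for a handful of symbols; for large inputs a set-based lookup would likely be a better fit.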
Luc Pattyn, Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
-
The first time "A" is added to the output string, it is the first character of the array element. The second time, it is not "A" but " A" (with a leading space) that gets added. You should use Trim() to get rid of those additional characters. Also, you can make use of LINQ to make your code more compact. Here is something quick I came up with (not the most efficient way though):
String[] Arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };
string uniqueCharacters = string.Empty;
foreach (string str in Arr)
{
    // Trim before comparing, so that " A" and "A" count as the same value.
    List<string> values = str.Split(',').Select(v => v.Trim()).Distinct().ToList();
    // Note: string.Contains is a substring test, so this only suits single-character values.
    values.ForEach(x => uniqueCharacters = !uniqueCharacters.Contains(x) ? uniqueCharacters + "," + x : uniqueCharacters);
}
uniqueCharacters = uniqueCharacters.Remove(0, 1); // drop the leading comma
Make sure to consider the size of the array. If it is very large, you may want to use a StringBuilder instead of plain concatenation.
"Your code will never work, Luc's always will.", Richard MacCutchan
-
Also (if you are on .NET 3.5 or newer), take a look at System.Collections.Generic.HashSet<T>
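A rough sketch of what that could look like, assuming you also want to keep the values in the order they first appear. The class and method names (HashSetExample, UniqueInOrder) are made up for this example; the only library behaviour it relies on is that HashSet<T>.Add returns false when the item is already in the set.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetExample
{
    // Hypothetical helper: collects unique, trimmed values in first-seen order.
    public static List<string> UniqueInOrder(IEnumerable<string> entries)
    {
        var seen = new HashSet<string>();
        var ordered = new List<string>();
        foreach (string entry in entries)
        {
            foreach (string piece in entry.Split(','))
            {
                string val = piece.Trim();
                // Add returns false for duplicates, so each value is appended only once.
                if (val.Length > 0 && seen.Add(val))
                    ordered.Add(val);
            }
        }
        return ordered;
    }

    public static void Main()
    {
        string[] arr = { "A, B, C", "B, A, C", "C, A", "B, C, A", "D, A, N, V", "N, W, B, A" };
        Console.WriteLine(string.Join(", ", UniqueInOrder(arr).ToArray())); // A, B, C, D, N, V, W
    }
}
```

The separate list is there because HashSet<T> itself makes no promise about enumeration order.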
-
Combining the excellent answers you already have into a one-liner using LINQ (if available in the .NET version you are using):
string[] vals = text.Split(',').Select(t => t.Trim()).Distinct().ToArray();
Edit: This only ensures uniqueness within each group separated by ','. The easiest way to get unique values across all groups may be to combine all the string array elements into one string and perform this operation on the full string, e.g.
string[] vals = string.Join(",", arr) // join to make one string
    .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // split to array
    .Select(t => t.Trim()) // trim each entry
    .Distinct() // ensure unique
    .ToArray(); // back to array
Not sure how efficient this is, but it works!
Dave
Binging is like googling, it just feels dirtier. Please take your VB.NET out of our nice case sensitive forum. Astonish us. Be exceptional. (Pete O'Hanlon)
BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
modified on Saturday, January 22, 2011 12:27 PM
-
Talking about efficiency, I read somewhere on MSDN that, for generating a list of unique elements, it is more efficient to use a HashSet (as PIEBALDconsult already said) than the Distinct extension method. Something like this:
var query = string.Join(",", arr) // join to make one string
    .Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // split to array
    .Select(t => t.Trim()); // trim each entry
var set = new HashSet<string>();
foreach (string item in query)
    set.Add(item); // add items to the hash set; this ensures uniqueness
string[] vals = set.ToArray(); // back to array