no duplicates in array
-
Hi there. I have a file with a lot of sentences, and I need to make a dictionary with the words from that file. So far I've separated the words and sorted them using the Split() and Sort() methods. My problem is making a list without duplicate words. How can I do that?
    static int n = 0;
    public static string[] NoDuplicate(string[] array)
    {
        int i;
        string[] res = (string[])array.Clone();
        for (i = 0; i < array.Length-1; i++)
            if (array[i + 1] != array[i])
                res[n++] = (string)array[i];
        return res;
    }
1) How can I do this more neatly? 2) I don't like this method because the result is initialized using Clone(), so its length is too big. Many thanks.
Store the words in a dictionary. Before adding a new word, check whether the dictionary already contains it.
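For illustration, a minimal sketch of that check-before-add idea. The class and method names (WordDictionary, CountWords) are made up for this example, and storing an occurrence count as the value is just one assumed choice; any value type would work if you only care about the keys.

```csharp
using System;
using System.Collections.Generic;

public static class WordDictionary
{
    // Hypothetical helper: maps each distinct word to the number of times it occurs.
    public static Dictionary<string, int> CountWords(string[] words)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            if (counts.ContainsKey(word))
            {
                // Already seen: do not add a second entry, just bump the count.
                counts[word]++;
            }
            else
            {
                // First occurrence: add the word as a new key.
                counts.Add(word, 1);
            }
        }
        return counts;
    }
}
```

The Keys collection of the returned dictionary is then the list of words without duplicates.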
Giorgi Dalakishvili
-
First, I'd use a List instead of an array, because a List can grow dynamically and will only be as large as is required to store your data. To avoid duplicates, you could do this:
    List<string> dictionary = new List<string>();
    // get the next sentence to process (you have to write
    // the GetNextSentence function)
    string sentence = GetNextSentence();
    // split the sentence into words
    string[] words = sentence.ToLower().Split(' ');
    // for each word
    for (int i = 0; i < words.Length; i++)
    {
        // if it's not already in the dictionary (checked using the
        // List.Contains method)
        if (!dictionary.Contains(words[i]))
        {
            // add it to the dictionary
            dictionary.Add(words[i]);
        }
    }
-
Instead of using a List and calling it a dictionary, use a real Dictionary. The Dictionary.ContainsKey method is a lot faster than the List.Contains method.
-
Why would you need a key and a value just to store a word, though? I would've gone with John's suggestion myself.
-
The difference is the style of the storage object. A list is just that: an unsorted, sequential list of items. To do a Contains() operation on it, you have to iterate through the list and check every item. A dictionary, on the other hand, is a form of hash table, so a lookup only has to hash the key and check whether it already exists. In .NET 3.5 you could instead consider a HashSet<String>, which is specifically optimised for sets containing no duplicates.
Simon
-
Take a look at the HashSet<String> class (.NET 3.5 or later). It provides an optimised hash-based collection that doesn't allow duplicates (it just ignores attempts to add them), and you can call ToArray() when you are done with it if you really need a string array.
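A rough sketch of how that could look, assuming the words come from a single sentence string. UniqueWords and FromSentence are made-up names, and the final sort is only there because a HashSet does not promise any particular order. Note that ToArray() here is the LINQ extension method, so it needs using System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueWords
{
    // Hypothetical helper: returns the distinct lower-cased words of a sentence, sorted.
    public static string[] FromSentence(string sentence)
    {
        HashSet<string> unique = new HashSet<string>();
        string[] words = sentence.ToLower().Split(
            new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Add returns false and leaves the set unchanged if the word is already present.
            unique.Add(word);
        }

        // Enumerable.ToArray (System.Linq) copies the set into a plain string array.
        string[] result = unique.ToArray();
        Array.Sort(result, StringComparer.Ordinal);
        return result;
    }
}
```

The resulting array is exactly as long as the number of distinct words, so there is no oversized Clone() to worry about.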
Simon
-
Nice find, Simon. I hadn't come across this one before... always good to learn something new :-D
Dave
-
Ignore what I posted before. Simon's HashSet is perfect, and it has ToArray() if you need it. Learned something new today :-D