no duplicates in array

11 Posts 7 Posters 0 Views 1 Watching
  • duta

    Hi there. I have a file with a lot of sentences, and I need to build a dictionary of the words in that file. So far I've separated the words and sorted them using the Split() and Sort() methods. My problem is making a list without duplicate words. How can I do that?

    static int n = 0;
    public static string[] NoDuplicate(string[] array)
    {
        int i;
        string[] res = (string[])array.Clone();
        for (i = 0; i < array.Length-1; i++)
            if (array[i + 1] != array[i])
                res[n++] = (string)array[i];
        return res;
    }

    1) How can I do this more neatly? 2) I don't like this method because the result is initialized using Clone(), so its length is too big. Many thanks.
    Giorgi Dalakishvili wrote (#2):

    Store the words in a dictionary. Before adding a new word, check whether the dictionary already contains it.
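
A minimal sketch of this suggestion, written in C# 2.0 style so it also compiles under VS2005. The class name `WordDictionarySketch` and the method `CollectUniqueWords` are hypothetical names chosen for illustration, and the `bool` value is only a placeholder because just the keys are of interest here.

```csharp
using System;
using System.Collections.Generic;

public static class WordDictionarySketch
{
    // Hypothetical helper: returns each distinct word once, in first-seen order.
    public static List<string> CollectUniqueWords(string[] words)
    {
        // The bool value is a dummy; the dictionary is used only for its keys.
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        List<string> unique = new List<string>();

        foreach (string word in words)
        {
            // ContainsKey is a hash lookup, so no scan of earlier words is needed.
            if (!seen.ContainsKey(word))
            {
                seen.Add(word, true);
                unique.Add(word);
            }
        }
        return unique;
    }
}
```

The separate `List<string>` is there only because the enumeration order of `Dictionary` keys is not guaranteed; if order does not matter, the keys alone would probably be enough.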


      realJSOP wrote (#3):

      First, I'd use a List instead of an array, because a List can grow dynamically and only needs to hold the items you actually add. To avoid duplicates, you could do this:

      List<string> dictionary = new List<string>();

      // get the next sentence to process (you have to write
      // the GetNextSentence function)
      string sentence = GetNextSentence();
      // split the sentence into words
      string[] words = sentence.ToLower().Split(' ');
      // for each word
      for (int i = 0; i < words.Length; i++)
      {
          // if it's not already in the dictionary (checked using the
          // List.Contains method)
          if (!dictionary.Contains(words[i]))
          {
              // add it to the dictionary
              dictionary.Add(words[i]);
          }
      }

      "Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
      -----
      "...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001


        Guffa wrote (#4):

        Instead of using a List and calling it a dictionary, use a real Dictionary. The Dictionary.ContainsKey method is a lot faster than the List.Contains method on large collections, because it does a hash lookup rather than a linear search.

        Despite everything, the person most likely to be fooling you next is yourself.

          DaveyM69 wrote (#5):

          Ignore what I posted before. Simon's HashSet is perfect, and it has ToArray if you need it. Learned something new today :-D


            J4amieC wrote (#6):

            Why would you need a key and a value just to store a word, though? I would've gone with John's suggestion myself.


              Simon P Stevens wrote (#7):

              The difference is the style of the storage object. A list is just that: an unsorted, sequential list of items. To do a Contains() operation on it, you have to iterate through the list and check items one by one until a match is found. A dictionary, on the other hand, is a form of hash table, so a ContainsKey() lookup only has to hash the key and check whether it already exists. In .NET 3.5 you could instead consider a HashSet<String>, which is specifically optimised for sets containing no duplicates.

              Simon

                Simon P Stevens wrote (#8):

                Take a look at the HashSet<String> class (requires .NET 3.5). It provides an optimised hash collection that doesn't allow duplicates (it just ignores attempts to add them), and you can call ToArray() when you are done with it if you really need a string array.
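
A short sketch of how this might look, assuming .NET 3.5 or later. The names `HashSetSketch` and `DistinctWords` are made up for illustration. Note that `ToArray()` here is the LINQ extension method from `System.Linq`, not a member of `HashSet<T>` itself, and the final sort is only added so that the result order is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetSketch
{
    // Hypothetical helper: lower-cases the text, splits it on spaces,
    // and returns the distinct words in ordinal sort order.
    public static string[] DistinctWords(string text)
    {
        HashSet<string> words = new HashSet<string>();
        string[] parts = text.ToLower().Split(
            new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in parts)
        {
            // Add returns false and leaves the set unchanged for a duplicate.
            words.Add(word);
        }

        // Enumerable.ToArray (System.Linq) copies the set into an array.
        string[] result = words.ToArray();
        // A HashSet has no guaranteed order, so sort for a stable result.
        Array.Sort(result, StringComparer.Ordinal);
        return result;
    }
}
```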

                Simon


                  DaveyM69 wrote (#9):

                  Nice find, Simon. I hadn't come across this one before... always good to learn something new :-D

                  Dave
                  BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
                  Visual Basic is not used by normal people so we're not covering it here. (Uncyclopedia)


                    J4amieC wrote (#10):

                    Thanks for the info.


                      duta wrote (#11):

                      HashSet is only in .NET Framework 3.5 :( and I'm using VS2005 :(( But the advice is great, thanks to all.
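
For a VS2005 / .NET 2.0 setup without HashSet, one possible sketch follows. It is not taken from any reply in this thread. It assumes the array has already been sorted (as described in the original question), so equal words sit next to each other. The class name `SortedDedupSketch` is hypothetical. A `List<string>` collects the results so the returned array is exactly as long as needed, and no static counter is involved.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDedupSketch
{
    // Assumes 'sorted' is already sorted, so duplicates are adjacent.
    public static string[] NoDuplicate(string[] sorted)
    {
        List<string> result = new List<string>();

        for (int i = 0; i < sorted.Length; i++)
        {
            // Keep a word only if it differs from the one just before it.
            // The first word is always kept.
            if (i == 0 || sorted[i] != sorted[i - 1])
            {
                result.Add(sorted[i]);
            }
        }

        // List<T>.ToArray is available in .NET 2.0.
        return result.ToArray();
    }
}
```

Comparing each word with the previous one, rather than the next one, means the last word is not skipped and the loop can run over the whole array.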
