Code Project

count character frequency

    Cool Smith
    wrote on last edited by
    #1

    I need to examine a text file containing Arabic text and count the frequency of each character. How do I do this in VB.NET? Do I need to detect the file's code page before reading it? Do I need to convert the file's code page before doing anything else? I am using VB.NET 2008.

      Tieske8
      wrote on last edited by
      #2

      My 2 cents: I don't think there is any reliable way to detect the code page; you ought to know it before opening the file. Plain text files carry no metadata describing their content. So if you know the encoding, you can load the full file with System.IO.File.ReadAllText(path As String, encoding As Encoding) As String. From there you have a properly decoded string and can start counting characters.
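A minimal sketch of the "read it with a known encoding" step, written in Python for brevity rather than VB.NET. The helper name `read_text_file` and the `cp1256` code page in the usage note are assumptions for illustration; in .NET the same role is played by `File.ReadAllText` with an `Encoding` argument.

```python
from pathlib import Path


def read_text_file(path, encoding):
    """Decode the whole file with an encoding supplied by the caller."""
    # Decoding is strict by default: bytes that are invalid in the chosen
    # encoding raise UnicodeDecodeError instead of silently corrupting text.
    return Path(path).read_text(encoding=encoding)
```

Calling `read_text_file("arabic.txt", "cp1256")` would return the decoded text, provided the file really was written in Windows-1256.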

      If you want something done fast, then do it right (Grissom, CSI) Thanks for your reply, you just acknowledged my existence

        Cool Smith
        wrote on last edited by
        #3

        What I want to do is count the frequency of the characters that appear in an Arabic text.

          Tieske8
          wrote on last edited by
          #4

          Have you tried it? Or can you be more specific? A string in .NET is not just a list of bytes; it is a sequence of UTF-16 Char values. In the file, every character takes one or more bytes depending on the encoding used. The method I mentioned decodes those bytes into a proper string. All you have to do is traverse the string and count the characters.

            Dave Kreskowiak
            wrote on last edited by
            #5

            All you have to do is read the file into a String, then iterate over the characters in the string and add them to a Dictionary collection. Since Dictionary is a key/value pair collection, the "key" will be the character you are looking at and the "value" will be the count for that character. When you go to add a character to the collection, first check whether it is already there; if so, get its value and increment it by one. If not, add the new key with a value of 1. Move on to the next character...
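The dictionary approach described here can be sketched as follows. This is an illustration in Python, not the VB.NET the thread asks for; `count_characters` and the sample string are made up for the example. In VB.NET the dictionary would be a `Dictionary(Of Char, Integer)`.

```python
def count_characters(text):
    """Return a dict mapping each character to its number of occurrences."""
    counts = {}
    for ch in text:
        if ch in counts:
            counts[ch] += 1   # already a key: increment its count
        else:
            counts[ch] = 1    # first time we meet this character
    return counts


sample = "\u0628\u0627\u0628"  # beh, alef, beh
for ch, n in sorted(count_characters(sample).items()):
    print(f"U+{ord(ch):04X} occurs {n} times")
```

Python iterates over code points while .NET iterates over UTF-16 `Char` values. For Arabic letters, which all sit in the Basic Multilingual Plane, the two give the same counts.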

            A guide to posting questions on CodeProject[^]
            Dave Kreskowiak

              Lost User
              wrote on last edited by
              #6

              Cool Smith wrote:

              i need to examine a text file containing Arabic writings

              What format is it in? Ideally, it'd be a Unicode encoding (UTF-8 or UTF-16). It's important, since the encoding determines how many bytes a single character occupies. Download a hex editor and open the text file with it; what do the first bytes look like in hex? A byte-order mark there would identify a Unicode encoding.
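Looking at the first bytes can also be done in code. The sketch below (Python, with a hypothetical helper `sniff_bom`) only recognises the standard Unicode byte-order marks. It says nothing about legacy code pages, which carry no such signature.

```python
import codecs

# Longest signatures first: the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(first_bytes):
    """Return a Python codec name if the bytes start with a BOM, else None."""
    for bom, codec_name in _BOMS:
        if first_bytes.startswith(bom):
            return codec_name
    return None
```

Reading the first four bytes of the file in binary mode and passing them to `sniff_bom` is enough. A result of `None` means the encoding still has to be known or guessed some other way.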

              Cool Smith wrote:

              Do i need to detect the file code-page before reading?

              There's no way of detecting it with good precision, but Notepad can take an educated guess. If you have any say in it, then it should be UTF. If you don't, ask which code page was used to write the files. Windows Arabic (1256) and DOS Arabic (864) map bytes to characters differently, so a wrong guess gives wrong text.
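To see why the code page matters for counting, here is a small Python sketch that decodes the same bytes twice. The sample word is an assumption chosen for illustration; `cp1256` and `cp864` are Python's codec names for the two code pages mentioned.

```python
original = "\u0643\u062A\u0627\u0628"     # the word "kitab" (book)
raw = original.encode("cp1256")            # bytes as a Windows-1256 file stores them

as_1256 = raw.decode("cp1256")                    # correct code page
as_864 = raw.decode("cp864", errors="replace")    # wrong code page, on purpose

print(as_1256 == original)   # does the right code page round-trip?
print(as_864 == original)    # does the wrong one give the same text?
```

Any frequency table built from the wrongly decoded string would count characters that never appeared in the original text.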

              Cool Smith wrote:

              by counting each character frequency of each character, how do i do this in vb.net?

              First, determine the encoding, and read the entire file as a string with that encoding. Then create a dictionary. Loop through the string by eating characters, adding each one to the dictionary as a key, or adding 1 to its value if it's already in the dictionary. When done eating, burp out the results :)

              I are Troll :suss:

                Cool Smith
                wrote on last edited by
                #7

                Lost User wrote:

                What format is it in? Ideally, it'd be UTF. It's important since the encoding determines the length of a single character. Download a HEX-editor and open the textfile with it - what do the first bytes look like in HEX?

                The thing is, the software will be examining many different text files (*.txt), all containing Arabic writing. I found code here that can detect the code page of a file, and other code that can convert between code pages.

                Lost User wrote:

                First, determine the encoding, and read the file with that encoding. Then create a dictionary, read the entire file as a string. Loop through the string by eating characters, adding them to the dictionary as the key, or adding +1 to its value if it's already in the dictionary. When done eating, burp out the results

                Can you give me pseudocode for this? I don't have any idea how to do it.

                  Lost User
                  wrote on last edited by
                  #8

                  Cool Smith wrote:

                  i found code here that can detect the code page of a file

                  Can you post a link to that article? I haven't read it yet :)

                  Cool Smith wrote:

                  can you give me pseudo code for this

                  It'd go something like this;

                  // needs: using System; using System.Collections.Generic; using System.IO;
                  // (place the statements below inside a method of your form)

                  // A dictionary, used to count the frequencies: character -> count
                  Dictionary<char, int> characterCounter = new Dictionary<char, int>();

                  // we'll read the entire file into a string
                  // (pass an Encoding as the second argument if the file is not UTF-8)
                  string theFile = File.ReadAllText(@"C:\test.txt");

                  // we'll keep removing characters and process them, until the string is empty
                  while (theFile.Length > 0)
                  {
                      // get the char at the end of the string
                      char currentCharacter = theFile[theFile.Length - 1];

                      // remove that char from the string that holds the file
                      theFile = theFile.Remove(theFile.Length - 1, 1);

                      // if the dictionary contains our character
                      if (characterCounter.ContainsKey(currentCharacter))
                      {
                          // increase the value of the int
                          characterCounter[currentCharacter] = characterCounter[currentCharacter] + 1;
                      }
                      else
                      {
                          // it wasn't in the dictionary yet, so it must be the
                          // first time that we encounter this character. Add it;
                          characterCounter.Add(currentCharacter, 1);
                      }
                  }

                  // done with counting, now show the results to the user
                  // (textBox1 is assumed to be a multiline TextBox on the form)
                  foreach (KeyValuePair<char, int> entry in characterCounter)
                  {
                      textBox1.Text += String.Format("char {0} occurs {1} times", entry.Key, entry.Value) + Environment.NewLine;
                  }

                  This will be slow with large files: strings are immutable, so every Remove call allocates a new copy of the string. It'd be more efficient to leave the string alone and move an index through it. That'd go something more like this;

                  string theFile = File.ReadAllText(@"C:\test.txt");

                  // this will point to the index of the character that we're processing
                  // (string indexes are int, not Int64)
                  int currentPos = 0;

                  // while the current position is still inside the string;
                  // comparing against Length also handles an empty file and the last character
                  while (currentPos < theFile.Length)
                  {
                      // fetch the current character from the string, at the current index
                      char currentCharacter = theFile[currentPos];

                      // increase the index
                      currentPos = currentPos + 1;

                      // rest of dictionary-code here;
                      ...

                  }
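For files too large to hold comfortably in one string, the same count can be built chunk by chunk. This is a sketch in Python rather than C# or VB.NET; `count_characters_streaming` and the default `chunk_size` are assumptions for illustration, not something from the posts above.

```python
from collections import Counter


def count_characters_streaming(path, encoding, chunk_size=64 * 1024):
    """Count characters without holding the whole file in memory."""
    counts = Counter()
    # Text mode decodes incrementally, so a multi-byte character that
    # straddles a chunk boundary is still decoded correctly.
    # newline="" keeps line endings untranslated, so they are counted as stored.
    with open(path, "r", encoding=encoding, newline="") as handle:
        while True:
            chunk = handle.read(chunk_size)   # up to chunk_size characters
            if not chunk:
                break
            counts.update(chunk)              # adds 1 per character in the chunk
    return counts
```

The result behaves like the dictionary in the pseudocode: keys are characters, values are counts.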

                    Cool Smith
                    wrote on last edited by
                    #9

                    Here are the links:

                    CodePage File Converter
                    Detect Encoding for In- and Outgoing Text

                    I'll try your implementation and get back to you. Besides, I found some string-to-hex code. Will it work well for recognizing the single characters in a joined character?

                    Private Function ConvertStringToHex(ByVal MyString As String) As String
                        Dim Result As String = vbNullString
                        If Len(MyString) = 0 Then
                            Result = vbNullString
                        Else
                            For i As Integer = 0 To Len(MyString.Trim) - 1
                                Dim MyChar As String = Mid(MyString.Trim, i + 1, 1)
                                Result = Result + Xformat(Hex(Microsoft.VisualBasic.AscW(MyChar)))
                            Next
                        End If
                        Return Result
                    End Function

                    Private Function ConvertHexToString(ByVal MyString As String) As String
                        Dim Result As String = vbNullString
                        If Len(MyString) = 0 Then
                            Result = vbNullString
                        Else
                            For i As Integer = 0 To Len(MyString.Trim) - 1 Step 4
                                Dim MyChar As String = Mid(MyString.Trim, i + 1, 4)
                                Result = Result + Microsoft.VisualBasic.ChrW(Convert.ToInt32(MyChar, 16))
                            Next
                        End If
                        Return Result
                    End Function

                    Function Xformat(ByVal xin As String) As String
                        Dim retval As String = xin
                        Select Case Len(xin)
                            Case Is = 3
                                retval = "0" & xin
                            Case Is = 2
                                retval = "00" & xin
                            Case Is = 1
                                retval = "000" & xin
                        End Select
                        Return retval
                    End Function
                    End Class
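On the "joined character" question: in Unicode text, Arabic letters are normally stored one code point per letter, and the joining happens when the text is drawn. A per-character hex dump therefore already shows the individual letters. The Python sketch below mirrors the four-hex-digits-per-character idea of the VB routines; `to_hex` and `from_hex` are hypothetical names, and the sample word is an assumption.

```python
import unicodedata


def to_hex(text):
    """Four hex digits per character (BMP characters only)."""
    return "".join(f"{ord(ch):04X}" for ch in text)


def from_hex(hex_text):
    """Inverse of to_hex: every 4 hex digits become one character."""
    return "".join(chr(int(hex_text[i:i + 4], 16))
                   for i in range(0, len(hex_text), 4))


word = "\u0643\u062A\u0627\u0628"   # four letters, drawn joined on screen
print(to_hex(word))                  # one 4-digit group per stored letter

# Some sources contain positional "presentation forms" instead of the base
# letters; NFKC normalization folds them back onto the base letter.
isolated_alef = "\uFE8D"
print(to_hex(unicodedata.normalize("NFKC", isolated_alef)))
```

If presentation forms do turn up in the data, normalizing before counting keeps all forms of a letter under one key. Whether that is wanted depends on what the frequency table is for.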

                      Cool Smith
                      wrote on last edited by
                      #10

                      First, I am using VB.NET, not C#. I tried converting it to VB.NET using http://www.developerfusion.com/tools/convert/csharp-to-vb/ and I get many errors. Can you provide a VB.NET version?

                        Lost User
                        wrote on last edited by
                        #11

                        Cool Smith wrote:

                        first am using vb.net not c#, i tried convertin to vb.net

                        You asked for pseudocode, and that's what it is.

                        Cool Smith wrote:

                        Can you provide vb.net version?

                        No, since it's not my job. You could post your code however, and people could have a look. That is, if you explain where you're stuck.
