count character frequency

  Cool Smith wrote (#1):

    I need to examine a text file containing Arabic writing and count how often each character occurs. How do I do this in VB.NET? Do I need to detect the file's code page before reading it? Do I need to convert the file to another code page before doing anything else? I am using VB.NET 2008.

    Tieske8 wrote (#2):

    For my 2 cents: I don't think there is any reliable way to detect the code page; you ought to know it before opening the file. Apart from an optional Unicode byte order mark, text files carry no metadata that tells you anything about their content. So if you know the encoding, you can load the full file with System.IO.File.ReadAllText(path As String, encoding As Encoding) As String. From there you have a properly decoded string and you can start counting characters.

    If you want something done fast, then do it right (Grissom, CSI) Thanks for your reply, you just acknowledged my existence
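A minimal sketch of the ReadAllText call mentioned above, assuming the code page is already known. The path "C:\test.txt" and code page 1256 (Windows Arabic) are placeholders for illustration, not values confirmed anywhere in this thread, and the function name ReadArabicText is made up for the example.

```vbnet
Imports System.IO
Imports System.Text

Module ReadWithEncodingSketch
    ' Reads a whole text file using a caller-supplied code page.
    ' The code page has to be known in advance; nothing here detects it.
    Function ReadArabicText(ByVal path As String, ByVal codePage As Integer) As String
        Dim enc As Encoding = Encoding.GetEncoding(codePage)
        Return File.ReadAllText(path, enc)
    End Function

    Sub Main()
        ' Hypothetical path and code page, for illustration only.
        Dim text As String = ReadArabicText("C:\test.txt", 1256)
        Console.WriteLine("Characters read: " & text.Length)
    End Sub
End Module
```

On the .NET Framework that ships with VB 2008, Encoding.GetEncoding(1256) works out of the box. For a UTF-8 file you would pass Encoding.UTF8 to File.ReadAllText instead.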


      Cool Smith wrote (#3):

      What I want to do is count the frequency of the characters that appear in an Arabic text.


        Tieske8 wrote (#4):

        Have you tried it? Or can you be more specific? A string in .NET is not just a list of bytes. In the file, every character takes one or more bytes, depending on the encoding used. The method I mentioned decodes the file into a proper string. All you have to do is traverse the string and count the characters.


          Dave Kreskowiak wrote (#5):

          All you have to do is read the file into a String, then iterate over the characters in the string and add them to a Dictionary collection. Since Dictionary is a key/value pair collection, the "key" will be the character you are looking at. The "value" will be the count of those characters. When you go to add the character to the collection, you first see if it is already there, and if so, get its value and increment it by one. If not, add the new key with a value of 1. Move on to the next character...

          A guide to posting questions on CodeProject
          Dave Kreskowiak
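The Dictionary approach described above could look roughly like this in VB.NET. This is only a sketch: the module and function names are invented for the example, and it counts Char values (UTF-16 code units), which is an assumption about what "character" should mean here.

```vbnet
Imports System.Collections.Generic

Module CharacterFrequencySketch
    ' Counts how often each Char occurs in the given text.
    Function CountCharacters(ByVal text As String) As Dictionary(Of Char, Integer)
        Dim counts As New Dictionary(Of Char, Integer)()
        For Each c As Char In text
            Dim current As Integer
            If counts.TryGetValue(c, current) Then
                counts(c) = current + 1   ' already in the dictionary: increment
            Else
                counts.Add(c, 1)          ' first time we see this character
            End If
        Next
        Return counts
    End Function

    Sub Main()
        ' Sample text only; in practice this would be the decoded file content.
        Dim counts As Dictionary(Of Char, Integer) = CountCharacters("سلام سلام")
        For Each entry As KeyValuePair(Of Char, Integer) In counts
            Console.WriteLine("U+{0:X4} occurs {1} times", AscW(entry.Key), entry.Value)
        Next
    End Sub
End Module
```

Printing the code point (U+0633 and so on) rather than the character itself sidesteps console fonts that cannot display Arabic.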


            Lost User wrote (#6):

            Cool Smith wrote:

            i need to examine a text file containing Arabic writings

            What format is it in? Ideally, it'd be a Unicode encoding (UTF-8 or UTF-16). It matters because the encoding determines how many bytes a single character takes. Download a hex editor and open the text file with it: what do the first bytes look like in hex?

            Cool Smith wrote:

            Do i need to detect the file code-page before reading?

            There's no way of detecting it with good precision, but Notepad can take an educated guess. If you have any say in it, then it should be UTF. If you don't, ask which code page was used to write the files. There'll be a difference between Windows Arabic 1256 and DOS Arabic 864.
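One thing that can be checked mechanically is whether the first bytes are a Unicode byte order mark (BOM). The sketch below looks only for the standard UTF-8, UTF-16 and UTF-32 (little endian) marks. The function name DetectBom and the path are invented for the example, and a missing BOM proves nothing: the file may still be UTF-8 without a mark, Windows-1256, DOS 864 or something else.

```vbnet
Imports System.IO
Imports System.Text

Module BomSniffSketch
    ' Returns the encoding indicated by a byte order mark,
    ' or Nothing when the data does not start with one.
    Function DetectBom(ByVal firstBytes As Byte()) As Encoding
        ' UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        If firstBytes.Length >= 4 AndAlso firstBytes(0) = &HFF AndAlso firstBytes(1) = &HFE _
           AndAlso firstBytes(2) = 0 AndAlso firstBytes(3) = 0 Then
            Return Encoding.UTF32
        End If
        If firstBytes.Length >= 3 AndAlso firstBytes(0) = &HEF _
           AndAlso firstBytes(1) = &HBB AndAlso firstBytes(2) = &HBF Then
            Return Encoding.UTF8
        End If
        If firstBytes.Length >= 2 AndAlso firstBytes(0) = &HFF AndAlso firstBytes(1) = &HFE Then
            Return Encoding.Unicode            ' UTF-16, little endian
        End If
        If firstBytes.Length >= 2 AndAlso firstBytes(0) = &HFE AndAlso firstBytes(1) = &HFF Then
            Return Encoding.BigEndianUnicode   ' UTF-16, big endian
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical path, for illustration only.
        Dim bytes As Byte() = File.ReadAllBytes("C:\test.txt")
        Dim enc As Encoding = DetectBom(bytes)
        If enc Is Nothing Then
            Console.WriteLine("No BOM found; the code page has to be known some other way.")
        Else
            Console.WriteLine("BOM indicates " & enc.EncodingName)
        End If
    End Sub
End Module
```

For files without a BOM, the advice above still stands: ask whoever produced the files which code page they used.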

            Cool Smith wrote:

            by counting each character frequency of each character, how do i do this in vb.net?

            First, determine the encoding and read the entire file into a string with that encoding. Then create a dictionary. Loop through the string by eating characters, adding each one to the dictionary as a key, or adding 1 to its value if it's already in the dictionary. When done eating, burp out the results :)

            I are Troll :suss:


              Cool Smith wrote (#7):

              Lost User wrote:

              What format is it in? Ideally, it'd be UTF. It's important since the encoding determines the length of a single character. Download a HEX-editor and open the textfile with it - what do the first bytes look like in HEX?

              The thing is, the software will be examining various text files (*.txt) that contain only Arabic writing. I found code here that can detect the code page of a file, and another that can convert between different code pages.

              Lost User wrote:

              First, determine the encoding, and read the file with that encoding. Then create a dictionary, read the entire file as a string. Loop through the string by eating characters, adding them to the dictionary as the key, or adding +1 to it's value if it's already in the dictionary. When done eating, burp out the results

              Can you give me pseudocode for this? I don't have any idea how to do it.


                Lost User wrote (#8):

                Cool Smith wrote:

                i found code here that can detect the code page of a file

                Can you post a link to that article? I haven't read it yet :)

                Cool Smith wrote:

                can you give me pseudo code for this

                It'd go something like this;

                // needs: using System.Collections.Generic; using System.IO;

                // A dictionary, used to count the frequencies
                Dictionary<char, int> characterCounter = new Dictionary<char, int>();

                // we'll read the entire file into a string;
                // (the @ keeps "\t" from being read as a tab character)
                string theFile = File.ReadAllText(@"C:\test.txt");

                // we'll keep removing characters and process them, until the string is empty
                while (theFile.Length > 0)
                {
                    // get the char at the end of the string
                    char currentCharacter = theFile[theFile.Length - 1];

                    // remove that thing from the string that holds the file
                    theFile = theFile.Remove(theFile.Length - 1, 1);

                    // if the dictionary contains our character
                    if (characterCounter.ContainsKey(currentCharacter))
                    {
                        // increase the value of the int
                        characterCounter[currentCharacter] = characterCounter[currentCharacter] + 1;
                    }
                    else
                    {
                        // it wasn't in the dictionary yet, so it must be the
                        // first time that we encounter this character. Add it;
                        characterCounter.Add(currentCharacter, 1);
                    }
                }

                // done with counting, now show the results to the user
                foreach (KeyValuePair<char, int> entry in characterCounter)
                {
                    textBox1.Text += String.Format("char {0} occurs {1} times\r\n", entry.Key, entry.Value);
                }

                This could be a bit slow with large files, as every Remove call forces .NET to allocate a new string. It'd be more efficient to leave the string untouched and move an index through it. That'd go something more like this;

                string theFile = File.ReadAllText(@"C:\test.txt");

                // this will point to the index of the character that we're processing
                int currentPos = 0;

                // while the current position is still inside the string;
                while (currentPos < theFile.Length)
                {
                    // fetch the current character from the string, at the current index
                    char currentCharacter = theFile[currentPos];

                    // increase the index
                    currentPos = currentPos + 1;

                    // rest of dictionary-code here;
                    ...
                }
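A different way to avoid the allocations, not the one shown above, is to skip the big string entirely and pull characters from a reader one at a time. The sketch below is in VB.NET since that is what the original poster uses. The names, the path and the UTF-8 encoding are assumptions for illustration only.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module StreamingCountSketch
    ' Counts characters from any TextReader, one Char at a time,
    ' so the whole file never has to be held as a single string.
    Function CountFromReader(ByVal reader As TextReader) As Dictionary(Of Char, Integer)
        Dim counts As New Dictionary(Of Char, Integer)()
        Dim code As Integer = reader.Read()      ' Read() returns -1 at end of input
        While code <> -1
            Dim c As Char = ChrW(code)
            Dim current As Integer
            If counts.TryGetValue(c, current) Then
                counts(c) = current + 1
            Else
                counts.Add(c, 1)
            End If
            code = reader.Read()
        End While
        Return counts
    End Function

    Sub Main()
        ' Hypothetical path and encoding; substitute the real ones.
        Using reader As New StreamReader("C:\test.txt", Encoding.UTF8)
            Dim counts As Dictionary(Of Char, Integer) = CountFromReader(reader)
            Console.WriteLine("Distinct characters: " & counts.Count)
        End Using
    End Sub
End Module
```

Because the function takes a TextReader, it can be tried on a StringReader with a short sample before pointing it at real files.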


                  Cool Smith wrote (#9):

                  Here are the links: CodePage File Converter, and Detect Encoding for In- and Outgoing Text. I'll try your implementation and get back to you. Besides, I found some hex-to-string code. Will it work well for recognizing the single characters in a joined character?

                  Private Function ConvertStringToHex(ByVal MyString As String) As String
                      Dim Result As String = vbNullString
                      If Len(MyString) = 0 Then
                          Result = vbNullString
                      Else
                          For i As Integer = 0 To Len(MyString.Trim) - 1
                              Dim MyChar As String = Mid(MyString.Trim, i + 1, 1)
                              ' each character becomes four hex digits
                              Result = Result + Xformat(Hex(Microsoft.VisualBasic.AscW(MyChar)))
                          Next
                      End If
                      Return Result
                  End Function

                  Private Function ConvertHexToString(ByVal MyString As String) As String
                      Dim Result As String = vbNullString
                      If Len(MyString) = 0 Then
                          Result = vbNullString
                      Else
                          ' read the hex text back in groups of four digits
                          For i As Integer = 0 To Len(MyString.Trim) - 1 Step 4
                              Dim MyChar As String = Mid(MyString.Trim, i + 1, 4)
                              Result = Result + Microsoft.VisualBasic.ChrW(Convert.ToInt32(MyChar, 16))
                          Next
                      End If
                      Return Result
                  End Function

                  ' left-pads a hex value with zeros to four digits
                  Function Xformat(ByVal xin As String) As String
                      Dim retval As String = xin
                      Select Case Len(xin)
                          Case Is = 3
                              retval = "0" & xin
                          Case Is = 2
                              retval = "00" & xin
                          Case Is = 1
                              retval = "000" & xin
                      End Select
                      Return retval
                  End Function
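On the "joined character" question, a hedged note. For text stored with the ordinary Arabic letters, each letter is its own Char in a .NET String even when it is drawn joined to its neighbours, so no hex round trip is needed to separate them. Vowel marks are also separate Chars that follow their letter. If a letter and its marks should count as one unit, System.Globalization.StringInfo can group them into text elements. The sketch below assumes that this is the grouping wanted, and its names are invented for the example.

```vbnet
Imports System.Collections.Generic
Imports System.Globalization

Module TextElementSketch
    ' Splits text into text elements: a base character together with
    ' any combining marks that follow it (e.g. a letter plus a vowel sign).
    Function SplitTextElements(ByVal text As String) As List(Of String)
        Dim elements As New List(Of String)()
        Dim enumerator As TextElementEnumerator = StringInfo.GetTextElementEnumerator(text)
        While enumerator.MoveNext()
            elements.Add(enumerator.GetTextElement())
        End While
        Return elements
    End Function

    Sub Main()
        ' BEH (U+0628), FATHA (U+064E, a combining vowel mark), ALEF (U+0627).
        Dim sample As String = ChrW(&H628) & ChrW(&H64E) & ChrW(&H627)
        Console.WriteLine("Chars: " & sample.Length)
        Console.WriteLine("Text elements: " & SplitTextElements(sample).Count)
    End Sub
End Module
```

Whether to count Chars or text elements depends on what the frequency table is for: counting Chars treats letters and vowel marks as separate entries, counting text elements treats each letter-plus-marks combination as its own entry.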


                    Cool Smith wrote (#10):

                    First, I am using VB.NET, not C#. I tried converting it to VB.NET using http://www.developerfusion.com/tools/convert/csharp-to-vb/ and I get many errors. Can you provide a VB.NET version?


                      Lost User wrote (#11):

                      Cool Smith wrote:

                      first am using vb.net not c#, i tried convertin to vb.net

                      You asked for pseudocode, and that's what it is.

                      Cool Smith wrote:

                      Can you provide vb.net version?

                      No, since it's not my job. You could post your code however, and people could have a look. That is, if you explain where you're stuck.
