Transform non-English words to a unique representation

    HZ_79 wrote (#1):

    Hello everybody, I need a method in C# to transform a non-English word into a unique string representation. The method should detect whether the word contains non-English characters and, only in that case, convert the word to a different representation. The algorithm is the following:

    String transformString(String inputString)
    {
        if (inputString.containsNonEnglishChar())
        {
            String res = "";
            foreach (char ch in inputString)
            {
                res += transformChar(ch);
            }
            return res;
        }
        return inputString; // return the word as is
    }

    I can write the method my own way, but I would prefer something standard, like Base64 or URL encoding or something equally well known. Thanks in advance.

    HZ

      Uwe Keim wrote (#2):

      The System.Char structure has some static methods like IsLetter. There are probably similar culture-aware functions, where you pass an English culture (e.g. "en-US"), call "IsLetter", and get true/false depending on whether the character is a letter in that culture.
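For reference, a minimal sketch of the detection step. Note that `char.IsLetter` classifies by Unicode category, so it also returns true for letters such as 'é'; a plain range check against 7-bit ASCII (0 to 127) is one simple alternative. `AsciiCheck` and `ContainsNonAscii` are hypothetical names used only for illustration, and "non-English" is assumed here to mean "outside ASCII".

```csharp
using System;

public static class AsciiCheck
{
    // Hypothetical helper: returns true if any UTF-16 code unit
    // in the word lies outside the 7-bit ASCII range (0..127).
    public static bool ContainsNonAscii(string word)
    {
        foreach (char ch in word)
        {
            if (ch > 127)
            {
                return true;
            }
        }
        return false;
    }
}
```

With this, `AsciiCheck.ContainsNonAscii("hello")` is false and `AsciiCheck.ContainsNonAscii("héllo")` is true, while `char.IsLetter('é')` is true and so cannot separate the two cases on its own.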

        HZ_79 wrote (#3):

        Thanks man. I can already tell whether a char falls in the single-byte range with if((int)ch <= 255) (strictly speaking, ASCII ends at 127), and that is sufficient in my case. What I am looking for is a better way to encode the string, something like the URL format, as the HttpServerUtility.UrlEncode method does. But I cannot add a reference to the System.Web namespace, so I have to find some alternative. Thanks for your advice.

        HZ
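One possible alternative that avoids System.Web is `Uri.EscapeDataString`, which lives in the `System` namespace and percent-encodes non-ASCII characters as UTF-8 bytes. The sketch below is only an illustration of that idea, not the poster's final solution: `WordEncoder`, `TransformString` and `RestoreString` are hypothetical names, and the check uses the strict ASCII limit of 127 rather than the 255 mentioned above.

```csharp
using System;
using System.Linq;

public static class WordEncoder
{
    // Percent-encodes the word only when it contains a non-ASCII character;
    // pure ASCII words are returned unchanged.
    public static string TransformString(string input)
    {
        bool hasNonAscii = input.Any(ch => ch > 127);
        return hasNonAscii ? Uri.EscapeDataString(input) : input;
    }

    // Reverses TransformString for words that contain no literal '%'.
    public static string RestoreString(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

For example, `WordEncoder.TransformString("héllo")` yields `h%C3%A9llo`, and `RestoreString` turns it back into `héllo`. One caveat of the "encode only when needed" design: an ASCII input that already looks percent-encoded is passed through as is, so the result is unique only if inputs are assumed not to contain '%'.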

          Luc Pattyn wrote (#4):

          HZ_79 wrote:

          I cannot add a reference to the System.Web namespace

          Why not? Just do: Solution Pane > Add Reference > .NET > System.Web, then insert a using statement. :)

            Daniel Grunwald wrote (#5):

            One possibility to encode all non-ASCII characters is to use UTF-7:
            encode: Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(text))
            decode: Encoding.UTF7.GetString(Encoding.ASCII.GetBytes(encodedString))

            modified on Wednesday, June 3, 2009 10:55 AM
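The two expressions above can be wrapped into a small round-trip helper, sketched below. `Utf7Words`, `Encode` and `Decode` are hypothetical names. One thing to be aware of: on .NET 5 and later, `Encoding.UTF7` is marked obsolete (diagnostic SYSLIB0001), so the sketch suppresses that warning; on the .NET Framework of the time the property is used without any warning.

```csharp
using System;
using System.Text;

public static class Utf7Words
{
#pragma warning disable SYSLIB0001 // Encoding.UTF7 is obsolete on .NET 5 and later
    // UTF-7 output consists of ASCII bytes only, so it can be read back as an ASCII string.
    public static string Encode(string text)
    {
        return Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(text));
    }

    // Turns the ASCII-only text back into bytes and decodes them as UTF-7.
    public static string Decode(string encoded)
    {
        return Encoding.UTF7.GetString(Encoding.ASCII.GetBytes(encoded));
    }
#pragma warning restore SYSLIB0001
}
```

Words made up only of ASCII letters come out unchanged (for example `Utf7Words.Encode("hello")` is `hello`), and `Utf7Words.Decode(Utf7Words.Encode(word))` returns the original word. Unlike the pseudocode in the first post, no separate "contains non-English character" check is needed, although a literal '+' in the input is escaped by UTF-7.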
