regex to replace accents

    Member_14890678 wrote (#1):

    I have this regex to replace accented characters, but it fails if the text contains "|". For example:

    string Text = "word|word2";

    // Regex.
    System.Text.RegularExpressions.Regex replace_a_Accents = new System.Text.RegularExpressions.Regex("[á|à|ä|â]", System.Text.RegularExpressions.RegexOptions.Compiled);
    System.Text.RegularExpressions.Regex replace_e_Accents = new System.Text.RegularExpressions.Regex("[é|è|ë|ê]", System.Text.RegularExpressions.RegexOptions.Compiled);
    System.Text.RegularExpressions.Regex replace_i_Accents = new System.Text.RegularExpressions.Regex("[í|ì|ï|î]", System.Text.RegularExpressions.RegexOptions.Compiled);
    System.Text.RegularExpressions.Regex replace_o_Accents = new System.Text.RegularExpressions.Regex("[ó|ò|ö|ô]", System.Text.RegularExpressions.RegexOptions.Compiled);
    System.Text.RegularExpressions.Regex replace_u_Accents = new System.Text.RegularExpressions.Regex("[ú|ù|ü|û]", System.Text.RegularExpressions.RegexOptions.Compiled);

    // Replace.
    Texto_Retorno = replace_a_Accents.Replace(Texto_Retorno, "a");
    Texto_Retorno = replace_e_Accents.Replace(Texto_Retorno, "e");
    Texto_Retorno = replace_i_Accents.Replace(Texto_Retorno, "i");
    Texto_Retorno = replace_o_Accents.Replace(Texto_Retorno, "o");
    Texto_Retorno = replace_u_Accents.Replace(Texto_Retorno, "u");

      Richard Deeming wrote (#2):

      Why would you use a Regex for something so simple?

      bool replacedAny = false;
      char[] characters = Texto_Retorno.ToCharArray();
      for (int index = 0; index < characters.Length; index++)
      {
          switch (characters[index])
          {
              case 'á':
              case 'à':
              case 'ä':
              case 'â':
              {
                  characters[index] = 'a';
                  replacedAny = true;
                  break;
              }
              case 'é':
              case 'è':
              case 'ë':
              case 'ê':
              {
                  characters[index] = 'e';
                  replacedAny = true;
                  break;
              }
              case 'í':
              case 'ì':
              case 'ï':
              case 'î':
              {
                  characters[index] = 'i';
                  replacedAny = true;
                  break;
              }
              case 'ó':
              case 'ò':
              case 'ö':
              case 'ô':
              {
                  characters[index] = 'o';
                  replacedAny = true;
                  break;
              }
              case 'ú':
              case 'ù':
              case 'ü':
              case 'û':
              {
                  characters[index] = 'u';
                  replacedAny = true;
                  break;
              }
          }
      }

      if (replacedAny)
      {
          Texto_Retorno = new string(characters);
      }


      "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
        Lost User wrote (#3):

        characters[index] = '3'; // ??

          Peter_in_2780 wrote (#4):

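          Inside a [...] character class you don't want the |s. Every character listed in the class is already an alternative, so a | in there is treated as a literal | to match. [aeiou] as a regex will match any vowel, for example.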
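
          To make the point above concrete, here is a minimal sketch comparing the pattern from the question with the same character class minus the |s. The class and method names (CharacterClassDemo, ReplaceWithPipes, ReplaceWithoutPipes) are made up for illustration, and the sketch assumes the accented letters are stored as single precomposed characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Pattern from the question: inside [...] the | is a literal character,
    // so every | in the input is replaced along with the accented letters.
    public static string ReplaceWithPipes(string text) =>
        Regex.Replace(text, "[á|à|ä|â]", "a");

    // Same class without the |s: only the four accented letters match.
    public static string ReplaceWithoutPipes(string text) =>
        Regex.Replace(text, "[áàäâ]", "a");

    public static void Main()
    {
        Console.WriteLine(ReplaceWithPipes("word|word2"));    // wordaword2
        Console.WriteLine(ReplaceWithoutPipes("word|word2")); // word|word2
        Console.WriteLine(ReplaceWithoutPipes("más|là"));     // mas|la
    }
}
```

          The same change applies to the other four patterns in the question.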

          Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

            Richard Deeming wrote (#5):

            Nothing to see here. In my defence, "3" is directly above "e" on the keyboard.

              Lost User wrote (#6):

              Given some of the nonsense that appears when I type too fast I think you do very well.

                Member_14890678 wrote (#7):

                I was afraid of that. So Regex can't do the job, and can't do the replacement either?

                  Member_14890678 wrote (#8):

                  case 'Á', case 'À', case 'Ó' ... to be continued

                    Richard Deeming wrote (#9):

                    Regex can do the job. But running five+ separate regex operations on a string just to replace a few letters with their unaccented alternatives is overkill. The other option, which is even nastier and less obvious, is to use Unicode normalization:

                    using System.Globalization;
                    using System.Text;

                    static string RemoveDiacritics(string stIn)
                    {
                        // Decompose each accented letter into a base letter plus combining mark(s).
                        string stFormD = stIn.Normalize(NormalizationForm.FormD);
                        StringBuilder sb = new StringBuilder();

                        for (int ich = 0; ich < stFormD.Length; ich++)
                        {
                            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
                            // Keep everything except the combining (non-spacing) marks.
                            if (uc != UnicodeCategory.NonSpacingMark)
                            {
                                sb.Append(stFormD[ich]);
                            }
                        }

                        // Recompose whatever is left.
                        return sb.ToString().Normalize(NormalizationForm.FormC);
                    }

                    string input = "Príliš žlutoucký kun úpel dábelské ódy.";
                    string result = RemoveDiacritics(input); // "Prilis zlutoucky kun upel dabelske ody."

                    Source[^]
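
                    On the earlier point that Regex can do the job without five separate passes, here is a rough sketch of one way to do it: a single character class plus a lookup table, applied in one Replace call with a match evaluator. Everything named here (SinglePassAccentReplacer, Groups, Map, AccentedLetter) is invented for the example, the table covers only the vowels mentioned in this thread plus their uppercase forms, and it assumes precomposed characters in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class SinglePassAccentReplacer
{
    // Hypothetical table: accented letters on the left, replacement on the right.
    private static readonly (string Accented, string Plain)[] Groups =
    {
        ("áàäâ", "a"), ("éèëê", "e"), ("íìïî", "i"), ("óòöô", "o"), ("úùüû", "u"),
        ("ÁÀÄÂ", "A"), ("ÉÈËÊ", "E"), ("ÍÌÏÎ", "I"), ("ÓÒÖÔ", "O"), ("ÚÙÜÛ", "U"),
    };

    private static readonly Dictionary<char, string> Map = BuildMap();
    private static readonly Regex AccentedLetter = BuildRegex();

    private static Dictionary<char, string> BuildMap()
    {
        var map = new Dictionary<char, string>();
        foreach (var (accented, plain) in Groups)
            foreach (char c in accented)
                map[c] = plain;
        return map;
    }

    // One character class containing every letter in the table.
    // No escaping is needed because the table holds letters only.
    private static Regex BuildRegex()
    {
        var pattern = new StringBuilder("[");
        foreach (var (accented, _) in Groups)
            pattern.Append(accented);
        pattern.Append(']');
        return new Regex(pattern.ToString(), RegexOptions.Compiled);
    }

    // Single pass: each match is one character, looked up in the table.
    public static string Replace(string text) =>
        AccentedLetter.Replace(text, m => Map[m.Value[0]]);

    public static void Main()
    {
        Console.WriteLine(Replace("word|word2 más Ángel")); // word|word2 mas Angel
    }
}
```

                    Unlike the normalization approach, this only touches the letters listed in the table, which may or may not be what you want.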
