Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Visual Basic
  4. How to detect UTF-8-based encoded strings

How to detect UTF-8-based encoded strings

Scheduled Pinned Locked Moved Visual Basic
csharphtmldesignsales
6 Posts 4 Posters 1 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • D Offline
    D Offline
    diegosendra
    wrote on last edited by
    #1

    Hi A customer of asked us to build him a multi-language based support VB6 scraper, for which we had the need to detect UTF-8 based encoded strings to decode it later for proper displaying in application UI. It's necessary to point out that this need arises based on VB6 limitations to natively support UTF-8 in its controls, contrary to what it happens in .NET where you can tell a control that it should expect UTF-8 encoding. VB6 natively supports ISO 8859-1 and/or Windows-1252 encodings only, for which textboxes, dropdowns, listview controls, others can't be defined to natively support/expect UTF-8 as you can do in .NET considering what we just explained; so we would see weird symbols such as é, è among others, making it a whole mess at the time of displaying. So, next function contains whole UTF-8 encoded punctuation marks and symbols from languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 based list we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html Basically, the function compares if each and one of the listed UTF-8 encoded sentences, separated by | (pipe) are found in our passed string making a substring search first. Whether it's not found, it makes an alternative ASCII value based search to get a match. Say, a string like "Societé" (Society in english) would return FALSE through calling isUTF8("Societé") while it would return TRUE when calling isUTF8("SocietÈ") since È is the UTF-8 encoded representation of é. Once you got it TRUE or FALSE, you can decode the string through DecodeUTF8() function for properly displaying it, a function we found somewhere else time ago and also included in this post. [code] Function isUTF8(ByVal ptstr As String) Dim tUTFencoded As String Dim tUTFencodedaux Dim tUTFencodedASCII As String Dim ptstrASCII As String Dim iaux, iaux2 As Integer Dim ffound As Boolean ffound = False ptstrASCII = "" For iaux = 1 To Len(ptstr) ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|" Next tUTFencoded = "Ä|Ã…|Ç|É|Ñ|Ö|ÃŒ|á|Ã|â|ä|ã|Ã¥|ç|é|è|ê|ë|í|ì|î|ï|ñ|ó|ò|ô|ö|õ|ú|ù|û|ü|â€|°|¢|£|§|•|¶|ß|®|©|â„¢|´|¨|â‰|Æ|Ø|∞|±|≤|≥|Â¥|µ|∂|∑|∏|Ï€|∫|ª|º|Ω|æ|ø|¿|¡|¬|√|Æ’|≈|∆|«|»|…|Â|À|Ã|Õ|Å’|Å“|–|—|“|”|‘|’|÷|â—Š|ÿ|Ÿ|⁄|€|‹|›|fi|fl|‡|·|‚|„|‰|Â|Ú|Á|Ë|È|Í|ÃŽ|Ï|ÃŒ|Ó|Ô||Ã’|Ú|Û|Ù|ı|ˆ|Ëœ|¯|˘|Ë™|Ëš|¸|˝|Ë›|ˇ" & _

    Richard DeemingR 1 Reply Last reply
    0
    • D diegosendra

      Hi A customer of asked us to build him a multi-language based support VB6 scraper, for which we had the need to detect UTF-8 based encoded strings to decode it later for proper displaying in application UI. It's necessary to point out that this need arises based on VB6 limitations to natively support UTF-8 in its controls, contrary to what it happens in .NET where you can tell a control that it should expect UTF-8 encoding. VB6 natively supports ISO 8859-1 and/or Windows-1252 encodings only, for which textboxes, dropdowns, listview controls, others can't be defined to natively support/expect UTF-8 as you can do in .NET considering what we just explained; so we would see weird symbols such as é, è among others, making it a whole mess at the time of displaying. So, next function contains whole UTF-8 encoded punctuation marks and symbols from languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 based list we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html Basically, the function compares if each and one of the listed UTF-8 encoded sentences, separated by | (pipe) are found in our passed string making a substring search first. Whether it's not found, it makes an alternative ASCII value based search to get a match. Say, a string like "Societé" (Society in english) would return FALSE through calling isUTF8("Societé") while it would return TRUE when calling isUTF8("SocietÈ") since È is the UTF-8 encoded representation of é. Once you got it TRUE or FALSE, you can decode the string through DecodeUTF8() function for properly displaying it, a function we found somewhere else time ago and also included in this post. [code] Function isUTF8(ByVal ptstr As String) Dim tUTFencoded As String Dim tUTFencodedaux Dim tUTFencodedASCII As String Dim ptstrASCII As String Dim iaux, iaux2 As Integer Dim ffound As Boolean ffound = False ptstrASCII = "" For iaux = 1 To Len(ptstr) ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|" Next tUTFencoded = "Ä|Ã…|Ç|É|Ñ|Ö|ÃŒ|á|Ã|â|ä|ã|Ã¥|ç|é|è|ê|ë|í|ì|î|ï|ñ|ó|ò|ô|ö|õ|ú|ù|û|ü|â€|°|¢|£|§|•|¶|ß|®|©|â„¢|´|¨|â‰|Æ|Ø|∞|±|≤|≥|Â¥|µ|∂|∑|∏|Ï€|∫|ª|º|Ω|æ|ø|¿|¡|¬|√|Æ’|≈|∆|«|»|…|Â|À|Ã|Õ|Å’|Å“|–|—|“|”|‘|’|÷|â—Š|ÿ|Ÿ|⁄|€|‹|›|fi|fl|‡|·|‚|„|‰|Â|Ú|Á|Ë|È|Í|ÃŽ|Ï|ÃŒ|Ó|Ô||Ã’|Ú|Û|Ù|ı|ˆ|Ëœ|¯|˘|Ë™|Ëš|¸|˝|Ë›|ˇ" & _

      Richard DeemingR Offline
      Richard DeemingR Offline
      Richard Deeming
      wrote on last edited by
      #2

      This forum is for asking questions - hence the big red button labelled "Ask a Question". Your post looks more like it should be a tip/trick[^]. Also, posting your email address in a public forum is a great way to have your inbox flooded with spam.


      "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

      "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

      D 1 Reply Last reply
      0
      • Richard DeemingR Richard Deeming

        This forum is for asking questions - hence the big red button labelled "Ask a Question". Your post looks more like it should be a tip/trick[^]. Also, posting your email address in a public forum is a great way to have your inbox flooded with spam.


        "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

        D Offline
        D Offline
        diegosendra
        wrote on last edited by
        #3

        Hi Richard, maybe we didn't know there was such a tip/tricks section?

        R 1 Reply Last reply
        0
        • D diegosendra

          Hi Richard, maybe we didn't know there was such a tip/tricks section?

          R Offline
          R Offline
          Roy Heil
          wrote on last edited by
          #4

          There is an Articles section, where you can submit articles and tips.

          Roy.

          D 1 Reply Last reply
          0
          • R Roy Heil

            There is an Articles section, where you can submit articles and tips.

            Roy.

            D Offline
            D Offline
            diegosendra
            wrote on last edited by
            #5

            Great, I've posted it there now. Didn't know such a section existed in the site

            L 1 Reply Last reply
            0
            • D diegosendra

              Great, I've posted it there now. Didn't know such a section existed in the site

              L Offline
              L Offline
              Lost User
              wrote on last edited by
              #6

              diegosendra wrote:

              Didn't know such a section existed in the site

              Strange then that you managed to submit this Tip[^] successfully there, last October!

              Use the best guess

              1 Reply Last reply
              0
              Reply
              • Reply as topic
              Log in to reply
              • Oldest to Newest
              • Newest to Oldest
              • Most Votes


              • Login

              • Don't have an account? Register

              • Login or register to search.
              • First post
                Last post
              0
              • Categories
              • Recent
              • Tags
              • Popular
              • World
              • Users
              • Groups