Codepages and character encoding
-
I am trying to translate some extended characters back into a more "English" representation, so that users can more easily search for the strings in a database. For example, the singer "Beyoncé" has an "e" with an accent over it, and we use this representation for the display name. This character can be entered by typing alt+130 on the numeric keypad. However, we also want to store the "normal" version of the string ("beyonce") in a second field, so that we can search for it without having to jump through hoops. We want to generate the search name automatically from the text in the display name field.

I've tried messing around with string replacements, lookup tables and other esoteric mechanisms, but nothing seems to work consistently. I'm now thinking I need to start looking at code pages and different encoding mechanisms, but the documentation on this is so terrible that it's difficult to know where to begin.

Does anyone know of a reliable, platform-independent method to remove these extended characters and replace them with a suitable English equivalent? Additionally, is there a better way of entering these characters than the alt+numpad mechanism, which also seems to change depending on code pages and other witchcraft?

e.g.:
Beyoncé --> beyonce
Mötley Crüe --> motley crue

Thanks!
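To make the goal concrete, here is a rough Python sketch of the kind of transformation I'm after. It leans on Unicode normalization instead of code pages: decompose each character (NFKD), drop the combining marks, then lowercase. This is only a sketch, assuming the display name is already a proper Unicode string; the function name `to_search_name` is made up for illustration, and letters that have no decomposition (such as "ø" or "ß") would pass through unchanged and still need a lookup table.

```python
import unicodedata


def to_search_name(display_name: str) -> str:
    """Return a lowercase, accent-free version of display_name (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", display_name)
    # unicodedata.combining() returns a non-zero class for combining marks,
    # so keeping only the zero-class characters drops the accents.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


if __name__ == "__main__":
    for name in ("Beyoncé", "Mötley Crüe"):
        print(name, "-->", to_search_name(name))
```

Run on the two examples above, this should give "beyonce" and "motley crue".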