I think you should forget about regular expressions as a solution to your problem. You really should consider cleaning up and reworking your code first... at least for the learning experience. Learning how to do these types of manipulations is pretty important and, looking at your code, you're just not there yet. All those temp variables and the new strings created inside the loops are really expensive. Learn how strings work. Learn what immutable means and what happens when you build strings repeatedly within a loop: every concatenation allocates a brand-new string and copies the old contents into it. As a first step, start from the basics. Learn to traverse a string and manipulate it character by character (as you attempted above). Start with something like this:
private string CleanString(string dirtyString)
{
    // StringBuilder lives in System.Text. Learn what this does and why to use it.
    StringBuilder cleanString = new StringBuilder();
    foreach (char c in dirtyString)
    {
        // Note: C# strings are made up of 2-byte Unicode/UTF-16 characters, not ASCII characters.
        // Use && here, not ||: a character is clean only if it differs from EVERY dirty character.
        // With ||, the test is always true and nothing gets removed.
        if ((c != '\u0009') && (c != '\u000B') ... etc. )
        {
            // if character is not dirty, add it to the new string
            cleanString.Append(c);
        }
    }
    return cleanString.ToString();
}
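For reference, here is a minimal compilable sketch of that skeleton. It assumes, purely for illustration, that tab (U+0009) and vertical tab (U+000B) are the only dirty characters; your real list is longer. The class name StringCleaner is made up for this example.

```csharp
using System;
using System.Text;

public static class StringCleaner
{
    // Assumption: only tab (U+0009) and vertical tab (U+000B) are treated as dirty here.
    public static string CleanString(string dirtyString)
    {
        // Pre-size the builder: the result can never be longer than the input.
        StringBuilder cleanString = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
        {
            // Keep the character only if it matches none of the dirty characters.
            if (c != '\u0009' && c != '\u000B')
            {
                cleanString.Append(c);
            }
        }
        return cleanString.ToString();
    }
}
```

Calling StringCleaner.CleanString("a\tb\u000Bc") returns "abc". The builder appends into one growable buffer, so the loop does not allocate a new string per character the way repeated concatenation would.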
Get that working, but then start using .NET's built-in methods to improve your code. Next, read about string.IndexOf(char) so you can search the entire string at once for a character. Rewrite your code and get that working. Then, try creating an array of "dirty characters" so you can search for them all at once. Start by reading about this stuff:
char[] dirtyChars = new char[] { '\u0009', '\u000B', ... etc. };
int dirtyIndex = dirtyString.IndexOfAny(dirtyChars);
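One way the IndexOfAny version could look, as a rough sketch rather than the definitive answer. It again assumes only tab and vertical tab are dirty, and the class name IndexOfAnyCleaner is hypothetical. The idea is to jump from one dirty character to the next and copy the clean run in between as a block.

```csharp
using System;
using System.Text;

public static class IndexOfAnyCleaner
{
    // Assumption: illustrative list only; substitute your own dirty characters.
    private static readonly char[] DirtyChars = { '\u0009', '\u000B' };

    public static string CleanString(string dirtyString)
    {
        int dirtyIndex = dirtyString.IndexOfAny(DirtyChars);
        if (dirtyIndex < 0)
        {
            return dirtyString; // Nothing to remove, so no new string is built.
        }

        StringBuilder cleanString = new StringBuilder(dirtyString.Length);
        int start = 0;
        while (dirtyIndex >= 0)
        {
            // Copy the clean run that precedes the dirty character.
            cleanString.Append(dirtyString, start, dirtyIndex - start);
            // Skip the dirty character and search again from the next position.
            start = dirtyIndex + 1;
            dirtyIndex = dirtyString.IndexOfAny(DirtyChars, start);
        }
        // Copy whatever follows the last dirty character.
        cleanString.Append(dirtyString, start, dirtyString.Length - start);
        return cleanString.ToString();
    }
}
```

When the input is already clean, this version returns the original string without allocating anything, which the character-by-character loop cannot do.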
Then rewrite your code again and get it working. Then read about regular expressions, if you're curious. Will regular expressions work better? Maybe marginally... that's a really small "maybe." Probably not enough to matter. More readable?... I doubt it.
Enjoy,
Robert C. Cartaino
modified on Wednesday, November 19, 2008 5:24 PM