Fast string to integer conversion
-
I am parsing a live feed that contains many numbers. The feed comes from a trusted source (i.e. format is guaranteed to be correct), and the numbers are simple ASCII integers (no localization, no thousands separator, etc).
Int32.Parse
is simply too slow for such "simple" operations. In my benchmarks, it's taking 5-16 times more than some of my implementations. -
So you say you've written a fast implementation of int parsing, that performs better than int.Parse - so what exactly are you asking? If you've done it right? You've not posted your code if so!
I am asking whether there exists a library that includes such functions, in hope that it would be better tested and optimized than my own implementation. If there isn't, I was hoping to get some links to papers or algorithms for how to convert a string to a number efficiently.
-
I am searching for a fast string to integer conversion algorithm.
Int32.Parse
is too slow for my purposes. I implemented an algorithm that was faster by orders of magnitude, but I was hoping someone could point me to an existing library that includes such functions, or at least to a paper to make sure I'm working correctly. Thank you.Parsing a string is not very hard. I put together this code, it should cover the basics:
public static int ParseInt32(string text) {
long value = 0;
long sign = 1;
bool first = true;
foreach (char c in text) {
if (c >= '0' && c <= '9') {
value = value * 10 + c - '0';
} else if (c == '-' && first) {
sign = -1;
} else {
throw new FormatException();
}
first = false;
}
value *= sign;
if (value < int.MinValue || value > int.MaxValue) throw new OverflowException();
return (int)value;
}Despite everything, the person most likely to be fooling you next is yourself.
-
I am asking whether there exists a library that includes such functions, in hope that it would be better tested and optimized than my own implementation. If there isn't, I was hoping to get some links to papers or algorithms for how to convert a string to a number efficiently.
Ok, I see. The performance issue with Int32.Parse is because, like alot of things in the framework, it tries to cater for every man's needs and therefore includes the ability, and logic (therefore overhead) to parse lots of different types of string (eg, currency, hex etc). If you are satisfied that you are only receiving decimal-formatted numbers, without any other markup (such as separators) you could look at Int32.Parse in reflector and follow the trail to the unmanaged code that does the actual parsing - this will lead you into an internal class called
Number
which has an unsafe methodParseNumber
that does the bulk of the work. Your job would be to identify the bit that parses a decimal-formatted number and just extract that portion into your own library. -
Parsing a string is not very hard. I put together this code, it should cover the basics:
public static int ParseInt32(string text) {
long value = 0;
long sign = 1;
bool first = true;
foreach (char c in text) {
if (c >= '0' && c <= '9') {
value = value * 10 + c - '0';
} else if (c == '-' && first) {
sign = -1;
} else {
throw new FormatException();
}
first = false;
}
value *= sign;
if (value < int.MinValue || value > int.MaxValue) throw new OverflowException();
return (int)value;
}Despite everything, the person most likely to be fooling you next is yourself.
That was brilliant and simple. 5ed
Navaneeth How to use google | Ask smart questions
-
That was brilliant and simple. 5ed
Navaneeth How to use google | Ask smart questions
You could also disable array bound checking, which could be even faster. Check the documentation for "unsafe" and "fixed" C# keywords.
-
That was brilliant and simple. 5ed
Navaneeth How to use google | Ask smart questions
-
Parsing a string is not very hard. I put together this code, it should cover the basics:
public static int ParseInt32(string text) {
long value = 0;
long sign = 1;
bool first = true;
foreach (char c in text) {
if (c >= '0' && c <= '9') {
value = value * 10 + c - '0';
} else if (c == '-' && first) {
sign = -1;
} else {
throw new FormatException();
}
first = false;
}
value *= sign;
if (value < int.MinValue || value > int.MaxValue) throw new OverflowException();
return (int)value;
}Despite everything, the person most likely to be fooling you next is yourself.
-
You could also disable array bound checking, which could be even faster. Check the documentation for "unsafe" and "fixed" C# keywords.
Sometimes yes, sometimes no. According to my measurements, sometimes the cost of using
fixed
is almost the same as bounds checking (especially for small arrays), because it calls a function to pin the referred object and return a pointer to it. (You can check it with IL DASM.) -
Sometimes yes, sometimes no. According to my measurements, sometimes the cost of using
fixed
is almost the same as bounds checking (especially for small arrays), because it calls a function to pin the referred object and return a pointer to it. (You can check it with IL DASM.)Absolutely. One must choose the solution that fits best, that is why testing more than one way to achieve this is the best way to find out.