String manipulation performance issue
-
Hello, in order to show users a string representation of a 120 KB file in hex format within a Windows Forms textbox, I wrote a simple routine that converts every byte in a buffer into a hexadecimal string, taking formatting into account. The routine looks like this:
int x = 0;
string content = String.Format("\n{0:X5}: ", x);
for (; x < Data.senderdata.Length; x++) {
    content += String.Format("{0:X2}", Data.senderdata[x]);
#if DEBUG
    flp.setProgress(0, Data.senderdata.Length, x);
#endif
    if ((x + 1) % 2 == 0) content += " ";
    if ((x + 1) % 16 == 0) content += String.Format("\n{0:X5}: ", (x + 1));
}
When done, the content string is assigned to a textbox control. Perhaps I should mention that "senderdata" is a byte array. The routine seems to take an insane amount of time to complete. I realize that it might take a few seconds, since senderdata.Length typically has a value of about 125K, but this is taking over a minute to complete, in a single-threaded program running on a Pentium 4. Is this due to the String.Format hex conversion, or is it just too hard for Windows to handle such long strings? Does anyone have a tip to improve the performance of this? Thanks in advance for any help, Benny
-
I think the main issue is the large number of string concatenations. Because a string is immutable, a new string object is created every time you append something to your content variable. Try using StringBuilder instead.
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." - Rick Cook
-
One possible suggestion: change the type of content from string to a StringBuilder instance. Appending to a StringBuilder is much faster than concatenating strings. -- modified at 9:48 Tuesday 6th June, 2006
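For illustration, here is a minimal sketch of what the routine might look like with a StringBuilder. The class name HexDump and the method name Format are made up for this example and are not from the original post. It also only writes an offset header when a line actually has bytes, so it does not emit the trailing empty header the original loop produces when the length is a multiple of 16.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original post.
static class HexDump
{
    // Formats a byte array as lines of 16 bytes, each line prefixed with a
    // five-digit hex offset, with a space after every second byte.
    public static string Format(byte[] data)
    {
        // Capacity is a rough guess (a little over 3 characters per byte)
        // so the builder rarely has to grow its internal buffer.
        StringBuilder sb = new StringBuilder(data.Length * 4 + 16);

        for (int x = 0; x < data.Length; x++)
        {
            // Start a new line with the offset of its first byte.
            if (x % 16 == 0)
                sb.Append('\n').Append(x.ToString("X5")).Append(": ");

            sb.Append(data[x].ToString("X2"));

            // Group the output in pairs of bytes.
            if ((x + 1) % 2 == 0)
                sb.Append(' ');
        }

        // A single string is built once, at the end.
        return sb.ToString();
    }
}
```

With something like this in place, the call site would presumably shrink to a single assignment of HexDump.Format(Data.senderdata) to the textbox's Text property.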
-
When the += operator is used on a string, it might appear that the new text is appended to the end of the original string. This is not the case, as strings are immutable in .NET. The statement:
content += " ";
is actually performed as:
content = string.Concat(content, " ");
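A small sketch to make the point concrete (the class and method names are invented for this example): after the +=, the variable refers to a different object, and the original string is unchanged.

```csharp
using System;

// Hypothetical demo class, not part of the original post.
static class ImmutabilityDemo
{
    // Returns true if += produced a new string and left the old one untouched.
    public static bool AppendCreatesNewString()
    {
        string original = "00FF";
        string content = original;   // both variables refer to the same object

        content += " ";              // equivalent to string.Concat(content, " ")

        bool differentObject = !object.ReferenceEquals(original, content);
        bool originalUnchanged = original == "00FF";
        bool newValueCorrect = content == "00FF ";
        return differentObject && originalUnchanged && newValueCorrect;
    }
}
```

Every += in the loop therefore copies the whole string built so far into a freshly allocated one.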
With that in mind, let's do some math to find out why the routine is so slow. Each iteration does one, two or three concatenations: the first is done every iteration, the second every other iteration, and the third every 16th iteration. This gives:
:: Each iteration does on average 1.5625 string concatenations.
:: Each iteration adds on average 3.0625 characters to the string.
With an array containing 125000 elements, it produces a string of about 383000 characters. As each character is two bytes, the string uses about 766 kbyte of data.
As the string grows linearly, we can calculate the average work done by each concatenation by taking the average size of the string during the operation, which is half the size of the finished string. So a concatenation moves on average 383 kbyte of data. With 125000 iterations, we have around 195000 string concatenations (125000 times 1.5625). 195000 times 383 kbyte makes 74685000 kbyte. When the routine has finished, it has moved somewhere around 75 gigabyte of data. (As that is far more than the amount of available RAM, this has also caused hundreds of garbage collections to take place.) That is the reason why the routine is so slow.
Improving the routine is easy: use a StringBuilder. That should cut the amount of data copied by a factor of around 100000, so the routine runs dramatically faster.
As an interesting observation in optimization, one can speed up the routine somewhat by using a temporary string:
string content, line;
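content = string.Empty;
line = string.Empty;
for (int x = 0; x < Data.senderdata.Length; x++) {
    if (x % 16 == 0) {
        // Flush the finished line into the big string, once per 16 bytes.
        content += line;
        // The header shows the offset of the first byte on the new line.
        line = String.Format("\n{0:X5}: ", x);
    }
    // These concatenations only touch the short line buffer.
    line += String.Format("{0:X2}", Data.senderdata[x]);
    if ((x + 1) % 2 == 0) line += " ";
}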
content += line;
This would reduce the number of lengthy concatenations from 1.5625 per iteration to 0.0625, reducing the execution time by about 96%. Not nearly as effective as using a StringBuilder, but somewhat impressive even so... :)
-- modified at 11:27 Tuesday 6th June, 2006
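For anyone who wants to check the back-of-envelope estimate in this reply, here is a rough sketch that redoes the arithmetic. The class and method names are invented for this example, and it assumes, as the estimate does, two bytes per character and an average string size of half the final size.

```csharp
using System;

// Hypothetical helper that reproduces the estimate; not a measurement.
static class ConcatEstimate
{
    // Estimates the total number of bytes copied by the original += loop
    // for an input of byteCount bytes.
    public static double EstimateBytesMoved(int byteCount)
    {
        const int headerChars = 8;   // "\n" + five hex digits + ": "

        // One concat every iteration, one every 2nd, one every 16th.
        double concatsPerIteration = 1.0 + 1.0 / 2 + 1.0 / 16;          // 1.5625

        // Two hex digits, half a space, one sixteenth of a header.
        double charsPerIteration = 2.0 + 1.0 / 2 + headerChars / 16.0;  // 3.0625

        double finalBytes = byteCount * charsPerIteration * 2;  // 2 bytes per char
        double averageBytes = finalBytes / 2;                   // linear growth
        double totalConcats = byteCount * concatsPerIteration;

        return totalConcats * averageBytes;
    }
}
```

For 125000 input bytes this comes out at a little under 75 billion bytes, in line with the figure of around 75 gigabyte above.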