List(Of Integer) vs. an array (VB.NET 2008)
-
I am a hobbyist programmer, and this may be a newbie question that has been asked before, but I was unsuccessful searching through the forums. I am writing a data analysis program that reads disk files and parses the information. Each file is one "session" and each session contains many "trials". Trials are identified by a number of items (trial length, trial stimulus, etc.) and also holds data which are a bunch of integers. So, I am building a trial class that has a number of properties, but one of those properties will be this array of integers (of unknown number). I'm wondering if I should represent that in the class as a simple array which heavily makes use of ReDim statements as the numbers are read and added to the array, or if I should instead represent them as a List(Of Integer) using the Add method to grow the collection. This is a philosophical question, not a coding question. I'm sure I will be able to code it either way. What I don't know is whether one or the other is preferable considering performance or other issues. I understand that using a collection would be a bad idea because of boxing/unboxing issues, but I was given to believe List(Of T) does not suffer this penalty. If List(Of Integer) is preferable, would you go on to say that one should always use generics and that arrays of values are always less preferable? Or perhaps a List(Of Integer) and an array of integers are the same under the hood? Thanks.
-
I am a hobbyist programmer, and this may be a newbie question that has been asked before, but I was unsuccessful searching through the forums. I am writing a data analysis program that reads disk files and parses the information. Each file is one "session" and each session contains many "trials". Trials are identified by a number of items (trial length, trial stimulus, etc.) and also holds data which are a bunch of integers. So, I am building a trial class that has a number of properties, but one of those properties will be this array of integers (of unknown number). I'm wondering if I should represent that in the class as a simple array which heavily makes use of ReDim statements as the numbers are read and added to the array, or if I should instead represent them as a List(Of Integer) using the Add method to grow the collection. This is a philosophical question, not a coding question. I'm sure I will be able to code it either way. What I don't know is whether one or the other is preferable considering performance or other issues. I understand that using a collection would be a bad idea because of boxing/unboxing issues, but I was given to believe List(Of T) does not suffer this penalty. If List(Of Integer) is preferable, would you go on to say that one should always use generics and that arrays of values are always less preferable? Or perhaps a List(Of Integer) and an array of integers are the same under the hood? Thanks.
To be honest you pose some really good questions that really hit the fundamentals of .NET and their language integration. I don't really know enough to fully answer you. One thing I can tell you though: memory reallocation is a very expensive operation, no matter what the implementation, and thus using constant ReDim's as the data is coming in is a bad idea. Probably 90% of the processing time will then be taken by the ReDim statement(s). The List(Of T) on the other hand allocates blocks of memory at a time, so only once the block is filled does it reallocate. I believe, but have no hard proof for this, that the utility classes .NET provides are all quite fast (in .NET relative terms, of course). Quite possibly, using a similar system of allocated larger blocks at a time and only reallocating when a block is filled yourself with ReDim's could be a bit faster. At the very least, you'll probably avoid some function call overhead. You've got me curious now :) I'll look in to it, barring someone else being able to answer these questions before I get time :P You could also always write your own test where you time both implementations. One test where you ReDim the array and add a random integer constantly, and the other where you use a List(Of Integer) and then checking the runtime of each on, say, 10000 or so iterations. EDIT 1: First test is in. Adding 100,000 integers to a list using ReDim's or List with standard capacity: VB ReDim's: 8919.503 ms List(Of Integer): 2.0002 ms So, ReDim-ing as you can see is really a bad idea. I do have to note though that on 10,000 items the time is not noticeable. EDIT 2: Included a ReDim mimic version of List(Of T)'s implementation: doubling capacity every time we overflow. The results are in:" VB ReDim's: 8907.5024 ms List(Of Integer): 2.0001 ms VB ReDim with Capacity: 2.0001 ms The conclusion is: use List(Of Integer). There is no speed difference noticeable (not even in my test) between implementing it yourself. But List(Of T) handles a lot of things for you, has lots of extra functionality, and negates the need for manual array management = double win.
modified on Tuesday, June 21, 2011 2:01 PM
-
To be honest you pose some really good questions that really hit the fundamentals of .NET and their language integration. I don't really know enough to fully answer you. One thing I can tell you though: memory reallocation is a very expensive operation, no matter what the implementation, and thus using constant ReDim's as the data is coming in is a bad idea. Probably 90% of the processing time will then be taken by the ReDim statement(s). The List(Of T) on the other hand allocates blocks of memory at a time, so only once the block is filled does it reallocate. I believe, but have no hard proof for this, that the utility classes .NET provides are all quite fast (in .NET relative terms, of course). Quite possibly, using a similar system of allocated larger blocks at a time and only reallocating when a block is filled yourself with ReDim's could be a bit faster. At the very least, you'll probably avoid some function call overhead. You've got me curious now :) I'll look in to it, barring someone else being able to answer these questions before I get time :P You could also always write your own test where you time both implementations. One test where you ReDim the array and add a random integer constantly, and the other where you use a List(Of Integer) and then checking the runtime of each on, say, 10000 or so iterations. EDIT 1: First test is in. Adding 100,000 integers to a list using ReDim's or List with standard capacity: VB ReDim's: 8919.503 ms List(Of Integer): 2.0002 ms So, ReDim-ing as you can see is really a bad idea. I do have to note though that on 10,000 items the time is not noticeable. EDIT 2: Included a ReDim mimic version of List(Of T)'s implementation: doubling capacity every time we overflow. The results are in:" VB ReDim's: 8907.5024 ms List(Of Integer): 2.0001 ms VB ReDim with Capacity: 2.0001 ms The conclusion is: use List(Of Integer). There is no speed difference noticeable (not even in my test) between implementing it yourself. But List(Of T) handles a lot of things for you, has lots of extra functionality, and negates the need for manual array management = double win.
modified on Tuesday, June 21, 2011 2:01 PM
additional to your EDIT 2: did you also test starting with an big array (say 1.000.000) and redim to double size if overflow, and one redim at the end (to reduce to used size)? how fast would this be?
I cannot remember: What did I before google?
-
additional to your EDIT 2: did you also test starting with an big array (say 1.000.000) and redim to double size if overflow, and one redim at the end (to reduce to used size)? how fast would this be?
I cannot remember: What did I before google?
I did not test that, as I didn't think from the OP that an array as large as 1,000,000 would be relevant. The ReDim would be slow if needed, but then again it won't be needed much. Anyway, it'd be the same performance as starting with
New List(Of Integer)(1000000)
, then using Add to add the items, and finally calling TrimExcess() on the list. You can test all this yourself by writing a very simple test program. I used Now() to get the time before and after the operation and output the difference in milliseconds. Took me no more than 10 minutes to write the entire test program and do the tests. -
I did not test that, as I didn't think from the OP that an array as large as 1,000,000 would be relevant. The ReDim would be slow if needed, but then again it won't be needed much. Anyway, it'd be the same performance as starting with
New List(Of Integer)(1000000)
, then using Add to add the items, and finally calling TrimExcess() on the list. You can test all this yourself by writing a very simple test program. I used Now() to get the time before and after the operation and output the difference in milliseconds. Took me no more than 10 minutes to write the entire test program and do the tests.I tried to save this 10 minutes :)
I cannot remember: What did I before google?
-
I tried to save this 10 minutes :)
I cannot remember: What did I before google?
I wasn't on my own computer, so I poured all of those 10 minutes into EDIT 1 & 2 :P
-
I wasn't on my own computer, so I poured all of those 10 minutes into EDIT 1 & 2 :P
ok, i used a few minutes ... and proofed you are right. AND i want YOUR computer *g* both ways used 3 milliseconds on my machine :(
I cannot remember: What did I before google?
-
I am a hobbyist programmer, and this may be a newbie question that has been asked before, but I was unsuccessful searching through the forums. I am writing a data analysis program that reads disk files and parses the information. Each file is one "session" and each session contains many "trials". Trials are identified by a number of items (trial length, trial stimulus, etc.) and also holds data which are a bunch of integers. So, I am building a trial class that has a number of properties, but one of those properties will be this array of integers (of unknown number). I'm wondering if I should represent that in the class as a simple array which heavily makes use of ReDim statements as the numbers are read and added to the array, or if I should instead represent them as a List(Of Integer) using the Add method to grow the collection. This is a philosophical question, not a coding question. I'm sure I will be able to code it either way. What I don't know is whether one or the other is preferable considering performance or other issues. I understand that using a collection would be a bad idea because of boxing/unboxing issues, but I was given to believe List(Of T) does not suffer this penalty. If List(Of Integer) is preferable, would you go on to say that one should always use generics and that arrays of values are always less preferable? Or perhaps a List(Of Integer) and an array of integers are the same under the hood? Thanks.
Read SlimList (I'm the author, but I think it's very relevant to your question). It discusses a bit about how List works. The List class basically uses an array that doubles in capacity each time capacity is reached. If you average out the time spent on each add operation, that comes to a performance of O(1) for each insert operation. Performing an array resize each time you add an item would average to O(N) for each insert operation (where N is the total number of objects inserted). Of course, you could reimplement the capacity-doubling functionality to make it O(1), but then you'd just be making List all over again. Read about big-O notation if you don't know what O(1) and O(N) mean. Go with List. Read the above article if you want a more in-depth understanding of List (as well as an understanding of how it can theoretically be improved upon). Also, one more note: do not use SlimList. It merely demonstrates a theoretical improvement that is in practice so slow that it is not worthwile to with any real production code.
Steven St. John wrote:
I understand that using a collection would be a bad idea because of boxing/unboxing issues, but I was given to believe List(Of T) does not suffer this penalty.
The word "collection" applies to both arrays and Lists. In either case, those collections will only suffer a boxing penalty if you try to store a non-reference type in a reference variable. For example, if you put an integer into a
List(Of Object)
. However, putting an integer into aList(Of Integer)
would not cause boxing.Help a brotha out and vote Managing Your JavaScript Library in ASP.NET as the best ASP.NET article of May 2011.
-
To be honest you pose some really good questions that really hit the fundamentals of .NET and their language integration. I don't really know enough to fully answer you. One thing I can tell you though: memory reallocation is a very expensive operation, no matter what the implementation, and thus using constant ReDim's as the data is coming in is a bad idea. Probably 90% of the processing time will then be taken by the ReDim statement(s). The List(Of T) on the other hand allocates blocks of memory at a time, so only once the block is filled does it reallocate. I believe, but have no hard proof for this, that the utility classes .NET provides are all quite fast (in .NET relative terms, of course). Quite possibly, using a similar system of allocated larger blocks at a time and only reallocating when a block is filled yourself with ReDim's could be a bit faster. At the very least, you'll probably avoid some function call overhead. You've got me curious now :) I'll look in to it, barring someone else being able to answer these questions before I get time :P You could also always write your own test where you time both implementations. One test where you ReDim the array and add a random integer constantly, and the other where you use a List(Of Integer) and then checking the runtime of each on, say, 10000 or so iterations. EDIT 1: First test is in. Adding 100,000 integers to a list using ReDim's or List with standard capacity: VB ReDim's: 8919.503 ms List(Of Integer): 2.0002 ms So, ReDim-ing as you can see is really a bad idea. I do have to note though that on 10,000 items the time is not noticeable. EDIT 2: Included a ReDim mimic version of List(Of T)'s implementation: doubling capacity every time we overflow. The results are in:" VB ReDim's: 8907.5024 ms List(Of Integer): 2.0001 ms VB ReDim with Capacity: 2.0001 ms The conclusion is: use List(Of Integer). There is no speed difference noticeable (not even in my test) between implementing it yourself. But List(Of T) handles a lot of things for you, has lots of extra functionality, and negates the need for manual array management = double win.
modified on Tuesday, June 21, 2011 2:01 PM
Thank you for running the test and your very helpful answer.
-
Read SlimList (I'm the author, but I think it's very relevant to your question). It discusses a bit about how List works. The List class basically uses an array that doubles in capacity each time capacity is reached. If you average out the time spent on each add operation, that comes to a performance of O(1) for each insert operation. Performing an array resize each time you add an item would average to O(N) for each insert operation (where N is the total number of objects inserted). Of course, you could reimplement the capacity-doubling functionality to make it O(1), but then you'd just be making List all over again. Read about big-O notation if you don't know what O(1) and O(N) mean. Go with List. Read the above article if you want a more in-depth understanding of List (as well as an understanding of how it can theoretically be improved upon). Also, one more note: do not use SlimList. It merely demonstrates a theoretical improvement that is in practice so slow that it is not worthwile to with any real production code.
Steven St. John wrote:
I understand that using a collection would be a bad idea because of boxing/unboxing issues, but I was given to believe List(Of T) does not suffer this penalty.
The word "collection" applies to both arrays and Lists. In either case, those collections will only suffer a boxing penalty if you try to store a non-reference type in a reference variable. For example, if you put an integer into a
List(Of Object)
. However, putting an integer into aList(Of Integer)
would not cause boxing.Help a brotha out and vote Managing Your JavaScript Library in ASP.NET as the best ASP.NET article of May 2011.
Thank you - your article was very informative!
-
Thank you - your article was very informative!
I'm glad it helped. :)
Help a brotha out and vote Managing Your JavaScript Library in ASP.NET as the best ASP.NET article of May 2011.