Data structure quiz [modified]

Daniel Grunwald

Warning: Code in the lounge ;P You are given this class:

class PersistentList<T> {
List<T> innerList = new List<T>();

public PersistentList<T> Insert(int index, T item) {
PersistentList<T> copy = new PersistentList<T>();
copy.innerList.AddRange(this.innerList);
copy.innerList.Insert(index, item);
return copy;
}
public PersistentList<T> RemoveAt(int index) {
PersistentList<T> copy = new PersistentList<T>();
copy.innerList.AddRange(this.innerList);
copy.innerList.RemoveAt(index);
return copy;
}
public T this[int index] { get { return innerList[index]; } }
public int Count { get { return innerList.Count; } }
}

So basically this is just an immutable list; items can be retrieved by index; Insert and Remove return copies. In this trivial implementation, each insert/remove operation requires a full copy. Obviously this isn't going to perform well. How would you reimplement it so that it runs faster? Describe the internal data structure you would use instead of a List<T> and how the Insert, Remove and get-by-index operations would work on it. (just give an overview, don't go into the details) First hint: In my solution, all operations run in O(lg n) worst-case time. This means it is much more efficient for insertions at random positions than a normal List<T> even if you don't need it to be persistent. Hint #2: To make the copy operation faster, we must not copy the whole list, but instead share unchanged segments between old and new list. This is possible using a binary tree - we only have to create copies of the nodes on the path to the changed position. We don't need parent pointers in the tree because we will always traverse it from the root, so we can store the list of parents on a stack for the operations that need it. Problems left: - how do we find a node by index? - how can we keep the tree balanced? If nobody can solve this, I will post more hints soon.

Last modified: 36mins after originally posted -- Hint 2

Pete OHanlon

Well - I would use a Red-Black tree. The implementation details of this are all over the web so, the design is easily obtainable from google.

Deja View - the feeling that you've seen this post before.

Daniel Grunwald

Using a red-black tree is the solution to the balancing problem. But a normal red-black tree contains a sorted collection for access using the "Find" operation. Here we want to store elements without sorting them and allow access by index. I realize the solution is on the web, but it's more fun to find it on your own.

Pete OHanlon

Daniel Grunwald wrote:

But a normal red-black tree contains a sorted collection for access using the "Find" operation. Here we want to store elements without sorting them and allow access by index.

Then, unless I missed it in your post, you should have stated this as a requirement.:)

Daniel Grunwald wrote:

I realize the solution is on the web, but it's more fun to find it on your own.

It's more fun, but less productive.:-D

Deja View - the feeling that you've seen this post before.

Daniel Grunwald

My post stated that I want a better implementation for the given class. System.Collections.Generic.List<T> is an unsorted container for access by index. "So basically this is just an immutable list; items can be retrieved by index; Insert and Remove return copies."

Leslie Sanford

I wrote a small library[^] of persistent data structures awhile back. I have a section[^] on using balanced binary trees for implementing them. Oh, and for accessing the elements by index, I used Knuth's method. I talk about that here[^].

Daniel Grunwald

Great article. I used a red-black tree instead of an AVL tree because the red-black tree needs at most 2 rotations after an insertion/removal. (if I remember correctly, AVL can take up to lg n rotations) RB-trees require less time to keep them balanced, but since they are less balanced, lookup can take a bit longer. (the longest path from root to leaf in a RB tree can be twice the length of the shortest path, in an AVL tree it can be only 1 longer) BTW: how a node can be found by index in a red-black tree is documented under the name "order statistics tree" (that is for sorted red-black trees, but it works the same for any binary tree)