DataView.RowFilter considered harmful..
-
..for performance. I dug up an ETL-method from my .NET-dark ages because I needed it again and in general it's working fine. Just a bit slow. Because there's a lot of transformational stuff going on I first thought that there's just no way around that and considered implementing a nice progress indication for the user that actually delivers on its "remaining time" estimation. To be able to identify which "atomic parts" of the transformation require how much time I re-arranged the code so that steps which happened in a nested fashion now execute one after another. And to my surprise the transformation which I suspected to be the most time consuming one wasn't! Instead it was a step inside that one which now revealed its awful performance and on closer inspection it boiled down to using DataView.RowFilter, which, I assume, I used because there was no LINQ at the time and I couldn't be bothered to write some loops, assuming that .RowFilter wouldn't be THAT bad. But it is. Replacing this:
treeView.RowFilter = String.Format("ParentID={0} AND {1}<>{0}", parentId, idColumnName);
(and the code that relies on it) with this:
Dictionary<int, HashSet<int>> itemsByParentId = treeTable.AsEnumerable() .Select(row => new { parentId = row.Field<int>\[parentIdColIdx\], itemId = row.Field<int>\[itemIdColIdx\] }) .Where(x => x.parentId != x.itemId) .GroupBy(x => x.parentId) .ToDictionary(grp => grp.Key, grp => new HashSet<int>(grp.Select(x => x.itemId)));
(and code that uses it instead) cut down the time for the whole ETL-process from 26 minutes to 2 minutes...
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson