Challenge: the fastest way to filter doubled items from a list.
-
To all the clever coders out here! I have a list full of strings (doubledList) and I want to create a new list containing every unique string from the first list exactly once (UniqueList). My solution is this (C#):
private List<string> LoadUniqueList(List<string> doubledList)
{
    List<string> UniqueList = new List<string>();
    foreach (string item in doubledList)
    {
        bool x = true;
        foreach (string compare in UniqueList)
        {
            if (item == compare)
            {
                x = false;
            }
            if (!x) break;
        }
        if (x) UniqueList.Add(item);
    }
    return UniqueList;
}
Can you make it lighter, faster, sharper? :java::java::java: X| Personally, I have no idea if it's even possible; otherwise it wouldn't be very interesting for me, of course. Also, do you think it's a good idea to use this part of the forum for similar challenges? I think that starting with a working solution for a generic problem and then discussing whether you can improve it would be a great way to create more learning opportunities.
Giraffes are not real.
Here are the results of a performance test of all the ways mentioned (at the time of this writing) in this thread: skylark-software.com[^]
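For anyone who wants to run that kind of comparison themselves, here is a minimal sketch of a timing harness. The data set, sizes, and the two approaches timed are illustrative assumptions, not the setup behind the linked results, and a single pass like this ignores JIT warm-up.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class DedupTiming
{
    static void Main()
    {
        // Build a test list with many duplicates (sizes are arbitrary).
        var doubledList = new List<string>();
        var rng = new Random(42);
        for (int i = 0; i < 100000; i++)
            doubledList.Add("item" + rng.Next(1000));

        var sw = Stopwatch.StartNew();
        List<string> viaDistinct = doubledList.Distinct().ToList();
        sw.Stop();
        Console.WriteLine("Distinct():     " + sw.ElapsedMilliseconds + " ms, " + viaDistinct.Count + " unique");

        sw.Restart();
        var seen = new HashSet<string>();
        var viaHashSet = new List<string>();
        foreach (string s in doubledList)
            if (seen.Add(s))          // Add returns false for items already seen
                viaHashSet.Add(s);
        sw.Stop();
        Console.WriteLine("HashSet + List: " + sw.ElapsedMilliseconds + " ms, " + viaHashSet.Count + " unique");
    }
}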
-
Good job, I don't know why anyone would downvote your post, but I think it's nice to see the comparison. And surprisingly (at least to me) the simplest answers were the best (I thought the sorted list would be the best).
Thanks. The down vote is strange to me too. Perhaps it's because I didn't post it as a CodeProject article. :confused: As I mentioned, I expected Distinct() to be about the same as the sorted list and was surprised it was so much faster; apparently it's using a HashSet (or equivalent) internally. But I was really surprised at the difference in speed between the sorted list and the hash set. I thought they'd be closer to each other. I'm guessing it's probably a difference related to inserts.
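For reference, the Distinct() version being discussed is essentially a one-liner (a sketch; doubledList is the input list from the original post, and using directives for System.Collections.Generic and System.Linq are assumed):
// Enumerable.Distinct() removes duplicates using the default equality comparer.
List<string> uniqueList = doubledList.Distinct().ToList();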
-
Exactly. Iterate the input List, add new items to the HashSet and the result List.
this problem is complex!
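A minimal sketch of that approach in C# (the method name is reused from the original post for comparison; assumes using System.Collections.Generic; not benchmarked here):
private static List<string> LoadUniqueList(List<string> doubledList)
{
    var seen = new HashSet<string>();
    var uniqueList = new List<string>();
    foreach (string item in doubledList)
    {
        // HashSet<T>.Add returns false when the item is already present,
        // so a single call does both the membership check and the insert.
        if (seen.Add(item))
            uniqueList.Add(item);
    }
    return uniqueList;
}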
-
Exactly. Iterate the input List, add new items to the HashSet and the result List.
Agreed, the example given will run in N^2 time, and your solution will run in N time.
You can only be young once. But you can always be immature. - Dave Barry
-
Agreed, the example given will run in N^2 time, and your solution will run in N time.
You can only be young once. But you can always be immature. - Dave Barry
Do not forget that a HashSet must somehow check the constraint that there are no duplicates. The time needed for that check must also be taken into account, and it will increase with the size of the HashSet.
-
Harley L. Pebley wrote:
it's using a HashSet (or equivalent) internally
It uses its own Set type, which basically is a minimal implementation of a HashSet. So, I am surprised that Distinct is slower than a HashSet, because it uses a dedicated implementation instead of a general solution.
Greetings - Jacek
-
So, I know this thread is centuries old, but I decided to comment anyway. I haven't looked at what others have posted yet, so here's my unadulterated take:
var uniques = new List<string>();         // result list
for (int i = 0; i < doubled.Count; i++)   // doubled is the input list of strings
{
    if (!uniques.Contains(doubled[i]))
    {
        uniques.Add(doubled[i]);
    }
}
Just one loop, although I know Contains() is basically an O(n) method, but the time difference doesn't seem like much between your code and mine. And mine is fewer lines ;)
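A hedged variation on that point: if the O(n) Contains() ever did become the bottleneck, the same single loop works with a HashSet<string> (average O(1) lookups) alongside the result list. A sketch, not benchmarked:
var seen = new HashSet<string>();          // average O(1) Contains, vs. O(n) on a List
var uniques = new List<string>();
for (int i = 0; i < doubled.Count; i++)
{
    if (!seen.Contains(doubled[i]))
    {
        seen.Add(doubled[i]);
        uniques.Add(doubled[i]);
    }
}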
-
Call this guy from FSharp.Core!! >> http://msdn.microsoft.com/en-us/library/ee353861.aspx[^]
Regards Vallarasu S | FSharpMe.blogspot.com
-
You can put the strings that repeat often at the beginning of your unique list, so the loop will not have to iterate as many times. The unique list is then ordered by how often each string repeats instead of by the order of the original list. For example, for the input: one three two two three three, the unique list will be: three two one instead of: one three two, because "three" occurs most often. But this will not always be an optimization, because there may be no most-occurring string if the data is completely randomized. So, for example, it would not work for randomly generated strings, but it would work for predictable user info.
Edit: you can also order them by their length. For example, if you have the strings: reallyLongStringWhichIsReallyLong anotherString reallyLongStringWhichIsReallyLong shortString shortString ... Even though shortString and reallyLongStringWhichIsReallyLong occur the same number of times, it takes longer for the computer to compare reallyLongStringWhichIsReallyLong if it is at the front, so it would be more efficient to move it to the back of the unique list.
Also, how long does it take your function in C# to check these strings: "stuff", "str", "str", "morestuff", "str", "not unique", "evenMoreStuff", "stuff", "unique", "morestuff", "not unique"? In C++, it takes 0.124 seconds when I print the results to the screen and 0.074 when I do not. In C++ with gcc:
//0.124 and 0.074
#include <iostream>
#include <vector>
#include <string>
using namespace std;

vector<string> getUnique(vector<string> listStr) {
    vector<string>::iterator listIt;
    vector<string> uniqueStr;
    vector<string>::iterator uniqueIt;
    for (listIt = listStr.begin(); listIt < listStr.end(); listIt++) {
        bool exists = false;
        for (uniqueIt = uniqueStr.begin(); uniqueIt < uniqueStr.end(); uniqueIt++) {
            if (*listIt == *uniqueIt) {
                exists = true;
                break;
            }
        }
        if (!exists) uniqueStr.push_back(*listIt);
    }
    cout << "list: ";
    for (listIt = listStr.begin(); listIt < listStr.end(); listIt++) cout << " " << *listIt;
    cout << "\n\nunique list: ";
    for (uniqueIt = uniqueStr.begin(); uniqueIt < uniqueStr.end(); uniqueIt++) cout << " " << *uniqueIt;
    return uniqueStr;
}

int main() {
    vector<string> str = {"stuff", "str", "str", "morestuff", "str", "not unique", "evenMoreStuff", "stuff", "unique", "morestuff", "not unique"};
    getUnique(str);
    return 0;
}
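The frequency-ordering idea described above is not implemented in the C++ listing; a hypothetical sketch of it in C# (the thread's language; assumes using directives for System.Collections.Generic and System.Linq) could look like this. Note that the grouping pass already yields each distinct string once, so this mainly illustrates the ordering itself:
// Count occurrences, then list the distinct strings with the most frequent first,
// so a linear search of the unique list tends to find common strings early.
List<string> uniqueByFrequency = doubledList
    .GroupBy(s => s)                      // one group per distinct string
    .OrderByDescending(g => g.Count())    // most frequent strings first
    .Select(g => g.Key)
    .ToList();
-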
Agreed, the example given will run in N^2 time, and your solution will run in N time.
You can only be young once. But you can always be immature. - Dave Barry
That doesn't mean it's always faster - it just means it is asymptotically faster. It always annoyed me that my algorithms class never dealt with this issue, being exclusively interested in the theoretical framework where "best" only meant "best for infinite-size problem" and not at all interested in real-world performance, where infinite input isn't even possible. :)