Please help. Is multimap what I need?
-
Hi,

I've been banging my head against this problem. I'm a self-trained pseudo-programmer who has only just started using the STL.

I have a huge file with 10 million paired entries, say:

1 8
1 5
1 3
2 0
2 4
3 0
3 8
etc...

They are stored as a two-dimensional array of 10 million rows and two columns; you get the idea of the data. I need to find the records whose first entry matches one particular tag, and put the second entry of each into a separate vector, in the order found. For example, if I need the records with the tag 2, I'll create the vector 0 4.

I suppose I need to use a multimap, but I do not know how to read the file and then store the results in a vector. Right now I have this simple code:

typedef vector<double> DVECTOR;
DVECTOR FDV;
int tag = 0;
double FD;
int id = 3;
int idp = id + 1;
while(tag != idp)
{
GetData >> tag >> FD;
cout << tag << " " << FD << endl ;
if(tag == id)
FDV.push_back(FD);
}

This does what I want by scanning the file from the beginning until it finds the target value, then reading all the entries with that value. If the number is close to the first entry, cool, it's fast, but if it is near the end it takes a long time (it is a big program and this is repeated many times).

The question is: would a multimap do the same task better and faster? If so, how do I do it? That is, read the file, store the data in a multimap, search for the tag number I'm interested in, and copy all the values associated with that tag into a vector.

If I have to read the whole 10-million-line file to put it in a map, should it then stay in memory for further use (I need it several times), or should I read the file every time I need it? And if I reread it every time, isn't my naive code more efficient?

Thank you so much!
Carlos
-
Hi, if your data is in memory then yes, a multimap is what you need. You fill it like this:
typedef multimap<int,int> mmapint ;
mmapint mapData;
mapData.insert(mmapint::value_type(1,8));
mapData.insert(mmapint::value_type(1,5));
...
To find the data, use this:
vector<int> vecResults;
mmapint::const_iterator iteWhere = mapData.find(nSearchedValue);
while ( iteWhere != mapData.end() )
{
if ( iteWhere->first != nSearchedValue)
break;
vecResults.push_back(iteWhere->second);
iteWhere++;
}
Now, if you are going to read the whole file into memory each time you do a search, you might as well collect the matching values as you read it. And if your file is sorted, you should probably search it directly instead: do successive seeks, halving the extent of the search each time, until you land on a matching record, then move back to the first one and iterate until you have found all of them. Hope this helps!
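To tie the fill and find steps to the file in the question, here is a minimal sketch of reading the pairs into a multimap and pulling out one tag with equal_range, which returns the first and one-past-last matching elements in a single call. The helper names loadPairs and valuesFor are made up for illustration, and the value type is double only because the question reads FD as a double. Since C++11, elements with equal keys stay in the order they were inserted, so the results come back in file order.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <vector>

// Hypothetical helper: read whitespace-separated "tag value" pairs from any
// input stream (a std::ifstream or a std::istringstream both work).
std::multimap<int, double> loadPairs(std::istream& in) {
    std::multimap<int, double> data;
    int tag;
    double value;
    while (in >> tag >> value) {
        // Equal keys are appended after the existing ones (C++11 and later).
        data.insert({tag, value});
    }
    return data;
}

// Hypothetical helper: copy every value stored under `tag` into a vector.
std::vector<double> valuesFor(const std::multimap<int, double>& data, int tag) {
    std::vector<double> result;
    auto range = data.equal_range(tag);  // [first match, one past last match)
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

You would call loadPairs once on the open file stream, keep the multimap around, and call valuesFor for each tag you need. Whether 10 million entries fit comfortably in memory depends on your machine, so treat this as a starting point rather than a definitive answer.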
-
If
1. your source file does not change,
2. you read it each time, or keep the file handle open, and
3. it is already sorted by your first key,
then you might also optimize your existing system by keeping track of the file position where each set of elements begins. That is, where the 1s begin, where the 2s begin, where the 3s begin, etc. Then you can SetFilePosition() or seek() to that spot and read only until you get something NOT matching your number. Otherwise, if the data is always in memory, use the multimap.
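A rough sketch of that position-tracking idea with standard C++ streams, assuming the three conditions above hold. The names buildIndex and valuesAt are invented for this example, and tellg()/seekg() stand in for SetFilePosition()/seek(). The file is scanned once to record where each tag first appears; after that, each lookup jumps straight to the right spot.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <vector>

// Hypothetical helper: scan the stream once and remember the position just
// before the first record of each tag.
std::map<int, std::streampos> buildIndex(std::istream& in) {
    std::map<int, std::streampos> index;
    int tag;
    double value;
    while (true) {
        std::streampos pos = in.tellg();  // position before the next record
        if (!(in >> tag >> value)) break; // stop at end of input or bad data
        index.insert({tag, pos});         // no effect if the tag is already indexed
    }
    return index;
}

// Hypothetical helper: seek to the first record of `wanted` and read until
// the tag changes. Relies on the data being sorted (grouped) by tag.
std::vector<double> valuesAt(std::istream& in,
                             const std::map<int, std::streampos>& index,
                             int wanted) {
    std::vector<double> result;
    auto found = index.find(wanted);
    if (found == index.end()) return result;  // tag never appears
    in.clear();                // drop the eof/fail state left by earlier reads
    in.seekg(found->second);
    int tag;
    double value;
    while (in >> tag >> value && tag == wanted) {
        result.push_back(value);
    }
    return result;
}
```

The index holds one entry per distinct tag, so it is much smaller than the data itself. If the file can change between runs, the index has to be rebuilt.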