Friday's Coding Challenge
-
Here's a more involved problem, suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times, as fast as possible, but you don't have enough memory to simply hold the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods. So for one minute you may be accessing only 1000 values, repeatedly, then the next minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudocode is fine, but ASM gets you Man Points.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
I don't have a Perl script for this so I can't help you. Next challenge... :-D
Just along for the ride. "the meat from that butcher is just the dogs danglies, absolutely amazing cuts of beef." - DaveAuld (2011)
"No, that is just the earthly manifestation of the Great God Retardon." - Nagy Vilmos (2011) "It is the celestial scrotum of good luck!" - Nagy Vilmos (2011) -
i'd try it with a fixed-size double-ended list:

get a request for A
check the list for A, starting at the 'front'
if A is in the list, move it to the front
if A isn't in the list, add it to the front of the list
if you just added and the list has more than N items, pull the item off the back end and discard it

frequently-used items will stay near the front of the list. infrequently-used items will get pushed out, eventually. (you could probably also do this with a circular buffer.)
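A minimal sketch of that move-to-front list in C++ (the fixed capacity N and lookupFromTable, the slow path to the real table, are stand-ins invented for illustration):

#include <cstddef>
#include <list>
#include <string>
#include <utility>

std::string lookupFromTable(const std::string& key); // hypothetical slow backing store

class MoveToFrontCache
{
    std::list<std::pair<std::string, std::string>> items; // front = most recently used
    std::size_t capacity;

public:
    explicit MoveToFrontCache(std::size_t n) : capacity(n) {}

    std::string get(const std::string& key)
    {
        // Scan from the front; hot items live near the head.
        for (auto it = items.begin(); it != items.end(); ++it)
        {
            if (it->first == key)
            {
                items.splice(items.begin(), items, it); // move the hit to the front
                return items.front().second;
            }
        }
        // Miss: fetch, add at the front, evict from the back if over capacity.
        items.emplace_front(key, lookupFromTable(key));
        if (items.size() > capacity)
            items.pop_back();
        return items.front().second;
    }
};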
Chris Losinger wrote:
frequently-used items will stay near the front of the list. infrequently-used items will get pushed out, eventually.
That seems like a nice approach. It avoids the temporal issue entirely. Marc
My Blog
An Agile walk on the wild side with Relationship Oriented Programming
-
When I worked on an app that needed to cache the most recently/frequently used media files (large videos/PNGs), I wrote a cache manager that promoted items to a higher rank based on frequency of access, and also factored in the most-recent-access time. I don't remember if I kept the size of the cache fixed. It was not RDBMS-based (at that time) and used a custom binary data format (large GB+ files). BTW, Rama and I tried to get these programming discussions going here in the past. After getting poor responses (mostly humor), we tried it in GIT (where it got more attention), but later the GITians lost interest too. Kinda ironic that the guys who are most likely to have responded to these threads don't post here all that much anymore (Rama, John, Shog, CG).
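A hedged sketch of what such a frequency-plus-recency rank might look like; the hit/age weighting and the 60-second decay constant are invented here for illustration, not taken from that app:

#include <chrono>
#include <string>

struct CacheEntry
{
    std::string path;      // where the cached media file lives on disk
    unsigned    hits = 0;  // how many times it has been requested
    std::chrono::steady_clock::time_point lastAccess;
};

// Higher rank = more worth keeping. When the cache fills, the entry
// with the lowest rank would be evicted first.
double rank(const CacheEntry& e)
{
    using namespace std::chrono;
    const double ageSec =
        duration<double>(steady_clock::now() - e.lastAccess).count();
    // Frequency pushes the rank up; staleness decays it.
    return e.hits / (1.0 + ageSec / 60.0); // 60 s decay window, arbitrary
}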
Regards, Nish
My technology blog: voidnish.wordpress.com
Nishant Sivakumar wrote:
Kinda ironic
don't you think. A little toooo ironic. :-D Anyhow, I like these kinds of discussions. However, there can be an overload of discussion in one area of the thread, and your idea gets overlooked by the majority. In Chris' first thread, it "seemed" that most of the discussion had nothing to do with the challenge proper but was about jokes and quips about the details of the challenge. I, of course, could be wrong in my thinking. Funny how when Chris shoots out the challenge, everyone jumps on the bandwagon. Just sayin'...
Just along for the ride. "the meat from that butcher is just the dogs danglies, absolutely amazing cuts of beef." - DaveAuld (2011)
"No, that is just the earthly manifestation of the Great God Retardon." - Nagy Vilmos (2011) "It is the celestial scrotum of good luck!" - Nagy Vilmos (2011) -
And what caching algorithm would you use with MemCache? There are trillions of values you need to store. Assume you can't hold them all in memory, even with a distributed cache. This isn't a hardware / memory / processor problem. It's about thinking through the actual problem.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
There are some Microsoft cache tools here too: http://msdn.microsoft.com/en-us/windowsserver/gg675186[^]
it's the journey, not the destination that matters
-
I did see the joke icon, but I'm sick of seeing someone or other reply with this same "joke" every time a programming-related thread is started in the Lounge. Not that I'm voting on that post, but if it really is meant to be a joke, it's not even mildly funny.
"Real men drive manual transmission" - Rajesh.
You do realize I was replying to Chris, right? If you're stepping in, that's fine, but I didn't want you to take any offense.
The difficult we do right away... ...the impossible takes slightly longer.
-
You do realize I was replying to Chris, right? If you're stepping in, that's fine, but I didn't want you to take any offense.
The difficult we do right away... ...the impossible takes slightly longer.
Hey, no worries here. I know your reply was to Chris, but I was merely stating my opinion on the matter. I know I kinda "jumped in" though. :)
"Real men drive manual transmission" - Rajesh.
-
OK, I'll throw one of our solutions into the ring, seeing as we're not getting any actual code, nor even pseudo-code (though Chris Losinger[^] was closest).

Create a nice linked list - say 5000 elements.
Decide on the number of common requests (say 1000).
For every request, check to see if it's in the list by traversing from the head element.
If the element is in the list
    If the element is in the first 1000 items
        return the value
    else
        move the value to the head of the cache
        and drop the last item in the cache if we have more than 5000 items
        and return the value
else
    Look up the value from the table
    and add it to the head of the list
    and drop the last item in the cache if we have more than 5000 items
    and return the value

The specific situation that motivated this problem was IP lookups and spiders. Generally IP lookups were random, but occasionally we'd have a single IP generating tens of thousands of lookups. We ended up running a very small (500-1000 item) cache with a "quick lookup" section of 300 items at the head of the list. This ran faster than any other caching method we used at the time. We have since moved to a more general caching method that combines a linked list and a dictionary, so we get much faster lookup, a nice "quick lookup" area, and fast reordering. I keep meaning to post the code. One of these days...
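For illustration, the linked-list-plus-dictionary combination mentioned above might look roughly like this. The 1000/5000 sizes come from the pseudo-code; lookupFromTable and the std::distance check are assumptions, not the actual CodeProject code:

#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

std::string lookupFromTable(const std::string& key); // hypothetical slow backing store

class QuickZoneLruCache
{
    using Item = std::pair<std::string, std::string>;
    std::list<Item> order;                                          // front = hottest
    std::unordered_map<std::string, std::list<Item>::iterator> index; // O(1) lookup
    std::size_t capacity  = 5000;
    std::size_t quickZone = 1000; // hits this close to the head aren't reordered

public:
    std::string get(const std::string& key)
    {
        auto found = index.find(key);
        if (found != index.end())
        {
            auto it = found->second;
            // Only reorder if the item sits outside the quick zone.
            if (std::distance(order.begin(), it) >=
                static_cast<std::ptrdiff_t>(quickZone))
                order.splice(order.begin(), order, it); // move to the head
            return it->second;
        }
        // Miss: fetch, insert at the head, evict from the tail if over capacity.
        order.emplace_front(key, lookupFromTable(key));
        index[key] = order.begin();
        if (order.size() > capacity)
        {
            index.erase(order.back().first);
            order.pop_back();
        }
        return order.front().second;
    }
};

Note the std::distance call keeps the sketch close to the pseudo-code, but it is O(n); a production version would track quick-zone membership some cheaper way.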
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
Chris Maunder wrote:
IP lookups and spiders
I'd want to know more about that -- it sounds like each item is very small. At first I was imagining very large data items, like URL-to-document.
-
So much for using both the joke icon and the smiley - maybe they cancel each other out?
Cheers, विक्रम "We have already been through this, I am not going to repeat myself." - fat_boy, in a global warming thread :doh:
-
So much for using both the joke icon and the smiley - maybe they cancel each other out?
Cheers, विक्रम "We have already been through this, I am not going to repeat myself." - fat_boy, in a global warming thread :doh:
Some jokes are well beyond their best-by date. Time to encourage technical discussions, not make those who may not notice an icon or two think twice about discussing the very topic this site was founded upon. I know I'm waxing lyrical here, but it's so important to me to have members *want* to talk code. There's less and less code talk these days and that makes me incredibly sad.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP