Friday's Coding Challenge
-
Here's a more involved problem that is suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times as fast as possible, but you don't have enough memory to simply store the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods of time. So for 1 minute you may only be accessing 1000 values, repeatedly, then another minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudo code is fine but ASM gets you Man Points.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
I'd make a trillion web pages and let Google index them, then use Google to look up the result. ;P
Wout
-
Here's a more involved problem that is suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times as fast as possible, but you don't have enough memory to simply store the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods of time. So for 1 minute you may only be accessing 1000 values, repeatedly, then another minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudo code is fine but ASM gets you Man Points.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
When I worked on an app that needed to cache the most recently/frequently used media files (large videos/PNGs), I wrote a cache manager that promoted items to a higher rank based on frequency of access, and also factored in the most recent access time. I don't remember if I kept the size of the cache fixed. That was not RDBMS-based (at the time) and used a custom binary data format (large GB+ files). BTW, Rama and I tried to get these programming discussions going here in the past. After getting poor responses (mostly humor), we tried to do it in GIT (where it got more attention), but later the GITians lost interest too. Kinda ironic that the guys who'd be most likely to respond to these threads don't post here all that much anymore (Rama, John, Shog, CG).
Regards, Nish
My technology blog: voidnish.wordpress.com
-
Whoa, somebody missed the joke icon!
The difficult we do right away... ...the impossible takes slightly longer.
-
Here's a more involved problem that is suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times as fast as possible, but you don't have enough memory to simply store the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods of time. So for 1 minute you may only be accessing 1000 values, repeatedly, then another minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudo code is fine but ASM gets you Man Points.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
I would use an LRFU[^] algorithm - a hybrid of LRU (least recently used) and LFU (least frequently used).
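A minimal sketch of the LRFU idea, assuming a simple decayed-score form (the lambda value, the names, and the O(n) eviction scan are illustrative, not taken from the linked reference): each entry keeps a combined recency/frequency score that decays over time, so a lambda near zero behaves like pure LFU while a large lambda behaves like pure LRU.

    #include <cmath>
    #include <cstdint>
    #include <string>
    #include <unordered_map>

    struct LrfuEntry {
        std::string value;
        double      crf    = 0.0;  // combined recency/frequency score
        uint64_t    lastAt = 0;    // logical time of last access
    };

    class LrfuCache {
        std::unordered_map<std::string, LrfuEntry> entries_;
        uint64_t now_    = 0;      // logical clock, ticks once per access
        double   lambda_ = 0.1;    // decay rate: 0 -> pure LFU, large -> pure LRU
    public:
        // Bring an entry's score up to date by applying the time decay.
        double score(const LrfuEntry& e) const {
            return e.crf * std::pow(2.0, -lambda_ * double(now_ - e.lastAt));
        }
        void touch(const std::string& key, const std::string& value) {
            ++now_;
            LrfuEntry& e = entries_[key];
            e.crf    = 1.0 + score(e);  // decay the old score, add this hit
            e.lastAt = now_;
            e.value  = value;
        }
        // Evict the entry with the smallest decayed score. (The linear scan
        // is for clarity; a real implementation would keep a heap.)
        void evictOne() {
            auto victim = entries_.end();
            double best = 1e300;
            for (auto it = entries_.begin(); it != entries_.end(); ++it)
                if (score(it->second) < best) { best = score(it->second); victim = it; }
            if (victim != entries_.end()) entries_.erase(victim);
        }
    };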
m.bergman
For Bruce Schneier, quanta only have one state : afraid.
To succeed in the world it is not enough to be stupid, you must also be well-mannered. -- Voltaire
Honesty is the best policy, but insanity is a better defense. -- Steve Landesberg
-
Whoa, somebody missed the joke icon!
The difficult we do right away... ...the impossible takes slightly longer.
I did see the joke icon, but I'm sick of seeing someone or other reply with this same "joke" every time a programming-related thread is started in the Lounge. Not that I'm voting on that post, but if it really is meant to be a joke, it's not even mildly funny.
"Real men drive manual transmission" - Rajesh.
-
I like these challenges as they give me a chance to try something beyond what I do at work! I'd also like to see what the possible answers could be.
Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch
OK, I'll throw one of our solutions into the ring, seeing as we're not getting any actual code, nor even pseudo-code (though Chris Losinger[^] was closest):
Create a nice linked list - say 5000 elements.
Decide on the number of common requests (say 1000).
For every request, check to see if it's in the list by traversing from the head element:
If the element is in the list
    If the element is in the first 1000 items
        return the value
    else
        move the value to the head of the cache
        and drop the last item in the cache if we have more than 5000 items
        and return the value
else
    Look up the value from the table
    and add it to the head of the list
    and drop the last item in the cache if we have more than 5000 items
    and return the value
The specific situation that motivated this problem was IP lookups and spiders. Generally IP lookups were random, but occasionally we'd have a single IP generating tens of thousands of lookups. We ended up running a very small (500-1000 entry) cache with a "quick lookup" section of 300 items at the head of the list. This ran faster than any other caching method we used at the time. We have since moved to a more general caching method that combines a linked list and a dictionary, so we get much faster lookup, a nice "quick lookup" area, and fast reordering. I keep meaning to post the code. One of these days...
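A minimal sketch of that list-plus-dictionary combination, assuming std::list for recency order and std::unordered_map for O(1) lookup (the sizes, names, and position check are illustrative, not the actual CodeProject code):

    #include <iterator>
    #include <list>
    #include <optional>
    #include <string>
    #include <unordered_map>

    class AdaptiveCache {
        struct Entry { std::string key, value; };
        std::list<Entry> order_;   // head = most recently promoted
        std::unordered_map<std::string, std::list<Entry>::iterator> index_;
        static const size_t kCapacity  = 5000;  // total size from the pseudo-code
        static const size_t kQuickZone = 1000;  // "common requests" head section

    public:
        // Returns the cached value, or nullopt so the caller hits the big table.
        std::optional<std::string> get(const std::string& key) {
            auto it = index_.find(key);
            if (it == index_.end()) return std::nullopt;
            // Only reorder when the hit falls outside the quick-lookup zone.
            // (std::distance is O(n) here; a production version would track
            // positions instead of walking the list.)
            if ((size_t)std::distance(order_.begin(), it->second) >= kQuickZone)
                order_.splice(order_.begin(), order_, it->second);  // O(1) move to head
            return it->second->value;
        }

        // Insert a freshly looked-up value at the head; assumes key is not present.
        void put(const std::string& key, const std::string& value) {
            order_.push_front({key, value});
            index_[key] = order_.begin();
            if (order_.size() > kCapacity) {     // drop the tail item
                index_.erase(order_.back().key);
                order_.pop_back();
            }
        }
    };

Splicing a list node to the head is O(1) and doesn't invalidate the iterators stored in the map, which is what makes this combination fast.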
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
-
:rolleyes: I'm pulling out small puzzles we have already solved and that I enjoyed solving. It's easier for me to pose a question that I have already solved (at least to a point where it works sufficiently) than to rip off programming challenges from other sites and books that people can simply Google to get the answer to. So how about a different challenge for you: come up with your own programming challenge.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
-
Here's a more involved problem that is suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times as fast as possible, but you don't have enough memory to simply store the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods of time. So for 1 minute you may only be accessing 1000 values, repeatedly, then another minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudo code is fine but ASM gets you Man Points.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
The solution to that problem is "memcached" (http://memcached.org/[^]). Of course, you can write your own, but since the code is open source, I'd check what they're doing. They explain some of how it works here: http://amix.dk/blog/post/19356[^]. Basically, they focus primarily on memory fragmentation. About the algorithm: "why would you waste processor cycles on finding expired items when you're not receiving any requests for it (as in, no one sees the data) *and* you haven't reached your memory constraints yet?"
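A minimal sketch of that lazy-expiry behaviour (illustrative C++, not memcached's actual code; the names are invented): expired entries are reclaimed only when a request touches them, so no cycles are spent sweeping data nobody is asking for.

    #include <chrono>
    #include <optional>
    #include <string>
    #include <unordered_map>

    class LazyTtlCache {
        using Clock = std::chrono::steady_clock;
        struct Entry { std::string value; Clock::time_point expiresAt; };
        std::unordered_map<std::string, Entry> entries_;
    public:
        void put(const std::string& key, const std::string& value,
                 std::chrono::seconds ttl) {
            entries_[key] = {value, Clock::now() + ttl};
        }
        std::optional<std::string> get(const std::string& key) {
            auto it = entries_.find(key);
            if (it == entries_.end()) return std::nullopt;
            if (Clock::now() >= it->second.expiresAt) {   // lazy reclaim:
                entries_.erase(it);                       // only on access
                return std::nullopt;
            }
            return it->second.value;
        }
    };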
-
OK, I'll throw one of our solutions into the ring, seeing as we're not getting any actual code, nor even pseudo-code (though Chris Losinger[^] was closest):
Create a nice linked list - say 5000 elements.
Decide on the number of common requests (say 1000).
For every request, check to see if it's in the list by traversing from the head element:
If the element is in the list
    If the element is in the first 1000 items
        return the value
    else
        move the value to the head of the cache
        and drop the last item in the cache if we have more than 5000 items
        and return the value
else
    Look up the value from the table
    and add it to the head of the list
    and drop the last item in the cache if we have more than 5000 items
    and return the value
The specific situation that motivated this problem was IP lookups and spiders. Generally IP lookups were random, but occasionally we'd have a single IP generating tens of thousands of lookups. We ended up running a very small (500-1000 entry) cache with a "quick lookup" section of 300 items at the head of the list. This ran faster than any other caching method we used at the time. We have since moved to a more general caching method that combines a linked list and a dictionary, so we get much faster lookup, a nice "quick lookup" area, and fast reordering. I keep meaning to post the code. One of these days...
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
Thanks for that, Chris - even from that pseudo-code I could implement a coded solution. It's always good to learn something new!
Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch
-
The solution to that problem is "memcached" (http://memcached.org/[^]). Of course, you can write your own, but since the code is open source, I'd check what they're doing. They explain some of how it works here: http://amix.dk/blog/post/19356[^]. Basically, they focus primarily on memory fragmentation. About the algorithm: "why would you waste processor cycles on finding expired items when you're not receiving any requests for it (as in, no one sees the data) *and* you haven't reached your memory constraints yet?"
And what caching algorithm would you use with MemCache? There are trillions of values you need to store. Assume you can't hold them all in memory, even with a distributed cache. This isn't a hardware / memory / processor problem. It's about thinking through the actual problem.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
-
Andrew Rissing wrote:
Diabolical!
"Inconceivable!"
-
OK, I'll throw one of our solutions into the ring, seeing as we're not getting any actual code, nor even pseudo-code (though Chris Losinger[^] was closest):
Create a nice linked list - say 5000 elements.
Decide on the number of common requests (say 1000).
For every request, check to see if it's in the list by traversing from the head element:
If the element is in the list
    If the element is in the first 1000 items
        return the value
    else
        move the value to the head of the cache
        and drop the last item in the cache if we have more than 5000 items
        and return the value
else
    Look up the value from the table
    and add it to the head of the list
    and drop the last item in the cache if we have more than 5000 items
    and return the value
The specific situation that motivated this problem was IP lookups and spiders. Generally IP lookups were random, but occasionally we'd have a single IP generating tens of thousands of lookups. We ended up running a very small (500-1000 entry) cache with a "quick lookup" section of 300 items at the head of the list. This ran faster than any other caching method we used at the time. We have since moved to a more general caching method that combines a linked list and a dictionary, so we get much faster lookup, a nice "quick lookup" area, and fast reordering. I keep meaning to post the code. One of these days...
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
Chris Maunder wrote:
If the element is in the first 1000 items return the value
Why not move it up the chain by adding a count of how many times it has been requested? You'll need to sort the 5,000 elements each time, but that shouldn't be a problem. To minimize sorting, you can bucket the counts - give each fifty or so consecutive counts the same rank - so you don't have to re-sort them on every hit.
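A sketch of that bucketing idea, assuming a bucket width of 50 (the width and names are illustrative): the rank only changes when a count crosses a bucket boundary, so most hits need no reordering at all.

    #include <cstdint>

    struct RankedEntry {
        uint32_t hitCount = 0;
        // Coarse rank: ~50 consecutive counts collapse to the same bucket.
        uint32_t bucket() const { return hitCount / 50; }
    };

    // Returns true only when this hit moves the entry into a new bucket,
    // i.e. when the list actually needs a (local) reorder.
    inline bool hitNeedsReorder(RankedEntry& e) {
        uint32_t before = e.bucket();
        ++e.hitCount;
        return e.bucket() != before;
    }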
-
Chris Maunder wrote:
If the element is in the first 1000 items return the value
Why not move it up the chain by adding a count of how many times it has been requested? You'll need to sort the 5,000 elements each time, but that shouldn't be a problem. To minimize sorting, you can bucket the counts - give each fifty or so consecutive counts the same rank - so you don't have to re-sort them on every hit.
I like the chunking idea, but resorting 5000 elements each time is onerous.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
-
And what caching algorithm would you use with MemCache? There are trillions of values you need to store. Assume you can't hold them all in memory, even with a distributed cache. This isn't a hardware / memory / processor problem. It's about thinking through the actual problem.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
Memcache sets a TTL (in seconds) when it adds the entry. After it expires, it re-queries. The parameter for that TTL should be how often the data in that table changes. I guess you could keep track of how many "visits" each item gets and how often it changes in the original table. So a simple algorithm would set the TTL with a formula: items that are visited a lot get a longer TTL, items that change quickly get a shorter one. Memcache itself uses MRU, LRU and lazy expired-LRU cleanup when memory is full.
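A sketch of such a formula (the constants and function name are invented for illustration): popularity stretches the TTL, churn shrinks it, and the result is clamped to a sane range.

    #include <algorithm>
    #include <chrono>

    // hitsPerMinute: how often the item is requested; changesPerHour: how
    // often it changes in the source table. Both would come from simple
    // per-item counters.
    std::chrono::seconds adaptiveTtl(double hitsPerMinute, double changesPerHour) {
        double base = 60.0;                                  // 1-minute baseline
        double ttl  = base * (1.0 + hitsPerMinute / 100.0)   // reward popularity
                           / (1.0 + changesPerHour);         // punish churn
        return std::chrono::seconds(
            static_cast<long>(std::clamp(ttl, 1.0, 3600.0))); // keep within 1s..1h
    }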
it´s the journey, not the destination that matters
-
:rolleyes: I'm pulling out small puzzles we have already solved and that I enjoyed solving. It's easier for me to pose a question that I have already solved (at least to a point where it works sufficiently) than to rip off programming challenges from other sites and books that people can simply Google to get the answer to. So how about a different challenge for you: come up with your own programming challenge.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
I can come up with a programming challenge. The CodeProject site is down. The owner of the site would rather sleep in, what with the minus-silly-degrees weather outside. You have one phone, and you have to find a way to determine his number out of the X potential numbers in Canada. Find the quickest way to wake him up before he wakes up on his own.
-
I can come up with a programming challenge. The CodeProject site is down. The owner of the site would rather sleep in, what with the minus-silly-degrees weather outside. You have one phone, and you have to find a way to determine his number out of the X potential numbers in Canada. Find the quickest way to wake him up before he wakes up on his own.
That's too easy. You know the location is in Toronto, so hack into the PetSmart online ordering system, look up last year's orders in Toronto, filter by hamster feed, order by volume descending, take the first order, match it to the client details, and you have a phone number.
cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP