Friday's Coding Challenge

The Lounge · Tags: c++, algorithms, architecture, performance, help
48 Posts · 22 Posters
  • Chris Maunder wrote:

    How do you set the decay value? It's non-deterministic.

    cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP

    NormDroid · #8

    Arbitrary - found during testing to get the *best* size for the cache.

    Software Kinetics Wear a hard hat it's under construction

    • In reply to NormDroid (#8):

      Chris Maunder · #9

      The size of the cache would depend on the decay time. The problem says there are around 1,000 common lookups, but it doesn't define how long these items stay common. Could be a minute. Could be an hour. Could be a second.

      cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP

      • In reply to NormDroid (#8):

        Nagy Vilmos · #10

        Allow it to be self-sizing. Fixed limit, 1k as Chris indicated, and up to that limit just measure how often you're reading from disk vs retrieving from cache. If we're talking in the order of a million records, we could maybe even hold a token for each record, or block of records, to determine whether we're caching too much or too little.


        Panic, Chaos, Destruction. My work here is done. Drink. Get drunk. Fall over - P O'H OK, I will win to day or my name isn't Ethel Crudacre! - DD Ethel Crudacre I cannot live by bread alone. Bacon and ketchup are needed as well. - Trollslayer Have a bit more patience with newbies. Of course some of them act dumb - they're often *students*, for heaven's sake - Terry Pratchett
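Nagy Vilmos's self-sizing idea can be sketched as a small controller that watches the hit rate over a window of requests and nudges the capacity toward the fixed ceiling. This is a minimal illustration only; the window, step size, and thresholds below are my assumptions, not anything from the thread:

```cpp
#include <cstddef>

// Sketch of a self-sizing cache controller: track disk reads vs cache
// hits over a window and grow or shrink the capacity toward a fixed
// ceiling (1k, as Chris indicated). Step (50) and the 80%/95% hit-rate
// thresholds are illustrative assumptions.
class CacheSizer {
    std::size_t capacity, ceiling;
    std::size_t hits = 0, misses = 0;
public:
    CacheSizer(std::size_t start, std::size_t max) : capacity(start), ceiling(max) {}

    // Record one lookup: hit = served from cache, miss = read from disk.
    void record(bool hit) {
        if (hit) ++hits; else ++misses;
    }

    // Call periodically: grow while we miss too often, shrink when the
    // cache is comfortably hot, then reset the counters for the next window.
    std::size_t adjust() {
        std::size_t total = hits + misses;
        if (total > 0) {
            double hitRate = double(hits) / double(total);
            if (hitRate < 0.80 && capacity + 50 <= ceiling)
                capacity += 50;                 // missing too much: cache more
            else if (hitRate > 0.95 && capacity >= 100)
                capacity -= 50;                 // very hot: we can cache less
        }
        hits = misses = 0;
        return capacity;
    }
    std::size_t size() const { return capacity; }
};
```

The token-per-block refinement would replace the single global hit rate with per-block counters, at the cost of extra bookkeeping.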

          • Vikram A Punathambekar wrote:

          Programming questions don't belong in the Lounge! ;P

          Cheers, विक्रम "We have already been through this, I am not going to repeat myself." - fat_boy, in a global warming thread :doh:

          hairy_hats · #11

          It's a "challenge", not a "question". ;)

            • In reply to Vikram A Punathambekar:

            Simon_Whale · #12

            I like these challenges as they give me a chance to try something beyond what I do at work! I'd also like to see what a possible answer could be.

            Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch

            • Chris Maunder wrote:

              Here's a more involved problem that is suitable for a lazy Friday afternoon. Suppose you have a table (or other structure) that stores a trillion name/value pairs. You need to look up values from this table millions of times as fast as possible, but you don't have enough memory to simply store the table in memory. One thing you do notice, though, is that the same values tend to be requested multiple times over short periods of time. So for 1 minute you may only be accessing 1000 values, repeatedly, then another minute - or hour (who knows) - you may be accessing an entirely different set of 1000 values. You can't cache the entire table. The challenge is to provide a caching algorithm that will automatically adapt to the changing subset of values being requested. Pseudo code is fine but ASM gets you Man Points.

              cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP

              Chris Losinger · #13

              I'd try it with a fixed-size double-ended list:
              • get a request for A
              • check the list for A, starting at the 'front'
              • if A is in the list, move it to the front
              • if A isn't in the list, add it to the front of the list
              • if you just added and the list has more than N items, pull the item off the back end and discard it

              Frequently-used items will stay near the front of the list. Infrequently-used items will get pushed out, eventually. (You could probably also do this with a circular buffer.)

              image processing toolkits | batch image processing
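Chris Losinger's move-to-front list can be sketched in a few lines of C++; the class name, value types, and capacity below are illustrative choices of mine, not from the thread:

```cpp
#include <list>
#include <optional>
#include <string>
#include <utility>

// Move-to-front cache as described: a fixed-size list, scanned linearly
// from the front; hits are promoted to the front, and an insert that
// overflows the capacity drops the back (least recently used) item.
class MtfCache {
    std::list<std::pair<std::string, int>> items;
    std::size_t capacity;
public:
    explicit MtfCache(std::size_t cap) : capacity(cap) {}

    // Returns the cached value if present, promoting the entry to the front.
    std::optional<int> get(const std::string& key) {
        for (auto it = items.begin(); it != items.end(); ++it) {
            if (it->first == key) {
                items.splice(items.begin(), items, it); // move hit to front
                return items.front().second;
            }
        }
        return std::nullopt;
    }

    // Adds a key/value at the front; evicts the back item on overflow.
    void put(const std::string& key, int value) {
        items.emplace_front(key, value);
        if (items.size() > capacity) items.pop_back();
    }
};
```

The linear scan is O(n) per lookup, which is acceptable for a list of a few hundred entries.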

                • In reply to Chris Maunder's challenge:

                jesarg · #14

                This sounds suspiciously like you want us to do your work for you. Academic, programming-competition-style questions are more fun, IMO.

                  • In reply to jesarg (#14):

                  Chris Maunder · #15

                  :rolleyes: I'm pulling out small puzzles we have already solved and that I enjoyed solving. It's easier for me to pose a question that I have already solved (at least to a point where it works sufficiently) than to rip off programming challenges from other sites and books that people can simply Google to get the answer to. So how about a different challenge for you: come up with your own programming challenge.

                  cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP

                    • In reply to Chris Maunder's challenge:

                    PIEBALDconsult · #16

                    Are you sure it's a bottleneck? Have you tried throwing more hardware at it? Have you tried a specialized Spell Check Tree? :-D I'm not a big fan of caching dynamic sets of data. I'd simply let SQL Server figure it out. Edit:

                    Chris Maunder wrote:

                    a trillion name/value pairs

                    On the long scale? Or the short scale?

                      • In reply to Chris Maunder's challenge:

                      wout de zeeuw · #17

                      I'd make a trillion web pages, let Google index them, and then use Google to look up the result. ;P

                      Wout

                        • In reply to Chris Maunder's challenge:

                        Nish Nishant · #18

                        When I worked on an app that needed to cache the most recently/frequently used media files (large videos/PNGs), what I did was write a cache manager that promoted items to a higher rank based on frequency of access, and it also considered most-recently-accessed time as a factor. I don't remember if I kept the size of the cache fixed. That was not RDBMS-based (at that time) and used a custom binary data format (large GB+ files). BTW, Rama and I tried to get these programming discussions going here in the past. After getting poor responses (mostly humor), we tried to do it in GIT (where it got more attention), but later GITians lost interest too. Kinda ironic that the guys who are most likely to have tried to respond to these threads don't post here all that much anymore (Rama, John, Shog, CG).

                        Regards, Nish


                        My technology blog: voidnish.wordpress.com

                          • In reply to Chris Maunder's challenge:

                          Lost User · #19

                          How about a fully associative LRU cache of "around" 1000 entries?

                            • In reply to Chris Maunder's challenge:

                            Andrew Rissing · #20

                            This sounds oddly like something for CodeProject. Are you trying to cut overhead costs by outsourcing to the people who visit this site? :D Diabolical! [Edit: Ha...sounds like I wasn't the first to think such[^].]

                              • Chris Maunder wrote:

                              Read the fine print here[^]. :|

                              cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP

                              Richard Andrew x64 · #21

                              Whoa, somebody missed the joke icon!

                              The difficult we do right away... ...the impossible takes slightly longer.

                                • In reply to Chris Maunder's challenge:

                                Michael Bergman · #22

                                I would use an LRFU[^] algorithm (a hybrid of LRU, least recently used, and LFU, least frequently used).

                                m.bergman

                                For Bruce Schneier, quanta only have one state : afraid.

                                To succeed in the world it is not enough to be stupid, you must also be well-mannered. -- Voltaire

                                Honesty is the best policy, but insanity is a better defense. -- Steve Landesberg
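The LRFU idea rests on a "combined recency and frequency" (CRF) score: each access adds weight 1, and old weight decays as 2^(-lambda * dt), so a small lambda behaves like LFU and a large one like LRU. A minimal sketch of that scoring (the class name, lambda value, and the tick-based clock are my illustrative assumptions):

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Sketch of the CRF score behind LRFU. touch() records an access at
// logical time `now`, folding the old score in with exponential decay;
// score() returns the decayed value, and the eviction candidate is
// whichever key currently scores lowest.
class LrfuScore {
    struct Entry { double crf = 0.0; long last = 0; };
    std::unordered_map<std::string, Entry> entries;
    double lambda;                         // decay rate: ~0 => LFU-like, large => LRU-like
public:
    explicit LrfuScore(double l) : lambda(l) {}

    // Record an access and return the new combined score.
    double touch(const std::string& key, long now) {
        Entry& e = entries[key];
        e.crf = 1.0 + e.crf * std::pow(2.0, -lambda * double(now - e.last));
        e.last = now;
        return e.crf;
    }

    // Current decayed score without recording an access.
    double score(const std::string& key, long now) const {
        auto it = entries.find(key);
        if (it == entries.end()) return 0.0;
        return it->second.crf * std::pow(2.0, -lambda * double(now - it->second.last));
    }
};
```

A full LRFU cache would keep these scores in a heap so the minimum-score victim can be found quickly.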

                                  • In reply to Richard Andrew x64 (#21):

                                  Rajesh R Subramanian · #23

                                  I did see the joke icon, but I'm sick of seeing someone or other replying with this same "joke" every time a programming-related thread is started in the Lounge. Not that I'm voting on that post, but if it really is meant to be a joke, it's not even mildly funny.

                                  "Real men drive manual transmission" - Rajesh.

                                    • In reply to Simon_Whale (#12):

                                    Chris Maunder · #24

                                    OK, I'll throw one of our solutions into the ring, seeing as we're not getting any actual code, nor even pseudo-code (though Chris Losinger[^] was closest):

                                    Create a linked list - say 5000 elements.
                                    Decide on the number of common requests (say 1000).
                                    For every request, check whether it's in the list by traversing from the head element.
                                    If the element is in the list:
                                        If the element is in the first 1000 items:
                                            return the value
                                        else:
                                            move the element to the head of the cache,
                                            drop the last item in the cache if we have more than 5000 items,
                                            and return the value
                                    else:
                                        look up the value from the table,
                                        add it to the head of the list,
                                        drop the last item in the cache if we have more than 5000 items,
                                        and return the value

                                    The specific situation this problem was motivated from was IP lookups and spiders. Generally IP lookups were random, but occasionally we'd have a single IP generating tens of thousands of lookups. We ended up running a very small (500-1000) size cache with a "quick lookup" section at the head of the list of 300 items. This ran faster than any other caching method we used at the time. We have since moved to a more general caching method that combines linked list and dictionary so we have much faster lookup, a nice "quick lookup" area, and a fast reordering. I keep meaning to post the code. One of these days...

                                    cheers, Chris Maunder The Code Project | Co-founder Microsoft C++ MVP
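The "linked list and dictionary" combination mentioned above is the classic O(1) LRU cache: the list keeps recency order for cheap promotion and eviction, while a hash map replaces the front-to-back scan. This is a generic sketch of that structure, not Chris's actual code (which he hasn't posted); names and types are illustrative:

```cpp
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// LRU cache combining a list (recency order, front = most recent) with
// a hash map from key to list iterator, giving O(1) get and put.
class LruCache {
    std::list<std::pair<std::string, int>> order;
    std::unordered_map<std::string, std::list<std::pair<std::string, int>>::iterator> index;
    std::size_t capacity;
public:
    explicit LruCache(std::size_t cap) : capacity(cap) {}

    // O(1) lookup; a hit is promoted to the front of the recency list.
    std::optional<int> get(const std::string& key) {
        auto it = index.find(key);
        if (it == index.end()) return std::nullopt;
        order.splice(order.begin(), order, it->second);   // promote to front
        return order.front().second;
    }

    // Insert or update; evicts the back (least recently used) on overflow.
    void put(const std::string& key, int value) {
        auto it = index.find(key);
        if (it != index.end()) {
            it->second->second = value;
            order.splice(order.begin(), order, it->second);
            return;
        }
        order.emplace_front(key, value);
        index[key] = order.begin();
        if (order.size() > capacity) {
            index.erase(order.back().first);
            order.pop_back();
        }
    }
};
```

std::list::splice moves nodes without invalidating iterators, which is what lets the map keep stable handles into the list.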

                                      • In reply to Chris Maunder (#15):

                                      jesarg · #25

                                      I love programming problems, but I have meetings all afternoon long today and won't be able to do anything on the forums until this evening. Try me again next Friday.

                                        • In reply to Chris Maunder's challenge:

                                        ErnestoNet · #26

                                        The solution to that problem is "memcached" (http://memcached.org/[^]). Of course, you can write your own, but since the code is open source, I'd check what they're doing. They explain some of how it works here: http://amix.dk/blog/post/19356[^] Basically, they focus primarily on memory fragmentation. About the algorithm: "why would you waste processor cycles on finding expired items when you're not receiving any requests for it (as in, no one sees the data) *and* you haven't reached your memory constraints yet?"

                                          • In reply to Chris Maunder (#24):

                                          Simon_Whale · #27

                                          Thanks for that, Chris - even from that pseudo-code I could implement a coded solution. It's always good to learn something new!

                                          Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch
