filling redis with test data

I've been toying with the idea of a simple log searching system based on Redis. One approach I am considering is to create a set of log messages for each second of the day. This might be a terrible idea, but I want to see what I can squeeze out of naive methods before dealing with full text search.
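To make the "one set per second" idea concrete, here is a minimal sketch of how a second of the day could map to a Redis key. The "log:HH:MM:SS" key format and the helper names `keyForSecond` and `saddCommand` are made up for illustration; only the SADD command itself is real Redis. The sketch just builds the command text and does not talk to a server.

```haskell
module Main where

import Text.Printf (printf)

-- Number of seconds in a day; each one gets its own set in this sketch.
secondsPerDay :: Int
secondsPerDay = 24 * 60 * 60

-- Hypothetical key scheme: turn a second-of-day offset into "log:HH:MM:SS".
keyForSecond :: Int -> String
keyForSecond s = printf "log:%02d:%02d:%02d" h m sec
  where
    (h, rest) = s `divMod` 3600
    (m, sec)  = rest `divMod` 60

-- Text of the SADD command that would add one message to that second's set.
saddCommand :: Int -> String -> String
saddCommand s msg = unwords ["SADD", keyForSecond s, show msg]

main :: IO ()
main = do
  putStrLn (saddCommand 0 "first message")
  putStrLn (saddCommand (secondsPerDay - 1) "last message")
```

Running it prints `SADD log:00:00:00 "first message"` and `SADD log:23:59:59 "last message"`. A search for a time range would then come down to reading the sets for the keys in that range.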

To prototype my idea, I am filling Redis with test data: one random word from the local dictionary file for each second of the day. Below is a Haskell program to accomplish this.

gist

updated with useful performance notes

The program above performs very poorly. Filling Redis on my laptop took over five minutes, so something was clearly wrong. The issue was my use of "!!" on a large Haskell list. "!!" has to walk the list from the head, so each lookup takes time proportional to the index, and doing that once for every second of the day on a list of nearly 100k words was killing performance. My fix was to replace the list with a Vector, which supports constant-time indexing and is often recommended when a list is being used for random access. The new code runs in about a second, a massive improvement over the multi-minute execution time of the code above.

gist
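For anyone who wants to see the difference in isolation, here is a small sketch comparing the two kinds of lookup. It is not the program from the gist: the word list is generated rather than read from the dictionary file, the indices come from a fixed arithmetic stand-in instead of a random number generator, and the names `lookupList` and `lookupVector` are made up for illustration. It assumes the vector package is installed.

```haskell
module Main where

import qualified Data.Vector as V

-- Stand-in word list, roughly the size of the dictionary file.
wordsList :: [String]
wordsList = map (\n -> "word" ++ show n) [0 .. 99999 :: Int]

-- List indexing walks the list from the head: O(n) per lookup.
lookupList :: [String] -> Int -> String
lookupList ws i = ws !! i

-- Vector indexing is O(1) per lookup.
lookupVector :: V.Vector String -> Int -> String
lookupVector ws i = ws V.! i

main :: IO ()
main = do
  let vec = V.fromList wordsList
      -- One index per second of the day; a deterministic stand-in for random picks.
      indices = [ (s * 7919) `mod` V.length vec | s <- [0 .. 86399] ]
  -- Both lookups return the same words; only the cost per lookup differs.
  print (and [ lookupVector vec i == lookupList wordsList i | i <- take 100 indices ])
  -- Touch every chosen word through the Vector, as the fill loop would.
  print (sum (map (length . lookupVector vec) indices))
```

Swapping `lookupVector vec` for `lookupList wordsList` in the last line should make the slowdown visible, since each of the 86400 lookups then has to traverse part of the list.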

And finally, I present a C++11 version as well below. I was sure that this would be substantially faster than the Haskell version, but I was wrong. The Haskell version consistently finished in about a second, while the C++11 version consistently took over five seconds. The Haskell version used about 99% of the CPU, while the C++11 version achieved about half of that. It is likely that my C++ could be tuned considerably. As it stands, it is far more verbose than the equivalent (and faster) Haskell.

gist

last update 2012-05-29