Redis: Another Hash db FTW
Posted by feydr | Posted in Uncategorized | Posted on 24-05-2009
View Comments
So I have to admit — I kinda have a hardon for key-value stores, also known as hash dbs. Why? Cause they are hashes which is the data strucuture I’ll use right before deciding to make something a class and they are incredibly fast.
Relational databases are about fucking retarded when it comes to storing data. The whole idea here is to persist data structures in a manner that is efficient right? If you had the option of storing already ordered data in an un-ordered fashion or storing it in the order it is already in what choice would you choose? The more acute observers amongst you would probably say O(1) versus O(n) is the right choice and you’d be correct.
Sometimes mysql just does not cut it when it comes to storing data. Actually, let’s not kid ourselves, mysql is awfully slow. If it weren’t for the need of persistence mysql probably wouldn’t be one of the inherent staples of web applications. I don’t know about you but I can usually max out my mysql with around 10-11k tx/second IF I have upped the ram limit.
So how fast is redis? Well, for list operations I was doing 48k tx/second — very noice!
Want to install it? Lets!
(notice: the rake task auto dls and installs the redis c for you — why?? cause ruby hackers are hot like that)
git clone git://github.com/ezmobius/redis-rb.git rake redis:install # start it up redis-server
now let’s check out a simple SET/GET
cyn0n@spicetrader:~/work/$ telnet 127.0.0.1 6379 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. SET foo 3 bar +OK GET foo $3 bar ^] telnet> quit
This is your pretty stereotypical hash storage. We tell the server that we are giving it a variable of 3 bytes. We terminate that line with ‘\r\n’ then we give it the data which is ‘bar’ and terminate it again with ‘\r\n’.
The server gives us a response saying we are good to go.
Now we tell the server we want our variable back. The server responds with a ‘$’ stating that data is to follow up to the next newline/carriage return. Then it proceeds to give us our requested data. Very straightforward.
Benchmarking
Ahh, this is what everyone comes for huh? So my benching wasn’t as awesome as tokyo cabinet was (however, keep in mind that redis comes with a built-in networking layer so it’s a lot faster than tokyo tyrant is)
Here they are using the included benchmarking boilerplate from ezra:
user system total real set 33.300000 2.980000 36.280000 ( 37.641193) set (pipelined) 14.390000 1.350000 15.740000 ( 26.550251) push+trim 36.520000 2.520000 39.040000 ( 40.420952) push+trim (pipelined) 0.410000 0.040000 0.450000 ( 10.630631)
which as you can see result in the following:
# sets set 606 tx/sec set (pipelined) 1428 tx/sec # lists push+trim 555 tx/sec push+trim (pipelined) 48780 tx/sec
obviously sets being unordered do not have O(1) as our lists do, however it makes them useful for things such as storing state
Features of Redis
- keys can store diff. data type rather than just strings
- master-slave replication (can move databases with a simple cp)
- persists data asynchronously versus tokyo cabinet that does this synchronously and memcache that does not at all
- sets are tables of hash keys
- can perform numerous server-side operations on sets such as set-intersections, unions, subtractions, and additions
- push head/tail O(1), range of lists
- list trims to implement circular buffers (ala rrdtool)
Another useful thing is that most (all?) primative operations are atomic such as the PUSH/POP/INRC/DECRs. What does this mean? This means that there is NO NEED for locking algorithims satisfying the A in ACID compliance and one main reason for the speed that is witnessed. There is more than one way to skin a cat and redis having dropped the requirement of having to lock is in my opinion an excellent choice!
Unlike tokyo cabinet, redis comes with a built-in networking layer which is much much faster than tokyo tyrant (the tokyo cabinet networking layer). I assume that the creators of tokyo cabinet were under the impression that you’d just embed the c lib into whatever program you wanted as the tyrant seems to have come as an afterthought.
Since data is written synchronously concerns of data loss pop up but since there is a chance you might lose a couple of keys here or there, however with the master/slave replication this defends against this pretty well.
If you look at redis you’ll see that another performance boost comes from the fact that certain objs are put into a free list and then recycled later on when needed negating the need to allocate/de-allocate system resources which ties up cpu — very smart indeed.

Memory usage is another point of contention when it comes to concerns about hash databases. While this is a valid concern it really amounts to a bullshit argument by someone who is taking facts out of context. Yes, it is true that you are unable to store more data at any given time in the store as you have ram. This is shocking to some people surprisingly, however it is not uncommon for a bottom-shelf production server to have 16gigs of ram to start out with and that’s only one server — NOT partitioning. If it’s a simple app that you wrote for yourself on a development box I am going to guess you don’t need more than the 2 or 3 gig anyways — if you do I’d suggest looking into getting real hardware instead of your laptop before you start bitching about performance.
As an after-note to this if you are not going to heed this advice do yourself a favor and perform this:
echo 1 > /proc/sys/vm/overcommit_memory
What this does is tell linux to allow you to use more memory then you actually have. Whenever redis saves, it forks the process over using twice the amount of memory. Although, apparently this never really happens and the memory is shared. Anyways, if you find yourself doing this quite a lot this is a warning to upgrade your hardware — as if the sluggish connections wouldn’t have been a clue to begin with.
More Reading
If you wish to do more research on redis ezra gave a lightning talk at the MountainWest Ruby Conf
here and antirez has done a pretty damn good job of documenting his code. His comments are also very good considering if he had wanted to do he could’ve named all of his vars and written his comments in Italian. (on that note I know the Italian hacker community is known for it’s hackerspaces and badass hacker conferences)
Drop the relational and enjoy your scalable, speed-demon database!

