24 Apr 2009 10:26
The weekend is coming and I have a very small pet project for it. I would rather keep the idea private for now, but it involves processing hundreds of entries per second and analyzing data from multiple sources. It would have a dead-simple web interface.
The nature of the project requires a really fast data backend, capable of storing and retrieving a few thousand items per second. The dataset would be approximately 5GB, with an average item size of 0.5KB.
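As a rough back-of-the-envelope check of those figures, here is a small sketch. It assumes binary units (1 KB = 1024 bytes) and takes 3,000 items per second as a stand-in for "a few thousand"; both are my assumptions, not measured values.

```python
# Back-of-the-envelope sizing for the figures quoted above.
# Assumptions (not from the post): binary units, and 3,000 items/s
# as a stand-in for "a few thousand items per second".
KB = 1024
GB = 1024 ** 3

dataset_bytes = 5 * GB          # approximate dataset size
item_bytes = KB // 2            # average item size: 0.5 KB
items_per_second = 3_000        # assumed request rate

item_count = dataset_bytes // item_bytes
payload_bytes_per_second = items_per_second * item_bytes

print(f"items in dataset: {item_count:,}")
print(f"payload throughput: {payload_bytes_per_second / (1024 * 1024):.2f} MB/s")
```

Under these assumptions the dataset holds roughly ten million items, and the raw payload bandwidth is small; the demanding part is the number of operations per second rather than the bytes moved.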
When it came to choosing tools, after brief consideration I picked Sinatra for the web interface and Redis as a memory-only (with disk dumps) key-value datastore. Redis should be capable of handling 100 000 requests per second and deal well with large datasets, so it fits perfectly. It also differs from Memcached or MemcacheDB in that it has great higher-level structures like Lists and Sets, plus basic sorting and selection commands.
Recently there has been a lot of hype about (distributed) key-value storage. For more info I recommend a nice article by Richard Jones, Anti-RDBMS: A list of distributed key-value stores. The HighScalability blog also has a lot of references and articles.
Redis looks like the way to go. The only problem with it is that the whole dataset (database) must fit in RAM, otherwise performance might degrade terribly (because of swapping). Raw throughput itself is not an issue: you would need several concurrent clients to actually hit it as a limit.
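To make the "must fit in RAM" constraint concrete, here is a minimal sketch of the kind of check I have in mind. The `overhead_factor` and `headroom` values are illustrative guesses, not measured Redis figures, and the machine sizes are hypothetical.

```python
# Rough check of whether a dataset fits in RAM with room to spare.
# overhead_factor and headroom are illustrative guesses, not measured
# Redis figures; replace them with numbers from your own tests.
GB = 1024 ** 3

def fits_in_ram(dataset_bytes, ram_bytes, overhead_factor=1.5, headroom=0.2):
    """Return True if the dataset, inflated by per-item overhead,
    fits in RAM while leaving `headroom` (a fraction) free for the OS."""
    needed = dataset_bytes * overhead_factor
    usable = ram_bytes * (1.0 - headroom)
    return needed <= usable

dataset = 5 * GB
for ram_gb in (4, 8, 16):   # hypothetical machine sizes
    verdict = "fits" if fits_in_ram(dataset, ram_gb * GB) else "would swap"
    print(f"{ram_gb:>2} GB RAM: {verdict}")
```

The point of the headroom term is that a box with nominally "enough" memory can still start swapping once the operating system and per-key bookkeeping take their share.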
Anyway, initially I wanted to deploy the project on Amazon EC2 because of the hyped scalability, price etc. But here comes a surprise: the performance simply sucks. I guess this is because the instances share common hardware, so the memory bandwidth actually available to you might be limited.
Here are my results of running
./redis-benchmark -n 100000
Amazon Small instance ($0.10/h)
====== PING ======
100042 requests completed in 11.95 seconds
50 parallel clients
3 bytes payload
keep alive: 1
8369.61 requests per second
====== SET ======
100023 requests completed in 12.13 seconds
50 parallel clients
3 bytes payload
keep alive: 1
8247.28 requests per second
====== GET ======
100004 requests completed in 14.26 seconds
50 parallel clients
3 bytes payload
keep alive: 1
7010.94 requests per second
====== INCR ======
100000 requests completed in 14.40 seconds
50 parallel clients
3 bytes payload
keep alive: 1
6945.89 requests per second
====== LPUSH ======
100000 requests completed in 12.24 seconds
50 parallel clients
3 bytes payload
keep alive: 1
8171.27 requests per second
====== LPOP ======
100000 requests completed in 14.22 seconds
50 parallel clients
3 bytes payload
keep alive: 1
7033.83 requests per second
The small instance is a no-go if you want to use it for Redis. Keep in mind it is AMD-based and in general the High CPU instances (with Intel Xeons) outperform their AMD brothers dramatically.
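If you want to compare runs like these programmatically, the output is easy to parse. A minimal sketch, assuming the output keeps the layout shown above (a `====== NAME ======` header followed later by a `requests per second` line); `parse_benchmark` is a hypothetical helper name of mine, not part of Redis.

```python
import re

# Sample copied from the Small-instance run above (first two sections).
SAMPLE = """\
====== PING ======
100042 requests completed in 11.95 seconds
50 parallel clients
3 bytes payload
keep alive: 1
8369.61 requests per second
====== SET ======
100023 requests completed in 12.13 seconds
50 parallel clients
3 bytes payload
keep alive: 1
8247.28 requests per second
"""

HEADER = re.compile(r"^=+\s*(\w+)\s*=+$")
RATE = re.compile(r"^([\d.]+) requests per second$")

def parse_benchmark(text):
    """Map each command name to its reported requests per second."""
    rates = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        rate = RATE.match(line)
        if rate and current is not None:
            rates[current] = float(rate.group(1))
    return rates

print(parse_benchmark(SAMPLE))
```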
Amazon High CPU Medium ($0.20/h)
====== PING ======
100007 requests completed in 6.52 seconds
50 parallel clients
3 bytes payload
keep alive: 1
15333.79 requests per second
====== SET ======
100006 requests completed in 2.22 seconds
50 parallel clients
3 bytes payload
keep alive: 1
44986.95 requests per second
====== GET ======
100009 requests completed in 2.21 seconds
50 parallel clients
3 bytes payload
keep alive: 1
45252.94 requests per second
====== INCR ======
100000 requests completed in 2.35 seconds
50 parallel clients
3 bytes payload
keep alive: 1
42625.75 requests per second
====== LPUSH ======
100009 requests completed in 2.24 seconds
50 parallel clients
3 bytes payload
keep alive: 1
44686.78 requests per second
====== LPOP ======
100011 requests completed in 2.28 seconds
50 parallel clients
3 bytes payload
keep alive: 1
43787.66 requests per second
This is much better, but still sucks. For a similar price you could get a dedicated box at SoftLayer, our current provider, with more than double the performance AND good upgrade options.
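To put the two runs above side by side, here is a small sketch that computes the per-command speedup and the throughput gained per dollar. The numbers are copied from the benchmark output and the hourly prices from the headings; the helper name `speedups` is mine.

```python
# Requests per second as reported in the two runs above.
small = {"PING": 8369.61, "SET": 8247.28, "GET": 7010.94,
         "INCR": 6945.89, "LPUSH": 8171.27, "LPOP": 7033.83}
high_cpu_medium = {"PING": 15333.79, "SET": 44986.95, "GET": 45252.94,
                   "INCR": 42625.75, "LPUSH": 44686.78, "LPOP": 43787.66}

# Hourly prices quoted in the section headings.
PRICE_SMALL = 0.10
PRICE_HIGH_CPU_MEDIUM = 0.20

def speedups(baseline, candidate):
    """Per-command throughput ratio of candidate over baseline."""
    return {cmd: candidate[cmd] / baseline[cmd] for cmd in baseline}

ratios = speedups(small, high_cpu_medium)
price_ratio = PRICE_HIGH_CPU_MEDIUM / PRICE_SMALL

for cmd, ratio in ratios.items():
    # ratio / price_ratio > 1 means more requests per dollar.
    print(f"{cmd:<6} {ratio:4.1f}x faster, {ratio / price_ratio:4.1f}x per dollar")
```

On these numbers every command except PING runs more than five times faster on the High CPU Medium instance, for twice the hourly price.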
Surprisingly, the more expensive EC2 instances could not deliver much higher performance, and were in every respect slower than any decent dedicated box. You can find more benchmarks at the Redis website. Our office quad-core server was also able to get about 100 000 inserts per second.
I know the power of Amazon is not exactly the "inexpensive hardware", but rather flexibility, the range of added services, probably easier administration… but there are kinds of services you really do not want to put in a virtualized environment. Talking to "bare metal" is extremely important when running Redis, and probably any memory-intensive software.
Also, since Redis datasets must fit in memory, it would be nice to be able to get cheap boxes (slow drives are OK) with lots of RAM. So it is worth asking whether Amazon EC2 is really the best option.
Still, I am considering running the project on EC2 for the initial period, but you really need to be careful about the choice.
How does this relate to Wikidot?
When I was testing EC2 instances with PostgreSQL installed and populated with a copy of the Wikidot.com database, I was getting only 50% of the performance of the dedicated server, even on the fastest instances, for queries that certainly used only cached data. So it looks like moving our database server to EC2 would significantly decrease our performance. At the moment that is not acceptable. This post on the Amazon forums suggests memory bandwidth problems in EC2 instances.
Previously I presented a possible migration to Amazon EC2 services. On reflection, it looks like our whole database / webserver infrastructure would need to be reconsidered to benefit from the EC2 architecture. In the end we will need to partition our datasets (sharding) and probably modify storage for uploaded files, but honestly I would rather postpone that moment as long as I can, as long as we still have plenty of options within our current setup.
BTW: A weekend (short) project is the kind of project that should take only a few days to complete, or at least to build a reasonably functional prototype. It should be fun and educational, and give a chance to explore new solutions and technologies. Ideally I would welcome more people on board.
rating: 0, tags: amazon ec2 redis sinatra
I do not know if you have read this..
I read today in Sun's "News" about their Virtual Data Center (with "cloud computing" on Solaris, Linux or Windows), under the title "Give room (make place) - Amazon…"
Have you seen this?
Here is some info:
http://www.infoworld.com/t/platforms/sun-challenges-amazon-cloud-dominance-053
http://www.informationweek.com/news/software/hosted/showArticle.jhtml?articleID=215802006&pgno=1&queryText=&isPrev
I myself have not done anything with Java or Sun systems so far..
Regards
Helmut
Service is my success. My webtips:www.blender.org (Open source), Wikidot-Handbook.
You can ask questions and contribute in the German-language » user community for Wikidot users or
in the German » Wikidot Handbook.
Interesting (new) document found:
[http://www.sun.com/offers/docs/cloud_computing_primer.pdf]
You'll probably have read by now that Redis 2.0, coming out very soon (and already available as a release candidate), gets past the requirement for all the data to fit in memory by implementing a form of virtual memory. This looks extremely promising; that requirement was previously something of a stumbling block on many projects.
Rackspace has a product with virtual servers and an API. Maybe it's not as sophisticated as Amazon's, but you could test how it performs with your application and benchmarks.
You get about 68,000 req/sec for SET on an XXL High-Mem instance with ./redis-benchmark -d 500