Zed Shaw benchmarked epoll vs poll a while back[0]. It looks like it really depends on the proportion of "active" clients to the total number. I would expect similar results for kqueue vs select (and as you point out, kqueue was horribly broken in OS X for a while).
I'll wait for AntiRez to chime in. It is possible that ZMQ has better performance than Redis, because the Redis server has to parse the request and then act on it, and then there's the marshalling on the client side; with ZMQ, you're just sending a command. I'm not sure if Redis has an efficient binary protocol, but having that may eliminate some of the bottleneck?
From the graphs it doesn't look like ZMQ has generally better performance; it depends on the number of clients and whether they run on Python or Go. It's pretty interesting:
- For up to 4 clients, (buffered) Redis is better than 0MQ in Python but worse in Go.
- For more than 4 clients it's exactly the opposite: Redis is worse in Python but better in Go.
I'd be interested to hear an explanation for this, even if it turns out that a graph line was mislabeled :)
Seems that latency was not measured. I'd expect it to be much lower in 0mq. First there is no intermediate server, and secondly you don't have to wait for the flush thread to send a message.
I think there must be something seriously wrong with zmq golang bindings if the performance plummets like that. I mean, it's significantly slower than the python version.
If all things are equal (as they appear to be here), I would go with 0MQ, if only to avoid the hassle of adding another server to manage. That being said, 0MQ is quite the black box, and that brings its own disadvantages.
I would be curious to see what would happen if gevent were added to the Python code.
In the Python flavour of these tests there's no concurrency within a single OS process and all the work is CPU bound, so adding gevent wouldn't achieve anything.
Also as njharman mentioned, you're still running a separate broker with zmq, so the number of components doesn't change.
This is a fun read, and the external links alone are worth it for me (msgpack, 0mq guide, python multiprocessing). But frankly, the conclusion says it all: "Conclusion. What can we take away from all of this? To be brutally honest, not much... ."
This is a really good question. I don't know the details of how zmq approaches buffering, but I suspect it's much smarter than the approach used by the buffered Redis client in these tests (flush every 1000 messages and every 200ms).
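For anyone who hasn't looked at the test code, the flush rule described above could look roughly like the sketch below. This is not the benchmark's actual client: BufferedPublisher, send_batch and the injectable clock are hypothetical names made up for illustration, and the age check here only runs when publish() is called, whereas a real client would more likely use a background flush thread.

```python
import time
from typing import Callable, List, Tuple

Message = Tuple[str, bytes]  # (channel, payload)


class BufferedPublisher:
    """Collects messages and hands them to send_batch in bulk.

    Hypothetical sketch: a flush happens when the buffer reaches
    max_messages, or when max_delay seconds have passed since the
    last flush (checked only inside publish()).
    """

    def __init__(
        self,
        send_batch: Callable[[List[Message]], None],
        max_messages: int = 1000,
        max_delay: float = 0.2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send_batch = send_batch
        self._max_messages = max_messages
        self._max_delay = max_delay
        self._clock = clock
        self._buffer: List[Message] = []
        self._last_flush = clock()

    def publish(self, channel: str, payload: bytes) -> None:
        self._buffer.append((channel, payload))
        too_many = len(self._buffer) >= self._max_messages
        too_old = self._clock() - self._last_flush >= self._max_delay
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer out before sending so a failing send_batch
        # cannot cause the same messages to be sent twice.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)
        self._last_flush = self._clock()
```

The point of the sketch is the trade-off: larger batches mean fewer round trips to the server, but a message can sit in the buffer for up to max_delay before it goes out, which is where the extra latency comes from.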
A few remarks:
1) It's worth trying Redis 2.6 against this. It may perform better or worse, I'm not sure, but more probably better.
2) Believe it or not, Redis Pub/Sub has never been tuned for speed, profiled, or optimized, because as far as I can tell nobody has asked for more performance: at the order of magnitude we see here with both Redis and ZMQ, it is pretty hard to hit the wall. However, there are demanding applications, so it's probably worth doing.
3) Maybe ZMQ only uses one core as well; otherwise, for an absolutely fair comparison, N Redis nodes should be used simultaneously. Pub/Sub is the kind of application where sharding is sometimes really easy: just shard by channel. In general, with Redis you have three options for distributed Pub/Sub:
Option A) Have N nodes and shard by channel.
Option B) Use replication, since messages are also PUBLISHed on the slaves.
Option C) Use Redis Cluster, although it is currently in alpha. It already propagates messages across the whole cluster, so it is very easy to implement a reliable HA Pub/Sub system with it. The propagation is not smart yet: every message is sent to every node. In Redis the cost of Pub/Sub is proportional to the number of receivers, so this is usually not a big issue, but we'll improve this aspect in the future anyway.
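To make Option A concrete, here is a minimal sketch of sharding by channel. It assumes every publisher and subscriber shares the same ordered list of nodes; node_for_channel and the node addresses are hypothetical, and CRC32 is just one stable hash that happens to be in the Python standard library, not something Redis itself prescribes.

```python
import zlib
from typing import Sequence


def node_for_channel(channel: str, nodes: Sequence[str]) -> str:
    """Pick the node responsible for a channel.

    Uses a stable hash (CRC32) rather than the built-in hash(),
    which is randomized per process for strings, so that every
    client maps a given channel to the same node.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    index = zlib.crc32(channel.encode("utf-8")) % len(nodes)
    return nodes[index]


# Hypothetical node addresses, for illustration only.
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]

if __name__ == "__main__":
    for name in ("news", "chat:42", "metrics"):
        print(name, "->", node_for_channel(name, NODES))
```

Both sides have to use the same mapping: a subscriber connects to the node chosen for its channel, and a publisher sends PUBLISH to that same node. Note that with a plain modulo like this, changing the number of nodes moves most channels to a different node, so subscribers would need to reconnect.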