I went through a lengthy evaluation process of Riak recently, and came away with a generally positive impression.
First of all, it's beautifully engineered, as long as you just need a KV store or a graph DB (I wasn't in love with the MapReduce stuff, but that's another story). None of the hassle that Hadoop/HBase have about some nodes being special (HBase Master, HDFS Namenode, etc). Also, no running multiple daemons per server (e.g., no distinction between regionserver and datanode daemons, like HBase). Easy config files, a simple HTTP API (so you can just throw an off-the-shelf load balancer like HAProxy in front of it), and lots of little things that just make life easier.
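To give a feel for how simple the HTTP API is, here is a minimal sketch that only builds the requests and does not send them. The `/riak/<bucket>/<key>` path, port 8098, and the `r`/`w` query parameters are my assumptions about the HTTP interface of that era; the host name and helper names are made up for illustration, so check the docs for your version.

```python
import urllib.parse
import urllib.request

# Assumed base URL; a load balancer such as HAProxy could sit at this address.
BASE_URL = "http://riak.example.com:8098"


def object_url(bucket, key, **params):
    """Build the URL for one object, assuming a /riak/<bucket>/<key> layout."""
    path = "/riak/{}/{}".format(
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    query = urllib.parse.urlencode(params)
    return BASE_URL + path + ("?" + query if query else "")


def build_put(bucket, key, body, content_type="text/plain", w=None):
    """Build (but do not send) a PUT request that stores `body` under the key."""
    params = {"w": w} if w is not None else {}
    return urllib.request.Request(
        object_url(bucket, key, **params),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": content_type},
    )


def build_get(bucket, key, r=None):
    """Build (but do not send) a GET request, optionally with a read quorum."""
    params = {"r": r} if r is not None else {}
    return urllib.request.Request(object_url(bucket, key, **params), method="GET")


put_req = build_put("users", "alice", "hello", w=2)
get_req = build_get("users", "alice", r=1)
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it; since every node answers the same way, any node (or the load balancer) can be the target.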
I also really like how it's very explicit about the CAP tradeoffs it makes - with powerful conflict resolution available for when consistency has been sacrificed (instead of trying to sweep the issue under the rug, like many other distributed dbs do).
However, there are a few downsides.
First, as mentioned in the article, with the default backend (a KV store called 'bitcask') all the keys per node have to fit in memory (and each key has on the order of 20 bits of overhead, IIRC). Annoyingly, this fact isn't noted anywhere in the Riak documentation that I saw (although there is a workaround: using the InnoDB backend). This won't matter too much for many use cases, but it can be pretty painful if you're not expecting it.
Second, you can guard against data loss by specifying that N copies of each piece of data are stored on your cluster. However, under some conditions (https://issues.basho.com/show_bug.cgi?id=228), the data may be replicated on fewer than N distinct nodes. So, the failure of fewer than N nodes could result in data loss.
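To illustrate how replicas can end up on fewer than N distinct nodes, here is a toy sketch of a partitioned hash ring. It assumes partitions are handed to nodes round-robin and that the N replicas go to N consecutive partitions; this is a simplification of my own, not Riak's actual claim algorithm, and not necessarily the exact condition in the linked bug.

```python
def ring_owners(num_partitions, nodes):
    """Assign partitions to nodes round-robin (a simplifying assumption)."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def preference_list(owners, start, n):
    """Owners of the n consecutive partitions starting at `start`, wrapping around."""
    return [owners[(start + i) % len(owners)] for i in range(n)]


def under_replicated(owners, n):
    """Map each starting partition whose n replicas land on fewer than n nodes
    to the set of distinct nodes that actually hold those replicas."""
    result = {}
    for start in range(len(owners)):
        distinct = set(preference_list(owners, start, n))
        if len(distinct) < n:
            result[start] = distinct
    return result


# 8 partitions over 3 nodes: the ring wraps from node "b" back to node "a",
# so the replica sets that cross the wrap-around reuse a node.
owners = ring_owners(8, ["a", "b", "c"])
bad = under_replicated(owners, n=3)
for start, nodes_hit in sorted(bad.items()):
    print("partition", start, "has 3 replicas on only", len(nodes_hit), "nodes")
```

In this toy ring, the replica sets starting at partitions 6 and 7 each touch only two physical nodes, so losing two nodes could lose all three copies. With 9 partitions over the same 3 nodes the problem disappears.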
Finally, to the best of my knowledge, the largest Riak clusters in production are around 60 nodes, while Hadoop has 2000-node clusters running in production (e.g., FB). Perfectly acceptable for most users, but just one more thing to worry about if you're planning to roll out a large cluster (more potential for 'unknown unknowns', so to speak).
I've given a few talks about NOSQL in general and Riak in particular and blogged[0] about both. I agree with all of your comments and will try to add a bit. One of the major driving design considerations of Riak is predictable scalability, meaning that each unit of hardware you add to the system returns a predictable level of performance. You see this throughout the design of the system. Homogeneity amongst the nodes - meaning that all nodes are the same and there are no special nodes - is a big win. So is tunable CAP, as you already mentioned. Another major win from my perspective is the modular nature of the code base. I think it is one of the best layouts of a large system that I have seen yet. If you look at the main Riak repo you will see that it is composed of a few submodules. Basically, most of the system is compartmentalized in this fashion. The major advantage is that development of components can proceed at their own pace and components can be reused/mixed more easily.
Re. key limitations when using the default bitcask backend:
There is a spreadsheet[1] that outlines the number of keys you can have in your system based on a few variables. Definitely worth checking out. Two things to note here. There is the hard limit of keys per machine - due to each machine's max memory - and there is the larger limit of keys per cluster based on the max memory of the cluster. Note that all this applies specifically when using the bitcask backend. There has been lots of talk about how to change this going forward and I know the Basho team is looking into it. Riak is quite interesting in that it can support a number of different backends - at the same time even. So you could have a bitcask backend for some data and a memory backend for other data. Since Riak is distributed, the implication is that beyond thinking about a single machine's resources you should also think of the cluster's total resources. In particular: total cluster memory, total CPU cores and total disk spindles, that last one being quite important but under-considered.
Re. max nodes in production:
One of the major considerations when dealing with a Dynamo-derived system like Riak is cross-node chatter. Riak does a lot of its magic by way of a gossip channel that is sending around all kinds of data. As the number of nodes in the system increases, the level of chatter increases. I think there needs to be more work in optimizing that chatter for larger cluster sizes, and I think that has been happening if you follow the change logs.
If you follow the mailing list or IRC you will notice that the primary concern amongst people new to Riak is querying. Riak has no secondary indexes outside of Riak Search, which is a separate download that is built on top of Riak (see code modularization). Riak has no native ordering. All of those things need to happen in the m/r phase. As it stands I think this is one of the major friction points to further adoption, and it is why I have consistently recommended pairing Riak with Redis whenever possible.
These limitations aside, Riak represents, IMHO, the best mix of distribution, ease of use, raw power and growth potential of the currently in production NOSQL persisted datastore offerings. Philosophically I am very much attuned to the Dynamo world view and as a strict adherent think that Riak is the best representative from that perspective. Take that for what you will.
Disclaimer - Riak and Redis are my nosql databases of choice.
I believe the limit using the current bitcask backend is (number of keys) * (40 bytes + average key size) * (replication factor) / (number of cluster nodes * memory capacity of the smallest node). If that factor grows above 1, you can't store any more.
IIRC, a 64-node cluster with 24 GB of RAM per node will handle a few billion 32-byte keys, replicated to three nodes. For larger keyspaces, the current recommendation is to use InnoDB, which doesn't need to keep keys in memory.
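As a rough sketch, the arithmetic can be written out like this. I'm reading the ratio as including the total number of keys, taking the 40 bytes of per-key overhead from the estimate above, and pretending that all of a node's RAM is available for keys, which it won't be in practice. The function names are made up for illustration.

```python
GIB = 2 ** 30  # bytes per GiB; "24 GB" is treated as 24 GiB here (an assumption)


def bitcask_memory_ratio(num_keys, avg_key_bytes, replicas, nodes,
                         node_mem_bytes, overhead_bytes=40):
    """Fraction of cluster RAM needed to hold every key; above 1 means it won't fit."""
    needed = num_keys * (overhead_bytes + avg_key_bytes) * replicas
    return needed / (nodes * node_mem_bytes)


def max_keys(avg_key_bytes, replicas, nodes, node_mem_bytes, overhead_bytes=40):
    """Largest key count for which the ratio stays at or below 1."""
    per_key = (overhead_bytes + avg_key_bytes) * replicas
    return (nodes * node_mem_bytes) // per_key


# The example above: 64 nodes, 24 GB each, 32-byte keys, 3 replicas.
limit = max_keys(avg_key_bytes=32, replicas=3, nodes=64, node_mem_bytes=24 * GIB)
print("max keys: about {:.1f} billion".format(limit / 1e9))
print("ratio at 2 billion keys: {:.2f}".format(
    bitcask_memory_ratio(2_000_000_000, 32, 3, 64, 24 * GIB)))
```

Under these assumptions the ceiling works out to roughly 7.6 billion keys, which is in line with "a few billion" once you leave headroom for everything else that needs memory.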
Given that Riak seems to implement buckets "for free" by essentially making them key prefixes under the hood, does the bucket name size need to be considered as part of the key size? E.g., if my bucket name is 32 chars and my keys in that bucket are 32 chars, should I be using 64 bytes for average key size?
This is a question I've gotten conflicting answers to.
I think 20 bits is too small. I've been looking for a source without success; the closest I've found[0] is 32-75 bytes per key which is something like 11 million keys per GB of RAM.