Congratulations on launching! It looks like a great product. Some technical ques...

bluestreak · on July 28, 2020

thank you!

- replication is in the works; it is going to be both TCP and UDP based, column-first, very fast.

- yes, benchmarks are indeed done on the second pass over the mmapped pages. The first pass would trigger IO, which is OS-driven and dependent on disk speed. We've seen well over 1.5Gb/s on disks that support this speed. Columns are mapped into memory separately and are accessed lazily, so the memory footprint depends on what data your SQL queries actually lift. We go quite far to minimize false disk reads by working with rowids as much as possible. For example, 'order by' will need memory for 8 x row_count bytes in most cases.

- durability is something we want users to have control over. Under the hood we have these commit modes:

https://github.com/questdb/questdb/blob/master/core/src/main...

NOSYNC = the OS flushes memory whenever it chooses. That said, we use a sliding 16MB memory window when writing, and flushes are triggered by unmapping pages.

ASYNC = we call msync(async)

SYNC = we call msync(sync)
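
To make the three commit modes concrete, here is a minimal Python sketch. It is not QuestDB's code: the `CommitMode` enum and `commit` helper are hypothetical names invented for illustration, and it assumes a POSIX-like system where `mmap.flush()` maps to a blocking `msync`. Python's standard library does not expose an asynchronous flush, so the ASYNC branch is only marked with a comment.

```python
import mmap
import os
import tempfile
from enum import Enum


class CommitMode(Enum):  # hypothetical enum; names taken from the comment above
    NOSYNC = "nosync"
    ASYNC = "async"
    SYNC = "sync"


def commit(mm, mode):
    """Apply a commit mode to a memory-mapped region (illustrative only)."""
    if mode is CommitMode.SYNC:
        # Blocks until the dirty pages have been written back to the file.
        mm.flush()
    elif mode is CommitMode.ASYNC:
        # Python's mmap module has no asynchronous flush; a native
        # implementation would call msync with MS_ASYNC at this point.
        pass
    # NOSYNC: do nothing and let the OS write pages back on its own schedule.


fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, mmap.PAGESIZE)          # the file must be as long as the mapping
    with mmap.mmap(fd, mmap.PAGESIZE) as mm:
        mm[:5] = b"hello"                    # write through the mapping
        commit(mm, CommitMode.SYNC)          # force the write to the file
    with open(path, "rb") as f:
        first_bytes = f.read(5)
finally:
    os.close(fd)
    os.remove(path)
```

The trade-off is the usual one: SYNC pays the flush latency on every commit, while NOSYNC is fastest but leaves the timing of the write to the OS.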

biztos · on July 28, 2020

Definitely enjoyed the story and I find the product interesting! I especially like the time-series aggregation clauses since they make it easy to "think in SQL."

I was also going to ask about replication. Any idea when it's going to be done?

Oh and kudos for the witty (previous) company name: Appsicle, haha, love that.

patrick73_uk · on July 28, 2020

Hi, I'm a QuestDB dev working on replication. We should have something working within a couple of months. If you have any questions, feel free to ask me.

roskilli · on July 28, 2020

Curious: What is your strategy on replication? Is it some form of synchronous replication or asynchronous (i.e. active/passive with potential for data loss in event of hard loss of primary)? Also curious why you might look at UDP replication, given that unless you use a protocol like QUIC on top of it, UDP replication would be inherently lossy (i.e. not even eventually consistent).

bluestreak · on July 28, 2020

The strategy is to multicast data to several nodes simultaneously. Data packets are sequenced to allow the receiver to identify data loss. When loss is detected, the receiver finds breathing space to send a NACK. The packet and the NACK identify the missing data chunk with O(1) complexity, and the sender then re-sends it. Overall this method is lossless and avoids the overhead of contacting nodes individually and sending the same data over the network multiple times. This is useful in scenarios where several nodes participate in query execution and getting them up to date quickly is important.
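
A rough Python sketch of the receiver side of such a scheme follows. This is only an illustration of sequence-number gap detection with NACKs, not the actual QuestDB protocol: the `Receiver` class and its fields are hypothetical, the network is simulated with plain function calls, and lost NACKs, timers and flow control are ignored.

```python
class Receiver:
    """Hypothetical receiver that detects gaps via sequence numbers."""

    def __init__(self):
        self.next_seq = 0       # next sequence number to deliver in order
        self.highest_seen = -1  # highest sequence number received so far
        self.pending = {}       # out-of-order packets held until gaps fill
        self.delivered = []     # payloads handed to the application, in order

    def on_packet(self, seq, payload):
        """Process one packet; return a (first, last) NACK range or None."""
        if seq < self.next_seq or seq in self.pending:
            return None  # duplicate, already handled
        self.pending[seq] = payload
        nack = None
        if seq > self.highest_seen + 1:
            # Gap detection is a single comparison, hence O(1).
            nack = (self.highest_seen + 1, seq - 1)
        self.highest_seen = max(self.highest_seen, seq)
        # Deliver everything that is now contiguous.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return nack


# Simulated run: the sender keeps what it sent so it can re-send on NACK.
sent = {0: "a", 1: "b", 2: "c", 3: "d"}
rx = Receiver()
nacks = []
for seq in (0, 2, 3):  # packet 1 is lost in transit
    nack = rx.on_packet(seq, sent[seq])
    if nack is not None:
        nacks.append(nack)
for first, last in nacks:  # sender re-sends each NACKed range
    for seq in range(first, last + 1):
        rx.on_packet(seq, sent[seq])
```

With multicast, every node runs the same receiver logic, and only the nodes that actually missed a packet send a NACK, so the sender does not have to track each node individually on the happy path.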

judofyr · on July 29, 2020

This reminds me a bit of Aeron (https://github.com/real-logic/aeron) which is a reliable UDP uni/multicast transport library with built-in flow control. It's written in Java and seems to have superb performance (I haven't used it myself). Might be an interesting alternative if you don't want to write it all yourself.