I used XTDB on a side project (written in Clojure) a couple of years ago and was blown away by something that I don't think is even stressed as a headline feature for XTDB: the ability to treat queries as first-class data. XTDB's Datalog queries can be provided as native Clojure data structures, which lets you build complex queries with ordinary code. This is much more DRY than copy-pasting and editing SQL templates, there's no ORM gumming up your performance and expressivity as your queries get more complex, and there's no half-baked query-composition API: the _programming language_ is your tool for query composition. I couldn't tell you whether there are other ways of getting this, as I'm not really a software engineer anymore, but seeing how easy it could be to write a database layer was jaw-dropping, and far and away the best experience I've ever had writing one in any programming language.
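To make "queries as data" concrete, here's a minimal sketch in XTDB 1.x Datalog syntax. The `:person/name` and `:person/age` attributes and the `min-age` parameter are invented for illustration; the commented call assumes `xtdb.api` is aliased as `xt`:

```clojure
;; The query is just an EDN map, so composing queries is ordinary
;; data manipulation -- no string splicing.
(def base-query
  '{:find  [?name]
    :where [[?p :person/name ?name]]})

(defn with-min-age [query]
  ;; Add a clause and an input parameter to an existing query.
  (-> query
      (update :where conj '[?p :person/age ?age] '[(>= ?age min-age)])
      (update :in (fnil conj []) 'min-age)))

;; (xt/q (xt/db node) (with-min-age base-query) 21)
```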
I suppose what's at the heart of this, more than anything, is that Datalog has a near-trivial syntax that can transparently be accommodated by two simple data structures: lists and hash tables. Not so with SQL, whose COBOL-y nature obscures its syntax and makes it less straightforward to represent in everyday data structures.
In Clojure-land, we are also using HoneySQL [1] which has similar characteristics. You are still working within SQL semantics so it's a bit more complicated, but we are doing great complicated things with just maps, no API necessary.
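A rough sketch of what that looks like with HoneySQL's v2 API (`honey.sql/format`); the `users` table and columns here are made up:

```clojure
(require '[honey.sql :as sql])

;; The query is a plain map...
(def base
  {:select [:u.id :u.name]
   :from   [[:users :u]]})

;; ...so "composition" is just a function over maps.
(defn only-active [q]
  (update q :where
          (fn [w] (if w
                    [:and w [:= :u.active true]]
                    [:= :u.active true]))))

;; Render to a parameterized SQL vector:
;; (sql/format (only-active base))
```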
We use XTDB 1 and have a multi-replica production instance with about 600GB of data across ~100 million transactions. The biggest problem we have is minor versions bumping the underlying RocksDB index version, requiring a full reindex of the main "golden" data store, which takes days to weeks.
AFAIK this is not solved in XTDB 1. I'm hoping XTDB 2 provides a better upgrade path for modestly sized databases like ours.
XTDB 2 will compute and maintain incremental indexes on-the-fly based on the raw data, so index updates will present far less operational impact. This also means the transaction log is now ~ephemeral in the new architecture (no more event-sourcing-style replays required, ever).
I don't want to distract from the main points in the slightest (this looks really cool) - but I'm just curious what the business plan is? The main landing page offers booking a free demo.. but I didn't really get where they start making money?
The whole thing looks really nice and polished. JUXT releases some really cool open-source stuff (I use tick all the time). But.. is there some catch? Like a PRO version or something? How are they paying the bills?
XTDB is a very interesting database to me. Not only is it distributed, but the 'bi-temporality' mentioned on this page means that it essentially has a built-in audit trail mechanism. I'm working on writing an RDF library for Clojure, and am looking forward to seeing whether XTDB really is as good an RDF datastore as I suspect it is!
After seeing what seems like oodles of "sqlite or rocksdb plus a bespoke cross-machine coordinator" databases posted on HN, is this the new employment-guarantee architecture? Do any of these have an actual advantage over something like ScyllaDB? (I'm leaving Cassandra out because its GC pauses are probably a problem.)
RocksDB in particular is a single node SSD-tuned log-structured merge tree "database" written in a non-GC language. ScyllaDB is the same, but comes with the cassandra-protocol scaling and multiple datacenters.
Sure, go ahead and market your whatever, but if you're just (IMO) repackaging a log-structured merge tree engine with various newly implemented distributed transaction / conflict resolution coordination schemes .... you should probably show some good due diligence on whether they are, you know, somewhat correct.
The Jepsen test suite basically shows every distributed data processing engine as flawed the first couple of times. So much so that I don't really trust your newfangled engine unless you pass Jepsen to some degree.
Are there any comparison metrics (lies, big lies, and benchmarks and all that) showing why these semi-bespoke databases are being used over other, say, more proven schemes?
XTDB is a deterministic, single-writer system that leans on existing tech to handle all the hard distributed & durability aspects (e.g. piggy-backing on Kafka for ACID + HA), so Jepsen would have relatively limited applicability but it couldn't hurt to sanity check that!
I generally sympathise with your cynicism, though: distributed consistency is much harder to get correct than most people realise. My own attraction to niche databases is the promise of better abstractions - the world desperately needs those.
My understanding is that XTDB is either a fork of Datomic, or was inspired by it? I suppose it's timely, given the top story on HN right now is "Datomic is Free" [0]
Is there anyone who's familiar with both and would like to share their thoughts on the two?
Datomic's "immutable database + Datalog" was definitely the biggest source of inspiration but XTDB has a lot of differences - I wrote this FAQ entry up a few years back: https://docs.xtdb.com/resources/faq/#comparisons (note this comparison is for XTDB 1.x and Datomic On-Prem only, so needs revisiting!)
Arguably the biggest difference is that XTDB has a schemaless, dynamic core. Having a sane schema is of course important when building complex things, but we believe that the flexibility to experiment with schema (fully) in userspace is essential for moving the state of the art forwards. For instance, a lot of Clojure users these days prefer relying on https://github.com/metosin/malli for _all_ their schema needs.
They have completely different data models though, XTDB is a schemaless document data store, think mongodb but with SQL/Datalog querying and bi-temporality.
Datomic's data model is something they call the 'universal schema' where you just specify the attributes and then create entities however you like from them. Datomic aligns more with how you structure data in Clojure, avoiding the impedance mismatch.
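For a flavour of that 'universal schema', here's a sketch of Datomic attribute definitions (the `:person/*` names are invented; the commented transact call assumes `datomic.client.api` aliased as `d`):

```clojure
;; Attributes are defined once, as plain data; any entity may then
;; use any combination of them -- there are no tables.
(def schema
  [{:db/ident       :person/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :person/friends
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/many}])

;; (d/transact conn {:tx-data schema})
```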
hey, I did check the wiki, but I didn't clearly understand how it connects with a relational database, hence I asked here.
one more benefit of asking here is that people also share their practical experiences and how it works out in the wild. Such stuff is usually missing from the wiki.
Given a person, get a set of the names of that person's friends.
Incidentally, all variables (prefixed with ? by convention, not requirement) are valid inputs or outputs from the query. So for the same query, you could give `?friend-name` and extract `?person`, to get the IDs of the people who are friends with a person of a particular name. This just requires a change in the inputs and outputs, not in the query itself.
The format is [ID field value]. If `value` happens to be an ID, you can navigate by it.
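As a concrete sketch in XTDB 1.x Datalog syntax (the `:person/friends` and `:person/name` attributes are invented for illustration):

```clojure
;; Names of a given person's friends:
'{:find  [?friend-name]
  :in    [?person]
  :where [[?person :person/friends ?friend]
          [?friend :person/name ?friend-name]]}

;; Swapping :find and :in reverses the direction without touching
;; :where -- people who are friends with someone of a given name:
'{:find  [?person]
  :in    [?friend-name]
  :where [[?person :person/friends ?friend]
          [?friend :person/name ?friend-name]]}
```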