I don't hate AI. What I hate is that, while billionaires promise us a utopian future where work is optional, the price of food, housing, and healthcare in the USA is through the roof. Many people my age (millennials) cannot afford to buy a house the way prior generations could. The supposed riches being produced by AI are not being realized by the majority of Americans.
100% agree. There's no substance to the argument, just the same cultish rhetoric from an aristocracy trying to con us into thinking that the mass layoffs they push, and their efforts to drive down the price of labor, are actually meant to benefit us in the long run. "Just wait," they say. Once the AI future comes to fruition, you will eat "peaches and cream" and "bask in the sun all day." "You will dance in this utopian paradise." How could anyone possibly take this seriously? Are we expected not to see, plainly, the self-interested agenda being dressed up in the language of collective uplift?
It's not uncommon (Hex.ai and others all do this, as do developers, MCP tools, etc.). One thing we do at Ardent is enable obfuscated read replicas. We can strip PII in the replicas, so your agents are operating on realistic (but not sensitive) data. Moreover, they do so without impacting your production database, and the process is fast enough to wire into your CI/CD pipelines.
Jeremy is correct, though. The main risk is agents with write access. There have been two high-profile instances in the last year of agents dropping production databases (in one case, even after being given explicit instructions never to do such a thing). While read replicas of a primary DB solve the "agents can't destroy things" problem, they don't solve things like testing schema migrations (in particular) or updates to the data.
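To make "strip PII in the replicas" a bit more concrete, here is a minimal sketch of one common approach: deterministic masking with a keyed hash. This is not a description of how Ardent does it. The key, the column names, and the `mask_row` helper are all assumptions made up for illustration.

```python
import hashlib
import hmac

# Hypothetical per-environment secret; in practice it would live in a secret store.
MASKING_KEY = b"replace-with-a-real-secret"


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Map a sensitive string to a stable, non-reversible token."""
    digest = hmac.new(key, value.strip().lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_email(email: str, key: bytes = MASKING_KEY) -> str:
    """Replace an email with a fake one that still looks like an email."""
    return f"user_{mask_value(email, key)}@example.com"


def mask_row(row: dict, pii_columns: dict) -> dict:
    """Return a copy of `row` with each PII column passed through its masker."""
    masked = dict(row)
    for column, masker in pii_columns.items():
        if masked.get(column) is not None:
            masked[column] = masker(masked[column])
    return masked


# Hypothetical rule set: column name -> masking function.
RULES = {"email": mask_email, "full_name": mask_value}

row = {"id": 7, "email": "Ada@Example.org", "full_name": "Ada Lovelace", "plan": "pro"}
masked = mask_row(row, RULES)
print(masked)
```

Because the same input always maps to the same token, joins and uniqueness constraints on masked columns keep working, which is part of what keeps the data "realistic" for an agent.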
Jedberg... Wow an internet legend replied to me! ><
> I'm much more worried about people who give full write access to their agents! But at least this solves that problem.
Yeah it goes without saying that write access would be crazy... But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.
> Branch anonymization
Branches default to a full copy of your production data.
<-- This doesn't seem a safe default to me...
Perhaps a data policy should be required to be in place before a branch can be cloned... A default configuration that gives the LLM full prod data access is a bad standard to set, I think.
> Jedberg... Wow an internet legend replied to me!
Hey, I put on my pants the same way you do: by having my staff hold them up while I jump into them.
> But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.
This isn't quite as risky as it seems. All of them have a TOS that says if you pay them enough money they won't train on your data. But you're right that there are probably a lot of people who aren't on those plans sharing private data.
> > Branch anonymization Branches default to a full copy of your production data.
> <-- This doesn't seem a safe default to me...
Agreed, and I'm sure it will cause trouble if you don't also bring along with the copies the internal controls around access logging.
But also, for smaller companies, this isn't an issue since they don't have SOC2 and the other compliance needs yet. So it's probably a sane starting place for Ardent at this time. Most small startups let everyone in the company access the full database anyway.
> Perhaps a data policy should be required to be in place before a branch can be cloned... A default configuration that gives the LLM full prod data access is a bad standard to set, I think.
Or at least an easy way to copy it from the database you're branching from.
>> I'm sure it will cause trouble if you don't also bring along with the copies the internal controls around access logging
Yep! Agreed. We've tried to combat this by making the "branch_hooks" team/org-level policy objects, so we can enforce policies of any kind on branches before they're ever handed to users. This would be things like access control plus defined anonymization rules. The broader hope is that this class of objects/policies can serve as enforcement barriers and essentially allow scoped access at the org level across branches.
The proxy we run in the middle also helps a lot here. Since the URL is minted by our control plane and is not the "real" DB URL, we can authenticate each user from the URL they're using and enforce RBAC controls.
For example:
User 1's API key is 1234
The CLI can auto-construct URLs like: postgresql://{APIKEY}:{ANYTHING}@{IDENTIFIER}--postgres.routing.tryardent.com:5432/DB_NAME?{params}
Your API key is something that can be scoped per user
This is an off-the-cuff example, but essentially we have a way of knowing who is calling the host, so we can enforce rules like "if APIKEY = X, you can't access this DB", based on whatever policies are defined.
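As a rough sketch of what that check could look like on the proxy side (a guess at the shape, not Ardent's actual code): pull the API key and branch identifier out of the minted URL, then look them up in a policy table. The `ALLOWED_BRANCHES` table, the branch names, and the `authorize` function are hypothetical. Only the URL layout comes from the example above.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urlsplit

# Host suffix taken from the URL layout in the example above.
ROUTING_SUFFIX = "--postgres.routing.tryardent.com"

# Hypothetical policy table: API key -> branch identifiers that key may reach.
ALLOWED_BRANCHES = {
    "1234": {"feature-x"},
    "5678": {"feature-x", "staging"},
}


@dataclass(frozen=True)
class Decision:
    api_key: str
    identifier: str
    database: str
    allowed: bool


def authorize(conn_url: str) -> Decision:
    """Decide whether the caller encoded in a minted URL may reach the branch."""
    parts = urlsplit(conn_url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname or ""
    if not host.endswith(ROUTING_SUFFIX):
        raise ValueError(f"not a routing host: {host!r}")

    api_key = unquote(parts.username or "")  # the username slot carries the API key
    identifier = host[: -len(ROUTING_SUFFIX)]  # everything before the suffix
    database = parts.path.lstrip("/")
    allowed = identifier in ALLOWED_BRANCHES.get(api_key, set())
    return Decision(api_key, identifier, database, allowed)


url = "postgresql://1234:anything@feature-x--postgres.routing.tryardent.com:5432/app?sslmode=require"
print(authorize(url))
```

The password slot is ignored here, matching the `{ANYTHING}` placeholder. A real proxy would also need to handle key revocation, rate limiting, and audit logging.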
Curious to understand what additional pieces would be helpful here because this is 100% very important to get right.
AI Datacenters are not how all or probably even most HNers get paid...
> Most data centers colo multiple types of compute, not just those dedicated to inference or model training. Additionally, strangling the economics of the infrastructure layer makes entire ecosystems move abroad.
Sure, but we are talking about whether the enormous investment into AI infrastructure is prudent or not. Also, I reckon most people on here made a living just fine before everything moved to remote data centers, and many if not most HNers' workloads could run on individual machines... But that's another conversation.
I think language grammars are an interesting way to define a ruleset too. Forget REST APIs or MCP servers for a second... Define a domain-specific language, and let the language model generate a valid instruction within the confines of that grammar.
Then pass the program along: your server or application can parse the instructions and work from the generated AST to do all sorts of interesting things, within the confines of your language features.
It's verifiable, since the output has to parse under the grammar you defined, using the parser you provide.
It is implicitly sandboxed by the powers you give (or rather exclude) to your runtime via an interpreter/compiler.
I've tried this before with a grammar I defined for searching documents, and found the model to be quite good at creating valid, often complex, search instructions.
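Here's a toy version of the idea, assuming a tiny search grammar I made up for illustration (`field:value` terms combined with AND, OR, NOT, and parentheses). It isn't the grammar I actually used. Anything the model emits either parses into an AST or gets rejected, and the evaluator can only do what the AST node types allow.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar:
#   or_expr  := and_expr ("OR" and_expr)*
#   and_expr := not_expr ("AND" not_expr)*
#   not_expr := "NOT" not_expr | "(" or_expr ")" | field ":" value
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|NOT\b|[a-z_]+:[\w-]+)")


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class Not:
    operand: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:  # anything outside the grammar is rejected here
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_query(text):
    tokens, i = tokenize(text), 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take():
        nonlocal i
        if i >= len(tokens):
            raise ValueError("unexpected end of query")
        i += 1
        return tokens[i - 1]

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = Or(node, and_expr())
        return node

    def and_expr():
        node = not_expr()
        while peek() == "AND":
            take()
            node = And(node, not_expr())
        return node

    def not_expr():
        tok = take()
        if tok == "NOT":
            return Not(not_expr())
        if tok == "(":
            node = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if ":" in tok:
            return Term(*tok.split(":", 1))
        raise ValueError(f"unexpected token {tok!r}")

    node = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return node


def matches(node, doc):
    """Evaluate the AST against one document (a dict of field -> text)."""
    if isinstance(node, Term):
        return node.value.lower() in str(doc.get(node.field, "")).lower()
    if isinstance(node, Not):
        return not matches(node.operand, doc)
    if isinstance(node, And):
        return matches(node.left, doc) and matches(node.right, doc)
    return matches(node.left, doc) or matches(node.right, doc)


ast = parse_query("title:postgres AND NOT (tag:draft OR tag:internal)")
print(matches(ast, {"title": "Postgres branching", "tag": "public"}))
```

The "sandbox" is just the fact that `matches` knows four node types and nothing else: there is no node for writes, deletes, or arbitrary code, so a model can't ask for them.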
Previously I made a Chrome extension that removes them from the web... But I haven't updated it in a while. It basically just inspects the HTML/CSS patterns of the Shorts components and removes them from the page. You could probably code/vibe code a similar extension in 10 minutes.