Ask HN: Do you have any evidence that agentic coding works?

xsh6942 · 2026-01-21T09:55:11 1768989311

It really depends on what you mean by "it works". Here's a retrospective of the last 6 months.

I've had great success coding infra (Terraform). It at least 10x'd the generation of easily verifiable, tedious-to-write code. Results were audited to death, as the client was highly regulated.

Professional feature dev is hit and miss for sure, although it's getting better and better. We're nowhere near full agentic coding. However, by reinvesting the speed gains from not writing boilerplate into devex and tests/security, I bring to life much better quality software: maintainable and a joy to work with.

I suddenly have the homelab of my dreams, all the ideas previously in the "too long to execute" category now get vibe coded while watching TV or doing other stuff.

As an old, jaded engineer, I found everything code-related was getting a bit boring and repetitive (so many REST APIs). I guess you get the most value out of it when you know exactly what you want.

Most importantly though, and I've heard this from a few other seniors: I've found joy in making cool, fun things with tech again. I like this new way of creating stuff at the speed of thought, and I guess for me that counts as "it works".

raphaelj · 2026-01-21T11:07:23 1768993643

Same experience here.

On some tasks like build scripts, infra and CI stuff, I am getting a significant speedup. Maybe I am 2x faster on these tasks, when measured from start to PR.

I am working on an HPC project[1] that requires more careful architectural thinking. Trying to let the LLM do the whole task most often fails or produces low-quality code (even with top models like Opus 4.5).

What works well though is "assisted" coding. I am usually writing the interface code (e.g. headers in C++) with some help from the agent, and then let the LLM do the actual implementation of these functions/methods. Then I do final adjustments. Writing a good AGENTS.md helps a lot. I might be 30% faster on these tasks.
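A rough Python analogue of that interface-first split, assuming an abstract base class plays the role of the C++ header: the human writes the signatures and the contract in docstrings, and the agent is asked to fill in a concrete class. `ChunkScheduler` and its `split` method are hypothetical and not taken from the linked project.

```python
from abc import ABC, abstractmethod


class ChunkScheduler(ABC):
    """Hypothetical interface, written by a human first (the "header")."""

    @abstractmethod
    def split(self, n_items: int, n_workers: int) -> list[range]:
        """Split range(n_items) into at most n_workers contiguous, non-empty chunks
        whose sizes differ by at most one. Raise ValueError on bad arguments."""


class EvenChunkScheduler(ChunkScheduler):
    """Implementation the agent is asked to fill in against the contract above."""

    def split(self, n_items: int, n_workers: int) -> list[range]:
        if n_items < 0 or n_workers < 1:
            raise ValueError("need n_items >= 0 and n_workers >= 1")
        workers = min(n_workers, n_items)  # never produce empty chunks
        if workers == 0:
            return []
        base, extra = divmod(n_items, workers)
        chunks: list[range] = []
        start = 0
        for i in range(workers):
            size = base + (1 if i < extra else 0)  # first `extra` chunks get one more item
            chunks.append(range(start, start + size))
            start += size
        return chunks
```

The docstring on the abstract method is the part worth reviewing carefully, because it is what the agent implements against.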

It seems to match what I see from the PRs I am reviewing: we are getting these slightly more often than before.

---

[1] https://github.com/finos/opengris-scaler

BrandoElFollito · 2026-01-21T10:18:04 1768990684

> I guess you get the most value out of it when you know exactly what you want.

Oh yes. I have been developing as an amateur for 35 years, and when I vibe code I let the basic, generic stuff happen and then tell the AI to refactor it the way I want. It usually works.

I had the same "too boring to code" attitude and AI was a revelation. It takes away the typing but, when used correctly, leaves room for the creative part. I love this.

spopejoy · 2026-01-21T15:45:56 1769010356

The OP's question was about agentic utility specifically. I've also gotten great side-project utility from AI codegen without having to marry my project to CC (Claude Code) or give up on looking at code, simply by prompting whatever LLM when I need something.

Nothing wrong with CC, but I keep hearing the same kind of app being built -- home automation, side-project CRUD.

What I'm deeply skeptical of is the ability of agentic coding to integrate with a team maintaining and shipping a critical offering. If you're using LLMs for one-off PRs, great, but then agentic seems like a band-aid for memory etc.

Meanwhile, if you're fully CC/agentic, it seems like a team would get out of sync.

theshrike79 · 2026-01-21T11:32:54 1768995174

> I suddenly have the homelab of my dreams, all the ideas previously in the "too long to execute" category now get vibe coded while watching TV or doing other stuff.

This is the true game changer.

I have a large-ish NAS that's not very well organised (I'm trying; it's a consolidated mess of different sources from two decades, but at least they're all in the same place now).

It was faster to ask Claude to write me a search database backend + frontend than to click through the directories and wait for the slow SMB shares to update to find that one file I knew was in there.

Now I have a Go backend that crawls my NAS every night and indexes files into an FTS5 SQLite database with minimal metadata (size + mimetype + mtime/ctime), and a simple web frontend I can use to query the database.

...actually I kinda want a cli search tool that uses the same schema. Brb.

Done.
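A minimal sketch of the nightly indexer plus the query side, written in Python rather than the poster's Go. It assumes the bundled `sqlite3` module was built with FTS5, which most builds are. The `files` table, its columns and the `search` helper are illustrative guesses, not the actual schema.

```python
import mimetypes
import os
import sqlite3


def build_index(db: sqlite3.Connection, root: str) -> int:
    """Walk `root` and rebuild an FTS5 table with one row per regular file."""
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS files USING fts5("
        "path, size UNINDEXED, mimetype UNINDEXED, mtime UNINDEXED)"
    )
    db.execute("DELETE FROM files")  # simple full rebuild on every nightly run
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # vanished or unreadable entry on the share: skip it
            mime, _encoding = mimetypes.guess_type(name)
            db.execute(
                "INSERT INTO files (path, size, mimetype, mtime) VALUES (?, ?, ?, ?)",
                (full, st.st_size, mime or "application/octet-stream", int(st.st_mtime)),
            )
            count += 1
    db.commit()
    return count


def search(db: sqlite3.Connection, query: str, limit: int = 20) -> list[tuple]:
    """Full-text match on path tokens, best matches first (FTS5's built-in rank)."""
    return db.execute(
        "SELECT path, size, mimetype FROM files WHERE files MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Only the path is full-text indexed. The metadata columns are `UNINDEXED`, so they are stored and returned without bloating the index. A CLI "using the same schema" would then just open the same database file and call `search`.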

AI might be a bubble etc., but I'll still have that search tool (and two dozen other utilities) in 5 years, when the Claude monthly subscription is 2000€ plus the right to harvest your organs on non-payment.

martinosis · 2026-01-23T17:41:41 1769190101

This is exactly where LLMs shine, but when I get to a larger project, everything falls apart, since most of the time the application gets way too complex because the LLM tries to guess what you want. This is OK for small projects but quite bad for larger ones.

theshrike79 · 2026-01-23T17:58:38 1769191118

Depends on so many things. Like the definition of “large” and what you’re asking the LLM to do and how the project is set up for LLM use.

It doesn’t need to guess if it has the tools and documentation available.

donw · 2026-01-21T10:09:50 1768990190

Same here. You have to slice things small enough for the agent to execute effectively, but beyond that, it’s magic.

hahahahhaah · 2026-01-26T02:53:21 1769396001

Terraform is a great use case:

* Unrefactorable and highly boilerplatey

* Probably too big a job and low impact to rewrite as IaC

* AI can do all that tedious plumbing well

* Since the result is a deployment, not executable code, it suffices to check that the correct resources are created.
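For the last point, here is a hedged Python sketch of what "check that the correct resources are created" could look like. It assumes the plan was exported with `terraform plan -out=plan.out` followed by `terraform show -json plan.out`, and it reads the `resource_changes` entries of Terraform's JSON plan format. The resource addresses you compare against are up to you.

```python
import json


def planned_creates(plan: dict) -> set[str]:
    """Addresses of resources the plan will create, replacements included."""
    created = set()
    for rc in plan.get("resource_changes", []):
        # Replacements show up as ["delete", "create"] or ["create", "delete"].
        if "create" in rc.get("change", {}).get("actions", []):
            created.add(rc["address"])
    return created


def check_plan(plan_json: str, expected: set[str]) -> list[str]:
    """Compare planned creations with the set you expect; return a list of problems."""
    actual = planned_creates(json.loads(plan_json))
    problems = [f"missing: {addr}" for addr in sorted(expected - actual)]
    problems += [f"unexpected: {addr}" for addr in sorted(actual - expected)]
    return problems
```

An empty list means the plan creates exactly what you asked for, which is a much smaller thing to audit than the generated HCL itself.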

andy_ppp · 2026-01-21T10:17:35 1768990655

I honestly find AI quite poor at writing good well thought through tests, potentially because:

1. writing testable code is part of writing good tests

2. testing is actually poorly done in all the training data because humans are also bad at writing tests

3. tests should be more focused around business logic and describing the application than arbitrarily testing things in an uncanny valley of AI slop

theshrike79 · 2026-01-21T11:40:29 1768995629

When Vibe coding/engineering I don't think of tests in the same way as when testing human written code.

I use unit tests to "lock down" current behavior so an agent rummaging around feature F doesn't break features A and B and will get immediate feedback if that happens.

I'm not trying to match every edge case, but focus more on end-to-end tests where input and output are locked golden files. "If this comes in, this exact thing must come out the other end." type of thing.

The AI can figure out what went wrong if the tests fail.
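Here is a minimal golden-file test in Python along those lines. `render_report` is a hypothetical stand-in for whatever feature is being locked down. The `UPDATE_GOLDEN` environment variable is a common convention for intentionally re-recording the expected output, not something from the comment.

```python
import json
import os
from pathlib import Path


def render_report(records: list[dict]) -> str:
    """Hypothetical feature under test: turn input records into a text report."""
    lines = [f"{r['name']}: {r['qty']}" for r in sorted(records, key=lambda r: r["name"])]
    lines.append(f"TOTAL: {sum(r['qty'] for r in records)}")
    return "\n".join(lines) + "\n"


def check_golden(input_path: Path, golden_path: Path) -> None:
    """Require render_report's output for input_path to match golden_path exactly.

    If the golden file does not exist yet, or UPDATE_GOLDEN=1 is set, the current
    output is written as the new golden file instead (review that diff by hand).
    """
    actual = render_report(json.loads(input_path.read_text()))
    if os.environ.get("UPDATE_GOLDEN") == "1" or not golden_path.exists():
        golden_path.write_text(actual)
        return
    expected = golden_path.read_text()
    if actual != expected:
        raise AssertionError(
            f"{golden_path} changed:\n--- expected\n{expected}--- actual\n{actual}"
        )
```

The full expected/actual dump in the failure message is deliberate: it is the feedback the agent reads when a change to feature F breaks features A or B.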

andy_ppp · 2026-01-21T14:50:55 1769007055

Yeah, I need to start accepting to some degree that the world has changed - in the past when I wanted to understand a system I'd have read the tests, but with AI I can just ask Cursor to explain what the code is doing and it's fairly good at explaining the functionality to me.

I'm not sure I feel truly comfortable yet with huge blocks of code that are not cleanly understood by humans but it's happening whether I like it or not.

resonious · 2026-01-21T06:43:37 1768977817

I think one fatal flaw is letting the agent build the app from scratch. I've had huge success with agents, but only on existing apps that were architected by humans and have established conventions and guardrails. Agents are really bad at architecture, but quite good at following suit.

Other things that seem to contribute to success with agents are:

- Static type systems (not tacked-on like Typescript)

- A test suite where the tests cover large swaths of code (i.e. not just unit testing individual functions; you want e2e-style tests, but not the flaky browser kind)

With all the above boxes ticked, I can get away with only doing "sampled" reviews. I.e. I don't review every single change, but I do review some of them. And if I find anything weird that I had missed in a previous change, I tell it to fix it and give the fix a full review. For architectural changes, I plan the change myself, start working on it, then tell the agent to finish.

BatteryMountain · 2026-01-21T09:11:38 1768986698

C# works great for agents, but it works due to established patterns, a strict compiler and strong typing, the "Treat Warnings as Errors" compiler flag, and an .editorconfig with many rules plus enforcement of them. You have to tell it to use async where possible, to do proper error handling and logging, to add XML comments above complex methods, and so on. It works really well once you've got it figured out. It also helps to give it separate but focused tasks, so I have a todo.txt file that it can read to keep track of tasks. Basically, you have to be strict with it. I cannot imagine how people trust outputs for Python/JavaScript, as there is no static typing or compiler involved, maybe just some linting rules that can save you. Maybe TypeScript in strict mode can work, but then you have to be a purist about it and watch it like a hawk, which will drain you fast. C# + Claude Code works really well.

tcgv · 2026-01-21T12:56:41 1769000201

Upvote.

That's my experience too. Agent coding works really well for existing codebases that are well-structured and organized. If your codebase is mostly spaghetti—without clear boundaries and no clear architecture in place—then agents won't be of much help. They'll also suffer working in those codebases and produce mediocre results.

Regarding building apps and systems from scratch with agents, I also find it more challenging. You can make it work, but you'll have to provide much more "spec" to the agent to get a good result (and "good" here is subjective). Agents excel at tasks with a narrower scope and clear objectives.

The best use case for coding agents is tasks that you'd be comfortable coding yourself, where you can write clear instructions about what you expect, and you can review the result (and even make minor adjustments if necessary before shipping it). This is where I see clear efficiency gains.

theshrike79 · 2026-01-21T11:43:38 1768995818

I've found Go to be the most efficient language with LLMs.

The language is "small": very few keywords, and it hasn't changed much in a decade. It also has a built-in testing system with well-known patterns for how to use it properly.

Along with robust linters I can be pretty confident LLMs can't mess up too badly.

They do tend to overcomplicate structures a bit and require a fresh context and "see if you can simplify this" or "make those three implement a generic interface" type of prompts to tear down some of the repetition and complexity - but again it's pretty easy with a simple language.

nl · 2026-01-21T08:40:31 1768984831

TypeScript is a great type system for agents to use. It's expressive and the compiler is much faster than Rust's, so turnaround is much quicker.

I'm slowly accepting that Python's optional typing is a mistake with AI agents, and with human coders too. It's too easy for a type to be wrong, and if someone doesn't have type checking turned on, that mistake propagates.

maleldil · 2026-01-21T13:42:19 1769002939

> I'm slowly accepting that Python's optional typing is a mistake with AI agents

Don't make it optional, then. Use pyright or mypy in strict mode. Make it part of your lint task, have the agent run lint often, forbid it from using `type: ignore`, and review every `Any` and `cast` usage.

If you're using CI, make a type error cause the job to fail.

It's not the same as using a language with a proper type system (e.g. Rust), but it's a big step in the right direction.
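Here is a sketch of the "forbid `type: ignore`, review every `Any` and `cast`" rule as a small standard-library script that an agent or CI job could run next to mypy/pyright in strict mode. `find_type_escapes` and `main` are hypothetical names. Matching is by spelling only, so treat it as a starting point rather than a complete check.

```python
import ast
import io
import tokenize


def find_type_escapes(source: str, filename: str = "<string>") -> list[str]:
    """Report `# type: ignore` comments and uses of `Any` and `cast(...)`.

    Matching is by name only (`Any`, `typing.Any`, `cast`, `typing.cast`), so
    aliased imports such as `from typing import Any as A` are not detected.
    """
    hits: list[tuple[int, str]] = []

    # Comments never reach the AST, so scan the token stream for them.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and "type: ignore" in tok.string:
            hits.append((tok.start[0], "type: ignore"))

    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Name) and node.id == "Any":
            hits.append((node.lineno, "Any"))
        elif isinstance(node, ast.Attribute) and node.attr == "Any":
            hits.append((node.lineno, "Any"))
        elif isinstance(node, ast.Call):
            func = node.func
            if (isinstance(func, ast.Name) and func.id == "cast") or (
                isinstance(func, ast.Attribute) and func.attr == "cast"
            ):
                hits.append((node.lineno, "cast"))

    return [f"{filename}:{line}: {kind}" for line, kind in sorted(hits)]


def main(paths: list[str]) -> int:
    """Print every finding in the given files; return 1 if there were any, else 0."""
    problems: list[str] = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            problems += find_type_escapes(fh.read(), path)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

In CI this could be wired up as `sys.exit(main(sys.argv[1:]))` in a script, so any finding fails the job just like a type error would.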

davidfstr · 2026-01-21T09:07:35 1768986455

You should not be using Python types without a type checker in use to enforce them.

With a type checker on, types are fantastic for catching missed cases early.
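Here is a tiny illustration of the point. CPython records annotations but never enforces them, so without a type checker the bad call below runs and quietly returns the wrong kind of value. `double` is a made-up example.

```python
def double(n: int) -> int:
    """Annotated as int -> int, but nothing checks that at runtime."""
    return n * 2


# Both calls run without error. Only a static checker (mypy, pyright) would
# reject the second one, which silently returns a string instead of a number.
ok = double(21)       # 42
wrong = double("21")  # "2121"
```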

K0IN · 2026-01-21T09:04:22 1768986262

Same for TypeScript: by default you still get `any`. The best case (for humans and LLMs) is a strict linter that gives you feedback on what is wrong. But then (and I've seen this a couple of times with inexperienced devs), you or the AI has to know to write a strict linter config and use it, and someone without much coding knowledge may not be aware of it and thus won't ask for it.

resonious · 2026-01-21T11:30:49 1768995049

Whenever I have an agent use Typescript, they always cast things to `any` and circumvent the types wherever convenient. And sometimes they don't even compile it - they just run it through Bun or similar.

I know I can configure tools and claude.md to fix this stuff but it's a drag when I could just use a language that doesn't have these problems to begin with.

embedding-shape · 2026-01-21T11:13:27 1768994007

> I'm slowly accepting that Python's optional typing is a mistake with AI agents, and with human coders too. It's too easy for a type to be wrong, and if someone doesn't have type checking turned on, that mistake propagates.

How would you end up using types but not have any type checking? What's the point of the types?

barnabee · 2026-01-21T17:26:19 1769016379

I’m currently experimenting (alongside working as usual) with a reasonably non-trivial Rust project that will be designed, “project managed”[0], built, and tested by LLM agents (mostly Claude, via OpenCode), based on me providing high-level requirements and then prompting it to complete things, as well as course correcting (rule: I don’t edit the code, specifications, or tasks directly).

It’s too early to tell how it will work out but things are going better than I expected. It’s probably 20% built after a couple of days, in which I’ve mostly done other work, and it’s working for quite long periods without input from me.

When I do have to provide input, the prompt is often just “Continue working according to the project standards and rules”.

I have no idea if it’ll meet the requirements. I didn’t expect it to get this far, but a month or two ago I didn’t think the chances were high enough to even make it worth trying.

[0] I asked it to create additional documentation for project standards and rules, to be referred to only when needed (referenced from AGENTS.md). This included the git workflow, maintaining a set of specifications, and an overall ROADMAP.md as well as TASKS.md (detailed next steps from the roadmap) and STATUS.md (status of each of the tasks).

malloryerik · 2026-01-21T12:54:08 1769000048

I've found good results with Clojure and Elixir despite them being dynamic and niche.

spopejoy · 2026-01-21T15:52:42 1769010762

Not really production level or agentic, but I've been impressed with LLMs for Haskell.

I think that while these langs are "niche" they still have quality web resources and codebases available for training.

I worry about new languages though. I guess maybe model training with synthetic data will become a requirement?

dysoco · 2026-01-21T17:54:26 1769018066

> I worry about new languages though. I guess maybe model training with synthetic data will become a requirement?

I read a (rather pessimistic) comment here yesterday claiming that the current generation of languages is most likely going to be the last, since the already existing corpus of code for training is going to trump any other possible feature the new language might introduce, and most of the code will be LLM generated anyways.

malloryerik · 2026-01-22T09:57:39 1769075859

I've wondered to myself here and there if new languages wouldn't be specifically written for LLM agentic coding, and what that might look like.

defatigable · 2026-01-21T01:58:46 1768960726

I use Augment with Claude Opus 4.5 every day at my job. I barely ever write code by hand anymore. I don't blindly accept the code that it writes; I iterate with it. We review code at my work. I have absolutely found a lot of benefit from my tools.

I've implemented several medium-scale projects that I anticipate would have taken 1-2 weeks manually, and took a day or so using agentic tools.

A few very concrete advantages I've found:

* I can spin up several agents in parallel and cycle between them. Reviewing the output of one while the others crank away.

* It's greatly improved my ability in languages I'm not expert in. For example, I wrote a Chrome extension which I've maintained for a decade or so. I'm quite weak in JavaScript. I pointed Antigravity at it and gave it a very open-ended prompt (basically, "improve this extension"), and in about five minutes it vastly improved the quality of the extension (better UI, performance, removed dependencies). The improvements may have been easy for someone expert in JS, but I'm not.

Here's the approach I follow that works pretty well:

1. Tell the agent your spec, as clearly as possible. Tell the agent to analyze the code and make a plan based on your spec. Tell the agent to not make any changes without consulting you.

2. Iterate on the plan with the agent until you think it's a good idea.

3. Have the agent implement your plan step by step. Tell the agent to pause and get your input between each step.

4. Between each step, look at what the agent did and tell it to make any corrections or modifications to the plan you notice. (I find that it helps to remind them what the overall plan is because sometimes they forget...).

5. Once the code is completed (or even between each step), I like to run a code-cleanup subagent that maintains the logic but improves style (factors out magic constants, helper functions, etc.)

This works quite well for me. Since these are text-based interfaces, I find that clarity of prose makes a big difference. Being very careful and explicit about the spec you provide to the agent is crucial.
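The five steps above can be sketched as a small control loop. This is a hypothetical harness, not Augment's or any other tool's API: `agent` and `review` are plain callables you would connect to your own agent and to yourself, and the prompt wording is only a placeholder.

```python
from typing import Callable, Optional

# Hypothetical hooks, not any tool's real API: wire them to your agent and to yourself.
Agent = Callable[[str], str]               # prompt -> agent's reply
Reviewer = Callable[[str], Optional[str]]  # show text to a human; None = approved, else feedback


def review_until_approved(agent: Agent, review: Reviewer, draft: str,
                          revise: Callable[[str, str], str], max_rounds: int = 5) -> str:
    """Human reviews the draft; the agent revises it from the feedback until approved."""
    for _ in range(max_rounds):
        feedback = review(draft)
        if feedback is None:
            return draft
        draft = agent(revise(draft, feedback))
    raise RuntimeError("not approved within max_rounds")


def plan_then_execute(spec: str, agent: Agent, review: Reviewer) -> list[str]:
    # Steps 1-2: ask only for a plan (no edits yet) and iterate on it.
    plan = agent("Analyze the code and write a numbered plan for this spec. "
                 f"Do not change any files yet.\n{spec}")
    plan = review_until_approved(
        agent, review, plan,
        lambda draft, fb: f"Revise this plan.\nPlan:\n{draft}\nFeedback:\n{fb}")

    results: list[str] = []
    steps = [line for line in plan.splitlines() if line.strip()]
    # Steps 3-4: one step at a time, restating the overall plan because agents
    # sometimes lose track of it, with a human review between steps.
    for n, step in enumerate(steps, start=1):
        change = agent(f"Overall plan:\n{plan}\nImplement only step {n}: {step}\nThen stop.")
        results.append(review_until_approved(
            agent, review, change,
            lambda draft, fb, n=n: f"Adjust step {n}.\nYour change:\n{draft}\nFeedback:\n{fb}"))

    # Step 5: a cleanup pass that keeps the logic but improves style.
    results.append(agent("Clean up the changes: factor out magic constants and helper "
                         "functions without changing behavior."))
    return results
```

Keeping the human review as an injected callable makes the approval gates explicit. It also makes the flow easy to dry-run with stub agents before pointing it at a real one.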

marcus_holmes · 2026-01-21T08:05:49 1768982749

This. I use it for coding in a Rails app when I'm not a Ruby expert. I can read the code, but writing it is painful, and so having the LLM write the code is beneficial. It's definitely faster than if I was writing the code, and probably produces better code than I would write.

I've been a professional software developer for >30 years, and this is the biggest revolution I've seen in the industry. It is going to change everything we do. There will be winners and losers, and we will make a lot of mistakes, as usual, but I'm optimistic about the outcome.

defatigable · 2026-01-21T08:16:23 1768983383

Agreed. In the domains where I'm an expert, it's a nice productivity boost. In the domains where I'm not, it's transformative.

As a complete aside from the question of productivity, these coding tools have reawakened a love of programming in me. I've been coding for long enough that the nitty gritty of everyday programming just feels like a slog - decrypting compiler errors, fixing type checking issues, factoring out helper functions, whatever. With these tools, I get to think about code at a much higher level. I create designs and high level ideas and the AI does all the annoying detail work.

I'm sure there are other people for whom those tasks feel like an interesting and satisfying puzzle, but for me it's been very liberating to escape from them.

ZitchDog · 2026-01-21T15:38:20 1769009900

> In the domains where I'm an expert, it's a nice productivity boost. In the domains where I'm not, it's transformative.

Is it possible that the code you are writing isn't good, but you don't know it because you're not an expert?

defatigable · 2026-01-21T19:19:25 1769023165

No, I'm quite confident that I'm very strong in these languages. Certainly not world-class but I write very good code and I know well-written code when I see it.

If you'd like some evidence, I literally just flipped a feature flag to change how we use queues to orchestrate workflows. The bulk of this new feature was introduced in a 1300-line PR, touching at least four different services, written in Golang and Python. It was very much AI agent driven using the flow I described. Enabling the feature worked the first time without a hiccup.

(To forestall the inevitable quibble, I am aware that very large PRs are against best practice and it's preferable to use smaller, stacked PRs. In this case for clarity purposes and atomicity of rollbacks I judged it preferable to use a single large PR.)

jesse__ · 2026-01-21T06:53:49 1768978429

> I've implemented several medium-scale projects that I anticipate would have taken 1-2 weeks manually

A 1-week project is a medium-scale project?! That's tiny, dude. A medium project for me is like 3 months of 12h days.

defatigable · 2026-01-21T07:10:42 1768979442

You are welcome to use whatever definition of "small/medium/large" you like. Like you, 1-2 weeks is also far from the largest project I've worked on. I don't think that's particularly relevant to the point of my post.

The point that I'm trying to emphasize is that I've had success with it on projects of some scale, where you are implementing (e.g.) multiple related PRs in different services. I'm not just using it on very tightly scoped tasks like "implement this function".

jesse__ · 2026-01-21T17:20:27 1769016027

I mean, if it's working for you, great.

The observation I was trying to make is that at the scope of one week, there's very little you actually get done, and it's likely mostly mechanical work. Given that, I suppose I'm unsurprised LLMs are proving useful. Seems like that's the type of thing they're excelling at.

defatigable · 2026-01-21T18:22:13 1769019733

That's not my experience. I agree that a project of any real size takes quite a bit longer than a week. But it's composed of lots of, well, week or two long subprojects. And if the AI coding tool is condensing week long projects into a day, that's a huge benefit.

Concretely speaking (well, as concretely as I feel like being without piercing pseudonymity), at my last job I worked on a multi-year rewrite of one of our core services. Within that rewrite were a ton of much smaller projects that were a few weeks to a month long - refactor this algorithm, improve the load balancing, add a new sharding strategy, etc. An AI tool would definitely not have sped up the whole process. It's not going to, say, speed up figuring out and handling intra-team dependencies or figuring out product design. But speeding up those smaller coding subprojects would have been a huge benefit.

I'm not making any strong claims in my post. I don't have the experience of AI projects allowing me to one shot large projects. But OP asked if anyone has concrete experience with AI coding tools speeding up development, and the answer is yes, I do.

drewstiff · 2026-01-21T09:05:09 1768986309

Well a medium project for me takes 3 years, so obviously I am the best out of everyone /s

monkeydust · 2026-01-21T08:48:04 1768985284

1. and 2., i.e. creating a spec that is the source of truth (spec-driven development), is key to getting anything production-grade, in our experience.

defatigable · 2026-01-21T14:37:34 1769006254

Yes. This was the key thing I learned that let me set the agents loose on larger tasks. Before I started iterating on specs with them, I mostly had them doing very small scale, refactor-this-function style tasks.

The other advice I've read that I haven't yet internalized as much is to use an "adversarial" approach with the LLMs: i.e. give them a rigid framework that they have to code against. So, e.g., generate tests that the code has to work against, or sample output that the code has to perfectly match. My agents do write tests as part of their work, and I use them to verify correctness, but I haven't updated my flow to emphasize that the agents should start with those, and iterate on them before working on the main implementation.

mountainriver · 2026-01-23T05:08:15 1769144895

Same, Opus 4.5 is nothing short of amazing. I’m really shocked to see so many posts claiming it doesn’t work.

We write whole full scale Rust SaaS apps with few regressions.

I do novel machine learning research in about 1/10 of the time it would have taken me.

A big thing is telling it to log excessively so it can see the execution.
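One way to act on the "log excessively" advice in Python is a tracing decorator built on the standard `logging` module, so each run leaves a call-by-call record the agent can read back. The names `traced`, `enable_trace_log` and `normalize` are illustrative.

```python
import functools
import logging

logger = logging.getLogger("trace")


def traced(fn):
    """Log each call's arguments, its return value, and any exception raised."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", fn.__qualname__, args, kwargs)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("raised in %s", fn.__qualname__)
            raise
        logger.debug("return %s -> %r", fn.__qualname__, result)
        return result

    return wrapper


def enable_trace_log(path: str = "run.log") -> None:
    """Send DEBUG-and-above records to a file the agent can read after a run."""
    logging.basicConfig(
        filename=path,
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )


@traced
def normalize(scores: list[float]) -> list[float]:
    """Example function: scale scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]
```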

laserlight · 2026-01-21T12:41:33 1768999293

I wouldn't consider the proposed workflow agentic. When you review each step, give feedback after each step, it's simply development with LLMs.

defatigable · 2026-01-21T14:31:52 1769005912

Interesting. What would make the workflow "agentic" in your mind? The AI implementing the task fully autonomously, never getting any human feedback?

To me, "agentic" in this context essentially means that the LLM has the ability to operate autonomously, i.e. execute tools on my behalf, etc. So for example my coding agents will often run unit tests, run code generation tools, etc. I've even used my agents to fix issues with git pre-commit hooks, in which case they've operated in a loop, repeatedly trying to check in code and fixing errors they see in the output.

So in that sense they are theoretically capable of one-shot implementing any task I set them to, their quality is just not good enough yet to trust them to. But maybe you mean something different?
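The loop described above (run a command, read the failure output, attempt a fix, run it again) can be sketched as follows. `propose_fix` is a hypothetical stand-in for the model call that edits files. Real agents differ in how they apply edits and decide when to give up.

```python
import subprocess
from typing import Callable


def run_until_green(
    cmd: list[str],
    propose_fix: Callable[[str], None],
    max_attempts: int = 5,
) -> bool:
    """Run `cmd`; on failure hand its output to `propose_fix` and try again.

    Returns True as soon as the command exits with status 0, or False after
    `max_attempts` failed runs. The fixer is not called after the final failure.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return True
        if attempt < max_attempts:
            # The agent only "sees" what the command printed, so pass all of it along.
            propose_fix(
                f"attempt {attempt} failed with exit code {proc.returncode}:\n"
                f"{proc.stdout}{proc.stderr}"
            )
    return False
```

For the pre-commit example, `cmd` might be `["git", "commit", "-m", "wip"]`, since a failing hook makes `git commit` exit non-zero.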

laserlight · 2026-01-21T16:31:51 1769013111

IMHO, agentic workflow is the autonomous execution of a detailed plan. Back-and-forth between LLM and developer is fine in the planning stage. Then, the agent is supposed to overcome any difficulties or devise solutions to unplanned situations. Otherwise, Cursor had been able to develop in a tight loop of writing and running tests, followed by fixing bugs, before “agentic” became a buzzword.

Perhaps “agentic” initially referred to this simple loop, but the milestone was achieved so quickly that the meaning shifted. Regardless, I could be wrong.

defatigable · 2026-01-21T18:59:54 1769021994

Yeah, I have no idea what the consensus definition of the term is, and I suppose I can't say for sure what OP meant. I haven't used Cursor. My understanding was that it exercises IDE functions but does not execute arbitrary shell commands, maybe I'm wrong. I've specifically had good experiences with the tools being able to run arbitrary commands (like the git debugging example I mentioned).

In my experience reading discussions like this, people seem to be saying that they don't believe that Claude Code and similar tools provide much of a productivity boost on relatively open ended domains (i.e. the AI is driving the writing of the code, not just assisting you in writing your own code faster). And that's certainly not my experience.

I agree with you that success with the initial milestone ("agent operates in a self-contained loop and can execute arbitrary commands") was achieved pretty quickly. But in my experience a lot of people don't believe this. :-)

tkgally · 2026-01-21T03:08:56 1768964936

Great advice.

> Tell the agent your spec, as clearly as possible.

I have recently added a step before that when beginning a project with Claude Code: invoke the AskUserQuestionTool and have it ask me questions about what I want to do and what approaches I prefer. It helps to clarify my thinking, and the specs it then produces are much better than if I had written them myself.

I should note, though, that I am a pure vibe coder. I don't understand any programming language well enough to identify problems in code by looking at it. When I want to check whether working code produced by Claude might still contain bugs, I have Gemini and Codex check it as well. They always find problems, which I then ask Claude to fix.

None of what I produce this way is mission-critical or for commercial use. My current hobby project, still in progress, is a Japanese-English dictionary:

https://github.com/tkgally/je-dict-1

https://www.tkgje.jp/

defatigable · 2026-01-21T07:05:34 1768979134

Great idea! That's actually the very next improvement I was planning on making to my coding flow: building a sub agent that is purely designed to study the codebase and create a structured implementation plan. Every large project I work on has the same basic initial steps (study the codebase, discuss the plan with me, etc) so it makes sense to formalize this in an agent I specialize for the purpose.

marcus_holmes · 2026-01-21T08:08:35 1768982915

Is it just me, or does every post starting with "Great Idea!" or "Great point!" or "You're so right!" or similar just sound like an LLM is posting?

Or is this a new human linguistic tic that is being caused by prolonged LLM usage?

Or is it just me?

defatigable · 2026-01-21T08:20:44 1768983644

:-) I feel you. Perhaps I should have ended my post with "Would you like me to construct a good prompt for your planning agent?" to really drive us into the uncanny valley?

(My writing style is very dry and to the point, you may have noticed. I looked at my post and thought, "Huh, I should try and emotionally engage with this poster, we seem like we're having a shared experience." And so I figured, heck, I'll throw in an enthusiastic interjection. When I was in college, my friends told me I had "bonsai emotions" and I suppose that still comes through in my writing style...)

marcus_holmes · 2026-01-22T00:13:26 1769040806

Excellent reply :) And yes, maybe that's it, that the LLM emotion feels forced so any forced emotion now feels like an LLM wrote it.

solaris2007 · 2026-01-21T05:40:56 1768974056

[flagged]

djmips · 2026-01-21T08:06:01 1768982761

"Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes"

molteanu · 2026-01-21T06:31:39 1768977099

That's a very good point.

The OP is "quite weak at JavaScript" but their AI "vastly improved the quality of the extension." Like, my dude, how can you tell? Does the code look polished, it looks smart, the tests pass, or what?! How can you come forward and be the judge of something you're not an expert in?

I mean, at this point, I'm beginning to be skeptical about half the content posted online. Anybody can come up with any damn story and make it credible. Just the other day I found out about reddit engagement bots, and I've seen some in the wild myself.

I'm waiting for the internet bubble to burst already, so we can all go back to our normal lives, where we left them 20 years or so ago.

defatigable · 2026-01-21T06:54:03 1768978443

How can I tell? Yes, the code looks quite a bit more polished. I'm not expert enough in JS to, e.g., know the cleanest method to inspect and modify the DOM, but I can look at code that does and tell if the approach it's using is sensible or not. Surely you've had the experience of a domain where you can evaluate the quality of the end product, even if you can't create a high quality product on your own?

Concretely in this case, I'd implemented an approach that used jQuery listeners to listen for DOM updates. Antigravity rewrote it to an approach that avoided the jQuery dependency entirely, using native MutationObservers. The code is sensible. It's noticeably more performant than the approach I crafted by hand. Antigravity allowed me to easily add a number of new features to my extension that I would have found tricky to add by hand. The UI looks quite a bit nicer than before I used AI tools to update it. Would these enhancements have been hard for an expert in Chrome extensions to implement? Probably not. But I'm not that expert, and AI coding tools allowed me to do them.

That was not actually the main thrust of my post, it's just a nice side benefit I've experienced. In the main domain where I use coding tools, at work, I work in languages where I'm quite a bit more proficient (Golang/Python). There, the quality of code that the AI tools generate is not better than I write by hand. The initial revisions are generally worse. But they're quite a bit faster than I write by hand, and if I iterate with the coding tools I can get to implementations that are as good as I would write by hand, and a lot faster.

I understand the bias towards skepticism. I have no particular dog in this fight; it doesn't bother me if you don't use these tools. But OP asked for people's experiences, so I thought I'd share.

achierius · 2026-01-21T06:39:24 1768977564

JavaScript isn't the only programming language around. I'm not the strongest around with JS either but I can figure it out as necessary -- knowing C/C++/Java/whatever means you can still grok "this looks better than that" for most cases.

defatigable · 2026-01-21T07:03:06 1768978986

Yep. I have plenty of experience in languages that use C-style syntax, enough to easily understand code written in other languages that occur nearby in the syntactical family tree. I'm not steeped in JS enough to know the weird gotchas of the type system, or know the standard library well, etc. But I can read the code fine.

If I'd asked an AI coding tool to write something up for me in Haskell, I would have no idea if it had done a good job.

esailija · 2026-01-21T09:03:24 1768986204

I don't think so. Imagine it was vice versa, someone saying they knew JS and were weak at C/C++/Java.

defatigable · 2026-01-21T14:48:21 1769006901

This doesn't sound right to me. If someone who was expert in JS looked at a relatively simple C++ program, I think they could reasonably well tell whether the code was good or not. They wouldn't be able to, e.g., detect bugs from default value initialization, memory leaks, etc. But so long as the code didn't do any crazy templating stuff, they'd be able to analyze it at a rough "this algorithm seems sensible" level.

Analogously I'm quite proficient at C++, and I can easily look at a small JS program and tell if it's sensible. But if you give me even a simple React app I wouldn't be able to understand it without a lot of effort (I've had this experience...)

I agree with your broad point: C/C++/Java are certainly much more complex than JS and I would expect someone expert in them to have a much easier time picking up JS than the reverse. But given very high overlap in syntax between the four I think anyone who's proficient in one can grok the basics of the others.

defatigable · 2026-01-21T06:38:21 1768977501

I've never had a job where writing Javascript has been the primary language (so far it's been C++/Java/Golang). The JS Chrome Extension is a fun side project. Using Augment in a work context, I'm primarily using it for Golang and Python code, languages where I'm pretty proficient but AI tools give me a decent efficiency boost.

I understand the emotional satisfaction of letting loose an easy snarky comment, of course, but you missed the mark I'm afraid.

solaris2007 · 2026-01-21T10:43:48 1768992228

[flagged]

christophilus · 2026-01-21T11:42:32 1768995752

> If you are any good with those four languages, you are leagues ahead of anyone who does Javascript full time.

That is a priggish statement, and comes across as ignorant.

I’ve been paid to program in many different languages over the years. Typescript is what I choose for most tasks these days. I haven’t noticed any real difference between my past C#, C++, C, Java, Ruby, etc programming peers and my current JavaScript ones.

solaris2007 · 2026-01-21T12:22:58 1768998178

> That is a priggish statement

A cursory glance at the definition of "prig" shows that what I wrote there is categorically not. You should at least try to look up that word and if you look it up and still don't get it then what you have is a reading comprehension issue.

> Typescript is what I choose for most tasks these days.

So you're smart on this, at least. Cantrill said it really well: Typescript brought "fresh water" to Javascript.

> haven’t noticed any real difference between my past C#, C++, C, Java, Ruby, etc programming peers and my current JavaScript ones.

You might still be on their level. I see that you didn't mention Rust or at least GoLang. Given the totality of your responses, you're certainly not writing any safe C (not ever).

sirwhinesalot · 2026-01-20T15:05:37 1768921537

The only approach I've tried that seems to work reasonably well, and consistently, was the following:

Make a commit.

Give Claude a task that's not particularly open-ended; the closer to pure "monkey work" boilerplate nonsense the task is, the better (which is also the sort of code I don't want to deal with myself).

Preferably it should be something that only touches a file or two in the codebase, unless it is a trivial refactor (like changing the same method call all over the place).

Make sure it is set to planning mode and let it come up with a plan.

Review the plan.

Let it implement the plan.

If it works, great, move on to review. I've seen it one-shot some pretty annoying tasks like porting code from one platform to another.

If there are obvious mistakes (program doesn't build, tests don't pass, etc.) then a few more iterations usually fix the issue.

If there are subtle mistakes, make a branch and have it try again. If it fails, then this is beyond what it can do, abort the branch and solve the issue myself.

Review and clean up the code it wrote; it's usually a lot messier than it needs to be. This also allows me to take ownership of the code. I now know what it does and how it works.

I don't bother giving it guidelines or guardrails or anything of the sort, it can't follow them reliably. Even something as simple as "This project uses CMake, build it like this" was repeatedly ignored as it kept trying to invoke the makefile directly and in the wrong folder.

This doesn't save me all that much time, since the review and cleanup can take a while, but it serves as a great unblocker.

I also use it as a rubber duck that can talk back and as a documentation source. It's pretty good for that.

This idea of having an army of agents all working together on the codebase is hilarious to me. Replace "agents" with "juniors I hired on fiverr with anterograde amnesia" and it's about how well it goes.

dwd · 2026-01-21T05:26:37 1768973197

+1 for the Rubber duck, and as an unblocker.

My personal use is very much one function at a time. I know what I need something to do, so I get it to write the function which I then piece together.

It can even come back with alternatives I may not have considered.

I might give it some context, but I'm mainly offloading a bunch of typing. I usually debug and fix its code myself rather than trying to get it to do better.

crq-yml · 2026-01-20T18:33:06 1768933986

TBH I think the greatest benefit is on the documentation/analysis side. The "write the code" part is fine when it sits in the envelope of things that are 100% conventional boilerplate. Like, as a frontend to ffmpeg you can get a ton of value out of LLMs. As soon as things go open-ended and design-centric, brace yourself.

I get the sense that the application of armies of agents is actually a scaled-up Lisp curse - Gas Town's entire premise is coding wizardry, the emphasis on abstract goals and values, complete with cute, impenetrable naming schemes. There's some corollary with "programs are for humans to read and computers to incidentally execute" here. Ultimately the program has to be a person addressing another person, or nature, and as such it has to evolve within the whole.

theshrike79 · 2026-01-21T09:29:20 1768987760

> I don't bother giving it guidelines or guardrails or anything of the sort

Where do you give these guardrails? In the chat or CLAUDE.md?

Basic level information like how to build and test the project belong in CLAUDE.md, it knows to re-check that now and then.

sirwhinesalot · 2026-01-21T11:01:01 1768993261

Yeah, CLAUDE.md. Sometimes it just ignores what was in there after the context window gets big enough (as it tends to with planning mode).

laylower · 2026-01-20T15:55:25 1768924525

That's the way.

edude03 · 2026-01-20T14:34:35 1768919675

I have the same experience despite using Claude every day. As a funny anecdote:

Someone I know wrote the code and the unit tests for a new feature with an agent. The code was subtly wrong; fine, it happens. Worse, the 30 or so tests they added added 10 minutes to the test run time, and they all essentially amounted to `expect(true).to.be(true)`, because the LLM had worked around the non-working code in the tests.
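To make the failure mode concrete, here is a minimal Python sketch (the anecdote is about JavaScript; `discount_correct`, `discount_broken`, `vacuous_test` and `real_test` are hypothetical names made up for illustration). A tautological check in the spirit of `expect(true).to.be(true)` passes for a working and a broken implementation alike, while a check against an expected value tells them apart.

```python
def discount_correct(price: float, percent: float) -> float:
    # Hypothetical function under test: apply a percentage discount.
    return price * (100 - percent) / 100


def discount_broken(price: float, percent: float) -> float:
    # Bug: treats the percentage as an absolute amount.
    return price - percent


def vacuous_test(fn) -> bool:
    fn(200, 10)  # exercises the code...
    return True  # ...but never checks the result, so it cannot fail


def real_test(fn) -> bool:
    # Compares against a value worked out independently of the implementation.
    return fn(200, 10) == 180
```

Both implementations pass `vacuous_test`; only `discount_correct` passes `real_test`, which is the property a reviewer should look for when an agent writes the tests.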

monooso · 2026-01-20T15:25:10 1768922710

There was an article on HN last week (?) which described this exact behaviour in the newer models.

Older, less "capable", models would fail to accomplish a task. Newer models would cheat, and provide a worthless but apparently functional solution.

Hopefully someone with a larger context window than myself can recall the article in question.

SatvikBeri · 2026-01-20T15:58:26 1768924706

I think that article was basically wrong. They asked the agent not to provide any commentary, then gave an unsolvable task, and wanted the agent to state that the task was impossible. So they were basically testing which instructions the agent would refuse to follow.

Purely anecdotally, I've found agents have gotten much better at asking clarifying questions, stating that two requirements are incompatible and asking which one to change, and so on.

https://spectrum.ieee.org/ai-coding-degrades

sReinwald · 2026-01-20T16:00:16 1768924816

From my experience: TDD helps here - write (or have AI write) tests first, review them as the spec, then let it implement.

But when I use Claude Code, I also supervise it somewhat closely. I don't let it go wild, and if it starts to make changes to existing tests it better have a damn good reason or it gets the hose again.

The failure mode here is letting the AI manage both the implementation and the testing. May as well ask high schoolers to grade their own exams. Everyone got an A+, how surprising!

edude03 · 2026-01-20T16:59:10 1768928350

> TDD helps here - write (or have AI write) tests first, review them as the spec

I agree, although I think the problem usually comes in writing the spec in the first place. If you can write detailed enough specs, the agent will usually give you exactly what you asked for. If your spec is vague, it's hard to eyeball whether the tests, or even the implementation of the tests, match what you're looking for.

jermaustin1 · 2026-01-20T15:32:19 1768923139

This happens with me every time I try to get claude to write tests. I've given up on it. Instead I will write the tests if I really care enough to have tests.

antonvs · 2026-01-20T15:12:15 1768921935

> they all essentially amounted to `expect(true).to.be(true)` because the LLM had worked around the code not working in the tests

A very human solution

netsharc · 2026-01-20T23:38:35 1768952315

I wonder if Volkswagen would've blamed AI if they got caught with Dieselgate nowadays...

In PR-ese: "To improve quality and reduce costs, we used AI to program some test code. Unfortunately the test code the AI generated fell below our standards, and it was missed during QA."

Then again they got their supplier Bosch to program the "defeat device" and lied to them that "Oh don't worry, it's just for testing, we won't deploy it to production". (The "device" (probably just an algorithm) detects whether the steering wheel was being moved or not as the throttle is pushed, and if not, it assumes the car was undergoing emissions testing, and it runs the engine in the environmentally friendlier mode).

fotcorn · 2026-01-20T15:22:41 1768922561

I used Claude Opus 4.5 inside Cursor to write RISC-V Vector/SIMD code. Specifically Depthwise Convolution and normal Convolution layers for a CNN.

I started out by letting it write a naive C version without intrinsics, and validated it against the PyTorch version.
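For readers unfamiliar with the layer: a naive depthwise convolution convolves each channel with its own kernel, and channels never mix. Below is a minimal pure-Python sketch of such a reference, not the poster's C code; it assumes a `[channels][height][width]` layout, stride 1 and no padding. Like PyTorch's `conv2d` (which is depthwise when `groups` equals the channel count), it computes a cross-correlation, i.e. the kernel is not flipped, which matters when the goal is an exact match against PyTorch's output.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][rows][cols]


def depthwise_conv2d(x: Tensor3, w: Tensor3) -> Tensor3:
    """Naive depthwise 2D convolution: x is [C][H][W], w is [C][K][K].

    Stride 1, no padding, no kernel flip (cross-correlation)."""
    k = len(w[0])
    h_out = len(x[0]) - k + 1
    w_out = len(x[0][0]) - k + 1
    out: Tensor3 = []
    for c in range(len(x)):  # each channel uses only its own kernel
        plane = []
        for i in range(h_out):
            row = []
            for j in range(w_out):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += x[c][i + di][j + dj] * w[c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

A SIMD version can then be accepted only if its output equals this reference exactly on the same inputs.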

Then I asked it (and two other models, Gemini 3.0 and GPT 5.1) to come up with some ideas on how to make it faster using SIMD vector instructions and write those down as markdown files.

Finally, I started the agent loop by giving Cursor those three markdown files, the naive C code and some more information on how to compile the code, and also an SSH command where it can upload the program and test it.

It then tested a few different variants, ran them on the target (a RISC-V SBC, the OrangePI RV2) to check whether they improved runtime, and then continued from there. It did this 10 times, until it arrived at the final version.

The final code is very readable, and faster than any other library or compiler that I have found so far. I think the clear guardrails (output has to exactly match the reference output from PyTorch, performance must be better than before) make this work very well.

sifar · 2026-01-20T15:59:49 1768924789

I am really surprised by this. While I know it can generate correct SIMD code, getting a performant version is non-trivial, especially for RVV, where the instruction choices and the underlying microarchitecture would significantly impact performance.

IIRC, depthwise is memory bound, so the bar might be lower. Perhaps you can try something with higher compute intensity, like a matrix multiply. I have observed that it trips up on columnar accesses for SIMD.
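A back-of-the-envelope arithmetic-intensity estimate supports the memory-bound point. It assumes each tensor element is moved to or from memory exactly once (an idealized lower bound on traffic that ignores caches), $s$ bytes per element, a $K \times K$ depthwise convolution over $C$ channels producing $H \times W$ outputs (input size taken as roughly equal to output size), and an $n \times n$ matrix multiply:

```latex
% Depthwise conv: 2 FLOPs (multiply + add) per kernel tap per output element;
% traffic = input (~CHW) + output (CHW) + weights (CK^2) elements.
I_{\text{dw}} = \frac{2\,C H W K^2}{s\,(2\,C H W + C K^2)} \approx \frac{K^2}{s}
\qquad
% Matrix multiply: 2n^3 FLOPs over three n x n matrices.
I_{\text{mm}} = \frac{2 n^3}{3 n^2 s} = \frac{2n}{3s}
```

For a 3x3 kernel in fp32 ($K = 3$, $s = 4$) the depthwise estimate is about 2.25 FLOP/byte regardless of problem size, while the matrix-multiply estimate grows linearly with $n$, which leaves far more room for compute-side tuning.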

fotcorn · 2026-01-22T11:51:22 1769082682

I think the ability to actually run the code on the target helped a lot with understanding and optimizing for the specific microarchitecture. Quite a few of the ideas turned out not to be optimal and were discarded.

It's also important to have a few test cases the agent can quickly check against: it will often generate wrong code, but if that is easily detectable, the agent can fix it and continue quickly.

camel-cdr · 2026-01-20T16:06:41 1768925201

can you share the code?

zh3 · 2026-01-21T09:06:49 1768986409

I'll bite.

Here's my realtime Bluetooth heart rate monitor for linux, with text output and web interface.

   https://github.com/lowrescoder/BlueHeart

This was 100% written by Claude Code, my input was limited to mostly accepting Claude suggestions except a couple of cases where I could make suggestions to speed up development (skipping some tests I knew would work).

Particularly interesting because I didn't expect this to work, let alone without writing any code. Note that I limited it to pure C with limited dependencies; the initial prompt was just to get text output ("Heart Rate 76bpm"). When it got to that point, I told Claude to add a web interface, followed by creating a realtime graph to show the interface in use.
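The text-output stage boils down to decoding the sensor's notifications. A minimal Python sketch, assuming the payload follows the standard Bluetooth Heart Rate Measurement characteristic layout (the first byte is a flags field; bit 0 selects an 8-bit or a 16-bit little-endian heart-rate value). The D-Bus/BlueZ plumbing is left out entirely, any optional fields after the value are ignored, and the function names are hypothetical, not taken from the BlueHeart repository:

```python
def parse_heart_rate(payload: bytes) -> int:
    """Return beats per minute from a Heart Rate Measurement notification."""
    if len(payload) < 2:
        raise ValueError("payload too short for a heart rate measurement")
    flags = payload[0]
    if flags & 0x01:
        # Bit 0 set: the value is a UINT16 stored little-endian in bytes 1-2.
        if len(payload) < 3:
            raise ValueError("flags announce a 16-bit value but only 1 byte follows")
        return int.from_bytes(payload[1:3], "little")
    # Bit 0 clear: the value is a single UINT8 in byte 1.
    return payload[1]


def format_reading(bpm: int) -> str:
    return f"Heart Rate {bpm}bpm"
```

For example, `format_reading(parse_heart_rate(bytes([0x00, 76])))` produces `Heart Rate 76bpm`.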

Every file is claude generated. AMA.

edit: this was particularly interesting as it had to test against the HRM sensor I was wearing during development, and to cope with bluetooth devices appearing and disappearing all the time. It took about a day for the whole thing and cost around $25.

further edit: I am by no means an expert with Claude (haven't even got to making a claude.md file); the one real objective here was to get a working example of using dBus to talk to blueZ in C, something I've failed at (more than once) before.

embedding-shape · 2026-01-21T09:58:57 1768989537

It's a good demonstration that agents still don't get everything right even when you put things into Markdown documentation. You have to be really vigilant and verify everything from top to bottom if you want to control how things are implemented to that degree; otherwise the agent will still take shortcuts where it can.

In https://github.com/lowrescoder/BlueHeart/blob/68ab2387a0c44e... for example, it doesn't actually do SSE at all; instead it queues up a complete HTTP response each time, returns once and then closes the stream, so it's basically a normal HTTP endpoint "labeled" as an SSE one. SSE is mentioned a bunch of times in the docs, and the files/types/functions are labeled as such, but that doesn't seem to be what's going on internally, from what I could understand. Happy to stand corrected though!
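For comparison, a genuine SSE endpoint keeps the HTTP response open and writes one frame per event: each payload line is prefixed with `data: `, and a blank line terminates the event. A minimal Python sketch using only the standard library; the handler and its hard-coded `readings` source are hypothetical stand-ins, not code from the repository, and a real sensor feed would keep yielding values rather than ending after three:

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> bytes:
    """Encode one Server-Sent Events message."""
    lines = [] if event is None else [f"event: {event}"]
    # A multi-line payload becomes several "data:" lines within one event.
    lines += [f"data: {line}" for line in data.split("\n")]
    # The trailing blank line marks the end of the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def sse_stream(readings: Iterable[int]) -> Iterator[bytes]:
    for bpm in readings:
        yield sse_frame(json.dumps({"bpm": bpm}))


class HeartRateSSEHandler(BaseHTTPRequestHandler):
    # Hypothetical reading source; a real app would pull from the sensor.
    readings = staticmethod(lambda: [76, 77, 78])

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # No single complete body: each frame is written and flushed
        # as soon as a reading is available.
        for frame in sse_stream(self.readings()):
            self.wfile.write(frame)
            self.wfile.flush()
```

Served with `http.server.ThreadingHTTPServer(("", 8000), HeartRateSSEHandler).serve_forever()`, a browser `EventSource` pointed at it should receive each reading as a separate `message` event.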

zh3 · 2026-01-21T10:12:20 1768990340

Yes, I haven't even read most of the files, just threw it up there as an example for the OP (I too am tired of the lack of examples, so stepped up to the plate on this one).

It was a personal bit of development last weekend. I can see inconsistencies myself, some of which result from scope creep during development (starting with the idea of a text-only app and then grafting on the web side). It literally only started because I wanted a working example of Bluetooth and dBus in C; the rest of it just joined the ride.

As for the SSE, no expert on that myself, however if you watch the messages in the browser console it appears to push updates with sporadic notes about using polling instead.

embedding-shape · 2026-01-21T10:23:21 1768991001

> Yes, I haven't even read most of the files, just threw it up there as an example for the OP (I too am tired of the lack of examples, so stepped up to the plate on this one).

Right, kind of like an LLM skimming and missing the core points :)

OP didn't ask for "Anything you've vibe-coded" but explicitly asked for code written with LLMs that is high quality and structurally sound, and "creates more value than it creates technical debt". That's why I felt like reviewing the code in the first place, and why I gave the feedback.

I understand now that maybe my impromptu code review felt like it came out of nowhere, but I thought you were actually trying to give OP an accurate sample, so sorry about that :)

zh3 · 2026-01-21T10:38:30 1768991910

NP, and the exact definition of vibe-coding is, I think, yet to be determined. This wasn't YOLO; it was reading all the prompts and generally accepting them. Overall I'd say the code and web page are at least of a quality I've seen in many commercial settings; the code itself looks reasonable, and if I were to do anything to it for a real 'release', I'd update the documentation, which has suffered due to the extensive scope creep during implementation.

embedding-shape · 2026-01-21T10:48:24 1768992504

> the exact definition of vibe-coding is, I think, yet to be determined

Huh? No, that's been established since Karpathy coined the term; you don't review the code, only use the agent and don't care about how it was done, just about the results.

The actual interesting stuff is how to use LLMs together with a human, to build high quality code. More "augmenting the human intellect" rather than "autonomous robots building for you".

Overall I'd say if someone handed you a specification that named SSE specifically, you created files with SSE in the name, and the implementation talks about doing SSE, yet it doesn't actually do SSE in the end, it's pretty much on par with code in commercial settings, yeah :) But maybe our bar should be slightly above the ground at least? :)

maerch · 2026-01-21T12:25:35 1768998335

> Huh? No, that's been established since Karpathy coined the term; you don't review the code, only use the agent and don't care about how it was done, just about the results.

However, nowadays it is used as a synonym for everything that is somehow generated by an LLM. Regardless of whether it is a spec-driven, carefully reviewed and iterative piece of software or some yolo-style one-prompter with no idea how it was done.

embedding-shape · 2026-01-21T12:38:03 1768999083

Yes, by people who don't actually understand what they're talking about, doesn't mean we need to fall to lowest common denominator here on HN too.

Most people understand "hacking" differently than we do, but we've made that work: we can talk about hacking here without other HN users believing we're cracking passwords, so why not the same for other terms?

clbrmbr · 2026-01-21T13:16:25 1769001385

Yes, an understanding of sockets and timing of interprocess communication & networking seems to be a weak point of current models.

kqr · 2026-01-21T10:06:40 1768990000

Have you reviewed the code? What were the problems with it? Where did it do things better than you'd expect of humans? Have you compared the effort of making changes to it to the effort required for similar, human-written software?

I don't think anyone says it's not possible to get the LLM to write code. The problems OP has with them is that the code they write starts out good but then quickly devolves when the LLMs get stuck in the weird ruts they have.

zh3 · 2026-01-21T10:32:27 1768991547

Far short of a proper review, however I have scanned the code. Bear in mind this was a purely personal project, never intended to see the light of day and initially just done to create a small but operable chunk of dbus/blueZ glue code for another project.

I have no doubt that a C developer with sufficient knowledge of dBus, Bluetooth, the HRM profile and Linux could have written the C code in a day. Adding the HTTP server again would be easy if the developer also had experience with that (n.b. there was a minor compiler error when I tried it on another system due to a slightly different version of libmicrohttpd). Adding the API would be straightforward (but tedious), and similarly the web page (the web page was a one-shot after Claude wrote the API, viz. "Create a web page to display a real time plot with history using the API").

So overall I'd answer that human developers who could have pulled that off in a day are few and far between (and likely to cost a lot more than $25 plus a day of my time).

And do I think the code is good enough? Yes, more than good enough. I could take it and run with it; on the other hand, because it ended up 100% AI-generated, I feel a bit like leaving it as a monument to "pure AI".

After all, I never intended to release it; it was this thread that made me throw it up on Github as an example for the OP.

59nadir · 2026-01-21T09:51:24 1768989084

Thank you for actually posting an example that people can look at; I think most other responders misunderstood the post as asking for more pointless anecdotes filled with superlatives and "trust me bro" sentiments.

bobmcnamara · 2026-01-21T09:50:49 1768989049

Pretty neat!

Is there a name for the UI style of the web server page? I've noticed several web apps have a similar style to that.

zh3 · 2026-01-21T10:03:17 1768989797

I didn't ask for a style, it's just what Claude came up with by default. Here's what happened just now when I asked it:-

   What was the inspiration for the CSS styling of this web page?                                                                    

  ● Looking at the CSS styling in live_plot.html, the inspiration appears to be Tailwind CSS and modern minimalist design trends.

embedding-shape · 2026-01-21T10:07:33 1768990053

Just as a heads-up, LLMs don't actually understand why they do what they do. Asking them about it will make them reason about why it happened, but that's not the "motivation"; it's essentially a guess with no anchoring to reality.

Just thought I'd clarify as I've seen prompts like this and people thinking this is the actual motivation from the "inside the LLM" or whatever, which is a bit far away from the truth.

zh3 · 2026-01-21T10:14:10 1768990450

Fair enough, I did ask for "inspiration" though rather than "motivation", mainly because I recall a comment on here a few days ago that LLMs are carefully trained to never reveal where their training material came from. So the prompt was aimed at working around that.

embedding-shape · 2026-01-21T10:21:19 1768990879

Yeah, inspiration, motivation, justification etc. are synonyms in this case. The point I was trying to make was something like "LLMs don't know why they do what they do", and asking them to provide it will make them come up with it on the spot afterwards, not actually share what the inspiration/motivation/justification was at the time the tokens were sampled.

dagss · 2026-01-21T07:44:33 1768981473

I've been programming for 20 years, and I've always been under-estimating how long things will take (no, not pressured by anyone to give firm estimates, just talking about informally when prioritizing work order together).

The other day I gave an estimate to my co-worker and he said "but how long is it really going to take, because you always finish a lot quicker than you say, you say two weeks and then it takes two days".

The LLMs will just make me finish things a lot faster and my gut feel estimation for how long things will take still is not yet taking that into account.

(And before people talk about typing speed: No that isn't it at all. I've always been the fastest typer and fastest human developer among my close co-workers.)

Yes, I need to review the code and interact with the agent. But it's doing a lot better than a lot of developers I've worked with over the years, and if I don't like the style of the code, it takes very few words for the LLM to "get it" and improve it.

Some commenters are comparing the LLM to a junior. In some sense that is right in that the work relationship may be the same as towards a (blazingly fast) junior; but the communication style and knowledge area and how few words I can use to describe something feels more like talking to a senior.

(I think it may help that for the last 10 years of my career a big part of my job was reviewing other people's code, delegating tasks, being the one who knew the code base best and helping others into it. So I'm used to delegating, not just coding. Recently I switched jobs and am now coding alone with AI.)

trashb · 2026-01-21T10:42:27 1768992147

I see your point that you can use advanced terms with the LLM, which makes it more like pair programming with a senior instead of a junior.

> "but how long is it really going to take, because you always finish a lot quicker than you say, you say two weeks and then it takes two days"

However, statements like these kinda make your comment smell of r/thatHappened, since it is such a tremendous speedup.

So I'm curious: what kind of problems are you working on? Do they require a lot of boilerplate code or a lot of manually adjusting settings?

dagss · 2026-01-21T11:21:34 1768994494

I obviously don't know that my past two days of work would have taken two weeks in the alternative route, but it's my feeling for this particular work:

I'm implementing a drawing tool on top of maps for fire departments (see demo.syncmap.no -- it's only in Norwegian for now though, plan to launch in English and Show HN it in some months). Typescript, Svelte, Go, Postgres.

This week I have been making the drawing tools more powerful (not deployed publicly yet).

* Gesture recognition to turn wobbly lines into straight lines in some conditions

* Auto-fill closed shapes: Vector graphics graph algorithms to segment the graph and compute the right fill regions that feel natural in the UI (default SVG fill regions were not right, took some trial and error to find something that just feels natural enough)

* Splines to make smoother curves: fitting Catmull-Rom, converting those to other splines for SVG representation, etc.

* Constraints when dragging graph nodes around, so that shapes don't intersect when I don't want them to, etc.
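The Catmull-Rom-to-SVG step in the list above has a compact closed form: a uniform Catmull-Rom segment from P1 to P2, with neighbours P0 and P3, equals a cubic Bezier with control points P1 + (P2 - P0) / 6 and P2 - (P3 - P1) / 6, which SVG's `C` path command accepts directly. A minimal Python sketch, assuming the uniform variant (not centripetal or chordal) and duplicated end points so the curve passes through every input point; the function name is hypothetical and this is not the poster's TypeScript:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def catmull_rom_to_svg_path(points: List[Point]) -> str:
    """Convert a uniform Catmull-Rom spline through `points` into an SVG
    path made of cubic Bezier ("C") segments."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Duplicate the end points so the first and last segments have neighbours.
    pts = [points[0], *points, points[-1]]
    parts = [f"M {pts[1][0]:g} {pts[1][1]:g}"]
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        # Bezier control points equivalent to the Catmull-Rom segment p1 -> p2.
        c1 = (p1[0] + (p2[0] - p0[0]) / 6, p1[1] + (p2[1] - p0[1]) / 6)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6, p2[1] - (p3[1] - p1[1]) / 6)
        parts.append(
            f"C {c1[0]:g} {c1[1]:g} {c2[0]:g} {c2[1]:g} {p2[0]:g} {p2[1]:g}"
        )
    return " ".join(parts)
```

For three points on a line, `catmull_rom_to_svg_path([(0, 0), (3, 0), (6, 0)])` returns `M 0 0 C 0.5 0 2 0 3 0 C 4 0 5.5 0 6 0`.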

I haven't worked all that much with polygon graphics before, so the LLM is very helpful in a) explaining the concepts to me and b) providing robust implementations for whatever I need.

And I've had many dead ends that didn't feel natural in the UI, which I could discard after trying them out in full without losing a huge investment.

These are all things that are very algorithm- and formula-intensive, where I would have had to do a lot of reading and research to do things right myself. (I could deal with it, but it takes a lot of time to read up on it.)

I review to see that it "looks sensible", not every single addition and division in the spline interpolations, or every step of the graph segmentation algorithms used to compute fill regions. I review function signatures and overall architecture, not the small details (in the frontend; obviously the backend authorization code is reviewed line by line.)

dagss · 2026-01-21T12:39:01 1768999141

Fresh example:

I described a problem at the UI level. The LLM suggested the Ramer-Douglas-Peucker algorithm to solve it, which I had never heard of before. It implemented it. Works perfectly. It is 40 lines of code (of which I only really need to review the function signature, and note the fact that it's a recursive bisection algorithm). I would have spent a very long time trying to figure out what to do here otherwise, and the LLM handed me the solution.
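The Ramer-Douglas-Peucker algorithm is short enough to show in full. A minimal Python sketch (not the poster's TypeScript; names are illustrative): find the interior point farthest from the straight line between the first and last points; if it lies within `epsilon`, replace the whole run with that line, otherwise split at that point and recurse on both halves.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Split at the farthest point; drop the duplicated split point when joining.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right
```

On a wobbly hand-drawn stroke, a larger `epsilon` discards more of the jitter.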

trashb · 2026-01-22T11:41:56 1769082116

Yes this kind of work will be sped up a lot by AI since you are not familiar with the intricacies of the subject matter. Especially with well documented but complex formats it can assist (vector graphics are not necessarily intuitive). Additionally in my experience UI design is quite pattern and boilerplate heavy.

The suggesting of algorithms sounds good, I don't know how you got there but I would ask for several algorithms that fit the bill and narrow it down myself (the first suggestion isn't always optimal).

Thank you for taking the time to shine some light onto what you're doing as I can see how you get that kind of speedup from using AI in this scenario.

keybored · 2026-01-21T08:49:31 1768985371

I was expecting to find the evidence OP asked for in the top comments.

wewewedxfgdf · 2026-01-21T02:04:31 1768961071

You fundamentally misunderstand AI assisted coding if you think it does the work for you, or that it gets it right, or that it can be trusted to complete a job.

It is an assistant not a team mate.

If you think that getting it wrong, or bugs, or misunderstandings, or lost code, or misdirections, are AI "failing", then yes you will fail to understand or see the value.

The point is that a good AI assisted developer steers through these things and has the skill to make great software from the chaotic ingredients that AI brings to the table.

And this is why articles like this one "just don't get it", because they are expecting the AI to do their job for them and holding it to the standards of a team mate. It does not work that way.

ummonk · 2026-01-21T08:15:36 1768983336

What is the actual value of using agentic LLMs (rather than just LLM-powered autocomplete in your IDE) if it requires this much supervision and handholding? When is it actually faster / more effective?

dagss · 2026-01-21T10:58:54 1768993134

Why use a nailgun instead of a hammer, if the nailgun still requires supervision and handholding?

Example: Say I discover a problem in the SPA design that can be fixed by tuning some CSS.

Without LLM: Dig around the code until I find the right spot. If it's been some months since I was there this can easily cost five minutes.

With LLM: Explain what is wrong. Perhaps description is vague ("cancel button is too invisible, I need another solution") or specific ("1px more margin here please"). The LLM makes a best effort fix within 30 secs. The diff points to just the right location so you can fine tune it.

FiberBundle · 2026-01-21T08:58:02 1768985882

The primary value is accrued by the AI labs. You pay hundreds or thousands of dollars a month to train their AI models. While you probably do increase your productivity saving time typing all the code, the feedback that you give the agent after it has produced mediocre or poor code is extremely valuable to the companies, because they train their reinforcement learning models with them. Now while you're happy you have such a great "assistant" that helps you type out code, you will at some point realize that your architectural/design skills really weren't all that special in the first place. All the models lacked to be good at that was sufficient data containing the correct rewards. Thankfully software engineers are some of the most naive people in the world, and they gave them that data by actually paying for it.

terabytest · 2026-01-21T06:13:07 1768975987

That’s not what I meant. What I’m asking is whether there’s any evidence that the latest “techniques” (such as Ralph) can actually lead to high quality results both in terms of code and end product, and if so, how.

cheema33 · 2026-01-21T08:24:18 1768983858

I used Ralph recently, in Claude Code. We had a complex SQL script that was crunched large amounts of data and was slow to run even on tables that are normalized, have indexes for the right columns etc. We, the humans spent significant amount of time tweaking it. We were able to get some performance gains, but eventually hit a wall. That is when I let Ralph take a stab at it. I told it to create a baseline benchmark and I gave it the expected output. I told to keep iterating on the script until there was at least 3x improvement in performance number while the output was identical. I set the iteration limit to 50. I let it loose and went to dinner. When I came back, it had found a way to get 3x performance and stopped on the 20th iteration.
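The acceptance logic described here (a fixed baseline, identical output as a hard requirement, a 3x target and a 50-iteration cap) fits in a few lines of Python. This is not Ralph or Claude Code; `propose` and `evaluate` are hypothetical stand-ins for the agent's next attempt and for running the candidate script against the benchmark:

```python
from typing import Any, Callable, Optional, Tuple


def optimize_loop(
    baseline_seconds: float,
    expected_output: Any,
    propose: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[Any, float]],
    target_speedup: float = 3.0,
    max_iterations: int = 50,
) -> Tuple[Optional[str], int]:
    """Return (best accepted candidate or None, iterations used)."""
    best_seconds = baseline_seconds
    best_candidate: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        candidate = propose(best_candidate)
        output, seconds = evaluate(candidate)
        if output != expected_output:
            continue  # any change in the result disqualifies the candidate
        if seconds < best_seconds:
            best_seconds, best_candidate = seconds, candidate
        if baseline_seconds / best_seconds >= target_speedup:
            return best_candidate, iteration  # target reached: stop early
    return best_candidate, max_iterations
```

The speedup is measured against the original baseline rather than the previous best, so the loop stops as soon as the cumulative improvement reaches the target, even if that happens well before the iteration cap.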

Is there another human who could get me even better performance given the same parameters? Probably yes. In the same amount of time? Maybe, but unlikely. In any case, we don't have anybody on our team who can think of 20 different ways to improve a large and complex SQL script and try them all in a short amount of time.

These tools do require two things before you can expect good results:

1. An open mind.
2. Experience. Lots of it.

BTW, I never trust the code an AI agent spits out. I get other AI agents, different LLMs, to review all work, create deterministic tests that must be run and must pass before the PR is ever generated. I used to do a lot of this manually. But now I create Claude skills that automate a lot of this away.
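The "tests must run and pass before the PR is ever generated" rule is essentially a deterministic gate. A minimal sketch in Python, assuming each check is a command that signals failure with a non-zero exit code; the commands in the comment are placeholders, not the commenter's actual setup or anything Claude skills provide.

```python
import subprocess
import sys
from typing import Sequence


def all_checks_pass(commands: Sequence[Sequence[str]]) -> bool:
    """Run each check in order and stop at the first one that exits non-zero.

    Only when every check passes should the PR be opened.
    """
    for cmd in commands:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"check failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


# Example with placeholder commands; substitute your project's own test suite:
#   if not all_checks_pass([["pytest", "-q"], ["mypy", "src"]]):
#       sys.exit("refusing to open a PR: checks failed")
```

Because the gate only looks at exit codes, it stays deterministic no matter which agent or model produced the change being checked.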

gbalduzzi · 2026-01-21T07:09:59 1768979399

I don't understand what kind of evidence you expect to receive.

There are plenty of examples from talented individuals, like Antirez or Simonw, and an ocean of examples from random individuals online.

I can tell you that some tasks that would take me a day to complete are done in 2h of agentic coding and 1h of code review, with the added benefit that during the 2h of agentic coding I can do something else. Is this the kind of evidence you are looking for?

xg15 · 2026-01-21T07:01:26 1768978886

"You're holding it wrong"

wewewedxfgdf · 2026-01-21T04:34:58 1768970098

[flagged]

tjr · 2026-01-21T05:04:57 1768971897

Given the claims that AI is replacing jobs left and right, that there’s no more need for software developers or computer science education, then it had jolly well better be able to code a perfect application while I watch baseball.

dagss · 2026-01-21T06:41:57 1768977717

As long as it lets an already-senior engineer working alone move as quickly as a team of that engineer plus three juniors, it can lead to replacing jobs even without producing code that doesn't need review.

al_borland · 2026-01-21T00:41:09 1768956069

A principal engineer at Google posted on Twitter that Claude Code did in an hour what the team couldn’t do in a year.

Two days later, after people freaked out, context was added. The team built multiple versions in that year, each had its trade offs. All that context was given to the AI and it was able to produce a “toy” version. I can only assume it had similar trade offs.

https://xcancel.com/rakyll/status/2007659740126761033#m

My experience has been similar to yours, and I think a lot of the hype is from people like this Google engineer who play into the hype and leave out the context. This sets expectations way out of line from reality and leads to frustration and disappointment.

another_twist · 2026-01-21T13:59:08 1769003948

That's because getting promoted requires thought leadership and fulfilling AI mandates. Hence the tweet from this PE at Google, another from one at Microsoft wanting to rewrite the entire C++ codebase in Rust, and a few other projects, also from MS, all about getting the right Markdown files, etc.

keybored · 2026-01-21T07:21:10 1768980070

> A principal engineer at Google posted on Twitter that Claude Code did in an hour what the team couldn’t do in a year.

I’ll bring the tar if you bring the feathers.

That sounds hyperbolic, but how can someone say something so outrageously false?

thornewolf · 2026-01-21T07:52:53 1768981973

As someone who worked at the company, I understood the meaning behind the tweet without the additional clarification. I think she assumed too much shared context when making the tweet.

keybored · 2026-01-21T07:59:40 1768982380

A principal engineer at Google made a public post on the World Wide Web and assumed some shared Google/Claude-context. Do you hear yourself?

tokioyoyo · 2026-01-21T11:02:36 1768993356

Working in a large scale org gets you accustomed to general problems in decision making that aren’t that obvious. Like I totally understood what she means and in my head nodded with “yeah that tracks”.

keybored · 2026-01-21T14:30:03 1769005803

Maybe it helps them sleep at night.

grayhatter · 2026-01-21T16:11:57 1769011917

People make mistakes; it's not that deep. The correct behavior to encourage is admitting them, and understanding and forgiving when necessary, because you don't want to encourage people to hide mistakes out of shame. That only makes things worse.

Especially considering that forgetting the delta between your shared context and someone else's is extremely common. And it's the least egregious mistake you can make when writing an untargeted promo post.

keybored · 2026-01-21T17:09:31 1769015371

My bad. I will be more mindful tomorrow when someone at a big tech company yet again make-a-mistake in the same direction of AI hyping. Maybe with a later addendum. Like journalists that write about a Fatal Storm In Houston and you read down to the eighth paragraph and it turns out the fatality were among pigeons.

> when writing an untargeted promo post.

lol.

grayhatter · 2026-01-21T18:24:18 1769019858

> My bad. I will be more mindful tomorrow when someone at a big tech company yet again make-a-mistake in the same direction of AI hyping.

Are you mad at them for playing the game, or mad that that's the game they have to play to advance at their company?

> Like journalists that write about a Fatal Storm In Houston and you read down to the eighth paragraph and it turns out the fatality were among pigeons.

I don't know; I guess I hold people who post on twitter so they can self-promo, or who have attention because they work at $company, to a slightly different standard than I would hold a journalist writing a news article?

keybored · 2026-01-21T21:53:02 1769032382

> I don't know; I guess I hold people who post on twitter so they can self-promo, or who have attention because they work at $company, to a slightly different standard than I would hold a journalist writing a news article?

I know. One of them has a higher salary.

xboxnolifes · 2026-01-22T18:11:56 1769105516

Do you think people who work at Google are perfect?

JimDabell · 2026-01-21T11:34:08 1768995248

Who are you referring to here? If you follow the link, you will see that the Google engineer did not say that.

keybored · 2026-01-21T13:25:40 1769001940

I am quoting the person I responded to, who linked to this: https://xcancel.com/rakyll/status/2007659740126761033#m

> I’m not joking and this isn’t funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned... I gave Cloud Code a description of the problem, it generated what we built last year in an hour.

So I see one error. GP said “couldn’t do”. The engineer really said “matched”.

woooooo · 2026-01-21T16:50:02 1769014202

The key words in the quote are "not everyone is aligned". It's not about execution ability.

keybored · 2026-01-22T19:08:53 1769108933

The key words are “one hour”. So what if there is some mealy-mouthed preamble?

woooooo · 2026-01-23T09:10:35 1769159435

At big tech companies, "we're still finding alignment" is code for leadership not being able to make decisions and unblock execution.

keybored · 2026-01-23T11:25:41 1769167541

Dearest leadership not unblocking execution. What's that got to do with what we’re complaining about? Did Claude cure cancer (this is hyperbole) in one hour or not?

woooooo · 2026-01-23T16:10:50 1769184650

The point of the story to me was, with a clear idea of what they want, the person got a year's worth of Big Tech done in a few hours with Claude. Could have been a couple days with a tight team, either way, the problem wasn't the coding ability or typing speed.

tech_tuna · 2026-01-23T15:10:44 1769181044

Yeah, exactly. There is no way Claude could do that much work in one hour, starting from scratch. You can even ask Claude if it could do that and it will say the same.

The LLM/AI tools are powerful and have a ton of use cases unlike technologies like crypto, but the hype train is running full steam and no one really knows where things will land over the next 5-10 years.

dysleixc · 2026-01-21T06:38:14 1768977494

[flagged]

cocoto · 2026-01-21T07:24:36 1768980276

From the very beginning, everyone has told us “you are using the wrong model”. Fast forward a year: the free models become as good as last year's premium models, and the result is still bad, but you still hear the same message, “you are not using the latest model”… I just stopped caring to try the new shiny model each month and simply re-evaluate the state of the art once a year for my sanity. Or maybe my expectations are clearly too high for these tools.

coryrc · 2026-01-21T12:19:15 1768997955

Are you sure you haven't moved the goalposts? The context here is "agentic coding" i.e. it does it all, while in the past the context was, to me anyway, "you describe the code you want and it writes it and you check it's what you asked for". The latter does work on free models now.

laserlight · 2026-01-21T12:29:56 1768998596

When one is not happy with LLM output, an agentic workflow rarely improves quality, even though it may improve functionality. Now, instead of you making sure the LLM is on track at each step, it goes down a rabbit hole, at which point it's impossible to review the work, let alone make it do it your way.

amoss · 2026-01-21T07:48:08 1768981688

This discussion is a request for positive examples to demonstrate any of the recent grandiose claims about AI-assisted development. Attempting to switch instead to attacking the credentials of posters only seems to supply evidence that there are no positive examples, only hype. It doesn't seem to add to the conversation.

aprdm · 2026-01-21T18:33:53 1769020433

There are people spending $5k a month on tokens. If your work generates 7-8 figures per year, that's peanuts, and companies will happily pay that per engineer.

consp · 2026-01-21T07:24:50 1768980290

> would call out the AI hype bubble

Which is what it is: it's described as a tool needing thousands of dollars and years of time in learning fees while also being sold as "replaces devs" in an instant. It is a tool, and when used sparingly by well-trained people, it works. To the extent that any large statistical text predictor would.

DoesntMatter22 · 2026-01-21T14:48:44 1769006924

I’ve mostly used the $20-a-month Cursor plan, and I’ve gotten to the point where I can code huge things and rarely need to do anything manually.

tw061023 · 2026-01-21T11:12:56 1768993976

That, uh, says a lot about Google, doesn't it?

srcport56445 · 2026-01-21T04:48:13 1768970893

Humans regularly design entire Uber, google, youtube, twitter, whatsapp etc in 45 mins in system design interviews. So AI designing some toy version is meh.

hahahahhaah · 2026-01-21T06:46:12 1768977972

Yeah, that was bullshit (like most AI-related crap... lies, damn lies, statistics, AI benchmarks). Like saying my 5-year-old said words that would solve the Greenland issue in an hour. But words not put to the test, lol, just put on a screen and everyone says woah!!! AI can't ship. That still needs humans.

edanm · 2026-01-21T05:01:17 1768971677

You're choosing to focus on specific hype posts (which were actually just misunderstandings of the original confusingly-worded Twitter post).

While ignoring the many, many cases of well-known and talented developers who give more context and say that agentic coding does give them a significant speedup (like Antirez (creator of Reddit), DHH (creator of RoR), Linus (creator of Linux), Steve Yegge, Simon Willison).

NoPicklez · 2026-01-21T05:51:01 1768974661

Why not, in that case, provide an example to rebut and contribute, as opposed to knocking someone else's example, even if it was against the use of agentic coding?

edanm · 2026-01-21T06:37:32 1768977452

Serious question - what kind of example would help at this point?

Here is a sample of (IMO) extremely talented and well-known developers who have expressed that agentic coding helps them: Antirez (creator of Reddit), DHH (creator of RoR), Linus (creator of Linux), Steve Yegge, Simon Willison. This is just randomly off the top of my head; you can find many more. None of them claim that agentic coding does a year's worth of work for them in an hour, of course.

In addition, pretty much every developer I know has used some form of GenAI or agentic coding over the last year, and they all say it gives them some form of speed up, most of them significant. The "AI doesn't help me" crowd is, as far as I can tell, an online-only phenomenon. In real life, everyone has used it to at least some degree and finds it very valuable.

trashb · 2026-01-21T10:32:12 1768991532

Those are some high profile (celebrity) developers.

I wonder if they have measured their results? I believe that the perceived speed up of AI coding is often different from reality. The following paper backs this idea https://arxiv.org/abs/2507.09089 . Can you provide data that objects this view, based on these (celebrity) developers or otherwise?

embedding-shape · 2026-01-21T11:12:21 1768993941

Almost off-topic, but got me curious: How can I measure this myself? Say I want to put concrete numbers to this, and actually measure, how should I approach it?

My naive approach would be to just implement it twice, once together with an LLM and once without, but that has obvious flaws, most obvious that the order which you do it with impacts the results too much.

So how would I actually go about and be able to provide data for this?

disgruntledphd2 · 2026-01-21T14:52:18 1769007138

> My naive approach would be to just implement it twice, once together with an LLM and once without, but that has obvious flaws, most obvious that the order which you do it with impacts the results too much.

You'd get a set of 10-15 projects, and a set of 10-15 developers. Then each developer would implement the solution with LLM assistance and without such assistance. You'd ensure that half the developers did LLM first, and the others traditional first.

You'd only be able to detect large statistical effects, but that would be a good start.
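The counterbalanced design described above could be scheduled along these lines. This is only a sketch with assumptions that are not in the comment: each developer does two different projects (one per condition) so nobody repeats the same work, and the developer and project names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def counterbalanced_schedule(
    developers: List[str], projects: List[str], seed: int = 0
) -> Dict[str, List[Tuple[str, str]]]:
    """Give every developer two distinct projects, one per condition.

    A random half of the developers do the LLM condition first; the rest
    do the traditional condition first, so order effects roughly cancel.
    """
    if len(projects) < 2:
        raise ValueError("need at least two projects")
    rng = random.Random(seed)
    shuffled = developers[:]
    rng.shuffle(shuffled)
    llm_first = set(shuffled[: len(shuffled) // 2])
    schedule: Dict[str, List[Tuple[str, str]]] = {}
    for dev in developers:
        first_project, second_project = rng.sample(projects, 2)  # two different projects
        order = ["llm", "traditional"] if dev in llm_first else ["traditional", "llm"]
        schedule[dev] = [(order[0], first_project), (order[1], second_project)]
    return schedule


devs = [f"dev{i}" for i in range(10)]
projs = [f"proj{i}" for i in range(12)]
for dev, sessions in counterbalanced_schedule(devs, projs, seed=1).items():
    print(dev, sessions)
```

Randomizing which half goes LLM-first is what lets the order effect average out across the group instead of being baked into the comparison.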

If it's just you then generate a list of potential projects and then flip a coin as to whether or not to use the LLM and record how long it takes along with a bunch of other metrics that make sense to you.
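For the single-person version, the bookkeeping might look something like this sketch. The task names and the choice of minutes as the metric are assumptions; the comment leaves the metrics up to you.

```python
import random
import statistics
from typing import Dict, List, Optional, Tuple


def assign_by_coin_flip(tasks: List[str], seed: Optional[int] = None) -> Dict[str, str]:
    """Decide per task, at random and before starting, whether to use the LLM."""
    rng = random.Random(seed)
    return {task: rng.choice(["llm", "no_llm"]) for task in tasks}


def mean_minutes_by_condition(log: List[Tuple[str, float]]) -> Dict[str, float]:
    """log holds (condition, minutes_taken) pairs recorded after each task."""
    by_condition: Dict[str, List[float]] = {}
    for condition, minutes in log:
        by_condition.setdefault(condition, []).append(minutes)
    return {c: statistics.mean(values) for c, values in by_condition.items()}


plan = assign_by_coin_flip(["parser", "cli flags", "cache layer", "export csv"], seed=42)
# ...do each task under its assigned condition, then record how long it took...
log = [("llm", 30.0), ("llm", 50.0), ("no_llm", 60.0)]
print(mean_minutes_by_condition(log))  # {'llm': 40.0, 'no_llm': 60.0}
```

Flipping the coin before starting each task matters: choosing the tool after seeing the task would let task difficulty leak into the comparison.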

embedding-shape · 2026-01-21T15:05:57 1769007957

The initial question was:

> wonder if they have measured their results?

Which seems to indicate that there would be a suitable way for a single individual to be able to measure this by themselves, which is why I asked.

What you're talking about is a study and beyond the scope of a single person, and also doesn't give me the information I'd need about myself.

> If it's just you then generate a list of potential projects and then flip a coin as to whether or not to use the LLM and record how long it takes along with a bunch of other metrics that make sense to you.

That sounds like I can just go by "yeah, feels like I'm faster", which I thought exactly was parent wanted to avoid...

disgruntledphd2 · 2026-01-21T16:59:31 1769014771

> That sounds like I can just go by "yeah, feels like I'm faster", which I thought exactly was parent wanted to avoid...

No it doesn't, but perhaps I assumed too much context. Like, you probably want to look up the Quantified Self movement, as they do lots of social science like research on themselves.

> Which seems to indicate that there would be a suitable way for a single individual to be able to measure this by themselves, which is why I asked.

I honestly think pick a metric you care about and then flip a coin to use an LLM or not is the best you're gonna get within the constraints.

embedding-shape · 2026-01-21T17:03:13 1769014993

> Like, you probably want to look up the Quantified Self movement, as they do lots of social science like research on themselves.

I guess I was looking for something bit more concrete, that one could apply themselves, which would answer the "if they have measured their results? [...] Can you provide data that objects this view" part of parents comment.

> then flip a coin to use an LLM or not is the best you're gonna get within the constraints.

Do you think trashb who made the initial question above would take the results of such evaluation and say "Yeah, that's good enough and answers my question"?

disgruntledphd2 · 2026-01-22T10:35:50 1769078150

> I guess I was looking for something bit more concrete, that one could apply themselves, which would answer the "if they have measured their results? [...] Can you provide data that objects this view" part of parents comment.

This stuff is really, really hard. Social science is very difficult as there's a lot of variance in human ability/responses. Added to that is the variance surrounding setup and tool usage (claude code vs aider vs gemini vs codex etc).

Like, there's a good reason why social scientists try to use larger samples from a population, and get very nerdy with stratification et al. This stuff is difficult otherwise.

The gold standard (rather like the METR study) is multiple people with random assignment to tasks with a large enough sample of people/tasks that lots of the random variance gets averaged out.

On a 1 person sample level, it's almost impossible to get results as good as this. You can eliminate the person level variance (because it's just one person), but I think you'd need maybe 100 trials/tasks to get a good estimate.
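To see where an estimate like "maybe 100 tasks" can come from, here is a sketch using the textbook normal-approximation sample size for comparing two means: n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / d^2 tasks per condition, where d is the effect size in standard deviations. The effect sizes below are illustrative assumptions, not numbers from the thread, and the formula ignores learning effects.

```python
import math
from statistics import NormalDist


def tasks_per_condition(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Tasks needed in EACH arm (with / without LLM) for a two-sided test on
    mean task time, using the normal approximation. effect_size is Cohen's d:
    the difference in mean time divided by the per-task standard deviation."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)  # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.8, 0.5, 0.2):
    print(d, tasks_per_condition(d))
# 0.8 25
# 0.5 63
# 0.2 393
```

Under these assumptions a medium effect (d = 0.5) needs about 63 tasks per condition, roughly 126 in total, which is in the same ballpark as the estimate above; smaller effects need far more.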

Personally, that sounds really implausible, and even if you did accomplish this, I'd be sceptical of the results as one would expect a learning effect (getting better at both using LLM tools and side projects in general).

The simple answer here (to your original question) is no, you probably can't measure this yourself as you won't have enough data or enough controls around the collection of this data to make accurate estimates.

To get anywhere near a good estimate you'd need multiple developers and multiple tasks (and a set of people to rate the tasks such that the average difficulty remains constant).

Actually, I take that back. If you work somewhere with lots and lots of non-leetcode interview questions (take homes etc) you could probably do the study I suggested internally. If you were really interested in how this works for professional development, then you could randomise at the level of interviewee and track those that made it through and compare to output/reviews approx 1 year later.

But no, there's no quick and easy way to do this because the variance is way too high.

> Do you think trashb who made the initial question above would take the results of such evaluation and say "Yeah, that's good enough and answers my question"?

I actually think trashb would have been OK with my original study, but obviously that's just my opinion.

trashb · 2026-01-22T11:32:25 1769081545

To wrap this up, what I was trying to say is that the feeling of being faster may not align with reality. Even for people who have a good understanding of the matter, it may be difficult to estimate. So I would say be skeptical of claims like this and try to somehow quantify it in a way that matters for the tasks you do. This is something managers of software projects have been trying to tackle for a while now.

There is no exact measurement in this case, but you could get an idea by testing certain types of implementations. For example, over a longer testing period, check whether you finish similar tasks, say, 25% faster on average with AI than without. Just the act of timing yourself doing tasks with or without AI may already give a crude indication of the difference.
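One wrinkle with "25% faster" is that it can mean two different numbers. A small sketch, assuming you have logged minutes per task for each condition (the sample values are made up): cutting average time by 25% corresponds to a 1.33x speedup, while a 1.25x speedup only saves 20% of the time.

```python
from statistics import mean
from typing import List


def percent_time_saved(minutes_without: List[float], minutes_with: List[float]) -> float:
    """How much less time tasks took with AI, as a percentage of the no-AI average."""
    return 100 * (1 - mean(minutes_with) / mean(minutes_without))


def speedup_ratio(minutes_without: List[float], minutes_with: List[float]) -> float:
    """How many times faster the AI-assisted tasks were on average."""
    return mean(minutes_without) / mean(minutes_with)


without_ai = [70.0, 90.0]  # made-up timings for similar tasks
with_ai = [55.0, 65.0]
print(percent_time_saved(without_ai, with_ai))  # 25.0
print(round(speedup_ratio(without_ai, with_ai), 2))  # 1.33
```

Stating which of the two you mean avoids a lot of talking past each other when comparing numbers.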

You could also run a trial implementing coding tasks like LeetCode problems; however, you will introduce some bias from having done them previously. Additionally, the tasks may not align with your daily activities.

A trial with multiple developers working on the same task pool with or without AI could lead to more substantial results, but you won't be able to do that by yourself.

embedding-shape · 2026-01-22T15:56:44 1769097404

So there seems to be a shared understanding of how difficult "measure your results" would be in this case, so could we also agree that asking someone:

> I wonder if they have measured their results? [...] Can you provide data that objects this view, based on these (celebrity) developers or otherwise?

isn't really fair? Because not even you or I really know how to do so in a fair and reasonable manner, unless we start to involve trials with multiple developers and so on.

disgruntledphd2 · 2026-01-23T14:57:20 1769180240

> isn't really fair? Because not even you or I really know how to do so in a fair and reasonable manner, unless we start to involve trials with multiple developers and so on.

I think in a small conversation like this, it's probably not entirely fair.

However, we're hearing similar things from much larger organisations who definitely have the resources to do studies like this, and yet there's very little decent work available.

In fact, lots of the time they are deliberately misleading people (25% of our code generated by AI being Copilot/other autocomplete). Like, that 25% stat was probably true historically with JetBrains products and any form of code generation (for protobufs et al), so it's wildly deceptive.

edanm · 2026-01-23T13:08:19 1769173699

> I wonder if they have measured their results?

This is a notoriously difficult thing to measure in a study. More relevantly though, IMO, it's not a small effect that might be difficult to notice - it's a huge, huge speedup.

How many developers have measured whether they are faster when programming in Python vs assembly? I doubt many have. And I doubt many have chosen Python over assembly because of any study that backs it up. But it's also not exactly a subtle difference. I'm fairly sure 99% of people will say that, in practice, it's obvious that Python is faster for programming than assembly.

I talked literally yesterday to a colleague who's a great senior dev, and he made a demo in an hour and a half that he says would've taken him two weeks to do without AI. This isn't a subtle, hard to measure difference. Of course this is in an area where AI coding shines (a new codebase for demo purposes) - but can we at least agree that in some things AI is clearly an order of magnitude speedup?

Adrig · 2026-01-21T10:03:31 1768989811

A lot of comments read like a knee-jerk reaction to the Twitter crowd claiming they vibe code apps making $1M in 2 weeks.

As a designer, I'm having a lot of success vibe coding small use cases, like an alternative to Lovable for prototyping in my design system and sharing prototypes easily.

All the devs I work with use Cursor; one of them (front-end) told me most of the code is written by AI. In the real world, agentic coding is used massively.

margorczynski · 2026-01-21T09:10:57 1768986657

I think it is a mix of ego and fear - basically "I'm too smart to be replaced by a machine" and "what I'm gonna do if I'm replaced?".

The second part is something I think a lot about now after playing around with Claude Code, OpenCode, Antigravity and extrapolating where this is all going.

menaerus · 2026-01-21T10:30:42 1768991442

I agree it's about the ego .. about the other part I am also trying to project few scenarios in my head.

Wild guess nr.1: large majority of software jobs will be complemented (mostly replaced) with the AI agents, reducing the need for as many people doing the same job.

Wild guess nr.2: demand for creating software will increase but the demand for software engineers creating that software will not follow the same multiplier.

Wild guess nr.3: we will have the smallest teams ever, with only a few people on board, perhaps leading to more companies being founded than ever.

Wild guess nr.4: in the near future, the pool of software engineers as we know them today will be drastically downsized, and only the ones who can demonstrate they bring substantial value beyond using the AI models will remain relevant.

Wild guess nr.5: getting the job in software engineering will be harder than ever.

akoboldfrying · 2026-01-21T07:39:05 1768981145

Nit: s/Reddit/Redis/

Though it is fun to imagine using Reddit as a key-value store :)

neomantra · 2026-01-21T13:57:25 1769003845

That is hilarious.... and to prove the point of this whole comment thread, I created reddit-kv for us. It seems to work against a mock, I did not test it against Reddit itself as I think it violates ToS. My prompts are in the repo.

https://github.com/ConAcademy/reddit-kv/blob/main/README.md

akoboldfrying · 2026-01-21T22:01:13 1769032873

Typo-Driven Development!

edanm · 2026-01-21T09:18:29 1768987109

Aaarg I was typing quickly and mistyped. :face-palm:

Thanks for the correction.

grayhatter · 2026-01-21T20:02:12 1769025732

You haven't provided a sample either... But sure, let's dig in.

> Antirez

When I first read his recent article, I found the whole thing uncompelling: https://antirez.com/news/158 (don't buy into the anti-AI hype). But I gave it a second chance and re-read it. I'm gonna have to resist going line by line, because I find some of it outright objectionable.

> Whatever you believe about what the Right Thing should be, you can't control it by refusing what is happening right now. Skipping AI is not going to help you or your career.

Setting aside the rhetorical/argumentative deficiencies, and the fact that this is just FUD (he next suggests that if you disagree, you just keep trying it every few months, which suggests to me even he knows it's BS), he writes that in the context of the ethical or moral objections he raises. So he's suggesting that the best way to advance your career is to ignore the social and ethical concerns and just get on board?

Gross.

Individual careers aside, I'm not impressed by the correctness of the code emitted by AI and committed by most AI users. I'm unconvinced that AI will improve the industry and its reputation as a whole.

But the topic is supposed to be specific examples of code, so let's do that. He mentions adding UTF-8 support to his toy terminal input project -> https://github.com/antirez/linenoise/commit/c12b66d25508bd70... It's a very useful feature to add, without a doubt! His library is better than it was before. But UTF-8 parsing is very easy to implement carelessly or incompletely, i.e. it's very easy to trip over if you're not careful. The implementation specifics of it are fairly described as a solved problem. It's been done so many times that, if you're willing to re-implement it from another existing source, it wouldn't take very long to do this without AI. (And if you're not, why are you using AI? I'm ethically opposed to the laundered provenance of source material.) Then, it absolutely would take more time to verify that the code is correct if you did it by hand. Everyone keeps telling me I have to ensure that the AI hasn't made a mistake, so either I trust the vibes, or I'm still spending that time. Even Simon Willison agrees with me[1].
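To make the "easy to implement incompletely" point concrete, here is a sketch of the checks a strict single-code-point UTF-8 decoder needs per RFC 3629: stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates, and values above U+10FFFF. It is written in Python purely for illustration and is not taken from linenoise or from the commit linked above.

```python
from typing import Tuple


def decode_utf8_char(data: bytes, i: int = 0) -> Tuple[int, int]:
    """Decode one code point starting at data[i].

    Returns (code_point, byte_length); raises ValueError on malformed input.
    """
    first = data[i]
    if first < 0x80:
        return first, 1  # plain ASCII
    if 0xC2 <= first <= 0xDF:
        length, cp, minimum = 2, first & 0x1F, 0x80
    elif 0xE0 <= first <= 0xEF:
        length, cp, minimum = 3, first & 0x0F, 0x800
    elif 0xF0 <= first <= 0xF4:
        length, cp, minimum = 4, first & 0x07, 0x10000
    else:
        # 0x80-0xBF are stray continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never appear
        raise ValueError(f"invalid lead byte 0x{first:02X}")
    if i + length > len(data):
        raise ValueError("truncated sequence")
    for b in data[i + 1 : i + length]:
        if (b & 0xC0) != 0x80:
            raise ValueError("expected continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    if cp < minimum:
        raise ValueError("overlong encoding")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("UTF-16 surrogate is not a valid code point")
    if cp > 0x10FFFF:
        raise ValueError("code point above U+10FFFF")
    return cp, length


for text in ["A", "é", "€", "😀"]:
    encoded = text.encode("utf-8")
    print(text, decode_utf8_char(encoded), len(encoded))
```

A decoder that only does the bit shifting in the middle loop will "work" on valid input while silently accepting every malformed case listed above.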

> Simon Willison

He's another suggestion, so he's perfect to go next. I would normally exclude someone who's clearly best known as an AI influencer, but he's without a doubt an engineer too, so fair game. He's especially relevant given he recently answered a similar question at https://news.ycombinator.com/item?id=46582192 and, since I've been searching for a counterpoint to my personal anti-AI hype, I was eager to see what the experts are making.... it's all boilerplate. I don't mean to say there's nothing valuable or that there's nothing useful there. Only that the vast majority of the code in these repos is boilerplate that has no use out of context. The real value is just a few lines of code, something that I believe would only take 30m if you wrote the code without AI for the project you were already working on. It'd take a few hours to make any of this myself (assuming I'm even good enough to figure it out).

And I do admit, 10m on BART vs 3-4 hours on a weekend is a very significant time delta. But also, I like writing code. So what was I really gonna do with that time? Make shareholder value go up, no doubt!

> Linus Torvalds

I can't find a single source where he's an advocate for AI. I've seen the commit, and while some of the GitHub comments are gold, I wasn't able to draw any meaningful conclusions from the commit in isolation. Especially not when, last I read about it, he used it because he doesn't write Python code. So I don't know what conclusions I can draw from this commit, other than that AI can emit code. I knew that.

I don't have enough context to comment on the opinions of Steve Yegge or his AI generated output. I simply don't know enough, and after a quick search nothing other than AI influencer jumped out at me.

Beyond that, I try to be careful about who I give my time and attention to, and who I associate with, so this is the end of the list.

I contrast these examples with all the hype that's been proven over and over to be a miscommunication if I'm being charitable, or an outright lie if I'm not. I also think it's important to consider the incentives leading to these "miscommunications" when evaluating how much good faith you assign them.

On top of that, there are the countless examples of AI confidently lying to me about something. Explaining my fundamental, concrete objection to being lied to would take another hour I shouldn't spend on an HN comment.

What specific examples of impressive things/projects/commits/code am I missing? What output makes all the downsides of AI a worthwhile trade-off?

> In addition, pretty much every developer I know has used some form of GenAI or agentic coding over the last year, and they all say it gives them some form of speed up

I remember reading something that when tested, they're not actually faster. Any source on this other than vibes?

[1]: https://simonwillison.net/2025/Dec/18/code-proven-to-work/

NBJack · 2026-01-21T06:12:24 1768975944

Citation needed. Talk, especially in the 'agentic age', is cheap.

everfrustrated · 2026-01-20T23:42:14 1768952534

Learning how to drive the models is a legit skill - and I don't mean "prompt engineering". There are absolutely techniques that help and because things are moving fast there is little established practice to draw from. But it's also been interesting seeing experienced coders struggle - I've found my time as a manager has been more help to me than my time as a coder. How to keep people on task and focused etc is very similar to managing humans. I suspect much of the next 5 years will be people rediscovering existing human and project management techniques and rebranding them as AI something.

Some techniques I've found useful recently:

- If the agent struggled on something once it's done I'll ask it "you were struggling here, think about what happened and if there are is anything you learned. Put this into a learnings document and reference it in agents.md so we don't get stuck next time"

- Plans are a must. Chat to the agent back and forth to build up a common understanding of the problem you want solved. Make sure to say "ask me any follow up questions you think are necessary". This chat is often the longest part of the project - don't skimp on it. You are building the requirements and if you've ever done any dev work you understand how important having good requirements are to the success of the work. Then ask the model to write up the plan into an implementation document with steps. Review this thoroughly. Then use a new agent to start work on it. "Implement steps 1-2 of this doc". Having the work broken down into steps helps to be able to do work more pieces (new context windows). This part is the more mindless part and where you get to catch up on reading HN :)

- The GitHub Copilot chat agent is great. I don't get the TUI folks at all. The Pro+ plan is reasonably priced and you can do a lot with it (Sonnet, Codex, etc. all available). Being able to see the diffs as it works is helpful (but not necessary) to catch problems earlier.
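A minimal sketch of the "Implement steps 1-2 of this doc" step, assuming the plan document marks each step with a heading like "## Step 3: Add caching layer". That heading format and the helper names list_steps and batch_prompts are hypothetical, not something the commenter specified. It turns a plan into a list of short prompts, one per fresh agent session:

```python
import re

# Hypothetical plan format: each step starts with a line like "## Step 3: Add caching layer".
STEP_HEADING = re.compile(r"^## Step (\d+):[ \t]*(.+)$", re.MULTILINE)


def list_steps(plan_text: str) -> list[tuple[int, str]]:
    """Return (step number, title) pairs in the order they appear in the plan."""
    return [(int(num), title.strip()) for num, title in STEP_HEADING.findall(plan_text)]


def batch_prompts(plan_path: str, plan_text: str, batch_size: int = 2) -> list[str]:
    """Group the steps into small batches, one prompt per fresh agent session."""
    steps = list_steps(plan_text)
    prompts = []
    for i in range(0, len(steps), batch_size):
        batch = steps[i:i + batch_size]
        first, last = batch[0][0], batch[-1][0]
        span = f"step {first}" if first == last else f"steps {first}-{last}"
        prompts.append(f"Implement {span} of {plan_path}.")
    return prompts


if __name__ == "__main__":
    plan = "# Plan\n## Step 1: Add schema\n## Step 2: Write migration\n## Step 3: Update API\n"
    for prompt in batch_prompts("PLAN.md", plan):
        print(prompt)
```

Each prompt would go to a new agent with a clean context. Two steps per batch mirrors the example in the comment, but the right batch size is a judgment call that depends on how large each step is.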

marwamc · 2026-01-21T01:36:14 1768959374

+1 for generating plans and then clearing context. I typically have a skill and an agent. I use the skill to generate an initial plan for an atomic unit of work, clear context and then use the agent to review said plan. Finally, I clear context and use the skill to implement the plan phase by phase, making sure to review each phase for consistency with the next phase and the overall plan. I've had moderate success with this.

throwup238 · 2026-01-21T05:16:36 1768972596

Another important thing to do is to instruct the agent to keep a <plan-name>-NOTES.md file where it tracks its progress and keeps implementation notes. The notes are usually short with Opus 4.5 but very helpful, especially when you need to reset mid-phase and restart it with a fresh context.

If you keep the notes around in repo, you can instruct future plan writers to review implementation notes from relevant plans to keep continuity.
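In practice the agent writes these notes itself once instructed to. The sketch below only illustrates the file convention and the "review implementation notes from relevant plans" step, assuming one <plan-name>-NOTES.md file per plan in the repo root; the helper names and the "[phase N]" line format are hypothetical:

```python
import tempfile
from pathlib import Path


def notes_path(root: Path, plan_name: str) -> Path:
    """The <plan-name>-NOTES.md file kept in the repo next to the plan."""
    return root / f"{plan_name}-NOTES.md"


def append_note(root: Path, plan_name: str, phase: int, note: str) -> None:
    """Append one short progress note, tagged with the phase it belongs to."""
    with notes_path(root, plan_name).open("a", encoding="utf-8") as f:
        f.write(f"- [phase {phase}] {note}\n")


def gather_notes(root: Path, plan_names: list[str]) -> str:
    """Concatenate notes from earlier, related plans for a future plan writer to review."""
    sections = []
    for name in plan_names:
        path = notes_path(root, name)
        if path.exists():  # skip plans that never produced notes
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n".join(sections)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        append_note(repo, "auth-refactor", 1, "Moved token checks into middleware.")
        append_note(repo, "auth-refactor", 2, "Session tests need a fixed clock.")
        print(gather_notes(repo, ["auth-refactor", "billing"]))
```

Appending rather than rewriting the file means a fresh agent restarted mid-phase can read the whole history of what earlier sessions did.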

linesofcode · 2026-01-20T14:55:30 1768920930

When you first began learning how to program were you building and shipping apps the next day? No.

Agentic programming is a skill-set and a muscle you need to develop just like you did with coding in the past.

Things didn’t just suddenly go downhill after an arbitrary tipping point - what happened is you hit a knowledge gap in the tooling and gave up.

Reflect on what went wrong and use that knowledge next time you work with the agent.

For example, investing the time in building a strong test suite and testing strategy ahead of time which both you and the agent can rely on.

Being able to manage the agent and get quality results on a large, complex codebase is a skill in itself; it won’t happen overnight.

It takes practice and repetition with these tools to level up, just like anything else.

terabytest · 2026-01-20T15:22:00 1768922520

Your point is fair, but it rests on a major assumption I'd question: that the only limit lies with the user, and the tooling itself has none. What if it’s more like “you can’t squeeze blood from a stone”? That is, agentic coding may simply have no greater potential than what I've already tried. To be fair I haven't gone all the way in trying to make it work but, even if some minor workarounds exist, the full promise being hyped might not be realistically attainable.

linesofcode · 2026-01-20T15:33:22 1768923202

How can one judge potential without fully understanding or having used it to its full potential?

I don’t think agentic programming is some promised land of instant code without bugs.

It’s just a force multiplier for what you can do.

aban-m · 2026-01-23T11:44:11 1769168651

The point is precisely this. How do you know you have used it to its full potential? "You're holding it wrong" has no limits.

arjie · 2026-01-21T08:58:18 1768985898

I think I have evidence it works for me in that a bunch of unfinished projects suddenly finished themselves and work for me in the way I want them to. So whatever delta there was between my ideas and my execution, it has been closed for me.

If I'm being honest, the people who get utility out of this tool don't need any tutorials. The smattering of ideas that people mention is sufficient. The people who don't get utility out of this tool are insistent that it is useless, which isn't particularly inspiring to the kind of person who would write a good tutorial.

Consequently, you're probably going to have to pay someone if you want hand-holding. And at the end you might believe it wasn't worth it.

keybored · 2026-01-21T09:40:30 1768988430

So this is like Jesus.

cwoolfe · 2026-01-20T15:42:13 1768923733

Hang in there. Yes it is possible; I do it every day. I also do iOS and my current setup is: Cursor + Claude Opus 4.5.

You still need to think about how you would solve the problem as an engineer and break the task down into right-sized chunks of work. For example, if 4 things need to change, start with the most fundamental change, the one that has no other dependencies.
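Picking "the most fundamental change which has no other dependencies" first is a topological sort of the changes. A minimal sketch using Python's standard-library graphlib module, with a made-up list of changes: chunk_order gives one valid sequence, and the hypothetical waves helper groups changes whose dependencies are all done, so each could be handed to its own chat:

```python
from graphlib import TopologicalSorter

# Hypothetical example: each change maps to the set of changes it depends on.
CHANGES = {
    "db schema": set(),
    "data model": {"db schema"},
    "API endpoint": {"data model"},
    "iOS screen": {"API endpoint", "data model"},
}


def chunk_order(deps: dict[str, set[str]]) -> list[str]:
    """One valid order in which every change comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group changes into rounds; everything in a round has its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        sorter.done(*ready)
    return rounds


if __name__ == "__main__":
    print(chunk_order(CHANGES))
    print(waves(CHANGES))
```

For a strictly layered change like the example, every wave holds a single item; independent changes (say, two unrelated screens) would land in the same wave.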

Also, it is important to manage the context window. For a new task, start a new "chat" (new agent). Stay on topic. You'll be limited to about five back-and-forths before performance starts to suffer. (Cursor shows a visual indicator of this in the form of the circle/wheel icon.)

For larger tasks, tap the Plan button first, and guide it to the correct architecture you are looking for. Then hit build. Review what it did. If a section of code isn't high-quality, tell Claude how to change it. If it fails, then reject the change.

It's a tool that can make you 2 - 10x more productive if you learn to use it well.

proc0 · 2026-01-20T13:56:58 1768917418

My experience is the same. In short, agents cannot plan ahead, or plan at a high level. This means they have a blind spot for design. Since they cannot design properly, this limits viable projects to smaller scopes (I'm not sure exactly how small, but in my experience extremely small and simple). Anything that exceeds this abstract threshold has a good chance of being a net negative, with most of the code being unmaintainable, unextensible, and unreliable.

Anyone who claims AI is great is not building a large or complex enough app, and when it works for their small project, they extrapolate to all possibilities. So because their example was generated from a prompt, it's incorrectly assumed that any prompt will also work. That doesn't necessarily follow.

The reality is that programming is widely underestimated. The perception is that it's just syntax on a text file, but it's really more like a giant abstract machine with moving parts. If you don't see the giant machine with moving parts, chances are you are not going to build good software. For AI to do this, it would require strong reasoning capabilities that let it derive logical structures, along with long-term planning and simulation of this abstract machine. I predict that if AI can do this then it will be able to do every single other job, including physical jobs, as it would be able to reason within a robotic body in the physical world.

To summarize, people are underestimating programming, using their simple projects to incorrectly extrapolate to any possible prompt, and missing the hard part of programming which involves building abstract machines that work on first principles and mathematical logic.

linsomniac · 2026-01-20T15:09:20 1768921760

>Anyone who claims AI is great is not building a large or complex enough app

I can't speak for everyone, but lots of us fully understand that the AI tooling has limitations and realize there's a LOT of work that can be done within those limitations. Also, those limitations are expanding, so it's good to experiment to find out where they are.

Conversely, it seems like a lot of people are saying that AI is worthless because it can't build arbitrarily large apps.

I've recently used the AI tooling to make a DocuSign-like service and it did a fairly good job of it, requiring about a day's worth of my attention. That's not an amazingly complex app, but it's not nothing either. Ditto for a calorie tracking web app. Not the most complex app, but companies are making legit money off them, if you want a tangible measure of "worth".

proc0 · 2026-01-20T19:41:42 1768938102

Right, it has a lot of uses. As a tool it has been transformative on many levels. The question is whether it can actually multiply productivity across the board for any domain and at production level quality. I think that's what people are betting on, and it's not clear to me yet that it can. So far that level looks more like a tradeoff. You can spend time orchestrating agents, gaining some speedup at the cost of quality, or you can use it more like a tool and write things "manually" which is a lot higher quality.

antonvs · 2026-01-20T15:17:58 1768922278

> Anyone who claims AI is great is not building a large or complex enough app

That might be true for agentic coding (caveat below), but AI in the hands of expert users can be very useful - "great" - in building large and complex apps. It's just that it has to be guided and reviewed by the human expert.

As for agentic coding, it may depend on the app. For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot for LLMs; it doesn't have all the messiness of typical business system requirements.

znsksjjs · 2026-01-20T16:06:06 1768925166

> For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot

Is that really a success? I was just reading an article talking about how sloppy and poorly implemented it is: https://lucumr.pocoo.org/2026/1/18/agent-psychosis/

I guess it depends on what you’re looking to get out of it.

antonvs · 2026-01-20T18:12:51 1768932771

I haven't looked into it deeply, but I've seen people claiming to find it useful, which is one metric of success.

Agentic vibe coding maximalists essentially claim that code quality doesn't matter if you get the desired functionality out of it. Which is not that different from what a lot of "move fast and break things" startups also claim, about code that's written by humans under time, cost, and demand pressure. [Edit: and I've seen some very "sloppy and poorly implemented" code in those contexts, as well as outside software companies, in companies of all sizes. Not all code is artisanally handcrafted by connoisseurs such as us :]

I'm not planning to explore the bleeding edge of this at the moment, but I don't think it can be discounted entirely, and of course it's constantly improving.

jsight · 2026-01-21T06:09:47 1768975787

I'd say it is a success at being useful, but yeah it does seem like the code itself has been a bit of a mess.

I've used a version that had a bd stats and a bd status that both had almost the same content in slightly different formats. Later versions appear to have made them an alias for the same thing. I've also had a version where the daemon consistently failed to start and there were no symptoms other than every command taking 5 seconds. In general, the optimization with the daemon is a questionable choice. It doesn't really need to be _that_ fast.

And yet, even after all of that it still has managed to be useful and generally fairly reliable.

proc0 · 2026-01-20T19:47:57 1768938477

Anything above a simple app and it becomes a tradeoff that needs to be carefully tuned so that you get the most out of it and it doesn't end up being a waste of time. For many use cases and domain combinations this is a net positive, but it's not yet consistent across everything.

From my experience it's better at some domains than others, and also better at certain kinds of app types. It's not nearly as universal as it's being made out to be.

SatvikBeri · 2026-01-20T15:55:28 1768924528

Sure, here are my own examples:

* I came up with a list of 9 performance improvement ideas for an expensive pipeline. Most of these were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all. It made prototypes that had bad code quality but tested the core ideas. One approach cut the time down by 50%; I rewrote it with better code, and it's saved about $6,000/month for my company.

* My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

* When AWS RDS costs spiked one month, I set Claude Code to investigate, and it found the reason was a misconfigured backup setting.

* I'll use Claude to throw together a bunch of visualizations for some data to help me investigate

* I'll often give Claude the type signature for a function and ask it to write the function. It generally gets this about 85% right.
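The first example above (letting the agent prototype several ideas, then keeping only what measurably helps) needs some way to compare the prototypes. A minimal sketch, assuming each prototype can be wrapped as a zero-argument function returning the pipeline's result; the benchmark function and its parameters are hypothetical, built on the standard-library timeit module:

```python
import timeit
from typing import Callable


def benchmark(
    baseline: Callable[[], object],
    candidates: dict[str, Callable[[], object]],
    repeat: int = 5,
    number: int = 10,
) -> dict[str, float]:
    """Return each candidate's best time as a fraction of the baseline's best time.

    A ratio of 0.5 means the candidate took half as long as the baseline.
    """
    expected = baseline()
    base_time = min(timeit.repeat(baseline, repeat=repeat, number=number))
    ratios = {}
    for name, fn in candidates.items():
        # A prototype that changes the output is rejected, however fast it is.
        if fn() != expected:
            raise ValueError(f"{name} changed the output")
        best = min(timeit.repeat(fn, repeat=repeat, number=number))
        ratios[name] = best / base_time
    return ratios


if __name__ == "__main__":
    data = list(range(100_000))

    def baseline() -> int:
        return sum([x * x for x in data])

    def generator_version() -> int:
        return sum(x * x for x in data)

    for name, ratio in benchmark(baseline, {"generator": generator_version}).items():
        print(f"{name}: {ratio:.2f}x of baseline time")
```

The output check is an added safeguard, not something the comment describes: when code quality is deliberately not the focus, fast-but-wrong prototypes are easy to produce. Taking the minimum over repeats reduces noise from other work running on the machine.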

sauwan · 2026-01-20T20:14:39 1768940079

>My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

Ok, please help me understand. Or is this more of a nanny?

SatvikBeri · 2026-01-21T01:27:58 1768958878

Not technically a nanny, but not dissimilar. In this case, they do several types of work (house cleaning, watching 1-3 kids, daytime and overnights, taking kids out.) They are very competent – by far the best we've found in 3 years – and charge different rates for the different types of work. We also need to track mileage etc. for reimbursement.

They had a spreadsheet for tracking but I found it moderately annoying – it was taking 5-10 minutes a week, so normally I wouldn't have bothered to write a different tool, but with vibe coding it was fairly trivial.

abrookewood · 2026-01-21T02:06:26 1768961186

How did you give Claude access to AWS?

mickeyr · 2026-01-21T02:22:18 1768962138

It does OK using the AWS CLI.

SatvikBeri · 2026-01-21T02:40:26 1768963226

Just awscli

mrdependable · 2026-01-20T17:44:51 1768931091

Why is your babysitting bill so complicated?

SatvikBeri · 2026-01-20T18:14:55 1768932895

There are several different types of work they can do, each one of which has a different hourly rate. The time of day affects the rate as well, and so can things like overtime.

It's definitely a bit of an unusual situation. It's not extremely complicated, but it was enough to be annoying.
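For anyone wondering what "several hourly rates, time of day, and overtime" looks like as code, here is a minimal sketch of that kind of weekly calculation. Every rate, multiplier and threshold below is made up for illustration, as are the Shift fields; the commenter did not share their actual rules or tool:

```python
from dataclasses import dataclass

# All numbers below are hypothetical placeholders, not anyone's real rates.
HOURLY_RATES = {"cleaning": 25.0, "childcare": 22.0, "overnight": 18.0}
EVENING_MULTIPLIER = 1.25   # assumed premium for evening shifts
OVERTIME_THRESHOLD = 40.0   # assumed weekly hours before overtime kicks in
OVERTIME_MULTIPLIER = 1.5
MILEAGE_RATE = 0.70         # assumed reimbursement per mile


@dataclass
class Shift:
    kind: str            # key into HOURLY_RATES
    hours: float
    evening: bool = False
    miles: float = 0.0


def weekly_total(shifts: list[Shift]) -> float:
    """Sum pay plus mileage for one week's shifts, in the order they were worked."""
    total = 0.0
    hours_so_far = 0.0
    for shift in shifts:
        rate = HOURLY_RATES[shift.kind]
        if shift.evening:
            rate *= EVENING_MULTIPLIER
        # Hours that fit under the weekly threshold are regular; the rest is overtime.
        regular = max(0.0, min(shift.hours, OVERTIME_THRESHOLD - hours_so_far))
        overtime = shift.hours - regular
        total += regular * rate + overtime * rate * OVERTIME_MULTIPLIER
        total += shift.miles * MILEAGE_RATE
        hours_so_far += shift.hours
    return round(total, 2)


if __name__ == "__main__":
    week = [
        Shift("cleaning", 3),
        Shift("childcare", 4, evening=True, miles=10),
    ]
    print(weekly_total(week))
```

Whether overtime should accumulate across different kinds of work, as it does here, depends entirely on what was actually agreed with the sitter.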

whackernews · 2026-01-20T23:33:11 1768951991

Jesus, are you ok? Can’t you just, like, give em a 20 when you get home?

I find it quite funny you’ve invented this overly complex payment structure for your babysitter and then find it annoying. Now you’ve got a CLI tool for it.

mcpeepants · 2026-01-21T00:09:53 1768954193

why assume the billing model is being imposed by the customer rather than the service provider?

irlnanny · 2026-01-21T00:23:47 1768955027

GP has provided an anecdote with no supporting evidence, nor any code examples. So it is as fair to assume the story is a fabrication as it is to assume it has any truth to it.

SatvikBeri · 2026-01-21T01:29:48 1768958988

I am really shocked at the response this trivial anecdote has gotten.

I could state it much more generically: we had an annoying Excel sheet that took ~10 minutes a week, I vibe coded a command line tool that brought it down to ~1 minute a week. I don't think this is unusual or hard to believe in any way.

garciasn · 2026-01-21T01:24:49 1768958689

Yes! You should absolutely always assume a random stranger on HN is outright lying about a trivial anecdote to farm meaningless karma.

fn-mote · 2026-01-21T01:57:00 1768960620

Or instigating conflict?