A few random notes from Claude coding quite a bit last few weeks

daxfohl · 2026-01-27T18:59:08 1769540348

I worry about the "brain atrophy" part, as I've felt this too. And not just atrophy, but even moreso I think it's evolving into "complacency".

Like there have been multiple times now where I wanted the code to look a certain way, but it kept pulling back to the way it wanted to do things. Like if I had stated certain design goals recently it would adhere to them, but after a few iterations it would forget again and go back to its original approach, or mix the two, or whatever. Eventually it was easier just to quit fighting it and let it do things the way it wanted.

What I've seen is that after the initial dopamine rush of being able to do things that would have taken much longer manually, a few iterations of this kind of interaction has slowly led to a disillusionment of the whole project, as AI keeps pushing it in a direction I didn't want.

I think this is especially true if you're trying to experiment with new approaches to things. LLMs are, by definition, biased by what was in their training data. You can shock them out of it momentarily, which is awesome for a few rounds, but over time the gravitational pull of what's already in their latent space becomes inescapable. (I picture it as working like a giant Sierpinski triangle).

I want to say the end result is very akin to doom scrolling. Doom tabbing? It's like, yeah I could be more creative with just a tad more effort, but the AI is already running and the bar to seeing what the AI will do next is so low, so....

striking · 2026-01-27T21:17:53 1769548673

It's not just brain atrophy, I think. I think part of it is that we're actively making a tradeoff to focus on learning how to use the model rather than learning how to use our own brains and work with each other.

This would be fine if not for one thing: the meta-skill of learning to use the LLM depreciates too. Today's LLM is gonna go away someday, the way you have to use it will change. You will be on a forever treadmill, always learning the vagaries of using the new shiny model (and paying for the privilege!)

I'm not going to make myself dependent, let myself atrophy, run on a treadmill forever, for something I happen to rent and can't keep. If I wanted a cheap high that I didn't mind being dependent on, there's more fun ones out there.

raducu · 2026-01-28T07:39:48 1769585988

> let myself atrophy, run on a treadmill forever, for something

You're lucky to afford the luxury not to atrophy.

It's been almost 4 years since my last software job interview and I know the drills about preparing for one.

Long before LLMs my skills naturally atrophy in my day job.

I remember the good old days of J2ME of writing everything from scratch. Or writing some graph editor for university, or some speculative, huffman coding algorithm.

That kept me sharp.

But today I feel like I'm living in that netflix series about people being in Hell and the Devil tricking them into thinking they're in Heaven and tormenting them: how on planet Earth do I keep sharp with java, streams, virtual threads, rxjava, tuning the jvm, react, kafka, kafka streams, aws, k8s, helm, jenkins pipelines, CI-CD, ECR, istio issues, in-house service discovery, hierarchical multi-regions, metrics and monitoring, autoscaling, spot instances and multi-arch images, multi-az, reliable and scalable yet as cheap as possible, yet as cloud native as possible, hazelcast and distributed systems, low level postgresql performance tuning, apache iceberg, trino, various in-house frameworks and idioms over all of this? Oh, and let's not forget the business domain, coding standards, code reviews, mentorships and organizing technical events. Also, it's 2026 so nobody hires QA or scrum masters anymore so take on those hats as well.

So LLMs it is, the new reality.

aftergibson · 2026-01-28T08:09:39 1769587779

This is a very good point. Years ago working in a LAMP stack, the term LAMP could fully describe your software engineering, database setup and infrastructure. I shudder to think of the acronyms for today's tech stacks.

daxfohl · 2026-01-27T22:06:41 1769551601

Businesses too. For two years it's been "throw everything into AI." But now that shit is getting real, are they really feeling so coy about letting AI run ahead of their engineering team's ability to manage it? How long will it be until we start seeing outages that just don't get resolved because the engineers have lost the plot?

scorpioxy · 2026-01-28T03:49:08 1769572148

From what I am seeing, no one is feeling coy simply because of the cost savings that management is able to show the higher-ups and shareholders. At that level, there's very little understanding of anything technical and outages or bugs will simply get a "we've asked our technical resources to work on it". But everyone understands that spending $50 when you were spending $100 is a great achievement. That's if you stop there and don't think about any downsides. Said management will then take the bonuses and disappear before the explosions start with their resume glowing about all the cost savings and team leadership achievements. I've experienced this first hand very recently.

daxfohl · 2026-01-28T04:01:41 1769572901

Of all the looming tipping points whereby humans could destroy the fabric of their existence, this one has to be the stupidest. And therefore the most likely.

throwup238 · 2026-01-28T00:53:22 1769561602

How long until “the LLM did it” is just as effective as “AWS is down, not my fault”?

rurp · 2026-01-28T04:45:20 1769575520

I have deliberately moderated my use of AI in large part for this reason. For a solid two years now I've been constantly seeing claims of "this model/IDE/Agent/approach/etc is the future of writing code! It makes me 50x more productive, and will do the same for you!" And inevitably those have all fallen by the wayside and been replaced by some new shiny thing. As someone who doesn't get intrinsic joy out of chasing the latest tech fad I usually move along and wait to see if whatever is being hyped really starts to take over the world.

This isn't to say LLMs won't change software development forever, I think they will. But I doubt anyone has any idea what kind of tools and approaches everyone will be using 5 or 10 years from now, except that I really doubt it will be whatever is being hyped up at this exact moment.

locknitpicker · 2026-01-28T07:30:52 1769585452

> It's not just brain atrophy, I think. I think part of it is that we're actively making a tradeoff to focus on learning how to use the model rather than learning how to use our own brains and work with each other.

I agree with the sentiment but I would have framed it differently. The LLM is a tool, just like code completion or a code generator. Right now we focus mainly on how to use a tool, the coding agent, to achieve a goal. This takes place at a strategic level. Prior to the inception of LLMs, we focused mainly on how to write code to achieve a goal. This took place at a tactical level, and required making decisions and paying attention to a multitude of details. With LLMs our focus shifts to a higher-level abstraction. Also, operational concerns change. When writing and maintaining code yourself, you focus on architectures that help you simplify some classes of changes. When using LLMs, your focus shifts to building context and helping the model implement its changes effectively. The two goals seem related, but are radically different.

I think a fairer description is that with LLMs we stop exercising some skills that are only required or relevant if you are writing your code yourself. It's like driving with an automatic transmission vs manual transmission.

bandrami · 2026-01-28T07:50:11 1769586611

Previous tools have been deterministic and understandable. I write code with emacs and can at any point look at the source and tell you why it did what it did. But I could produce the same program with vi or vscode or whatever, at the cost of some frustration. But they all ultimately transform keystrokes to a text file in largely the same way, and the compiler I'm targeting changes that to asm and thence to binary in a predictable and visible way.

An LLM is always going to be a black box that is neither predictable nor visible (the unpredictability is necessary for how the tool functions; the invisibility is not but seems too late to fix now). So teams start cargo culting ways to deal with specific LLMs' idiosyncrasies and your domain knowledge becomes about a specific product that someone else has control over. It's like learning a specific office suite or whatever.

TeMPOraL · 2026-01-28T08:58:37 1769590717

> An LLM is always going to be a black box that is neither predictable nor visible (the unpredictability is necessary for how the tool functions; the invisibility is not but seems too late to fix now)

So basically, like a co-worker.

That's why I keep insisting that anthropomorphising LLMs is to be embraced, not avoided, because it gives much better high-level, first-order intuition as to where they belong in a larger computing system, and where they shouldn't be put.

bandrami · 2026-01-28T09:07:00 1769591220

> So basically, like a co-worker.

Arguably, though I don't particularly need another co-worker. Also co-workers are not tools (except sometimes in the derogatory sense).

nemothekid · 2026-01-27T23:13:21 1769555601

I think I should write more about this but I have been feeling very similar. I've recently been exploring using claude code/codex as the "default", so I've decided to implement a side project.

My gripe with AI tools in the past is that the kind of work I do is large and complex and with previous models it just wasn't efficient to either provide enough context or deal with context rot when working on a large application - especially when that application doesn't have a million examples online.

I've been trying to implement a multiplayer game with server authoritative networking in Rust with Bevy. I specifically chose Bevy as the latest version was after Claude's cut off, it had a number of breaking changes, and there aren't a lot of deep examples online.

Overall it's going well, but one downside is that I don't really understand the code "in my bones". If you told me tomorrow that I had to optimize latency or if there was a 1 in 100 edge case, not only would I not know where to look, I don't think I could tell you how the game engine works.

In the past, I could not have ever gotten this far without really understanding my tools. Today, I have a semi functional game and, truth be told, I don't even know what an ECS is and what advantages it provides. I really consider this a huge problem: if I had to maintain this in production, if there was a SEV0 bug, am I confident enough I could fix it? Or am I confident the model could figure it out? Or is the model good enough that it could scan the entire code base and intuit a solution? One of these three questions has to be answered or else brain atrophy is a real risk.

bedrio · 2026-01-28T05:02:53 1769576573

I'm worried about that too. If the error is reproducible, the model can eventually figure it out from experience. But a ghost bug that I can't pattern? The model ends up in a "you're absolutely right" loop as it incorrectly guesses different solutions.

mattmanser · 2026-01-28T07:43:13 1769586193

Are ghost bugs even real?

My first job had the Devs working front-line support years ago. Due to that, I learnt an important lesson in bug fixing.

Always be able to re-create the bug first.

There is no such thing as ghost bugs, you just need to ask the reporter the right questions.

Unless your code is multi-threaded, to which I say, good luck!

SpicyLemonZest · 2026-01-28T07:55:09 1769586909

Historically I would have agreed with you. But since the rise of LLM-assisted coding, I've encountered an increasing number of things I'd call clear "ghost bugs" in single threaded code. I found a fun one today where invoking a process four times with a very specific access pattern would cause a key result of the second invocation to be overwritten. (It is not a coincidence, I don't think, that these are exactly the kind of bugs a genAI-as-a-service provider might never notice in production.)

mh2266 · 2026-01-28T03:00:05 1769569205

> I've been trying to implement a multiplayer game with server authoritative networking in Rust with Bevy. I specifically chose Bevy as the latest version was after Claude's cut off, it had a number of breaking changes, and there aren't a lot of deep examples online.

I am interested in doing something similar (Bevy, not multiplayer).

I had the thought that you ought to be able to provide a cargo doc or rust-analyzer equivalent over MCP? This... must exist?

I'm also curious how you test if the game is, um... fun? Maybe it doesn't apply so much for a multiplayer game, I'm thinking of stuff like the enemy patterns and timings in a soulslike, Zelda, etc.

I did use ChatGPT to get some rendering code for a retro RCT/SimCity-style terrain mesh in Bevy and it basically worked, though several times I had to tell it "yeah uh nothing shows up", at which point it said "of course! the problem is..." and then I learned about mesh winding, fine, okay... felt like I was in over my head and decided to go to a 2D game instead so didn't pursue that further.

nemothekid · 2026-01-28T04:39:41 1769575181

>I had the thought that you ought be able to provide a cargo doc or rust-analyzer equivalent over MCP? This... must exist?

I've found that there are two issues that arise that I'm not sure how to solve. You can give it docs and point to it and it can generally figure out syntax, but the next issue I see is that without examples, it kind of just brute forces problems like a 14 year old.

For example, the input system originally just let you move left and right, and it popped it into an observer function. As I added more and more controls, it began to litter that function with more and more code, until it was a ~600-line function responsible for a large chunk of game logic.

While trying to parse it I then had it refactor the code - but I don't know if the current code is idiomatic. What would be the cargo doc or rust-analyzer equivalent for good architecture?

I'm running into this same problem when trying to use claude code for internal projects. Some parts of the codebase just have really intuitive internal frameworks and claude code can rip through them and provide great idiomatic code. Others are bogged down by years of tech debt and performance hacks and claude code can't be trusted with anything other than multi-paragraph prompts.

>I'm also curious how you test if the game is, um... fun?

Lucky enough for me this is a learning exercise, so I'm not optimizing for fun. I guess you could ask claude code to inject more fun.

InfinityByTen · 2026-01-28T09:29:06 1769592546

I find the atrophy and zoning out or context switching problematic, because it takes a few seconds/ minutes in "thinking" and then BAM! I have 500 lines of all sorts of buggy and problematic code to review and get a sycophantic, not-enough-mature entity to correct.

At some point, I find myself needing to disconnect out of overwhelm and frustration. Faster responses aren't necessarily better. I want more observability in the development process so that I can be a party to it. I really have felt that I need to orchestrate multiple agents working in tandem, playing sort of a bad-cop, good-cop and maybe a third trying to moderate that discussion and get a fourth to effectively incorporate a human in the mix. But that's too much to integrate in my day job.

overfeed · 2026-01-28T03:27:21 1769570841

> Eventually it was easier just to quit fighting it and let it do things the way it wanted.

I wouldn't have believed it a few years ago if you told me the industry would one day, in lockstep, decide that shipping more tech-debt is awesome. If the unstated bet (that AI development will outpace the rate at which it generates cruft) doesn't pay off, then there will be hell to pay.

ithkuil · 2026-01-28T04:39:28 1769575168

Don't worry. This will create the demand for even more powerful models that are able to untangle the mess created by previous models.

Once we realize the kind of mess _those_ models created, well, we'll need even more capable models.

It's a variation on the theme of Kernighan's insight that the more "clever" you are while coding, the harder it will be to debug.

EDIT: Simplicity is a way out but it's hard under normal circumstances, now with this kind of pressure to ship fast because the colleague with the AI chimp can outperform you, aiming at simplicity will require some widespread understanding

bandrami · 2026-01-28T08:51:32 1769590292

"That's the brilliant part: when the winter comes the apes freeze to death!"

scorpioxy · 2026-01-28T03:41:40 1769571700

As someone who's been commissioned many times before to work on or salvage "rescue projects" with huge amounts of tech debt, I welcome that day. Still not there yet though I am starting to feel the vibes shifting.

This isn't anything new of course. Previously it was with projects built by looking for the cheapest bidder and letting them loose on an ill-defined problem. And you can just imagine what kind of code that produced. Except the scale is much larger.

My favorite example of this was a project that simply stopped working due to the amount of bugs generated from layers upon layers of bad code that was never addressed. That took around 2 years of work to undo. Roughly 6 months to un-break all the functionality and 6 more months to clean up the core and then start building on top.

sally_glance · 2026-01-28T07:09:37 1769584177

Are you not worried that the sibling comment is right and the solution to this will be "more AI" in the future? So instead of hiring a team of human experts to clean up, management might just dump more money into some specialized AI refactoring platform or hire a single AI coordinator... Or maybe they skip to rebuild using AI faster, because AI is good at greenfield. Then they only need a specialized migration AI to automate the regular switchovers.

I used to be unconcerned, but I admit to being a little frightened of the future now.

scorpioxy · 2026-01-28T09:39:04 1769593144

Well, in general worrying about the future is not useful. Regardless of what you think, it is always uncertain. I specifically stay away from taking part in such speculative threads here on HN.

What's interesting to me though is that very similar promises were being made about AI in the 80s. Then came the "AI Winter" after the hype cycle and promises got very far from reality. Generative AI is the current cycle and who knows, maybe it can fulfill all the promises and hype. Or maybe not.

There's a lot of irrationality currently and until that settles down, it is difficult to see what is real and useful and what is smoke and mirrors.

daxfohl · 2026-01-28T03:50:15 1769572215

> unstated bet

(except where it's been stated, championed, enforced, and ultimated in no uncertain terms by every executive in the tech industry)

overfeed · 2026-01-28T05:00:51 1769576451

I'm yet to encounter an AI-bull who admits the LLM tendency towards creating tech debt- outside of footnotes stating it can be fixed by better prompting (with no examples), or solved by whatever tool they are selling

TeMPOraL · 2026-01-28T08:05:30 1769587530

The industry decided that decades ago. We may like to talk about quality and forethought, but when you actually go to work, you quickly discover it doesn't matter. Small companies tell you "we gotta go fast", large companies demand clear OKRs and focusing on actually delivering impact - either way, no one cares about tech debt, because they see it as an unavoidable fact of life. Even more so now, as ZIRP went away and no one can afford to pay devs to polish the turd ad infinitum. The mantra is, ship it and do the next thing, clean up the old thing if it ever becomes a problem.

And guess what, I'm finally convinced they're right.

Consider: it's been that way for decades. We may tell ourselves good developers write quality code given the chance, but the truth is, the median programmer is a junior with <5 years of experience, and they cannot write quality code to save their life. That's purely the consequence of rapid growth of software industry itself. ~all production code in the past few decades was written by juniors, it continues to be so today; those who advance to senior level end up mostly tutoring new juniors instead of coding.

Or, all that put another way: tech debt is not wrong. It's a tool, a trade-off. It's perfectly fine to be loaded with it, if taking it lets you move forward and earn enough to afford paying installments when they're due. Like with housing: you're better off buying it with lump payment, or off savings in treasury bonds, but few have that money on hand and life is finite, so people just get a mortgage and move on.

--

Edited to add: There's a silver lining, though. LLMs make tech debt legible and quantifiable.

LLMs are affected by tech debt even more than human devs are, because (currently) they're dumber, they have less cognitive capability around abstractions and generalizations[0]. They make up for it by working much faster - which is a curse in terms of amplifying tech debt, but also a blessing, because you can literally see them slowing down.

Developer productivity is hard to measure in large part because the process is invisible (happens in people's heads and notes), and cause-and-effect chains play out over weeks or months. LLM agents compress that to hours to days, and the process itself is laid bare in the chat transcript, easy to inspect and analyze.

The way I see it, LLMs will finally allow us to turn software development at tactical level from art into an engineering process. Though it might be too late for it to be of any use to human devs.

--

[0] - At least the out-of-distribution ones - quirks unique to particular codebase and people behind it.

CharlieDigital · 2026-01-28T00:36:22 1769560582

I ran into a new problem today: "reading atrophy".

As in if the LLM doesn't know about it, some devs are basically giving up and not even going to RTFM. I literally had to explain to someone today how something works by...reading through the docs and linking them the docs with screenshots and highlighted paragraphs of text.

Still got push back along the lines of "not sure if this will work". It's. Literally. In. The. Docs.

finaard · 2026-01-28T07:14:18 1769584458

That's not really a new thing now, it just shows differently.

15 years ago I was working in an environment where they had lots of Indians as cheap labour - and the same thing will show up in any environment where you go for hiring a mass of cheap people while looking more at the cost than at qualifications: You pretty much need to trick them into reading stuff that is relevant.

I remember one case where one had a problem they couldn't solve, and couldn't give me enough info to help remotely. In the end I was sitting next to them, and made them read anything showing up on the screen out loud. Took a few tries where they were just closing dialog boxes without reading them, but eventually we had that under control enough that they were able to read the error messages to me, and then went "Oh, so _that's_ the problem?!"

Overall interacting with a LLM feels a lot like interacting with one of them back then, even down to the same excuses ("I didn't break anything in that commit, that test case was never passing") - and my expectation for what I can get out of it is pretty much the same as back then, and approach to interacting with it is pretty similar. It's pretty much an even cheaper unskilled developer, you just need to treat it as such. And you don't pair it up with other unskilled developers.

globular-toast · 2026-01-28T07:43:59 1769586239

The mere existence of the phrase "RTFM" shows that this phenomenon was already a thing. LLMs are the worst thing to happen to people who couldn't read before. When HR type people ask what my "superpower" is I'm so tempted to say "I can read", because I honestly feel like it's the only difference between me and people who suck at working independently.

krupan · 2026-01-27T23:04:04 1769555044

I've been thinking along these lines. LLMs seem to have arrived right when we were all getting addicted to reels/TikToks/whatever. For some reason we love to swipe, swipe, swipe, until we get something funny/interesting/shocking, that gives us a short-lasting dopamine hit (or whatever chemicals it is) that feels good for about 1 second, and we want MORE, so we keep swiping.

Using an LLM is almost exactly the same. You get the occasional, "wow! I've never seen it do that before!" moments (whether that thing it just did was even useful or not), get a short hit of feel goods, and then we keep using it trying to get another hit. It keeps providing them at just the right intervals to keep people going, just like TikTok does.

gritspants · 2026-01-27T22:04:44 1769551484

My disillusionment comes from the feeling I am just cosplaying my job. There is nothing to distinguish one cosplayer from another. I am just doordashing software, at this point, and I'm not in control.

solumunus · 2026-01-28T06:53:34 1769583214

I don’t get this at all. I’m using LLMs all day and I’m constantly having to make smart architectural choices that other less experienced devs won’t be making. Are you just prompting and going with whatever the initial output is, letting the LLM make decisions? Every moderately sized task should start with a plan, I can spend hours planning, going off and thinking, coming back to the plan and adding/changing things, etc. Sometimes it will be days before I tell the LLM to “go”. I’m also constantly optimising the context available to the LLM, and making more specific skills to improve results. It’s very clear to me that knowledge and effort are still crucial to good long term output… Not everyone will get the same results, in fact everyone is NOT getting the same results, you can see this by reading the wildly different feedback on HN. To some, LLMs are a force multiplier while others claim they can’t get a single piece of decent output…

I think the way you’re using these tools that makes you feel this way is a choice. You’re choosing to not be in control and do as little as possible.

rustyhancock · 2026-01-28T08:56:49 1769590609

One challenge is, are those decisions making tangible differences?

We won't know until the code being produced, especially greenfield code, hits some kind of maturity. 5+ years at least?

sosomoxie · 2026-01-28T00:53:34 1769561614

I've gone years without coding and when I come back to it, it's like riding a bike! In each iteration of my coding career, I have become a better developer, even after a large gap. Now I can "code" during my gap. Were I ever to hand-code again, I'm sure my skills would be there. They don't atrophy, like your ability to ride a bike doesn't atrophy. Yes you may need to warm back up, but all the connections in your brain are still there.

Ronsenshi · 2026-01-28T07:45:22 1769586322

You might still have the skillset to write code, but depending on length of the break your knowledge of tools, frameworks, patterns would be fairly outdated.

I used to know a person like that - high in the company structure who would claim he was a great engineer, but all the actual engineers would make jokes about him and his ancient skills during private conversations.

withinboredom · 2026-01-28T08:52:32 1769590352

I’d push back on this framing a bit. There's a subtle ageism baked into the assumption that someone who stepped away from day-to-day coding has "ancient skills" worth mocking.

Yes, specific frameworks and tooling knowledge atrophy without use, and that’s true for anyone at any career stage. A developer who spent three years exclusively in React would be rusty on backend patterns too. But you’re conflating current tool familiarity with engineering ability, and those are different things.

The fundamentals: system design, debugging methodology, reading and reasoning about unfamiliar code, understanding tradeoffs ... those transfer. Someone with deep experience often ramps up on new stacks faster than you’d expect, precisely because they’ve seen the same patterns repackaged multiple times.

If the person you’re describing was genuinely overconfident about skills they hadn’t maintained, that’s a fair critique. But "the actual engineers making jokes about his ancient skills" sounds less like a measured assessment and more like the kind of dismissiveness that writes off experienced people before seeing what they can actually do.

Worth asking: were people laughing because he was genuinely incompetent, or because he didn’t know the hot framework of the moment? Because those are very different things.

Ronsenshi · 2026-01-28T09:08:25 1769591305

This has nothing to do with ageism. This applies to any person of any age who has an ego big enough to think that their knowledge of the industry is relevant after they take a prolonged break, and who is socially inept enough to brag about how they are still "in".

I don't disagree with your point about fundamentals, but in an industry where there seems to be a new JS framework any time somebody sneezes, the latest tools are very much relevant too. And of course the big thing is language changes. The events I'm describing happened in the late 00s to early 10s, when language updates picked up steam: Python, JS, PHP, C++. Somebody who used C++98 can't claim to have up-to-date knowledge of C++ in 2015.

So to answer your question - people were laughing at his ego, not the fact that he didn't know some hot new framework.

runarberg · 2026-01-28T05:55:32 1769579732

Have you ever learnt a foreign language (say Mongolian, or Danish) and then never spoken it, nor even read anything in it for over 10 years? It is not like riding a bike, it doesn’t just come back like that. You have to actually relearn the language, practice it, and you will suck at it for months. Comprehension comes first (within weeks) but you will be speaking with grammatical errors, mispronunciations, etc. for much longer. You won‘t have to learn the language from scratch, second time around is much easier, but you will have to put in the effort. And if you use google translate instead of your brain, you won‘t relearn the language at all. You will simply forget it.

tayo42 · 2026-01-28T06:41:23 1769582483

Anecdotally, i burned out pretty hard and basically didn't open a text editor for half a year (unemployed too). Eventually i got an itch to write code again and it didn't really feel like I was really worse. Maybe it wasn't long enough atrophy but code doesn't seem to quite work like language though ime.

Ronsenshi · 2026-01-28T07:51:38 1769586698

Six months is definitely not long enough of a break for skills to degrade. But it's not just skills, as I wrote in another comment, the biggest thing is knowledge of new tools, new versions of language and its features.

I'd say there's at most around 2 years of knowledge runtime (maybe with all this AI stuff this is even shorter). After that period if you don't keep your knowledge up to date it fairly quickly becomes obsolete.

amluto · 2026-01-28T04:35:58 1769574958

I’ve actually found the tool that inspires the most worry about brain atrophy to be Copilot. Vscode is full of flashing suggestions all over. A couple days ago, I wanted to write a very quick program, and it was basically impossible to write any of it without Copilot suggesting a whole series of ways to do what it thought I was doing. And it seems that MS wants this: the obvious control to turn it off is actually just “snooze.”

I found the setting and turned it off for real. Good riddance. I’ll use the hotkey on occasion.

seer · 2026-01-28T03:18:43 1769570323

Honestly, this seems very much like the jump from being an individual contributor to being an engineering manager.

The time it happened for me was rather abrupt, with no training in between, and the feeling was eerily similar.

You know _exactly_ what the best solution is, you talk to your reports, but they have minds of their own, as well as egos, and they do things … their own way.

At some point I stopped obsessing with details and was just giving guidance and direction only in the cases where it really mattered, or when asked, but let people make their own mistakes.

Now LLMs don’t really learn on their own or anything, but the feeling of “letting go of small trivial things” is sorta similar. You concentrate on the bigger picture, and if it chose to do an iterative for loop instead of using a functional approach the way you like it … well the tests still pass, don’t they.

Ronsenshi · 2026-01-28T07:55:57 1769586957

The only issue is that as an engineering manager you reasonably expect that the team learns new things, improves their skills, and in general grows as engineers. With AI and its context handling you're working with a team where each member has severe brain damage that affects their ability to form long term memories. You can rewire their brain to a degree, teaching them new "skills" or giving them new tools, but they still don't actually learn from their mistakes or their experiences.

freediver · 2026-01-27T22:56:47 1769554607

My experience is the opposite - I haven't used my brain more in a while.. Typing characters was never what developers were valued for anyway. The joy of building is back too.

swader999 · 2026-01-27T23:07:46 1769555266

Same. I feel I need to be way more into the domain and what the user is trying to do than ever before.

zamalek · 2026-01-27T23:55:30 1769558130

> I worry about the "brain atrophy" part, as I've felt this too. And not just atrophy, but even moreso I think it's evolving into "complacency".

Not trusting the ML's output is step one here, that keeps you intellectually involved - but it's still a far cry from solving the majority of problems yourself (instead you only solve problems ML did a poor job at).

Step two: I delineate interesting and uninteresting work, and Claude becomes a pair programmer without keyboard access for the latter - I bounce ideas off of it etc. making it an intelligent rubber duck. [Edit to clarify, a caveat is that] I do not bore myself with trivialities such as retrieving a customer from the DB in a REST call (but again, I do verify the output).

bandrami · 2026-01-28T08:54:20 1769590460

> I do not bore myself with trivialities such as retrieving a customer from the DB in a REST call

Genuine question, why isn't your ORM doing that? I see a lot of use cases for LLMs that seem to be more expensive ways to do snippets and frameworks...

epolanski · 2026-01-27T23:30:25 1769556625

> Like if I had stated certain design goals recently it would adhere to them, but after a few iterations it would forget again and go back to its original approach, or mix the two, or whatever.

Context management, proper prompting and clear instructions, proper documentation are still relevant.

polytely · 2026-01-27T23:57:20 1769558240

I feel like I'm still a couple of steps behind my lead in skill level and am trying to gain more experience, so I do wonder if I am shooting myself in the foot if I rely too much on AI at this stage. The senior engineer I'm trying to learn from can very effectively use AI because he has very good judgement of code quality; I feel like if I use AI too much I might lose out on the chance to improve my judgement. It's a hard dilemma.

mupuff1234 · 2026-01-28T05:56:26 1769579786

He didn't say "brain atrophy", he was talking about coding abilities.

Imustaskforhelp · 2026-01-27T19:33:18 1769542398

> I want to say it's very akin to doom scrolling. Doom tabbing? It's like, yeah I could be more creative with just a tad more effort, but the AI is already running and the bar to seeing what the AI will do next is so low, so....

Yeah, exactly. Like, we are just waiting for it to get completed, and after it gets completed then what? We ask it to do new things again.

Just as how if we are doom scrolling, we watch something for a minute then scroll down and watch something new again.

The whole notion of progress feels completely fake with this. Somehow I guess I was in a bubble of time where I had always ended up using AI in web browsers (just as when ChatGPT 3 came) and my workflow didn't change because it was free, but I recently changed it when some new free services dropped.

"Doom-tabbing", or completely out-of-the-loop agentic AI programming, just feels really weird to me, sucking the joy out & I wouldn't even consider myself a guy particularly interested in writing code, as I had been using AI to write code for a long time.

I think the problem for me was that I always considered myself a computer tinkerer before a coder. So when AI came for coding, my tinkering skills were given a boost (I could make projects of curiosity I couldn't earlier) but now with AI agents working in this autonomous-esque way, it has come for my tinkering & I do feel replaced, or just feel like my ability to tinker and my interests and my knowledge and my experience are just not taken into account if an AI agent will write the whole code in a multi-file structure, run commands and then deploy it straight to a website.

I mean my point is tinkering was an active hobby, now it's becoming a passive hobby, doom-tinkering? I feel like I caught on to the feeling a bit earlier, just from a vibe in my heart, but is it just me who feels this or?

What could be a name for what I feel?

stuaxo · 2026-01-27T21:30:38 1769549438

LLMs have some terrible patterns: don't know what to do? Just chuck a class named Service in.

Have to really look out for the crap.

atonse · 2026-01-27T02:43:21 1769481801

> LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

I’ve always said I’m a builder even though I’ve also enjoyed programming (but for an outcome, never for the sake of the code)

This perfectly sums up what I’ve been observing between people like me (builders) who are ecstatic about this new world and programmers who talk about the craft of programming, sometimes butting heads.

One viewpoint isn’t necessarily more valid, just a difference of wiring.

ryandrake · 2026-01-27T18:28:04 1769538484

I noticed the same thing, but wasn't able to put it into words before reading that. Been experimenting with LLM-based coding just so I can understand it and talk intelligently about it (instead of just being that grouchy curmudgeon), and the thought in the back of my mind while using Claude Code is always:

"I got into programming because I like programming, not whatever this is..."

Yes, I'm building stupid things faster, but I didn't get into programming because I wanted to build tons of things. I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.

If I was intellectually excited about telling something to do this for me, I'd have gotten into management.

viccis · 2026-01-27T22:48:19 1769554099

Same. This kind of coding feels like it got rid of the building aspect of programming that always felt nice, and it replaced it entirely with business logic concerns, product requirements, code reviews, etc. All the stuff I can generally take or leave. It's like I'm always in a meeting.

>If I was intellectually excited about telling something to do this for me, I'd have gotten into management.

Exactly this. This is the simplest and tersest way of explaining it yet.

taytus · 2026-01-28T09:56:27 1769594187

Because you are not coding, you are building. I've been coding since I was 7 years old, now I'm building.

nunez · 2026-01-27T23:35:29 1769556929

Same same. Writing the actual code is always a huge motivator behind my side projects. Yes, producing the outcome is important, but the journey taken to get there is a lot of fun for me.

I used Claude Code to implement an OpenAI 4o-vision powered receipt scanning feature in an expense tracking tool I wrote by hand four years ago. It did it in two or three shots while taking my codebase into account.

It was very neat, and it works great [^0], but I can't latch onto the idea of writing code this way. Powering through bugs while implementing a new library or learning how to optimize my test suite in a new language is thrilling.

Unfortunately (for me), it's not hard at all to see how the "builders" that see code as a means to an end would LOVE this, and businesses want builders, not crafters.

In effect, knowing the fundamentals is getting devalued at a rate I've never seen before.

[^0] Before I used Claude to implement this feature, my workflow for processing receipts looked like this: Tap iOS Shortcut, enter the amount, snap a pic of the receipt, type up the merchant, amount and description for the expense, then have the shortcut POST that to my expenses tracking toolkit which, then, POSTs that into a Google Sheet. This feature removed the need for me to enter the merchant and amount. Unfortunately, it often took more time to confirm that the merchant, amount and date details OpenAI provided were correct (and correct them when details were wrong, which was most of the time) than it did to type out those details manually, so I just went back to my manual workflow. However, the temptation to just glance at the details and tap "This looks correct" was extremely high, even if the info it generated was completely wrong! It's the perfect analogue to what I've been witnessing throughout the rise of the LLMs.

polishdude20 · 2026-01-27T20:59:53 1769547593

What I have enjoyed about programming is being able to get the computer to do exactly what I want. The possibilities are bounded by only what I can conceive in my mind. I feel like with AI that can happen faster.

chrisjj · 2026-01-28T08:57:17 1769590637

Have you an example of getting a coding chatbot to do exactly what you want?

audience_mem · 2026-01-28T09:42:21 1769593341

Is this a joke? Are you genuinely implying that no one has ever got an LLM to write code that does exactly what they want?

testaccount28 · 2026-01-27T21:29:03 1769549343

> get the computer to do exactly what I want.

> with AI that can happen faster.

well, not exactly that.

polishdude20 · 2026-01-28T00:18:40 1769559520

For simple things it can. But then for more complex things that's where I step in.

thepasch · 2026-01-28T08:07:06 1769587626

> I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.

You can still do that with Claude Code. In fact, Claude Code works best the more granular your instructions get.

chrisjj · 2026-01-28T09:00:21 1769590821

> Claude Code works best the more granular your instructions get.

So best feed it machine code?

smhinsey · 2026-01-28T03:38:13 1769571493

This gets at the heart of the quality of results issues a lot of people are talking about elsewhere here. Right now, if you treat them as a system where you can tell it what you want and it will do it for you, you're building a sandcastle. If instead you also describe the correct data structures and appropriate algorithms to use against them, as well as the particulars of how you want the problem solved, it's a different situation altogether. Like most systems, the quality of output is in some way determined by the quality of input.

There is a strange insistence on not helping the LLM arrive at the best outcome in the subtext to this question a lot of times. I feel like we are living through the John Henry legend in real time.

atonse · 2026-01-27T20:54:14 1769547254

Funny you say that. Because I have never enjoyed management as much as being hands on and directly solving problems.

So maybe our common ground is that we are direct problem solvers. :-)

Ronsenshi · 2026-01-28T08:05:01 1769587501

For some reason this makes me think of a jigsaw puzzle. People usually complete these puzzles because they enjoy the process, where at the end you get a picture that you can frame if you want to. Some people seem to want only the resulting picture. No interest in the process at all.

I guess that's the same people who went to all those coding camps during their heyday because they heard about software engineering salaries. They just want the money.

addisonj · 2026-01-27T18:39:17 1769539157

IMO, this isn't entirely a "new world" either, it is just a new domain where the conversation amplifies the opinions even more (weird how that is happening in a lot of places)

What I mean by that: you had compiled vs interpreted languages, you had types vs untyped, testing strategies, all that, at least in some part, was a conversation about the tradeoffs between moving fast/shipping and maintainability.

But it isn't just tech, it is also in methodologies and the words we use, from "build fast and break things" and "yagni" to "design patterns" and "abstractions"

As you say, it is a different viewpoint... but my biggest concern with where we are as an industry is that these are not just "equally valid" viewpoints of how to build software... it is quite literally different stages of software, that, AFAICT, pretty much all successful software has to go through.

Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)

dpflan · 2026-01-27T20:46:37 1769546797

> Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)

This perspective is crucial. Scale is the great equalizer / demoralizer, scale of the org and scale of the systems. Systems become complex quickly, and verifiability of correctness and function becomes harder. For companies that are built from day one with AI and have AI influencing them as they scale, where does complexity begin to run up against the limitations of AI and cause regression? Or if all goes well, amplification?

concats · 2026-01-28T08:33:17 1769589197

I remember leaving university going into my first engineering job, thinking "Where is all the engineering? All the problem solving and building complex systems? All the math and science? Have I been demoted to a lowly programmer?"

Took me a few years to realize that this wasn't a universal feeling, and that many others found the programming tasks more fulfilling than any challenging engineering. I suppose this is merely another manifestation of the same phenomenon.

coffeeaddict1 · 2026-01-27T19:47:52 1769543272

But how can you be a responsible builder if you don't have trust in the LLMs doing the "right thing"? Suppose you're the head of a software team where you've picked the best candidates for a given project; in that scenario I can see how one is able to trust the team members to orchestrate the implementation of your ideas and intentions, with you not being intimately familiar with the details. Can we place the same trust in LLM agents? I'm not sure. Even if one could somehow prove that LLMs are very reliable, the fact that AI agents aren't accountable beings renders the whole situation vastly different than the human equivalent.

handoflixue · 2026-01-28T07:31:25 1769585485

Trust but verify:

I test all of the code I produce via LLMs, usually doing fairly tight cycles. I also review the unit test coverage manually, so that I have a decent sense that it really is testing things - the goal is less perfect unit tests and more just quickly catching regressions. If I have a lot of complex workflows that need testing, I'll have it write unit tests and spell out the specific edge cases I'm worried about, or set up cheat codes I can invoke to test those workflows out in the UI/CLI.

Trust comes from using them often - you get a feeling for what a model is good and bad at, and what LLMs in general are good and bad at. Most of them are a bit of a mess when it comes to UI design, for instance, but they can throw together a perfectly serviceable "About This" HTML page. Any long-form text they write (such as that About page) is probably trash, but that's super-easy to edit manually. You can often just edit down what they write: they're actually decent writers, just very verbose and unfocused.

I find it similar to management: you have to learn how each employee works. Unless you're in the Top 1%, you can't rely on every employee giving 110% and always producing perfect PRs. Bugs happen, and even NASA-strictness doesn't bring that down to zero.

And just like management, some models are going to be the wrong employee for you because they think your style guide is stupid and keep writing code how they think it should be written.

inerte · 2026-01-27T19:57:41 1769543861

You don't simply put a body in a seat and get software. There are entire systems enabling this trust: college, resume, samples, referral, interviews, tests and CI, monitoring, mentoring, and performance feedback.

And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?

coffeeaddict1 · 2026-01-27T20:14:50 1769544890

> And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?

The point is that in the human scenario, you can hold the human agents accountable. You cannot do that with AI. Of course, you as the orchestrator of agents will be accountable to someone, but you won't have the benefit of holding your "subordinates" accountable, which is what you do in a human team. IMO, this renders the whole situation vastly different (whether good or bad I'm not sure).

polishdude20 · 2026-01-27T21:51:06 1769550666

You can switch to another LLM provider or stop using them altogether. It's even easier than firing a developer.

ipaddr · 2026-01-27T22:31:25 1769553085

It is as easy as getting rid of Microsoft Teams at your org.

chrisjj · 2026-01-28T00:10:30 1769559030

Of course he is - because he invested so much less.

chrisjj · 2026-01-28T08:52:21 1769590341

> > LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

This is much less significant than the fact LLMs split engineers on those who primarily like quality v. those who primarily like speed.

asimovDev · 2026-01-28T08:35:49 1769589349

To me this is similar to car enthusiasts. Some people absolutely love to build their project car, it's a major part of the hobby for them. Others just love the experience of driving, so they buy ready cars or just pay someone to work on the car.

stevenhuang · 2026-01-28T08:59:46 1769590786

Alternatively, others just want to get to their destination.

senderista · 2026-01-27T22:06:58 1769551618

Maybe there's an intermediate category: people who like designing software? I personally find system design more engaging than coding (even though I enjoy coding as well). That's different from just producing an opaque artifact that seems to solve my problem.

mkozlows · 2026-01-27T18:47:07 1769539627

I think he's really getting at something there. I've been thinking about this a lot (in the context of trying to understand the persistent-on-HN skepticism about LLMs), and the framing I came up with[1] is top-down vs. bottom-up dev styles, aka architecting code and then filling in implementations, vs. writing code and having architecture evolve.

[1] https://www.klio.org/theory-of-llm-dev-skepticism/

jamauro · 2026-01-28T06:26:50 1769581610

I like this framing. Nice typography btw, a pleasure to read.

codyb · 2026-01-28T03:35:26 1769571326

I think there's a place for both.

We have services deployed globally serving millions of customers where rigor is really important.

And we have internal users who're building browser extensions with AI that provide valuable information about the interface they're looking at including links to the internal record management, and key metadata that's affecting content placement.

These tools could be handed out on Zip drives in the street and it would just show our users some of the metadata already being served up to them, but it's amazing to strip out 75% of the process of certain things and just have our user (in this case though, it's one user who is driving all of this, so it does take some technical inclination) build out these tools that save our editors so much time when doing this before would have been months and months and months of discovery and coordination and designs that probably wouldn't actually be as useful in the end after the wants of the user are diluted through 18 layers of process.

verdverm · 2026-01-27T18:31:31 1769538691

I think the division is more likely tied to writing. You have to fundamentally change how you do your job, from one of writing a formal language for a compiler to one of writing natural language for a junior-goldfish-memory-allstar-developer, closer to management than to contributor.

This distinction to me separates the two primary camps

jimbokun · 2026-01-27T18:22:18 1769538138

The new LLM centered workflow is really just a management job now.

Managers and project managers are valuable roles and have important skill sets. But there's really very little connection with the role of software development that used to exist.

It's a bit odd to me to include both of these roles under a single label of "builders", as they have so little in common.

EDIT: this goes into more detail about how coding (and soon other kinds of knowledge work) is just a management task now: https://www.oneusefulthing.org/p/management-as-ai-superpower...

simianwords · 2026-01-27T21:18:37 1769548717

i don't disagree. at some point LLMs might become good enough that we wouldn't need exact technical expertise.

slaymaker1907 · 2026-01-27T18:40:21 1769539221

I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.

I deliberately avoid full vibe coding since I think doing so will rust my skills as a programmer. It also really doesn’t save much time in my experience. Once I have a design in mind, implementation is not the hard part.

globular-toast · 2026-01-28T07:50:27 1769586627

I like building, but I don't fool myself into thinking it can be done by taking shortcuts. You could build something that looks like a house for half the cost but it won't be structurally sound. That's why I care about the details. Someone has to.

monkaiju · 2026-01-28T02:38:09 1769567889

So far I haven't seen it actually be effective at "building" in a work context with any complexity, and this despite some on our team desperately trying to make that the case.

FeepingCreature · 2026-01-28T06:51:04 1769583064

I have! You have to be realistic about the projects. The more irreducible local context it needs, the less useful it will be. Great for greenfield code, oneshots, write once read once run for months.

barrell · 2026-01-28T03:36:21 1769571381

Agreed. I don’t care for engineering or coding, and would gladly give it up the moment I can. I’m also running a one man business where every hour counts (and where I’m responsible for maintaining every feature).

The fact of the matter is LLMs produce lower quality at higher volumes in more time than it would take to write it myself, and I’m a very mediocre engineer.

I find this separation of “coding” vs “building” so offensive. It’s basically just saying some people are only concerned with “inputs”, while others with “outputs”. This kind of rhetoric is so toxic.

It’s like saying LLM art is separating people into people who like to scribble, and people who like to make art.

Imustaskforhelp · 2026-01-27T19:25:13 1769541913

> I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.

I had felt like this and still do but man, at some point, the management churn feels real & I just feel like I'm suffering from a new problem.

Suppose I actually end up having services literally deployed from a single prompt, nothing else. Earlier I used to have AI write code but I was interested in the deployment and everything around it, now there are services which do that really neatly for you (I also really didn't give in to the agent hype and mostly used browser LLMs)

Like on one hand you feel more free to build projects but the whole joy of the project got completely reduced.

I mean, I guess I am one of the junior devs so to me AI writing code on topics I didn't know/prototyping felt awesome.

I mean I was still involved in say copy pasting or looking at the code it generates. Seeing the errors and sometimes trying things out myself. If AI is doing all that too, idk

For some reason, recently I have been disinterested in AI. I have used it quite a lot for prototyping but I feel like this completely out-of-the-loop programming just feels very off to me with recent services.

I also feel like there is this sense that if I pay for some AI thing, I have to maximally extract "value" out of it.

I guess the issue could be that I can have vague terms or have a very small text file as input (like just do X alternative in Y lang) and I am now unable to understand the architectural decisions and the overwhelmed-ness out of it.

Probably gonna take either spec-driven development where I clearly define the architecture, or development like something I saw primagen do recently, where the AI will only manipulate the code of that particular function (I am imagining it for a file as well), and somehow I feel like it's something that I could enjoy more because right now it feels like I don't know what I have built at times.

When I prototype with single file projects using say browser for funsies/any idea. I get some idea of what the code kind of uses with its dependencies and functions names from start/end even if I didn't look at the middle

A bit of a ramble I guess, but the thing which is kind of making me feel this is that I was talking to somebody and showcasing them some service where AI + server is there, and they asked for something in a prompt and I wrote it. Then I let it do its job but I was also thinking how I would architect it (it was some detect food and then find BMR, and I was thinking first to use any api but then I thought that meh it might be hard, why not use AI vision models, okay what's the best, gemini seems good/cheap)

and I went to the coding thing to see what it did and it actually went even beyond by using the free tier of gemini (which I guess didn't end up working could be some rate limit of my own key but honestly it would've been the thing I would've tried too)

So like, I used to pride myself on the architectural decisions I make even if AI could write code faster but now that is taken away as well.

I really don't want to read AI code so much so honestly at this point, I might as well write code myself and learn hands on but I have a problem with build fast in public like attitude that I have & just not finding it fun.

I feel like I should do a more active job in my projects & I am really just figuring out what's the perfect way to use AI in such contexts & when to use how much.

Thoughts?

markb139 · 2026-01-28T08:15:56 1769588156

I retired from paid sw dev work in 2020 when COVID arrived. I’ve worked on my small projects since with all development by hand. I’d followed the rise of AI, but not used it. Late last year I started a project that included reverse engineering some firmware that runs on an Intel 8096 based embedded processor. I’d never worked on that processor before. There are tools available, but they cost many $. So, I started to think about a simple disassembler. 2 weeks ago we decided to try Claude to see what it could do. We now have a disassembler, assembler and a partially working emulator. No doubt there are bugs and missing features and the code is a bit messy, but boy has it sped up the work. One thing did occur to me. Vendors of small utilities could be in trouble. For example I needed to cut out some pages from a pdf. I could have found a tool online (I’m sure there are several) or written one myself. However, Claude quickly performed the task.

TeMPOraL · 2026-01-28T08:54:16 1769590456

> Vendors of small utilities could be in trouble. For example I needed to cut out some pages from a pdf. I could have found a tool online(I’m sure there are several), write one myself. However, Claude quickly performed the task.

Definitely. Making small, single-purpose utilities with LLMs is almost as easy these days as googling for them on-line - much easier, in fact, if you account for time spent filtering out all the malware, adware, "to finish the process, register an account" and plain broken "tools" that dominate SERP.

Case in point, last time my wife needed to generate a few QR codes for some printouts for an NGO event, I just had an LLM make one as a static, single-page client-side tool and hosted it myself -- because that was the fastest way to guarantee it's fast, reliable, free of surveillance economy bullshit, and doesn't employ URL shorteners (surprisingly common pattern that sometimes becomes a nasty problem down the line; see e.g. a high-profile case of some QR codes on food products leading to porn sites after shortlink got recycled).

jedberg · 2026-01-28T00:33:04 1769560384

> You realize that stamina is a core bottleneck to work

There has been a lot of research that shows that grit is far more correlated to success than intelligence. This is an interesting way to show something similar.

AIs have endless grit (or at least as endless as your budget). They may outperform us simply because they don't ever get tired and give up.

Full quote for context:

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

michalsustr · 2026-01-28T09:51:24 1769593884

The tenacity aspect makes me worried about the paper clip AI misalignment scenario more than before.

dust42 · 2026-01-28T08:22:20 1769588540

> AIs have endless grit (or at least as endless as your budget).

That is the only thing he doesn't address: the money it costs to run the AI. If you let the agents loose, they easily burn north of 100M tokens per hour. Now at $25/1M tokens that gets expensive quickly. At some point, when we are all drug^W AI dependent, the VCs will start to cash in on their investments.
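Back-of-the-envelope, taking those two numbers at face value. This is only a sketch: both inputs are the ballpark figures quoted above, not measured usage or a real price list, and the function name is made up for illustration.

```python
# Rough hourly cost of an agent run, using the figures quoted above.
# Both inputs are ballpark numbers, not measurements or real price quotes.
TOKENS_PER_HOUR = 100_000_000      # "north of 100M tokens per hour"
USD_PER_MILLION_TOKENS = 25.0      # "$25/1M tokens"


def hourly_cost(tokens_per_hour: int, usd_per_million: float) -> float:
    """Return the cost in USD of one hour at the given token burn rate."""
    return tokens_per_hour / 1_000_000 * usd_per_million


cost = hourly_cost(TOKENS_PER_HOUR, USD_PER_MILLION_TOKENS)
print(f"${cost:,.0f} per hour")            # $2,500 per hour
print(f"${cost * 8:,.0f} per 8-hour day")  # $20,000 per 8-hour day
```

Real bills would depend on the input/output token mix and caching, which this ignores.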

Loeffelmann · 2026-01-28T07:35:02 1769585702

If you ever work with LLMs you know that they quite frequently give up.

Sometimes it's a

    // TODO: implement logic

or a

"this feature would require extensive logic and changes to the existing codebase".

Sometimes they just declare their work done, ignoring failing tests and builds.

You can nudge them to keep going but I often feel like, when they behave like this, they are at their limit of what they can achieve.

wongarsu · 2026-01-28T09:33:22 1769592802

If I tell it to implement something it will sometimes declare its work done before it's done. But if I give Claude Code a verifiable goal like making the unit tests pass it will work tirelessly until that goal is achieved. I don't always like the solution, but the tenacity everyone is talking about is there.
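The shape of that "verifiable goal" loop can be sketched in a few lines. This is not how Claude Code works internally; `check` and `attempt_fix` are hypothetical stand-ins (one would run the test suite, the other would hand the failure back to the agent), and the attempt cap is an assumption added because budgets are finite.

```python
from typing import Callable


def run_until_green(
    check: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    max_attempts: int = 5,
) -> int:
    """Retry until check() passes.

    Returns the number of fix attempts used, or -1 if the cap is reached
    and the check still fails. Both callables are hypothetical stand-ins.
    """
    for attempts_used in range(max_attempts):
        if check():
            return attempts_used
        attempt_fix(attempts_used + 1)
    # One last check after the final attempt.
    return max_attempts if check() else -1


# Toy stand-in for a test suite: "passes" once three bugs are fixed.
state = {"bugs": 3}


def fake_check() -> bool:
    return state["bugs"] == 0


def fake_fix(attempt: int) -> None:
    state["bugs"] -= 1


result = run_until_green(fake_check, fake_fix)
print(result)  # 3
```

The pass/fail signal is what makes the loop terminate on something other than the agent's own say-so.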

jedberg · 2026-01-28T08:29:50 1769588990

> If you ever work with LLMs you know that they quite frequently give up.

If you try to single shot something perhaps. But with multiple shots, or an agent swarm where one agent tells another to try again, it'll keep going until it has a working solution.

energy123 · 2026-01-28T08:00:43 1769587243

Using LLMs to clean those up is part of the workflow that you're responsible for (... for now). If you're hoping to get ideal results in a single inference, forget it.

0xbadcafebee · 2026-01-27T21:32:47 1769549567

> What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.

I was thinking about this the other day as relates to the DevOps movement.

The DevOps movement started as a way to accelerate and improve the results of dev<->ops team dynamics. By changing practices and methods, you get acceleration and improvement. That creates "high-performing teams", which is the team form of a 10x engineer. Whether or not you believe in '10x engineers', a high-performing team is real. You really can make your team deploy faster, with fewer bugs. You have to change how you all work to accomplish it, though.

To get good at using AI for coding, you have to do the same thing: continuous improvement, changing workflows, different designs, development of trust through automation and validation. Just like DevOps, this requires learning brand new concepts, and changing how a whole team works. This didn't get adopted widely with DevOps because nobody wanted to learn new things or change how they work. So it's possible people won't adapt to the "better" way of using AI for coding, even if it would produce a 10x result.

If we want this new way of working to stick, it's going to require education, and a change of engineering culture.

jimbokun · 2026-01-27T18:29:16 1769538556

I'm pretty happy with Copilot in VS Code. Type what change I want Claude to make in the Copilot panel, and then use the VS Code in context diffs to accept or reject the proposed changes. While being able to make other small changes on my own.

So I think this tracks with Karpathy's defense of IDEs still being necessary?

Has anyone found it practical to forgo IDEs almost entirely?

everfrustrated · 2026-01-28T01:19:35 1769563175

I've found copilot chat is able to do everything I need. I tried the Claude plugin for vscode and it was a noticeably worse experience for me.

Mind you copilot has only supported agent mode relatively recently.

I really like the way copilot does changes in such a way you can accept or reject and even revert to point in time in the chat history without using git. Something about this just fits right with how my brain works. Using Claude plugin just felt like I had one hand tied behind my back.

thunfischtoast · 2026-01-28T07:16:26 1769584586

I find Claude Code in VS Code is sometimes horribly inefficient. I tell it to replace some print-statements with proper logging in the one file I have open and it first starts burning tokens to understand the codebase for the 13th time today, despite not needing to and having it laid out in the CLAUDE.md already.

vmbm · 2026-01-27T18:45:25 1769539525

I have been assigning issues to copilot in Github. It will then create a pull request and work on and report back on the issue in the PR. I will pull the code and make small changes locally using VSCode when needed.

But what I like about this setup is that I have almost all the context I need to review the work in a single PR. And I can go back and revisit the PR if I ever run into issues down the line. Plus you can run sessions in parallel if needed, although I don't do that too much.

simonw · 2026-01-27T21:02:36 1769547756

Are you letting it run your tests and run little snippets of code to try them out (like "python -c 'import module; print(module.something())'") or are you just using it to propose diffs for you to accept or reject?

This stuff gets a whole lot more interesting when you let it start making changes and testing them by itself.

maxdo · 2026-01-27T18:32:27 1769538747

Copilot is not on par with cc or cursor even

jimbokun · 2026-01-27T18:41:08 1769539268

I use it to access Claude. So what's the difference?

nsingh2 · 2026-01-27T20:27:31 1769545651

This stuff is a little messy and opaque, but the performance of the same model in different harnesses depends a lot on how context is managed. The last time I tried Copilot, it performed markedly worse for similar tasks compared to Claude Code. I suspect that Copilot was being very aggressive in compressing context to save on token cost, but I'm not 100% certain about this.

Also note that with Claude models, Copilot might allocate a different number of thinking tokens compared to Claude Code.

Things may have changed now compared to when I tried it out, these tools are in constant flux. In general I've found that harnesses created by the model providers (OpenAI/Codex CLI, Anthropic/Claude Code, Google/Gemini CLI) tend to be better than generalist harnesses (cheaper too, since you're not paying a middleman).

walthamstow · 2026-01-27T21:14:20 1769548460

Different harnesses and agentic environments produce different results from the same model. Claude Code and Cursor are the best IME and Copilot is by far the worst.

WA · 2026-01-27T18:36:06 1769538966

Why not? You can select Opus 4.5, Gemini 3 Pro, and others.

spaceman_2020 · 2026-01-27T18:41:42 1769539302

Claude Code is a CLI tool which means it can do complete projects in a single command. Also has fantastic tools for scaffolding and harnessing the code. You can define everything from your coding style to specific instructions for designing frontpages, integrating payments, etc.

It's not about the model. It's about the harness

binarycrusader · 2026-01-27T22:22:12 1769552532

> Claude Code is a CLI tool which means it can do complete projects in a single command

https://github.com/features/copilot/cli/

piker · 2026-01-27T21:20:16 1769548816

This would make some sense if VS Code didn't have a terminal built into it. The LLMs have the same bash capabilities in either form.

maxdo · 2026-01-27T18:41:53 1769539313

it's not a model limit anymore, it's tools, skills, background agents, etc. It's an entire agentic environment.

illnewsthat · 2026-01-27T18:59:31 1769540371

Github copilot has support for this stuff as well. Agent skills, background/subagents, etc.

jwilliams · 2026-01-28T02:56:05 1769568965

> It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.

This is true... Equally I've seen it dive into a rabbit hole, make some changes that probably aren't the right direction... and then keep digging.

This is way more likely with Sonnet, Opus seems to be better at avoiding it. Sonnet would happily modify every file in the codebase trying to get a type error to go away. If I prompt "wait, are you off track?" it can usually course correct. Again, Opus seems way better at that part too.

Admittedly this has improved a lot lately overall.

gloosx · 2026-01-28T07:28:26 1769585306

So what is he even coding there all the time?

Does anybody have any info on what he is actually working on besides all the vibe-coding tweets?

There seems to be zero output from the guy for the past 2 years (except tweets)

beng-nl · 2026-01-28T09:42:29 1769593349

He's building Eureka Labs[1], an AI-first education company (can't wait to use it). He's both a strong researcher[2] and an unusually gifted technical communicator. His recent videos[3] are excellent educational material.

More broadly though: someone with his track record sharing firsthand observations about agentic coding shouldn't need to justify it by listing current projects. The observations either hold up or they don't.

[1] https://x.com/EurekaLabsAI

[2] PhD in DL, early OpenAI, founding head of AI at Tesla

[3] https://www.youtube.com/@AndrejKarpathy/videos

ayewo · 2026-01-28T09:24:11 1769592251

> There seems to be zero output from the guy for the past 2 years (except tweets)

Well, he made Nanochat public recently and has been improving it regularly [1]. This doesn't preclude that he might be working on other projects that aren't public yet (as part of his work at Eureka Labs).

1: https://github.com/karpathy/nanochat

ruszki · 2026-01-28T08:10:15 1769587815

I don’t know, but it’s interesting that he and many others come up with this “we should act like LLMs are junior devs”. There is a reason why most junior devs work on fairly separate parts of products, most of the time parts which can be removed or replaced easily, and not an integral part of products: because their code is usually quite bad. Every few lines contain issues and suboptimal solutions, and the whole thing is full of architectural problems. You basically never trust junior devs with core product features. Yet, we should pretend that an “LLM junior dev” is somehow different. This just signals to me that these people don’t work on serious code.

augment_me · 2026-01-28T07:46:02 1769586362

This is the first question I ask, and every time I get the answer of some monolith that supposedly solves something. Imo, this is completely fine for any personal thing, I am happy when someone says they made an API to compare weekly shopping prices from the stores around them, or some recipe, this makes sense.

However more often than not, someone is just building a monolithic construction that will never be looked at again. For example, someone found that HuggingFace dataloader was slow for some type of file size in combination with some disk. What does this warrant? A 300000+ line non-reviewed repo to fix this issue. Not a 200-line PR to HuggingFace, no you need to generate 20% of the existing repo and then slap your thing on there.

For me this is puzzling, because what is this for? Who is this for? Usually people build these things for practice, but now it's generated, so it's not for practice because you put very little effort into it. The only thing I can see is that it's some type of competence signaling, but here again, if the engineer/manager looking knows that this is generated, it does not have the type of value that would come with such signaling. Either I am naive and people still look at these repos and go "whoa this is amazing", or it's some kind of induced egotrip/delusion where the LLM has convinced you that you are the best builder.

oxag3n · 2026-01-27T23:36:10 1769556970

> Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually...

> Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Until you struggle to review it as well. Simple exercise to prove it - ask an LLM to write a function in a familiar programming language, but in an area you haven't invested in learning and coding yourself. Try reviewing some code involving embedding/SIMD/FPGA without learning it first.

sleazebreeze · 2026-01-27T23:40:14 1769557214

People would struggle to review code in a completely unfamiliar domain or part of the stack even before LLMs.

piskov · 2026-01-28T00:07:15 1769558835

That’s why you need to write code to learn it.

No-one has ever learned a skill just by reading/observing.

chrisjj · 2026-01-28T00:00:45 1769558445

No, because they wouldn't be so foolish as to try it.

AstroBen · 2026-01-28T01:52:10 1769565130

How would you find yourself in that situation before AI?

kshri24 · 2026-01-28T02:04:16 1769565856

Agree with Karpathy's take. Finally a down to Earth analysis from a respected source in the AI space. I guess I'll be using slopocalypse a lot more now :)

> I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media

It has arrived. Github will be most affected thanks to git-terrorists at Apna College refusing to take down that stupid tutorial. IYKYK.

ActorNightly · 2026-01-28T06:19:35 1769581175

The respect is unwarranted.

He ran Tesla's ML division, but still doesn't know what a simple Kalman filter is (in the sense where he claimed that lidar would be hard to integrate with cameras).

strogonoff · 2026-01-27T18:27:33 1769538453

LLM coding splits up engineers based on those who primarily like building and those who primarily like code reviews and quality assessment. I definitely don’t love the latter (especially when reviewing decisions not made by a human with whom I can build long-term personal rapport).

After a certain experience threshold of making things from scratch, “coding” (never particularly liked that term) has always been 99% building, or architecture, and I struggle to see how often a well-architected solution today, with modern high-level abstractions, requires so much code that you’d save significant time and effort by not having to just type, possibly with basic deterministic autocomplete, exactly what you mean (especially considering you would have to also spend time and effort reviewing whatever was typed for you if you used a non-deterministic autocomplete).

OkayPhysicist · 2026-01-27T19:04:58 1769540698

See, I don't take it that extreme: LLMs make fantastic, never-before-seen quality autocompletes. I hacked together a Neovim plugin that prompts an LLM to "finish this function" on command, and it's a big time saver for the menial plumbing type operations. Think things like "this api I use expects JSON that encodes some subset of SQL, I want all the dogs with Ls in their name that were born on a Tuesday". Given an example of such an API (or if the documentation ended up in its training), LLMs will consistently one-shot stuff like that.

Asking it to do entire projects? Dumb. You end up with spaghetti, unless you hand-hold it to a point that you might as well be using my autocomplete method.

gverrilla · 2026-01-28T03:37:16 1769571436

Depends on the scope of the project. If it's small, and you direct it correctly, it can one-shot yes. Or 2-3-shot.

jeffreygoesto · 2026-01-28T07:39:25 1769585965

> How much of society is bottlenecked by digital knowledge work?

I think not much. The real society bottleneck is that a growing number of peeps try to convince each other that life and society are a zero sum game.

They are so much more if we don't do that.

einrealist · 2026-01-27T18:44:42 1769539482

> It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.

Somewhere, there are GPUs/NPUs running hot. You send all the necessary data, including information that you would never otherwise share. And you most likely do not pay the actual costs. It might become cheaper or it might not, because reasoning is a sticking plaster on the accuracy problem. You and your business become dependent on this major gatekeeper. It may seem like a good trade-off today. However, the personal, professional, political and societal issues will become increasingly difficult to overlook.

cyode · 2026-01-27T22:18:56 1769552336

This quote stuck out to me as well, for a slightly different reason.

The “tenacity” referenced here has been, in my opinion, the key ingredient in the secret sauce of a successful career in tech, at least in these past 20 years. Every industry job has its intricacies, but for every engineer who earned their pay with novel work on a new protocol, framework, or paradigm, there were 10 or more providing value by putting the myriad pieces together, muddling through the ever-waxing complexity, and crucially never saying die.

We all saw others weeded out along the way for lacking the tenacity. Think the boot camp dropouts or undergrads who changed majors when first grappling with recursion (or emacs). The sole trait of stubbornness to “keep going” outweighs analytical ability, leetcode prowess, soft skills like corporate political tact, and everything else.

I can’t tell what this means for the job market. Tenacity may not be enough on its own. But it’s the most valuable quality in an employee in my mind, and Claude has it.

noosphr · 2026-01-28T00:15:19 1769559319

There is an old saying back home: an idiot never tires, only sweats.

Claude isn't tenacious. It is an idiot that never stops digging because it lacks the meta cognition to ask 'hey, is there a better way to do this?'. Chain of thought's whole raison d'etre was so the model could get out of the local minima it pushed itself in. The issue is that after a year it still falls into slightly deeper local minima.

This is fine when a human is in the loop. It isn't what you want when you have a thousand idiots each doing a depth first search on what the limit of your credit card is.

Havoc · 2026-01-28T00:40:08 1769560808

> it lacks the meta cognition to ask 'hey, is there a better way to do this?'.

Recently had an AI tell me this code (that it wrote) is a mess and suggested wiping it and starting from scratch with a more structured plan. That seems to hint at some outlines of meta cognition.

zzrrt · 2026-01-28T00:47:49 1769561269

Haha, it has the human developer traits of thinking all old code is garbage, failing to identify oneself as the dummy who wrote this particular code, and wanting to start from scratch.

dpkirchner · 2026-01-28T00:53:47 1769561627

It's like NIH syndrome but instead "not invented here today". Also a very human thing.

globular-toast · 2026-01-28T07:54:52 1769586892

More like NIITS: Not Invented in this Session.

rurp · 2026-01-28T04:34:39 1769574879

Perhaps. I've had LLMs tell me some code is deeply flawed garbage that should be rewritten, when that exact same LLM wrote it minutes before. It could be a sign of deep meta cognition, or it might be due to some cognitive gaps where it has no idea why it did something a minute ago and suddenly has a different idea.

lbrito · 2026-01-28T01:58:49 1769565529

Someone will say "you just need to instruct Claude.md to be more meta and do a wiggum loop on it"

teaearlgraycold · 2026-01-28T03:41:06 1769571666

I asked Claude to analyze something and report back. It thought for a while, said “Wow this analysis is great!” and then went back to thinking before delivering the report. They’re auto-sycophantic now!

hyperadvanced · 2026-01-28T01:34:19 1769564059

Metacognition As A Service, you say?

guy4261 · 2026-01-28T03:28:18 1769570898

Running on the Meta Cognition Protocol server near you.

baxtr · 2026-01-28T05:47:33 1769579253

You’ll get sued by Meta for this!

r-w · 2026-01-28T03:25:05 1769570705

I think that’s called “consulting”.

karlgkk · 2026-01-28T06:14:02 1769580842

lol no it doesn’t. It hints at convincing language models

samusiam · 2026-01-28T01:26:35 1769563595

I mean, not always. I've seen Claude step back and reconsider things after hitting a dead end, and go down a different path. There are also workflows, loops that can increase the likelihood of this occurring.

BeetleB · 2026-01-27T23:15:28 1769555728

This is a major concern for junior programmers. For many senior ones, after 20 (or even 10) years of tenacious work, they realize that such work will always be there, and they long ago stopped growing on that front (i.e. they had already peaked). For those folks, LLMs are a life saver.

At a company I worked for, lots of senior engineers became managers because they no longer wanted to obsess over whether their algorithm has an off by one error. I think fewer will go the management route.

(There was always the senior tech lead path, but there are far more roles for management than tech lead).

codyb · 2026-01-28T03:27:05 1769570825

I feel like if you're really spending a ton of time on off by one errors after twenty years in the field you haven't actually grown much and have probably just spent a ton of time in a single space.

Otherwise you'd be senior staff to principal range and doing architecture, mentorship, coordinating cross team work, interviewing, evaluating technical decisions, etc.

I got to code this week a bit and it's been a tremendous joy! I see many peers at similar and lower levels (and higher) who have more years and less technical experience and still write lots of code and I suspect that is more what you're talking about. In that case, it's not so much that you've peaked, it's that there's not much to learn and you're doing a bunch of the same shit over and over and that's of course tiring.

I think it also means that everything you interact with outside your space does feel much harder because of the infrequency with which you have interacted with it.

If you've spent your whole career working the whole stack from interfaces to infrastructure then there's really not going to be much that hits you as unfamiliar after a point. Most frameworks recycle the same concepts and abstractions, same thing with programming languages, algorithms, data management etc.

But if you've spent most of your career in one space cranking tickets, those unknown corners are going to be as numerous as the day you started and be much more taxing.

rishabhaiover · 2026-01-27T23:26:01 1769556361

That's just sad. Right when I found love in what I do, my work has no value anymore.

jasonfarnon · 2026-01-28T00:28:55 1769560135

Aren't you still better off than the rest of us who found what they love + invested decades in it before it lost its value. Isn't it better to lose your love when you still have time to find a new one?

josephg · 2026-01-28T02:39:16 1769567956

I don't think so. Those of us who found what we love and invested decades into it got to spend decades getting paid well to do what we love.

pesus · 2026-01-28T00:38:34 1769560714

Depends on if their new love provides as much money as their old one, which is probably not likely. I'd rather have had those decades to stash and invest.

jasonfarnon · 2026-01-28T00:46:45 1769561205

A lot of pre-faang engineers don't have the stash you're thinking about. What you meant was "right when I found a lucrative job that I love". What was going on in tech these last 15 years, unfortunately, probably was once in a lifetime.

WarmWash · 2026-01-28T01:49:31 1769564971

It's crazy to think back in the 80's programmers had "mild" salaries despite programming back then being worlds more punishing. No libraries, no stack exchange, no forums, no endless memory and infinite compute. If you had a challenging bug you better also be proficient in reading schematics and probing circuits.

lurking_swe · 2026-01-28T08:30:28 1769589028

on the bright side software evolved much more slowly in the 80s. You could go very far by being an expert in 1 thing.

People had real offices with actual quiet focus time.

User expectations were also much lower.

pros and cons i guess?

nfredericks · 2026-01-28T00:49:52 1769561392

This is genuinely such a good take

dugidugout · 2026-01-28T04:35:16 1769574916

Especially on the topic of value! We are all intuitively aware that value is highly contextual, but get in a knot trying to rationalize value long past genuine engagement!

test6554 · 2026-01-27T23:35:40 1769556940

Imagine a senior dev who just approves PRs, approves production releases, and prioritizes bug reports and feature requests. An LLM watches for errors ceaselessly and reports an issue. The senior dev reviews the issue and assigns a severity to it. Another LLM has a backlog of features and errors to go solve; it makes a fix and submits a PR after running tests and verifying things work on its end.

techgnosis · 2026-01-28T00:44:24 1769561064

Why are we pretending like the need for tenacity will go away? Certain problems are easier now. We can tackle larger problems now that also require tenacity.

samusiam · 2026-01-28T01:24:35 1769563475

Even right at this very moment where we have a high-tenacity AI, I'd argue that working with the AI -- that is to say, doing AI coding itself and dealing with the novel challenges that brings -- requires a lot of stubborn persistence.

mykowebhn · 2026-01-28T05:58:37 1769579917

Fittingly, Geoffrey Hinton toiled away for years in relative obscurity before finally being recognized for his work. I was always quite impressed by his "tenacity".

So although I don't think he should have won the Nobel Prize, because it's not really physics, I felt his perseverance and hard work should merit something.

daxfohl · 2026-01-27T20:06:03 1769544363

I still find in these instances there's at least a 50% chance it has taken a shortcut somewhere: created a new, bigger bug in something that just happened not to have a unit test covering it, or broke an "implicit" requirement that was so obvious to any reasonable human that nobody thought to document it. These can be subtle because you're not looking for them, because no human would ever think to do such a thing.

Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!"

gtowey · 2026-01-27T21:00:03 1769547603

The value extortion plan writes itself. How long before someone pitches the idea that the models explicitly almost keep solving your problem to get you to keep spending? Would you even know?

password4321 · 2026-01-27T23:53:33 1769558013

First time I've seen this idea, I have a tingling feeling it might become reality sooner rather than later.

sailfast · 2026-01-27T22:31:41 1769553101

That’s far-fetched. It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise. High value to user + lower compute costs = better pricing power and better margins overall.

d0mine · 2026-01-27T23:11:25 1769555485

> far-fetched

Remember Google?

Once it was far-fetched that they would make the search worse just to show you more ads. Now, it is a reality.

With tokens, it is even more direct. The more tokens users spend, the more money for providers.

retsibsi · 2026-01-28T02:53:12 1769568792

> Now, it is a reality.

What are the details of this? I'm not playing dumb, and of course I've noticed the decline, but I thought it was a combination of losing the battle with SEO shite and leaning further and further into a 'give the user what you think they want, rather than what they actually asked for' philosophy.

supriyo-biswas · 2026-01-28T04:37:40 1769575060

https://www.wheresyoured.at/the-men-who-killed-google/

throwthrowuknow · 2026-01-28T00:20:26 1769559626

Only if you are paying per token on the API. If you are paying a fixed monthly fee then they lose money when you need to burn more tokens and they lose customers when you can’t solve your problems within that month and max out your session limits and end up with idle time which you use to check if the other providers have caught up or surpassed your current favourite.

layla5alive · 2026-01-28T04:53:53 1769576033

Indeed, an unlimited plan seems like the only arrangement that makes sense if you don't want abuse by the provider to be guaranteed.

xienze · 2026-01-27T23:02:06 1769554926

> It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise.

Unless you’re paying by the token.

Fnoord · 2026-01-28T00:37:03 1769560623

I was thinking more of a deliberate backdoor in code. RCE is an obvious example, but another one could be bias. "I'm sorry ma'am, computer says you are ineligible for a bank account." These ideas aren't new. They were there in the 90s already when we still thought about privacy and accountability regarding technology, and dystopian novels already described them long, long ago.

fragmede · 2026-01-27T21:37:11 1769549831

The free market proposition is that competition (especially with Chinese labs and grok) means that Anthropic is welcome to do that. They're even welcome to illegally collude with OpenAI such that ChatGPT is similarly gimped. But switching costs are pretty low. If it turns out I can one shot an issue with Qwen or Deepseek or Kimi thinking, Anthropic loses not just my monthly subscription, but everyone else's I show that to. So no, I think that's some grade A conspiracy theory nonsense you've got there.

coffeefirst · 2026-01-27T21:51:09 1769550669

It’s not that crazy. It could even happen by accident in pursuit of another unrelated goal. And if it did, a decent chunk of the tech industry would call it “revealed preference” because usage went up.

hnuser123456 · 2026-01-27T22:29:04 1769552944

LLMs became sycophantic and effusive because those responses were rated higher during RLHF, until it became newsworthy how obviously eager-to-please they got, so yes, being highly factually correct and "intelligent" was already not the only priority.

daxfohl · 2026-01-27T22:48:22 1769554102

To be clear I don't think that's what they're doing intentionally. Especially on a subscription basis, they'd rather me maximize my value per token, or just not use them. Lulling users into using tokens unproductively is the worst possible option.

The way agents work right now though just sometimes feels that way; they don't have a good way of saying "You're probably going to have to figure this one out yourself".

bandrami · 2026-01-28T00:46:05 1769561165

> But switching costs are pretty low

Switching costs are currently low. Once you're committed to the workflow the providers will switch to prepaying for a year's worth of tokens.

jrflowers · 2026-01-27T21:59:07 1769551147

This is a good point. For example if you have access to a bunch of slot machines, one of them is guaranteed to hit the jackpot. Since switching from one slot machine to another is easy, it is trivial to go from machine to machine until you hit the big bucks. That is why casinos have such large selections of them (for our benefit).

krupan · 2026-01-27T22:49:19 1769554159

"for our benefit" lol! This is the best description of how we are all interacting with LLMs now. It's not working? Fire up more "agents" ala gas town or whatever

robotmaxtron · 2026-01-28T03:42:58 1769571778

last time I was at a casino I checked to see what company built the machines, imagine my surprise that it was (by my observation) a single vendor.

thunderfork · 2026-01-27T21:45:52 1769550352

As a rational consumer, how would you distinguish between some intentional "keep pulling the slot machine" failure rate and the intrinsic failure rate?

I feel like saying "the market will fix the incentives" handwaves away the lack of information on internals. After all, look at the market response to Google making their search less reliable - sure, an invested nerd might try Kagi, but Google's still the market leader by a long shot.

In a market for lemons, good luck finding a lime.

krupan · 2026-01-27T22:50:04 1769554204

FWIW, Kagi is better than Google.

chanux · 2026-01-28T03:46:33 1769571993

Is this a page from the dating apps' playbook?

wvenable · 2026-01-27T22:11:02 1769551862

> These can be subtle because you're not looking for them

After any agent run, I always look at the git comparison between the new version and the previous one. This helps catch things that you might otherwise not notice.

teaearlgraycold · 2026-01-28T03:42:53 1769571773

And after manually coding I often have an LLM review the diff. 90% of the problems it finds can be discounted, but it’s still a net positive.

einrealist · 2026-01-28T07:09:55 1769584195

And there is this paradox where it becomes harder to detect the problems as the models 'improve'.

charcircuit · 2026-01-27T21:19:37 1769548777

You are using it wrong, or are using a weak model if your failure rate is over 50%. My experience is nothing like this. It very consistently works for me. Maybe there is a <5% chance it takes the wrong approach, but you can quickly steer it in the right direction.

testaccount28 · 2026-01-27T21:26:01 1769549161

You are using it on easy questions. Some of us are not.

meowface · 2026-01-28T06:11:06 1769580666

A lot of people are getting good results using it on hard things. Obviously not perfect, but > 50% success.

That said, more and more people seem to be arriving at the conclusion that if you want a fairly large-sized, complex task in a large existing codebase done right, you'll have better odds with Codex GPT-5.2-Codex-XHigh than with Claude Code Opus 4.5. It's far slower than Opus 4.5 but more likely to get things correct, and complete, in its first turn.

mikkupikku · 2026-01-27T22:08:08 1769551688

I think a lot of it comes down to how well the user understands the problem, because that determines the quality of instructions and feedback given to the LLM.

For instance, I know some people have had success with getting Claude to do game development. I have never bothered to learn much of anything about game development, but have been trying to get Claude to do the work for me. Unsuccessful. It works for people who understand the problem domain, but not for those who don't. That's my theory.

samrus · 2026-01-27T22:50:09 1769554209

It works for hard problems when the person has already solved them and just needs the grunt work done.

It also works for problems that have been solved a thousand times before, which impresses people and makes them think it is actually solving those problems.

daxfohl · 2026-01-27T23:32:10 1769556730

Which matches what they are. They're first and foremost pattern recognition engines extraordinaire. If they can identify some pattern that's out of whack in your code compared to something in the training data, or a bug that is similar to others that have been fixed in their training set, they can usually thwack those patterns over to your latent space and clean up the residuals. On pattern matching alone, they are significantly superhuman.

"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. Their ability to pattern match makes reasoning seem more powerful than it actually is. If your bug is within some reasonable distance of a pattern it has seen in training, reasoning can get it over the final hump. But if your problem is too far removed from what it has seen in its latent space, it's not likely to figure it out by reasoning alone.