When the very first ChatGPT translated a simple C "hello world" into Python, I knew it was special. I've been a big supporter ever since, including some worried moments pondering what our future would look like and what it will mean to have a profession (especially software, which has defined my life since childhood) for my kids.
I'm now very good with LLMs as a user and at the system/product level but I understand it's not a simple story of replacing people. They're exponentially better than us at some things, and allow me to create things professionally which I couldn't do with an entire team of experts, but the bullshit compounds fast.
Out of laziness, I've asked both Claude and ChatGPT several times for torque figures and other simple, hard data about my dirt bike. They often got it completely wrong, yet were full of confidence every time. I never trust LLMs with hard data unless you RAG the PDF into the context, and even then it's sketchy.
Dates matter. Questions I asked about my Mazda a year ago that got total hallucinations were answered very well this year. To me it feels like the early days of computing: what wasn't possible one year became possible when a new generation of CPU or GPU came out, and you had to constantly re-evaluate your expectations or else miss the things that others were discovering with fresh eyes.
I made a personal 'benchmark' of odd and strange questions a few years back when this took off, and I keep re-running it whenever there's big news about a new model, also going back and forth between the different companies to see where they all stand (obviously with a clean cache or new accounts).
It's 10 questions. The models went from only getting past question 3 or 4 (in 2023), to reaching the last question but still hallucinating (last year), to providing sources pulled from really obscure books (this year).
For example, one of the harder questions was about the transition in a particular 30-second portion of a background song in a 30+ year old Bond film, a piece played only once in the entire film. The answers went from totally made-up nonsense to accurately giving the music theory definition of the transition (called a 'stinger'), explaining why it was used in that particular scene, and citing a snippet from an otherwise unrelated interview in which the composer explained his mindset at the time.
Maybe this isn't a real benchmark since it's not reproducible, but as a 'personal benchmark' I came away impressed. I would encourage everyone to define their own benchmarks and 'tests' and to consistently challenge the models to see whether there are any meaningful improvements. Now I treat the AI as something to stay skeptical of, but I also always consider what it proposes as an answer (i.e., I never dismiss it outright). I sometimes wonder if this is slowly messing with my biases, and maybe that's what Altman, Amodei and others want.
Hard numbers, no. Even for high-level concepts and theory, you need to triangulate: prompt from different angles, across different models, and figure out what overlaps to build a mental model that is, even then, only roughly 80% correct. It's better than Google, but the information isn't free.
Yeah, I'm sure the personal plans are subsidized. I have the $200 Claude Max plan at home and straight API pricing at work, and equivalent work would easily cost me 5x, if not more, on the API.
Bigger, because no one expects beauty from Fiat. That said, the Multipla was a bold and brilliant car. This one is only bold in the sense that “I can’t believe Ferrari allowed that to happen”. It’s kind of the Balenciaga of cars: will rich people buy just about anything with the right logo on?
Can't avoid gloating over this one. Just like the Palestinian identity was created and weaponized against Israel by the Arab world, now Canadians will get a taste of their own medicine courtesy of the Trump admin.
You got the sides wrong unfortunately, one of the states you are mentioning was literally created in the last century and is now doing the same thing that prompted its creation. But it must be nice living in ignorance and buying the propaganda.
Whether Palestinians have a national identity or not, driving them out of their homes at gunpoint and settling in is a war crime.
Albertans, while obviously the most disadvantaged and persecuted Canadians in recorded history, have not yet had anyone committing genocide or war crimes against them.
But why gloat? What are you winning? Even if there were prizes here (spoiler: all the loot boxes are empty in this game), do you perceive yourself better off because of this?
>now Canadians will get a taste of their own medicine courtesy of the Trump admin.
Ah so no, you're just in the higher end of the sinking canoe laughing at the people who are drowning.
The problem is that technical debt compounds. Bad LLM architectural and implementation decisions just blend into the background, and you build layer upon layer of mess. At some point it becomes difficult and expensive (token-wise) to maintain this code, even for an agent.
I mitigate this with a few things:
1. Checkpoints every few days to thoroughly review the code and flag issues. Asking the LLM to impersonate someone (Linus Torvalds is my favorite) yields different results.
2. Frequent refactors. Unlike humans, LLMs don't get discouraged about throwing things out, so I ask for a refactor whenever enough cruft accumulates.
3. Use verbose, typed languages. C# on the backend, TypeScript on the frontend.
Does it produce quality code? Locally, yes; architecturally, I don't know. It works so far, I guess. Anyway, my alternative isn't making this software better; it's not making it at all for lack of time. So even if it's subpar, it still brings business value.
Scientists and engineers also invented Zyklon-B gas and built the crematoriums in the concentration camps. Don’t underestimate what scientists and engineers can do to Jews.
It's kind of just-in-time learning. There's no use going through and memorizing something you don't need in the short term: it's hard to memorize well, and by the time you need to draw on the knowledge, it's already hazy.
This is why you can think of such documentation more as a reference manual and not just plain documentation.
In any case, AI is great for traversing a codebase and producing at least a draft of such documentation.
In terms of runtime performance of applications, AI is a net win. You can easily remove abstractions like Electron, React and various libraries; just let the AI write more code. You can even do the unthinkable and write native desktop apps again.