
It's a start and I welcome competition but I don't think I ever used small cloud models like Haiku 4.5. They are cute but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models I might sign up for $20/mo Codex to use GPT 5.5 which, in my opinion, is the best I can access right now.


I use larger models to organize work into a topologically sorted task graph and pin smaller models to the tasks depending on their complexity, with a larger model evaluating the work and patching where necessary. This uses Haiku quite often for routine work. As a result I’m able to do multi-hour, highly complex work with superior results and a much lower bill, with a parent orchestrator able to do a massive amount of labor within a single context window by effectively organizing work, reviewing quality, and integrating where needed. I don’t use Haiku directly, but it’s often 30-40% of any major effort’s token use. This further improves time to completion as well as cost, and I find Haiku is better at following literal instructions and plans without “second guessing,” while Opus-class models second-guess in their thinking constantly.

As such, Haiku isn’t a waste of my time; it saves enormous amounts of time for me. But I spent a large amount of time building the orchestration system up front and iterating on it to get here. Interestingly, I found my experience as a director and later a distinguished engineer gave me the tools to build it and get it working well and reliably end to end: the dynamics of multi-agent workflows of varying capability are not a lot different from the dynamics of a 1000-engineer organization.
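
For anyone wondering what the routing part of a setup like this might look like, here is a minimal sketch, not the actual system described above. It assumes the planner has already produced a dependency graph and a complexity score per task; the task names, scores, tier names, and thresholds are all hypothetical, and the review/patch pass by the larger model is left out.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "design_schema": set(),
    "write_migrations": {"design_schema"},
    "write_handlers": {"design_schema"},
    "write_tests": {"write_migrations", "write_handlers"},
}

# Hypothetical complexity scores (1 = routine, 5 = hard) assigned by a planner.
COMPLEXITY = {
    "design_schema": 5,
    "write_migrations": 2,
    "write_handlers": 3,
    "write_tests": 1,
}


def pick_model(complexity: int) -> str:
    """Pin a model tier to a task. Tier names and thresholds are illustrative."""
    if complexity <= 2:
        return "small"
    if complexity <= 3:
        return "medium"
    return "large"


def plan(dependencies, complexity):
    """Return (task, model_tier) pairs in an order that respects dependencies."""
    order = TopologicalSorter(dependencies).static_order()
    return [(task, pick_model(complexity[task])) for task in order]


schedule = plan(DEPENDENCIES, COMPLEXITY)
for task, tier in schedule:
    print(f"{task}: {tier}")
```

The topological order guarantees a task is only dispatched after everything it depends on, which is what lets the cheap tier work from finished inputs instead of guessing.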


Everyone does that. But I don't find Haiku useful for actual coding tasks. It's good for, ehm, generating commit messages and summaries.

In my tests, openweight Qwens and GLM are way better than it.


Got anything from your orchestrator you could share that’s usable by others? Sounds like how I’d like to work but is difficult to get going from scratch

https://github.com/7mind/baboon - all the backends apart from C# and Scala ones were created automatically, same for LSP server, same for playground.

I've been doing benchmarking of various models for finding hard security bugs, and my faith in Haiku (and Sonnet, even) has dropped precipitously in the process. Self-hosted Qwen 3.6 27B consistently outperforms both for finding security bugs, which was a shocking result. I expected Qwen to be around Haiku level, maybe a little worse, and I definitely expected it to be worse than Sonnet.

And, DeepSeek and MiMo perform much better than Haiku and Sonnet, near Opus/GPT 5.5 levels, at a fraction of the cost.

There's seemingly no reason to ever use Haiku or Sonnet, if you're not getting it for free or as part of a subscription (that you don't usually saturate).


I don't think that's what these small models are for. They are for things like text summarization and generating a title for your AI session. Maybe Haiku occupies a weird zone where it's overpowered for those tasks but underpowered for anything more sophisticated. But for example I used it on an agentic reasoning task recently (reading a chunk of information and drawing a written conclusion, not writing code) and it did just fine. More powerful model would have been a waste of money.

Sure, but it's priced higher than many better models. I'm not saying use the biggest models for everything. I'm saying Haiku is not a great deal as small models go. You can even self-host a model that is competitive if you've got a pretty beefy machine.

Haiku costs $1/$5. DeepSeek V4 Flash, a stronger model, is only $0.0028/$0.14/$0.28. That first number is the cached input, and DeepSeek caching is crazy efficient. So, using DeepSeek V4 Flash costs about an order of magnitude less than Haiku and performs better.

I have a Claude subscription because I'm willing to pay a premium for the best model for coding, one that doesn't waste as much of my time doing dumb stuff. But, if I need something other than Claude Code, I'm using something other than Claude models. Why burn money for no benefit?

Oh, also, Haiku chews tokens like crazy. In my benchmarks it used three times more tokens than the next highest model. Of course, security bug hunting is not in its wheelhouse, so it's not fair to judge it based on that one thing, but if it's more expensive per token and burns a lot more tokens, it ends up being a lot more expensive.
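
To put rough numbers on that, here is a back-of-the-envelope sketch using the prices quoted above. The per-million-token unit, the workload size, the 80% cache hit rate, and billing Haiku's cached input at its normal input rate are all assumptions for illustration (no cached price for Haiku is given here); the 3x multiplier is the token overhead I saw in my benchmarks, applied to both input and output for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per million tokens (the per-million unit is an assumption)."""
    cached_input: float
    fresh_input: float
    output: float


# Prices as quoted in the comment. Haiku's cached-input price is not given,
# so this sketch assumes it equals the normal input price.
HAIKU = Prices(cached_input=1.00, fresh_input=1.00, output=5.00)
DEEPSEEK_V4_FLASH = Prices(cached_input=0.0028, fresh_input=0.14, output=0.28)


def job_cost(prices, input_tokens, output_tokens,
             cache_hit_rate=0.0, token_multiplier=1.0):
    """Estimate the USD cost of one job.

    token_multiplier scales both input and output tokens to model a
    chattier model; cache_hit_rate is the share of input billed as cached.
    """
    inp = input_tokens * token_multiplier
    out = output_tokens * token_multiplier
    cached = inp * cache_hit_rate
    fresh = inp - cached
    total = (cached * prices.cached_input
             + fresh * prices.fresh_input
             + out * prices.output)
    return total / 1_000_000


# Hypothetical workload: 10M input tokens and 1M output tokens.
IN_TOK, OUT_TOK = 10_000_000, 1_000_000

haiku_cost = job_cost(HAIKU, IN_TOK, OUT_TOK)
haiku_3x_cost = job_cost(HAIKU, IN_TOK, OUT_TOK, token_multiplier=3.0)
deepseek_cold = job_cost(DEEPSEEK_V4_FLASH, IN_TOK, OUT_TOK)
deepseek_warm = job_cost(DEEPSEEK_V4_FLASH, IN_TOK, OUT_TOK, cache_hit_rate=0.8)

for label, cost in [("Haiku", haiku_cost), ("Haiku at 3x tokens", haiku_3x_cost),
                    ("DeepSeek, cold cache", deepseek_cold),
                    ("DeepSeek, 80% cache hits", deepseek_warm)]:
    print(f"{label}: ${cost:.4f}")
```

Under these assumptions the gap is a bit under 10x with a cold cache and grows as the cache hit rate or the token overhead goes up, so plug in your own workload before drawing conclusions.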


I suspect the outrageous pricing of haiku/sonnet is offsetting the cost of opus. The value proposition a year ago was they were cheaper than opus, not that they're a fantastic value (which they're not)

Haiku/Flash/small models are underpowered for literally anything where being non-false-positively correct on details matters at least like 25%. (That's not to say they are only correct 25% of the time, it's definitely more than that, but they're blatantly confidently wrong often enough that the wasted time is a significant net negative for me, even on relatively trivial tasks.)

Almost exactly the same story here. I've also had little to no refusals from DeepSeek, with its Chinese values meaning substantially less friction when it comes to things like reverse engineering, finding copyrighted files, working with dubiously-sourced source code, et cetera. I don't think I'd go back to Copilot even if they dropped prices by 90%.

Are you purchasing directly from DeepSeek? Any concerns as far as privacy or data protection?

Using OpenRouter, going to migrate to DeepSeek's official API soon. I'm not using it for anything commercial or for private data so I have no privacy qualms.

Makes sense. Privacy is my only real hang up with DeepSeek. Both of the big SOTA providers have become extremely filtered. Things that I could do one version ago are now getting refusals. Anthropic is almost unusable. ChatGPT is slightly better. Even with a "cyber exception" in place and a vetted account. They are going to force me to take my business elsewhere.

GitHub Copilot refuses to do any security testing or proof-of-concepts for exploits. While I understand why, we pay for Enterprise and I’m working on our proprietary code base. It’s incredibly annoying.

I’ve actually had luck taking the analysis from GHCP and pasting it into our M365 Copilot and getting a useful poc to stick into my bug reports.


Yeah, seems like this is in the range of Qwen 3.6, Gemma 4, Nemotron 3 Super, and the like. There are a lot of models, including much smaller cheaper ones (like Qwen 3.6 35B-A3B), that are similarly competitive with Haiku. I can run these on my laptop, I don't need to rent them from Microsoft.

I suppose if you're reeling at the new Copilot bill but want to stay in their ecosystem, this gives you something to use, but for most folks, there's a plethora of better options.


The $20/month ChatGPT plan that comes with Codex is good value. Even just having premium ChatGPT is nice. I get rate limited regularly but it still lets me do most things.

The $100/month is excellent value. I don’t understand how’s that not the default option for all professional developers. Unless people don’t produce any value writing code, like playing around and experimenting with vibe coding, I understand. But if software development is your actual income, and assuming you live in a wealthy country, $100/month is nothing for a tool like Codex.

Work pays for my work stuff and I have both claude and codex there. On the personal side I sometimes go days without using it. It's more like my assistant to do annoying terminal shit on my home computer and like personal projects I guess. It's plenty for that.

Picked up the most recent SO developer survey that features relevant info, the 2024 release: https://survey.stackoverflow.co/2024/work#coding-outside-of-...

The supermajority of respondents did report that they do engage in some coding outside of working hours, for one reason or another. I'm impressed; I'm basically a zombie after hours, rarely in any shape to touch anything technical. Good for them.

But then only 19.3% of respondents ticked that they code for freelancing reasons, and only 15% said they're doing it in an attempt to bootstrap a business. These groups were the only types that suggested revenue generating after-hours activity, and they even overlap to a non-obvious-to-me extent. But even if we pretended they didn't, that adds up to like a third at best.

So when you say:

> I don’t understand how’s that not the default option for all professional developers.

that's in contradiction with this data (and imo common sense), which suggests that the supermajority of professional developers simply do not perform revenue generating software development activity outside of work hours, period. Therefore, for them, the ROI on any potential AI subscription is a flat and constant zero.

Unless you envision people working at "bring your own license" type shops, I don't know how this is supposed to make sense. These are work tools, corporate should be providing them already. But then I'm clearly not from a "wealthy" country either, so YMMV.


Every developer who writes code for a living should get an AI subscription from work and not have to pay for it himself.

The small stuff has their place. I have this safari extension and needed a way to quickly title people's chat histories. Haiku is the fast cheap thing to come up with decent titles of blocks of text. I feel like there's a bunch of those little things lying around you need a model for. I'm even finding Apple's Foundation Model is super useful for stuff like that. Even summarizing an article. It's like equally awful at doing it, but gets enough done to still be useful as a way to be like "oh yeah, this article is actually worth reading"

Small models are super useful. But I'm skeptical of their use for coding in particular, which is what this model is advertised for.

Haiku does quite well if given a detailed plan. That means much more detail than you otherwise would, but you can still save over e.g. having Opus or Sonnet do everything by having them expand their initial plans into more specific levels of detail and feed it to Haiku (or similar level models).

I personally wouldn't use models that class directly, though - I'd use them in a harness as a "backend" for more capable models. And Haiku itself, as opposed to other smaller models, is still expensive.


Makes sense as part of a larger coding workflow, especially if it’s fast. Using a trillion parameter model to figure out how to call a targeted edit tool or generate a commit message is a waste. Also narrow tasks like “make the background darker” or “rename this function and update callers”

> “rename this function and update callers”

I'm old enough to remember when IDEs could do this without needing a couple gigabytes of matrices to do it

(LLMs are great for anything even slightly more complicated ofc)


Won’t (presumably) all the market actors converge on similar pricing? If OpenAI stopped operating on subsidies and charged the true costs, and their most token-hungry customers are the ones that switch to Anthropic and others, then their pricing model switch will also be around the corner.

Unless of course we’re thinking Copilot will be more expensive than others longer term. But is that a reasonable assumption?


Anthropic & co charge API users much more, not least to demolish the middlemen low-effort plays like Cursor and Copilot. To not own the model is not viable in 2026.

Sorry, what do you mean by "To not own the model is not viable in 2026."

I assume I'm misunderstanding you (likely my fault), because the way I read that is that you're saying nobody should currently be using models owned & hosted by companies like OpenAI and Anthropic, while clearly a huge number of people are using those in 2026 despite not owning them.


It's that companies like copilot/cursor are in real trouble if they are in the business of reselling expensive Anthropic tokens

I think it’s more correct to say they charge subscription users much much less. I assume less even than the cost of providing the inference, if you actually are using it.

What application/UI are you using DeepSeek Flash high on? Still Copilot or something else?

I've been having really good results with DeepSeek-v4-flash, qwen-3.6-moe, and the older gemini-3-flash-preview. (recent geminis suck hard)

Small models are more than enough for the majority of tasks these days. Plan and review with the bigger ones, let the little ones explore and implement.

OpenCode Go is $10/month for the open weight models with nice quotas: https://opencode.ai/go


You don’t have to limit yourself to the tiny models with the OpenCode Go plan, you can get a lot of usage from the bigger models if you keep the cache hot.

I am about 85% through my quota with 9 days left before refresh and have just used over 1B tokens, mostly DeepSeek V4 Pro, but also a little mimo 2.5 pro and kimi k2.6


For sure, I've been flipping between flash/pro (or the equivalent for other families), been trying to stick to one family per project as a way to test them out independently over longer periods and more realistic/diverse tasks. I've definitely spent more quota on pro and pushed more tokens through flash.

> "GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs"

AI is expensive and it has been heavily subsidized. If you think $20/mo flat for Codex/Claude will hold up against a more usage-based model, you're in for a shock. Especially once these companies go public and have to meet investor expectations.


I really hope one day there is something like Opus 4.8 but with Cerebras' speed -- they reach over 1,000t/s on gpt-oss-120b but that model is seemingly not even properly trained for tool calling. But watching it slam out several entire screens of thinking/reasoning per second is amazing. I'd love that with Opus quality.

I like gpt-oss: great model even if not too smart. It runs on my laptop at over 100 t/s and has a certain tone that I like over all these Qwens stuck up their asses.

I wonder when THEY make it illegal to vote with your wallet.

They know Google has a ton of data to train LLMs on.

Recently I have been asking YouTube's new AI about some videos ("when is Steam metrics mentioned in the video?" for example), which means they also index videos. This is an unthinkable amount of data.

I'm actually impressed at how bad Alphabet is with LLMs since they invented the thing as we know it AND have all the data to train on, yet OpenAI and Anthropic are eating their pie.


Kodak problem. Kodak invented the digital camera but their revenue came from making photographic film. They were unable to take advantage of their invention because it would cannibalise their revenue. That didn't stop other people and the revenue died anyway.

Google's main revenue is ads based on search. LLMs are a competitor to search. Creating better LLMs will cut into search volumes.

In any large organisation this is extraordinarily difficult to manage - they have to incentivise the new tech that is actively harming the current revenues, while maintaining as much of the old revenues as possible, without creating internal conflict between these two parts of the organisation that will kill it.

Though in fairness to Google they do seem to realise this and are trying to adapt - they're letting the LLM folks mess with search. It'll be interesting to see how this goes.


This is a sensible-seeming take at first blush, but it doesn't hold up to any scrutiny (or maybe my scrutiny is faulty - you tell me!)

Sundar and many of his executives have certainly read or heard of The Innovator's Dilemma, and I expect they're all moderately paranoid that it will be their downfall.

Also, that's not it. Google has a great ai app called Gemini where they have at various points hosted the top ai image generation model (certainly for speed, and for a while for accuracy) and have innovated with features like deep research

They are monetizing their ai conversations more effectively than OpenAI could dream of via ads and chat in Google search.

They are heavily investing in compute and talent.

When they've added llm results to Google search it has _increased_ engagement and re-engagement.

What part of the competition are they blissfully ignoring?

(I have counter arguments to some of these points, but I would rather hear other people's)


I heard Google search volumes by humans were declining, but I can't find the reference now so may be wrong. It's definitely changing the entire SEO industry.

Are they actually implementing ads in chat yet? I haven't seen an ad in Gemini yet.

Again, what I've seen is that LLM results in search have resulted in more zero-click searches (as a proportion of all searches), which isn't increasing engagement? But again, I may be wrong, what are you basing your assertion on?

I didn't say they were blissfully ignoring anything. I gave them credit for knowing the situation they're in and doing something about it.

The problem that I was talking about (probably badly getting my point across) is that it's internal conflict and strife that causes the pain here. One part of the company is incentivised on increasing revenue on the existing business. The other part of the company is incentivised on increasing revenue for the new business. But the new business is at the expense of the old business, so it sets up internal conflict where each part of the business tries to protect its own incentives. And Google has always been afflicted with rife internal politics.


Google search related ad revenue is still going up. Volume isn't everything. Personally, as llms have gotten better I do more and more product research on Google.

Even if they include ads in Gemini the issue is that Gemini is not the best AI app. It’s maybe the 3rd or 4th. So if Google becomes the 3rd best “search/AI engine” the future is not that bright.

Gemini is a brand new surface that Google has created to capture the current excitement around AI, but it's not the only surface into which they're shoving AI.^

ChatGPT's growth is incredible, but they essentially have to get all of their growth from inside codex or ChatGPT apps. Google can auto query Gemini with every search. There's an interesting piece of data which is that this tells them (on a conditioned basis^^) when is the chat result more effective than the search and vice versa?

Google can force growth of Gemini by leveraging their existing properties. This is a huge asset, and if you're wondering why Meta has artificially high usage of their LLMs it's because distribution is hard and Meta and Google have a lot of surface area to distribute on

^ no, I don't mean this as a compliment, although it does lend credence to the idea that Google is willing to update its current products with AI.

^^ ie conditional on the user's willingness to view the chat result


Search's AI mode is by far the most popular "AI app". Most of the time I don't end up using Gemini, it's because search's AI mode is good enough for my needs and I use it out of habit. I imagine a lot of other would-be Gemini users are in similar shoes.

The thing about Innovator's Dilemma is that even if you know about it you mostly cannot escape your own company culture and norms.

If there is a "crack" there you might be able to get out of it, or it will let a disruptive idea grow, but my way of thinking about the innovator's dilemma is that it is a "culture bias": knowing about it gives you some small advantage, but it takes a real change to maybe have a chance to escape/act on it, and the most important part is that under pressure it will quickly and imperceptibly run the entire process or decision making.


Google ran a code red for a couple months iirc

But I also disagree with your reading of the innovators dilemma. You're being far too absolute


I agree that all of today's CEOs have learned from history and are paranoid about disruption, and I agree that Google is pivoting effectively and will even thrive in the AI era, given their technical and distribution advantages... but I think their revenues and profits and dominance will be much lower than what they are today: https://news.ycombinator.com/item?id=47957708

This take overlooks most of the work that Google has been doing in the past decade.

Have you seen their Cloud business?

Moreover, Google has continued to drive search growth since ChatGPT arrived and is executing competently. Their models are good (not great), but they have enough compute and one of the best ML-focused chips such that they aren't beholden to Nvidia (instead, they're beholden to fabs: tsmc - this is a much better dependency since Nvidia is hell bent on extracting as much value as they can from their position in the stack and it would be against the nature of tsmc to behave similarly)

Will Google's ad revenue decrease? Advertising is an incredible business because it is anti fragile.^ Even if search revenues decrease from their current highs (I would bet heavily against this), they still have YouTube with shorts and a robust display ads business that is going to improve if AI supercharges the economy (more companies - # startups founded in Jan 2026 is much higher than # founded the previous January, more products, advertising and distribution become the differentiators for these products)

If you're wondering how anthropic is going to continue to grow its base, the answer is advertising. In fact, Google is situated to fundamentally support everything that anthropic needs. Who cares if they make worse margins than anthropic? They'll benefit from the entire ride up, and they'll do the same for the next startup of that scale.

^ https://stratechery.com/2024/metas-ai-abundance/


Right, but my theory is that the ad business, still the biggest chunk (75%+) of their revenue, is extremely hyper-optimized for the current search journey-based UX and enjoys a monopoly + auction rigging premium (e.g. Project Bernanke) primarily through tremendous ad volume. That is, their current growth is largely based on stuffing more and more ads into commercial-intent SERPs.

However the agent-based conversational future simply does not support that level of valuable [1] ad volume, which collapses Google's carefully optimized tech+business stack.

Like I said, they will still thrive, but more because of GCP (which might see the biggest growth due to AI and the other tech + infrastructural advantages you mentioned) and the other businesses (YouTube, Waymo, etc.) However, their current cash cow is being disrupted, primarily by themselves, and I don't yet see how they can monetize agents nearly as lucratively as they've monetized search.

[1.] Sure, they could keep stuffing ads into each turn of the conversation, but 1) as I theorized in the linked post, those would be meaningless and low-value, and 2) that could just push users to competitors like ChatGPT, Anthropic or Perplexity that can offer a cleaner UX because they're starting from a clean slate and don't (yet) have the same revenue expectations to meet.


> I think their revenues and profits and dominance will be much lower than what they are today

While your linked post was explicitly about ads, this line wasn't and is just baseless

If you think that (the quoted line) you should short Google (haha you will lose all your money even if you're right, market...irrational...solvent)

Search is going to be fine. It's about half their revenue. Display is another large chunk and feels largely immune to the effects of AI, and Google is not a static entity. And their Cloud business is seeing unbelievable growth.


> While your linked post was explicitly about ads, this line wasn't and is just baseless

I mean, the ad business is still by far the largest chunk of their whole business, which is why I think their overall revenues, profits and dominance will decrease ¯\_(ツ)_/¯

Display is another good example of why. Like half (?) of it is YouTube, which will be fine, but the rest comes from non-first party properties, which are already seeing drastic drops in traffic. Just as with the shrinking of SERP ads, there's no way to stuff enough ads into chatbot conversations to compensate for the display ad revenue dropping either.

I'm long Google (monopolies are usually good bets, bullish on AI + GCP, plus it is disrupting itself before others can so has good long-term prospects) but I can't see how that will compensate for its main cash cow today being cannibalized, so I fully expect future returns to be less than historic ones.


Google doesn't seem to "get" agentic autonomy. Their models are trained to solve short problems really well, but they get confused over long time horizon tasks and kinda suck at tool calling to boot.

> What part of the competition are they blissfully ignoring?

coding models? their own devs use claude code.


> Kodak invented the digital camera but their revenue came from making photographic film. They were unable to take advantage of their invention because it would cannibalise their revenue.

a bit more nuanced take on the failure would also account for executives' backgrounds at the critical period:

- in 1981 Vince Barabba — Kodak's Head of Market Intelligence — conducted an extensive internal study that explicitly concluded digital photography could replace film and that Kodak had approximately 10 years to prepare for the transition.

- Kodak's leadership in 1980–1993 saw the company through the lens of its founding identity: silver-halide chemistry, precision coating and manufacturing, and the extraordinarily high margins of the film-plus-processing business. This identity-driven decade was spent on failed diversification and defending film instead of building an electronics cost structure and a defensible high-margin position. They steered capital and attention toward businesses that fit that self-image (specialty chemicals, pharmaceuticals, hybrid film products) rather than toward digital cameras, which meant fighting Sony and Canon on low-margin electronics turf where Kodak felt no competence and feared cannibalizing film.

- It was an inside executive culture, crystallized in the 1990 choice of film-lifer Kay Whitmore over the digital-minded Phil Samper. When Chandler retired, the finalists were Whitmore and vice-chairman Phil Samper, who had a deep appreciation for digital technology. The board chose Whitmore, and was explicit about why: as the New York Times reported, Whitmore said he would keep Kodak closer to its core businesses in film and photographic chemicals. Samper resigned and went on to become president of Sun Microsystems and then CEO of Cray Research — i.e., to lead exactly the kind of digital/computing companies Kodak was avoiding becoming.

- so when Kodak did get serious about competing in digital (in 1993 the board made Fisher the CEO; he came from running Motorola and held an engineering degree plus a doctorate in applied mathematics) it did so as one commodity hardware maker among many, and that was too late, since film began to drop as digital started to pick up, exactly as Vince Barabba predicted in 1981


LLMs still need a search API, and use it a lot.

Google is well positioned to earn from this service, especially if they can prove that their search service is superior to competitors. While they lose some of their moat, they are well positioned to dominate the market, just like they did in the consumer space.


That’s not what claude code does… and that’s exactly the dilemma for Google.

Claude Code is not the majority of AI usage.

People asking any AI chat interface for ideas for their honeymoon will trigger some kind of search. SEO is still relevant and Google might still be able to sell top spots in their search so LLMs will pick it up.


exactly. Claude is in a niche. It's a high-value niche right now, but a niche nonetheless. Normies don't use claude much based on the numbers I saw. Search is still highly relevant and Google seems well positioned to capitalize on it.

so the argument here is that it's too niche for Google to care? I don't believe that they made an explicit decision to make a lame version of Claude Code that their own devs don't use.

Yeah but LLM's don't offer you an advert.

"You tried to find a recipe for cupcakes, well all I can offer you is an advert on kitchen appliances"


> "LLM's don't offer you an advert."

Some already do, and some of the ones that don't will in the future.

See for example https://help.openai.com/en/articles/20001047-ads-in-chatgpt

Of course that's not to say that the advertising situation will be identical to that of pre-LLM search engines, and the differences may lead to radically different economic models and user experiences. But I was just correcting your statement.


These people know about the innovator's dilemma. Their problem is incompetent product and people management, same as it has always been. Talk to anybody working on Gemini, and it's obvious that they're wasting a tremendous amount of effort and talent.

I use anthropic's models daily, and sometimes switch to Gemini. Google is losing the marketing front BADLY, but their AI service is surprisingly great. It's far cheaper than anthropic for one. and for my kind of research it's just better.

I'm quite certain that Google's AI services are likely the most used in the world right now by virtue of having the widest distribution. It's in the search box. It's on your Android phone. Just because they aren't the preferred coding or research agent does not mean they are losing - that's a pretty small slice.

Yeah this seems true. Claude Code is famously dubbed the best AI coding agent, but Google doesn't care about that niche I guess. Somehow, I still rely on Google search as they have diversified it.

If you ask questions, it will enable "AI overview", but if we search for a particular object/platform like "Google stock" or "bbc news", it will give the old classic search experience and we wouldn't need to swallow the "AI overview" pill in that case.


I tried using Gemini CLI to sort some code issues for me, ran out of tokens mid-way through, even though I have Gemini Pro.

Turns out licensing is separate for "code" and "pro"...


Same happened to me. That was the death knell for Gemini as a coding agent to me. I even paid for a whole year...

I highly suspect they opaquely lowered usage limits on me.


It can be everywhere, but that doesn't mean users are paying or even value it.

See also: Windows / Notepad / M365 / GitHub / Paint / Xbox / Azure / Solitaire / D365 / Security Copilot.

who cares about marketing when you have distribution? Probably a smart move to pump dollars into the product and not the marketing.

in high margin businesses, customer acquisition is everything.

If your product becomes commoditized, it’s no longer a high margin business

You can have a high accounting margin and a product with price equal to economic marginal cost—externalities, cost of capital, barriers to entry… DRAM is a commodity but has (currently) a high margin.

> If your product becomes commoditized

Depends on the product - whether protein bars, salty chips, cellular service, or iPhone or something else. If your product has a flavor, it’s never going to get commoditized. Coke still tastes better than Pepsi.


This is the power of a brand. Kirkland and some private label products are literally the same as the competitor products and yet are perceived differently. Even in your Pepsi vs Coke example, Pepsi routinely wins in blind taste tests but there are more "Coke people".

It will be interesting to see if the LLM companies can establish their own "brand" and how they will do that. LLM voice is a thing but not sure if it's a good thing people will use to hang their self identity on. Distillation of models and constant training also make this complicated. Claude code is winning on harness and ux right now but it seems precarious and also easy to commoditize. I think elon tried to add branding to his chatbot pretty intelligently by being iconically crude/evil/"anti woke" since it's both highly visible and less likely to be copied.

We live in fascinating times!


And Google acquired you in 1998 with search.

flash 3.5 is the best price/performance model for what I'm doing. I had been using opus for everything, but once we started running many agents at once, and eventually agents managing sub-agents, frontier models were no longer an option.

we started model testing the cost/performance of our skills and agents and flash 3.5 wins in most things.

As people develop harnesses for their codebases, I think the intelligence required comes down a lot.


I have not tried the Gemini CLI in a few months but when I did it was a shit show.

Google makes it very hard to use their shit and it was full of bugs.

Anthropic's current run is based entirely around Claude Code in this space, and the last time I used the gemini-cli it wouldn't give me access to the latest models, and I was paying them for the privilege.


Google trashed the Gemini CLI client and replaced it with agy (antigravity), which is written in go and is much nicer.

Interesting you say that. Every user I speak to says antigravity cli is missing lots of features and Gemini cli was working quite well. Same for me.

It's not as feature rich, but has also not crashed once for me, unlike gemini cli, which was a flickery, unstable mess.

So they did.

https://github.com/google-gemini/gemini-cli/discussions/2727...

I get the complaints in that thread but I still think it is hilarious. That repo is a gong show of random shit and perhaps one of the best worst examples of "opensource" LLM development.


It will also just sit there "thinking" for ages, if whatever you are doing requires an input (like sudo)

Sometimes you have to tab across and give it a PW, but it is seemingly incapable of noticing that and just asking.

Kiro, what we use at work, on the other hand will just prompt you. (And doesn't like taking credentials directly)


Is it? My mom and all her friends use "the intelligence". What is it? Gemini, because it's on their android phone.

Apple played a blinder by calling it "Apple Intelligence".

Well done lads


We use Kiro (AWS) and Gemini (Google) at work.

Kiro is of course really good to back into AWS stuff, it knows more about AWS than Amazon themselves!

Gemini is really good at understanding my inane rambling and misspellings.


I think Google is sandbagging a bit here, knowing they have all the data and likely better models hiding. My theory is it's a bit of not disrupting the stock market direction by exposing who's really the boss. If they can do it cheaper, faster, and better, people start asking questions, especially with upcoming IPOs.

This makes no sense. Google is beholden to its own shareholders, not the markets at large.

In any case, it's well known that devs in Google have liked anthropic/openai models for coding more than gemini, so unless they're hiding their best models from the people within, I think it's just the case that they're behind.


It's more that they know they can eventually clone any successes the other companies have and steal their market share. There really is no moat. In a more normal environment they would be buyout candidates, but that's a bit too far gone at this point, so you just let them run until they are out of gas and Google can benefit from any advances without fronting the cost.

Even with Anthropic's record-breaking revenue growth I don't see how the pure AI companies can sustain themselves, but the catch-22 is that any obvious pivot proves that. This puts the more traditional tech companies in position to ride the back of the wave until the growth curve tops.


> they know they can eventually clone any successes the other companies have

Google has gone all in on AI. To the point of challenging their own core product. Apple is waiting and seeing. Google is building and distributing, albeit with terrible marketing.


Apple isn’t waiting and seeing on the hardware side, only implementing AI on the software side, which there doesn’t seem to be much of a demand for them to do. Apple are well set for on-device LLMs and agents with their Mx Max cpu/gpu, and their wait on the rest is saving them hundreds of billions by not burning all their profitability to the ground building Nvidia-filled datacenters the same as everyone else, which is why Google is now having to hunt for extra money by raising capital like this.

search is not their core product though, it's ads. they ain't challenging anything.

Ads are meaningless without a surface to show them on. Search is absolutely a core product.

Coding is a pretty small slice of the markets in play. Google's models are driving cars right now. Using coding agents doesn't give much insight into performance in the broader world; I would assume Google is performing better in general even if Claude or Codex is currently outperforming for coding.

> Coding is a pretty small slice of the markets in play.

I don't think that's true, mostly in that a lot of usecases are solved via coding models + a harness.

> Google's models are driving cars right now.

Yes + other models like alphafold. But those are (relatively) specialized models. Besides, the comment I was responding to was saying Google is sandbagging the market to keep it calm or something. I don't disagree that Google is doing well overall and has some clear advantages


Google also owns 15% of anthropic.

Pedantic correction that doesn't change anything other than accuracy: it was reported over a year ago to be closer to 14% than 15%.

https://www.theverge.com/news/627849/auto-draft

But I believe since then Anthropic have raised more money, almost certainly diluting Google's stake (I could be wrong and misremembering that Google didn't partake in the additional fundraising). I have in the back of my head that Google is down to something like 10% now, but don't have time to go and find details to fact check that, sorry!


It's important to remember that the cloud division, rapidly becoming Google's golden goose, does not give one fuck about Gemini and would happily sell out all of Gemini's compute to Anthropic and OAI if given the opportunity.

> yet OpenAI and Anthropic are eating their pie

I'm actually impressed by how much the Hackernews crowd is sleeping on Google & Gemini. Yes, it's lagging behind in coding, but it's consistently much better and more reliable at literally everything else.

Also there was a period of time when Gemini was the best model out there...


It's pretty hard for the nerd crowd to believe that only 4% of traffic to GPT is coding related.

I don't think they 'index' videos, per se. They just point the model at the video's transcript on demand when you ask a question, I believe. Doesn't change any of your conclusions, though. You're absolutely right, they have an absolute ton of data.

> I don't think they 'index' videos, per se.

I'm pretty sure they do. They already index metadata (you can see it in the web search results) so indexing the transcript is relatively easy.


I'm guessing one goal of the semi recent AI translated subtitles on every video, is now every video has a transcription.

It's actually incredibly useful if you just want to summarize a video, or my use case, want a text tutorial of something that's a video.


Their transcribing and summarisation of Google Meetings is pretty good.

I have a boss who loves to rattle on for ages, and it gives a breakdown of what on earth he was on about


I wouldn't be surprised if Google's logs alone are a substantial portion of all data created daily...

Some of the stuff that turns up on Googlebot, you really have to think "where on earth did you find that? Absolutely nobody, nowhere had a hyperlink to that"

Do they even do logging in the traditional sense? Surely they have some bespoke googly solution.

I've also asked the youtube ai about when some things are mentioned in videos, and upon verification the ai is just hallucinating.

I think Google is doing the right thing. Using LLMs for coding is the shiny low hanging fruit but it isn't what is going to make the tech ubiquitous. That'll be finding applications of it to real data problems.

Google knows LLMs are the new UI, not the new IDE.


> I'm actually impressed at how bad Alphabet is with LLMs

Not my impression. Lately I think Gemini is superior to ChatGPT and Claude in coding (I'm mostly using it with scientific stuff in Python).


My guess: The company culture means that the best people went to other companies.

Google has been diabolical about forming teams to develop a product, then disbanding the team, and then sunsetting the product right after deployment.

cries in Google Glass

Wild that Meta has that product now decades later, which isn't even half of what Google offered.


> They know Google has a ton of data to train LLMs on.

And they have a massive amount of TPUs. And yet... their models are way behind.


Google doesn't suck at LLMs, they suck at customer service. There was a period where Gemini Pro was the best LLM out there, before they gutted it with quantization. It's like they didn't realize that "provide a great product, get people hooked then cut the quality" doesn't work when switching costs are so low. As with GCP, putting the wants of SREs over the wants of customers is not how you gain lots of customers.

Are you sure it’s not using transcripts? That would be equally useful but technologically less impressive.

Turning all of those annoyingly verbose and long YouTube videos into text that could be searched, summarized and referenced easily would be amazing

How good is YT's data though? Have you seen their Auto Caption? It's utterly incapable of understanding speech.

Auto Dubbing, on the other hand, is incredible: translating Russian/Ukrainian speech with different voices and accents for each speaker during a firefight is wild.


> Have you seen their Auto Caption? It's utterly incapable of understanding speech.

How recently have you looked? I think nowadays it's quite good.


That voice though is atrocious.

Not only that, but the same webmasters who try to shoo AI crawlers away actively court Google's bots.

Really? Every business owner I know outside of HN wants to be discoverable by LLMs.

Being discoverable is one thing, having your content stolen wholesale is another

Most of the economy is not journalists or people who sell "content" online. In most cases I can think of - retailer, restaurant, hotel, plumber, any local small business, they want their content ingested. That means the AI chatbot knows about them and they can be in answers potentially.

And having your content rendered inaccessible to humans by a DDoS attack from overly aggressive webcrawlers that ignore robots.txt is yet another.
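For reference, honoring robots.txt is not hard. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleBot` and `OtherBot` user agents, and the URLs are all made up for illustration, not taken from any real site or crawler:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch and skips the URL on False.
for agent, url in [
    ("ExampleBot", "https://example.com/menu"),
    ("OtherBot", "https://example.com/menu"),
    ("OtherBot", "https://example.com/private/x"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

The check is purely advisory, which is the whole problem: nothing stops a crawler from never calling it.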

> I'm actually impressed at how bad Alphabet is with LLMs ...

I'm still on Anthropic models to code but I'm on Gemini 3.5 Flash for everything else. How can you say Google is bad at LLM when their little flash model is literally SOTA on many benchmarks?

> ... yet OpenAI and Anthropic are eating their pie.

They're eating nobody's pie: it's a new pie. Google is a $4.5 trillion company, the 2nd biggest in the world as I type this.

Given that fact, and given how good Gemini 3.5 Flash is, I'm not really sure Google is "bad at LLMs".


Alphabet still can't fix search in the Android Play store, so it works

You are assuming that Play store search is even broken from their perspective. I bet all their internal signals on it are positive, as in they make money on the fraud and scams, and crack down occasionally just enough to retain user trust.

pretty sure it's only for videos with cc enabled.

Youtube has had AI generated transcripts with autotranslation for the subtitles for years, not to mention the forced AI dubbing on mobile phones.

Doing a little bit of RAG on the transcript hardly sounds impressive.


Yet Gemini can't even get YouTube URLs right half of the time.

Everyone mocked them for paying for YouTube for years with no real income. Now it’s the most valuable data source in the world.

The XML extension I use in VSCode is by Red Hat.

Oh dear. Here we go again.


I would buy a Neo to escape Windows but it's not like macOS is a pleasant experience either.

My next mobile workstation will probably be an arm laptop with Linux for great battery life.


> "Use as-is until e-waste" the Neo is not

That's a very low bar to clear


Yes, which makes the comment claiming the neo doesn't clear it all the more egregious.

I have been rocking _smaller_ tablet PCs with better reparability score than the Neo in iFixit since practically the 2010s. My current one is a 10/10 from HP. This to say nothing about upgradability.

The Neo doesn't clear the bar. It just barely improves over recent MacBooks, which is next to nothing. Especially to someone comparing it to the Framework!


[flagged]


> You are to be congratulated on the sheer looming height of your standards. The angels cry out to you from the heavens. Sheesh. - akkartik

Snarky ad-hominem comments are forbidden in HN, FYI.

https://news.ycombinator.com/newsguidelines.html


Edit: thanks for toning it down. I will as well.

From the comment up above:

> I would imagine the Mac Neo is a sealed unit that you use as-is until it's e-waste.

So the "bar" is irrelevant to this conversation.


DeepSeek V4 Flash - High (answered in 3.4s, $0.00 cost)

    Thought: Let me think about this question. The days of the week are:
    Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.
    Now I need to count how many have the letter "d" in them:
    - Monday: has 'd' 
    - Tuesday: has 'd' 
    - Wednesday: has 'd' 
    - Thursday: has 'd'  (Thursday)
    - Friday: has 'd' 
    - Saturday: has 'd' 
    - Sunday: has 'd' 
    Wait, let me be more careful:
    - M o n d a y - has 'd' 
    - T u e s d a y - has 'd' 
    - W e d n e s d a y - has 'd' 
    - T h u r s d a y - has 'd' 
    - F r i d a y - has 'd' 
    - S a t u r d a y - has 'd' 
    - S u n d a y - has 'd' 
    All 7 days have the letter 'd' in them.
    7
https://i.imgur.com/iWSaDxM.png
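For anyone who wants to double-check the count above, here is a quick sketch in Python. The function and variable names are my own and nothing here comes from DeepSeek's output:

```python
# Sanity check for the "days of the week containing the letter d" count.
DAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def days_containing(letter: str) -> list[str]:
    """Return the day names that contain `letter`, ignoring case."""
    needle = letter.lower()
    return [day for day in DAYS if needle in day.lower()]


matches = days_containing("d")
print(len(matches))  # 7, since every name ends in "day"
```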

If Opus 4.8 is just slightly better than 4.7 then it maybe ties with GPT 5.4, maybe. And it gets completely outclassed by GPT 5.5 for my workload.

With Anthropic's expensive pricing, there's no reason for me to switch from GPT+DeepSeek.

And I bet Mythos is GPT 5.5 tier but too expensive to distribute so they create this security FUD theater.


Well, they have a big challenge ahead, since DeepSeek offers an open model at Sonnet+ level while being cheaper than Haiku, plus a 1 million token context size.

Yeah, I never use any of OpenAI or Anthropic's models other than whatever is the current highest-end one. For everything else, it makes more sense to use other providers.

On this note, is there a benchmark aggregator to compile all benchmarks in a single large grid?


One difference is that MiMo 2.5 (non-Pro) has image, audio and video input capabilities.

DeepSeek does not understand image, audio or video.


I've heard non-Pro isn't nearly as good for coding as Pro?
