It's a race to the bottom. DeepSeek beats all others (single-shot), and it is ~5...

strangescript · 2026-03-27T13:47:42 1774619262

I will "suffer" through .004 of electricity if I can run it on my own computer

sourcecodeplz · 2026-03-27T04:04:02 1774584242

I've tested many open models, Deepseek 3.2 is the only SOTA similar.

yogthos · 2026-03-27T02:01:58 1774576918

You could use this approach with DeepSeek as well. The innovation here is that you can generate a bunch of solutions, use a small model to pick promising candidates and then test them. Then you feed errors back to the generator model and iterate. In a way, it's sort of like a genetic algorithm that converges on a solution.

hu3 · 2026-03-27T03:15:47 1774581347

Indeed but:

1) That is relatively very slow.

2) Can also be done, simpler even, with SoTA models over API.

yogthos · 2026-03-27T03:35:39 1774582539

Right, this works with any models. To me, the most interesting part is that you can use a smaller model that you could run locally to get results comparable to SoTA models. Ultimately, I'd far prefer running local, even if slower, for the simple reason of having sovereignty over my data.

Being reliant on a service means you have to share whatever you're working on with the service, and the service provider decides what you can do, and make changes to their terms of service on a whim.

If locally running models can get to the point where they can be used as a daily driver, that solves the problem.

eru · 2026-03-27T08:34:31 1774600471

Why do you need a small model to pick promising candidates? Why not a bigger one?

(And ideally you'd probably test first, or at least try to feed compiler errors back etc?)

Overall, I mostly agree.

yogthos · 2026-03-27T13:20:13 1774617613

mostly an issue of speed and resource usage, if the model is too big then simply running the tests will be cheaper

mikestorrent · 2026-03-27T01:15:50 1774574150

> cheaper than the cost of local electricity only.

Can you explain what that means?

simonw · 2026-03-27T01:30:40 1774575040

I think they mean that the DeepSeek API charges are less than it would cost for the electricity to run a local model.

Local model enthusiasts often assume that running locally is more energy efficient than running in a data center, but fail to take the economies of scale into account.

BoredomIsFun · 2026-03-27T09:34:24 1774604064

> Local model enthusiasts often assume that running locally is more energy efficient than running in a data center,

It is a well known 101 truism in /r/Localllama that local is rarely cheaper, unless run batched - then it is massively, 10x cheaper indeed.

> I think they mean that the DeepSeek API charges are less than it would cost for the electricity to run a local model.

Because it is hosted in China, where energy is cheap. In ex-USSR where I live it is inexpensive too, and keeping in mind that whole winter I had to use small space heater, due to inadequacy of my central heating, using local came out as 100% free.

pbhjpbhj · 2026-03-27T13:58:09 1774619889

Is it economies of scale, or is it unpaid externalities?

jacquesm · 2026-03-27T03:50:27 1774583427

Some of those local model enthusiasts can actually afford solar panels.

jLaForest · 2026-03-27T04:18:00 1774585080

You are still incurring a cost if you use the electricity instead of selling it back to the grid

Kodiack · 2026-03-27T04:26:11 1774585571

The extent of that heavily depends on where you are. Where I live in NZ, the grid export rates are very low while the import rates are very high.

Our peak import rate is 3x higher than our solar export rate. In other words, we’d need to sell 3 kWh hours of energy to offset the cost of using 1 kWh at peak.

We’re currently in the process of accepting a quote for home batteries. The rates here highly incentivise maximising self-use.

jacquesm · 2026-03-27T12:43:03 1774615383

Selling it back to the grid is something that is still possible but much, much less of a financially sound proposition than it was a few years ago because of regulatory capture by the utilities. In some places it is so bad that you get penalized for excess power. Local consumption is the fastest way to capitalize on this, more so if you can make money with that excess power.

dmichulke · 2026-03-27T04:59:21 1774587561

Luxembourg: Purchase price = 2 x sales price, mostly due to grid costs.

And this is with no income tax or VAT on sold electricity.

croes · 2026-03-27T05:28:57 1774589337

Local enthusiasts don’t have to fear account banning.

littlestymaar · 2026-03-27T03:15:45 1774581345

I guess it mostly comes from using the model with batch-size = 1 locally, vs high batch size in a DC, since GPU consumption don't grow that much with batch size.

Note that while a local chatbot user will mostly be using batch-size = 1, it's not going to be true if they are running an agentic framework, so the gap is going to narrow or even reverse.

eru · 2026-03-27T08:35:24 1774600524

Well, different parts of the world also have different electricity prices.

littlestymaar · 2026-03-27T13:15:21 1774617321

Usually not multiple orders of magnitude difference though.

atoav · 2026-03-27T02:18:10 1774577890

It means that the electricity you would have to pay if you did the computations yourself would be more expensive than paying them to do it. Part of thst has to do with the fact that China has cheap electricity, also due to their massive push into renewables. Part of that is just economies of scale. A big server farm can run more efficiently than your PC on average.

AuthAuth · 2026-03-27T03:53:16 1774583596

cheap electric due to their massive push on non renewables. There has been no change in the price of electricity during the renewable shift.

jojobas · 2026-03-27T01:32:15 1774575135

China has cheap electricity.

ericd · 2026-03-27T01:40:55 1774575655

Well, also, LLM servers get much more efficient with request queue depth >1 - tokens per second per gpu are massively higher with 100 concurrents than 1 on eg vllm.

DeathArrow · 2026-03-27T08:42:13 1774600933

Yes, but the hardware they use for inference like Huawei Ascend 910C is less efficient than Nvidia H100 used in US due to the difference in the process node.