The reasoning blocks are only temporarily part of the context. They aren't part of the context in the next turn anymore so (2) wouldn't really be an issue.
One future project idea suggestion. Can we combine these characters to create new ones just like Gboard allows us to intelligently combine emojis to create new complex emojis.
We accept human errors, limitations, and failures. We can empathize with team of humans doing the best they can, and we know any failure is a chance for them to learn and grow.
The sales pitch of AI is that it’s better than humans and has no real limits; it will make us all obsolete. This framing they created means I expect it not to make errors, not to have limits, and not to fail. I expect it to be able to learn and adapt at the speed of light and solve complex problems beyond what a PhD could do. This is what we’ve been told with the narratives around future jobs, AI performance on PhD level tests, how coding is a solved problem, and pictures painted of what a future with AI will look like. While we may know this isn’t true, this is what they are selling, and that’s the standard I’m going to hold them to.
I don’t blame the customer for being upset the snake oil didn’t live up to its promises, I blame the snake oil salesman. We have every right to be upset with the snake oil salesman and ridicule him when his product doesn’t work. Maybe we don’t need better more reliable snake oil, maybe we need real medicine. If real medicine don’t exist, its better to be honest than to mislead people and say it does.
This isn’t to say AI is completely useless, but it’s not what’s being sold. The downtime just proves that, unless they aren’t using their own product. If that’s the case, why not?
1. openrouter is API usage. There is obviously consumer side
2. people often use openrouter for the sole purpose of using a unified chat completions API
3. OpenAI invented chat completions; if you use openrouter for chat completions often you can just switch your endpoint URL to point to the OAI endpoint to avoid the openrouter surcharge!
4. Hence anyone with large enough volume will very likely not use openrouter for OpenAI; there is an active incentive to take the easy route of changing the endpoint URL to OAI’s
Schemas can get pretty complex (and LLMs might not be the best at counting). Also schemas are sometimes the first way to guard against the stochasticity of LLMs.
Post-training doesn't transfer over when a new base model arrives so anyone who adopted a task-specific LLM gets burned when a new generational advance comes out.
Obviously we're just dueling anecdotes here, but FWIW, I'm a US tech worker who bought a Tesla in 2022 and certainly never will again. I have four friends with Teslas in tech and all of them say the same thing: never again. Replacement cycles for cars are so long that this will take a while to fully show up in the data, but I don't see growth anywhere in their future, especially when BYD is eating their lunch in seemingly every non-US market.
Sure never again is totally fair and I am sure a lot of people hate it. I was mostly objecting to the radioactivity of it. Your friends will be more like “I am looking to sell my Tesla in 3 months” if it is truly radioactive.
Unfortunately, Tesla resale values have also plummeted, so even if people wanted to sell them desperately it may not be a financially sensible decision.
Personally, as a Tesla owner I'm concerned that if my car gets totalled I'll get pretty lowballed on the insurance settlement.
> Personally, as a Tesla owner I'm concerned that if my car gets totalled I'll get pretty lowballed on the insurance settlement.
The kinda obvious answer there is to use your insurance settlement to buy another highly-depreciated Tesla. Insurance settlements are intended to let you get a comparable replacement as determined by market value. (The alternative is that if your Tesla gets totalled, it's a get-out-of-jail-free card to get a non-Tesla.)
> And if for some ungodly reason you had to do it in Python
I literally invoke sglang and vllm in Python. You are supposed to (if not using them over-the-network) use the two fastest inference engines there is via Python.
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model raw thoughts can be deranged and is not a good user experience (consider the varied audience in the market, etc.)
also often the mass underestimate/the model makers over-estimate how people love distilling models
reply