BoorishBears · 2026-05-30T14:02:13 1780149733

This article got me messing with it, and I'm loving it as a post-training target.

I'm training on ~1B tokens on 8xB300, and the first checkpoint, halfway in, learned really well. Tencent might be struggling with agentic work, but the base knowledge is there.

BoorishBears · 2026-05-30T13:57:57 1780149477

I tested it against Gemma 4 31B and, as expected, it doesn't compare favorably on world knowledge.

But even against E4B it's shaky, which is surprising given how many tokens they trained on. I guess a lot of that was synthetic data.

BoorishBears · 2026-05-30T04:13:41 1780114421

Did you actually read the article past the hero image?

> Teikoku Databank has identified 52 Japanese companies using naphtha to make basic chemical products like ethylene, synthetic rubber, and PVC resin.

> The chemicals, petroleum, and coal products manufacturing sector is most vulnerable to naphtha price rises and shortages; of the 4,700 companies in this sector, 67.2% are integrated into the naphtha supply chain.

no-name-here · 2026-05-30T04:39:25 1780115965

From the official guidelines https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

May be good to edit your comment to remove the first sentence.

BoorishBears · 2026-05-30T06:54:14 1780124054

I won't.

ThePowerOfFuet · 2026-05-30T07:20:40 1780125640

Appropriate username is appropriate.

BoorishBears · 2026-05-29T10:12:57 1780049577

SVG is like asking an electrician to give you a circuit diagram by painting a watercolor

I'd try something like CircuiTikZ with instructions provided

BoorishBears · 2026-05-29T10:09:36 1780049376

So you can tell.

BoorishBears · 2026-05-28T20:43:01 1780000981

> Opus 4.7 and later

The source of truth should be the API docs, which make it clear 4.8 didn't bring back extended thinking: https://platform.claude.com/docs/en/about-claude/models/over...

Any UI settings probably just map to changing the effort nudge on adaptive thinking

reed1234 · 2026-05-29T00:34:34 1780014874

https://platform.claude.com/docs/en/build-with-claude/effort...

BoorishBears · 2026-05-29T01:26:23 1780017983

Adaptive thinking supports effort, but it's a nudge instead of an actual token budget.

Why not use the pages that plainly state they don't support extended thinking: https://platform.claude.com/docs/en/build-with-claude/extend...

kakugawa · 2026-05-28T21:07:12 1780002432

Thank you for pointing this out.

BoorishBears · 2026-05-28T20:36:24 1780000584

All signs point to Opus 4.7 being smaller than 4.6, so I'm not sure all this holds.

You realize gpt-5.5 is also double the price of gpt-5.4, which itself was a price increase too, right?

Labs are divorcing pricing from inference costs.

BoorishBears · 2026-05-28T20:32:00 1780000320

Every model release you'll post this, and every time I'll be there to point out how it's completely useless (for reasons you've said are intentional).

It does things like place the old Gemini 3 Flash above the more capable 3.5 Flash, Opus 4.5 through Opus 4.8, and gpt-5.5.

At least, until hopefully one day HN has a rule about accounts that derive 99.9999% of their engagement with the site from shilling a personal project.

XCSme · 2026-05-28T20:54:31 1780001671

Also, what about the major flaw/bias linked for Gemini 3.5 Flash? That has major real-life consequences if the model ends up being used for any automated scoring systems.

I found it while trying to use 3.5 Flash for scoring the reasoning of some models, and it gets it wrong because of the centering bias, whereas 3 Flash gets scoring right.

XCSme · 2026-05-28T20:49:47 1780001387

I'm happy you do comment. I've added more coding tests since then, along with other improvements (price history per model, cost to run at current pricing, improved scoring).

How is it useless to see that Opus 4.8 is 2x more expensive and 2x slower on some questions?

BoorishBears · 2026-05-28T01:36:11 1779932171

Oof, having a skillset so pedestrian that any incremental gain in efficiency needs to be kicked upwards must be tough.

BoorishBears · 2026-05-28T01:11:45 1779930705

Sorry, exactly what is a slippery slope?

You wrote a lot of words, but none of them describe a slippery slope, or explain how a supposed 10x increase in productivity precludes a 20% reduction in hours worked.