Hacker News | mft_'s comments

llama-bench is part of the llama-cpp package, but from recent experimentation, the settings it is able to (or is documented to?) accept lag behind somewhat. Not sure whether it would accept all of the esoteric settings in the article?

Your numbers are a little off. The NEJM article was published a few hours ago: https://www.nejm.org/doi/full/10.1056/NEJMoa2605555

RAS G12 population mOS: daraxonrasib 13.2 months / chemotherapy 6.6 months
Overall population mOS: daraxonrasib 13.2 months / chemotherapy 6.7 months

RAS G12 population mPFS: daraxonrasib 7.3 months / chemotherapy 3.5 months
Overall population mPFS: daraxonrasib 7.2 months / chemotherapy 3.6 months

> Thus the treatment had provided them a median life extension of about 3/4 years. The lucky ones probably have got more than an extra year.

The 'median' patient in this trial lived ~6.6 months longer if they received daraxonrasib. It's worth noting that the performance of the chemotherapy arms was stronger in this trial than in previous trials of second-line chemotherapy; whether this reflects better care or a prognostically superior trial population remains to be seen.
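For anyone who wants to check the arithmetic, here is a rough sketch in Python. It only subtracts the medians quoted above (taken from this comment, not re-verified against the paper), and a difference of medians is a simplification rather than a proper survival analysis. The function and variable names are made up for illustration.

```python
# Median values in months as quoted in the comment above: (daraxonrasib, chemotherapy).
# These are copied from the thread, not independently checked against the paper.
median_os = {"RAS G12": (13.2, 6.6), "Overall": (13.2, 6.7)}
median_pfs = {"RAS G12": (7.3, 3.5), "Overall": (7.2, 3.6)}


def median_gain(table):
    """Difference between the two arms' medians, in months, per population."""
    return {pop: round(drug - chemo, 1) for pop, (drug, chemo) in table.items()}


os_gain = median_gain(median_os)
pfs_gain = median_gain(median_pfs)

# Express the RAS G12 overall-survival gain in years, to compare with "3/4 years".
os_gain_years = os_gain["RAS G12"] / 12

print(os_gain, pfs_gain, os_gain_years)
```

On these numbers the gain in median overall survival is a bit over half a year rather than three quarters of a year.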


My numbers were not at all off.

I have quoted exactly the numbers that were written in the abstract of the article yesterday, at the same link that you have used.

What is very weird is that the abstract of the article has been changed, and now it is as you say.

So the numbers from today are worse than those from yesterday.

Perhaps the abstract of yesterday corresponded with a preliminary version of the study report, but meanwhile more patients have died, which has been taken into account in the final version, lowering both the median values for overall survival and for progression-free survival.


No offense intended. I guess someone in the RevMed Scientific Comms team had a more stressful day than they'd envisaged.

> Perhaps the abstract of yesterday corresponded with a preliminary version of the study report, but meanwhile more patients have died, which has been taken into account in the final version, lowering both the median values for overall survival and for progression-free survival.

No. The 'data cut' that these numbers are based on will have been taken (at least) weeks ago. You don't get (anything close to) real-time updates in the manner that you're implying. It was probably just a simple snafu somewhere along the line.


Yes, with nuances.

The recently-read-out trial is in the 'second line' - meaning patients will have received one 'line' of treatment before this - typically chemotherapy +/- surgery. The chemotherapy regimens used for pancreatic cancer can be pretty brutal, and patients can usually tolerate one, max two lines overall. As such, it made sense for this trial to administer daraxonrasib as monotherapy only.

The first-line trial of daraxonrasib is already underway, and includes both darax monotherapy and darax + chemo arms. (They are combining with a slightly less brutal chemotherapy called GnP, with an eye to the overall side effect burden considering the non-trivial side effects that darax also brings.)

It'll be very interesting to see the outcome of this trial; there are some examples elsewhere in oncology where a treatment is recommended by guidelines without chemotherapy over a combination with chemotherapy, as the small survival benefit the addition of chemotherapy brings is seen as outweighed by the additional toxicity.


Genuine question: is this solving a real problem?

IME, the bottleneck when using diffusion models isn't storage space or memory, it's generation time. Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway. I also note that these models are marginally slower than the small FLUX.2 model they're based on.

Okay, maybe this allows running a local model on something that has a reasonably powerful GPU and limited memory, like an iPhone, but is that really a common requirement?


It's useful progress. Decent-fidelity local-scale inference means that you can create a product that generates throwaway images frequently without worrying about cost. Thus far every product I've seen that generates images is metered, which severely limits the value. I don't know if this is actually at the "decent fidelity" point yet.

We are in an era of extreme demand for GPU and limited supply. Every inference we push to the edge frees cloud resources for other tasks. Every efficiency gain increases what we can achieve with existing resources. If images can be rendered with half as much compute, we need half as many GPUs.

… or generate twice as many images. Maybe not quite, but if we’ve seen anything with AI so far, it’s that it fits Parkinson’s law pretty well.

Lower memory use == higher speed. Memory bandwidth is conserved with less to transfer; this is the biggest bottleneck. Compressing your filesystem generally makes storage faster as well.
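A back-of-envelope sketch of that bandwidth argument, in Python. It assumes the workload really is memory-bound (which is disputed elsewhere in this thread), takes the parameter count from the "4B" in the model name, and uses a purely hypothetical 100 GB/s memory bandwidth; it ignores activations, scale factors and the text encoder/VAE.

```python
def min_stream_time_s(n_params, bits_per_weight, bandwidth_gb_s):
    """Lower bound on the time to read every weight once from memory."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / (bandwidth_gb_s * 1e9)


N_PARAMS = 4e9          # assumption: "4B" parameters, read off the model name
BANDWIDTH_GB_S = 100.0  # hypothetical figure, not a measurement of any device

t_fp16 = min_stream_time_s(N_PARAMS, 16, BANDWIDTH_GB_S)
t_1bit = min_stream_time_s(N_PARAMS, 1, BANDWIDTH_GB_S)

print(f"16-bit weights: {t_fp16 * 1000:.0f} ms per full pass over the weights")
print(f" 1-bit weights: {t_1bit * 1000:.0f} ms per full pass over the weights")
```

Under these assumptions the ratio is simply the ratio of bits per weight; whether it shows up as a real speed-up depends on the kernel actually being bandwidth-limited.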

I think the value of it is currently more academic than useful in the real world. Everything at the frontier is still only marginally Good Enough (in image generation, most of it is shit even from the best models), so things far behind the frontier in terms of capability (as a tiny 1-bit model necessarily must be) are unusable.

But, getting remarkably higher density of capability per unit of compute is a big thing. It means the frontier can get better and cheaper to operate and less resource hungry, and it means what can be accomplished at the edge, on personal laptops or phones, becomes a broader spectrum of tasks.

And, for privacy, there are a lot of things that should run on-device and not everyone has big dedicated GPUs.


> Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway.

Not the bottom end - most people are on laptops or mobile devices that are much lower GPU power than this.


Probably the bottom end an individual would want to consider using due to slow generation time.

Sure, you could theoretically take a model compressed in this manner and deploy it on an old netbook and run the calculations on the CPU, but each image would probably take an hour…


My laptop has a Pascal-era Nvidia GPU with 4GiB of VRAM. It's not very efficient but it can do these tasks a whole lot faster than the CPU, but the 4GiB limitation pretty much limits its use to only the tiniest models.

If this model can run inside of the 4GiB limit, that makes this infinitely more useful than existing models for me.


I was thinking more about the 0-3 year old midrange x86 laptops and phones: they have unified-memory GPUs that are easily worth using (vs. the CPU) and support narrow FP datatypes, but they don't have a ton of memory bandwidth.

Fair enough :)

It solves part of the download issue if they actually deliver a whole 1-bit package (currently their download is around 3.5 GiB, which is still not ideal, since for FLUX.2 [klein] 4B you can get a package including the text encoder at ~6 GiB).

For speed, no. Draw Things runs on iPhone just fine and generally faster than their implementation on the same model (FLUX.2 [klein] 4B).


Genuine question: doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

Where are you getting the 1 Gigabyte number from?

Their 1-bit quantized Diffusion Transformer is just under 1 GB. You also need the text-encoder (4-bit quantized) and VAE (unquantized) for inference and their combined weight is ~3.42 GB.

TBF, even at that size it's no less mind blowing.


Same order of magnitude.

Yeah, it's pretty incredible. And I guess that's mostly what's behind the question: whether this is more of an impressive research/technique demonstrator, or a real product advancement solving a need.

> doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

I can make this into a 5-lines Python program. I’m not saying the images will match the description, but that isn’t part of your spec ;)


It’s like asking how Memoji generation on iPhone solved a real problem.

It does not need to directly solve any particular problem to be good for consumers overall, by putting pressure on all those subscription-based solutions… at least it’s private and does not require you to hand over all your data…


Yes, size and performance are not only problems for local LLMs; they are also problems for frontier LLM companies like OpenAI and Anthropic. The latter still lose a ton of money on inference, and advances in efficient, performant models help their bottom line.

For free users, I guess local generation is going to be faster than waiting in a queue.

Yes, it's a huge deal because these models are starting to be bound by memory bandwidth, not compute. One-bit weights therefore stream way faster, leading to substantially better results. At least that's what I'd guess!

Ideally, if ternary models work, the math is extremely easy for computers (addition/subtraction vs. 16-bit multiplication).

Not quite, as I understand it. The ternary approach bonsai uses leverages an FP16 scaling factor that each ternary value maps to. You're still using 16-bit multiplication; it's just that the weights are far more compressed.

Fair, I think I was referring more to the 1.58-bit architecture in general, since the original paper (Figure 3) shows that FP16 multiplication and addition are replaced by INT8 addition alone. I need to dive deeper into bonsai to see whether it differs.

https://arxiv.org/pdf/2402.17764
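To make the two positions above concrete, here is a toy sketch in plain Python. It is not bonsai's kernel and not the paper's implementation: it just assumes weights in {-1, 0, +1} that share a single scale factor (one per group is an assumption for illustration). The inner loop then needs only additions and subtractions, and the scale costs one multiplication per group rather than one per weight.

```python
def ternary_dot(x, w_ternary, scale):
    """Dot product where every weight is -1, 0 or +1 times a shared scale."""
    acc = 0.0
    for xi, wi in zip(x, w_ternary):
        if wi == 1:
            acc += xi      # +1 weight: add the activation
        elif wi == -1:
            acc -= xi      # -1 weight: subtract the activation
        # 0 weight: skip entirely
    return scale * acc     # a single multiplication for the whole group


def dense_dot(x, w):
    """Reference: ordinary dot product with one multiply per weight."""
    return sum(xi * wi for xi, wi in zip(x, w))


x = [0.5, -1.0, 2.0, 3.0]   # toy activations
w_t = [1, 0, -1, 1]         # toy ternary weights
scale = 0.25                # toy shared scale factor

y = ternary_dot(x, w_t, scale)
y_ref = dense_dot(x, [scale * w for w in w_t])
print(y, y_ref)
```

Both functions give the same result here; the difference is only in how many multiplications are spent getting there.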


I disagree. I suspect the vast majority of Neo sales are simply driven by the ability to get Apple-quality laptop hardware for such a low price. As such, the people driving the Neo sales are competing manufacturers who offer cheap nasty plastic underpowered Windows laptops around that price point.

A small minority of buyers may be buying the Neo primarily to escape Windows; but I would argue that if someone is this sophisticated, then they would also be aware that Apple is slowly taking a similar enshittified path with macOS.


I think you severely underestimate the power of Copilot. It's the absolute worst thing for windows.

(I've been using Microsoft since Windows 3.11, till Windows 10. Windows 11 was the last drop for me.)


> I think you severely underestimate the power of Copilot. It's the absolute worst thing for windows.

I agree with you, but I think you're overestimating the level of technical sophistication of the typical Neo buyer.

> (I've been using Microsoft since Windows 3.11, till Windows 10. Windows 11 was the last drop for me.)

Kind of proves my point. I can think of plenty of people who use computers successfully every day who would be unlikely to confidently state the name, let alone the generation, of their operating system if asked.

Just by being on HN, you're probably multiple standard deviations from the mean.


I would buy a Neo to escape Windows but it's not like macOS is a pleasant experience either.

My next mobile workstation will probably be an arm laptop with Linux for great battery life.


From the perspective of this intermediate/hobbyist level Python amateur who never quite got to grips with Swift, it's a paradigm shift. For all manner of simple-ish personal apps, Claude can one-shot an MVP in a few minutes, pulling all of the Swift/UI boilerplate together (which would have taken me hours if not days via tutorials) which I can then easily tweak and iterate to my heart's content.

It also supports the 'native is better' approach. For example, I recently started with Powerflow [0], which I really like except for the size of it, and created a Swift version [1] in an hour of Claude plus manual tweaks. It's been running quietly on my laptop ever since.

The downside is that it brings Apple's iOS restrictions into sharper relief: being able to easily write widgets and apps to suit my needs but then being unable to run them for longer than a week on my own hardware without paying $99 a year is especially frustrating.

[0] https://github.com/lzt1008/powerflow
[1] https://github.com/Tom1827/powerflow-swift


Brings back memories. Comanche was incredible when it came out, running on (IIRC) our family’s 386SX-16.

I tried to replicate the effect in Visual Basic, albeit with very limited success at the time.


I played hours and hours of it, networked multiplayer version, with my work colleagues .. it was part of our regular TGIF team-building exercises, among a few bouts of Descent2, some Warcraft2 and the odd Quake .. ah, halcyon days indeed .. we all had our joysticks at the office, lol.

Delta Force 2 for me [1]! The only real problem with it was you could see characters as single pixels moving across the map... and hit them without too much trouble.

[1] https://youtu.be/SyBh91UYVS8


I've spent cumulatively much more time comparing SVGs of pelicans, and we know who's responsible for that...

Hah, fair!

Eh, I was expecting something far worse from the title.

Once a month, an email reminds you to click on a provided link, log in (via saved credentials, one assumes?) and click a single button? I get that it's a small frustration, but I suspect there are far more egregious administrative inefficiencies in the world of government than this.

(You should try living/working in Germany ;) )

Also of note: the title is a vast overstatement, but I guess "The monthly reporting requirements of the UK Government's Low Value Purchase System are a very minor waste of time, on some occasions" isn't quite so catchy.


Agree. It's incredible that for pancreatic cancer (which is one of the single-most lethal and difficult-to-treat cancers there is) we're moving from a choice between a couple of decade-old and brutal chemotherapy cocktails, to discussions about sequencing and even the possibility of chemotherapy-free treatment, in a single generation of new agents.
