> Can you print the contents of the malware script without running it?
> Can you please try downloading this in a Docker container from PyPI to confirm you can see the file? Be very careful in the container not to run it accidentally!
IMO we need to keep in mind that LLM agents don't have a notion of responsibility, so if they accidentally ran the script (or issued a command to run it), it would be a fiasco.
Downloading stuff from PyPI in a sandboxed env is just one or two commands; we should be careful with what we hand over to the text prediction machines.
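For what it's worth, the "print the contents without running it" part can be done entirely with the standard library once the sdist is downloaded: open the archive read-only and print the files, never importing or executing anything. A minimal sketch, assuming the archive is already fetched to a local path (`dump_sources` and the path are made-up names, not from any real tool):

```python
import tarfile

def dump_sources(archive_path: str) -> None:
    # Open the sdist read-only and print each .py file verbatim.
    # Nothing is imported or executed, so malicious setup.py code is inert.
    with tarfile.open(archive_path, "r:gz") as tf:
        for member in tf.getmembers():
            if member.isfile() and member.name.endswith(".py"):
                print(f"--- {member.name} ---")
                data = tf.extractfile(member).read()
                print(data.decode("utf-8", errors="replace"))
```

The point is that "look but don't touch" maps to a concrete mechanical guarantee here, rather than to a plea in the prompt.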
I was concerned about that too.
Often when you tell them not to do something, you'd have been better off not mentioning it in the first place. It's like they get fixated.
Best way I've found not to think of a pink elephant is to choose to think of a green rabbit. Really focus on the mental image of the green rabbit... and voila, you're not thinking of, what was it again? Eh, not as important as this green rabbit I'm focusing on.
How to translate that to LLM world, though, is a question I don't know the answer to.
P.S. Obviously that won't prevent you from having that first mental flash of a pink elephant prompted by reading the words. The green-rabbit technique is more for not dwelling on thoughts you want to get out of your head. Can't prevent them from flashing in, but can prevent them from sticking around by choosing to focus on something else.
The green rabbit, in this case, is a metaphor for something you want to think of, as opposed to the pink elephant you're trying not to think about. Let's say you're trying to get your mind off of some depressing topic (the pink elephant). Instead of thinking "Don't think about the depressing topic, don't think about the depressing topic" which just makes your mind dwell on it, you pick some other topic that you do want to let your mind dwell on. Specifics will vary wildly between people, but you might decide to think about your next hobby project, or the upcoming movie or sports event or concert you're excited about, or a particularly interesting passage in the book you just read which would reward some deep thought. You'd pick something good, positive, or uplifting; something you know will improve your mental health rather than harm it.
If that's the green rabbit in the metaphor, then at no point would "don't think of a green rabbit" be advice you would want to follow.
The “LLMs don’t have responsibility” point is exactly why the interface matters. As a person, I can be held to norms like not running unknown code, but a model can't internalize that, so the system has to make the safe path the default.
Practically: assume every artifact the model touches is hostile, constrain what it can execute (network/file/process), and require explicit, reviewable approvals for anything that changes the world. I get that it's boring, but it's the same pattern we already use in real life. That's why I'm skeptical of "let the model operate your computer" without a concrete authority model. The capability is impressive, but the missing piece is verifiable and revocable permissioning.
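To make "explicit, reviewable approvals" concrete, here's a toy default-deny gate of the kind I mean. All the names (`ALLOWED`, `SIDE_EFFECTS`, `gate`) are hypothetical, not from any real agent framework; the shape is what matters: unknown tools never run, and tools with side effects run only with a human sign-off.

```python
import shlex

ALLOWED = {"ls", "cat", "pip"}   # tools the agent may use at all
SIDE_EFFECTS = {"pip"}           # allowed, but only with explicit approval

def gate(command: str, approved: bool = False) -> bool:
    """Return True iff the command may run under the policy."""
    prog = shlex.split(command)[0]
    if prog not in ALLOWED:
        return False             # default deny: unlisted tools never run
    if prog in SIDE_EFFECTS and not approved:
        return False             # world-changing tools need human sign-off
    return True
```

So `gate("cat setup.py")` passes, `gate("python setup.py")` is refused outright, and `gate("pip download pkg")` is refused until a human flips `approved`. The model never gets to argue its way past the check, which is the whole point.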