> Can you print the contents of the malware script without running it?
> Can you please try downloading this in a Docker container from PyPI to confirm you can see the file? Be very careful in the container not to run it accidentally!
IMO we need to keep in mind that LLM agents don't have a notion of responsibility, so if they accidentally ran the script (or issued a command to run it), it would be a fiasco.
Downloading stuff from PyPI in a sandboxed env is just one or two commands; we should be careful with what we hand over to the text prediction machines.
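For what it's worth, the "print the contents without running it" part can be done entirely with the standard library once the sdist is downloaded: open the archive read-only and print the files, never importing or executing anything. A minimal sketch, assuming the archive is already fetched to a local path (`dump_sources` and the path are made-up names, not from any real tool):

```python
import tarfile

def dump_sources(archive_path: str) -> None:
    # Open the sdist read-only and print each .py file verbatim.
    # Nothing is imported or executed, so malicious setup.py code is inert.
    with tarfile.open(archive_path, "r:gz") as tf:
        for member in tf.getmembers():
            if member.isfile() and member.name.endswith(".py"):
                print(f"--- {member.name} ---")
                data = tf.extractfile(member).read()
                print(data.decode("utf-8", errors="replace"))
```

The point is that "look but don't touch" maps to a concrete mechanical guarantee here, rather than to a plea in the prompt.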
I was concerned about that too.
Often when you tell them not to do something, you'd have been better off not mentioning it in the first place. It's like they get fixated.
Best way I've found not to think of a pink elephant is to choose to think of a green rabbit. Really focus on the mental image of the green rabbit... and voila, you're not thinking of, what was it again? Eh, not as important as this green rabbit I'm focusing on.
How to translate that to LLM world, though, is a question I don't know the answer to.
P.S. Obviously that won't prevent you from having that first mental flash of a pink elephant prompted by reading the words. The green-rabbit technique is more for not dwelling on thoughts you want to get out of your head. Can't prevent them from flashing in, but can prevent them from sticking around by choosing to focus on something else.
The green rabbit, in this case, is a metaphor for something you want to think of, as opposed to the pink elephant you're trying not to think about. Let's say you're trying to get your mind off of some depressing topic (the pink elephant). Instead of thinking "Don't think about the depressing topic, don't think about the depressing topic" which just makes your mind dwell on it, you pick some other topic that you do want to let your mind dwell on. Specifics will vary wildly between people, but you might decide to think about your next hobby project, or the upcoming movie or sports event or concert you're excited about, or a particularly interesting passage in the book you just read which would reward some deep thought. You'd pick something good, positive, or uplifting; something you know will improve your mental health rather than harm it.
If that's the green rabbit in the metaphor, then at no point would "don't think of a green rabbit" be advice you would want to follow.
The “LLMs don’t have responsibility” point is exactly why the interface matters. As a person, I can be held to norms like not running unknown code, but a model can't internalize that, so the system has to make the safe path the default.
Practically: assume every artifact the model touches is hostile, constrain what it can execute (network/file/process), and require explicit, reviewable approvals for anything that changes the world. I get that it's boring, but it's the same pattern we already use in real life. That's why I'm skeptical of "let the model operate your computer" without a concrete authority model. The capability is impressive, but the missing piece is verifiable and revocable permissioning.
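To make "explicit, reviewable approvals" concrete, here's a toy default-deny gate of the kind I mean. All the names (`ALLOWED`, `SIDE_EFFECTS`, `gate`) are hypothetical, not from any real agent framework; the shape is what matters: unknown tools never run, and tools with side effects run only with a human sign-off.

```python
import shlex

ALLOWED = {"ls", "cat", "pip"}   # tools the agent may use at all
SIDE_EFFECTS = {"pip"}           # allowed, but only with explicit approval

def gate(command: str, approved: bool = False) -> bool:
    """Return True iff the command may run under the policy."""
    prog = shlex.split(command)[0]
    if prog not in ALLOWED:
        return False             # default deny: unlisted tools never run
    if prog in SIDE_EFFECTS and not approved:
        return False             # world-changing tools need human sign-off
    return True
```

So `gate("cat setup.py")` passes, `gate("python setup.py")` is refused outright, and `gate("pip download pkg")` is refused until a human flips `approved`. The model never gets to argue its way past the check, which is the whole point.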