Executive summary: If you use AutoGPT, you need to be aware of prompt injection. This is a serious problem that can cause your AutoGPT agent to perform unexpected and unwanted tasks. Unfortunately, there isn't a perfect solution to this problem available yet.
If you set up an AutoGPT agent to perform task A, a prompt injection could 'derail' it into performing task B instead. Task B could be anything. Even something unwanted like deleting your personal files or sending all your bitcoins to some crooks wallet.
Docker helps limit the file system access that agents have. Measures like this are extremely useful. It's important to note that the agent can still be derailed.
In band signalling. This can happen if the agent uses web search to find things, and it hits upon a document that includes a prompt injection. GPT has to decide by itself whether text is instructions or not, and sometimes it will misinterpret untrusted user input from the internet as part of its instructions.
I'm not providing an example PoC. It may be useful for one to be privately shared with the developers to help them experiment with and address this problem but for now there is none.
It isn't yet known how to protect against prompt injection.
SQL injection can be solved by escaping the untrusted user input. Unfortunately there is no equivalent to escaping untrusted user input.
Why not add a second LLM that tries to detect prompt injection before passing the text to the LLM?
As an analogy: If you think of prompt injection like a knife and LLMs like a paper bag, the knife can stab through a paper bag. It can also stab through a second paper bag. To protect against a knife you need something different that's tough against a knife like armor.
We can't solve a vulnerability by adding more layers of vulnerable technology.
Another idea is to keep blocking every prompt injection that is found. This doesn't address the root cause of the problem. And it means that people will be occasionally victim to attacks.
I've been writing about this problem a bunch recently. Unfortunately there still isn't a good way of addressing this, despite a lot of motivated people trying to find a good solution.