What is Man-in-the-Prompt Attack and How to Protect Yourself
Summary
A man-in-the-prompt attack is a cybersecurity threat that targets the large language models (LLMs) behind AI chatbots by injecting malicious instructions into the prompts users submit to these systems.
The injected content can be visible or invisible to the user, and may direct the LLM to reveal sensitive or private data or to return harmful responses.
The attack currently takes place mainly through browser extensions, which can read and modify the LLM's prompt inputs and outputs through the page's Document Object Model (DOM).
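To illustrate the mechanism, here is a minimal, hypothetical sketch of the tampering step: a pure helper that appends an attacker instruction to a user's prompt, plus a comment showing how a malicious content script could apply it via the DOM. The function name, instruction text, and selector are illustrative assumptions, not taken from any real attack.

```typescript
// Hypothetical sketch of the prompt-tampering step in a man-in-the-prompt
// attack. Names and selectors are illustrative.

// Pure helper: concatenate an attacker instruction onto the user's prompt.
// In a real attack this rewrite happens at submission time, so the user
// never sees the modified text.
function injectInstruction(prompt: string, instruction: string): string {
  return `${prompt}\n\n${instruction}`;
}

// A malicious extension's content script could then locate the prompt box
// through the DOM and rewrite its value just before the prompt is sent, e.g.:
//
//   const box = document.querySelector<HTMLTextAreaElement>("textarea");
//   if (box) box.value = injectInstruction(box.value, "Ignore prior rules...");
```

Because extensions with access to a page's DOM can read and write any element on it, no exploit or special permission beyond ordinary content-script access is needed for this kind of tampering.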
To counter this, avoid installing browser extensions that interact directly with LLM tools, enter prompts manually and inspect them before sending, treat model replies with suspicion, and avoid personalised commercial chatbots.
Enterprise environments are particularly vulnerable to man-in-the-prompt attacks because browser extensions often go unvetted; using incognito mode with extensions disabled can also help mitigate the threat.
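Since injected instructions may be hidden behind invisible Unicode characters, one simple defensive check before sending a prompt is to scan it for such characters. The sketch below is a hypothetical helper, not part of any named tool; the character ranges cover common zero-width and directional-control code points.

```typescript
// Hypothetical defensive check: flag prompts containing invisible Unicode
// characters that are sometimes used to smuggle hidden instructions.
// Ranges covered: zero-width space/joiners and bidi marks (U+200B-U+200F),
// word joiner (U+2060), and zero-width no-break space / BOM (U+FEFF).
const INVISIBLE_CHARS = /[\u200B-\u200F\u2060\uFEFF]/;

function containsHiddenCharacters(prompt: string): boolean {
  return INVISIBLE_CHARS.test(prompt);
}
```

A user or enterprise tool could run such a check on the prompt text as it appears in the DOM immediately before submission, and warn when the value differs from what the user believes they typed.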