What Is a Prompt Injection Attack?
]
Prompt injections exploit the fact that LLM applications do not clearly distinguish between developer instructions and user inputs. By writing carefully crafted prompts, hackers can override developer instructions and make the LLM do their bidding.
To understand prompt injection attacks, it helps to first look at how developers build many LLM-powered apps.
LLMs are a type of foundation model, a highly flexible machine learning model trained on a large dataset. They can be adapted to various tasks through a process called “instruction fine-tuning.” Developers give the LLM a set of natural language instructions for a task, and the LLM follows them.