Introduction
The OWASP Top 10 for LLM Applications is useful because it moves the conversation beyond “the model said something wrong.” In real systems, an LLM is connected to prompts, RAG, vector databases, tools, APIs, logs, users, permissions, providers, and business workflows.
That is where the risk lives. A bad response is a quality problem. A bad response that triggers a tool call, leaks internal context, writes to a ticketing system, executes generated SQL, or retrieves another user’s documents becomes a security problem.
These notes are based on the OWASP Top 10 for LLM Applications 2025 material in this repository. The goal is not to memorize ten labels, but to use them as a review framework for LLM applications, RAG systems, agents, and AI-enabled workflows.
1) Core idea
Traditional AppSec still matters: authentication, authorization, input validation, output encoding, dependency management, logging, rate limiting, and secure deployment do not disappear.
LLM security adds a new problem: the application includes a non-deterministic component that interprets natural language, consumes untrusted context, and may decide how to call tools.
The practical rule is:
Treat the model output as untrusted, and never delegate security decisions only to the prompt.
That means:
- prompts are not authorization controls,
- retrieved documents are not automatically trusted,
- model-generated JSON, SQL, Markdown, code, or tool arguments need validation,
- agents need narrow tools and independent backend authorization,
- high-impact actions need human approval,
- logs must show prompt, retrieval, tool calls, validation, cost, and final output.
2) The Top 10 at a glance
| Risk | What it means | What to check first | Key controls |
|---|---|---|---|
| LLM01 Prompt Injection | User input or external content changes model behavior. | Direct prompts, indirect prompts in RAG, tool-use changes. | Separate instructions/data, validate context, least-privilege tools, output checks, evals. |
| LLM02 Sensitive Information Disclosure | The system reveals PII, secrets, internal data, logs, prompts, or tool results. | RAG permissions, logs, prompts, provider data handling. | Data minimization, redaction, permission-aware retrieval, secure logging. |
| LLM03 Supply Chain | Models, datasets, libraries, plugins, tools, providers, or adapters introduce risk. | Model origin, dataset lineage, licenses, plugin permissions. | AI BOM/SBOM, hashes, approvals, sandboxing, vendor review, rollback. |
| LLM04 Data and Model Poisoning | Training, fine-tuning, evaluation, RAG, embeddings, or model artifacts are manipulated. | Ingestion sources, dataset ownership, versioning, drift. | Data lineage, source validation, human review, regression evals, reindexing controls. |
| LLM05 Improper Output Handling | LLM output is used without validation in browsers, APIs, queries, tools, or workflows. | Generated SQL, shell, JSON, Markdown, HTML, tool args. | Schema validation, allowlists, contextual encoding, sandboxing, approval for mutative actions. |
| LLM06 Excessive Agency | The agent has too many tools, permissions, or autonomy. | Tool inventory, OAuth scopes, write permissions, action approval. | Narrow tools, backend authorization, user-scoped execution, step limits, kill switch. |
| LLM07 System Prompt Leakage | System prompts or internal rules are exposed. | Prompts with secrets, debug logs, internal tool descriptions. | No secrets in prompts, prompt review, output filtering, deterministic auth outside the LLM. |
| LLM08 Vector and Embedding Weaknesses | Embeddings, vector search, or RAG retrieval leak or manipulate context. | Tenant separation, metadata, pre-filtering, top-k, logs. | Permission-aware RAG, metadata preservation, retrieval logs, safe chunking, reindexing. |
| LLM09 Misinformation | The model produces false or unsupported information that users trust. | Grounding, citations, abstention, high-stakes use cases. | Groundedness checks, verified sources, refusal/abstention, human review, factuality evals. |
| LLM10 Unbounded Consumption | The system allows uncontrolled token, cost, compute, tool, or query usage. | Token limits, quotas, timeouts, agent loops, expensive tools. | Rate limits, budgets, max steps, timeouts, backpressure, billing alerts. |
3) A practical review model
Do not start by asking “is the model safe?” Start with the full flow.
Input -> Orchestrator -> Prompt -> Model -> Retrieval -> Tools -> Output -> Action -> Logs
For each step, ask:
- What untrusted data enters here?
- What data can the system retrieve?
- What tool can be called?
- Which identity is used?
- What output is rendered, parsed, executed, or stored?
- Which action needs confirmation?
- Which logs would prove what happened?
- Which limit prevents abuse of cost or tokens?
This flow makes the Top 10 easier to apply. Prompt injection affects the prompt and retrieval boundary. Sensitive disclosure affects retrieval, tools, logs, prompts, and outputs. Excessive agency affects tools and actions. Unbounded consumption affects every expensive step.
4) Technical notes by risk
LLM01: Prompt Injection
Prompt injection is the most representative LLM application risk because instructions and data share the same language channel. It can be direct, where the user writes the instruction, or indirect, where the instruction is hidden in external content such as a web page, ticket, email, document, PDF, or RAG source.
The important point is that keyword filtering is not enough. The attack can be semantic, translated, split across turns, hidden in retrieved documents, or carried across context boundaries.
Review focus:
- Are system instructions, user data, and retrieved context clearly separated?
- Is RAG context treated as untrusted?
- Can retrieved documents influence tool calls?
- Are tool permissions enforced outside the model?
- Are prompt injection attempts logged with retrieved document IDs and final tool calls?
Useful control pattern:
Untrusted document -> classify/sanitize -> retrieve with permissions -> mark as context, not instruction -> validate output/tool args -> log decision
LLM02: Sensitive Information Disclosure
Sensitive information disclosure is broader than “the chatbot printed a secret.” Leakage can happen through prompts, RAG context, embeddings, memory, logs, provider telemetry, tool responses, traces, error messages, or admin interfaces.
The highest-risk pattern is permission-blind RAG. If a vector database retrieves documents by semantic similarity but ignores user, tenant, role, or classification, the model may summarize data the user should never see.
Review focus:
- Is there an inventory of data sent to the model?
- Are documents classified before indexing?
- Does retrieval apply permissions before context construction?
- Are logs masked or minimized?
- Are PII and secrets redacted before provider calls?
- Can embeddings or chunks contain sensitive data unnecessarily?
Core controls are minimization, permission-aware retrieval, DLP/redaction, secure logging, and provider data handling review.
LLM03: Supply Chain
AI supply chain risk includes normal dependencies, but it also includes models, weights, datasets, adapters, embedding models, prompts, tools, plugins, MCP servers, notebooks, containers, CI/CD, and inference providers.
Some AI artifacts are hard to inspect. A model or LoRA adapter can behave normally in demos and still carry unsafe behavior under specific conditions.
Review focus:
- Is there an AI BOM/SBOM with model, dataset, adapter, dependency, provider, version, hash, license, owner, and purpose?
- Are models and adapters verified before use?
- Are datasets reviewed for origin, sensitivity, and license?
- Are plugins and tools reviewed for permissions?
- Is there rollback if a model update degrades security?
The practical control is a formal approval process for AI components, not just pip install and model downloads.
LLM04: Data and Model Poisoning
Poisoning targets integrity. The attacker manipulates training data, fine-tuning data, evaluation sets, RAG documents, embeddings, model weights, adapters, feedback loops, or agent memory.
This can produce biased behavior, backdoors, false answers, degraded quality, or behavior triggered only by rare prompts.
Review focus:
- Who can modify datasets, documents, embeddings, and model artifacts?
- Is there data lineage and versioning?
- Are new RAG documents quarantined or reviewed before indexing?
- Can you roll back an index or dataset?
- Do evals include adversarial and edge cases, not only average accuracy?
For RAG systems, poisoning prevention starts before retrieval: validate sources before they become searchable context.
LLM05: Improper Output Handling
The strongest rule in this category is simple:
The output of the LLM is input to the next component.
If that output becomes HTML, Markdown, SQL, shell commands, JSON tool arguments, file paths, API parameters, or workflow instructions, it must be validated and encoded for that specific context.
Review focus:
- Is LLM-generated JSON validated with a strict schema?
- Are unexpected fields rejected?
- Are generated queries reviewed, parameterized, or sandboxed?
- Is generated Markdown/HTML sanitized before rendering?
- Are tool arguments allowlisted?
- Are mutative actions separated from suggestions?
Secure pattern:
LLM suggestion -> strict parser -> schema validation -> authorization -> human approval if sensitive -> execution -> audit log
LLM06: Excessive Agency
Excessive agency appears when an LLM agent has too much functionality, too many permissions, or too much autonomy.
This is especially dangerous because prompt injection and hallucination become more severe when the model can act. A chatbot that says something wrong is one thing. An agent that sends emails, changes tickets, queries internal systems, or deletes data is another.
Review focus:
- Are all tools necessary?
- Are tools narrow or generic?
- Are read/write/delete/send permissions separated?
- Does the backend authorize every action independently of the model?
- Are actions executed as the user or as a privileged service account?
- Are there max steps, tool-call limits, and a kill switch?
Important control: the LLM may decide what it wants to do, but deterministic systems must decide what it is allowed to do.
LLM07: System Prompt Leakage
System prompt leakage is not always critical by itself. The real problem is when the prompt contains secrets, credentials, business logic, tool details, authorization rules, or anything that makes later attacks easier.
The source notes make the right distinction: the system prompt should not be treated as a strong security boundary.
Review focus:
- Are there secrets, credentials, tokens, or real PII in prompts?
- Are authorization rules written only in natural language?
- Do debug logs expose full prompts?
- Are prompts versioned and reviewed?
- Would the system remain safe if the user knew the prompt?
The safest architecture assumes the prompt can leak and keeps secrets, permissions, and irreversible controls outside it.
LLM08: Vector and Embedding Weaknesses
Vector databases are designed for relevance. Security requires more than relevance.
In RAG systems, a retrieved chunk can become part of the prompt. If the chunk belongs to another tenant, contains sensitive data, has weak metadata, or was poisoned, the LLM may convert it into a confident answer.
Review focus:
- Does every chunk preserve owner, source, sensitivity, tenant, role, and classification?
- Are filters applied before context construction?
- Is top-k justified and limited?
- Are retrieved documents logged with scores and filters?
- Is there a process for deletion and reindexing after permission changes?
- Are queries monitored for semantic enumeration?
Permission-aware RAG is the core control. Filtering only in the UI is too late.
LLM09: Misinformation
Misinformation is a security risk when false or unsupported responses influence decisions, legal commitments, customer support, code, financial choices, medical guidance, HR, or security work.
It does not always require an attacker. A model can hallucinate because the source is missing, retrieval is poor, the prompt rewards confidence, or the UI makes answers look authoritative.
Review focus:
- Can the system abstain?
- Are high-impact answers grounded in sources?
- Are citations verified, not only displayed?
- Are unsupported claims blocked in critical domains?
- Is human review required for legal, medical, financial, HR, security, or irreversible decisions?
- Are hallucination and factuality evals part of regression testing?
The practical control is not a disclaimer. It is grounding, abstention, review, and measurement.
LLM10: Unbounded Consumption
Unbounded consumption covers uncontrolled use of tokens, inference, context, RAG, tool calls, queues, streaming, model access, and provider cost.
In cloud environments, this includes Denial of Wallet: the service may stay online while the bill explodes.
Review focus:
- Are input and output tokens limited?
- Are there quotas per user, tenant, API key, IP, model, and tool?
- Do agents have max steps and timeouts?
- Is top-k bounded in RAG?
- Are expensive tools gated?
- Are cost, latency, errors, tokens, queue length, and top users monitored?
- Is there a daily/monthly budget and automatic cutoff?
This is not only reliability engineering. It is an abuse case for any public or semi-public LLM endpoint.
5) Condensed audit checklist
Use this as a first-pass review checklist.
Architecture
- Is there an updated LLM data flow diagram?
- Are user, backend, orchestrator, model, RAG, tools, APIs, and logs separated?
- Are trust boundaries explicit?
- Is model output treated as untrusted?
Prompting
- Are prompts free of secrets and PII?
- Are system instructions, user input, and retrieved context separated?
- Are prompt changes tested with regression evals?
- Is authorization implemented outside the model?
RAG and embeddings
- Are sources inventoried, classified, and owned?
- Are documents validated before indexing?
- Does retrieval enforce user/tenant/role permissions?
- Are retrieved chunks and scores logged?
- Is there a reindexing and deletion process?
Agents and tools
- Is every tool necessary?
- Are tool inputs and outputs schema-validated?
- Do APIs authorize independently?
- Are actions mutative or external-send actions approved by a human?
- Is there a kill switch or read-only mode?
Logging and monitoring
- Are prompts, responses, model version, retrieved sources, tool calls, validation results, tokens, cost, and errors logged?
- Are logs masked or tokenized when sensitive data appears?
- Are there alerts for injection attempts, unusual tool calls, sensitive retrieval, and abnormal cost?
Rate limiting and consumption
- Are there limits per user, IP, tenant, API key, model, endpoint, and tool?
- Are token budgets enforced?
- Are timeouts and backpressure implemented?
- Are Denial of Wallet scenarios tested?
Evaluation and red teaming
- Is there a dataset with normal, adversarial, and edge cases?
- Are tests mapped to the ten OWASP risks?
- Are findings converted into regression tests?
- Is retesting required after mitigation?
6) Purple team workflow
The purple team playbook in the notes gives a good structure:
- Scope: define app, model, RAG, tools, APIs, users, roles, data types, and risks covered.
- Threat model: map
Actor -> Input -> Context -> Model -> Tool -> Data -> Action -> Impact. - Test cases: create one controlled test per risk, with expected logs and defensive behavior.
- Attack simulation: use synthetic, reversible, authorized prompts/documents/actions.
- Detection engineering: define signals for injection, sensitive retrieval, tool abuse, high cost, and hallucination.
- Control validation: decide whether each control prevents, detects, reduces impact, and creates evidence.
- Reporting: include business impact, technical evidence, failed controls, recommendations, residual risk, and retest plan.
- Remediation: fix architecture, prompts, retrieval, tools, permissions, evals, logs, and alerts.
- Retest: repeat original tests and variants, then measure false positives and legitimate functionality.
A good test case is small and measurable:
ID: TC-LLM01-IND-001
Risk: LLM01 Prompt Injection
Goal: Ensure retrieved documents cannot override system policy.
Input: Synthetic RAG document containing an instruction-like sentence.
Expected: The app treats it as untrusted content, does not change policy, does not call tools, and logs a suspicious context signal.
Failure impact: Context from RAG can control model behavior.
7) Common pitfalls
- Treating the system prompt as a security boundary.
- Giving agents broad tools during prototyping and forgetting to remove them.
- Applying RAG permissions after retrieval instead of before context construction.
- Logging full prompts and user data without masking.
- Rendering LLM Markdown/HTML as trusted content.
- Letting generated SQL, shell, JSON, or API arguments execute without validation.
- Measuring only answer quality and ignoring security evals.
- Running red team tests without checking whether Blue Team telemetry saw anything.
- Setting API rate limits but forgetting token, cost, top-k, tool-call, and agent-step limits.
- Fixing a finding without turning it into a regression test.
Final thoughts
The OWASP LLM Top 10 is most useful as an architecture review tool. It forces you to look at the complete system: prompts, context, retrieval, vector stores, tools, permissions, outputs, actions, logs, providers, and cost controls.
The practical direction is clear: use traditional AppSec controls where they still apply, then add LLM-specific controls where language, retrieval, tools, and non-determinism create new failure modes.
For production systems, the strongest baseline is defense in depth: least privilege, permission-aware RAG, strict output handling, human approval for sensitive actions, continuous evals, and logs detailed enough to investigate what the model saw, what it did, and why it was allowed.
Reference
This article is based on my personal study notes from the Cyber AI Security track.
Full repository: https://github.com/lameiro0x/cyber-ai-security-notes