When working AI code is not safe code
Generated code can appear finished because it runs, while hidden flaws in validation, authentication, or file handling remain unresolved.
Introduction
Generative AI can produce software that compiles, runs, and even passes basic tests. That apparent success creates a distinctive risk within the broader problem of plausible AI outputs: working code can still be insecure code. A program may perform its intended function while quietly exposing sensitive data, accepting malicious input, bypassing authentication checks, or creating opportunities for future attacks. The danger is not that the code visibly fails, but that it appears finished and trustworthy precisely because it works. Research on AI coding assistants repeatedly finds that functional correctness and security are not the same thing, and that developers often become more confident in code quality even when security weaknesses remain. [arXiv, "Do Users Write More Insecure Code with AI Assistants?"]
As AI-generated code becomes a routine part of software development, understanding this distinction is increasingly important. The central lesson is simple: successful execution is evidence that code performs a task, not evidence that it performs that task safely.
Why Running Code Can Still Be Vulnerable
Traditional software bugs often announce themselves through crashes, error messages, or obvious malfunctions. Security flaws are different. A vulnerable application can function perfectly during normal use while remaining exploitable under specific conditions.
This difference matters because large language models are optimised to generate code that satisfies prompts and produces expected outputs. They are not inherently security auditors. When asked to build a login system, file uploader, API endpoint, or database query, a model may generate code that appears complete but omits safeguards that experienced security engineers would consider essential. [SonarSource, "OWASP LLM Top 10: How it Applies to Code Generation"]
Researchers studying GitHub Copilot found that AI-generated solutions frequently included exploitable weaknesses despite successfully completing programming tasks. In one widely cited study, roughly 40% of generated programs in security-sensitive scenarios contained vulnerabilities or design flaws that attackers could exploit. [arXiv, "Asleep at the Keyboard?"]
The result is an “illusion of correctness”. Code behaves correctly in expected situations, leading developers to assume it is production-ready. Yet hidden weaknesses remain dormant until an attacker discovers them. Industry security reports increasingly identify this false sense of confidence as one of the most significant risks associated with AI-assisted development. [IT Pro]
Common Security Flaws in Generated Code
Missing Input Validation
One of the most common weaknesses is inadequate validation of user input. Generated code may accept data exactly as provided and process it without checking whether the input is malicious, malformed, or excessively large.
For example, a web application may correctly accept form submissions and store them in a database. The feature works. However, if the code fails to sanitise or validate input, attackers may exploit it through injection attacks or unexpected data formats. The application passes functional testing but remains vulnerable. [OWASP Foundation]
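As a rough illustration of the missing step, the sketch below validates a form submission before anything is stored. The field names (`username`, `comment`), the length limit, and the function name `validate_submission` are assumptions made for this example, not part of any particular framework.

```python
import re

# Hypothetical limits for this example; real limits depend on the application.
MAX_COMMENT_LENGTH = 2000
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")


def validate_submission(form: dict) -> dict:
    """Return a cleaned copy of the form, or raise ValueError if it is invalid."""
    username = form.get("username")
    comment = form.get("comment")

    # Reject wrong types and anything outside an explicit allow-list of characters.
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("username must be 3-32 letters, digits, or underscores")

    if not isinstance(comment, str):
        raise ValueError("comment must be text")

    # Trim whitespace, then reject empty or excessively large input.
    comment = comment.strip()
    if not comment or len(comment) > MAX_COMMENT_LENGTH:
        raise ValueError("comment must be between 1 and 2000 characters")

    return {"username": username, "comment": comment}
```

Validation of this kind narrows what reaches the rest of the application. It does not replace parameterised queries or output encoding, which address injection at the point where the data is used.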
Unsafe Authentication and Authorisation
Authentication answers the question “Who are you?” Authorisation answers “What are you allowed to do?” AI-generated code sometimes implements the first while neglecting the second.
A generated API may verify that a user is logged in but fail to confirm whether that user should have access to a particular record or action. During normal testing the application appears correct because authorised users can perform expected tasks. The weakness only becomes visible when someone deliberately attempts unauthorised access. [SonarSource, "OWASP LLM Top 10: How it Applies to Code Generation"]
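The gap between the two checks can be shown with a minimal sketch. Everything here is hypothetical: the `User` and `Invoice` types, the in-memory `INVOICES` store, and the `get_invoice` handler stand in for whatever a real application uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(id=1, owner_id=10, total_cents=5000),
    2: Invoice(id=2, owner_id=20, total_cents=7500),
}


class Forbidden(Exception):
    """The caller is not allowed to perform this action."""


class NotFound(Exception):
    """The requested record does not exist."""


def get_invoice(current_user: Optional[User], invoice_id: int) -> Invoice:
    # Authentication: is anyone logged in at all?
    if current_user is None:
        raise Forbidden("authentication required")

    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise NotFound("no such invoice")

    # Authorisation: the step that generated handlers often leave out.
    # Being logged in is not enough; the record must belong to the caller.
    if invoice.owner_id != current_user.id and not current_user.is_admin:
        raise Forbidden("this invoice belongs to another user")

    return invoice
```

With the ownership check removed, every test that uses an invoice's real owner still passes, which is why this class of flaw tends to survive functional testing.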
Insecure File Handling
File uploads and file processing are frequent sources of security problems. AI-generated examples may successfully upload documents, images, or reports while failing to verify file types, restrict storage locations, or prevent dangerous filenames.
The feature works exactly as requested. The security controls that should surround it may be absent. This creates opportunities for attackers to upload malicious files or manipulate server behaviour through unexpected inputs. [SecureFlag, "The risks of generative AI coding in software development"]
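A sketch of the surrounding controls might look like the following. The upload directory, the extension allow-list, the size limit, and the function name `choose_storage_path` are all assumptions chosen for illustration.

```python
import secrets
from pathlib import Path

# Hypothetical settings for this example.
UPLOAD_ROOT = Path("/srv/app/uploads")
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def choose_storage_path(client_filename: str, size_bytes: int) -> Path:
    """Decide where an upload may be stored, or raise ValueError."""
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or too large")

    # Keep only the last path component, whichever separator the client used.
    basename = client_filename.replace("\\", "/").rsplit("/", 1)[-1]

    extension = Path(basename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("file type is not allowed")

    # Never reuse the client's name on disk: generate a random one instead.
    stored_name = secrets.token_hex(16) + extension
    destination = (UPLOAD_ROOT / stored_name).resolve()

    # Defence in depth: confirm the final path is still inside the upload root.
    if UPLOAD_ROOT.resolve() not in destination.parents:
        raise ValueError("resolved path escapes the upload directory")

    return destination
```

An extension check says nothing about the file's actual content, so a production handler would usually also inspect the bytes and serve uploads from a location where they cannot be executed.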
Weak Database Access Patterns
Database queries generated by AI can appear clean and efficient while relying on insecure techniques such as string concatenation rather than parameterised queries.
From a user’s perspective the application retrieves and stores data correctly. From a security perspective it may be vulnerable to injection attacks. Researchers and security practitioners repeatedly cite this category as an example of AI producing code that functions correctly but ignores established secure coding practices. [TechRadar, "Why LLMs are plateauing"]
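The contrast is easy to demonstrate with Python's standard `sqlite3` module. The `users` table and the function names below are invented for this example; the two lookups return the same result for ordinary input and differ only when the input is hostile.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: user input is concatenated into the SQL text itself.
    query = "SELECT id, username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterised: the driver passes the value separately from the SQL text.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = make_demo_db()
payload = "' OR '1'='1"

normal_unsafe = find_user_unsafe(conn, "alice")   # one row, as expected
normal_safe = find_user_safe(conn, "alice")       # the same single row
attack_unsafe = find_user_unsafe(conn, payload)   # every row in the table
attack_safe = find_user_safe(conn, payload)       # no rows
```

A functional test that only looks up `"alice"` cannot tell the two versions apart, which is exactly the situation the section describes.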
Lack of Defensive Programming
Human developers often add protective checks for edge cases, unexpected states, and invalid operations. Studies comparing human-written and AI-generated code have found that model-generated code frequently lacks these defensive measures.
Researchers examining generated implementations found examples that compiled and executed successfully yet omitted safeguards against issues such as buffer overflows, integer overflows, null dereferences, and out-of-bounds access. The code fulfilled its primary task while remaining less resilient under unusual or hostile conditions. [arXiv, "Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation"]
What Research Reveals About the Risk
Evidence from multiple studies points to a recurring pattern: AI-generated code often appears more trustworthy than it deserves.
A Stanford-led study found that participants using AI coding assistants produced less secure code than those working without assistance. Notably, users with AI assistance often believed their code was more secure despite the opposite being true. [arXiv, "Do Users Write More Insecure Code with AI Assistants?"; Stanford EE Department]
Other research evaluating GitHub Copilot across dozens of security-sensitive programming scenarios reported substantial rates of vulnerable output. The concern was not merely occasional mistakes but the systematic reproduction of insecure patterns learned from public code repositories containing both good and bad examples. [arXiv, "Asleep at the Keyboard?"]
More recent analyses continue to find vulnerabilities across a wide range of modern coding models. Comparative evaluations of multiple large language models report that all tested systems generated vulnerable code in at least some circumstances, with many weaknesses rated high or critical severity. [arXiv, "Security of LLM-generated Code: A Comparative Analysis"]
Industry assessments show similar trends. One analysis covering more than 100 large language models and 80 coding tasks, reported by TechRadar, found security flaws in nearly half of the generated code, even when the resulting programs appeared production-ready and functioned as intended. [TechRadar]
Why These Weaknesses Persist
The underlying reason is straightforward. AI systems learn from existing code rather than from an independent understanding of software security.
Public repositories contain millions of examples of authentication systems, database queries, file upload handlers, and API endpoints. Many of those examples are insecure. When a model predicts likely code patterns, it can reproduce both secure and insecure approaches with similar confidence. [arXiv, "Asleep at the Keyboard?"]
Another challenge is that security is often invisible in simple demonstrations. A prompt such as “create a login page” rewards visible functionality. The generated answer is more likely to focus on getting the feature running than on implementing rate limiting, session hardening, audit logging, privilege separation, and other protective controls that become important in production environments. [SonarSource, "OWASP LLM Top 10: How it Applies to Code Generation"]
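To make one of those invisible controls concrete, here is a minimal sketch of rate limiting for login attempts using a sliding time window. The class name, the default limits, and the in-memory storage are assumptions for illustration; a real deployment would normally need shared storage across processes.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most max_attempts per key within a sliding time window."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._attempts = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key and report whether it may proceed."""
        now = self._clock()
        attempts = self._attempts[key]

        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()

        if len(attempts) >= self.max_attempts:
            return False

        attempts.append(now)
        return True
```

A login handler would call `allow()` with a key such as the account name or client address before checking the password, and refuse the attempt when it returns `False`. The login page works equally well without this check, which is why a prompt focused on functionality rarely produces it.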
Even attempts to have AI repair its own security issues produce mixed results. Research has shown that prompting models to fix vulnerabilities sometimes removes one weakness while introducing another elsewhere in the codebase. [arXiv, "Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation"]
Review Steps Before Code Reaches Production
The most effective response is to treat AI-generated code as a draft rather than a finished product.
Before deployment, organisations increasingly apply the same scrutiny they would apply to third-party code:
- Conduct security-focused code review. Reviewers should examine authentication, authorisation, input handling, cryptography, logging, and error management rather than focusing solely on functionality.
- Use automated security testing. Static analysis, dependency scanning, and vulnerability detection tools can identify weaknesses that functional testing misses.
- Test hostile scenarios. Security testing should include malformed inputs, unauthorised requests, privilege escalation attempts, and abuse cases.
- Verify dependencies. Generated code often imports libraries automatically. Those dependencies should be reviewed for known vulnerabilities and maintenance status.
- Require human approval. Security-critical code should not enter production solely because an AI system generated it or because it passed unit tests. [Checkmarx; Veracode]
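The "test hostile scenarios" step above can be sketched as a table of abuse cases run alongside the usual happy-path test. The function under test, `parse_page_size`, and its limits are hypothetical; the point is the shape of the test, not the specific function.

```python
import unittest

MAX_PAGE_SIZE = 100


def parse_page_size(raw: str) -> int:
    """Hypothetical function under test: parse a page_size query value."""
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError("page_size must be a whole number")
    if len(raw) > 4:
        raise ValueError("page_size is too large")
    value = int(raw)
    if not 1 <= value <= MAX_PAGE_SIZE:
        raise ValueError("page_size is out of range")
    return value


# Inputs an ordinary user would never send but an attacker might.
HOSTILE_INPUTS = [
    "",                      # empty
    "-1",                    # negative
    "0",                     # below range
    "101",                   # above range
    "1e3",                   # alternative number syntax
    "10; DROP TABLE users",  # injection attempt
    "\uff19",                # full-width digit that is not ASCII
    "9" * 10_000,            # excessively large input
    " 5",                    # unexpected whitespace
]


class HostileInputTests(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_page_size("25"), 25)

    def test_hostile_inputs_are_rejected(self):
        for raw in HOSTILE_INPUTS:
            with self.subTest(raw=raw[:20]):
                with self.assertRaises(ValueError):
                    parse_page_size(raw)
```

Saved to a file, the tests run with `python -m unittest`. Generated code that ships with only the first test would pass review on functionality alone; the second test is what exercises the conditions an attacker would try.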
Many organisations are also beginning to treat AI-generated code similarly to external software supply-chain components: useful, productive, and potentially valuable, but requiring verification before trust. [TechRadar, "Nearly all security bosses are worried about AI safety"]
The Real Cost of Plausible Code
Within the broader discussion of hallucinated answers and plausible AI outputs, insecure code occupies a special category. Unlike a fabricated fact in a text response, vulnerable software can persist for months or years inside production systems.
The challenge is that AI-generated code often succeeds at the most visible test: it works. Yet security failures usually emerge under conditions that ordinary users never see. Research, industry audits, and security assessments consistently show that functional success should not be mistaken for secure design. AI can accelerate programming dramatically, but speed and correctness do not automatically include safety. [arXiv, "Do Users Write More Insecure Code with AI Assistants?"; Veracode]
Further Reading
Books and field guides related to the topic of this article, for readers who want to go deeper.
Software Security
Explains secure development and vulnerability prevention.
The Web Application Hacker's Handbook
Shows how functional applications can still be exploitable.
Endnotes
-
Source: arxiv.org
Link: https://arxiv.org/html/2211.03622v3
Source snippet: Do Users Write More Insecure Code with AI Assistants? 18 Dec 2023. Overall, we find that participants who had access to an AI assist...
-
Source: ee.stanford.edu
Title: Dan Boneh and team find relying on AI is more likely to...
Link: https://ee.stanford.edu/dan-boneh-and-team-find-relying-ai-more-likely-make-your-code-buggier
Source snippet: Stanford EE Department. 11 Jan 2023. Their study examined how users interact wi...
-
Source: sonarsource.com
Link: https://www.sonarsource.com/resources/library/owasp-llm-code-generation/
Source snippet: OWASP LLM Top 10: How it Applies to Code Generation. The OWASP Top 10 for Large Language Model Applications defines ten critical...
-
Source: owasp.org
Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/
Source snippet: OWASP Foundation. OWASP Top 10 for Large Language Model Applications. The OWASP GenAI Security Project is a global, open-source initiative de...
-
Source: arxiv.org
Title: Asleep at the Keyboard?
Link: https://arxiv.org/abs/2108.09293
Source snippet: Assessing the Security of GitHub... By H Pearce, 2021. In this work, we systematically investigate the...
Published: August 20, 2021
-
Source: techradar.com
Title: Nearly all security bosses are worried about AI safety
Link: https://www.techradar.com/pro/security/nearly-all-security-bosses-are-worried-about-ai-safety-with-a-third-saying-they-still-rely-on-manually-reviewing-code-before-launch
Source snippet: An overwhelming 90% of security leaders report active concerns about AI safety, particularly as AI coding tools become more widespread in...
-
Source: owasp.org
Link: https://owasp.org/www-project-top-ten/
Source snippet: It represents a broad consensus about the most critical security...
-
Source: blog.secureflag.com
Title: The risks of generative AI coding in software development
Link: https://blog.secureflag.com/2024/10/16/the-risks-of-generative-ai-coding-in-software-development/
Source snippet: 16 Oct 2024. One of the most noticeable risks with AI-generated code...
-
Source: techradar.com
Title: Why LLMs are plateauing
Link: https://www.techradar.com/pro/why-llms-are-plateauing-and-what-that-means-for-software-security
Source snippet: While LLMs like OpenAI's GPT-5 have shown improved accuracy in producing secure code due to enhanced reasoning capabilities, most models...
-
Source: veracode.com
Title: Securing Code in the Era of Agentic AI
Link: https://www.veracode.com/blog/securing-code-and-agentic-ai-risk/
Source snippet: 12 Feb 2025. A study by Stanford University found that 40% of AI-generated code suggestions from G...
-
Source: arxiv.org
Title: Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
Link: https://arxiv.org/abs/2409.19182
Published: September 28, 2024
-
Source: arxiv.org
Title: Security of LLM-generated Code: A Comparative Analysis
Link: https://arxiv.org/abs/2605.23091
Published: May 21, 2026
-
Source: arxiv.org
Link: https://arxiv.org/abs/2605.05867
-
Source: techradar.com
Link: https://www.techradar.com/pro/nearly-half-of-all-code-generated-by-ai-found-to-contain-security-flaws-even-big-llms-affected
Source snippet: The research analyzed over 100 large language models (LLMs) across 80 coding tasks and revealed no significant improvement in security pe...
-
Source: checkmarx.com
Title: GitHub Copilot Security: Risks, Built-In Controls, and Best...
Link: https://checkmarx.com/learn/ai-security/top-5-github-copilot-security-risks-9-ways-to-mitigate-them/
Source snippet: GitHub Copilot integrates with GitHub Advanced Sec...
Published: May 11, 2026
-
Source: veracode.com
Title: Insights from 2025 GenAI Code Security Report
Link: https://www.veracode.com/blog/genai-code-security-report/
Source snippet: 30 Jul 2025. How secure is code generated by AI? We asked 100+ AI models to write code. Her...
-
Source: owasp.org
Link: https://owasp.org/
Source snippet: OWASP Foundation, the Open Source Foundation for... Explore the world of cyber security. Driven by volunteers, OWASP resources are access...
-
Source: genai.owasp.org
Link: https://genai.owasp.org/
Source snippet: Gen AI Security Project: Home. OWASP's AI Security Solutions Landscape is a landmark guide for security professionals. It outlines key risk...
-
Source: genai.owasp.org
Title: llm05 supply chain vulnerabilities
Link: https://genai.owasp.org/llmrisk/llm05-supply-chain-vulnerabilities/
Source snippet: LLM05:2025 Improper Output Handling. An LLM is used to generate code... While efficient, this approach risks exposing sensitive i...
-
Source: genai.owasp.org
Title: LLM02: Insecure Output Handling
Link: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
Source snippet: Insecure Output Handling refers specifically to insufficient validation, sanitization, and handling of the...
-
Source: cyber.fsi.stanford.edu
Link: https://cyber.fsi.stanford.edu/
Source snippet: Policy Center | FSI - Stanford University. Stanford University's research center for the interdisciplinary study of issues at the nexus of...
-
Source: arxiv.org
Link: https://arxiv.org/html/2504.20612v1
Source snippet: The lack of expertise from new developers can lead them to...
-
Source: itpro.com
Link: https://www.itpro.com/software/development/ai-generated-code-is-fast-becoming-the-biggest-enterprise-security-risk-as-teams-struggle-with-the-illusion-of-correctness
Source snippet: According to a Black Duck survey, there was a 12% increase in enterprises evaluating where large language model (LLM)-generated code can...