When working AI code is not safe code

Generated code can appear finished because it runs, while hidden flaws in validation, authentication, or file handling remain unresolved.

Introduction

Generative AI can produce software that compiles, runs, and even passes basic tests. That apparent success creates a distinctive risk within the broader problem of plausible AI outputs: working code can still be insecure code. A program may perform its intended function while quietly exposing sensitive data, accepting malicious input, bypassing authentication checks, or creating opportunities for future attacks. The danger is not that the code visibly fails, but that it appears finished and trustworthy precisely because it works. Research on AI coding assistants repeatedly finds that functional correctness and security are not the same thing, and that developers often become more confident in code quality even when security weaknesses remain. [arXiv]

As AI-generated code becomes a routine part of software development, understanding this distinction is increasingly important. The central lesson is simple: successful execution is evidence that code performs a task, not evidence that it performs that task safely.

Why Running Code Can Still Be Vulnerable

Traditional software bugs often announce themselves through crashes, error messages, or obvious malfunctions. Security flaws are different. A vulnerable application can function perfectly during normal use while remaining exploitable under specific conditions.

This difference matters because large language models are optimised to generate code that satisfies prompts and produces expected outputs. They are not inherently security auditors. When asked to build a login system, file uploader, API endpoint, or database query, a model may generate code that appears complete but omits safeguards that experienced security engineers would consider essential. [SonarSource]

Researchers studying GitHub Copilot found that AI-generated solutions frequently included exploitable weaknesses despite successfully completing programming tasks. In one widely cited study, roughly 40% of generated programs in security-sensitive scenarios contained vulnerabilities or design flaws that attackers could exploit. [arXiv]

The result is an “illusion of correctness”. Code behaves correctly in expected situations, leading developers to assume it is production-ready. Yet hidden weaknesses remain dormant until an attacker discovers them. Industry security reports increasingly identify this false sense of confidence as one of the most significant risks associated with AI-assisted development. [IT Pro]

Common Security Flaws in Generated Code

Missing Input Validation

One of the most common weaknesses is inadequate validation of user input. Generated code may accept data exactly as provided and process it without checking whether the input is malicious, malformed, or excessively large.

For example, a web application may correctly accept form submissions and store them in a database. The feature works. However, if the code fails to sanitise or validate input, attackers may exploit it through injection attacks or unexpected data formats. The application passes functional testing but remains vulnerable. [OWASP Foundation]
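As a rough illustration of the missing step, the sketch below validates a form submission before anything is stored. The field names, the username pattern, and the length limit are all hypothetical choices made for this example, not rules taken from any cited source.

```python
import re

MAX_COMMENT_LENGTH = 2000  # hypothetical limit chosen for illustration
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")  # hypothetical allow-list


def validate_submission(form: dict) -> dict:
    """Return a cleaned copy of the form, or raise ValueError."""
    username = form.get("username")
    comment = form.get("comment")

    # Reject missing fields and wrong types before inspecting content.
    if not isinstance(username, str) or not isinstance(comment, str):
        raise ValueError("username and comment must be strings")

    # Allow-list check: accept only characters known to be acceptable.
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("username has invalid characters or length")

    # Bound the size so oversized payloads are rejected early.
    comment = comment.strip()
    if not comment or len(comment) > MAX_COMMENT_LENGTH:
        raise ValueError("comment is empty or too long")

    return {"username": username, "comment": comment}
```

Validation of this kind narrows what the application accepts. It does not replace context-specific defences such as parameterised queries or output encoding.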

Unsafe Authentication and Authorisation

Authentication answers the question “Who are you?” Authorisation answers “What are you allowed to do?” AI-generated code sometimes implements the first while neglecting the second.

A generated API may verify that a user is logged in but fail to confirm whether that user should have access to a particular record or action. During normal testing the application appears correct because authorised users can perform expected tasks. The weakness only becomes visible when someone deliberately attempts unauthorised access. [SonarSource]
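The difference between the two checks can be sketched in a few lines. The `User` and `Invoice` types and the in-memory `INVOICES` store are hypothetical stand-ins for a real session and database. The point is only that the ownership check is a separate step from the login check.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    is_admin: bool = False


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, owner_id=10, total=99.0),
    2: Invoice(2, owner_id=20, total=45.5),
}


def get_invoice(current_user, invoice_id):
    # Authentication: the caller must be a known, logged-in user.
    if current_user is None:
        raise PermissionError("authentication required")

    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise KeyError("invoice not found")

    # Authorisation: being logged in is not enough. The record must
    # belong to the caller, or the caller must hold an admin role.
    if invoice.owner_id != current_user.id and not current_user.is_admin:
        raise PermissionError("not allowed to view this invoice")
    return invoice
```

Code that stops after the first check passes every test run as the record's owner, which is why the gap tends to survive functional testing.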

Insecure File Handling

File uploads and file processing are frequent sources of security problems. AI-generated examples may successfully upload documents, images, or reports while failing to verify file types, restrict storage locations, or prevent dangerous filenames.

The feature works exactly as requested. The security controls that should surround it may be absent. This creates opportunities for attackers to upload malicious files or manipulate server behaviour through unexpected inputs. [SecureFlag]
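A minimal sketch of the three controls named above (file type, storage location, filename) might look like the following. The allow-list, the size limit, and the function name `save_upload` are assumptions made for illustration. The extension check does not inspect file contents, which a production system would also need to consider.

```python
import uuid
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}  # hypothetical allow-list
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB limit


def save_upload(upload_dir: Path, client_filename: str, data: bytes) -> Path:
    # Never trust the client-supplied name: keep only its extension.
    extension = Path(client_filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")

    # Generate the stored name server-side, so "../" sequences or
    # absolute paths in the client name cannot steer the write.
    base = upload_dir.resolve()
    target = (base / f"{uuid.uuid4().hex}{extension}").resolve()

    # Defence in depth: confirm the final path is directly inside base.
    if target.parent != base:
        raise ValueError("resolved path escapes the upload directory")

    base.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Discarding the original filename is a deliberate design choice here. If the original name must be shown to users, it can be stored as metadata rather than used as a path.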

Weak Database Access Patterns

Database queries generated by AI can appear clean and efficient while relying on insecure techniques such as string concatenation rather than parameterised queries.

From a user’s perspective the application retrieves and stores data correctly. From a security perspective it may be vulnerable to injection attacks. Researchers and security practitioners repeatedly cite this category as an example of AI producing code that functions correctly but ignores established secure coding practices. [TechRadar]
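The contrast between the two query styles can be shown with Python's built-in `sqlite3` module. The table, the sample rows, and the function names are invented for this example. Both functions return the same result for ordinary input, which is exactly why the unsafe one passes functional testing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def find_user_unsafe(name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT name, email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterised: the driver passes the value separately from the SQL.
    return conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
print(len(find_user_unsafe(payload)))  # every row matches the injected condition
print(len(find_user_safe(payload)))    # no user has that literal name
```

With the normal input `"alice"` the two functions are indistinguishable. Only the hostile payload reveals the difference.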

Lack of Defensive Programming

Human developers often add protective checks for edge cases, unexpected states, and invalid operations. Studies comparing human-written and AI-generated code have found that model-generated code frequently lacks these defensive measures.

Researchers examining generated implementations found examples that compiled and executed successfully yet omitted safeguards against issues such as buffer overflows, integer overflows, null dereferences, and out-of-bounds access. The code fulfilled its primary task while remaining less resilient under unusual or hostile conditions. [arXiv]
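The weaknesses listed above are most familiar from languages such as C, but the same habit of checking before trusting applies elsewhere. The sketch below is a hypothetical Python parser for a length-prefixed field. The record format and the name `read_field` are invented for illustration. Python will not overflow a buffer here, but without the explicit checks it would silently return truncated data.

```python
def read_field(buffer: bytes, offset: int) -> bytes:
    """Read one field: a 1-byte length followed by that many payload bytes."""
    # Null check: fail clearly instead of raising a confusing TypeError later.
    if buffer is None:
        raise ValueError("buffer is required")

    # Bounds check: negative offsets would otherwise index from the end.
    if not 0 <= offset < len(buffer):
        raise ValueError("offset is outside the buffer")

    length = buffer[offset]
    start = offset + 1
    end = start + length

    # Slicing silently truncates, so verify the declared length explicitly.
    if end > len(buffer):
        raise ValueError("declared length runs past the end of the buffer")
    return buffer[start:end]
```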

What Research Reveals About the Risk

Evidence from multiple studies points to a recurring pattern: AI-generated code often appears more trustworthy than it deserves.

A Stanford-led study found that participants using AI coding assistants produced less secure code than those working without assistance. Notably, users with AI assistance often believed their code was more secure despite the opposite being true. [arXiv; Stanford EE Department]

Other research evaluating GitHub Copilot across dozens of security-sensitive programming scenarios reported substantial rates of vulnerable output. The concern was not merely occasional mistakes but the systematic reproduction of insecure patterns learned from public code repositories containing both good and bad examples. [arXiv]

More recent analyses continue to find vulnerabilities across a wide range of modern coding models. Comparative evaluations of multiple large language models report that all tested systems generated vulnerable code in at least some circumstances, with many weaknesses rated high or critical severity. [arXiv]

Industry assessments show similar trends. Large-scale testing of generated code has found significant rates of security flaws even when the resulting programs appear production-ready and function as intended. [TechRadar]

Why These Weaknesses Persist

The underlying reason is straightforward. AI systems learn from existing code rather than from an independent understanding of software security.

Public repositories contain millions of examples of authentication systems, database queries, file upload handlers, and API endpoints. Many of those examples are insecure. When a model predicts likely code patterns, it can reproduce both secure and insecure approaches with similar confidence. [arXiv]

Another challenge is that security is often invisible in simple demonstrations. A prompt such as “create a login page” rewards visible functionality. The generated answer is more likely to focus on getting the feature running than on implementing rate limiting, session hardening, audit logging, privilege separation, and other protective controls that become important in production environments. [SonarSource]
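To make one of those invisible controls concrete, here is a minimal sketch of login rate limiting using a sliding window. The class name, the limits, and the choice to key attempts by an arbitrary string (for example a username or client address) are assumptions for illustration. A real deployment would usually need shared storage across processes rather than an in-memory dictionary.

```python
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5       # hypothetical limit
WINDOW_SECONDS = 60.0  # hypothetical window


class LoginRateLimiter:
    def __init__(self, max_attempts=MAX_ATTEMPTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._attempts = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for key and report whether it may proceed."""
        now = self.clock()
        attempts = self._attempts[key]

        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()

        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

A login page works identically with or without this check during a demonstration, which is why a prompt that only asks for the page rarely produces it.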

Even attempts to have AI repair its own security issues produce mixed results. Research has shown that prompting models to fix vulnerabilities sometimes removes one weakness while introducing another elsewhere in the codebase. [arXiv]

Review Steps Before Code Reaches Production

The most effective response is to treat AI-generated code as a draft rather than a finished product.

Before deployment, organisations increasingly apply the same scrutiny they would apply to third-party code:

  • Conduct security-focused code review. Reviewers should examine authentication, authorisation, input handling, cryptography, logging, and error management rather than focusing solely on functionality.
  • Use automated security testing. Static analysis, dependency scanning, and vulnerability detection tools can identify weaknesses that functional testing misses.
  • Test hostile scenarios. Security testing should include malformed inputs, unauthorised requests, privilege escalation attempts, and abuse cases.
  • Verify dependencies. Generated code often imports libraries automatically. Those dependencies should be reviewed for known vulnerabilities and maintenance status.
  • Require human approval. Security-critical code should not enter production solely because an AI system generated it or because it passed unit tests. [Checkmarx; Veracode]
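The “test hostile scenarios” step above can be sketched with the standard `unittest` module. The function under test, `parse_quantity`, and the list of hostile inputs are hypothetical. The pattern is what matters: alongside the happy-path test, a second test feeds the function inputs an attacker might try.

```python
import unittest


def parse_quantity(raw):
    """Hypothetical function under test: parse an order quantity (1 to 1000)."""
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError("quantity must be ASCII digits")
    if len(raw) > 4:
        raise ValueError("quantity has too many digits")
    value = int(raw)
    if not 1 <= value <= 1000:
        raise ValueError("quantity out of range")
    return value


# Inputs a functional test would rarely include.
HOSTILE_INPUTS = [
    "", "-1", "0", "1001", "1e3", " 5",
    "5; DROP TABLE orders",
    "\u0663",        # a non-ASCII digit that str.isdigit() alone would accept
    "9" * 10_000,    # oversized input
    None, 5.0,       # wrong types
]


class HostileInputTests(unittest.TestCase):
    def test_normal_use_still_works(self):
        self.assertEqual(parse_quantity("5"), 5)

    def test_hostile_inputs_are_rejected(self):
        for raw in HOSTILE_INPUTS:
            with self.subTest(raw=repr(raw)[:40]):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)
```

Saved as a test module, this can be run with `python -m unittest`. A suite containing only the first test would pass for many insecure implementations.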

Many organisations are also beginning to treat AI-generated code similarly to external software supply-chain components: useful, productive, and potentially valuable, but requiring verification before trust. [TechRadar]

The Real Cost of Plausible Code

Within the broader discussion of hallucinated answers and plausible AI outputs, insecure code occupies a special category. Unlike a fabricated fact in a text response, vulnerable software can persist for months or years inside production systems.

The challenge is that AI-generated code often succeeds at the most visible test: it works. Yet security failures usually emerge under conditions that ordinary users never see. Research, industry audits, and security assessments consistently show that functional success should not be mistaken for secure design. AI can accelerate programming dramatically, but speed and functional correctness do not guarantee security. [arXiv; Veracode]

Endnotes

  1. Source: arxiv.org
    Link: https://arxiv.org/html/2211.03622v3
    Source snippet

    Do Users Write More Insecure Code with AI Assistants?18 Dec 2023 — Overall, we find that participants who had access to an AI assist...

  2. Source: ee.stanford.edu
    Title: dan boneh and team find relying ai more likely make your code buggier
    Link: https://ee.stanford.edu/dan-boneh-and-team-find-relying-ai-more-likely-make-your-code-buggier
    Source snippet

    Stanford EE DepartmentDan Boneh and team find relying on AI is more likely to...11 Jan 2023 — Their study examined how users interact wi...

  3. Source: sonarsource.com
    Link: https://www.sonarsource.com/resources/library/owasp-llm-code-generation/
    Source snippet

    OWASP LLM Top 10: How it Applies to Code GenerationThe OWASP Top 10 for Large Language Model Applications defines ten critical...

  4. Source: owasp.org
    Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/
    Source snippet

    OWASP FoundationOWASP Top 10 for Large Language Model ApplicationsThe OWASP GenAI Security Project is a global, open-source initiative de...

  5. Source: arxiv.org
    Title: arXiv Asleep at the Keyboard?
    Link: https://arxiv.org/abs/2108.09293
    Source snippet

    Assessing the Security of GitHub...August 20, 2021 — by H Pearce · 2021 · Cited by 936 — In this work, we systematically investigate the...

    Published: August 20, 2021

  6. Source: techradar.com
    Title: Tech Radar Nearly all security bosses are worried about AI safety
    Link: https://www.techradar.com/pro/security/nearly-all-security-bosses-are-worried-about-ai-safety-with-a-third-saying-they-still-rely-on-manually-reviewing-code-before-launch
    Source snippet

    An overwhelming 90% of security leaders report active concerns about AI safety, particularly as AI coding tools become more widespread in...

  7. Source: owasp.org
    Link: https://owasp.org/www-project-top-ten/
    Source snippet

    It represents a broad consensus about the most critical security...Read more...

  8. Source: blog.secureflag.com
    Title: the risks of generative ai coding in software development
    Link: https://blog.secureflag.com/2024/10/16/the-risks-of-generative-ai-coding-in-software-development/
    Source snippet

    The risks of generative AI coding in software development16 Oct 2024 — One of the most noticeable risks with AI-generated code...

  9. Source: techradar.com
    Title: Tech Radar Why LLMs are plateauing
    Link: https://www.techradar.com/pro/why-llms-are-plateauing-and-what-that-means-for-software-security
    Source snippet

    While LLMs like OpenAI's GPT-5 have shown improved accuracy in producing secure code due to enhanced reasoning capabilities, most models—...

  10. Source: veracode.com
    Title: securing code and agentic ai risk
    Link: https://www.veracode.com/blog/securing-code-and-agentic-ai-risk/
    Source snippet

    Securing Code in the Era of Agentic AI12 Feb 2025 — A study by Stanford University found that 40% of AI-generated code suggestions from G...

  11. Source: arxiv.org
    Link: https://arxiv.org/abs/2409.19182
    Source snippet

    Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code GenerationSeptember 28, 2024...

    Published: September 28, 2024

  12. Source: arxiv.org
    Title: arXiv Security of LLM-generated Code: A Comparative Analysis
    Link: https://arxiv.org/abs/2605.23091
    Source snippet

    Security of LLM-generated Code: A Comparative AnalysisMay 21, 2026...

    Published: May 21, 2026

  13. Source: arxiv.org
    Link: https://arxiv.org/abs/2605.05867

  14. Source: techradar.com
    Link: https://www.techradar.com/pro/nearly-half-of-all-code-generated-by-ai-found-to-contain-security-flaws-even-big-llms-affected
    Source snippet

    The research analyzed over 100 large language models (LLMs) across 80 coding tasks and revealed no significant improvement in security pe...

  15. Source: checkmarx.com
    Title: GitHub Copilot Security: Risks, Built-In Controls, and Best
    Link: https://checkmarx.com/learn/ai-security/top-5-github-copilot-security-risks-9-ways-to-mitigate-them/
    Source snippet

    GitHub Copilot Security: Risks, Built-In Controls, and Best...May 11, 2026 — GitHub Copilot integrates with GitHub Advanced Sec...

    Published: May 11, 2026

  16. Source: veracode.com
    Title: genai code security report
    Link: https://www.veracode.com/blog/genai-code-security-report/
    Source snippet

    Insights from 2025 GenAI Code Security Report30 Jul 2025 — How secure is code generated by AI? We asked 100+ AI models to write code. Her...

  17. Source: owasp.org
    Link: https://owasp.org/
    Source snippet

    OWASP Foundation, the Open Source Foundation for...Explore the world of cyber security. Driven by volunteers, OWASP resources are access...

  18. Source: genai.owasp.org
    Link: https://genai.owasp.org/
    Source snippet

    Gen AI Security Project: HomeOWASP's AI Security Solutions Landscape is a landmark guide for security professionals. It outlines key risk...

  19. Source: genai.owasp.org
    Title: llm05 supply chain vulnerabilities
    Link: https://genai.owasp.org/llmrisk/llm05-supply-chain-vulnerabilities/
    Source snippet

    owasp.orgLLM05:2025 Improper Output HandlingAn LLM is used to generate code... While efficient, this approach risks exposing sensitive i...

  20. Source: genai.owasp.org
    Title: llm02 insecure output handling
    Link: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
    Source snippet

    LLM02: Insecure Output HandlingInsecure Output Handling refers specifically to insufficient validation, sanitization, and handling of the...

  21. Source: cyber.fsi.stanford.edu
    Link: https://cyber.fsi.stanford.edu/
    Source snippet

    Policy Center | FSI - Stanford UniversityStanford University's research center for the interdisciplinary study of issues at the nexus of...

  22. Source: arxiv.org
    Link: https://arxiv.org/html/2504.20612v1
    Source snippet

    The lack of expertise from new developers can lead them to...Read more...

  23. Source: itpro.com
    Link: https://www.itpro.com/software/development/ai-generated-code-is-fast-becoming-the-biggest-enterprise-security-risk-as-teams-struggle-with-the-illusion-of-correctness
    Source snippet

    According to a Black Duck survey, there was a 12% increase in enterprises evaluating where large language model (LLM)-generated code can...

Additional References

  1. Source: researchgate.net
    Link: https://www.researchgate.net/publication/401623597_Security_Risks_in_AI-Generated_Code_Security_Risks_in_AI-Generated_Code_Investigating_Vulnerabilities_Introduced_by_AI_Coding_Assistants_A_Research_Study_on_Claude_Code_and_Generative_AI_Development_T
    Source snippet

    (PDF) Security Risks in AI-Generated Code...Mar 6, 2026 — AI coding assistants such as Claude Code, GitHub Copilot, and other generative...

  2. Source: linkedin.com
    Link: https://www.linkedin.com/posts/secure-coding-hub_github-says-copilot-makes-developers-55-activity-7446536463699206144-00s9
    Source snippet

    AI Code Generation: Security Risks and Reviewer SkillsGitHub says Copilot makes developers 55% faster. Stanford says those same developer...

  3. Source: computing.co.uk
    Link: https://www.computing.co.uk/news/4061952/ai-assistants-produce-buggy-insecure-code
    Source snippet

    AI assistants produce buggy, insecure codeA new Stanford University study has found that developers who use AI coding tools like GitHub C...

  4. Source: linkedin.com
    Link: https://www.linkedin.com/posts/vchirrav_github-vchirravowasp-secure-coding-md-activity-7425549913930887169-42yq
    Source snippet

    Secure Coding with OWASP Rules for AI-Generated CodeThis article explores the common vulnerabilities found in AI-assisted development and...

  5. Source: medium.com
    Link: https://medium.com/%40victoku1/security-risks-in-llm-powered-applications-a-comprehensive-review-29057f63aabc
    Source snippet

    Security Risks in LLM Powered ApplicationsPrompt injection, agent abuse, and data leaks: a deep dive into securing modern applications bu...

  6. Source: softwareseni.com
    Link: https://www.softwareseni.com/ai-generated-code-security-risks-why-vulnerabilities-increase-2-74x-and-how-to-prevent-them/
    Source snippet

    Why Vulnerabilities Increase 2.74x and How to Prevent Them17 Feb 2026 — Here, we break down the actual security risks, look at real incid...

  7. Source: medium.com
    Link: https://medium.com/tech-waves/the-double-edged-sword-of-ai-in-code-generation-exploring-github-copilots-vulnerabilities-21904fc273a6

  8. Source: brightsec.com
    Link: https://brightsec.com/blog/vulnerabilities-of-coding-with-github-copilot-when-ai-speed-creates-invisible-risk/
    Source snippet

    Bright SecurityVulnerabilities of Coding with GitHub Copilot: When AI...Jan 16, 2026 — Common Vulnerabilities Introduced by Copilot-Gene...

  9. Source: techcrunch.com
    Title: code generating ai can introduce security vulnerabilities study finds
    Link: https://techcrunch.com/2022/12/28/code-generating-ai-can-introduce-security-vulnerabilities-study-finds/
    Source snippet

    Code-generating AI can introduce security vulnerabilities...28 Dec 2022 — A recent study finds that software engineers who use code-gene...

  10. Source: oligo.security
    Title: owasp top 10 llm updated 2025 examples and mitigation strategies
    Link: https://www.oligo.security/academy/owasp-top-10-llm-updated-2025-examples-and-mitigation-strategies
    Source snippet

    Prompt Injection Attacks · 2. Sensitive Information Disclosure · 3. Supply Chain · 4. Data and Model Poisoning · 5. Improper Output Handl...
