When working AI code is not safe code

Introduction

Generative AI can produce software that compiles, runs, and even passes basic tests. That apparent success creates a distinctive risk within the broader problem of plausible AI outputs: working code can still be insecure code. A program may perform its intended function while quietly exposing sensitive data, accepting malicious input, bypassing authentication checks, or creating opportunities for future attacks. The danger is not that the code visibly fails, but that it appears finished and trustworthy precisely because it works. Research on AI coding assistants repeatedly finds that functional correctness and security are not the same thing, and that developers often become more confident in code quality even when security weaknesses remain. [arXiv]

As AI-generated code becomes a routine part of software development, understanding this distinction is increasingly important. The central lesson is simple: successful execution is evidence that code performs a task, not evidence that it performs that task safely.

Why Running Code Can Still Be Vulnerable

Traditional software bugs often announce themselves through crashes, error messages, or obvious malfunctions. Security flaws are different. A vulnerable application can function perfectly during normal use while remaining exploitable under specific conditions.

This difference matters because large language models are optimised to generate code that satisfies prompts and produces expected outputs. They are not inherently security auditors. When asked to build a login system, file uploader, API endpoint, or database query, a model may generate code that appears complete but omits safeguards that experienced security engineers would consider essential. [SonarSource]

Researchers studying GitHub Copilot found that AI-generated solutions frequently included exploitable weaknesses despite successfully completing programming tasks. In one widely cited study, roughly 40% of generated programs in security-sensitive scenarios contained vulnerabilities or design flaws that attackers could exploit. [arXiv]

The result is an “illusion of correctness”. Code behaves correctly in expected situations, leading developers to assume it is production-ready. Yet hidden weaknesses remain dormant until an attacker discovers them. Industry security reports increasingly identify this false sense of confidence as one of the most significant risks associated with AI-assisted development. [IT Pro]

Common Security Flaws in Generated Code

Missing Input Validation

One of the most common weaknesses is inadequate validation of user input. Generated code may accept data exactly as provided and process it without checking whether the input is malicious, malformed, or excessively large.

For example, a web application may correctly accept form submissions and store them in a database. The feature works. However, if the code fails to sanitise or validate input, attackers may exploit it through injection attacks or unexpected data formats. The application passes functional testing but remains vulnerable. [OWASP Foundation]
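As a minimal sketch of what such validation could look like, the Python function below checks a form submission before it is stored. The function name `validate_submission`, the field names, the username pattern, and the length limit are all assumptions chosen for illustration, not part of any cited study or framework.

```python
import re

# Assumed limits for illustration; real values depend on the application.
MAX_COMMENT_LENGTH = 2000
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,30}")


class ValidationError(ValueError):
    """Raised when submitted form data fails a validation rule."""


def validate_submission(form: dict) -> dict:
    """Return a cleaned copy of the form, or raise ValidationError."""
    username = form.get("username")
    comment = form.get("comment")

    # Reject missing fields and unexpected types before anything else.
    if not isinstance(username, str) or not isinstance(comment, str):
        raise ValidationError("username and comment must be strings")

    # Allow-list check: fullmatch requires the whole value to conform.
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValidationError("username has invalid characters or length")

    comment = comment.strip()
    if not comment:
        raise ValidationError("comment is empty")
    if len(comment) > MAX_COMMENT_LENGTH:
        raise ValidationError("comment is too long")

    return {"username": username, "comment": comment}
```

Validation of this kind narrows what reaches the rest of the application, but it complements rather than replaces other controls such as parameterised queries and output encoding.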

Unsafe Authentication and Authorisation

Authentication answers the question “Who are you?” Authorisation answers “What are you allowed to do?” AI-generated code sometimes implements the first while neglecting the second.

A generated API may verify that a user is logged in but fail to confirm whether that user should have access to a particular record or action. During normal testing the application appears correct because authorised users can perform expected tasks. The weakness only becomes visible when someone deliberately attempts unauthorised access. [SonarSource]
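The gap between the two checks can be illustrated with a short Python sketch. Everything here is hypothetical: the `User` and `Invoice` types, the in-memory `INVOICES` table, and the `get_invoice` function stand in for whatever a real application would use. The point is only that a record lookup needs an ownership check in addition to a login check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total: float


# Stand-in for a database table (illustrative data only).
INVOICES = {
    1: Invoice(id=1, owner_id=10, total=99.0),
    2: Invoice(id=2, owner_id=20, total=45.5),
}


class Forbidden(Exception):
    """Raised when the caller may not see the requested record."""


def get_invoice(current_user: Optional[User], invoice_id: int) -> Invoice:
    # Authentication: is anyone logged in at all?
    if current_user is None:
        raise PermissionError("login required")

    invoice = INVOICES.get(invoice_id)

    # Authorisation: is this particular user allowed to see this record?
    # Missing and forbidden records raise the same error, so the response
    # does not reveal which invoice IDs exist.
    if invoice is None or (
        invoice.owner_id != current_user.id and not current_user.is_admin
    ):
        raise Forbidden("invoice not found or not permitted")

    return invoice
```

A version of `get_invoice` without the ownership condition would pass every test in which users request their own invoices, which is why this class of flaw tends to survive functional testing.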

Insecure File Handling

File uploads and file processing are frequent sources of security problems. AI-generated examples may successfully upload documents, images, or reports while failing to verify file types, restrict storage locations, or prevent dangerous filenames.

The feature works exactly as requested. The security controls that should surround it may be absent. This creates opportunities for attackers to upload malicious files or manipulate server behaviour through unexpected inputs. [SecureFlag]
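A hedged sketch of the three controls mentioned above (file type, storage location, filename) might look like the following Python function. The name `store_upload`, the extension allow-list, and the size limit are assumptions for illustration; a production system would typically also inspect file contents and scan for malware, which this sketch does not attempt.

```python
import uuid
from pathlib import Path

# Assumed policy values for illustration.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MiB


def store_upload(original_name: str, data: bytes, upload_dir: Path) -> Path:
    """Save an upload under a server-generated name inside upload_dir."""
    # Only the extension of the client-supplied name is used.
    extension = Path(original_name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed")

    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")

    upload_dir = upload_dir.resolve()
    upload_dir.mkdir(parents=True, exist_ok=True)

    # A random server-side name means names such as "../../app.py" never
    # influence where the file is written.
    target = (upload_dir / f"{uuid.uuid4().hex}{extension}").resolve()

    # Defence in depth: confirm the final path is still inside upload_dir.
    if upload_dir not in target.parents:
        raise ValueError("resolved path escapes the upload directory")

    target.write_bytes(data)
    return target
```

Checking the extension alone does not prove what a file contains; it is shown here only as the simplest of the missing controls.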

Weak Database Access Patterns

Database queries generated by AI can appear clean and efficient while relying on insecure techniques such as string concatenation rather than parameterised queries.

From a user’s perspective the application retrieves and stores data correctly. From a security perspective it may be vulnerable to injection attacks. Researchers and security practitioners repeatedly cite this category as an example of AI producing code that functions correctly but ignores established secure coding practices. [TechRadar]
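The difference between the two techniques is easy to demonstrate with Python's standard `sqlite3` module. The table, the function names, and the payload below are illustrative assumptions; the sketch simply contrasts a query built by string concatenation with the same query using a `?` placeholder.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: user input becomes part of the SQL text itself.
    query = "SELECT id, username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterised: the driver passes the value separately from the SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


# Small in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(find_user_unsafe(conn, payload))  # [(1, 'alice'), (2, 'bob')]
print(find_user_safe(conn, payload))    # []
```

Both functions return the same result for an ordinary username such as `alice`, which is why a functional test alone would not distinguish them.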

Lack of Defensive Programming

Human developers often add protective checks for edge cases, unexpected states, and invalid operations. Studies comparing human-written and AI-generated code have found that model-generated code frequently lacks these defensive measures.

Researchers examining generated implementations found examples that compiled and executed successfully yet omitted safeguards against issues such as buffer overflows, integer overflows, null dereferences, and out-of-bounds access. The code fulfilled its primary task while remaining less resilient under unusual or hostile conditions. [arXiv]
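The weaknesses listed above are mostly associated with memory-unsafe languages, but the same habit of explicit checking applies elsewhere. As a rough Python analogue, the hypothetical `read_field` function below guards against a missing buffer, negative indices, and out-of-range reads. These are cases where Python would otherwise raise a less informative error or, for slices, silently return wrong or truncated data.

```python
from typing import Optional


def read_field(buffer: Optional[bytes], offset: int, length: int) -> bytes:
    """Return exactly `length` bytes starting at `offset`, or raise ValueError."""
    # Null check: fail with a clear message instead of a TypeError later.
    if buffer is None:
        raise ValueError("buffer is missing")

    # Negative slice indices count from the end in Python, so they would
    # produce the wrong bytes rather than an error.
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")

    # Bounds check: a slice past the end is silently truncated in Python,
    # for example b"abc"[2:7] evaluates to b"c".
    if offset + length > len(buffer):
        raise ValueError("requested range runs past the end of the buffer")

    return buffer[offset:offset + length]
```

Without the three checks, `read_field` would still return the right bytes for every well-formed request, so the omission would not show up in ordinary use.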

What Research Reveals About the Risk

Evidence from multiple studies points to a recurring pattern: AI-generated code often appears more trustworthy than it deserves.

A Stanford-led study found that participants using AI coding assistants produced less secure code than those working without assistance. Notably, users with AI assistance often believed their code was more secure despite the opposite being true. [arXiv; Stanford EE Department]

Other research evaluating GitHub Copilot across dozens of security-sensitive programming scenarios reported substantial rates of vulnerable output. The concern was not merely occasional mistakes but the systematic reproduction of insecure patterns learned from public code repositories containing both good and bad examples. [arXiv]

More recent analyses continue to find vulnerabilities across a wide range of modern coding models. Comparative evaluations of multiple large language models report that all tested systems generated vulnerable code in at least some circumstances, with many weaknesses rated high or critical severity. [arXiv]

Industry assessments show similar trends. Large-scale testing of generated code has found significant rates of security flaws even when the resulting programs appear production-ready and function as intended. [TechRadar]

Why These Weaknesses Persist

The underlying reason is straightforward. AI systems learn from existing code rather than from an independent understanding of software security.

Public repositories contain millions of examples of authentication systems, database queries, file upload handlers, and API endpoints. Many of those examples are insecure. When a model predicts likely code patterns, it can reproduce both secure and insecure approaches with similar confidence. [arXiv]

Another challenge is that security is often invisible in simple demonstrations. A prompt such as “create a login page” rewards visible functionality. The generated answer is more likely to focus on getting the feature running than on implementing rate limiting, session hardening, audit logging, privilege separation, and other protective controls that become important in production environments. [SonarSource]

Even attempts to have AI repair its own security issues produce mixed results. Research has shown that prompting models to fix vulnerabilities sometimes removes one weakness while introducing another elsewhere in the codebase. [arXiv]

Review Steps Before Code Reaches Production

The most effective response is to treat AI-generated code as a draft rather than a finished product.

Before deployment, organisations increasingly apply the same scrutiny they would apply to third-party code:

Conduct security-focused code review. Reviewers should examine authentication, authorisation, input handling, cryptography, logging, and error management rather than focusing solely on functionality.
Use automated security testing. Static analysis, dependency scanning, and vulnerability detection tools can identify weaknesses that functional testing misses.
Test hostile scenarios. Security testing should include malformed inputs, unauthorised requests, privilege escalation attempts, and abuse cases.
Verify dependencies. Generated code often imports libraries automatically. Those dependencies should be reviewed for known vulnerabilities and maintenance status.
Require human approval. Security-critical code should not enter production solely because an AI system generated it or because it passed unit tests. [Checkmarx; Veracode]
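The "test hostile scenarios" step above can be sketched with Python's built-in `unittest` module. The function under test, `parse_quantity`, and its limits are hypothetical; the point is that the test suite includes abuse cases alongside the expected input, so a version of the function that only handled the happy path would fail.

```python
import unittest

MAX_QUANTITY = 1000  # assumed business limit for illustration


def parse_quantity(raw) -> int:
    """Hypothetical function under test: parse an order quantity from user input."""
    # Accept only short strings of ASCII digits; reject everything else.
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError("quantity must consist of ASCII digits")
    if len(raw) > 4:
        raise ValueError("quantity has too many digits")
    value = int(raw)
    if not 1 <= value <= MAX_QUANTITY:
        raise ValueError("quantity out of range")
    return value


class HostileInputTests(unittest.TestCase):
    def test_expected_input(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_abuse_cases(self):
        hostile_inputs = [
            "",                      # empty value
            "-1",                    # negative number
            "0",                     # below the allowed range
            "1001",                  # above the allowed range
            "1e3",                   # alternative numeric notation
            " 5",                    # unexpected whitespace
            "\u0661\u0662",          # non-ASCII digits
            "1; DROP TABLE orders",  # injection-style payload
            "9" * 5000,              # excessively large input
            None,                    # wrong type
        ]
        for raw in hostile_inputs:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)
```

Saved as a module, these tests can be run with `python -m unittest`. The list of hostile inputs is a starting point rather than a complete threat model.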

Many organisations are also beginning to treat AI-generated code similarly to external software supply-chain components: useful, productive, and potentially valuable, but requiring verification before trust. [TechRadar]

The Real Cost of Plausible Code

Within the broader discussion of hallucinated answers and plausible AI outputs, insecure code occupies a special category. Unlike a fabricated fact in a text response, vulnerable software can persist for months or years inside production systems.

The challenge is that AI-generated code often succeeds at the most visible test: it works. Yet security failures usually emerge under conditions that ordinary users never see. Research, industry audits, and security assessments consistently show that functional success should not be mistaken for secure design. AI can accelerate programming dramatically, but speed and correctness do not automatically include safety. [arXiv; Veracode]

Endnotes

Source: arxiv.org
Title: Do Users Write More Insecure Code with AI Assistants?
Link: https://arxiv.org/html/2211.03622v3
Published: 18 Dec 2023

Source: ee.stanford.edu
Title: Dan Boneh and team find relying on AI is more likely to make your code buggier
Link: https://ee.stanford.edu/dan-boneh-and-team-find-relying-ai-more-likely-make-your-code-buggier
Published: 11 Jan 2023

Source: sonarsource.com
Title: OWASP LLM Top 10: How it Applies to Code Generation
Link: https://www.sonarsource.com/resources/library/owasp-llm-code-generation/

Source: owasp.org
Title: OWASP Top 10 for Large Language Model Applications
Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Source: arxiv.org
Title: Asleep at the Keyboard? Assessing the Security of GitHub...
Link: https://arxiv.org/abs/2108.09293
Published: August 20, 2021

Source: techradar.com
Title: Nearly all security bosses are worried about AI safety
Link: https://www.techradar.com/pro/security/nearly-all-security-bosses-are-worried-about-ai-safety-with-a-third-saying-they-still-rely-on-manually-reviewing-code-before-launch

Source: owasp.org
Link: https://owasp.org/www-project-top-ten/

Source: blog.secureflag.com
Title: The risks of generative AI coding in software development
Link: https://blog.secureflag.com/2024/10/16/the-risks-of-generative-ai-coding-in-software-development/
Published: 16 Oct 2024

Source: techradar.com
Title: Why LLMs are plateauing
Link: https://www.techradar.com/pro/why-llms-are-plateauing-and-what-that-means-for-software-security

Source: veracode.com
Title: Securing Code in the Era of Agentic AI
Link: https://www.veracode.com/blog/securing-code-and-agentic-ai-risk/
Published: 12 Feb 2025

Source: arxiv.org
Title: Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
Link: https://arxiv.org/abs/2409.19182
Published: September 28, 2024

Source: arxiv.org
Title: Security of LLM-generated Code: A Comparative Analysis
Link: https://arxiv.org/abs/2605.23091
Published: May 21, 2026

Source: arxiv.org
Link: https://arxiv.org/abs/2605.05867

Source: techradar.com
Link: https://www.techradar.com/pro/nearly-half-of-all-code-generated-by-ai-found-to-contain-security-flaws-even-big-llms-affected

Source: checkmarx.com
Title: GitHub Copilot Security: Risks, Built-In Controls, and Best...
Link: https://checkmarx.com/learn/ai-security/top-5-github-copilot-security-risks-9-ways-to-mitigate-them/
Published: May 11, 2026

Source: veracode.com
Title: Insights from 2025 GenAI Code Security Report
Link: https://www.veracode.com/blog/genai-code-security-report/
Published: 30 Jul 2025

Source: owasp.org
Title: OWASP Foundation
Link: https://owasp.org/

Source: genai.owasp.org
Title: Gen AI Security Project: Home
Link: https://genai.owasp.org/

Source: genai.owasp.org
Title: LLM05: Supply Chain Vulnerabilities
Link: https://genai.owasp.org/llmrisk/llm05-supply-chain-vulnerabilities/

Source: genai.owasp.org
Title: LLM02: Insecure Output Handling
Link: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/

Source: cyber.fsi.stanford.edu
Link: https://cyber.fsi.stanford.edu/

Source: arxiv.org
Link: https://arxiv.org/html/2504.20612v1

Source: itpro.com
Link: https://www.itpro.com/software/development/ai-generated-code-is-fast-becoming-the-biggest-enterprise-security-risk-as-teams-struggle-with-the-illusion-of-correctness
