
The Copy-Paste Problem: How Employees Leak IP to Chatbots

Menlo Security logged 155K copy and 313K paste events monthly into AI tools. Samsung’s source code leak started with three employees. Your version is next.

In early 2023, three Samsung semiconductor engineers pasted proprietary source code, meeting notes, and hardware test data into ChatGPT, not as part of a coordinated insider threat but in the course of a normal workday. They were trying to work faster. Samsung discovered the incidents through its own monitoring, launched an internal investigation, and in May 2023 banned generative AI tools on company devices and networks.

The data was gone. OpenAI's terms of service at the time allowed that input to be used in model training. Samsung could not retrieve it, could not identify who else had seen it, and could not assess what competitive damage had been done.

This incident is memorable because it involved a global technology company and made the news. The behavior it represents — employees pasting sensitive work content into free AI tools — is not rare. It is the norm.

The Numbers Behind the Behavior

Menlo Security's 2025 report on enterprise AI usage found approximately 155,000 copy events and 313,000 paste events per month into AI tools across the enterprises it monitored. These are not isolated incidents. They represent continuous, high-volume data transfer from corporate environments into AI platforms.

The same report found that 68% of employees use free-tier AI tools for work tasks, and 57% admit to inputting sensitive data into those tools. The gap between "has an AI policy" and "complies with that AI policy" is measured in hundreds of thousands of monthly data transfer events.

What Gets Pasted

The data categories that flow most commonly from enterprise environments into AI tools:

Source code. Developers use AI coding assistants constantly. Pasting a function for debugging, asking for a code review, or requesting autocomplete for a complex implementation — all of these may involve proprietary code that represents significant competitive and security value. Code that contains credentials, API keys, or internal infrastructure details creates additional exposure.

Internal documents. Meeting notes, strategic plans, project proposals, and internal analyses get pasted for summarization, editing, or translation. These documents often contain information that is sensitive precisely because it is pre-decisional — before it has been cleaned up for external consumption.

Customer data. Sales representatives paste customer emails and contact details for CRM note drafting. Customer success teams paste support tickets and account history for response drafting. Any customer data that is subject to privacy regulation creates compliance exposure the moment it hits a free-tier AI tool.

Financial information. Finance teams paste budget spreadsheets, forecast models, and variance analyses for interpretation and formatting assistance. At a public company, this can be material non-public information.

Legal matters. Legal teams and business unit leaders paste contract drafts, negotiation notes, and regulatory correspondence for review and summarization. Disclosing privileged communications to a third-party AI service may waive attorney-client privilege.

Why Employees Keep Doing It

The behavior persists because the short-term benefit is immediate and visible: the task gets done faster. The risk is invisible and deferred: nothing obviously bad happens when you paste the document, the data leaves silently, and the consequences — if they materialize at all — appear weeks or months later in a breach notification or a legal dispute.

You cannot solve invisible, deferred risks with policies that employees must consciously choose to follow at every interaction. Organizations that have relied on policy alone for the past two years still have a copy-paste AI data leak problem. The solution requires a combination of technical controls and accessible alternatives.

What Actually Reduces the Risk

Provide a Good Approved Alternative

The single most effective intervention is giving employees an AI tool that meets their needs and handles their data appropriately. An enterprise Microsoft 365 Copilot license, or an approved Claude or OpenAI enterprise account backed by a data processing agreement (DPA), processes the same requests with contractual data protections. It also removes the primary reason employees use consumer-tier tools.

Employees paste data into free-tier tools because those tools exist and the approved alternatives either do not or are harder to access. Remove the friction from the approved option, and most shadow usage declines on its own.

Technical Controls at the Content Layer

Network-level blocking of AI tool domains is a blunt instrument that creates as many problems as it solves. Employees will route around it through personal devices, mobile networks, or VPNs.

Content-aware controls are more precise. A CASB (Cloud Access Security Broker) or DLP (Data Loss Prevention) solution configured to detect regulated data categories in AI tool submissions can block or alert on high-risk transfers while allowing routine use to proceed. This approach only works if data classification infrastructure is already in place, which is one more reason AI governance investments compound.
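To make the block-or-alert idea concrete, here is a minimal Python sketch of a content-aware check on text bound for an AI tool. Everything in it is an assumption for illustration: the category names, the regular expressions, the `BLOCK` policy set, and the `classify_submission` function are hypothetical and do not reflect any CASB or DLP vendor's API. Commercial products use much richer classifiers than a handful of patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: each maps a data category to a regex.
# These patterns are illustrative, not a complete classification scheme.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}

# Assumed policy: these categories block the submission outright;
# any other detected category only raises an alert.
BLOCK = {"aws_access_key", "private_key", "payment_card"}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Verdict:
    action: str       # "block", "alert", or "allow"
    categories: list  # names of the categories detected


def classify_submission(text: str) -> Verdict:
    """Decide what to do with text about to be sent to an AI tool."""
    found = []
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if name == "payment_card":
            # Drop digit runs that fail the checksum to cut false positives.
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            found.append(name)
    if any(c in BLOCK for c in found):
        return Verdict("block", found)
    if found:
        return Verdict("alert", found)
    return Verdict("allow", found)
```

Under these assumptions, a prompt containing something shaped like a cloud access key is blocked, a prompt containing only an email address raises an alert, and ordinary text passes through. Separating detection (`PATTERNS`) from policy (`BLOCK`) lets an organization tune which categories stop a submission without touching the detectors.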

Awareness Training That Explains the Actual Risk

Compliance checkbox training that says "do not paste sensitive data into AI tools" without explaining why does not change behavior at the point of the copy-paste decision. Effective training covers:

  • The specific categories of data that create legal exposure if they leave the organization
  • What "incorporated into training data" actually means in practice
  • The Samsung example as a real-world anchor
  • Clear instruction on what tools to use instead

This produces behavior change because it gives employees the information they need to make a different decision at the moment that matters.


If your organization is among the 98% with unsanctioned AI use and you want to get ahead of the problem before it becomes a headline, talk to JP Stratton.

