Field Experiment · N = 388 Microsoft × Gap Inc. arXiv:2604.08678

How people use AI matters more than whether they have it.

A field experiment with 388 Fortune 500 employees. Microsoft designed it. Their treatment condition — the one that won — was our AI Mindset curriculum. Here's what happened when it went head-to-head with standard Copilot training.

Read the full paper Jump to findings

Microsoft ran the experiment. We wrote the curriculum that won. This page shares what their researchers found — and what they concluded about the training we've been delivering to enterprises for years.

388

Gap Inc. employees across six functional areas

77%

of Mindset-trained employees hit a perfect score — vs. 62% for standard training

8×

more likely to fail to produce anything when forced into a rigid protocol

OR 2.07

odds ratio for top-quality output after a single Mindset session (p = 0.022)

01 · The Setup

Same tool.
Different scaffolding.

194 pairs, randomized. Everyone got Microsoft Copilot. Everyone tackled the same two real-world tasks. The only variable: the structure surrounding their AI use — one behavioral, one cognitive.

Task A · Paired · 30 minutes

The Strategy Task

Each pair produced a one-page "AI Adoption Action Plan" tailored to their organizational function.

The anti-generic constraint: every item in the plan had to include either a specific named noun — a system, a dataset, a stakeholder — or a quantitative metric. No vague consultancy filler permitted.

Task B · Individual

The Communications Task

Each participant drafted a strategic response addressing three distinct stakeholder concerns:

Data governance and transparency
Workforce transition and displacement
Sustainability of AI infrastructure

02 · The Findings

Two interventions.
Two opposite outcomes.

Structure that was meant to help collaboration actively broke it. A brief mental-model shift measurably lifted individual output quality. The gap between the two is the story.

Task A · The failure of mandates

Forcing a protocol made pairs worse, not better.

Pairs assigned a rigid "Create-Out-Loud" protocol scored nearly 5 points lower on quality — and were eight times more likely to produce no document at all.

The protocol required synchronous meetings, verbal discussion, then AI drafting from the transcript. It imposed real coordination costs without adding information the AI could actually use. Less than a quarter of treatment pairs even managed to follow it.

The pattern was uniform across every rubric dimension: opportunities, risks, action plan, insight.

Document Quality Score (out of 22)

Control (natural use)15.63

Treatment (mandated protocol)10.68

27%

Treatment pairs failing to submit anything

Control pairs failing to submit (Fisher p < .001)

454 words

Treatment average document length

740 words

Control average — longer, richer documents

Task B · The success of mindset

A single session of reframing lifted 15 more people to top quality.

77% of Mindset-trained participants hit a perfect score — compared to 62% of controls. Odds ratio of 2.07, significant at p = 0.022.

Partnership training didn't teach anyone a new Copilot feature. It shifted the mental model — from "search engine" to "thought partner." That reframing alone shifted the probability of producing top-tier work.

The effect held at the ≥18 threshold too (OR = 1.87, p = 0.049). A ceiling effect in the AI grader — 68% of documents scored perfectly — made the continuous model noisy, but the binary signal was clear.

Hit-Rate · Perfect Score (20/20)

Control (standard training)62%

Treatment (AI Mindset)77%

+15 pts

More people per session reaching top quality

OR 2.07

Odds ratio, CR2 95% CI [1.12, 3.83]

p = .022

Statistically significant, binary model

0 hours

Additional Copilot features introduced

Watch · the 60-second version

Microsoft's researchers, on what they found.

A short from Microsoft Research on what the study uncovered — including why they concluded the mindset intervention mattered more than the tools themselves.

The full experiment is live on Microsoft's research microsite, with interactive data and supplementary appendices.

Visit the microsite

YouTube Short Under 60 seconds

ACCESS the one-page explainer

03 · Collaboration patterns

Not all "teamwork"
is the same.

The research categorized every treatment pair by what they actually did. Three of the four modes performed no better than working alone. Only one actually worked. The rest were a social placebo.

01 · BASELINE

Natural Use

Pairs left to work however they preferred — the reference point for what "normal" AI adoption looks like.

Reference · 15.6 quality score

02 · INTENDED

True Joint

Partners actually shared a conversation and the prompt. They thought together, then asked together. The highest-reported experience scores.

Best outcome · 12.5 quality

03 · SOCIAL PLACEBO

Parallel Play

Partners met and talked — then prompted the AI individually. Looks like collaboration. Produces outputs indistinguishable from working alone.

9.8 quality · no gain

04 · FRICTION

Stranded

Tech failures. No-shows. Scheduling breakdowns. The collaboration the protocol mandated never happened. 37% of treatment pairs.

12.3 quality · forced friction

04 · The Framework

Scaffolding, layered.

The study mapped two approaches to a three-layer architecture for AI adoption. Each layer depends on the one below — and the layers only work in order. Skip a step and the layer above collapses.

LAYER 03 · PEAK

Behavioral Scaffolding

Mandated interaction protocols. The most ambitious layer — and the hardest to execute. Without the two layers below, it actively backfires. (This is what Task A tested.)

LAYER 02 · MIDDLE

Cognitive Scaffolding

Reframing AI as a thought partner. Low-cost, portable, measurable. Works when the foundation is solid. (This is what Task B tested — and it lifted quality.)

LAYER 01 · BASE

Mechanical Fluency

People can actually operate the tool. Without this, nothing above it functions. The prerequisite no one talks about because it feels too obvious — until it isn't.

05 · The argument

Practitioners saw it
coming.

The pattern the research surfaced echoes what the practitioners inside the study have been saying in the field for years. Mindset first. Mechanics later. Mandates last, if at all.

If you had to choose — an employee with an AI-first mindset navigating an analog process, or an employee with an analog mindset navigating an AI-first process — which would you pick?

Alexia Cambon

Director, Applied Research · Microsoft

When we stop treating AI like a tool and start treating it like a teammate, we unlock its real potential.

Conor Grennan

Founder · AI Mindset

Enable, optimize, reinvent. You enable the people, you optimize the work, and only then do you scale transformation.

Sven Gerjets

Chief Technology Officer · Gap Inc.

With AI Mindset training it was 77 [out of 100 clearing the bar]. That's 15 more people reaching top-quality work — from a single session.

Alex Farach

Senior Data Scientist · Microsoft

What Microsoft wrote about our curriculum

"The sole differentiator was behavioral."

Microsoft Research · Appendix 13 · Scaffolding Human–AI Collaboration

Microsoft's paper is explicit: treatment and control had identical Copilot access. No new features. No technical capabilities. The only thing that differed between the two groups was the training itself — ours versus theirs.

Their finding: participants trained on the AI Mindset curriculum were more than twice as likely to produce top-quality work, with an odds ratio of 2.07 (p = 0.022). The paper concludes that performance gains in AI use stem from how people engage with the system, not from what they know about its features.

In plainer language: the same tool, in the hands of differently-trained people, produces materially different results. That's the lift Microsoft measured. That's the curriculum.

06 · Inside the curriculum

What we actually
teach.

The treatment condition was our curriculum — the same AI Mindset training Conor Grennan has delivered to enterprises for years. No new Copilot features. No technical wizardry. Three behavioral components, all designed to shift how people work with AI.

Cognitive Reframing

Challenge the default assumption that AI is a search box. Reposition it as a collaborator that deserves the same engagement a human thought partner would — context, follow-ups, real-time correction.

Mental Model Replacement

Put the old model in direct contrast with the new: the "search engine" frame (one-shot, extractive) versus the "thought partner" frame (multi-turn, generative). The "smart intern" metaphor makes it actionable.

Guided Practice

Structured exercises in iterative, conversational prompting. The goal isn't to memorize syntax — it's to build fluency with dialogue itself. The rest follows naturally.

07 · The team

The people behind the study.

Microsoft Research

Alex FarachSenior Data Scientist
Corresponding author
Alexia CambonDirector, Applied Research
Lev TankelevitchSenior Researcher
Connie HsuehSenior Researcher
Rebecca JanssenSenior Applied Scientist

Gap Inc. Leadership

Sven GerjetsChief Technology Officer
Mario DiazSenior Manager
Future Skills

Training Curriculum

Conor GrennanFounder
AI Mindset

Ready to run this in your organization?

This is the curriculum
Microsoft tested.

You just read what happened when it ran alongside standard Copilot training. The AI Mindset enterprise program is what we deliver every week — to technology, finance, retail, and professional services teams rolling out AI at scale. The same framework, the same behavioral shift, the same measurable lift.

Explore Enterprise training Read the paper

arXiv:2604.08678 Experiment microsite Watch the short