What is Cowork?
Cowork is not a chatbot. It is an autonomous agent that lives inside the Claude Desktop app.
Where regular Claude chat waits for you to type a message after every step, Cowork takes a goal and works through it independently. It plans tasks, reads your files, writes documents, runs scripts, and delivers finished work to your folders.
It connects to the tools you already use: Google Drive, Gmail, Notion, Calendar, Slack. It can read your spreadsheets, draft your emails, and build reports.
Think of it as a junior analyst who can do the legwork but needs your judgement on every decision that matters.
For compensation teams, that means less time on the repetitive mechanics (drafting cycle comms, summarising market data, building calibration packs) and more time on the work that actually requires a senior practitioner's brain.
Chat vs Cowork
| | Chat | Cowork |
|---|---|---|
| How it works | You ask, it answers. Each step needs a new prompt. | Describe the outcome. Claude plans and executes autonomously. |
| Files | Max 20 files per conversation, 30 MB each. Everything uploads to the cloud. | Reads directly from your computer. No file limit, no size cap. |
| Context | Smaller context window. Starts compacting your conversation and losing important details sooner. | Larger context window. Holds complex, multi-step work without dropping what matters. |
| Output | Text in the chat window. You copy, paste, and format it yourself. | Ready-to-use files delivered directly to your folders (.docx, .xlsx, .pdf). |
| Code | Cannot execute code | Runs Python scripts in a sandboxed VM on your machine |
| Memory | Projects offer pinned files and basic memory across conversations. | File-based memory that Claude reads and writes to. Compounds every time you correct it. |
| Scheduling | Not available | Automated tasks on a daily, weekly, or custom schedule |
When to use which
Use Chat for quick questions, brainstorming, and one-off writing tasks where you will act on the answer yourself.
Use Cowork the moment you need to work with real files, hold a long analytical thread together, or get a finished deliverable back. If you find yourself copying answers out of Chat and formatting them in Excel, that is a sign you should be using Cowork.
Getting Started (5 minutes)
- Open Claude Desktop and select Cowork from the interface
- Go to Settings and connect your tools (Google Drive, Gmail, Calendar, Notion, whatever you use day to day)
- Create your first Project (see the next section for why this matters)
- Upload your key documents: comp philosophy, salary bands, merit guidelines, cycle timeline, manager FAQ
That is it. You are set up.
Projects vs Skills: The Setup That Changes Everything
This is where most people get Cowork wrong. They open it, type a question, get a decent answer, and move on. Then next week they open it again and it has forgotten everything. That is because they skipped the setup that makes Cowork genuinely useful.
Projects:
✓ Custom instructions
✓ Persistent chat history
✓ One per workstream

Skills:
✓ Activate automatically
✓ Work across all projects
✓ Write once, use forever
Projects = What to know
A Project is a dedicated workspace where everything about a topic lives together. Upload your documents, write custom instructions, and every conversation inside that Project has that context automatically. Chat history persists across sessions. When your uploaded documents exceed what Claude can hold in memory, it automatically searches and retrieves only the relevant parts.
For comp: Create a Project called "Q2 2026 Merit Cycle." Upload your comp philosophy, salary bands, merit matrix, cycle timeline, and manager FAQ. Every time you open that Project, Claude already knows your framework, your rules, your timeline. No re-explaining.
File types you can upload: PDF, DOCX, CSV, TXT, HTML, XLSX, JSON, and more. Up to 30 MB per file, unlimited files per Project.
Skills = How to do things
A Skill is a set of reusable instructions that tells Claude how to handle a specific task the same way every time. Write it once, and it activates automatically whenever the task comes up, across any Project, any conversation.
For comp: Create a Skill called "Comp Cycle Email Drafter" with instructions like: use this tone, follow this structure, reference the guidelines in the Project, keep emails under 200 words, always include key dates and the approval workflow link. Now every time you ask Claude to draft a cycle email, it follows your playbook without being reminded.
How to create a Skill
The easiest way: type /skill-creator in Cowork. Claude will interview you about what the skill should do, draft it, generate test prompts, and iterate until it works. No coding needed.
Or create one manually: make a folder at ~/.claude/skills/your-skill-name/ with a SKILL.md file inside. The file contains a description (when to activate) and step-by-step instructions (what to do).
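The manual route can also be scripted. Here is a minimal sketch in Python; the skill name, frontmatter fields, and instruction text are illustrative examples following the description-plus-instructions pattern described above, not a canonical format:

```python
from pathlib import Path

# Illustrative skill name and instructions; adapt to your own workflow.
skill_dir = Path.home() / ".claude" / "skills" / "comp-cycle-email-drafter"
skill_dir.mkdir(parents=True, exist_ok=True)

(skill_dir / "SKILL.md").write_text("""\
---
name: comp-cycle-email-drafter
description: Drafts compensation cycle emails. Use when the user asks for cycle comms.
---

1. Match the tone and structure defined in the Project instructions.
2. Keep emails under 200 words.
3. Always include key dates and the approval workflow link.
""")
```

Once the folder and file exist, the skill activates whenever a matching task comes up.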
The combination is the real unlock
Your Project holds the context (what is our comp philosophy, what are the salary bands, when are the deadlines). Your Skill holds the procedure (how to draft the email, how to format the calibration pack, how to structure the manager guide). Together, Claude produces work that is consistent, context-aware, and aligned with how your team actually operates.
This is what separates a well-set-up Cowork from a Custom GPT. A Custom GPT gives you fixed instructions and limited memory. A Project + Skill setup gives you a living knowledge base paired with repeatable procedures that improve as you correct them.
Persistent Memory: The System That Makes It Compound
This is the section most guides skip entirely, and it is the single biggest reason why some people get transformative results from Claude while others keep re-explaining themselves every session.
Compensation work is cumulative. The decisions you made about salary bands in January inform the merit cycle in March, which feeds the pay equity review in June. If your AI tool forgets everything between sessions, you are re-teaching it every time. That defeats the purpose.
How I set up persistent memory (my approach)
What follows is how I personally manage persistent memory across my projects. Others may do it differently and I am open to suggestions. But this approach has worked well across consulting engagements, community work, and tool building, and it is the pattern I would recommend as a starting point.
The system has three files working together, each at a different scope:
File 1: Global CLAUDE.md — how Claude works with you
This file lives at ~/.claude/CLAUDE.md and applies across ALL your projects and conversations. It defines how Claude works with you, not what it works on.
What goes here:
- Workflow rules: plan before building, verify before marking done, stop and re-plan if something goes sideways
- Self-improvement rules: after any correction, update lessons.md with the pattern
- Communication preferences: explain things simply, no jargon, show working demos
- Task management patterns: write plans to todo.md, track progress, capture lessons
- Core principles: simplicity first, find root causes, minimal impact
Think of this as your operating system. Anthropic explains how CLAUDE.md and memory work in their official documentation. We have also created a starter template adapted for compensation professionals that you can download and customise.
This file should NOT contain salary band data, comp philosophy, or anything specific to a project. It is purely about how Claude behaves.
File 2: Project-scoped memory — what Claude knows about this workstream
Each project gets its own set of memory files containing the context specific to that workstream. This is where your comp-specific knowledge lives:
- Your compensation philosophy and principles
- Naming conventions (how you refer to salary bands, job levels, pay ranges)
- Rules that should never be broken ("always show compa-ratios to two decimal places", "never include contractors in the merit pool")
- Data structure details (which columns mean what, which fields to ignore)
- Preferred output formats (UK spelling, specific table layouts, chart styles)
- Feedback you have given Claude that should persist: voice rules, design preferences, workflow discipline
I organise memory files by topic, not chronologically. For example, my Range community project has separate memory files for content and copy rules, design and visual production, workflow discipline, and Notion/API patterns. Each file has a name, a description, and the actual rules. A central index file (MEMORY.md) lists them all with one-line summaries so Claude knows where to look.
This is also where auto-memory lives. When you correct Claude during a conversation ("we use the smoothed P50, not the raw P50"), it saves that correction as a persistent note and applies it in future sessions without being asked. Your corrections compound over time.
File 3: Lessons learned (tasks/lessons.md) — the self-improvement loop
This is the file that most people do not know about, and it is the most powerful one for compensation work.
Create a tasks/lessons.md file in your project. Every time something goes wrong or you discover a better approach, add it to the file. Claude reads this at the start of every session alongside your other instructions.
What goes in lessons.md:
- Mistakes Claude made that you corrected: "The regression output showed a 12% gap but used unadjusted data. Always use the adjusted model first."
- Workflow discoveries: "Running pay equity by country first, then globally, produces more actionable results than the reverse."
- Data quirks specific to your organisation: "The APAC data uses calendar year tenure, not anniversary date. Adjust before comparing to EMEA."
- Process rules learned the hard way: "Always validate the merit matrix totals against the budget before generating manager packs. Last cycle the matrix was 0.3% over budget and it cascaded into 40 wrong recommendations."
Why this matters for comp specifically:
Compensation work is full of institutional knowledge that lives in people's heads. The reason your senior analyst knows not to include the Hong Kong office in the APAC pay equity cut is because three years ago it produced misleading results due to the statutory MPF structure. That knowledge either dies when they leave or lives in a lessons file that Claude reads every session.
This is how you build an AI assistant that gets smarter with every cycle, not one that starts from zero each time.
Real example from practice:
A compensation consultant working across multiple clients uses a separate project per client, each with its own memory files and lessons file. Client A's lessons file says "always use the Mercer Global Grade structure, not the local IPE levels." Client B's says "this company's job architecture has dual-track career paths above Grade 8, always show both management and IC bands." Client C's says "the CFO wants scenario models in exactly this format: three columns, base/mid/stretch, with headcount impact." None of this is generic knowledge. All of it makes Claude dramatically more useful from session to session.
Why three files, not one
The mistake most people make is putting everything in one place. Your global CLAUDE.md should not contain salary band data. Your merit cycle project should not contain workflow orchestration rules. Separating the three layers means:
- Global CLAUDE.md stays clean and applies everywhere
- Project memory keeps each workstream's context isolated (what you learn in the merit cycle project does not bleed into job architecture)
- Lessons.md captures what went wrong so it never happens again
This is my approach. If you find a better one, I would genuinely like to hear about it.
The end-of-session habit (this is what makes it work)
The memory system only compounds if you actually update it. The good news: you do not need to write the updates yourself. Claude does it for you. You just need to ask.
At the end of every session, before you close the window, say something like: "Review this session and update your memory files with anything worth remembering."

That is it. One sentence. Claude reviews the session, identifies what is worth remembering, and writes it to the right files. Next session, it reads those files and applies everything automatically.
Some examples of what Claude captures when you ask:
- "The smoothed P50 is the default, not the raw P50." (a correction you made once, never need to make again)
- "Always validate merit matrix totals against the budget before generating manager packs." (a process discovery from a near-miss)
- "The APAC data uses calendar year tenure. Adjust before comparing to EMEA." (a data quirk that would have caused a wrong analysis)
- "Load voice DNA and brand rules before drafting any external communication." (a workflow rule learned after a draft came back too generic)
None of these require you to write anything manually. You correct Claude during the session, and at the end you ask it to save what it learned. The files update themselves.
A note on AGENTS.md
If you already use other AI coding tools (Cursor, GitHub Copilot, or similar), you may have an AGENTS.md file in your project. Claude Code reads CLAUDE.md, not AGENTS.md. But you can import your existing AGENTS.md into your CLAUDE.md so both tools read the same instructions without duplicating them. Just add @AGENTS.md at the top of your CLAUDE.md file.
How memory scoping works
Memory is scoped to the Project. What Claude learns in your merit cycle Project does not carry over to your job architecture Project. This is a feature, not a limitation. It keeps contexts clean and prevents cross-contamination.
But it means you should think about your Project structure upfront:
- One Project per comp cycle (Q2 2026 Merit Cycle) — keeps everything about that cycle together, lessons included
- One Project per ongoing workstream (Job Architecture, Pay Equity, Benchmarking) — for work that spans cycles
- One Project per client (if you are a consultant) — completely separate contexts, separate lessons, separate preferences
Persistent memory for job matching
Job matching is one of the best examples of why persistent memory matters. When you match roles to survey codes, you build up a body of knowledge about how your organisation's roles map to external benchmarks. Without memory, you re-explain these mappings every time.
With a properly set up Project:
- Upload your job architecture document with all role definitions
- Upload previous matching decisions (which roles mapped to which survey codes and why)
- Add lessons as you go: "Our Software Engineer III includes team lead responsibilities, so match it one level higher than the default suggestion"
- Claude remembers these decisions and applies them consistently in future sessions
Over time, your job matching Project becomes a living knowledge base of every matching decision you have ever made, with rationale. That is something no spreadsheet can do.
Case Study: What we learned building a job matching tool
We built a tool that matches roles across multiple compensation survey providers. The experience taught us something important about how AI handles job matching, and it directly affects how you should use Cowork for this workflow.
Pure semantic matching is not enough. We started with sentence embeddings (vector representations of job descriptions) and cosine similarity to find matches. This works well for capturing nuances of meaning — it understands that "engineer" and "technical specialist" can refer to similar roles even when the words are different.
But for compensation survey matching, meaning alone is not sufficient. Structure matters too. A Finance Business Partner and a Compensation Business Partner might have semantically similar descriptions (both "partner with senior leadership on strategic decisions") but belong in completely different job families. A Technology role in a bank and a Technology role in a software company might have identical descriptions but sit in different survey cuts.
The approach that worked was a weighted hybrid:
- Description similarity (45% weight) — semantic embeddings capture the meaning of what the role does
- Function matching (40% weight) — fuzzy matching on the job family or function name prevents cross-family false matches
- Title similarity (15% weight) — catches cases where descriptions are vague but titles are specific
- Industry filtering — only matches roles that share at least one industry classification, preventing cross-industry false matches
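The weighted hybrid above can be sketched in a few lines. This is a simplified illustration, not our production code: `desc_similarity` stands in for an embedding cosine similarity computed elsewhere, and `difflib` is a cheap stand-in for a proper fuzzy matcher:

```python
from difflib import SequenceMatcher

def fuzzy(a: str, b: str) -> float:
    """Cheap stand-in for fuzzy string matching (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def hybrid_score(candidate: dict, target: dict, desc_similarity: float) -> float:
    """Weighted hybrid match score; desc_similarity is the embedding
    cosine similarity, assumed to be computed elsewhere."""
    # Industry filter: no shared classification means no match at all.
    if not set(candidate["industries"]) & set(target["industries"]):
        return 0.0
    return (0.45 * desc_similarity          # description similarity
            + 0.40 * fuzzy(candidate["function"], target["function"])  # function match
            + 0.15 * fuzzy(candidate["title"], target["title"]))       # title similarity
```

Keeping the three component scores separate, rather than collapsing them immediately, is what makes the confidence breakdown discussed below possible.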
What this means for your Cowork setup: When you ask Claude to help with job matching, do not just paste a job description and ask for the closest match. Give it context: the job family, the level, the industry, the survey you are matching against. The more structural context you provide, the better the match quality. And save every matching decision in your lessons file so Claude applies the same logic next time.
Confidence scores are more useful than binary matches. Our tool breaks every match into component scores (title, description, function) so you can see why it suggested a particular match. When using Cowork for matching, ask it to explain its confidence and show which aspects of the role drove the suggestion. This makes review dramatically faster than checking every match blind.
Bidirectional matching matters. Matching from one survey provider to another produces different results than matching in the opposite direction, because survey coverage is asymmetric. When you run matches in Cowork, always check both directions.
The Maths Problem: Why LLMs Get Numbers Wrong (And What To Do About It)
This is the concern every compensation practitioner raises, and they are right to raise it.
In chat: Claude guesses. The number looks right. It might not be.
With a script: real maths. Auditable. Same input, same output, every time.
The core issue
Large language models predict the next word in a sequence. They do not compute. When you ask Claude to calculate a compa-ratio in chat, it is making a statistical guess about what the right number looks like based on patterns in its training data. It is not running arithmetic.
For simple calculations (what is 85,000 divided by 90,000?) it is usually right. For anything involving multiple variables, regressions, weighted averages across datasets, or large volumes of data, it is unreliable. The model might give you a plausible-looking number that is subtly wrong, and you would have no way to know without checking it manually.
This matters enormously for compensation work because:
- Pay equity analysis requires regression models controlling for multiple variables (job level, tenure, performance, location). Getting the coefficients wrong changes the conclusion.
- Merit budget modelling involves multiplying employee-level calculations across hundreds or thousands of records. A rounding error in the formula compounds across the population.
- Compa-ratio analysis needs precise division and comparison across pay ranges that may have different structures by level or geography.
- Scenario modelling for executives requires exact maths because the numbers get presented to boards and compensation committees. A wrong number in a board pack is a career-defining mistake.
The solve
Do not let the model do the maths. Let the model write the code that does the maths.
When you describe an analysis to Cowork ("calculate compa-ratios for all employees against the midpoint of their grade, flag anyone below 0.80 or above 1.20"), it writes a Python script and runs it inside a sandboxed virtual machine on your computer. The script uses proper mathematical libraries:
- pandas for data manipulation and aggregation
- numpy for numerical computation
- scipy for statistical analysis
- statsmodels for regression models (critical for pay equity)
Same input, same output, every time. Auditable. Repeatable. Deterministic.
This is the difference between asking a colleague to estimate a number from memory and asking them to build a spreadsheet formula. The formula is reliable. The estimate is a guess.
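To make this concrete, here is a minimal sketch of the kind of script Cowork might generate for the compa-ratio request above. The column names and demo rows are assumptions for illustration, not real data:

```python
import pandas as pd

def flag_compa_ratios(df: pd.DataFrame, low: float = 0.80, high: float = 1.20) -> pd.DataFrame:
    """Compute compa-ratios against grade midpoints and flag outliers."""
    out = df.copy()
    out["CompaRatio"] = (out["BaseSalary"] / out["GradeMidpoint"]).round(2)
    out["Flag"] = ""
    out.loc[out["CompaRatio"] < low, "Flag"] = "below range"
    out.loc[out["CompaRatio"] > high, "Flag"] = "above range"
    return out

# Hypothetical demo rows, not real employee data
demo = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003"],
    "BaseSalary": [70000, 95000, 120000],
    "GradeMidpoint": [90000, 90000, 90000],
})
print(flag_compa_ratios(demo))
```

Every number in the output comes from arithmetic, not prediction: rerun the script on the same file and you get the same flags.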
Practical rules
For anything involving numbers that matter:
- Always tell Cowork to write and run a script. Say: "Write a Python script to do this calculation, then run it."
- Never accept a number Claude calculated in conversation. If it says "the compa-ratio is 0.94," ask it to show you the script that produced that number.
- Review the methodology, not just the output. The model might use the wrong regression specification or the wrong percentile calculation. Your domain expertise is what catches this.
- Ask Claude to add logging to the script so you have an audit trail of what was calculated, when, and from what data.
What this looks like in practice
You: "I have a CSV with EmployeeID, Gender, BaseSalary, JobLevel, Department, HireDate, PerformanceRating. I need the adjusted gender pay gap controlling for job level, tenure, and performance. The EU Pay Transparency Directive requires the adjusted gap to be below 5%."
Claude writes a complete Python script: data loading, validation, OLS regression with log-transformed salary, confidence intervals, compliance check. Saves it as a file in your Project folder.
You: "Run it against my data."
The script executes locally. Your data stays on your machine. The output: adjusted gaps, p-values, confidence intervals, compliance scoring. All auditable. All reproducible.
Total employee data exposed to the AI: zero.
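To show the shape of such a script, here is a heavily simplified sketch on fully synthetic data. The column names follow the example above, but the tenure column, the exact model specification, and the built-in 5% gap are assumptions; a real analysis would also control for performance and location and include diagnostics:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fully synthetic data with a built-in 5% adjusted gap; no real records.
rows = []
for gender in ("F", "M"):
    for level in (1, 2, 3):
        for tenure in (1, 5, 10):
            log_pay = 10.0 + 0.2 * level + 0.01 * tenure
            if gender == "F":
                log_pay += np.log(0.95)  # women paid exactly 5% less, all else equal
            rows.append({"Gender": gender, "JobLevel": level,
                         "TenureYears": tenure, "BaseSalary": np.exp(log_pay)})
df = pd.DataFrame(rows)

# OLS on log-transformed salary, controlling for level and tenure
model = smf.ols("np.log(BaseSalary) ~ C(Gender) + JobLevel + TenureYears", data=df).fit()
male_premium = model.params["C(Gender)[T.M]"]   # log-pay difference, M vs F baseline
adjusted_gap = 1 - np.exp(-male_premium)        # gap for women relative to men
print(f"Adjusted gender pay gap: {adjusted_gap:.1%}")  # prints "Adjusted gender pay gap: 5.0%"
```

Because the script is deterministic, the regression recovers the built-in 5% gap exactly, which is also how you validate the pipeline before pointing it at real data.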
Code, Not Data: Keeping Employee Information Safe
The Code Not Data methodology was originally developed by Ivan Nosov and tested against synthetic data for EU Pay Transparency Directive compliance analysis. This section builds on his work.
This is the section that matters most for compensation teams. Your data is sensitive. Salaries, performance ratings, demographic breakdowns, disability status. This is personal data under GDPR — it enables you to identify an individual (Art. 4(1)). That means the full weight of GDPR applies: lawful basis for processing (Art. 6), purpose limitation (Art. 5(1)(b)), integrity and confidentiality (Art. 5(1)(f)), and cross-border transfer rules (Chapter V).
Worth noting: most organisations already process personal data in the cloud via services like M365 Exchange Online, covered by formal Data Processing Agreements. The difference with consumer LLM APIs is that those DPAs typically do not exist, which is exactly why the local-processing approach in this guide matters.
That data should not be uploaded to external AI services without proper governance.
The principle
Ask the AI to write the code. Run the code yourself, on your machine, against your data. The LLM never sees a single employee record.
This is the separation: the AI provides the analytical capability (statistical models, visualisation logic, data validation routines). You provide the data. The two never meet inside the AI's context window.
How this works in Cowork
Cowork runs a sandboxed Linux virtual machine locally on your computer. When it writes a Python script and executes it, that execution happens on your machine, not in the cloud. Your files mount into the VM locally. The AI reasoning happens on Anthropic's servers, but the code execution and file operations happen on your hardware.
Network access from the VM is restricted to a strict allowlist (Python package repositories and Anthropic's API only). It cannot make arbitrary web requests or send your data anywhere.
What the AI sees vs what stays local
| What the AI sees | What stays on your machine |
|---|---|
| Code (Python scripts, SQL queries) | Employee records |
| Column names and data types | Actual values |
| Aggregated output (means, medians, percentages) | Individual-level data |
| Error messages and debugging context | Raw CSV/Excel files |
| Your analytical questions | PII, salaries, ratings |
The AI operates as a code generator, not a data processor. The separation is architectural, not incidental.
Where the boundary leaks (and how to manage it)
The separation is strong but not airtight. Three things still travel to the AI provider, and you should be aware of each:
Column names and schema descriptions. When you tell the AI "my data has columns Gender, Ethnicity, DisabilityStatus, BaseSalary," those category names go into the prompt. They are metadata, not personal data, but they reveal what sensitive categories your organisation holds. For most purposes this is acceptable. For highly regulated environments, use generic column names in the conversation ("column A, column B") and map them locally.
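The local mapping can be as simple as a rename table kept on your machine. A minimal sketch, where the mapping itself is a hypothetical example:

```python
import pandas as pd

# Hypothetical local mapping; only the generic names ever appear in prompts.
COLUMN_MAP = {
    "Gender": "col_a",
    "Ethnicity": "col_b",
    "DisabilityStatus": "col_c",
    "BaseSalary": "col_d",
}

def anonymise_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Rename sensitive column names to generic ones before discussing the schema."""
    return df.rename(columns=COLUMN_MAP)

def restore_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Map the generic names back after the AI-written code has run."""
    return df.rename(columns={v: k for k, v in COLUMN_MAP.items()})
```

You discuss "col_a" and "col_d" with the AI, run its code against the renamed frame, and restore the real names locally.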
Error messages. When code fails and you paste the traceback, error output often includes sample data values, file paths, or partial records. Discipline matters here: check error messages before sharing them with the AI. Strip any lines that contain actual data values.
Small-group aggregations. "Why does the output show a 23% gap for the 3 female directors in Market X?" That aggregation, combined with the context, could be re-identifiable. Keep group-level discussions above a minimum threshold (typically n=10 or higher) when interacting with the AI.
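The minimum-threshold rule can be enforced in the script itself, so small cells never reach the output you share. A sketch of one way to do it:

```python
import pandas as pd

def safe_group_stats(df: pd.DataFrame, by: str, value: str, min_n: int = 10) -> pd.DataFrame:
    """Aggregate by group, suppressing any group smaller than min_n before sharing."""
    stats = df.groupby(by)[value].agg(n="count", median="median").reset_index()
    large_enough = stats["n"] >= min_n
    # Blank out re-identifiable cells; keep the group label so the gap is visible.
    stats["median"] = stats["median"].where(large_enough)
    stats["n"] = stats["n"].where(large_enough)
    return stats
```

A suppressed row still appears in the table, so you know a group was hidden, but its statistics never enter the conversation.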
Start with synthetic data
Before pointing any script at real employee data, build and test with synthetic data first. Ask Claude to generate a fake dataset with realistic structure — job levels, salary ranges, departments, performance ratings — but no real people. Iterate on the code until it does what you need, then swap in the real file path.
One caution: synthetic means generated from scratch, not real data with names changed. If you take actual salary distributions and swap identifiers, the underlying patterns could still be identifying in small populations.
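A generated-from-scratch dataset is a few lines of code. This sketch uses an assumed toy salary structure purely for illustration; everything in it is invented, not derived from real records:

```python
import random

random.seed(42)  # reproducible fake data

def synthetic_employees(n: int = 200) -> list[dict]:
    """Generate a fully fabricated dataset: realistic structure, no real people."""
    departments = ["Engineering", "Finance", "Sales", "People"]
    rows = []
    for i in range(n):
        level = random.randint(1, 6)
        midpoint = 40000 + level * 15000  # assumed toy salary structure
        rows.append({
            "EmployeeID": f"E{i:04d}",
            "Gender": random.choice(["F", "M"]),
            "Department": random.choice(departments),
            "JobLevel": level,
            "BaseSalary": round(midpoint * random.uniform(0.80, 1.20), 2),
            "PerformanceRating": random.randint(1, 5),
        })
    return rows
```

Iterate on your analysis scripts against this output until they behave, then swap in the real file path.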
Practical safeguards
- File access controls: In Claude Code (the more advanced terminal-based tool), you can configure deny rules in settings to block the AI from reading data files entirely. The AI writes the code, you review it, you run it. The AI never has access to the data directory at all.
- Logging for audit trails: Ask the AI to add logging to the generated scripts, outputting to a local log file. A record of what the script ran, when, and what it processed is useful if anyone asks how the analysis was produced. Make sure the log captures actions and summaries, not the data itself.
- Local models for maximum isolation: For scenarios where even metadata exposure is unacceptable, run a local LLM with zero network traffic for anything that references your actual data structure.
A note on EU AI Act classification
If you are using this method for pay equity or compensation analytics, be aware: AI systems used in employment-related decisions fall under high-risk classification in Annex III of the EU AI Act (full application August 2, 2026). Even if the AI only generated the code and never processed employee data, the fact that AI-generated code drives employment-related analysis should be documented in your organisation's AI Act compliance posture. This includes maintaining records of what the AI produced, what a human reviewed, and what decisions the analysis informed.
This is also the line between AI-assisted development and what Andrej Karpathy termed "vibe coding" — accepting AI output without reviewing it. For regulated analytics, you review the code, you understand the statistical method, you validate the output. The AI accelerates execution. The judgement stays with you.
Comp Workflows in Cowork
Here are practical use cases where Cowork connects to your existing tools and delivers real value:
Cycle Communications
Upload your comp cycle guidelines and timeline into a Project. Ask Cowork to draft pre-cycle kickoff emails for managers, mid-cycle reminders for HRBPs, and post-cycle summaries for leadership. It reads your docs, follows your tone, and drafts the full set. You review and send.
Create a Skill for this workflow so every cycle follows the same structure and tone without re-explaining.
Manager FAQ
Upload your comp guidelines, merit matrix, and FAQ document into a Project. Configure it with an instruction: "When asked questions, answer only from these documents. If the answer is not in the documents, say so." This alone can save hours during every cycle by handling the same 20 questions managers always ask.
Calibration Prep
Give Cowork your anonymised summary data (not individual records) and ask it to generate discussion guides, flag outlier recommendations, and draft calibration talking points. It can produce a pack per department in minutes. You own the interpretation. Cowork does the formatting.
Benchmarking Summaries
Export market data from your survey providers. Ask Cowork to identify roles where your ranges sit below P25 or above P75, calculate gaps, and draft a summary with recommended actions. The analysis should run as a script (see The Maths Problem section), not as a chat calculation.
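The P25/P75 comparison is the kind of script that request would produce. A minimal sketch, where the column names are assumptions about your survey export:

```python
import pandas as pd

def flag_market_position(df: pd.DataFrame) -> pd.DataFrame:
    """Flag roles whose range midpoint sits below market P25 or above P75."""
    out = df.copy()
    out["GapToP50"] = out["OurMidpoint"] - out["MarketP50"]
    out["Position"] = "within range"
    out.loc[out["OurMidpoint"] < out["MarketP25"], "Position"] = "below P25"
    out.loc[out["OurMidpoint"] > out["MarketP75"], "Position"] = "above P75"
    return out
```

Cowork would then draft the narrative summary from this table; the numbers themselves come from the script, not the model.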
Job Matching
Paste a job description and ask Cowork to suggest matching survey codes across your providers. It will not be perfect, but it gives you a starting point that is faster than matching manually from scratch. Always review the matches yourself. Over time, with persistent memory, your Project accumulates every matching decision you have made, building a reusable knowledge base.
Setup Checklist
- Download Claude Desktop
- Connect your tools: Google Drive, Gmail, Calendar, Notion
- Create a Project for your current comp workstream
- Upload: comp philosophy, salary bands, merit guidelines, FAQ doc, cycle timeline
- Write a CLAUDE.md instruction file with your preferences, rules, and naming conventions
- Create a tasks/lessons.md file to capture corrections and discoveries as you go
- Create your first Skill for a repeatable task (cycle comms, calibration packs)
- For any analysis involving numbers: always tell Cowork to write and run a script
- Start with synthetic data before using real employee records
- For maximum data isolation: use Claude Code with explicit file access deny rules
One Last Thing
The barrier to using AI in compensation is not technical skill. It is knowing where to start and knowing how to keep your data safe while you do it.
This guide is that starting point. The judgement, the interpretation, the business context — that is yours. That is what makes you a compensation professional. The AI handles the mechanics so you can spend more time on the work that actually needs you.