Anthropic just dropped its Claude Code workshop tapes. The playbook is better than the marketing.
Boris Cherny on Claude Code, Applied AI on prompting, Erik Schluntz on vibe coding in prod. Three Code with Claude tapes hit YouTube ahead of the 2026 conference.
Anthropic has three Code with Claude SF 2025 workshop tapes back in circulation on YouTube. Boris Cherny’s 30-minute Claude Code masterclass and the Applied AI team’s 24-minute Prompting 101 went up last week. Erik Schluntz’s 31-minute Vibe Coding in Prod, Responsibly is getting cited alongside them. Code with Claude 2026 SF kicks off May 6.
The three videos run roughly 85 minutes together. They contain more concrete advice for shipping with Claude than the launch keynote that drew the headlines on May 22, 2025, the day Anthropic announced Claude Opus 4, Claude Sonnet 4, and the Claude GitHub app. One walks through Claude Code as a power tool. One frames prompt engineering as empirical iteration. The third asks the harder question: how do you let an agent write 22,000 lines of production code and stay sane?
Why these tapes, why right now
Anthropic filmed all three sessions at the original Code with Claude SF on May 22, 2025. Boris’s and the Applied AI team’s recordings sat behind the event’s livestream archive until late April 2026, when Anthropic re-cut and uploaded them as standalone videos. Glen Rhodes picked up on the prompting talk on April 28, calling out 40 distinct techniques in 24 minutes. Community channels uploaded bilingual cuts of the Claude Code video within days. Erik Schluntz’s vibe coding talk has had a longer half-life: he tweeted that it was live in August, and it’s been re-circulated this week as the third leg of the same canon.
The timing is not random. Code with Claude 2026 SF lands May 6, with London on May 19 and Tokyo on June 10. The agenda is already public: Cat Wu on the State of Claude Code, Dickson Tsai on what’s new in Claude Code, Daisy Hollman running a Claude Code best-practices workshop, Sid Bidasaria on rearchitecting workflows, and Boris Cherny on the opening keynote panel. The 2025 tapes work as the canonical primer for what the 2026 sessions will build on. If you want to follow along on May 6 and not feel lost, these are the prerequisites.
There’s a second reason the timing matters. Boris was a Member of Technical Staff when he gave the talk; he’s now Head of Claude Code, with a recent Lenny Rachitsky interview where he claims 100% of his own code has been written by AI since November 2025 and that 4% of all public GitHub commits are now authored by Claude Code. That makes the 2025 tape a snapshot of how the workflow looked when the team had just shipped what would become the default agentic coding tool. The advice in it has aged well in part because the team kept doubling down on the same primitives.
Boris’s first rule: start with codebase Q&A
The opening prescription in the 30-minute talk is the bit that got the loudest applause at the venue. Anthropic’s technical onboarding used to take two to three weeks. With Claude Code, Boris said, it’s now two to three days. The hands-on-keyboard advice he gives every new hire on day one is not “edit code.” It’s “ask the codebase questions.”
The mechanics are deliberate. Claude Code does not build a remote index of your repo. There is no upload, no embedding, no waiting on first run. The model uses the same tools you do: file search, file read, bash. It walks the source on demand and synthesises an answer. That keeps your code on your machine, but it also collapses the setup tax to zero.
Boris’s actual prompts on the slide are not glamorous. “How is this class instantiated in tests?” “Why does this function have 15 arguments? Look through git history.” “What did I ship this week? My git username is X.” The last one is part of his Monday standup workflow. He runs it, copies the output into a doc, and ships the standup. The git-history archaeology one is the prompt that converts Claude Code from a fancy autocomplete into a junior researcher who never tires.
Two configuration moves carry most of the day-one wins. Run /terminal-setup so shift-enter inserts a newline; run /install-github-app so you can mention @claude on any GitHub issue or pull request and have the agent triage or open a PR; customise allowed tools so you stop being prompted on the bash commands you actually trust. macOS users can hit the dictation hotkey twice and just talk at the prompt; Boris dictates most of his own prompts this way.
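The allowed-tools customisation lives in Claude Code’s settings file. A minimal sketch of a project-scoped `.claude/settings.json` in the spirit of the documented `permissions` schema; the specific commands below are illustrative, not from the talk:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(npm run lint)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
```

Commands matching an `allow` rule run without a confirmation prompt; everything else still asks.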
“We’re not system-prompting it to do this. It just knows how to do this. The model is good.”
Boris Cherny, on why Claude Code knows to read git history without being told
The understated framing matters. Most of the talk’s advice is some version of trust the model, then give it the right scaffolding. The scaffolding gets a lot of slides. The trust gets one line, repeated four times.
The four-step Claude Code escalation ladder
Boris’s mental model for the workflow is a ladder. Each rung adds capability, and you climb it in order.
Rung one is Q&A. That’s the onboarding move above. New users do not start by editing code. They start by asking. This is also how a team learns the boundary between what Claude can one-shot and what needs hand-holding.
Rung two is editing code with feedback. Once you’re comfortable asking, give Claude a tool to check its own work. Unit tests are the obvious one. So is a Puppeteer or Playwright screenshot harness for UI work, an iOS simulator screenshot for mobile, or a curl against a dev server. The pattern Boris keeps hammering: iteration with feedback beats one-shot every time. If Claude can see the result of its own change, two or three loops will land you something close to perfect.
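The rung-two pattern reduces to a loop: propose a change, run the checker, feed the failure back, try again. A minimal Python sketch of that shape, with a stub `propose_fix` standing in for the model call; in practice the checker would be your test suite or screenshot harness, and everything here is illustrative:

```python
def iterate_with_feedback(code, check, propose_fix, max_loops=3):
    """Let the agent see the result of its own change and retry.

    check(code)           -> error message string, or None if code passes
    propose_fix(code, err) -> revised code given the previous attempt and error
    """
    for _ in range(max_loops):
        error = check(code)
        if error is None:
            return code  # feedback loop converged
        code = propose_fix(code, error)
    raise RuntimeError("did not converge; needs hand-holding")

# Stub demo: the "model" fixes an off-by-one once it sees the failing check.
check = lambda c: None if c == "return n + 1" else "expected n + 1"
fix = lambda c, err: "return n + 1"
assert iterate_with_feedback("return n", check, fix) == "return n + 1"
```

Two or three passes through this loop, with a real checker, is the “close to perfect” Boris describes.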
Rung three is brainstorming and plan mode. Before the model writes a line, ask it to plan. “Brainstorm three approaches, pick one, then ask before writing code.” You don’t need a special flag to do this. Just say it. Claude infers the rest. Boris’s go-to incantation for shipping a feature is “make a commit with this and open a PR.” No further explanation. Claude reads recent git history to learn the team’s commit format, makes the branch, makes the commit, opens the PR.
Rung four is plugging in your team’s tools. Bash CLIs, internal --help outputs, and Model Context Protocol servers. The talk uses a deliberately hypothetical “Burley CLI” example to make the point: tell Claude about a tool, point it at --help, and it figures out how to drive it. MCP is the same shape with structured inputs. Once you’ve taught Claude about your team’s tools, you should write the result down somewhere it gets pulled in next session.
That somewhere is CLAUDE.md. It loads automatically at session start, lives at the project root for shared context, in nested directories for scoped context, and at user/enterprise scope for cross-repo defaults. The advice is to keep it short. Common bash commands, a short style guide, three or four key files, architectural decisions worth knowing. Long CLAUDE.md files burn context window and stop being read carefully. We covered the growth of agent automation as a category recently; the pattern Boris describes is the manual version that scaled into the Routines feature.
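A short CLAUDE.md in the spirit Boris describes might look like this; every entry below is an illustrative placeholder, not taken from an Anthropic repo:

```markdown
# CLAUDE.md

## Common commands
- `npm run test -- --watch=false` — full test suite
- `npm run lint:fix` — lint with autofix

## Style
- TypeScript strict mode; no default exports

## Key files
- `src/router.ts` — all route registration
- `src/db/schema.ts` — single source of truth for models

## Decisions
- Feature flags, not long-lived branches
```

Short enough to be read in full at every session start, which is the point.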
Prompting 101 as empirical iteration
The Applied AI team’s 24-minute walkthrough takes a different shape. Hannah Moran and Christian Ryan (Anthropic Applied AI) build one prompt up across four iterations, on a real customer scenario: a Swedish car-insurance company that needs Claude to read a filled-out accident report form and a hand-drawn sketch of the collision, then determine which driver is at fault.
They start with the simplest possible prompt. “You are reviewing a car accident report. Determine what happened.” Two images attached. Run.
Claude’s first answer: this looks like a skiing accident on a street called Schaffmann Gartan. The form is in Swedish; one of the labels happens to look like a piste name. With no context, the model picked up the most surface-level cue and ran with it.
This is the talk’s opening lesson. Prompt engineering is an iterative empirical science. You give Claude a prompt, you watch where it fails, you bake the missing context back in, you run again. Hannah and Christian go through four versions in the demo. Each one fixes a specific failure mode the previous version exposed.
Version two adds task context and tone. “You are an AI system helping a Swedish claims adjuster. Stay factual. Do not guess. Only assess fault if you are confident.” The model now correctly identifies a vehicle collision and reads the checkbox marks. It still hedges on fault, which is exactly what Christian asked it to do. Stay confident or stay quiet.
Version three moves the static reference content into the system prompt. The accident form has 17 numbered checkboxes and a fixed two-column layout. That structure does not change between claims. Putting it in the system prompt does two things: it lets prompt caching kick in (every claim shares that 1,200-token prefix), and it stops Claude burning a chunk of every response on re-deriving what the form is. Hannah’s tip is to also describe how the form gets filled out in practice: humans tick boxes, but they also draw circles, scribble, or put X marks. Tell Claude the failure modes ahead of time.
Version three is also where the model first commits. Vehicle B at fault. Same data as version one. The difference is the context.
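The static-content-in-the-system-prompt move maps directly onto the Messages API’s prompt caching. A hedged sketch of the request shape, using the documented `cache_control` marker; the form description and model id are placeholders, not the talk’s actual prompt:

```python
# Static reference material goes in the system prompt and is marked
# cacheable, so every claim after the first reuses the same prefix.
FORM_SPEC = (
    "The accident report has 17 numbered checkboxes in a two-column layout. "
    "Humans tick boxes, but also draw circles, scribble, or use X marks."
)

def build_request(claim_content):
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": FORM_SPEC,
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        "messages": [{"role": "user", "content": claim_content}],
    }

req = build_request([{"type": "text", "text": "Assess fault for this claim."}])
assert req["system"][0]["cache_control"] == {"type": "ephemeral"}
```

Only the per-claim user message changes between requests; the cached system prefix is billed at the reduced cache-read rate.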
Version four adds detailed instructions in the user prompt. This is where the order-of-operations point lands.
“The order in which Claude analyzes this information is very important. You would not look at the drawing first and try to understand what was going on. It’s a bunch of boxes and lines. But if you read the form first, you understand it’s a car accident, you know what boxes are checked. Then you can interpret the sketch.”
Hannah Moran, Applied AI team, Anthropic
The instruction list reads like a checklist for a junior adjuster. Read the form. List every checked box. Then look at the sketch with that context. Match the sketch to your form findings. Output your verdict in <final_verdict> tags. Each step has a deliverable. The model now narrates its own work, which is half the point: a transcript you can audit when the verdict is wrong.
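The checklist-plus-tagged-output combination is easy to mirror in code. A sketch that builds the ordered instruction list and pulls only the verdict out of a transcript; the step wording is paraphrased, not the talk’s exact prompt:

```python
import re

STEPS = [
    "Read the form first and note its language and layout.",
    "List every checked box by number.",
    "Only then examine the sketch, using the form findings as context.",
    "Match the sketch to the checked boxes.",
    "State your verdict inside <final_verdict> tags.",
]

def build_instructions(steps):
    """Number the steps and wrap them in an XML tag Claude can anchor on."""
    body = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return f"<instructions>\n{body}\n</instructions>"

def extract_verdict(transcript):
    """Return only the tagged verdict; the narration stays auditable but unparsed."""
    m = re.search(r"<final_verdict>(.*?)</final_verdict>", transcript, re.S)
    return m.group(1).strip() if m else None

sample = "Box 10 is checked... <final_verdict>Vehicle B at fault</final_verdict>"
assert extract_verdict(sample) == "Vehicle B at fault"
```

The narrated steps remain in the transcript for auditing; downstream code parses only the tag.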
XML, prefill, and extended thinking as a debugging crutch
Three techniques in the prompting talk are easy to miss because they feel like polish. They are not polish.
XML tags beat markdown for structure. Christian is explicit: Claude was fine-tuned on XML, and XML lets you label what’s inside a block. <form_analysis>...</form_analysis> is unambiguous. A markdown heading is a hint. The talk uses XML to wrap the example data Claude should learn from, the instruction list, and the final output format. The same advice shows up in Anthropic’s prompt engineering docs and is consistent with how the Cursor and Cline teams structure their system prompts. If you’ve been writing prompts in pure markdown, this is the cheapest upgrade you can make today.
Prefilled responses lock the output shape. If you want JSON, start the assistant turn with {. If you want XML, start it with <final_verdict>. The model continues from whatever you’ve put in its mouth. This kills “let me explain my reasoning first” preambles when you don’t want them. Combined with extended thinking, you get the reasoning and the structured output, in that order, and you parse only the bit you need.
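Prefilling is just ending the messages list with a partial assistant turn; the model’s reply continues from it, so you stitch the two back together before parsing. A minimal sketch, with the continuation string simulating a model reply:

```python
PREFILL = "<final_verdict>"

def make_messages(user_prompt):
    # The trailing assistant turn forces the model to continue from PREFILL,
    # skipping any "let me explain my reasoning first" preamble.
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": PREFILL},
    ]

def parse(continuation):
    # The API returns only the continuation; re-attach the prefill to parse.
    full = PREFILL + continuation
    return full.removeprefix("<final_verdict>").removesuffix("</final_verdict>")

msgs = make_messages("Who is at fault?")
assert msgs[-1] == {"role": "assistant", "content": "<final_verdict>"}
assert parse("Vehicle B at fault</final_verdict>") == "Vehicle B at fault"
```

Swap `PREFILL` for `{` and a `json.loads` call and the same trick locks JSON output.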
Extended thinking is a debugging tool, not just a quality knob. Christian frames it as a crutch: turn it on, read the scratchpad, find the steps Claude is taking that you didn’t expect, and bake those discoveries into the system prompt for next time. The extended-thinking transcript is essentially a free printout of what context Claude wishes it had. Once you’ve read it twice, you’ll know what to add to your system prompt to make the model reason more efficiently. The token cost goes down on the next iteration.
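On the API, extended thinking is a request parameter, and the thinking blocks come back alongside the text, ready to be logged for exactly the debugging pass Christian describes. A sketch of the request shape, assuming the documented `thinking` parameter; the budget and model id are placeholders:

```python
def thinking_request(prompt, budget_tokens=4096):
    # budget_tokens caps the scratchpad. Read the scratchpad, then fold what
    # you learn back into the system prompt so later runs need less of it.
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 8192,                   # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

req = thinking_request("Assess fault for this claim.")
assert req["thinking"]["budget_tokens"] < req["max_tokens"]
```

The response’s thinking blocks are the “free printout” the talk describes; archive them per prompt version and diff across iterations.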
The 10-point structure on the team’s slide is worth memorising. Task description, tone, background data, detailed instructions, examples, conversation history, the immediate task, thinking instructions, output formatting, and a final reminder of any critical rules. Most prompts skip three or four of those slots. Filling them in order, in XML-tagged blocks, is what separates a working prompt from a flaky one.
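The 10-slot structure can be assembled mechanically, which also makes the skipped slots visible. A sketch that fills the slots in the slide’s order and wraps each in an XML tag; the slot names are paraphrased from the talk, and the sample content is illustrative:

```python
SLOT_ORDER = [
    "task_description", "tone", "background_data", "detailed_instructions",
    "examples", "conversation_history", "immediate_task",
    "thinking_instructions", "output_formatting", "final_reminder",
]

def assemble_prompt(slots):
    """Emit filled slots in canonical order, each in its own XML tag."""
    parts = []
    for name in SLOT_ORDER:
        if slots.get(name):
            parts.append(f"<{name}>\n{slots[name]}\n</{name}>")
    return "\n\n".join(parts)

prompt = assemble_prompt({
    "task_description": "You help a Swedish claims adjuster assess fault.",
    "tone": "Factual. Do not guess.",
    "final_reminder": "Only assess fault if you are confident.",
})
assert prompt.index("<task_description>") < prompt.index("<final_reminder>")
```

Listing which of the ten keys are absent from `slots` is a one-line audit of what your prompt is missing.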
The third tape: Erik Schluntz on vibe coding in prod
The third Code with Claude SF 2025 tape is Erik Schluntz’s 31-minute Vibe Coding in Prod, Responsibly. Schluntz is an Anthropic researcher and the co-author, with Barry Zhang, of Building Effective Agents, the company’s most-cited engineering writeup on agentic systems.
Schluntz opens with a personal hook. He broke his hand biking to work last year, was in a cast for two months, and let Claude write all of his code through the entire stretch. The talk is the playbook he wrote for himself out of necessity, then exported to the rest of the company.
His definition of vibe coding is narrower than the popular usage. Cursor and Copilot do not qualify, in his framing, because you’re still in a tight feedback loop with the model. He pulls Andrej Karpathy’s original phrasing: vibe coding is “where you fully give in to the vibes, embrace exponentials, and forget that the code even exists”. The phrase that matters is forget the code even exists. You’re not reading every line. You’re trusting the system.
Why care about something that sounds like a recipe for production fires? The curve, Schluntz argues. The length of tasks an AI can complete is doubling roughly every seven months. Today it’s about an hour. Next year it’s a workday. The year after that it’s a week. “There is no way that we’re gonna be able to keep up with that if we still need to move in lockstep,” he says. If you don’t learn to step out of the loop, you become the bottleneck.
His compiler analogy is the one to remember. Early developers didn’t trust compilers; they read the assembly output line by line to be sure. That stopped scaling, so the industry built verifiable abstractions and stopped reading assembly. The same shift has to happen with agent-written code. The question is how to do it without breaking production.
Schluntz’s three rules are concrete enough to copy verbatim.
Be Claude’s PM. “Ask not what Claude can do for you, but what you can do for Claude.” On every non-trivial change, Schluntz spends 15 to 20 minutes building a plan with Claude before he lets it execute. That plan-building is itself a separate session: Claude explores the codebase, surfaces relevant files, drafts the approach. Schluntz reviews. Then he hands the artefact to a fresh context and says “go execute this.” The success rate, he says, is very high once that prep is done. This is the same iterative-context pattern Hannah and Christian were teaching from the prompt-engineering side; Schluntz frames it as the management discipline that makes vibe coding tractable.
Target leaf nodes, not the trunk. Leaf nodes are end-of-line features that nothing else depends on. Tech debt buried inside a leaf is contained; tech debt in the core architecture compounds. Schluntz’s caveat is honest: tech debt is the one thing today’s verification tools cannot catch from outside. Until that gap closes, vibe-code the tips of the tree, not the spine.
Design for verifiability. A CTO does not read every line their best engineer ships. A CEO does not redo their accountant’s books. They verify at the boundary: acceptance tests, product use, spot checks against numbers they understand. Schluntz argues software engineers are uniquely bad at this because they’re trained as ICs who own the full stack. The shift to vibe coding in production is the same shift any first-time manager makes.
The proof point lands hard. Schluntz’s team merged a 22,000-line PR to Anthropic’s production reinforcement-learning codebase, most of it Claude-written. Three things made it safe: days of human work on the requirements before Claude touched code, scoping the change to leaf nodes, and a stress-test harness that verified stability without anyone reading every line. The work would have taken two weeks by hand. Claude finished the implementation in roughly a day.
“We will forget that the code exists, but not that the product exists.”
Erik Schluntz, Anthropic
The independent reading is more sober. Developer Eric Jinks, in a writeup of the talk, frames the shift as moving “from code writers to system designers and verifiers” and flags the obvious risk: motivated developers will learn faster than ever, but anyone coasting through agent output will hollow out fast. The talk has the same warning baked in. Schluntz is explicit that fully non-technical builders should not be vibe coding production systems: asking the right questions is the skill that gates everything else, and a non-technical PM cannot ask them. In the Q&A, he also pushes back on the public examples of vibe-coded apps that leaked credentials and exposed databases: those were people with no business shipping production software, not a problem with the technique.
What still holds, and what has shifted in 12 months
The interesting question with a re-released talk is which advice has aged. Most of the 2025 tapes still hold. Some of it has shifted.
Still true. Codebase Q&A as the entry point. Iterate with feedback. Plan before edit. CLAUDE.md as the share-once-then-forget context layer. MCP for team tools. XML structure. Prefill for output shape. Extended thinking as a debugger. The four-step ladder. Schluntz’s leaf-nodes-first scoping rule. None of these have been replaced by anything; they’ve been productised.
Shifted. Plan mode is now a first-class flag, not an incantation, with an explicit “do not write code yet” contract. Sub-agents ship as named, scoped agents you can address from your main session. Output styles replace ad-hoc “answer in this format” instructions. Hooks let you wire deterministic shell commands to lifecycle events. Skills, the late-2025 packaging concept, give you reusable slash commands plus context, which is exactly what Boris’s MCP-and-CLAUDE.md combination was approximating by hand. The GitHub app moved from “today’s announcement” to a default tool, with issue-triage and PR review running in production on Anthropic’s own repos.
The model has moved too. The Prompting 101 demo runs on Claude 4 at temperature zero. Anthropic shipped Claude 4.5 in September 2025, 4.6 in February 2026, and Opus 4.7 earlier this month. Long-context retrieval has improved on most benchmarks since then; one regression on the MRCR v2 256k benchmark is real and worth knowing about, but day-to-day prompting on a two-image insurance form will work better on 4.7 than on the original 4.
There’s one Boris line worth grading on a curve. In the Q&A, asked why he built a CLI rather than an IDE, he said “there’s a good chance that by the end of the year, people aren’t using IDEs anymore”. Twelve months on, IDEs are very much alive. Cursor is at a reported $50B valuation and GitHub Copilot moved to usage-based billing on June 1. What did happen is that the line between IDE and agent blurred. Cursor ships an in-editor agent that calls Claude. VS Code has an official Anthropic extension. Boris was directionally right about the agentic shift; he was specifically wrong about the timing and the form factor. Worth keeping in mind if you take “model is good, scaffolding doesn’t matter” too literally.
What to watch on May 6
The Code with Claude SF 2026 agenda gives you four sessions to circle, all of which build directly on the workshop tapes.
Cat Wu, “State of Claude Code.” This is the one most likely to give a numbers-heavy update on adoption, latency, and the agent’s success rates on internal benchmarks. If you’ve been wondering whether the six-week quality dip Anthropic admitted to in March is fully resolved, this is where you’ll get the official answer.
Daisy Hollman, “Claude Code best practices” (workshop). The 2025 tape is Boris’s version of best practices. Daisy’s workshop is the one that will reflect a year of real-world adoption inside and outside Anthropic. Expect concrete patterns for sub-agents, plan-mode discipline, and how teams structure CLAUDE.md at scale.
Sid Bidasaria, “Rearchitecting your workflows with Claude Code.” Sid was the SDK speaker mentioned in passing in Boris’s 2025 talk. A year on, the SDK is the substrate for serious internal tooling at companies like Disney. This session is the one to attend if you’re building Claude into CI, incident response, or off-hours automation.
Guillermo Rauch, “How Vercel builds for model step-changes.” The outsider’s view on shipping in a fast-moving model market. Vercel’s v0 product has been one of the clearer examples of an external team architecting around model-version churn, and his perspective will land differently than the in-house ones.
The full conference will livestream free for anyone who registers. London follows on May 19, Tokyo on June 10.
What to actually do this weekend
If you’re already on Claude Code, three concrete moves before Tuesday.
One. Write a short CLAUDE.md for the repo you spend the most time in. Three to five bullets. Common test command. The directory that holds the most-edited code. The lint rule that bites you. Commit it. Share with the team.
Two. Find one place you’re hand-checking Claude’s output and replace it with a feedback loop. Unit tests for backend. Playwright screenshots for UI. A curl for an API. Then ask Claude to iterate against that loop, not against your eyes.
Three. Pick one prompt you’ve been hand-running in chat and rewrite it with the 10-point structure from Prompting 101. Task description, background data in XML tags, detailed instructions, output format, final reminder. Run both versions on the same input. Note which one you’d actually trust.
The 2025 tapes aren’t a marketing artifact dressed up as education. Boris built the tool. Hannah and Christian taught the rest of Anthropic how to drive the model. Schluntz wrote the management theory that explains why letting an agent ship 22,000 lines is a discipline, not a stunt. The fact that all three are circulating again, days before the 2026 conference, says a useful thing about how Anthropic sees its own moat. The model is good. The context is yours. Get the scaffolding right and the rest follows.
Sources
- Mastering Claude Code in 30 Minutes (Boris Cherny, Code with Claude) — Anthropic / YouTube
- Prompting 101: Code w/ Claude (Hannah Moran, Christian Ryan) — Anthropic / YouTube
- Vibe coding in prod, responsibly (Erik Schluntz, Code with Claude) — Anthropic / YouTube
- Building Effective Agents — Anthropic
- Vibe coding in prod: from code writers to system designers and verifiers — Eric Jinks
- Code with Claude San Francisco, May 6, 2026 agenda — Anthropic
- Code with Claude comes to San Francisco, London, and Tokyo — Anthropic
- Code with Claude, Anthropic's First Developer Conference — Anthropic
- How to Use Claude Code Like the People Who Built It — Every
- Head of Claude Code, What happens after coding is solved (Boris Cherny) — Lenny's Newsletter
- Mastering Claude Code, Boris Cherny's Guide and Cheatsheet — nibzard
- Anthropic releases free 24-minute prompting workshop with 40 techniques — Glen Rhodes