
AI News
29 Sep 2025
16 min read
Claude Sonnet 4 1M token guide: How to process codebases
The Claude Sonnet 4 1M token guide helps you process whole codebases in one request and speed up reviews.
Claude Sonnet 4 1M token guide: Process entire repositories and long documents in one pass. With a 1 million token context window, you can load tens of thousands of lines of code plus docs, keep full cross-file awareness, and run multi-step agent workflows without losing history. This guide shows setup, prompts, costs, and best practices.
The new long context window for Claude Sonnet 4 changes how you work on large codebases. You can place whole projects, tests, and documentation in a single prompt, reason about architecture across modules, and apply changes with fewer iterations. Long context is in public beta on the Claude Developer Platform, and also available in Amazon Bedrock and on Google Cloud’s Vertex AI. Pricing adapts for prompts over 200K tokens, and you can cut costs with prompt caching and batch processing. Use this Claude Sonnet 4 1M token guide to plan end-to-end workflows that are accurate, fast, and audit-ready.
Claude Sonnet 4 1M token guide: plan a whole-repo session
What a million tokens really means
A token is a small chunk of text. One million tokens can cover an entire mid-sized repository—more than 75,000 lines of code—and still leave room for documentation, API references, and a task plan. You can also bring dozens of research papers, legal contracts, or specifications into the same request and ask the model to synthesize findings with traceable citations to the included text.
For engineering work, a single large prompt reduces back-and-forth. Claude can inspect files together, see cross-file dependencies, and propose changes that respect the design of the whole system. It can write tests based on real interfaces, not guesses from a partial sample.
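To see how much of the window a given project consumes, you can count tokens before sending anything. A minimal sketch, assuming the anthropic Python SDK and its token-counting endpoint; the repository path and model ID are placeholders:

```python
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Concatenate the source files you plan to include in the prompt.
repo_text = "\n\n".join(
    p.read_text(encoding="utf-8", errors="ignore")
    for p in Path("my_project/src").rglob("*.py")  # placeholder path
)

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    messages=[{"role": "user", "content": repo_text}],
)
print(f"Repo uses {count.input_tokens:,} of the 1,000,000-token window")
```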
When to use long context vs retrieval
You can still use retrieval or embeddings when:
- Your repo far exceeds 1M tokens, and you want selective loading based on a query.
- You want low-latency, smaller calls for interactive search.
- You need to stream in fresh data on demand.
With a 1M window, many teams can skip retrieval for day-to-day tasks and gain accuracy from the full in-memory view of the project.
Pick the right platform and limits
Long context support is in public beta:
- Claude Developer Platform: available natively for accounts with Tier 4 and custom rate limits, with broader access rolling out.
- Amazon Bedrock: long context enabled.
- Google Cloud’s Vertex AI: now available.
Pricing on the Anthropic API reflects larger compute:
- Prompts up to 200K tokens: Input $3 per MTok; Output $15 per MTok.
- Prompts over 200K tokens: Input $6 per MTok; Output $22.50 per MTok.
You can cut latency and cost with prompt caching. Batch processing offers an additional 50% cost savings for offline or queued workloads. If your workflow reuses the same large context with small deltas, caching can deliver large efficiency gains.
Set up: prepare your codebase for the model
Create a repository manifest
Start with a compact manifest at the top of your prompt that orients the model:
- Project summary: what the system does, core domain concepts, data flows.
- Key modules and boundaries: services, packages, libraries, and how they talk.
- Entry points and interfaces: CLI commands, HTTP endpoints, event handlers.
- Build/run details: language, package manager, frameworks, version constraints.
- Tests: structure, coverage strategy, how to run locally and in CI.
- Known issues and priorities: bugs, debt, performance hot spots.
This short map helps Claude navigate thousands of lines without guesswork and improves the quality of suggested changes.
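For illustration, a manifest for a hypothetical orders-service project could look like this; every name and detail below is invented:

```text
PROJECT: orders-service, a REST API for order intake and fulfillment
MODULES:
  api/       HTTP endpoints (FastAPI); auth middleware in api/auth.py
  core/      domain logic: pricing, inventory reservations
  adapters/  Postgres repository, Kafka event publisher
ENTRY POINTS: uvicorn api.main:app; worker via python -m adapters.consumer
BUILD/RUN: Python 3.12, poetry; make test runs pytest with coverage
TESTS: tests/ mirrors the module layout; integration tests need docker-compose
KNOWN ISSUES: pricing rounding bug (#214); inventory hot path needs profiling
```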
Curate what to include
You do not need every byte. Include the parts that drive behavior:
- Source files for the feature or subsystem you want to change.
- Shared libraries and utility modules used by those files.
- Configuration and infrastructure that affects runtime behavior (env, Docker, IaC).
- API contracts: OpenAPI/GraphQL schemas, protobufs, or internal service interfaces.
- Docs that describe business rules, SLAs, compliance needs, and performance targets.
Exclude or compress:
- Generated code, lockfiles, and minified bundles unless they drive behavior you must change.
- Large vendor directories or binaries—replace with a note and version pin.
- Huge data files—include small samples and describe the schema instead.
Strategic curation keeps the context within budget and focuses attention on what matters.
Token budgeting and chunking
Large contexts still benefit from order and structure (a sketch of prompt assembly follows this list):
- Group files by module or feature and place related files together.
- Prefix each file with a clear header that includes the path, language, and purpose.
- Put the manifest and instructions first, then the files, then your task at the end with a checklist.
- If you exceed your limit, prune low-value files first and keep interfaces, tests, and business logic.
- Use consistent markers like “BEGIN FILE … / END FILE …” to help navigation and quoting.
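A minimal sketch of prompt assembly under these rules, assuming a Python project laid out in modules; the paths, module order, and task text are illustrative:

```python
# Build a long-context prompt: manifest first, files grouped by module with
# BEGIN/END FILE markers, and the task plus checklist at the end.
from pathlib import Path

MODULE_ORDER = ["core", "adapters", "api", "tests"]  # keep related files together

def file_block(path: Path, root: Path) -> str:
    rel = path.relative_to(root)
    body = path.read_text(encoding="utf-8")
    return f"BEGIN FILE {rel} (language: python)\n{body}\nEND FILE {rel}"

def build_context(root: Path) -> str:
    blocks = []
    for module in MODULE_ORDER:
        for path in sorted((root / module).rglob("*.py")):
            blocks.append(file_block(path, root))
    return "\n\n".join(blocks)

root = Path("my_project")  # hypothetical repository root
manifest = (root / "MANIFEST.md").read_text(encoding="utf-8")
task = (
    "TASK: add per-client rate limiting to api/main.py without breaking tests.\n"
    "CHECKLIST: plan, impacted files, diffs, tests, rollback note."
)
prompt = f"{manifest}\n\n{build_context(root)}\n\n{task}"
```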
Prompt patterns that work for large code
Give a clear role and goal
State the goal in one sentence. Name the constraints. Ask for explicit steps and a final diff or patch. Keep the language simple and precise; a sample prompt header follows the list below.
- Role: “You are a senior software engineer working on X.”
- Goal: “Add Y feature without breaking Z tests.”
- Constraints: coding standards, performance budget, security rules.
- Deliverables: plan, impacted files, diffs, tests, and a rollback note.
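Put together, a prompt header following this pattern might read as below; the feature, constraints, and numbers are invented for illustration:

```text
You are a senior software engineer working on the orders-service API.
Goal: add per-client rate limiting without breaking the existing tests.
Constraints: follow PEP 8, add no new runtime dependencies, keep p95 latency under 50 ms.
Deliverables:
  1. A step-by-step plan with impacted file paths.
  2. Unified diffs per file, quoting the original code before each change.
  3. New or updated tests and the command to run them.
  4. A short rollback note.
```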
Use navigation anchors
Help the model cite and edit exact locations:
- Each file block: include the path and a short description.
- Encourage references by path and line anchors, not by vague descriptions.
- Ask the model to quote the original code snippet before proposing changes.
Ask for structured reasoning, then actions
Break the task into a safe sequence:
- Plan: list steps with file paths.
- Analyze: explain how changes affect dependencies and tests.
- Apply: produce unified diffs per file.
- Verify: propose or update tests and describe how to run them.
- Review: list risks and follow-up tasks.
This format reduces mistakes and gives you a reviewable trail.
Maintain an internal “scratchpad”
In long sessions with agents or tools, keep a short scratchpad at the top of the prompt that records decisions, assumptions, and open questions. Ask Claude to update it as steps complete. This prevents drift over hundreds of tool calls and keeps the workflow coherent.
Quality checks at scale
Compile, test, and lint after every change
Automate verification between model calls:
- Run the build and unit tests locally or in CI.
- Apply static analysis and security scans (linters, SAST).
- Feed only the relevant error output back to the model with the file paths.
Short, targeted feedback loops keep the model focused and prevent context bloat.
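A sketch of such a loop for a pytest project; ask_claude() is a hypothetical helper around your own client code, and the trim size is a tunable assumption:

```python
# Run the test suite; on failure, feed back only a trimmed error excerpt so
# the cached base prompt stays dominant and the context does not bloat.
import subprocess

MAX_FEEDBACK_CHARS = 4_000  # assumption: enough for the failure summary

def run_tests() -> tuple[bool, str]:
    result = subprocess.run(
        ["pytest", "-x", "--tb=short", "-q"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

ok, output = run_tests()
if not ok:
    excerpt = output[-MAX_FEEDBACK_CHARS:]  # the tail carries pytest's summary
    ask_claude(f"The tests failed. Propose a fix as diffs. Output:\n{excerpt}")  # hypothetical helper
```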
Ask for assertions and invariants
Have Claude call out assumptions as inline comments or docstrings. Ask it to flag invariants it depends on and to add tests that enforce them. This improves reliability and makes later edits safer.
Demand reversible changes
Request a simple rollback plan for each edit and a short migration note for schema or interface changes. Clear reversal steps reduce risk during rollout.
Cost, latency, and reliability tips
Use prompt caching for stable context
If each iteration reuses the same large base prompt, caching can lower cost and speed up responses. Update only the delta (for example, new test failures or a small file diff). Combine caching with batch processing to save an additional 50% on queued workloads.
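A minimal caching sketch with the anthropic Python SDK: the large, stable context block is marked with cache_control so repeat calls can reuse it, and only the small delta travels in the user message. The model ID is a placeholder, and the 1M-token window may require an extra beta flag depending on your platform, so check the current docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_INSTRUCTIONS = "You are a senior engineer. Follow the manifest and rules below."
REPO_CONTEXT = open("context.txt").read()  # the big file dump, assembled as above
error_excerpt = "FAILED tests/test_api.py::test_rate_limit"  # illustrative delta

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=4096,
    system=[
        {"type": "text", "text": STABLE_INSTRUCTIONS},
        {
            "type": "text",
            "text": REPO_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        },
    ],
    messages=[{"role": "user", "content": "New test failures:\n" + error_excerpt}],
)
print(response.content[0].text)
```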
Right-size your window
Do not fill the entire 1M tokens just because you can. Smaller prompts are faster, cheaper, and often more accurate. Include only what is essential for the current task.
Be mindful of pricing tiers
Input costs double above 200K tokens, and the output rate rises as well. Trim prompts to stay under the threshold when possible (a quick cost check follows the list):
- At or below 200K input tokens: $3 per MTok input, $15 per MTok output.
- Above 200K input tokens: $6 per MTok input, $22.50 per MTok output.
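A back-of-the-envelope check against these tiers, for an assumed 500K-token prompt and a 5K-token response:

```python
# Cost = input MTok * input rate + output MTok * output rate.
input_tokens, output_tokens = 500_000, 5_000

if input_tokens <= 200_000:
    in_rate, out_rate = 3.00, 15.00   # $ per million tokens, standard tier
else:
    in_rate, out_rate = 6.00, 22.50   # long-context tier

cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
print(f"${cost:.2f}")  # 0.5 * 6.00 + 0.005 * 22.50 = $3.11
```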
Manage long-running agent sessions
For agents that call tools many times:
- Summarize the interaction history periodically and replace the raw log (see the compaction sketch after this list).
- Keep a stable instruction block and cache it.
- Limit tool output to the smallest slice needed for the next step.
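One way to implement that first point is periodic compaction; a sketch, where summarize() is a hypothetical helper around your own Claude call and the threshold is an assumption:

```python
# Once the raw history grows past a budget, swap it for a model-written
# summary plus the last few raw turns for continuity.
MAX_HISTORY_TOKENS = 50_000  # assumption: tune to your workload

def compact_history(history: list[dict], token_count: int) -> list[dict]:
    if token_count < MAX_HISTORY_TOKENS:
        return history
    summary = summarize(history)  # hypothetical: decisions, assumptions, open questions
    return [
        {"role": "user", "content": f"Summary of the session so far:\n{summary}"},
        *history[-4:],  # keep the most recent raw turns
    ]
```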
Use cases beyond code
Document synthesis
Load dozens or hundreds of documents—contracts, research papers, audits—into one prompt (an example instruction block follows the list). Ask for:
- A structured summary with key findings, dates, and named entities.
- Traceable citations to source passages you included.
- Contradiction checks across documents.
- Action lists and risks with references.
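An illustrative instruction block for such a request; the BEGIN DOC / END DOC convention mirrors the file markers above and is an assumption, not a required format:

```text
Documents are included below between BEGIN DOC / END DOC markers.
1. Produce a structured summary: key findings, dates, and named entities.
2. Cite every claim with the document name and a short quoted passage.
3. List contradictions between documents, citing both sides.
4. End with an action list; attach risks and source references to each item.
```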
Context-aware agents
Build agents that hold full tool definitions, API docs, and long interaction histories. The agent can maintain state over hundreds of steps and still answer with precise citations, because everything needed stays in context.
Field notes from teams in production
Bolt.new integrates Claude into a browser-based development platform. They report that Sonnet 4 remains their go-to model for code generation, and that the 1M context window lets developers work on larger projects while keeping accuracy high.
iGent AI builds Maestro, a software engineering agent. With 1M tokens, they report stronger autonomous capabilities and multi-day sessions on real-world codebases at production scale. These stories show how long context supports real workflows, not just demos.
Step-by-step example workflow for a feature change
- Create a repo manifest with modules, interfaces, build commands, and known issues.
- Select files: core logic, shared libs, configs, relevant tests, and API contracts.
- Add high-level business rules and performance constraints from docs.
- Draft the prompt: role, goal, constraints, deliverables, and a checklist.
- Insert files with clear headers and “BEGIN/END FILE” markers by path.
- Ask for a plan first: impacted files, risks, and test strategy.
- Request diffs per file, with reasoning and references to code lines.
- Run build and tests. Feed only error excerpts back to Claude.
- Iterate on fixes. Keep the base context cached; send small deltas.
- When green, ask for a migration note, rollback steps, and a PR description.
- Batch similar tasks overnight to use the 50% savings from batch processing.
This sequence works for refactors, performance tuning, test hardening, and documentation updates. With a million tokens, you keep everything important in view while you apply safe, reviewable changes.
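For the overnight batching step, the Message Batches API queues many requests at the discounted rate. A sketch assuming the anthropic Python SDK; TASKS is a hypothetical work queue, and the request fields should be verified against the current docs:

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical queue of similar, non-urgent tasks.
TASKS = [
    ("docs-update", "Update the README for the new rate-limit flag."),
    ("test-hardening", "Add edge-case tests for core/pricing.py."),
]

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": task_id,
            "params": {
                "model": "claude-sonnet-4-20250514",  # placeholder model ID
                "max_tokens": 4096,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for task_id, prompt in TASKS
    ]
)
print(batch.id, batch.processing_status)  # poll until the status is "ended"
```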
Limitations and best practices
- Public beta: features and limits may change. Monitor platform updates.
- Verification needed: always compile, test, and scan outputs before merge.
- Order matters: put instructions and the manifest first, tasks last, and group related files.
- Be selective: include interfaces and tests before implementation details if you must trim.
- Security: do not paste secrets. Use redaction and separate secret stores.
- Performance near the limit: very large prompts can increase latency. Keep prompts as small as possible.
Getting started today
Long context for Sonnet 4 is available in the Claude Developer Platform for Tier 4 and custom limit customers, with broader rollouts coming soon. It is also live in Amazon Bedrock and on Google Cloud’s Vertex AI. Review the pricing thresholds, set up prompt caching, and stage batch jobs for background tasks. Use the structure in this Claude Sonnet 4 1M token guide to prepare your repo, write robust prompts, and automate tests.
The 1M-token window unlocks whole-repo reasoning, richer agents, and large-scale document synthesis. With smart curation and clear instructions, you can ship changes with fewer cycles and higher confidence. As you scale, return to this Claude Sonnet 4 1M token guide for checklists, patterns, and cost controls that keep your workflows efficient and reliable.
(Source: https://www.anthropic.com/news/1m-context)
FAQ
Q: What does the Claude Sonnet 4 1M token guide cover and what can a 1M token context do?
A: The guide shows setup, prompts, costs, and best practices for processing entire repositories and long documents in one pass. With a 1M-token context window you can load tens of thousands of lines of code, keep full cross-file awareness, and run multi-step agent workflows without losing history.
Q: Which platforms currently support Sonnet 4 long context?
A: Long context support for Sonnet 4 is in public beta on the Claude Developer Platform for accounts with Tier 4 and custom rate limits, and it is also available in Amazon Bedrock and on Google Cloud’s Vertex AI. Broader availability is rolling out over the coming weeks.
Q: How does API pricing change for prompts over 200K tokens?
A: Prompts up to 200K tokens are billed at $3 per MTok input and $15 per MTok output, while prompts over 200K tokens are billed at $6 per MTok input and $22.50 per MTok output. Costs double for input over 200K tokens and output rates increase, so trim prompts where possible.
Q: What setup steps does the Claude Sonnet 4 1M token guide recommend for preparing a codebase?
A: Start with a compact repository manifest that includes project summary, key modules and boundaries, entry points and interfaces, build/run details, tests, and known issues. Curate included files by adding relevant source files, shared libraries, configuration and API contracts, and exclude generated code, lockfiles, large vendor directories, or huge data files (replace them with notes and version pins).
Q: Which prompt patterns improve results when working with large codebases?
A: Give a clear role and goal, name constraints and deliverables, ask for explicit steps and a final diff or patch, and use navigation anchors with file paths so the model can cite exact locations. Break tasks into plan, analyze, apply, verify, and review stages and keep a short scratchpad at the top of the prompt that Claude updates to prevent drift.
Q: When should teams use long context instead of retrieval or embeddings?
A: Use retrieval or embeddings when your repository far exceeds 1M tokens, when you need low-latency smaller calls for interactive search, or when you must stream fresh data on demand. With a 1M window many teams can skip retrieval for day-to-day tasks and gain accuracy from the full in-memory view of the project.
Q: How should I budget tokens and organize files to stay within the 1M limit?
A: Group files by module, prefix each file with a header that includes path, language, and purpose, place the manifest and instructions first, and prune low-value files first if you exceed your limit. Use consistent “BEGIN FILE … / END FILE …” markers and prioritize interfaces, tests, and business logic to maximize value within your token budget.
Q: What are the best ways to lower cost and latency when following the Claude Sonnet 4 1M token guide?
A: Use prompt caching to reduce latency and costs when reusing the same large base prompt, and combine caching with batch processing to save an additional 50% on queued workloads. Right-size your window rather than filling the full 1M tokens, run builds/tests/lints after changes, and avoid pasting secrets to keep workflows reliable and secure.