Limited Edition Jonathan
Why You Keep Hitting Claude's Usage Limit
And the 15 Habits That Fix It
The Token Guide: How Claude's Limits Actually Work
If you're one of the millions of people who switched to Claude in the last month, you've probably already hit the wall. You opened Claude, had a 20-minute conversation, and got a message telling you to come back later.
Welcome. ChatGPT didn't prepare you for this.
Before we go any further, a hot take: you need to be paying for this. Claude has a free tier, but it's a sample, not a workspace. If you're reading a guide about token management, you're past the point where free is going to cut it. Pro is $20 a month. If you're using Claude for work (and especially if it's making you money or saving you time that's worth money), Max at $100 is where the math starts to make real sense. Claude is a premium product.
I'll show you where your tokens are actually going and how to get dramatically more out of Claude. Every tip here is something you can do today, in Claude's desktop or web interface.
This whole guide is about two things: managing token cost and managing token burn.
Token cost is how much each token counts against your usage limits. This is controlled by which model you're using. The same conversation on Opus costs five times more than on Haiku, even if the exact same number of tokens are exchanged. Most people never touch the model selector and don't realize they're paying a premium on every single interaction.
Token burn is how many tokens you're consuming through your habits and workflow. Uploading a full PDF when you could paste markdown. Letting Claude write a 500-word response when you needed one sentence. Keeping a conversation running for 30 messages when you should have started fresh at message 10. These are the behaviors that multiply your token consumption regardless of which model you're on.
Cost is the multiplier. Burn is the volume. Reduce either one and you'll notice. Reduce both and you'll wonder how you ever hit a limit.
You're probably using a bigger model than you need
This is the single highest-leverage change most people can make, and it takes two seconds.
Claude comes in three models: Haiku, Sonnet, and Opus. Each one consumes your rate limit differently, but not for the reason you'd assume.
Your usage limit isn't a raw token count. It's cost-based. Every token you send or receive has a price, and that price varies dramatically by model. Haiku tokens are cheap. Sonnet tokens cost roughly three times more. Opus tokens cost roughly five times more than Haiku. So an identical conversation — same prompts, same length responses, same number of back-and-forths — will drain your usage limit five times faster on Opus than on Haiku, not because Opus used more tokens, but because each token costs more.
This means you could have a long, rambling, verbose conversation on Haiku and barely dent your limits. The same conversation on Opus would chew through a significant chunk of your allowance. The model you select isn't just about output quality. It's the multiplier on every single token in the conversation.
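The arithmetic is worth making concrete. Here's a quick sketch using the rough ratios above (Haiku as the 1x baseline, Sonnet at roughly 3x, Opus at roughly 5x). These are illustrative multipliers, not Anthropic's exact prices:

```python
# Illustrative per-token cost multipliers, not official pricing:
# Haiku as the 1x baseline, Sonnet ~3x, Opus ~5x.
MULTIPLIER = {"haiku": 1, "sonnet": 3, "opus": 5}

def limit_drain(model: str, tokens: int) -> int:
    """Relative drain on your usage limit, in Haiku-token equivalents."""
    return tokens * MULTIPLIER[model]

# The exact same 10,000-token conversation on each model:
for model in MULTIPLIER:
    print(model, limit_drain(model, 10_000))
# haiku 10000, sonnet 30000, opus 50000
```

Same tokens, five times the drain. That's the whole argument for defaulting to the smallest model that does the job.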
If you're using Opus for a task that Sonnet could handle (or Sonnet for a task Haiku could handle), you're paying a premium per token for zero quality improvement.
Most people coming from ChatGPT don't even think about model selection because ChatGPT doesn't really make you choose. Claude does, and the default is Sonnet. Sonnet is excellent. But here's what I'd actually recommend: start with Haiku.
Haiku 4.5 is the sleeper of the lineup. Most people dismiss it as the "lite" model, but it rivals the reasoning capabilities of the previous generation's Sonnet. It runs four to five times faster, at roughly a third of the token cost. For quick questions, summaries, brainstorming, drafting emails, simple code edits, everyday back-and-forth conversation, and honestly a lot of things you'd assume need a bigger model? Haiku handles it. Most people could use Haiku for the majority of their daily work and never notice they weren't on Sonnet.
Here's the approach: start every task on Haiku. When you feel the ceiling — the output isn't nuanced enough, it's missing complexity in the reasoning, it can't hold a multi-step plan together — step up to Sonnet for that task. You'll know when you hit it because the answer will feel shallow or it'll miss something you expected it to catch. That's your signal, not before.
Sonnet 4.6 handles complex coding, detailed analysis, long-form writing, research synthesis, and most creative work without breaking a sweat. For the vast majority of people reading this, Sonnet is as much model as you'll ever need. The difference between Sonnet and Opus is genuinely hard to notice unless you're working on seriously complex multi-step reasoning or long-horizon agentic tasks.
Save Opus for the stuff that actually demands it. You'll know those tasks when you see them, and they're rarer than you think.
Full disclosure: I personally default to Opus for almost everything because I believe in overkill for reasons I can't fully defend. But I know I'm leaving usage on the table every time I do it, and I'm not going to tell you to do what I do when what I do is objectively wasteful. Start at Haiku. Step up when you need to. Your usage limits will thank you.
That covers the cost side. Everything from here on is about burn: the habits and workflows that determine how many tokens you're consuming, regardless of which model you're on.
File formats have hidden costs (PDFs are the worst)
This is the one that gets people. PDFs are uniquely expensive in Claude because of how they're processed. When you upload a PDF, Claude extracts the text AND converts every single page into an image. You're paying for both. A 50-page PDF doesn't just cost you the text tokens. It costs you the text tokens plus 50 images worth of visual processing tokens.
Anthropic's own documentation puts the cost at 1,500 to 3,000 tokens per page for text alone, before the image conversion even kicks in. A 50-page document can eat 75,000 to 150,000 tokens just sitting there. On a 200,000-token context window, one PDF can consume most of your working space before you've asked a single question.
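Those numbers are easy to sanity-check. A back-of-envelope sketch using the per-page range cited above, counting text extraction only and ignoring the per-page image conversion on top:

```python
# Rough per-page text cost cited from Anthropic's documentation.
PER_PAGE_LOW, PER_PAGE_HIGH = 1_500, 3_000
CONTEXT_WINDOW = 200_000  # tokens

def pdf_text_cost(pages: int) -> tuple[int, int]:
    """Estimated token range for just the extracted text of a PDF."""
    return pages * PER_PAGE_LOW, pages * PER_PAGE_HIGH

low, high = pdf_text_cost(50)
print(low, high)                       # 75000 150000
print(f"{high / CONTEXT_WINDOW:.0%}")  # 75% of the context window, gone
```

And that's the optimistic case: the image conversion adds its own per-page cost on top of these figures.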
But PDFs aren't the only offenders. Every time Claude has to crack open a proprietary file format — DOCX, PPTX, XLSX — there's processing overhead that wouldn't exist if you'd given it the raw content instead. Word documents carry styling, formatting metadata, and structural information that Claude has to parse through to get to the text. PowerPoint files are even worse because every slide has layout data, embedded assets, and formatting layers on top of the actual content. Spreadsheets have formula references, cell formatting, and sheet metadata.
If you're giving Claude a file because you need it to analyze the content (not the layout or design), converting to a text-based format first is almost always cheaper. Plain text works. Rich text works. HTML works. CSV works for tabular data. The point is that any text-based file eliminates the processing overhead of cracking open a proprietary container.
Here's the practical workflow for PDFs specifically: ask Claude to convert it to text first. Upload the PDF and say "Convert this document to clean text." Copy that output, start a new conversation, and paste the text version in. You just cut your token cost by more than half for every subsequent interaction with that document.
For web content, you can skip Claude entirely. Browser extensions like "MarkDownload" or similar tools let you save any webpage as clean text with one click. Instead of asking Claude to search for and read a webpage (which burns tokens on the search, the fetch, and the processing), just grab the content yourself and paste it in.
For spreadsheets, export to CSV before uploading. Claude can parse tabular text directly, and a CSV is a fraction of the overhead of an .xlsx file. If you only need Claude to look at specific columns or rows, paste just those rows as text. There's no reason to upload a 500-row spreadsheet when your question is about 20 rows.
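If you trim the same exports often, it's worth scripting. A minimal sketch that keeps only the columns and rows your question is about (the column names and data here are made up; any CSV export works the same way):

```python
import csv
import io

def slice_csv(csv_text: str, columns: list[str], max_rows: int) -> str:
    """Keep only the named columns and the first max_rows data rows,
    returning a small text excerpt ready to paste into a conversation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [",".join(columns)]
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        lines.append(",".join(row[c] for c in columns))
    return "\n".join(lines)

raw = "id,region,revenue,notes\n1,EMEA,1200,ok\n2,APAC,900,check\n3,EMEA,400,ok\n"
print(slice_csv(raw, ["region", "revenue"], 2))
# region,revenue
# EMEA,1200
# APAC,900
```

Paste the excerpt, not the file. Claude parses plain tabular text directly, with none of the container overhead.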
Screenshots are the same problem, just smaller
The file format issue gets the most attention because the numbers are dramatic, but images in regular conversation have the exact same dynamic. Claude tokenizes images by pixel count. A full-screen 1,000 by 1,000 pixel screenshot costs roughly 1,334 tokens. A cropped 200 by 200 pixel region of the same image costs about 54 tokens. That's a 25x difference for what might be the same useful information.
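Those figures come from Anthropic's documented rule of thumb for image tokens: roughly width times height divided by 750. A quick sketch:

```python
# Anthropic's rule of thumb: an image costs ~ (width_px * height_px) / 750 tokens.
import math

def image_tokens(width_px: int, height_px: int) -> int:
    return math.ceil(width_px * height_px / 750)

full = image_tokens(1000, 1000)        # 1334
crop = image_tokens(200, 200)          # 54
print(full, crop, round(full / crop))  # 1334 54 25
```

Cropping isn't a cosmetic habit. It's a 25x price difference on every screenshot.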
People drop full-screen screenshots into conversations constantly. An error message that takes up three lines of a terminal, surrounded by 90% irrelevant UI. A paragraph from a document they could have just copied as text. Every one of those is paying hundreds or thousands of tokens for visual information that could have been communicated in a sentence.
The test is simple: could you describe what's in this image in one or two sentences? If yes, write the sentences instead. If you're uploading a screenshot of an error message, paste the error text. If you're showing Claude a small UI element, crop to just that element before uploading.
Images earn their token cost when they contain visual structure that language can't efficiently convey: a diagram, a layout, a chart, a design mockup. Everything else is expensive decoration.
Paste less, fence more
These are two habits that solve the same problem: tokens wasted because Claude didn't understand what you gave it or why.
Paste less. People dump entire documents into conversations when they have a question about one section. A 15-page report when the relevant context is in paragraph three. A full email chain when only the last two messages matter. An entire codebase when the bug is in one function. Every extra word you paste is tokens Claude has to process, and worse, tokens that sit in your conversation history getting re-transmitted with every subsequent message.
Before you paste anything, ask yourself: what does Claude actually need to see to answer this question? Trim to that. If you're not sure what's relevant, paste just the part you think matters and let Claude ask for more if it needs it. That's cheaper than dumping everything in preemptively.
Fence more. When you do give Claude multiple pieces of context, mark what's what. Simple tags work:
<document>
[your source material here]
</document>
<what bob said>
Bob's dumb opinion
</what bob said>
<what sally said>
Sally's opinion
</what sally said>
<my thoughts>
my super awesome opinion
</my thoughts>
<instructions>
Summarize the key findings and flag anything that contradicts our Q2 projections.
</instructions>
This costs you maybe 20 extra tokens in tags. What it saves you is the turns where Claude misinterprets your instructions as part of the document, or treats your example as your actual request, and you have to clarify. Each clarification round hauls the full conversation history. Three rounds of "no, I meant THIS part is the document and THIS part is what I want you to do with it" costs orders of magnitude more than those 20 tokens of fencing ever would.
You don't need to learn XML or use any special syntax. Angle brackets with descriptive labels are enough. The point is giving Claude unambiguous boundaries so it gets it right the first time.
Your tools are eating your context and you don't know it
Every connector you have enabled takes up space in your conversation. Google Drive, Slack, Calendar, Linear, whatever you've connected. Each one loads tool definitions into your context window, and those definitions cost tokens whether you use the tools or not.
Claude has two tool access modes, and most people don't even know the setting exists. You'll find it by clicking the "+" button in the lower left of your chat, hovering over "Connectors," then "Tool access."
Tools already Loaded loads every connector into every conversation. If you have five connectors, fine. If you have fifteen, you're giving up a meaningful chunk of your context window before you type your first message.
Load tools when needed loads nothing until Claude searches for the right tool. This saves the most context but comes with a real tradeoff: Claude doesn't know what tools it has available until it goes looking. If your request doesn't obviously signal that a tool could help, Claude won't proactively offer to use one.
Anthropic recommends "Load tools when needed" if you have 10 or more connectors. That's good advice for token conservation, but understand what you're trading. If you need a specific connector to always be there, you can set individual connectors to "Always available" while keeping the rest on demand.
The bigger point: if you're not actively using a connector in your current conversation, it's just overhead. Disable what you don't need, conversation by conversation.
Extended thinking costs tokens too
With Opus 4.6 and Sonnet 4.6, extended thinking has shifted to "adaptive" mode. Claude now decides for itself whether a problem needs deep reasoning or a quick answer. At the default effort level (high), Claude almost always thinks. That thinking uses tokens.
For casual questions, quick edits, brainstorming, or anything that doesn't require multi-step reasoning, you're paying for cognitive overhead you don't need. You can toggle extended thinking off under "Search and tools." (One thing to note: switching between models starts a new conversation. Toggling thinking on or off within the same model does not.)
The adaptive system is genuinely smarter than the old binary toggle. But if you're watching your usage and doing straightforward work, turning it off for those sessions is free savings.
Tell Claude to shut up (nicely)
Most of the tips in this post are about reducing what you send to Claude. This one's about reducing what Claude sends back.
Claude's default mode is thorough. It explains its reasoning, adds context, provides caveats, and generally writes three paragraphs where one would do. That thoroughness is great when you're learning something. It's brutal on your token budget when you just need the answer.
Output tokens count against your usage just like input tokens do. A response that's 800 words when you needed 80 is burning ten times more tokens than necessary, and all of that excess content then sits in your conversation history, getting re-processed with every future message.
The fix is embarrassingly simple: tell Claude what you want. "Just the code, no commentary." "Answer in one sentence." "Give me the diff only." "Three bullet points max." These constraints cost you a few extra words in your prompt and can cut response tokens by half or more. You're not being rude. You're being specific about what you need.
This is especially relevant for iterative work. If you're going back and forth on a document and every revision comes back with a paragraph of "here's what I changed and why," those explanations are accumulating in your context window. "Apply the edits and return just the updated document" keeps the conversation lean.
You're not starting new conversations often enough
This is the most counterintuitive tip and probably the most impactful one.
Every message you send in a conversation includes the entire conversation history. Not a summary. Not a reference. The whole thing. Message one sends just your prompt. Message two sends your prompt plus the entire first exchange plus your new message. Message three sends all of that plus the second exchange. It compounds.
Here's what that actually looks like. Assume every prompt and every response is roughly the same size, and call each one unit, so a full exchange is two. Turn 1 costs 2 units. Turn 2 re-sends that first exchange plus the new one: 4 units. Turn 3 costs 6. Turn 4 costs 8. Each turn gets linearly more expensive, but the cumulative cost grows much faster. By turn 5 you've spent 30 units total. By turn 10, 110. If you'd done those same ten tasks as ten separate one-turn conversations, you'd have spent 20 units. The same work, five and a half times more expensive, just because you kept one conversation open.
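You can verify the arithmetic yourself. A sketch under the same simplifying assumption (every prompt and every response is one unit, so turn n re-sends the n-1 earlier exchanges plus its own):

```python
def turn_cost(n: int) -> int:
    """Turn n transmits n-1 prior exchanges (2 units each) plus its own."""
    return 2 * n

def conversation_cost(turns: int) -> int:
    """Total units spent across an entire multi-turn conversation."""
    return sum(turn_cost(n) for n in range(1, turns + 1))

print(conversation_cost(5))        # 30
print(conversation_cost(10))       # 110
print(10 * conversation_cost(1))   # 20: the same work as ten one-turn chats
```

The cumulative cost is quadratic in the number of turns; splitting work into fresh conversations keeps it linear.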
People treat conversations like documents. They keep adding to the same thread for hours, thinking they're being efficient by keeping everything in one place. They're actually doing the most expensive thing possible.
Claude now has automatic context management (compaction) that kicks in when conversations approach the 200K limit. It summarizes earlier messages to make room. That's better than hitting a wall, but compaction means Claude is working from a summary of what you said, not what you actually said. Details get lost. Nuance gets flattened. Instructions from the beginning of the conversation fade.
The better approach: when you finish one task and start another, open a new conversation. If you need continuity, use what I call a manual checkpoint. At a natural break point, ask Claude to produce a concise summary of everything established: decisions made, constraints confirmed, current state of the work. Copy it. Open fresh. Paste it in. A well-written checkpoint is typically 500 to 1,500 tokens and replaces 5,000 to 15,000 tokens of conversation history. That's an 80 to 95 percent reduction in what you carry forward, and unlike automatic compaction, you control exactly what gets preserved.
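The savings figure is simple division. A sketch with token counts in the ranges mentioned above:

```python
def carryforward_reduction(checkpoint_tokens: int, history_tokens: int) -> float:
    """Fraction of conversation-history tokens you stop carrying forward
    when a checkpoint summary replaces the full history."""
    return 1 - checkpoint_tokens / history_tokens

# A 1,000-token checkpoint replacing 10,000 tokens of history:
print(f"{carryforward_reduction(1_000, 10_000):.0%}")  # 90%
```

And unlike automatic compaction, the checkpoint contains exactly what you chose to preserve.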
While you're in a conversation, batch your questions. Every follow-up you send as a separate message hauls the full conversation history along for the ride. If you have three related questions, combining them into a single message means you pay the context cost once instead of three times.
And here's a related habit that sounds almost too simple to mention: construct your turn in a notepad before you send it. Not in the chat window. In a separate text editor, a notes app, wherever you think. Write out what you want, organize it, make sure you're including everything relevant (and nothing that isn't), then paste the whole thing into Claude. This does two things. First, it naturally batches your thoughts into one clean message instead of a stream of half-formed follow-ups. Second, it forces you to think about what you actually need before you spend tokens asking for it.
If Claude goes sideways on an approach, don't keep arguing in the same thread. Edit your earlier message and try a different direction. The failed attempt disappears from the active context instead of sitting there consuming tokens for the rest of the conversation.
Stop asking Claude to Google things for you
This one's going to be unpopular, but it needs to be said.
When you hit something you don't understand — a technical concept, an API behavior, a feature you've never used — the instinct is to ask Claude to research it for you. Claude will happily oblige. It'll fire off web searches, synthesize results, and give you a nice summary. It'll also burn through tokens doing it.
Here's the problem beyond the token cost: Claude's research is a summary of a summary. It searches, reads snippets, and compresses what it found into an answer. That's two layers of information loss before it reaches you. And if the documentation has been updated since Claude last trained, you're building on a foundation that might be wrong.
The better move: ask Claude to list search terms and point you to the sites likely to have what you need, then go to the documentation yourself. Read the relevant section. Copy the parts that matter. Paste them into the conversation as raw context.
This does two things. First, you actually understand what you're working with. Reading the docs forces you to engage with the material in a way that passively receiving a summary never will. Second, Claude gets the real source material instead of its own interpretation of search results. You're giving it ground truth instead of asking it to guess.
The pattern: research yourself, bring the context, let Claude do the thinking. That's using the tool correctly. Asking Claude to both find the information and process it is paying twice for a worse result.
Projects are free context (seriously)
Here's a detail from Anthropic's own documentation that most people completely overlook: content in Projects is cached and doesn't count against your usage limits when reused.
Read that again. If you upload documents to a Project's knowledge base, every conversation within that Project can reference those documents without re-uploading them and without those documents counting against your per-message usage. Compare that to pasting the same document into five separate conversations, paying for it five times.
Projects have two components that serve very different purposes, and understanding the distinction matters for token management.
Project Instructions load into every conversation within that project. Every single one. They're the persistent context that shapes how Claude behaves. Keep these lean: your role, key constraints, style preferences, the essentials. Every word here costs you tokens in every conversation. Think of Project Instructions like a function that runs in a hot loop. Strip everything that isn't load-bearing.
Project Knowledge — your uploaded files and documents — uses RAG on paid plans. That means Claude doesn't load all of it into every conversation. It searches the knowledge base and pulls in only what's relevant to your current question. You can upload a huge amount of material here and Claude will retrieve what it needs without blowing up your context window.
The practical implication: stop pasting the same reference material into conversations. Put it in a Project. Use Project Instructions for the short, essential context that should always be present. Use Project Knowledge for everything else.
Skills: teach Claude once, use it forever
Skills are the single most underused feature in Claude for token management, and almost nobody talks about them because most people don't know they exist.
Every time you explain something to Claude that you've explained before, you're paying tokens for repetition. Every time you re-establish your preferences, your constraints, your style requirements, your workflow, you're burning context on work you've already done. The principle is simple: anything you've figured out once should never cost you tokens again.
A Skill is a reusable instruction file — a markdown file called SKILL.md — that only loads when Claude needs it. Skills sit in your custom settings, and Claude reads them when it recognizes a task that matches the Skill's description. When the task doesn't call for it, the Skill costs you nothing.
Think about it practically. Say you've spent three conversations getting Claude to write in your company's voice. You've refined the tone, corrected the formatting, dialed in the vocabulary. All of that work exists in those conversations and nowhere else. Next week when you need the same thing, you start over. With a Skill, you capture those instructions once. Every future conversation in that Project gets the benefit without re-explaining anything.
The fastest way to build a Skill is to capture a workflow you've already refined. At the end of a conversation where you and Claude figured out the right approach to something, ask Claude to extract the working instructions into a Skill: "Based on the instructions and corrections I gave you in this conversation, use your skill-creator skill to capture this workflow so I can reuse it." Claude will pull together everything you taught it into a structured file you can upload to a Project.
Name your chats (Claude is terrible at it)
This sounds trivial. It isn't.
Claude auto-generates conversation titles from your first message. These titles are almost universally useless. "Help me with this document" or "Quick question about formatting" tells you nothing when you're scrolling through your chat history two days later.
Why this matters for token burn: Claude has memory and chat search now. It can search your previous conversations and pull relevant context into new ones. But search works on conversation titles and content. If all your conversations have garbage names, finding previous work gets harder. And when you can't find a previous conversation, you start a new one and re-explain everything from scratch. That's pure waste.
Get in the habit of renaming conversations the moment you start them. Give it something descriptive: "Q1 report analysis - client X" or "Blog post draft - token management." Future you will thank present you.
Project memory is isolated (use this deliberately)
Each Project in Claude has its own separate memory space and project summary, completely independent from your global memory and from other Projects. This is a feature, not a bug, but you need to understand it to use it well.
Your global memory (the synthesis Claude builds from your non-project conversations) doesn't automatically inform Project conversations. If you've spent weeks teaching Claude your preferences in standalone chats, none of that carries into a Project unless you set it up there.
The flip side: context you build within a Project stays in that Project. If you're doing client work, this is exactly what you want. No bleed between projects. No accidentally surfacing one client's information in another client's conversation.
Memory is essentially free context. It loads outside your conversation window and gives Claude a running understanding of who you are and what you're working on. Without it, you're re-explaining your background and preferences in every new conversation. Turn memory on (Settings > Capabilities) if you haven't already.
Schedule the heavy lifting for off-peak hours
Anthropic has made it official: your usage limits are not the same at all hours of the day. During peak hours (weekdays, 5am to 11am Pacific), you burn through your five-hour session limits faster. Outside those hours, and all day on weekends, you get significantly more room.
The practical implication is obvious once you see it: stop running your most token-hungry work during peak hours. That document analysis project where you're feeding Claude 80 pages of research? That's an evening task now. The long iterative coding session going back and forth twenty times on the same component? Move it to the afternoon.
If you're on a Pro or Max plan, Cowork has scheduled tasks. You can set up a recurring task, give it a prompt and the connectors it needs, and it runs automatically at whatever time you specify. Think about what that means for token management: say you have a weekly report that requires Claude to pull from Slack, summarize project activity, and format a status update. Schedule it for Friday at 4pm Pacific. Claude does the work while you're wrapping up for the week, and you come back to a finished report.
The real problem
All of these tips will genuinely help you stretch your usage further. But I want to be honest about something: the limits are real, they're tight, and they just got tighter during peak hours. Anthropic recently adjusted session limits so that during weekdays from 5am to 11am Pacific, you move through your five-hour allowance faster than before. Your weekly limits haven't changed, but how they're distributed across the day has. Roughly 7% of users are hitting session limits they wouldn't have before, especially on Pro plans.
If you came to Claude from ChatGPT expecting unlimited conversations, that's not what this is. Claude is a different tool with a different cost structure. The quality of the output is — in my experience — worth the trade-off, but it requires you to be more intentional about how you use it.
The people who thrive on Claude are the ones who think before they type, structure their work in Projects, keep conversations focused, and don't feed Claude the same information twice. That's the actual skill. And the good news is that every single thing in this post is something you can start doing in your next conversation.