21 Jan 2026
How to fix a 401 Unauthorized error when scraping websites
Get access by sending the right headers and authentication.
How to fix a 401 Unauthorized error when scraping websites: a step-by-step plan
- Confirm permission: Read the site’s terms and robots.txt. Use an official API if available.
- Check the URL: Use the correct path, domain, and protocol (https vs http).
- Identify the auth method: Look for Basic, Bearer (API key/OAuth), or session cookies.
- Reproduce in your browser: Log in normally and watch network requests in DevTools.
- Match required headers: Authorization, Cookie, Accept, Origin/Referer, and others.
- Keep a session: Store and send cookies, follow redirects, and include CSRF tokens if needed.
- Refresh tokens: Renew API tokens or sessions before they expire.
- Handle redirects: Let your client follow 302/303 to the login flow when required.
- Respect limits: Use sensible rates and backoff. Do not try to bypass restrictions.
- Ask for help: If you own the account or key, contact support with request details.
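The robots.txt part of the permission check can be automated. Below is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the bot name ExampleBot, and the URLs are made-up placeholders, and the site's terms of service still need a human read.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; in practice, fetch the real one from the site root,
# for example with RobotFileParser("https://example.com/robots.txt").read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the rules let user_agent fetch url."""
    return load_rules(robots_txt).can_fetch(user_agent, url)


def crawl_delay(robots_txt: str, user_agent: str):
    """Crawl-delay in seconds for user_agent, or None if the file sets none."""
    return load_rules(robots_txt).crawl_delay(user_agent)
```

If is_allowed returns False for the path you need, that is the point to stop or switch to the official API.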
Start with the server’s hint: WWW-Authenticate
Read the challenge
When you get a 401, check the WWW-Authenticate header in the response. It often tells you the scheme and next step:
- Basic realm="…" means send an Authorization: Basic header containing the Base64-encoded username:password.
- Bearer realm="…" or error="invalid_token" means you need a valid API key or OAuth token.
- Digest and other schemes are rare, but their challenges also tell you what the server expects.
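Reading the challenge by hand works, but a small parser helps when you log many failures. The following sketch splits one WWW-Authenticate value into a scheme and its parameters and maps it to the next step listed above. It is deliberately simplified: it assumes a single challenge per header and no escaped quotes. The function names are illustrative, not part of any library.

```python
import re
from typing import Dict, Tuple

# key=value or key="quoted value" pairs inside a challenge.
_PARAM_RE = re.compile(r'([\w-]+)=(?:"([^"]*)"|([^\s,]+))')


def parse_challenge(header: str) -> Tuple[str, Dict[str, str]]:
    """Split one WWW-Authenticate challenge into (scheme, params).

    Simplified: assumes a single challenge and no escaped quotes,
    which is less than the HTTP specification allows.
    """
    scheme, _, rest = header.strip().partition(" ")
    params = {key.lower(): quoted or bare
              for key, quoted, bare in _PARAM_RE.findall(rest)}
    return scheme.lower(), params


def next_step(header: str) -> str:
    """Map a challenge to a suggested fix."""
    scheme, params = parse_challenge(header)
    if scheme == "basic":
        return "send Authorization: Basic <base64 of username:password>"
    if scheme == "bearer" and params.get("error") == "invalid_token":
        return "refresh or replace the token, then retry"
    if scheme == "bearer":
        return "send Authorization: Bearer <valid token>"
    return f"read the documentation for the {scheme} scheme"
```

For example, next_step('Bearer error="invalid_token"') suggests refreshing the token rather than re-checking a password.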
Compare a successful request
Use your browser’s DevTools:
- Log in or access the page as a normal user (when allowed).
- Open the Network tab and capture the successful request.
- Compare headers, cookies, query params, and the request method with your script.
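Comparing headers by eye is error-prone because HTTP header names are case-insensitive. A minimal sketch of the comparison follows, assuming you have copied the browser's request headers from the Network tab into a dict; the header values shown are placeholders.

```python
from typing import Dict, List


def header_diff(browser: Dict[str, str], script: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two sets of request headers; header names are case-insensitive."""
    b = {name.lower(): value for name, value in browser.items()}
    s = {name.lower(): value for name, value in script.items()}
    return {
        "missing": sorted(b.keys() - s.keys()),   # browser sends these, the script does not
        "extra": sorted(s.keys() - b.keys()),     # only the script sends these
        "different": sorted(k for k in b.keys() & s.keys() if b[k] != s[k]),
    }


browser_headers = {  # copied from the DevTools Network tab (placeholder values)
    "Accept": "application/json",
    "Cookie": "sessionid=abc123",
    "Referer": "https://example.com/dashboard",
}
script_headers = {"accept": "*/*", "User-Agent": "Python-urllib/3.12"}
print(header_diff(browser_headers, script_headers))
# {'missing': ['cookie', 'referer'], 'extra': ['user-agent'], 'different': ['accept']}
```

The "missing" list is usually the first place to look; a missing Cookie header alone explains many 401s.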
Handle the common authentication patterns
Basic authentication
Some endpoints still use Basic auth. If you have a username and password, join them as username:password, Base64-encode the result, and send it as Authorization: Basic followed by the encoded value. Only do this over HTTPS: Base64 is an encoding, not encryption, so anyone who sees the header can read your credentials. If the server still answers with a 401 and a Basic challenge, re-check your username, password, and encoding.
API keys and Bearer tokens
APIs often use keys or OAuth tokens:
- Place the key or token exactly where the docs say (often Authorization: Bearer YOUR_TOKEN).
- Confirm the token has the right scope or role for the endpoint.
- Refresh tokens before expiry. Many 401s come from expired tokens.
- Check your system clock. Large clock skew can break signed requests.
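Both header formats and the "refresh before expiry" rule fit in a few lines. This sketch assumes you track the token's expiry as a Unix timestamp (for example, computed from an expires_in value in an OAuth token response); the 60-second leeway is an arbitrary choice, and UTF-8 for Basic credentials is an assumption worth checking against the API's docs.

```python
import base64
import time
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Authorization value for HTTP Basic: Base64 of "username:password"."""
    raw = f"{username}:{password}".encode("utf-8")  # UTF-8 is an assumption
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_auth_header(token: str) -> str:
    """Authorization value for an API key or OAuth access token sent as Bearer."""
    return f"Bearer {token}"


def token_needs_refresh(expires_at: float, now: Optional[float] = None,
                        leeway: float = 60.0) -> bool:
    """True if the token expires within `leeway` seconds.

    Refreshing a little early avoids a 401 for a token that expires
    mid-request, and the leeway absorbs small clock differences.
    """
    if now is None:
        now = time.time()
    return now >= expires_at - leeway


print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Checking token_needs_refresh before each request, instead of waiting for a 401, keeps invalid_token errors out of your logs.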
Sessions, cookies, and CSRF
Websites that need login often use session cookies. They may also require a CSRF token for POST requests:
- Start a session, visit the login page, and get the CSRF token if present.
- Submit the login form with the token, then store the cookies from the Set-Cookie headers.
- Send the cookies with each request, and include the CSRF token for protected actions.
- Follow redirects after login. Many sites set cookies during redirects.
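With Python's standard library, the session part can be handled by a shared cookie jar, and the CSRF token can be read from the login page's HTML. The sketch below assumes the token sits in a hidden input named csrf_token; that field name is hypothetical, so copy the real one from DevTools.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from typing import Optional, Tuple


class CsrfTokenParser(HTMLParser):
    """Finds the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name: str):
        super().__init__()
        self.field_name = field_name
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.token = attributes.get("value")


def extract_csrf_token(html: str, field_name: str = "csrf_token") -> Optional[str]:
    """Return the CSRF token from a login page, or None if the field is absent."""
    parser = CsrfTokenParser(field_name)
    parser.feed(html)
    parser.close()
    return parser.token


def make_session_opener() -> Tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """An opener that stores cookies from every response, including redirect
    responses, and sends them back on later requests to the same site."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Reuse the same opener for the login POST and every later request; creating a new opener per request throws the session away.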
Headers that matter more than you think
Some servers verify not just who you are, but also how you ask:
- Authorization: Required for Basic/Bearer. Spacing and capitalization matter.
- Cookie: Include the session cookie after login. Keep it fresh.
- Accept and Content-Type: Match what the site expects, like application/json for APIs.
- Origin and Referer: Some endpoints block requests without the right origin/referrer.
- Accept-Language: Send a simple, valid value if the server relies on it.
- User-Agent: Use a clear, honest user-agent that identifies your crawler and contact email.
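One way to keep these headers consistent is to build them in a single place. This is a sketch under stated assumptions: the crawler name and contact address are placeholders, Accept assumes a JSON API (an HTML page would want text/html), and the Bearer scheme stands in for whatever your API documents.

```python
from typing import Dict, Optional

# Placeholder crawler name and contact address; replace with your own.
USER_AGENT = "ExampleCrawler/1.0 (+mailto:crawler-admin@example.com)"


def build_headers(token: Optional[str] = None, cookie: Optional[str] = None,
                  origin: Optional[str] = None, referer: Optional[str] = None,
                  json_body: bool = False) -> Dict[str, str]:
    """Assemble request headers; only the ones you pass in are added."""
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "application/json",
        "Accept-Language": "en-US,en;q=0.8",
    }
    if json_body:
        headers["Content-Type"] = "application/json"  # only when sending a JSON body
    if token:
        headers["Authorization"] = f"Bearer {token}"  # scheme, one space, token
    if cookie:
        headers["Cookie"] = cookie
    if origin:
        headers["Origin"] = origin
    if referer:
        headers["Referer"] = referer
    return headers
```

Leaving optional headers out unless you need them makes it easier to tell which one the server actually checks.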
Follow the flow, not just the URL
A 401 may appear when you skip steps. Many flows expect you to:
- Get a CSRF token from a GET page, then POST with that token.
- Redirect through a login gateway, then land on the resource.
- Consent to scopes in OAuth before using the token.
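The GET-then-POST login flow can be sketched with urllib alone. The form field names (username, password, csrf_token) are hypothetical, and get_token is any function you supply that pulls the token out of the login page's HTML. The opener is expected to keep cookies between calls.

```python
import urllib.parse
import urllib.request
from typing import Callable, Tuple


def build_login_request(login_url: str, username: str, password: str,
                        csrf_token: str, referer: str,
                        csrf_field: str = "csrf_token") -> urllib.request.Request:
    """POST request for a typical HTML login form (field names are hypothetical)."""
    form = urllib.parse.urlencode({
        "username": username,
        "password": password,
        csrf_field: csrf_token,
    }).encode("ascii")
    return urllib.request.Request(
        login_url,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Referer": referer,
        },
    )


def login(opener: urllib.request.OpenerDirector, login_page_url: str, login_url: str,
          username: str, password: str,
          get_token: Callable[[str], str]) -> Tuple[str, int]:
    """GET the login page, then POST the form with its CSRF token.

    `opener` should keep cookies between calls (for example, one built with
    HTTPCookieProcessor). Its default redirect handler follows a 302/303
    after the POST as a GET, so the returned URL is where the flow landed.
    """
    with opener.open(login_page_url) as page:
        html = page.read().decode("utf-8", errors="replace")
    request = build_login_request(login_url, username, password,
                                  get_token(html), referer=login_page_url)
    with opener.open(request) as response:
        return response.geturl(), response.status
```

If the returned URL is still the login page, the POST was probably rejected; compare the form fields with what the browser sends.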
Respect rules and reduce block signals
Some protection systems respond with 401 when they suspect automation. To avoid triggering them:
- Respect robots.txt and the site’s terms. Stop if disallowed.
- Use the official API when offered. It is more stable and faster to integrate.
- Add a delay between requests and back off on errors. Avoid bursts.
- Cache results to cut repeat requests.
- Contact the site owner for permission or higher rate limits if needed.
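Backing off means waiting longer after each failed attempt instead of retrying immediately. A minimal sketch of exponential backoff with "full jitter" follows; the base and cap values are arbitrary. It honors a Retry-After header only in its seconds form (the header can also carry an HTTP date, which this sketch ignores). Backoff suits rate-limit responses such as 429 or 503; a 401 caused by bad credentials will not fix itself with retries.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  retry_after: Optional[str] = None, jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (0 for the first retry).

    If the server sent Retry-After as a number of seconds, honor it.
    Otherwise use exponential backoff capped at `cap`, optionally with
    full jitter so many clients do not retry in lockstep.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


for attempt in range(5):
    print(attempt, backoff_delay(attempt, jitter=False))  # delays: 1.0, 2.0, 4.0, 8.0, 16.0
```

Pair this with a fixed pause between normal requests (for example, the site's Crawl-delay) so you avoid bursts in the first place.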
Troubleshoot with a simple checklist
If you use an API
- Is the endpoint correct and not a different environment (prod vs staging)?
- Is your API key or token valid, unexpired, and scoped for this endpoint?
- Are you sending the Authorization header exactly as documented?
- Is your system clock in sync (use NTP)?
- Are you following the required content type and method (GET vs POST)?
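A quick way to check the clock question without extra tools is to compare your local time with the Date header on any response from the server. This sketch uses email.utils.parsedate_to_datetime from the standard library; the header value in the example is made up. Date has one-second resolution and includes network delay, so only a large difference is meaningful.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def clock_skew_seconds(date_header: str, local_now: Optional[datetime] = None) -> float:
    """Server time minus local time, from a response's Date header.

    Small values are noise; a skew of minutes suggests the local clock
    is not synced (for example via NTP).
    """
    server_now = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (server_now - local_now).total_seconds()


local = datetime(2026, 1, 21, 10, 0, 0, tzinfo=timezone.utc)
print(clock_skew_seconds("Wed, 21 Jan 2026 10:00:30 GMT", local))  # 30.0
```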
If you scrape a logged-in site
- Can you log in manually? If not, fix your credentials first.
- Did you store and send every cookie from the login response’s Set-Cookie headers?
- Do you send the CSRF token for POST/PUT/DELETE?
- Do you maintain the same session across requests?
- Do Origin and Referer match what the server expects?
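To check that no cookie was dropped, you can fold every Set-Cookie value into a single Cookie header and compare it with what the browser sends. This sketch uses http.cookies.SimpleCookie and deliberately ignores Domain, Path, Expires, and Secure, which is why a real cookie jar (http.cookiejar) is the safer default for actual requests. The cookie names and values are placeholders.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def merge_set_cookies(set_cookie_values: Iterable[str],
                      jar: Dict[str, str]) -> Dict[str, str]:
    """Fold each Set-Cookie header value into a name -> value map (later values win).

    Attributes such as Path, Domain, and Expires are parsed but ignored here.
    """
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


def cookie_header(jar: Dict[str, str]) -> str:
    """Value for the Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


login_response_cookies = [  # placeholder values from a hypothetical login response
    "sessionid=abc123; Path=/; HttpOnly",
    "csrftoken=xyz789; Path=/",
]
jar = merge_set_cookies(login_response_cookies, {})
print(cookie_header(jar))  # sessionid=abc123; csrftoken=xyz789
```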
If nothing works
- Capture the full request and response headers (with secrets redacted) and compare them with a working example.
- Look for hints in WWW-Authenticate and response bodies.
- Try the same request in a tool like Postman to separate problems in your code from problems with the request itself.
- Reach out to the site or API support with timestamps, correlation IDs, and a sample request.
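Before sharing captured headers with support, mask anything that carries credentials. A minimal sketch: the set of sensitive names is an assumption (X-Api-Key is a common convention, not a standard), so extend it to match your API.

```python
from typing import Dict, Mapping

# Header names that usually carry credentials; adjust for your API.
SENSITIVE = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}
SCHEME_HEADERS = {"authorization", "proxy-authorization"}


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Copy headers for a support ticket, masking credential values.

    For Authorization headers the scheme (e.g. "Bearer") is kept, because
    support usually needs to know which auth method you used.
    """
    clean = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SENSITIVE:
            clean[name] = value
        elif lowered in SCHEME_HEADERS and " " in value:
            clean[name] = value.split(" ", 1)[0] + " [REDACTED]"
        else:
            clean[name] = "[REDACTED]"
    return clean
```

Timestamps and correlation IDs pass through unchanged, which is what support needs to find your request in their logs.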
Practical examples to guide your fixes
Example: API returns 401 invalid_token
You make a request with an Authorization: Bearer token and get 401 invalid_token. Check the token’s expiry. Refresh it with the refresh token, then retry. If scopes are wrong, request the missing scope and get a new token.
Example: Website returns 401 after login
Your script logs in, but the first data request returns 401. Check whether the login response set more than one cookie; store and send them all. Confirm that you followed redirects and ended on the final domain. If the next request is a POST, include the CSRF token the page provides.
Example: 401 only when sending JSON
Some servers require Content-Type: application/json and a CSRF token for state changes. If you send form data by mistake, the server may reject the request. Match the Content-Type, and send the CSRF token as a header or body field, the same way the site does in DevTools.
When to rethink your approach
Sometimes the cleanest fix is to stop scraping the page and switch to the official API. You get stable auth, clear rate limits, and fewer errors. If the site blocks scraping by policy, do not try to work around it. Ask for data access, partner with the site, or adjust your project scope.
In short, fixing a 401 Unauthorized error when scraping websites starts with following the site’s real auth flow, sending the right headers and cookies, and staying within the rules. Trace the request, match what works in your browser, and keep tokens and sessions fresh. If the server says no, ask for access or use the API.