
Authentication

Many websites require authentication before they serve real content. CSP Analyser supports three authentication patterns, each suited to a different workflow.

Storage State File

This is the recommended approach for repeatable, automated analysis. A storage state file is a JSON snapshot of cookies and localStorage in Playwright's storage state format, which CSP Analyser extends with sessionStorage.

Generating a storage state file

Use the interactive command with --save-storage-state to log in manually and export the session:

bash
# Open a headed browser, log in, browse around, then close the browser
csp-analyser interactive https://app.example.com --save-storage-state auth.json

When you close the browser, the session's cookies, localStorage, and sessionStorage are saved to auth.json with secure file permissions (0600). Permissions are enforced even when overwriting an existing file, and symlink targets are rejected. You can then reuse this file for headless crawls.
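The permission handling described above can be illustrated with a short Node.js sketch. The helper name `writeSecretFile` is hypothetical and this is not CSP Analyser's actual code; it only shows one way to reject symlinks and enforce 0600 on both new and existing files.

```typescript
import * as fs from "node:fs";

// Hypothetical helper (not part of CSP Analyser's API): writes a secret file
// with owner-only permissions and refuses to write through a symlink.
function writeSecretFile(filePath: string, contents: string): void {
  // lstat does not follow symlinks, so a symlink at filePath is detected here.
  const existing = fs.lstatSync(filePath, { throwIfNoEntry: false });
  if (existing?.isSymbolicLink()) {
    throw new Error(`Refusing to write through symlink: ${filePath}`);
  }
  // The `mode` option only applies when the file is created...
  fs.writeFileSync(filePath, contents, { mode: 0o600 });
  // ...so chmod explicitly to cover the overwrite case as well.
  fs.chmodSync(filePath, 0o600);
}
```

The explicit `chmodSync` matters because `writeFileSync` keeps the existing mode when it overwrites a file. A production implementation would also need to consider the race between the symlink check and the write.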

Workflow: interactive login → headless crawl

bash
# Step 1: Log in interactively and save the session
csp-analyser interactive https://app.example.com --save-storage-state auth.json

# Step 2: Use the saved session for a deep headless crawl
csp-analyser crawl https://app.example.com --storage-state auth.json --depth 3 --max-pages 50

This is the recommended workflow for authenticated sites. Log in once interactively, then run repeatable headless crawls with the saved state.

You can also generate a storage state file with Playwright directly:

bash
npx playwright codegen --save-storage=auth.json https://app.example.com

Using a storage state file

From the CLI:

bash
csp-analyser crawl https://app.example.com --storage-state auth.json

From the MCP tools:

json
// start_session or crawl_url tool
{
  "targetUrl": "https://app.example.com",
  "storageStatePath": "/absolute/path/to/auth.json"
}

The file must exist on disk and have a .json extension. Symlinks are resolved before the check to prevent path traversal, so the real target must also end in .json.
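As a rough illustration of these checks, the sketch below validates a path the same way in Node.js. The function name `resolveStorageStatePath` is an assumption made for this example, and the real implementation may differ in detail (for instance in how it treats extension case).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns the real path of a storage state file,
// or throws if any of the documented checks fails.
function resolveStorageStatePath(input: string): string {
  if (path.extname(input) !== ".json") {
    throw new Error("Storage state file must have a .json extension");
  }
  // realpathSync resolves symlinks and throws ENOENT if the file is missing.
  const realPath = fs.realpathSync(input);
  // Re-check the extension on the resolved target, so a link named
  // auth.json cannot point at an arbitrary file.
  if (path.extname(realPath) !== ".json") {
    throw new Error("Symlink target must also have a .json extension");
  }
  return realPath;
}
```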

Cookie Injection

Cookie injection is available through the MCP start_session, crawl_url, and audit_policy tools. Use it when you already have session cookies (e.g., extracted from browser DevTools or a login API response). Cookies are injected into a fresh browser context before navigation.

json
// MCP start_session tool - cookies parameter
{
  "targetUrl": "https://app.example.com",
  "cookies": [
    {
      "name": "session_id",
      "value": "abc123",
      "domain": "app.example.com",
      "path": "/",
      "httpOnly": true,
      "secure": true,
      "sameSite": "Lax"
    }
  ]
}

Only name and value are required. Optional fields are domain, path, httpOnly, secure, and sameSite (Strict, Lax, or None). If domain is omitted, it defaults to the target URL's hostname. If path is omitted, it defaults to /.
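The defaulting rules can be sketched as follows. The `CookieInput` type and the `withCookieDefaults` function are illustrative names chosen for this example, not the tool's own identifiers.

```typescript
// Illustrative type mirroring the documented cookie fields.
interface CookieInput {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  httpOnly?: boolean;
  secure?: boolean;
  sameSite?: "Strict" | "Lax" | "None";
}

// Hypothetical helper: fills in domain and path as described above.
function withCookieDefaults(
  cookie: CookieInput,
  targetUrl: string
): CookieInput & { domain: string; path: string } {
  return {
    ...cookie,
    // Default domain: the hostname of the target URL.
    domain: cookie.domain ?? new URL(targetUrl).hostname,
    // Default path: the site root.
    path: cookie.path ?? "/",
  };
}
```

For example, `withCookieDefaults({ name: "session_id", value: "abc123" }, "https://app.example.com/dashboard")` yields a cookie scoped to `app.example.com` with path `/`.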

Cookie names and values are validated against RFC 6265:

  • Names must be valid HTTP tokens (no control characters, spaces, or separators)
  • Values must not contain control characters, semicolons, spaces, double quotes, commas, or backslashes
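A minimal sketch of these two rules, assuming the standard HTTP token grammar for names and the RFC 6265 cookie-octet set for values. The function names are hypothetical, and the tool's actual validation code may be stricter or structured differently.

```typescript
// HTTP token characters: visible ASCII except separators such as ( ) < > @ , ; : \ " / [ ] ? = { }
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// RFC 6265 cookie-octet: printable ASCII except space, double quote,
// comma, semicolon, and backslash.
const COOKIE_VALUE = /^[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*$/;

function isValidCookieName(name: string): boolean {
  return TOKEN.test(name);
}

function isValidCookieValue(value: string): boolean {
  return COOKIE_VALUE.test(value);
}
```

Rejecting semicolons and control characters is what stops a crafted value such as `abc; Domain=evil.example` from smuggling extra attributes into a Cookie header.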

Manual Login (Interactive Mode)

Use this pattern for sites with complex login flows (MFA, CAPTCHA, SSO redirects) that cannot be captured as cookies or storage state.

bash
csp-analyser interactive https://app.example.com

This opens a headed (visible) Chromium browser. You log in manually, browse the pages you want to analyse, and close the browser tab when done. CSP Analyser captures violations in real time while you browse.

WARNING

Manual login requires headed mode. It cannot be used in headless environments (CI, SSH without X11, containers). Use a storage state file for those environments.

Choosing an auth pattern

| Pattern | Best for | Repeatable | CI-friendly |
| --- | --- | --- | --- |
| Storage state | Automated crawls, CI pipelines | Yes | Yes |
| Cookie injection | MCP agent workflows | Yes | Yes |
| Manual login | Complex auth flows, initial exploration | No | No |

sessionStorage and token-based auth (MSAL / Azure AD B2C)

Many modern SPAs use token-based authentication (MSAL, Auth0, Firebase Auth) where tokens are stored in sessionStorage rather than cookies. Playwright's built-in storageState() only captures cookies and localStorage — sessionStorage is normally lost.

CSP Analyser extends the storage state format to also capture and restore sessionStorage. When you use --save-storage-state, sessionStorage is captured through multiple mechanisms to ensure no tokens are lost:

  • beforeunload handler (via addInitScript) — captures the final state right before each page closes, surviving all navigations during the auth flow
  • load + 1s delay — catches async MSAL token writes after auth redirects
  • 5-second periodic snapshots — catches silent token refreshes between page loads

The captured entries are written into the JSON file alongside cookies and localStorage.
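The three capture triggers can be sketched as a small function that would run inside the page. Everything here is illustrative: `installSessionStorageCapture`, the `WindowLike` interface, and the `report` callback are assumptions for this example, and how the real tool sends snapshots back to Node.js is not covered by this page.

```typescript
// Minimal structural types so the sketch runs without a browser.
interface StorageLike {
  length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
}
interface WindowLike {
  sessionStorage: StorageLike;
  addEventListener(type: string, listener: () => void): void;
  setTimeout(fn: () => void, ms: number): unknown;
  setInterval(fn: () => void, ms: number): unknown;
}

// Copies every key/value pair out of a storage object.
function snapshotStorage(storage: StorageLike): Record<string, string> {
  const entries: Record<string, string> = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key !== null) entries[key] = storage.getItem(key) ?? "";
  }
  return entries;
}

// Hypothetical installer wiring up the three documented triggers.
function installSessionStorageCapture(
  win: WindowLike,
  report: (entries: Record<string, string>) => void
): void {
  const capture = () => report(snapshotStorage(win.sessionStorage));
  win.addEventListener("beforeunload", capture); // final state before unload
  win.addEventListener("load", () => win.setTimeout(capture, 1000)); // late token writes
  win.setInterval(capture, 5000); // silent refreshes between page loads
}
```

In a real browser, `win` would be `window`, and the function would be registered with Playwright's `addInitScript` so that it is installed on every navigation.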

When you later use --storage-state to load the file, CSP Analyser:

  1. Passes the file to Playwright (restoring cookies + localStorage as normal)
  2. Reads the sessionStorage extension from the file
  3. Registers an addInitScript that calls sessionStorage.setItem() for each entry — this runs before any page JavaScript, so tokens are available when frameworks like MSAL initialize

This means MSAL access tokens, ID tokens, and refresh tokens are restored correctly:

bash
# Step 1: Log in via Azure AD B2C and save everything
csp-analyser interactive https://app.example.com --save-storage-state auth.json

# Step 2: Headless crawl with full auth state (including sessionStorage tokens)
csp-analyser crawl https://app.example.com --storage-state auth.json

TIP

If the crawl still redirects to login, the tokens have likely expired. MSAL access tokens typically expire after 1 hour. Re-run the interactive login to generate a fresh storage state file.

Cross-origin sessionStorage

Only sessionStorage entries for the target origin are restored. Cross-origin sessionStorage (e.g. from the identity provider domain) cannot be injected and is not useful for CSP crawling.
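The restore step and the origin restriction can be sketched together. The on-disk shape of the sessionStorage extension is not documented on this page, so the `SessionStorageByOrigin` type below is a guess made for illustration, as is the `buildRestoreScript` helper.

```typescript
// Assumed shape: origin -> key/value pairs. The real file format may differ.
type SessionStorageByOrigin = Record<string, Record<string, string>>;

// Hypothetical helper: builds the source of an init script that restores
// sessionStorage entries for the target origin only.
function buildRestoreScript(
  entriesByOrigin: SessionStorageByOrigin,
  targetOrigin: string
): string {
  const entries = entriesByOrigin[targetOrigin] ?? {};
  return `(() => {
  // Init scripts run on every page, so skip pages on other origins.
  if (window.location.origin !== ${JSON.stringify(targetOrigin)}) return;
  const entries = ${JSON.stringify(entries)};
  for (const [key, value] of Object.entries(entries)) {
    window.sessionStorage.setItem(key, value);
  }
})();`;
}
```

With Playwright, the resulting string could be registered via `context.addInitScript({ content: script })`, which runs it before any page JavaScript.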

How CSP injection works with auth redirects

When you analyse an authenticated site, the browser navigates through external identity providers (Azure AD B2C, Okta, Auth0, etc.) before returning to your app. CSP Analyser needs to inject its deny-all CSP header on your app's pages — but not on the IdP's pages, which would pollute the generated policy with violations from the IdP's own resources.

The redirect challenge

CSP Analyser intercepts all HTTP requests via Playwright's route handler and injects CSP headers only on responses from the target origin. Requests to other origins (the IdP login page, intermediate auth endpoints) pass through unmodified.

The challenge is that Playwright's route handler does not re-intercept requests that result from HTTP 302 redirects. When the IdP responds with 302 Location: https://your-app.com/auth/callback, the browser follows the redirect internally and the route handler never sees the callback request — so CSP headers are never injected on the post-auth page.

The workaround

CSP Analyser works around this by intercepting non-target-origin navigation requests and replacing HTTP redirect responses (301/302/303) with a small HTML page that performs the same redirect via JavaScript (window.location.href). The browser processes any Set-Cookie headers from the original response (preserving auth state), then navigates to the redirect target as a fresh request that the route handler can intercept and inject CSP on.

This happens transparently — you won't notice the intermediate page during the auth flow.
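A minimal sketch of the replacement page, assuming it only needs to navigate to the original Location value. The helper name `buildJsRedirectPage` is hypothetical, and the actual markup CSP Analyser serves may differ.

```typescript
// Hypothetical helper: builds an HTML page that navigates to `location`
// via JavaScript instead of an HTTP redirect.
function buildJsRedirectPage(location: string): string {
  // JSON.stringify yields a valid JavaScript string literal. Escaping "<"
  // keeps a hostile URL from closing the <script> element early.
  const literal = JSON.stringify(location).replace(/</g, "\\u003c");
  return (
    '<!doctype html><html><head><meta charset="utf-8"></head><body>' +
    `<script>window.location.href = ${literal};</script>` +
    "</body></html>"
  );
}
```

In a Playwright route handler, one plausible wiring is to fetch the response with `route.fetch({ maxRedirects: 0 })` and then call `route.fulfill()` with status 200, this page as the body, and the original Set-Cookie headers carried over.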

Multi-hop redirect chains

Some identity providers redirect through multiple intermediate endpoints before returning to your app (e.g., auth/authorize → ProcessAuth → kmsi → app/callback). Each hop is handled individually: the route handler rewrites each redirect as a JS navigation, so the browser's full cookie jar is available at every step. Cookie-dependent intermediate endpoints work correctly.

Known limitation: 307/308 redirects

HTTP 307 and 308 redirects preserve the request method and body (unlike 302, which browsers convert to GET). When such a redirect targets your app's origin, CSP Analyser rewrites it as a JS redirect to ensure CSP injection, which converts the request to GET. This can change the behaviour of callback endpoints that depend on POST semantics.

In practice this rarely matters:

  • OAuth 2.0 and OIDC authorization responses conventionally use 302 (the specifications' examples do, although other redirect methods are permitted), so 307/308 on the auth return leg is essentially non-existent
  • SPA callback endpoints typically accept GET (the authorization code is in the URL query string)
  • The alternative (no CSP injection) would silently miss all violations on the post-auth page

For intermediate hops between external origins, 307/308 are passed through unchanged to preserve method and body semantics.
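The redirect rules from this section can be summarised as one decision function. This is a simplified sketch based only on the behaviour described here: the names `RedirectAction` and `decideAction` are hypothetical, and the check that a request is a navigation request is left out.

```typescript
type RedirectAction = "inject-csp" | "rewrite-as-js" | "pass-through";

// Hypothetical decision function for one intercepted response.
function decideAction(
  requestUrl: string,
  status: number,
  location: string | undefined,
  targetOrigin: string
): RedirectAction {
  // Responses from the target origin get the CSP header.
  if (new URL(requestUrl).origin === targetOrigin) return "inject-csp";
  // Non-redirect responses from other origins are left alone.
  if (location === undefined) return "pass-through";
  // 301/302/303 already end in a GET, so rewriting them is safe on every hop.
  if (status === 301 || status === 302 || status === 303) return "rewrite-as-js";
  if (status === 307 || status === 308) {
    // Location may be relative, so resolve it against the request URL.
    const destination = new URL(location, requestUrl).origin;
    // Rewrite only when the redirect lands on the app; otherwise preserve
    // method and body by leaving the redirect untouched.
    return destination === targetOrigin ? "rewrite-as-js" : "pass-through";
  }
  return "pass-through";
}
```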

Security notes

DANGER

Storage state files and cookies contain session secrets. Never commit them to version control.

  • Add auth.json and *.storage-state.json to your .gitignore
  • Storage state files can contain localStorage and sessionStorage data, which may include auth tokens, user data, or API keys
  • The --storage-state path is resolved through symlinks using fs.realpathSync() to prevent symlink-based path traversal attacks
  • Cookie values are validated against RFC 6265 to prevent header injection

FAQ

How long does a storage state file stay valid?

It depends on the site's session expiry. Most session cookies expire after hours or days. If a crawl fails with authentication errors, regenerate the storage state by logging in again with csp-analyser interactive --save-storage-state auth.json.
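To check a saved file before a crawl, one option is to inspect cookie expiry times. The sketch below relies on Playwright's storage state format, where each cookie's `expires` is a Unix timestamp in seconds and -1 marks a session cookie. The helper name `expiredCookieNames` is hypothetical.

```typescript
// Subset of the cookie fields in a Playwright storage state file.
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 for session cookies
}

// Hypothetical helper: lists cookies whose expiry time has already passed.
function expiredCookieNames(cookies: StoredCookie[], nowMs: number): string[] {
  const nowSeconds = nowMs / 1000;
  return cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => cookie.name);
}
```

The input would typically come from the `cookies` array of the parsed auth.json, with `Date.now()` as the second argument. This check does not cover tokens held in localStorage or sessionStorage, which carry their own expiry.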

Can I use OAuth / SSO / MFA?

Yes — use the interactive command to log in through any browser-based flow (OAuth redirects, SAML SSO, TOTP/MFA). Once logged in, save the session with --save-storage-state for reuse in headless crawls.

Can I inject cookies from the CLI?

No. Cookie injection is currently only available through the MCP tools. The CLI supports storage state files (--storage-state) and interactive login. For CLI workflows, generate a storage state file first, then pass it to crawl or audit.

Can I analyse multiple authenticated sites in one session?

No. Each session targets a single URL origin. Run separate crawls for each site, each with its own storage state file or cookie set.


Released under the MIT License.