crawl
Headlessly crawl a website, capture CSP violations, and generate a policy.
Usage
csp-analyser crawl <url> [options]
Options
| Option | Default | Description |
|---|---|---|
| --depth <n> | 1 | How many links deep to follow from the start URL. 0 means visit only the start URL. |
| --max-pages <n> | 10 | Maximum number of pages to visit before stopping. Must be a positive integer. |
| --strictness <level> | moderate | Policy generation strictness: strict, moderate, or permissive. |
| --format <fmt> | header | Output format: header, meta, nginx, apache, cloudflare, cloudflare-pages, azure-frontdoor, helmet, or json. |
| --storage-state <path> | -- | Path to a Playwright storage state JSON file for authenticated crawling. |
| --violation-limit <n> | 10000 | Maximum violations to capture per session. Set to 0 for unlimited. |
| --nonce | false | Replace 'unsafe-inline' with nonce placeholders in script/style directives. |
| --strict-dynamic | false | Add 'strict-dynamic' alongside nonces in script directives. Implies --nonce. |
| --hash | false | Compute SHA-256 hashes for all inline content and remove 'unsafe-inline' from directives that have hash sources. |
| --strip-unsafe-eval | false | Remove 'unsafe-eval' from the generated policy even if violations were captured for it. |
| --report-only | false | Generate a Content-Security-Policy-Report-Only header instead of an enforcing one. |
How crawling works
The crawler uses breadth-first search (BFS) starting from the provided URL:
- Visit the start URL at depth 0
- Extract all <a href="..."> links from the page
- Filter to same-origin links only (links to other origins are ignored)
- Normalize URLs (strip fragments, resolve relative paths)
- Add unvisited links to the queue at depth + 1
- Repeat until --max-pages is reached or no more links remain within --depth
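
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not csp-analyser's actual implementation: `crawl`, `normalize`, `same_origin`, and the `get_links` callback are hypothetical names, and an in-memory link map (`SITE`) stands in for a real browser.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def normalize(base: str, href: str) -> str:
    """Resolve href against base and strip any #fragment."""
    return urldefrag(urljoin(base, href)).url


def same_origin(a: str, b: str) -> bool:
    """Loose origin check: scheme, host, and explicit port must match.

    Default ports are not normalized here (an explicit :443 would not
    match an implicit one); a real crawler would handle that case.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def crawl(start, get_links, depth=1, max_pages=10):
    """Breadth-first crawl; returns the URLs in the order they were visited."""
    start = normalize(start, "")
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)  # "visit" the page
        if level >= depth:
            continue  # do not follow links beyond the depth limit
        for href in get_links(url):
            link = normalize(url, href)
            if same_origin(start, link) and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


# Hypothetical site used only for this demonstration.
SITE = {
    "https://example.com/": ["/about", "/blog#top", "https://cdn.example.net/x"],
    "https://example.com/about": ["/team"],
    "https://example.com/blog": ["/blog/post-1"],
}


def fake_links(url):
    return SITE.get(url, [])


if __name__ == "__main__":
    for page in crawl("https://example.com/", fake_links, depth=1, max_pages=10):
        print(page)
```

With depth 1, the sketch visits the start URL plus its two same-origin links; the cross-origin link is dropped and `/blog#top` is treated as `/blog`.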
On each page, a deny-all Content-Security-Policy-Report-Only header is injected. Every resource the page tries to load triggers a violation, which is captured via both DOM event listeners and an HTTP report endpoint. After each page loads, the crawler also extracts all inline content (<script>, <style>, event handlers, style attributes) from the DOM and computes SHA-256 hashes. After crawling completes, these violations and inline hashes are aggregated into a minimal policy.
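
As a rough illustration of the hashing and aggregation described above, the Python sketch below computes a CSP hash source for a piece of inline content and folds violations into a single policy string. The function names, the `(directive, source)` input shape, and the `default-src 'none'` baseline are assumptions made for this example; the tool's real data model and output may differ.

```python
import base64
import hashlib
from collections import defaultdict


def csp_hash(inline_content: str) -> str:
    """Return a CSP hash source: 'sha256-<base64 of the SHA-256 digest>'."""
    digest = hashlib.sha256(inline_content.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


def build_policy(violations, inline_items):
    """Aggregate violations and inline hashes into one policy string.

    violations:   iterable of (directive, source) pairs, e.g. as derived
                  from a report's effective directive and blocked origin
                  (assumed input shape).
    inline_items: iterable of (directive, inline_content) pairs.
    """
    sources = defaultdict(set)  # directive -> unique sources
    for directive, source in violations:
        sources[directive].add(source)
    for directive, content in inline_items:
        sources[directive].add(csp_hash(content))

    parts = ["default-src 'none'"]  # assumed deny-by-default baseline
    for directive in sorted(sources):
        parts.append(directive + " " + " ".join(sorted(sources[directive])))
    return "; ".join(parts)


if __name__ == "__main__":
    violations = [
        ("script-src", "https://cdn.example.net"),
        ("img-src", "'self'"),
        ("script-src", "https://cdn.example.net"),  # duplicates collapse
    ]
    inline_items = [("script-src", "console.log('hi')")]
    print(build_policy(violations, inline_items))
```

Collecting sources in a set per directive is what keeps the result minimal: a resource seen on fifty pages still contributes a single entry.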
A settlement delay (500ms by default) follows each page load to catch violations from lazy-loaded resources and deferred scripts.
Progress output
Progress is written to stderr so it does not interfere with the policy on stdout. You will see:
- Status messages (browser launch, CSP injection, crawl start)
- A line for each visited page showing the page count, total, and URL
- A summary table at the end with pages visited, violations found, unique directives, elapsed time, and the top violated directives
Examples
Basic crawl
csp-analyser crawl https://example.com
Crawls up to 10 pages at depth 1 with moderate strictness and outputs a Content-Security-Policy header.
Deep crawl
csp-analyser crawl https://example.com --depth 3 --max-pages 50
Follows links up to 3 levels deep, visiting at most 50 pages.
With authentication
csp-analyser crawl https://app.example.com --storage-state auth.json
Uses a Playwright storage state file (cookies + localStorage) to crawl an authenticated area. Generate the storage state file using npx playwright codegen --save-storage=auth.json.
Output as JSON
csp-analyser crawl https://example.com --format json
Strict policy
csp-analyser crawl https://example.com --strictness strict
Generates a tighter policy. Prefers hashes over host-based allowlisting where possible.
Report-only mode
csp-analyser crawl https://example.com --report-only
Outputs a Content-Security-Policy-Report-Only header suitable for testing in production without blocking resources.
Save to file
csp-analyser crawl https://example.com > csp-header.txt
Progress still appears in the terminal (stderr); the policy goes to the file (stdout).
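
When the crawl is driven from a script rather than a shell, the same split between stdout and stderr can be kept. The Python sketch below is one possible way to do it; `run_and_save` is a hypothetical helper, and the demo uses a stand-in child process so it runs without csp-analyser installed.

```python
import os
import subprocess
import sys
import tempfile


def run_and_save(cmd, out_path):
    """Run cmd, writing its stdout to out_path.

    stderr is not redirected, so progress output still reaches the terminal.
    """
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Stand-in for ["csp-analyser", "crawl", "https://example.com"]:
    # prints a fake policy to stdout and a fake progress line to stderr.
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; print('policy'); print('progress', file=sys.stderr)",
    ]
    target = os.path.join(tempfile.mkdtemp(), "csp-header.txt")
    run_and_save(demo_cmd, target)
    with open(target) as fh:
        print("saved:", fh.read().strip())
```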
Unlimited violations
csp-analyser crawl https://large-site.com --violation-limit 0
Disables the 10,000-violation cap. Useful for very large sites, but increases memory and database usage.