Everything from a one-off extraction to scheduled, distributed crawls — with the data quality and ops tooling to keep them running.
Click elements to capture CSS/XPath selectors. Match-all-similar, attribute & regex extraction, typed fields.
Point at a list page and Crawley proposes the repeating record and its fields automatically.
One-click starter projects scaffold a working crawler + scraper for common sites.
Optional Python transform(row) hooks to derive, clean or drop rows.
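As a rough illustration of what such a hook might look like, here is a minimal sketch. It assumes that rows arrive as plain dicts and that returning None drops the row; neither detail is confirmed here, and the field names (title, price, price_cents) are made up for the example.

```python
from typing import Optional


def transform(row: dict) -> Optional[dict]:
    """Hypothetical hook: clean, derive, or drop a single scraped row."""
    # Drop: rows with no usable title (assumes returning None drops the row).
    title = (row.get("title") or "").strip()
    if not title:
        return None

    # Clean: turn a price string such as "$1,299.00" into a float.
    raw_price = str(row.get("price", "")).replace("$", "").replace(",", "").strip()
    try:
        price = float(raw_price)
    except ValueError:
        price = None  # keep the row, but mark the price as unknown

    # Derive: add an integer price in cents when the price parsed cleanly.
    price_cents = round(price * 100) if price is not None else None

    return {**row, "title": title, "price": price, "price_cents": price_cents}
```

Returning a new dict rather than mutating the input keeps the hook easy to test in isolation.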
Headless-browser rendering, infinite-scroll handling and reCAPTCHA solving for single-page apps (SPAs).
Geo-targeted managed proxy network, realistic header sets and per-project cookie sessions.
Conditional requests skip unchanged pages; resumable frontiers continue where they left off.
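For readers unfamiliar with conditional requests, the sketch below shows the general HTTP mechanism using only the Python standard library. It is not Crawley's implementation. The helper names and the shape of the cached dict are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def conditional_headers(cached: dict) -> dict:
    """Build validator headers from values saved on a previous fetch."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url: str, cached: dict):
    """Return (body, validators); body is None when the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            validators = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), validators
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304 Not Modified: the page is unchanged.
        if err.code == 304:
            return None, cached
        raise
```

A crawler would store the returned validators next to the page and pass them back in on the next visit, so unchanged pages cost only a short header exchange.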
Shared dupefilter across workers, AutoThrottle, retries, robots.txt compliance and per-host rate limits.
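To make the idea of a per-host rate limit concrete, here is a small sketch of one common approach: enforcing a minimum interval between requests to the same host. The class name, the min_interval parameter and the explicit now argument are all hypothetical and say nothing about how Crawley schedules requests internally.

```python
from urllib.parse import urlsplit


class PerHostLimiter:
    """Hypothetical limiter: at most one request per host every min_interval seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._next_allowed = {}  # host -> earliest time the next request may start

    def wait_time(self, url: str, now: float) -> float:
        """Reserve the next slot for this URL's host and return the seconds to wait."""
        host = urlsplit(url).hostname or ""
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + self.min_interval
        return start - now
```

Passing the clock value in explicitly keeps the limiter deterministic; a caller would typically supply time.monotonic() and sleep for the returned duration.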
Cron schedules with a job queue and concurrency units; exports can be scheduled too.
Field validation, quality scores, dedup, and anomaly alerts when item counts drop.
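One simple way to detect a drop in item counts is to compare the latest run against the median of recent runs. The sketch below shows that heuristic only. The function name, the 50% default threshold and the choice of the median are assumptions, not a description of Crawley's alerting logic.

```python
from statistics import median


def item_count_dropped(history: list, latest: int, max_drop: float = 0.5) -> bool:
    """Return True when latest falls more than max_drop below the median of history."""
    if not history:
        return False  # nothing to compare against yet
    baseline = median(history)
    return latest < baseline * (1 - max_drop)
```

The median is used here instead of the mean so that a single unusually large or small past run does not shift the baseline much.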
Slack, signed webhook and email notifications, optionally sent only when the data has changed.
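On the receiving side, a signed webhook is usually verified by recomputing a keyed hash of the raw request body. The sketch below assumes an HMAC-SHA256 signature sent as a hex string, which is a common convention but is not confirmed for Crawley. The function names and the secret handling are illustrative.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Verification should run on the exact bytes received, before any JSON parsing, because re-serialising the payload can change whitespace or key order and invalidate the signature.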
REST API + OpenAPI, the crawley CLI, and exports to CSV/JSON/XLSX, S3, Sheets, BigQuery.