Releases: apify/crawlee-python
1.0.4
1.0.4 (2025-10-24)
🐛 Bug Fixes
- Respect `enqueue_strategy` in `enqueue_links` (#1505) (6ee04bc) by @Mantisus
- Exclude incorrect links before checking `robots.txt` (#1502) (3273da5) by @Mantisus
- Resolve compatibility issue between `SqlStorageClient` and `AdaptivePlaywrightCrawler` (#1496) (ce172c4) by @Mantisus
- Fix `BasicCrawler` statistics persistence (#1490) (1eb1c19) by @Pijukatel
- Save context state in result for `AdaptivePlaywrightCrawler` after isolated processing in `SubCrawler` (#1488) (62b7c70) by @Mantisus
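The `robots.txt` fix in this release (#1502) concerns the order of two checks: malformed links are discarded first, and only the remaining links are tested against `robots.txt`. The sketch below illustrates that ordering with the standard library only. It is not Crawlee's implementation; the helper names `is_crawlable_url` and `filter_links`, the sample rules, and the sample links are all assumptions made for illustration.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_crawlable_url(url: str) -> bool:
    """Return True only for absolute http(s) URLs that have a hostname."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and bool(parsed.hostname)


def filter_links(
    links: list[str], robots: RobotFileParser, user_agent: str = "*"
) -> list[str]:
    """Drop malformed links first, then drop links disallowed by robots.txt."""
    valid = [link for link in links if is_crawlable_url(link)]
    return [link for link in valid if robots.can_fetch(user_agent, link)]


# Hypothetical robots.txt rules, parsed from memory rather than fetched.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

links = [
    "https://example.com/docs",
    "mailto:someone@example.com",  # not an http(s) link
    "javascript:void(0)",  # not an http(s) link
    "https://example.com/private/report",  # disallowed by the rules above
]
allowed = filter_links(links, robots)
```

Filtering first means the `robots.txt` check only ever receives URLs with a usable scheme and host.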
1.0.3
1.0.3 (2025-10-17)
🐛 Bug Fixes
- Add support for Pydantic v2.12 (#1471) (35c1108) by @Mantisus
- Fix database version warning message (#1485) (18a545e) by @Mantisus
- Fix `reclaim_request` in `SqlRequestQueueClient` to correctly update the request state (#1486) (1502469) by @Mantisus
- Fix `KeyValueStore.auto_saved_value` failing in some scenarios (#1438) (b35dee7) by @Pijukatel
1.0.2
1.0.1
1.0.1 (2025-10-06)
🐛 Bug Fixes
- Fix memory leak in `PlaywrightCrawler` on browser context creation (#1446) (bb181e5) by @Pijukatel
- Update templates to handle optional httpx client (#1440) (c087efd) by @Pijukatel
1.0.0
1.0.0 (2025-09-29)
- Check out the Release blog post for more details.
- Check out the Upgrading guide to ensure a smooth update.
🚀 Features
- Add utility for load and parse Sitemap and `SitemapRequestLoader` (#1169) (66599f8) by @Mantisus
- Add periodic status logging and `status_message_callback` parameter for customization (#1265) (b992fb2) by @Mantisus
- Add crawlee-cli option to skip project installation (#1294) (4d5aef0) by @Pijukatel
- Improve `Crawlee` CLI help text (#1297) (afbe10f) by @Pijukatel
- Add basic `OpenTelemetry` instrumentation (#1255) (a92d8b3) by @Pijukatel
- Add `ImpitHttpClient` http-client client using the `impit` library (#1151) (0d0d268) by @Mantisus
- Prevent overloading system memory when running locally (#1270) (30de3bd) by @janbuchar
- Expose `PlaywrightPersistentBrowser` class (#1314) (b5fa955) by @Mantisus
- Add `impit` option for Crawlee CLI (#1312) (508d7ce) by @Mantisus
- Persist RequestList state (#1274) (cc68014) by @janbuchar
- Persist `DefaultRenderingTypePredictor` state (#1340) (fad4c25) by @Mantisus
- Persist the `SitemapRequestLoader` state (#1347) (27ef9ad) by @Mantisus
- Add support for NDU storages (#1401) (5dbd212) by @vdusek
- Add RQ id, name, alias args to `add_requests` and `enqueue_links` methods (#1413) (1cae2bc) by @Mantisus
- Add `SqlStorageClient` based on `sqlalchemy` v2+ (#1339) (07c75a0) by @Mantisus
🐛 Bug Fixes
- Fix memory estimation not working on MacOS (#1330) (ab020eb) by @Pijukatel
- Fix retry count to not count the original request (#1328) (74fa1d9) by @Pijukatel
- [breaking] Remove unused "stats" field from RequestQueueMetadata (#1331) (0a63bef) by @vdusek
- Ignore unknown parameters passed in cookies (#1336) (50d3ef7) by @Mantisus
- Fix `timeout` for `stream` method in `ImpitHttpClient` (#1352) (54b693b) by @Mantisus
- Include reason in the session rotation warning logs (#1363) (d6d7a45) by @vdusek
- Improve crawler statistics logging (#1364) (1eb6da5) by @vdusek
- Do not add a request that is already in progress to `MemoryRequestQueueClient` (#1384) (3af326c) by @Mantisus
- Save `RequestQueueState` for `FileSystemRequestQueueClient` in default KVS (#1411) (6ee60a0) by @Mantisus
- Set default desired concurrency for non-browser crawlers to 10 (#1419) (1cc9401) by @vdusek
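The retry-count fix (#1328) is easiest to read as a counting rule: the original request is not a retry, so a limit of N retries allows up to N + 1 attempts in total. The sketch below shows that rule in isolation, assuming this reading of the entry. It is not Crawlee code; `run_with_retries` and `always_fails` are hypothetical names used only here.

```python
def run_with_retries(operation, max_retries: int):
    """Call `operation` once, then retry up to `max_retries` more times."""
    attempts = 0
    last_error = None
    # One original attempt plus `max_retries` retries.
    for _ in range(1 + max_retries):
        attempts += 1
        try:
            return operation(), attempts
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


attempt_log: list[int] = []


def always_fails() -> None:
    """Hypothetical operation that records each call and then fails."""
    attempt_log.append(len(attempt_log) + 1)
    raise ValueError("simulated failure")


try:
    run_with_retries(always_fails, max_retries=3)
except RuntimeError:
    pass  # expected: the original attempt and all three retries failed
```

With `max_retries=3`, `attempt_log` ends up with four entries: the original attempt and three retries.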
Refactor
- [breaking] Introduce new storage client system (#1194) (de1c03f) by @vdusek
- [breaking] Split `BrowserType` literal into two different literals based on context (#1070) (72b5698) by @Pijukatel
- [breaking] Change method `HttpResponse.read` from sync to async (#1296) (83fa8a4) by @Mantisus
- [breaking] Replace `HttpxHttpClient` with `ImpitHttpClient` as default HTTP client (#1307) (c803a97) by @Mantisus
- [breaking] Change Dataset unwind parameter to accept list of strings (#1357) (862a203) by @vdusek
- [breaking] Remove `Request.id` field (#1366) (32f3580) by @Pijukatel
- [breaking] Refactor storage creation and caching, configuration and services (#1386) (04649bd) by @Pijukatel
0.6.12
0.6.12 (2025-07-30)
🚀 Features
🐛 Bug Fixes
- Use `perf_counter_ns` for request duration tracking (#1260) (9e92f6b) by @Pijukatel, closes #1256
- Fix memory estimation not working on MacOS (#1330) (8558954) by @Pijukatel, closes #1329
- Fix retry count to not count the original request (#1328) (1aff3aa) by @Pijukatel, closes #1326
- Ignore unknown parameters passed in cookies (#1336) (0f2610c) by @Mantisus, closes #1333
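For context on the `perf_counter_ns` entry: `time.perf_counter_ns()` reads a monotonic, high-resolution clock and returns integer nanoseconds, so very short durations are not lost to float rounding or to system clock adjustments. The sketch below shows the general pattern for timing a call this way. It is an illustration only, not the code changed in #1260, and `timed_call` is a hypothetical helper.

```python
import time
from datetime import timedelta


def timed_call(func, *args):
    """Run `func` and return (result, elapsed) measured with a monotonic ns clock."""
    start_ns = time.perf_counter_ns()
    result = func(*args)
    elapsed_ns = time.perf_counter_ns() - start_ns
    # timedelta has microsecond resolution, so convert from nanoseconds.
    return result, timedelta(microseconds=elapsed_ns / 1000)


result, elapsed = timed_call(sum, range(1000))
```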
0.6.11
0.6.11 (2025-06-23)
🚀 Features
🐛 Bug Fixes
- Fix `ClientSnapshot` overload calculation (#1228) (a4fc1b6) by @Pijukatel
- Use `PSS` instead of `RSS` to estimate children process memory usage on Linux (#1210) (436032f) by @Pijukatel
- Do not raise an error to check 'same-domain' if there is no hostname in the url (#1251) (a6c3aab) by @Mantisus
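The last entry describes a defensive pattern: a URL without a hostname (for example a `mailto:` link) should simply fail the 'same-domain' check rather than raise. A minimal sketch of that behaviour follows, using only the standard library. It is not Crawlee's implementation: `is_same_domain` is a hypothetical helper, and it compares hostnames exactly, which is a simplification of whatever domain matching the library actually performs.

```python
from urllib.parse import urlparse


def is_same_domain(origin_url: str, target_url: str) -> bool:
    """Return True if both URLs have a hostname and the hostnames match."""
    origin_host = urlparse(origin_url).hostname
    target_host = urlparse(target_url).hostname
    # A missing hostname is treated as "no match" instead of an error.
    if origin_host is None or target_host is None:
        return False
    return origin_host == target_host
```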
0.6.10
0.6.10 (2025-06-02)
🐛 Bug Fixes
- Allow config change on `PlaywrightCrawler` (#1186) (f17bf31) by @mylank
- Add `payload` to `SendRequestFunction` to support `POST` request (#1202) (e7449f2) by @Mantisus
- Fix match check for specified enqueue strategy for requests with redirect (#1199) (d84c30c) by @Mantisus
- Set `WindowsSelectorEventLoopPolicy` only for curl-impersonate template without `playwright` (#1209) (f3b839f) by @Mantisus
- Add support non-GET requests for `PlaywrightCrawler` (#1208) (dbb9f44) by @Mantisus
- Respect `EnqueueLinksKwargs` for `extract_links` function (#1213) (c9907d6) by @Mantisus
0.6.9
0.6.9 (2025-05-02)
🚀 Features
- Add an internal `HttpClient` to be used in `send_request` for `PlaywrightCrawler` using `APIRequestContext` bound to the browser context (#1134) (e794f49) by @Mantisus
- Make timeout error log cleaner (#1170) (78ea9d2) by @Pijukatel
- Add `on_skipped_request` decorator, to process links skipped according to `robots.txt` rules (#1166) (bd16f14) by @Mantisus
🐛 Bug Fixes
0.6.8
0.6.8 (2025-04-25)
🚀 Features
- Handle unprocessed requests in `add_requests_batched` (#1159) (7851175) by @Pijukatel
- Add `respect_robots_txt_file` option (#1162) (c23f365) by @Mantisus
🐛 Bug Fixes
- Update `UnprocessedRequest` to match actual data (#1155) (a15a1f3) by @Pijukatel
- Fix the order in which cookies are saved to the `SessionCookies` and the handler is executed for `PlaywrightCrawler` (#1163) (82ff69a) by @Mantisus
- Call `failed_request_handler` for `SessionError` when session rotation count exceeds maximum (#1147) (b3637b6) by @Mantisus