Closed
Labels
t-tooling: Issues with this label are in the ownership of the tooling team.
Description
Test code:

```python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
from crawlee.storage_clients import MemoryStorageClient


async def main() -> None:
    storage_client = MemoryStorageClient()
    crawler = ParselCrawler(
        storage_client=storage_client,
        concurrency_settings=ConcurrencySettings(desired_concurrency=20),
    )

    @crawler.router.default_handler
    async def request_handler(context: ParselCrawlingContext) -> None:
        data = {
            'url': context.request.url,
            'title': context.selector.css('title::text').get(),
        }
        await context.push_data(data)
        await context.enqueue_links(strategy='same-domain')

    await crawler.run(['http://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```

Results:
```
[crawlee._autoscaling.autoscaled_pool] INFO Waiting for remaining tasks to finish
[ParselCrawler] INFO Final request statistics:
┌───────────────────────────────┬────────────────┐
│ requests_finished             │ 9024           │
│ requests_failed               │ 0              │
│ retry_histogram               │ [9024]         │
│ request_avg_failed_duration   │ None           │
│ request_avg_finished_duration │ 979.7ms        │
│ requests_finished_per_minute  │ 655            │
│ requests_failed_per_minute    │ 0              │
│ request_total_duration        │ 2h 27min 20.9s │
│ requests_total                │ 9024           │
│ crawler_runtime               │ 13min 46.7s    │
└───────────────────────────────┴────────────────┘
```
The expected number of links for crawlee.dev is 4512, but `requests_finished` is 9024. Apparently, each link was processed twice.
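One way to confirm the duplication (and to see which URL variants are involved) is to count repeated URLs among the items the crawler pushed to the dataset. Below is a minimal stdlib-only sketch; `find_duplicates` and the sample `items` are hypothetical illustrations, not part of the Crawlee API. In practice you would feed it the exported dataset items instead of the hard-coded list.

```python
from collections import Counter


def find_duplicates(items: list[dict]) -> dict[str, int]:
    """Return URLs that appear more than once among pushed dataset items."""
    counts = Counter(item['url'] for item in items)
    return {url: n for url, n in counts.items() if n > 1}


# Hypothetical sample mimicking the issue: the same page stored twice.
items = [
    {'url': 'http://crawlee.dev/', 'title': 'Crawlee'},
    {'url': 'https://crawlee.dev/docs', 'title': 'Docs'},
    {'url': 'https://crawlee.dev/docs', 'title': 'Docs'},
]
print(find_duplicates(items))  # {'https://crawlee.dev/docs': 2}
```

If every URL shows up exactly twice, that points at each request being handled twice (e.g. enqueued under two equivalent keys) rather than at a few pages being re-crawled many times.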