Update the crawlers structure #764

@vdusek

Description

Place all the crawlers into a common directory named crawlers:

$ tree src/crawlee/crawlers/
src/crawlee/crawlers//
├── basic_crawler/
│   ├── _basic_crawler.py
│   ├── _context_pipeline.py
│   └── __init__.py
├── beautifulsoup_crawler/
│   ├── _beautifulsoup_crawler.py
│   ├── _beautifulsoup_crawling_context.py
│   ├── _beautifulsoup_parser.py
│   └── __init__.py
├── http_crawler/
│   ├── _http_crawler.py
│   ├── _http_crawling_context.py
│   └── __init__.py
├── parsel_crawler/
│   ├── __init__.py
│   ├── _parsel_crawler.py
│   ├── _parsel_crawling_context.py
│   └── _parsel_parser.py
├── playwright_crawler/
│   ├── __init__.py
│   ├── _playwright_crawler.py
│   ├── _playwright_crawling_context.py
│   ├── _playwright_pre_navigation_context.py
│   └── _utils.py
├── static_content_crawler/
│   ├── __init__.py
│   ├── _static_content_crawler.py
│   ├── _static_content_parser.py
│   └── _static_crawling_context.py
└── __init__.py

with the top-level __init__.py re-exporting the public classes:

from .basic_crawler import BasicCrawler, BasicCrawlingContext
from .beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from .http_crawler import HttpCrawler, HttpCrawlingContext
from .parsel_crawler import ParselCrawler, ParselCrawlingContext
from .playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from .static_content_crawler import StaticContentCrawler, StaticContentCrawlingContext

This makes the following import possible:

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
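The re-export chain can be tried out in isolation. The following is only a minimal sketch: it builds a throwaway package in a temporary directory that mirrors the http_crawler part of the tree above. The package name demo_crawlee, the empty class bodies, and the contents of the subpackage __init__.py are assumptions made for illustration; the issue only specifies the top-level __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature of the proposed layout. "demo_crawlee" and the empty
# class bodies are placeholders, not the real crawlee code.
FILES = {
    "demo_crawlee/__init__.py": "",
    # Top-level crawlers/__init__.py: re-exports from the subpackage.
    "demo_crawlee/crawlers/__init__.py": (
        "from .http_crawler import HttpCrawler, HttpCrawlingContext\n"
        "\n"
        "__all__ = ['HttpCrawler', 'HttpCrawlingContext']\n"
    ),
    # Subpackage __init__.py (assumed): re-exports from the private modules.
    "demo_crawlee/crawlers/http_crawler/__init__.py": (
        "from ._http_crawler import HttpCrawler\n"
        "from ._http_crawling_context import HttpCrawlingContext\n"
        "\n"
        "__all__ = ['HttpCrawler', 'HttpCrawlingContext']\n"
    ),
    "demo_crawlee/crawlers/http_crawler/_http_crawler.py": (
        "class HttpCrawler:\n    pass\n"
    ),
    "demo_crawlee/crawlers/http_crawler/_http_crawling_context.py": (
        "class HttpCrawlingContext:\n    pass\n"
    ),
}


def build_demo_package(root: Path) -> None:
    """Write the placeholder package files under the given root directory."""
    for relative_path, source in FILES.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source, encoding="utf-8")


root = Path(tempfile.mkdtemp())
build_demo_package(root)
sys.path.insert(0, str(root))  # make the temporary package importable
importlib.invalidate_caches()

# Same shape as the import proposed in the issue, against the demo package.
from demo_crawlee.crawlers import HttpCrawler, HttpCrawlingContext

# The class is reachable from the short path but still lives in the private module.
print(HttpCrawler.__module__)
```

With this arrangement, users depend only on the crawlers package path, so the underscore-prefixed modules can be renamed or split later without breaking imports.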

Metadata

Labels

debt (Code quality improvement or decrease of technical debt.), t-tooling (Issues with this label are in the ownership of the tooling team.), v0.5
