PlaywrightCrawler extract_links does not respect strategy #1212

@phughesion-h3

Description

I was crawling a test site that I have hosted locally (localhost).

My PlaywrightCrawler subclass must add additional user_data to each request, so I extract and form each request manually.

new_requests = await context.extract_links(strategy='same-origin')
for new_request in new_requests:
    print(f"[LINK] Extracted link: {new_request.url}")

I start my crawl pointed at http://localhost, yet the crawler ends up crawling YouTube since there is a link to YouTube on my site.
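Until the strategy is honored, one possible workaround is to filter the extracted URLs by origin before enqueueing them. The sketch below is only an illustration of what "same-origin" means (matching scheme, host, and port); the helper names `origin` and `filter_same_origin` are hypothetical and not part of Crawlee, and the sample URLs are made up.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so that http://localhost and http://localhost:80 compare equal.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else _DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def filter_same_origin(base_url: str, urls: List[str]) -> List[str]:
    """Keep only the URLs whose origin matches that of base_url (hypothetical helper)."""
    base = origin(base_url)
    return [url for url in urls if origin(url) == base]


if __name__ == "__main__":
    # Made-up sample links, similar to what a page on localhost might contain.
    extracted = [
        "http://localhost/about",
        "https://www.youtube.com/watch?v=x",
        "http://localhost:8080/other",
    ]
    for url in filter_same_origin("http://localhost", extracted):
        print(f"[LINK] Same-origin link: {url}")
```

In the request handler, the same check could presumably be applied to each `new_request.url` before the requests are added to the queue, so that off-site links such as the YouTube one are dropped.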

Metadata

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
