Skip to content

Conversation

@Mantisus
Copy link
Collaborator

Description

  • The need for PR is due to the fact that now the enqueue strategy check, is used in extract_links and can be applied to invalid urls.

@Mantisus Mantisus requested review from Copilot and vdusek June 16, 2025 13:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR changes the behavior of the same-domain enqueue strategy to return False instead of raising an error when either URL lacks a hostname, preventing exceptions on invalid URLs.

  • Replace ValueError with a silent skip (return False) for missing hostnames in same-domain checks.
  • Ensures extract_links can apply the enqueue strategy without erroring on malformed URLs.
Comments suppressed due to low confidence (2)

src/crawlee/crawlers/_basic/_basic_crawler.py:970

  • Add a unit test for the case where origin_url.hostname or target_url.hostname is None with the same-domain strategy to verify that it returns False and does not enqueue.
return False

src/crawlee/crawlers/_basic/_basic_crawler.py:967

  • Update the method docstring for _check_enqueue_strategy to document that URLs without hostnames will simply return False for the same-domain strategy instead of raising an error.
def _check_enqueue_strategy(

@Mantisus Mantisus self-assigned this Jun 16, 2025
Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable 👍

@vdusek vdusek changed the title fix: Do not raise an error to check ‘same-domain’ if there is no hostname in the url fix: Do not raise an error to check 'same-domain' if there is no hostname in the url Jun 17, 2025
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vdusek vdusek merged commit a6c3aab into apify:master Jun 19, 2025
44 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants