Fix homepage url restriction #1013

ljluestc · 2025-06-22T16:53:51Z

Fixes issue #134 (originally #455): the number of href URLs in a news site's HTML source differs from the number of URLs Newspaper scrapes.

Added restrict_to_homepage_urls option to newspaper.build to limit articles to homepage <a href> links.
Integrated BeautifulSoup for homepage URL extraction.
Fixed indexing bug in user example code.
Added test case for Reuters homepage scraping.
Updated documentation with new option.
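
As a rough illustration of the idea behind restrict_to_homepage_urls, the sketch below collects the homepage's <a href> links and keeps only candidate article URLs that appear among them. It is not the code from this PR: it uses the standard library's html.parser in place of BeautifulSoup so it runs without extra dependencies, and the helper names homepage_urls and restrict_to_homepage, along with the example.com URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def homepage_urls(html, base_url):
    """Return the set of absolute http(s) URLs linked from <a href> tags.

    Hypothetical helper: relative links are resolved against base_url and
    fragments are dropped so that "/a#top" and "/a" count as the same link.
    """
    collector = _AnchorCollector()
    collector.feed(html)
    urls = set()
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            urls.add(absolute)
    return urls


def restrict_to_homepage(candidate_urls, html, base_url):
    """Keep only the candidates that are linked directly from the homepage."""
    allowed = homepage_urls(html, base_url)
    return [url for url in candidate_urls if url in allowed]
```

Filtering against the homepage link set like this is one way the scraped URL count could be made to match the href count in the page source. The option added in this PR is passed to newspaper.build; its exact signature and behavior are defined in the diff, not in this sketch.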

ljluestc added 2 commits June 22, 2025 09:47

Add restrict_to_homepage_urls option to limit scraping to homepage links (codelucas#134)

dd61ba7

remove irrelevant commits

60529ea
