
feat(saved-jobs): add saved/bookmarked jobs scraping with pagination and progress#167

Open
IfThingsThenStuff wants to merge 4 commits into stickerdaniel:main from IfThingsThenStuff:feat/saved-jobs-fix-and-progress

Conversation


@IfThingsThenStuff IfThingsThenStuff commented Feb 26, 2026

Thanks for your work here - it's a useful tool, and I appreciate your efforts. I wanted the ability to read out my saved jobs, so I added it. It handles multiple pages.

Let me know if this is aligned with what you'd like to include, and whether any changes are needed.

Summary

  • Add scrape_saved_jobs to LinkedInExtractor — scrapes the LinkedIn jobs tracker page, extracts job IDs from link hrefs, and paginates through results using numbered page buttons
  • Add get_saved_jobs MCP tool with progress reporting via on_progress callback
  • Cap total_pages with max_pages for accurate progress percentages
  • Use Set for O(1) job ID deduplication in the DOM polling function
  • Add navigation delay between page clicks consistent with other scraping methods
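The set-based deduplication mentioned above can be sketched in plain Python (the function name and shape are illustrative, not the PR's actual helpers):

```python
def filter_new_ids(page_ids, seen):
    """Return only job IDs not seen on earlier pages, preserving order.

    `seen` is a set, so each membership check is O(1); the PR applies the
    same idea both in the JS extraction snippet and in the Python filter.
    """
    new_ids = [job_id for job_id in page_ids if job_id not in seen]
    seen.update(new_ids)
    return new_ids

seen = set()
page1 = ["4001", "4002", "4003"]
page2 = ["4003", "4004"]  # pages can repeat rows already collected
print(filter_new_ids(page1, seen))  # ['4001', '4002', '4003']
print(filter_new_ids(page2, seen))  # ['4004']
```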

Test plan

  • test_scrape_saved_jobs_single_page — single page with progress callback
  • test_scrape_saved_jobs_paginates — multi-page with progress and ID collection
  • test_scrape_saved_jobs_timeout_stops_gracefully — timeout returns partial results
  • test_scrape_saved_jobs_stops_at_max_pages_despite_more_buttons — respects max_pages cap
  • test_scrape_saved_jobs_empty — empty results
  • test_get_saved_jobs — tool-level success path
  • test_get_saved_jobs_error — session expired error handling
  • Full suite: 112/112 passing

Greptile Summary

Adds get_saved_jobs MCP tool to scrape saved/bookmarked jobs from LinkedIn's job tracker with pagination and progress reporting.

Key Changes:

  • Pagination: Navigates through numbered page buttons, extracting job IDs from link hrefs (/jobs/view/<id>/)
  • Deduplication: Uses Set for O(1) job ID lookups in both JavaScript extraction and Python filtering
  • Progress Reporting: Implements on_progress callback with accurate page counts capped by max_pages
  • Error Handling: Gracefully handles timeouts, missing buttons, and empty results
  • Navigation: Adds 2s delay between page clicks consistent with other scraping methods
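The href-based ID extraction described above can be illustrated with a small Python sketch; the regex and helper name are assumptions based on the `/jobs/view/<id>/` pattern quoted in the summary, not the PR's exact code:

```python
import re

# Tracker links look like ".../jobs/view/<id>/..." per the PR summary.
JOB_VIEW_RE = re.compile(r"/jobs/view/(\d+)")

def extract_job_ids(hrefs):
    """Pull numeric job IDs out of link hrefs, deduplicating with a set."""
    seen = set()
    ids = []
    for href in hrefs:
        match = JOB_VIEW_RE.search(href)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids

hrefs = [
    "https://www.linkedin.com/jobs/view/4012345678/?trk=flagship",
    "https://www.linkedin.com/jobs/view/4012345678/",  # duplicate link
    "https://www.linkedin.com/jobs/view/4098765432/",
]
print(extract_job_ids(hrefs))  # ['4012345678', '4098765432']
```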

Implementation Quality:

  • Exposes max_pages parameter (default 10) for user control
  • Embeds job IDs in text sections for LLM visibility
  • Returns both structured job_ids list and formatted text
  • Comprehensive test suite: 6 new tests covering pagination, timeouts, edge cases
  • Full test suite passing: 112/112
  • Documentation updated in README.md, AGENTS.md, docs/docker-hub.md, and CLAUDE.md per development workflow

Previous Review Items Addressed:

  • ✅ Set-based deduplication in _EXTRACT_JOB_IDS_JS (lines 389-390)
  • ✅ Exposed max_pages parameter in tool signature (line 75)
  • ✅ Documentation updates completed across all required files

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • Score reflects comprehensive test coverage (112/112 tests passing including 6 new tests), complete documentation updates per development workflow, robust error handling with graceful degradation, efficient O(1) deduplication using Sets, proper pagination logic with multiple safety breaks, and all previous review comments fully addressed
  • No files require special attention

Important Files Changed

Filename Overview
linkedin_mcp_server/scraping/extractor.py Added scrape_saved_jobs method with robust pagination logic, Set-based O(1) deduplication, proper error handling, and progress callbacks
linkedin_mcp_server/tools/job.py Added get_saved_jobs MCP tool with exposed max_pages parameter, progress reporting, and consistent error handling
tests/test_scraping.py Added comprehensive test suite with 5 tests covering single-page, pagination, timeout, max_pages cap, and empty results scenarios
tests/test_tools.py Added tool-level tests for get_saved_jobs success path and error handling

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Start]) --> Navigate[Navigate to jobs-tracker]
    Navigate --> ExtractPage1[Extract page 1 text and IDs]
    ExtractPage1 --> CountButtons[Count pagination buttons]
    CountButtons --> CalcTotal[Calculate total_pages cap]
    CalcTotal --> ReportP1[Report progress page 1]
    ReportP1 --> CheckMore{More pages?}
    
    CheckMore -->|Yes| CheckButton{Button exists?}
    CheckButton -->|No| Append[Append ID summary]
    CheckButton -->|Yes| ClickButton[Click page button]
    ClickButton --> WaitDelay[Wait nav delay]
    WaitDelay --> WaitNewIDs{Wait for new IDs}
    
    WaitNewIDs -->|Timeout| Append
    WaitNewIDs -->|Success| Scroll[Scroll to bottom]
    Scroll --> ExtractText[Extract page text]
    ExtractText --> ExtractIDs[Extract job IDs]
    ExtractIDs --> FilterDups[Filter duplicates]
    FilterDups --> CheckNewIDs{New IDs?}
    
    CheckNewIDs -->|No| Append
    CheckNewIDs -->|Yes| AddIDs[Add to all_job_ids]
    AddIDs --> ReportProgress[Report progress]
    ReportProgress --> CheckMore
    
    CheckMore -->|No| Append
    Append --> BuildSections[Build sections dict]
    BuildSections --> Return([Return result])

Last reviewed commit: 5e68717

IfThingsThenStuff and others added 3 commits February 25, 2026 22:13
…orting

- Fix wait_for_function positional arg bug (arg= keyword required)
- Switch pagination from broken "Next" button to numbered page buttons
  (button[aria-label="Page N"]) which reliably triggers content updates
- Replace arbitrary asyncio.sleep() calls with DOM-based waiting via
  wait_for_function to detect new job links
- Embed job IDs summary in section text so LLMs always surface them
- Add on_progress callback for per-page progress reporting

Co-Authored-By: Claude Opus 4.6 <[email protected]>
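The DOM-based waiting this commit describes can be illustrated with a pure-Python polling analogue; the names here are hypothetical, and the actual PR uses Playwright's wait_for_function (with the `arg=` keyword) rather than a hand-rolled loop:

```python
import asyncio

async def wait_for_new_ids(get_ids, known, timeout=5.0, interval=0.05):
    """Poll until a job ID not already in `known` appears, or time out.

    On timeout, return False so the caller can stop gracefully and keep
    the partial results collected so far, as the PR's tests verify.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if any(job_id not in known for job_id in get_ids()):
            return True
        await asyncio.sleep(interval)
    return False

async def demo():
    ids = ["1"]

    async def add_later():
        await asyncio.sleep(0.1)
        ids.append("2")  # simulates the next page's links rendering

    task = asyncio.create_task(add_later())
    found = await wait_for_new_ids(lambda: ids, {"1"})
    await task
    return found

print(asyncio.run(demo()))  # True
```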
Detect total pages from pagination buttons on the page instead of using
max_pages (10), so progress reports reflect reality (1/2, 2/2 instead
of 1/10, 2/10).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…kups, and add tests

Address review findings: cap total_pages with max_pages to fix misleading
progress percentages, add _NAV_DELAY between page clicks for rate-limit
safety, convert JS prevIds.includes() to Set.has() for O(1) lookups, guard
division by zero in _report, fix docstring inaccuracies, and add 5 targeted
tests covering progress callbacks, timeout graceful stop, max_pages cap,
and session expired error handling.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
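The progress-reporting fixes in this commit (capping total_pages with max_pages and guarding against division by zero) can be sketched as follows; the function signature is illustrative, not the PR's actual `_report`:

```python
def report_progress(on_progress, page, detected_pages, max_pages=10):
    """Report per-page progress with a realistic total.

    total_pages is the number of pagination buttons detected on the page,
    capped by max_pages so the report reads "2 of 2" rather than "2 of 10";
    max(total, 1) guards the division when no buttons are found.
    """
    total = max(min(detected_pages, max_pages), 1)
    if on_progress is not None:
        on_progress(page, total, page / total * 100)

events = []
report_progress(lambda p, t, pct: events.append((p, t, pct)), 1, 2)
report_progress(lambda p, t, pct: events.append((p, t, pct)), 2, 2)
print(events)  # [(1, 2, 50.0), (2, 2, 100.0)]
```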
@IfThingsThenStuff IfThingsThenStuff marked this pull request as draft February 26, 2026 04:03
@IfThingsThenStuff IfThingsThenStuff marked this pull request as ready for review February 26, 2026 04:03

@greptile-apps greptile-apps bot left a comment


4 files reviewed, 3 comments


Address Greptile review: use Set for O(1) dedup in _EXTRACT_JOB_IDS_JS,
expose max_pages parameter on get_saved_jobs MCP tool, and document the
new tool in AGENTS.md, README.md, and docs/docker-hub.md.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@IfThingsThenStuff

Hey there @stickerdaniel - hope you're doing well. Is there anything I can do to help get this merged, sir? Thanks in advance. Let me know.
