feat: enhanced job search with structured listings and pagination by NoahStarkenburg · Pull Request #171 · stickerdaniel/linkedin-mcp-server

NoahStarkenburg · 2026-03-02T01:29:03Z

Problem
search_jobs returns raw innerText from the page, which only captures visible text. Job IDs exist solely in link href attributes (/jobs/view/12345/), so they are never returned. This makes it impossible to chain search_jobs → get_job_details. Additionally, only one page of results is loaded (~7-10 jobs), and scrolling targets the main window instead of the sidebar container where the job cards are rendered.

Closes #195

Solution

Job ID extraction from hrefs

Added _extract_job_listings(), which finds job links via querySelectorAll('a[href*="/jobs/view/"]'), extracts the ID from each href, and takes the title from the link's innerText. There is no DOM walking or markup-dependent card parsing, so it stays consistent with the project's innerText-based design.
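The ID extraction boils down to parsing each matched href. A minimal Python sketch follows, assuming the browser side (for example a Playwright `page.evaluate` over `a[href*="/jobs/view/"]`) has already returned `(href, innerText)` pairs. The helper `parse_job_links`, the regex, and the "first non-empty line is the title" rule are illustrative assumptions, not the PR's actual code.

```python
import re
from typing import Iterable

# Assumed href shape: the numeric job ID follows "/jobs/view/".
_JOB_ID_RE = re.compile(r"/jobs/view/(\d+)")


def parse_job_links(links: Iterable[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (href, innerText) pairs into {job_id, title} dicts.

    Hypothetical helper: it reads only the href and the link's visible text,
    never the surrounding card markup.
    """
    listings: list[dict[str, str]] = []
    seen: set[str] = set()
    for href, text in links:
        match = _JOB_ID_RE.search(href)
        if match is None:
            continue  # not a job-view link
        job_id = match.group(1)
        if job_id in seen:
            continue  # assume a card may link to the same job more than once
        seen.add(job_id)
        # innerText can span several lines; keep the first non-empty line as the title.
        title = next((line.strip() for line in text.splitlines() if line.strip()), "")
        listings.append({"job_id": job_id, "title": title})
    return listings
```

Because only the href and the link's visible text are read, a markup change to the surrounding job card would not affect this step, which is the property the maintainer wants to preserve.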

Multi-page pagination with dynamic offset

search_jobs now accepts a max_pages parameter (1-100, default 3). The pagination offset advances dynamically by the actual number of listings found per page instead of a hardcoded value. Job IDs are deduplicated across pages, and pagination stops early if a page returns no new results.
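One way to express the dynamic-offset loop, as a sketch under stated assumptions: `fetch_page(start)` is a hypothetical callable standing in for loading `?start=N` and running the listing extraction; the offset advances by the raw number of listings on the page (whether the PR counts raw or only new listings is an assumption here); and the `_NAV_DELAY` sleep between page loads is omitted.

```python
from typing import Callable

Listing = dict[str, str]
# Hypothetical: fetch_page(start) loads "?start=<start>" and returns that page's listings.
FetchPage = Callable[[int], list[Listing]]


def paginate_jobs(fetch_page: FetchPage, max_pages: int = 3) -> tuple[list[Listing], list[int]]:
    """Collect unique listings across pages; return them plus the offsets requested."""
    max_pages = max(1, min(max_pages, 100))  # clamp to the 1-100 range
    seen: set[str] = set()
    results: list[Listing] = []
    offsets: list[int] = []
    start = 0
    for _ in range(max_pages):
        offsets.append(start)
        page = fetch_page(start)
        new_on_page = 0
        for listing in page:
            if listing["job_id"] not in seen:  # deduplicate across pages
                seen.add(listing["job_id"])
                results.append(listing)
                new_on_page += 1
        if new_on_page == 0:
            break  # early stop: nothing new on this page
        # Dynamic offset: advance by what this page actually returned, not a fixed 25.
        start += len(page)
    return results, offsets
```

With page sizes of 10, 11 and 11, the requested offsets come out as 0, 10, 21 and 32, the same pattern reported in the test plan.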

Sidebar scrolling fix

LinkedIn renders job cards in a scrollable sidebar div, not in the main page. _scroll_job_list() walks up the DOM from the first job link to find the actual scrollable ancestor and scrolls that instead. This loads roughly 40% more results per page (tested: 10 vs 7).
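The ancestor walk can be modelled without a browser. In this sketch `Node` is a hypothetical stand-in for a DOM element; in page JavaScript its fields would roughly correspond to `getComputedStyle(el).overflowY`, `el.scrollHeight` and `el.clientHeight`. The "overflow is auto/scroll and content is taller than the box" test is a common heuristic and an assumption about what `_scroll_job_list` actually checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element, with only the fields the walk needs."""

    name: str
    parent: Optional["Node"] = None
    overflow_y: str = "visible"  # computed overflow-y style
    scroll_height: int = 0
    client_height: int = 0

    def is_scrollable(self) -> bool:
        # Scrollable if overflow permits it and the content is taller than the visible box.
        return self.overflow_y in ("auto", "scroll") and self.scroll_height > self.client_height


def find_scrollable_ancestor(start: Node) -> Optional[Node]:
    """Walk up from `start` (e.g. the first job link) to the nearest scrollable ancestor."""
    node = start.parent
    while node is not None:
        if node.is_scrollable():
            return node
        node = node.parent
    return None  # no scrollable ancestor found
```

Returning `None` leaves the caller free to fall back to scrolling the main window; that fallback is an assumption here, since the PR text only describes the sidebar path.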

Changes

linkedin_mcp_server/scraping/extractor.py - new methods: _extract_job_listings, _scroll_job_list, _extract_job_page; modified: search_jobs
linkedin_mcp_server/tools/job.py - exposed max_pages parameter on the search_jobs MCP tool
tests/test_scraping.py - 6 new tests covering pagination, dedup, early stopping, and clamping

Test plan

All existing tests pass, plus 6 new tests
Live tested job search with 1, 3, 10, 20, and 25 pages (207 unique jobs on 25 pages)
Verified get_job_details works when chained with IDs from search results
Verified deduplication across pages and early stopping
A/B tested sidebar scrolling vs default scroll_to_bottom (10 vs 7 results per page)
Dynamic pagination offset confirmed working (offsets: 0, 10, 21, 32... based on actual results)

Greptile Summary

This PR enhances search_jobs with structured job listings (returning {job_id, title} per result), multi-page pagination via a max_pages parameter, sidebar-aware scrolling, and cross-page deduplication. The implementation enables the search_jobs → get_job_details workflow that was previously impossible.

Strengths:

_extract_job_listings queries a[href*="/jobs/view/"] to extract job IDs and titles without fragile DOM walking — a clean, resilient approach.
_scroll_job_list walks up the DOM from the first job link to find and scroll the actual scrollable sidebar ancestor, yielding roughly 40% more results per page (10 vs 7).
Multi-page pagination with deduplication works as implemented and is live-tested.

Minor issues:

Docstrings in both extractor.py and job.py claim "~10 results per page" but the pagination offset defaults to 25, creating contradictory expectations.
The _scroll_job_list default parameter max_scrolls=25 is never exercised — the only call site overrides it with 20, making the default dead code.

Confidence Score: 4/5

Safe to merge. Minor documentation/parameter inconsistencies do not affect functionality.
The core feature (multi-page job search with ID extraction and deduplication) is solid and live-tested across 25 pages. The two findings are cosmetic issues: docstring inconsistency about page size and a dead default parameter. These don't impact functionality or behavior.
No files require special attention. Update docstrings in extractor.py and job.py (lines 479-481 and 88-90) to say ~25 results per page, and align the default max_scrolls parameter to 20.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM / MCP Client
    participant Tool as search_jobs (job.py)
    participant Extractor as LinkedInExtractor
    participant Page as Playwright Page
    participant LI as LinkedIn

    LLM->>Tool: search_jobs(keywords, location, max_pages)
    Tool->>Extractor: search_jobs(keywords, location, max_pages)

    loop For each page (up to max_pages)
        Extractor->>Extractor: build URL (?start=N)
        Extractor->>Page: _extract_job_page(url)
        Page->>LI: goto(url)
        LI-->>Page: DOM loaded
        Page->>Page: _scroll_job_list() — scroll sidebar ancestors
        Page->>Page: evaluate(main.innerText) — raw text
        Page->>Page: _extract_job_listings() — querySelectorAll a[href*="/jobs/view/"]
        Page-->>Extractor: (text, listings)
        Extractor->>Extractor: deduplicate listings by job_id
        alt new_on_page == 0
            Extractor->>Extractor: early stop
        else more pages remain
            Extractor->>Extractor: sleep(_NAV_DELAY)
        end
    end

    Extractor-->>Tool: {url, sections, job_listings, pages_visited, sections_requested}
    Tool-->>LLM: result dict

Last reviewed commit: cee8fd1

Greptile also left 2 inline comments on this PR.

greptile-apps

3 files reviewed, 12 comments

stickerdaniel · 2026-03-05T12:48:45Z

Hey, thanks for the PR and for filing the issue.

stickerdaniel · 2026-03-05T12:51:58Z

I won't merge this fix as-is, though, because the structured card parsing (walking the DOM to extract title, company, location, work_type, etc. per card) goes against the core design of this project. I deliberately use innerText extraction so this MCP server doesn't break every time LinkedIn changes their markup.

stickerdaniel · 2026-03-05T12:53:08Z

I do want to keep the job ID extraction from hrefs, the sidebar scrolling, and the pagination, but the pagination offset should be dynamic instead of the hardcoded start=25.

- Extract job IDs and titles from link hrefs on search results pages
- Add multi-page pagination (max_pages 1-100, default 3) with dynamic offset
- Smart sidebar scrolling that walks the DOM to find the scrollable job list container
- Deduplicate job IDs across pages; stop early on empty results
- Add 6 new tests covering pagination, dedup, early stopping, and clamping

greptile-apps bot reviewed Mar 2, 2026

linkedin_mcp_server/tools/job.py (outdated, resolved)
linkedin_mcp_server/scraping/extractor.py (resolved)
linkedin_mcp_server/scraping/extractor.py (resolved)
linkedin_mcp_server/scraping/extractor.py (outdated, resolved)

NoahStarkenburg force-pushed the feat/enhanced-job-search branch 4 times, most recently from bb75fb3 to 0634d21 (March 2, 2026 13:57)

greptile-apps bot reviewed Mar 2, 2026

linkedin_mcp_server/scraping/extractor.py (resolved)
tests/test_scraping.py (outdated, resolved)

NoahStarkenburg force-pushed the feat/enhanced-job-search branch 2 times, most recently from 0b3d29a to d529b1f (March 2, 2026 15:40)

greptile-apps bot reviewed Mar 2, 2026

linkedin_mcp_server/scraping/extractor.py (outdated, resolved)
linkedin_mcp_server/scraping/extractor.py (outdated, resolved)
linkedin_mcp_server/scraping/extractor.py (resolved)
tests/test_scraping.py (outdated, resolved)

NoahStarkenburg force-pushed the feat/enhanced-job-search branch from d529b1f to db9501b (March 2, 2026 15:48)

greptile-apps bot reviewed Mar 2, 2026

linkedin_mcp_server/scraping/extractor.py (resolved)
linkedin_mcp_server/scraping/extractor.py (resolved)

NoahStarkenburg mentioned this pull request on Mar 4, 2026 in [BUG] Search jobs does not return any job Ids #195 (closed, 9 tasks)

NoahStarkenburg force-pushed the feat/enhanced-job-search branch from db9501b to 4160c34 (March 5, 2026 13:52)

greptile-apps bot reviewed Mar 5, 2026

linkedin_mcp_server/scraping/extractor.py (resolved)
tests/test_scraping.py (resolved)

retrigger CI (commit cee8fd1)

greptile-apps bot reviewed Mar 5, 2026

linkedin_mcp_server/scraping/extractor.py (resolved)
linkedin_mcp_server/scraping/extractor.py (resolved)

NoahStarkenburg wants to merge 2 commits into stickerdaniel:main from NoahStarkenburg:feat/enhanced-job-search

NoahStarkenburg commented Mar 2, 2026 (edited by greptile-apps bot)

2 participants
