
fix(auth): import all LinkedIn cookies in cross-platform bridge#217

Open
andrema2 wants to merge 14 commits into stickerdaniel:main from andrema2:feature/216-import-all-linkedin-cookies

Conversation

andrema2 commented Mar 12, 2026

Summary

  • Cookie bridge now imports all LinkedIn cookies instead of only li_at and li_rm
  • Session cookies (JSESSIONID, bcookie, lidc, bscookie, liap, etc.) are now preserved during cross-platform bridge
  • Still validates that required auth cookies (li_at/li_rm) are present before importing

Closes #216

Test plan

  • Run --login on macOS to create fresh profile and cookies.json
  • Start MCP server (Docker or uvx) and verify scraping returns content
  • Verify cookies.json contains all LinkedIn cookies after export
  • Verify cross-platform bridge imports all cookies (check logs for count)
  • Run uv run pytest — 163 tests passing

🤖 Generated with Claude Code

Greptile Summary

This PR delivers the stated cookie-bridge fix — import_cookies now filters by domain ("linkedin.com" in domain) instead of by cookie name, importing all LinkedIn session cookies (e.g. JSESSIONID, bcookie, lidc) while still validating that at least one auth token (li_at/li_rm) is present before committing. It also ships a significant batch of new features: posts/comments/notifications scraping tools, a TTL scraping cache, session-level rate-limit state with exponential backoff, an auth-check TTL cache in the driver, and humanized navigation delays. New unit tests specifically address the import_cookies behavior requested in the previous review.

Key changes:

  • core/browser.py: import_cookies domain filter replaces name-based filter; _AUTH_COOKIE_NAMES renamed to _REQUIRED_COOKIE_NAMES for clarity
  • scraping/posts.py: New 824-line module for get_my_recent_posts, get_post_comments, get_post_content, get_notifications, find_unreplied_comments
  • scraping/cache.py: New in-memory TTL cache (ScrapingCache) with module-level singleton
  • core/utils.py: New RateLimitState (exponential backoff), humanized_delay(), wait_for_cooldown(); detect_rate_limit now records state on each trigger
  • drivers/browser.py: 120s auth-check cache (_auth_valid_until), scraping_cache.clear() on browser close
  • scraping/extractor.py: Integrates cache, humanized delays, and rate-limit state tracking
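The rate-limit state described above can be sketched as follows. This is a hedged illustration of exponential backoff with the 30s base and 300s cap mentioned in the review; the class name matches the review, but the method names and exact policy are assumptions:

```python
# Sketch of session-level rate-limit state with exponential backoff.
# 30s base doubling to a 300s cap, per the review; details are assumed.
import time


class RateLimitState:
    def __init__(self, base: float = 30.0, cap: float = 300.0):
        self.base = base
        self.cap = cap
        self._strikes = 0
        self._cooldown_until = 0.0

    def record_trigger(self) -> None:
        """Called when a rate limit is detected; doubles the cooldown."""
        delay = min(self.base * (2 ** self._strikes), self.cap)
        self._strikes += 1
        self._cooldown_until = time.monotonic() + delay

    def record_success(self) -> None:
        """A successful navigation resets the backoff."""
        self._strikes = 0
        self._cooldown_until = 0.0

    def cooldown_remaining(self) -> float:
        return max(0.0, self._cooldown_until - time.monotonic())
```

A wait_for_cooldown() helper would simply sleep for cooldown_remaining() before navigating.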

Issues found:

  • In _unreplied_via_notifications the JS expression (a.closest('li') || a.closest('div')).innerText can throw TypeError if the anchor has no li or div ancestor, silently triggering the expensive post-scanning fallback
  • TestGetMyRecentPosts mocks page.evaluate to return a plain list (the legacy code path) rather than the { items, scrollHeight } dict the current JS produces, leaving the scroll-loop and height-comparison logic untested

Confidence Score: 4/5

  • Safe to merge; the core cookie-bridge fix is correct and well-tested, with one low-severity JS null-guard issue and a test coverage gap in the new posts scraper.
  • The targeted fix is sound and directly addresses the regression from #216 (Cookie bridge imports only auth cookies, losing session state). New features are well-structured and follow existing patterns. The two issues found are: (1) a JS null-dereference edge case that degrades gracefully by falling back to the post-scanning path rather than crashing, and (2) a test coverage gap where the primary dict-return path of get_my_recent_posts is not exercised. Neither blocks merging, but both are worth addressing.
  • Follow-ups: linkedin_mcp_server/scraping/posts.py (JS null guard in _unreplied_via_notifications) and tests/test_posts_scraping.py (dict-path coverage for get_my_recent_posts)

Important Files Changed

Filename Overview
linkedin_mcp_server/core/browser.py Core fix: import_cookies now filters by domain ("linkedin.com" in domain) instead of by name, and validates that at least one required auth cookie (li_at/li_rm) is present before importing. Logic is correct and well-tested in test_core_browser.py.
linkedin_mcp_server/scraping/posts.py Large new file (824 lines) adding posts/comments/notifications scraping. Contains a potential null-dereference in the _unreplied_via_notifications JS evaluation that could silently trigger the expensive fallback path on structural DOM edge cases.
linkedin_mcp_server/core/utils.py Adds RateLimitState with exponential backoff (30s → 300s cap), humanized_delay(), and wait_for_cooldown(). Implementation is clean; detect_rate_limit now records state on each detection. Module-level singleton reset is handled in conftest.py.
linkedin_mcp_server/drivers/browser.py Adds 120s auth-check TTL cache (_auth_valid_until) to avoid redundant DOM queries on every tool call. Cache is correctly invalidated in close_browser() and reset_browser_for_testing(). scraping_cache.clear() is also called on close.
linkedin_mcp_server/scraping/cache.py New in-memory TTL cache backed by a dict of (value, expires_at) tuples. Clean implementation with get, put, invalidate, and clear. Module-level singleton scraping_cache is well-tested in test_cache.py.
tests/test_core_browser.py New test file directly addressing the previous review's request for import_cookies unit tests. Covers all four key scenarios: LinkedIn-only cookies, mixed-domain filtering, missing auth cookies → False, empty/missing file → False, and domain normalization.
tests/test_posts_scraping.py Good coverage for _normalize_post_url, get_post_comments, find_unreplied_comments, and get_notifications. However, get_my_recent_posts tests mock evaluate to return a plain list (legacy path) rather than the dict the JS actually produces, leaving the scroll-loop and dict-handling code paths untested.
linkedin_mcp_server/tools/posts.py New MCP tool registrations for get_my_recent_posts, get_post_comments, get_post_content, get_notifications, find_unreplied_comments. Consistent with existing tool patterns; all have readOnlyHint=True and use handle_tool_error for error handling.
linkedin_mcp_server/scraping/extractor.py Integrates scraping_cache, humanized_delay, rate_limit_state, and wait_for_cooldown. Fixed static _NAV_DELAY replaced with randomized delay. Both extract_page and _extract_overlay now cache successful results and call rate_limit_state.record_success() after navigation.

Sequence Diagram

sequenceDiagram
    participant Client as MCP Client
    participant Tool as tools/posts.py
    participant Driver as drivers/browser.py
    participant Scraper as scraping/posts.py
    participant Cache as scraping/cache.py
    participant Page as Patchright Page
    participant RLS as RateLimitState

    Client->>Tool: get_my_recent_posts(limit)
    Tool->>Driver: ensure_authenticated()
    Note over Driver: Return early if _auth_valid_until not expired
    Driver->>Driver: validate_session() [DOM check]
    Driver-->>Tool: ok
    Tool->>Driver: get_or_create_browser()
    Driver-->>Tool: browser
    Tool->>Scraper: get_my_recent_posts(page, limit)
    Scraper->>RLS: wait_for_cooldown()
    Scraper->>Page: goto(_MY_POSTS_URL)
    Scraper->>RLS: record_success()
    loop scroll & collect until limit or stable height
        Scraper->>Page: evaluate(JS, limit)
        Page-->>Scraper: {items, scrollHeight}
        Scraper->>Page: scroll_to_bottom()
    end
    Scraper-->>Tool: posts[]
    Tool-->>Client: {posts: [...]}

    Client->>Tool: get_post_comments(post_url)
    Tool->>Scraper: get_post_comments(page, url)
    Scraper->>Cache: get(cache_key)
    alt cache hit
        Cache-->>Scraper: cached comments[]
    else cache miss
        Scraper->>RLS: wait_for_cooldown()
        Scraper->>Page: goto(post_url)
        Scraper->>RLS: record_success()
        Scraper->>Page: evaluate(JS comments extractor)
        Page-->>Scraper: comments[]
        Scraper->>Cache: put(cache_key, comments)
    end
    Scraper-->>Tool: comments[]
    Tool-->>Client: {comments: [...]}

    Client->>Tool: find_unreplied_comments(since_days, max_posts)
    Tool->>Scraper: find_unreplied_comments(page, ...)
    Scraper->>Scraper: _unreplied_via_notifications()
    alt notifications have comment items
        Scraper-->>Tool: unreplied from notifications
    else notifications empty (loaded OK)
        Scraper-->>Tool: []
    else notifications failed (None)
        Scraper->>Scraper: get_my_recent_posts() fallback
        loop each post (max 5 navigations)
            Scraper->>Scraper: get_post_comments()
        end
        Scraper-->>Tool: unreplied from post scan
    end
    Tool-->>Client: {unreplied_comments: [...]}
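The "return early if _auth_valid_until not expired" step in the diagram can be sketched as a small TTL gate. This is an illustration only; `AuthCache` and `validate_session` are stand-ins for the real driver internals, with the 120s TTL taken from the PR description:

```python
# Sketch of the 120s auth-check TTL: skip the DOM-level session check
# while _auth_valid_until is in the future. Names are stand-ins.
import time

AUTH_CHECK_TTL = 120.0  # seconds, per the PR description


class AuthCache:
    def __init__(self):
        self._auth_valid_until = 0.0

    async def ensure_authenticated(self, validate_session) -> bool:
        now = time.monotonic()
        if now < self._auth_valid_until:
            return True  # cached: no DOM query this tool call
        ok = await validate_session()
        if ok:
            self._auth_valid_until = now + AUTH_CHECK_TTL
        return ok

    def reset(self) -> None:
        """Called on browser close so a stale session is never trusted."""
        self._auth_valid_until = 0.0
```

Invalidating on close_browser() and reset_browser_for_testing(), as the file table notes, corresponds to calling reset() here.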

Comments Outside Diff (3)

  1. scripts/test_my_recent_posts.py, line 1910-1913 (link)

    Portuguese strings in an otherwise English codebase

    This script uses Portuguese for all user-facing output (e.g. "Erro de autenticação. Faça login uma vez:", "Buscando até … posts", "Encontrados:", "Nenhum post encontrado", "JSON completo:"). The rest of the project — docs, comments, log messages, and other scripts — is entirely in English. This inconsistency makes the script harder to use for non-Portuguese-speaking contributors and conflicts with the project's language convention.

    Please translate these strings to English for consistency.

  2. linkedin_mcp_server/scraping/posts.py, line 1443 (link)

    Potential null-dereference in JS evaluation

    a.closest('li') || a.closest('div') can return null if the anchor element has no <li> or <div> ancestor (e.g., an anchor directly under <article> or <section>). Calling .innerText on null throws a TypeError, which causes page.evaluate() to reject. The outer except Exception handler then returns None, silently falling back to the expensive full post-scanning path even when the notifications page loaded correctly.

    Add a null guard before accessing .innerText:
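    The review's suggested guard widens the ancestor lookup and falls back to the anchor itself, so `.innerText` is never read off null:

    ```
                const container = a.closest('li') || a.closest('div') || a.closest('article') || a;
                const text = (container.innerText || '').trim();
    ```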

  3. tests/test_posts_scraping.py, line 2349-2376 (link)

    Tests exercise legacy code path, not the current JS return format

    The get_my_recent_posts JS now returns { items: [...], scrollHeight: ... } (a dict), but these tests mock page.evaluate to return a plain list. This causes the tests to exercise the legacy else fallback branch in the Python code instead of the primary isinstance(raw, dict) path.

    As a result, the scroll-loop termination logic (height comparison, multi-call behavior) and the dict-path deduplication are not tested at all. A regression in the dict-handling code would silently go undetected.

    Consider adding a complementary test that mocks evaluate to return the dict format the JS actually produces:

    ```python
    async def test_returns_posts_from_evaluate_dict_format(
        self, mock_scroll, mock_modal, mock_rate_limit, mock_page
    ):
        mock_page.evaluate = AsyncMock(
            return_value={
                "items": [
                    {
                        "post_url": "https://www.linkedin.com/feed/update/urn:li:activity:1/",
                        "post_id": "urn:li:activity:1",
                        "text_preview": "First post",
                        "created_at": None,
                    }
                ],
                "scrollHeight": 1000,
            }
        )
        result = await get_my_recent_posts(mock_page, limit=10)
        assert len(result) == 1
        assert result[0]["post_url"] == "https://www.linkedin.com/feed/update/urn:li:activity:1/"
    ```
Last reviewed commit: 3532114

andrema2 and others added 13 commits March 3, 2026 10:35
- get_my_recent_posts: incremental scroll with scrollHeight stable stop, seen_urns dedupe
- Add _expand_comments_section helper (Load more / Ver mais)
- get_post_comments: use data-urn urn:li:comment, top-level only, expand before extract
- _get_current_user_name: avatar alt, Me/Eu menu, fallback nav link
- Notifications filter: PT/EN terms (comentário, comentou, resposta, respondeu)
- Tests: legacy evaluate format, find_unreplied since_days/max_scrolls assert

Made-with: Cursor
- Add humanized delays with jitter (1.5-4.0s random) replacing fixed 2.0s
- Add in-memory TTL cache (5min) to avoid re-scraping same pages
- Add session-level rate limit awareness with exponential backoff
- Optimize find_unreplied_comments: cap fallback at 5 navigations
- Improve notifications path to return early on successful empty result
- Cache auth checks for 120s to reduce redundant DOM queries
- Cache post comments to avoid re-fetching in same session

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add new MCP tool to read the text content of a specific LinkedIn post
given its URL, URN, or activity ID. Reuses LinkedInExtractor.extract_page()
for navigation, caching, rate limits, and noise stripping.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add MCP tool to scrape LinkedIn notifications page, returning
structured items with type classification (comment, reaction,
connection, mention, endorsement, job, post, birthday, etc.)

Closes stickerdaniel#211

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Implement since_days date filtering in get_my_recent_posts
- Add get_notifications, get_post_content, get_profile_recent_posts to scraping/__init__.py exports
- Use link-based dedup key for notifications to reduce false positives
- Include current_user_name in comment cache key to prevent stale reply detection

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The cookie bridge only imported li_at and li_rm, discarding session
cookies (JSESSIONID, bcookie, lidc, etc.) that LinkedIn requires for
valid requests. This caused empty responses after bridge activation.

Closes stickerdaniel#216

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment on lines +218 to 222
```python
# Verify that required auth cookies are present
cookie_names = {c.get("name") for c in cookies}
if not self._REQUIRED_COOKIE_NAMES & cookie_names:
    logger.warning("No auth cookies (li_at/li_rm) found in %s", path)
    return False
```

No unit tests for the new import_cookies behavior

The core fix of this PR — filtering cookies by domain instead of by name — has no dedicated unit tests. The only reference to import_cookies in the test suite (tests/test_browser_driver.py:40) mocks it out entirely (browser.import_cookies = AsyncMock(return_value=False)), so the new domain-filter logic and the updated required-cookie check are not exercised at all.

Consider adding tests to tests/test_browser_driver.py (or a new tests/test_core_browser.py) that cover at minimum:

  • A cookie file containing only LinkedIn-domain cookies (including li_at) → should return True and import all of them.
  • A cookie file containing mixed-domain cookies → only LinkedIn cookies should be imported.
  • A cookie file with LinkedIn cookies but neither li_at nor li_rm → should return False.
  • An empty or missing cookie file → should return False.

Without these, a regression to the old name-based filtering (or a typo in the domain string) would go undetected.


Add dedicated tests for BrowserManager.import_cookies covering domain
filtering, auth-cookie validation, empty/missing files, and domain
normalization. Remove unused _FEED_URL constant from posts.py.

Co-Authored-By: Claude Opus 4.6 <[email protected]>


Development

Successfully merging this pull request may close these issues.

Cookie bridge imports only auth cookies, losing session state

1 participant