fix(auth): import all LinkedIn cookies in cross-platform bridge #217
Open
andrema2 wants to merge 14 commits into stickerdaniel:main
Conversation
- get_my_recent_posts: incremental scroll with scrollHeight stable stop, seen_urns dedupe
- Add _expand_comments_section helper (Load more / Ver mais)
- get_post_comments: use data-urn urn:li:comment, top-level only, expand before extract
- _get_current_user_name: avatar alt, Me/Eu menu, fallback nav link
- Notifications filter: PT/EN terms (comentário, comentou, resposta, respondeu)
- Tests: legacy evaluate format, find_unreplied since_days/max_scrolls assert

Made-with: Cursor
- Add humanized delays with jitter (1.5-4.0s random) replacing fixed 2.0s
- Add in-memory TTL cache (5min) to avoid re-scraping same pages
- Add session-level rate limit awareness with exponential backoff
- Optimize find_unreplied_comments: cap fallback at 5 navigations
- Improve notifications path to return early on successful empty result
- Cache auth checks for 120s to reduce redundant DOM queries
- Cache post comments to avoid re-fetching in same session

Co-Authored-By: Claude Opus 4.6 <[email protected]>
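The delay and backoff behavior listed above can be sketched roughly as follows. This is a minimal sketch, not the PR's implementation: the 30s base / 300s cap values come from the review summary further down, and all signatures here are assumptions.

```python
import asyncio
import random
import time


class RateLimitState:
    """Session-level backoff sketch: each detected rate limit doubles the
    cooldown (assumed 30s base, 300s cap), and a success resets it."""

    def __init__(self, base: float = 30.0, cap: float = 300.0) -> None:
        self._base, self._cap = base, cap
        self._strikes = 0
        self._cooldown_until = 0.0

    def record_trigger(self) -> None:
        # Exponential backoff: 30s, 60s, 120s, 240s, then capped at 300s.
        delay = min(self._base * (2 ** self._strikes), self._cap)
        self._strikes += 1
        self._cooldown_until = time.monotonic() + delay

    def record_success(self) -> None:
        self._strikes = 0

    def cooldown_remaining(self) -> float:
        return max(0.0, self._cooldown_until - time.monotonic())


def humanized_delay_seconds(min_s: float = 1.5, max_s: float = 4.0) -> float:
    """Randomized jitter replacing the fixed 2.0s navigation pause."""
    return random.uniform(min_s, max_s)


async def wait_for_cooldown(state: RateLimitState) -> None:
    """Sleep out any active cooldown before the next navigation."""
    remaining = state.cooldown_remaining()
    if remaining > 0:
        await asyncio.sleep(remaining)
```

A scraper would call `wait_for_cooldown(state)` before each `page.goto(...)` and `state.record_success()` after a clean navigation.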
Add new MCP tool to read the text content of a specific LinkedIn post given its URL, URN, or activity ID. Reuses LinkedInExtractor.extract_page() for navigation, caching, rate limits, and noise stripping. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add MCP tool to scrape LinkedIn notifications page, returning structured items with type classification (comment, reaction, connection, mention, endorsement, job, post, birthday, etc.).

Closes stickerdaniel#211

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Implement since_days date filtering in get_my_recent_posts
- Add get_notifications, get_post_content, get_profile_recent_posts to scraping/__init__.py exports
- Use link-based dedup key for notifications to reduce false positives
- Include current_user_name in comment cache key to prevent stale reply detection

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The cookie bridge only imported li_at and li_rm, discarding session cookies (JSESSIONID, bcookie, lidc, etc.) that LinkedIn requires for valid requests. This caused empty responses after bridge activation.

Closes stickerdaniel#216

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment on lines 218 to 222
```python
# Verify that required auth cookies are present
cookie_names = {c.get("name") for c in cookies}
if not self._REQUIRED_COOKIE_NAMES & cookie_names:
    logger.warning("No auth cookies (li_at/li_rm) found in %s", path)
    return False
```
**No unit tests for the new `import_cookies` behavior**

The core fix of this PR — filtering cookies by domain instead of by name — has no dedicated unit tests. The only reference to `import_cookies` in the test suite (`tests/test_browser_driver.py:40`) mocks it out entirely (`browser.import_cookies = AsyncMock(return_value=False)`), so the new domain-filter logic and the updated required-cookie check are not exercised at all.

Consider adding tests to `tests/test_browser_driver.py` (or a new `tests/test_core_browser.py`) that cover at minimum:

- A cookie file containing only LinkedIn-domain cookies (including `li_at`) → should return `True` and import all of them.
- A cookie file containing mixed-domain cookies → only LinkedIn cookies should be imported.
- A cookie file with LinkedIn cookies but **neither** `li_at` nor `li_rm` → should return `False`.
- An empty or missing cookie file → should return `False`.
Without these, a regression to the old name-based filtering (or a typo in the domain string) would go undetected.
Add dedicated tests for BrowserManager.import_cookies covering domain filtering, auth-cookie validation, empty/missing files, and domain normalization. Remove unused _FEED_URL constant from posts.py.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary
- Previously only `li_at` and `li_rm` were imported by the cookie bridge
- All LinkedIn session cookies (`JSESSIONID`, `bcookie`, `lidc`, `bscookie`, `liap`, etc.) are now preserved during cross-platform bridge
- Validates that required auth cookies (`li_at`/`li_rm`) are present before importing

Closes #216
Test plan
- Ran `--login` on macOS to create a fresh profile and `cookies.json`
- Verified `cookies.json` contains all LinkedIn cookies after export
- `uv run pytest` — 163 tests passing

🤖 Generated with Claude Code
Greptile Summary
This PR delivers the stated cookie-bridge fix — `import_cookies` now filters by domain (`"linkedin.com" in domain`) instead of by cookie name, importing all LinkedIn session cookies (e.g. `JSESSIONID`, `bcookie`, `lidc`) while still validating that at least one auth token (`li_at`/`li_rm`) is present before committing. It also ships a significant batch of new features: posts/comments/notifications scraping tools, a TTL scraping cache, session-level rate-limit state with exponential backoff, an auth-check TTL cache in the driver, and humanized navigation delays. New unit tests specifically address the `import_cookies` behavior requested in the previous review.

Key changes:

- `core/browser.py`: `import_cookies` domain filter replaces name-based filter; `_AUTH_COOKIE_NAMES` → `_REQUIRED_COOKIE_NAMES` for clarity
- `scraping/posts.py`: New 824-line module for `get_my_recent_posts`, `get_post_comments`, `get_post_content`, `get_notifications`, `find_unreplied_comments`
- `scraping/cache.py`: New in-memory TTL cache (`ScrapingCache`) with module-level singleton
- `core/utils.py`: New `RateLimitState` (exponential backoff), `humanized_delay()`, `wait_for_cooldown()`; `detect_rate_limit` now records state on each trigger
- `drivers/browser.py`: 120s auth-check cache (`_auth_valid_until`), `scraping_cache.clear()` on browser close
- `scraping/extractor.py`: Integrates cache, humanized delays, and rate-limit state tracking

Issues found:
- In `_unreplied_via_notifications` the JS expression `(a.closest('li') || a.closest('div')).innerText` can throw `TypeError` if the anchor has no `li` or `div` ancestor, silently triggering the expensive post-scanning fallback
- `TestGetMyRecentPosts` mocks `page.evaluate` to return a plain list (the legacy code path) rather than the `{ items, scrollHeight }` dict the current JS produces, leaving the scroll-loop and height-comparison logic untested

Confidence Score: 4/5
`get_my_recent_posts` is not exercised. Neither issue blocks merging, but both are worth addressing: `linkedin_mcp_server/scraping/posts.py` (JS null guard in `_unreplied_via_notifications`) and `tests/test_posts_scraping.py` (dict-path coverage for `get_my_recent_posts`).

Important Files Changed
- `core/browser.py`: `import_cookies` now filters by domain (`"linkedin.com" in domain`) instead of by name, and validates that at least one required auth cookie (`li_at`/`li_rm`) is present before importing. Logic is correct and well-tested in `test_core_browser.py`.
- `scraping/posts.py`: JS evaluation in `_unreplied_via_notifications` could silently trigger the expensive fallback path on structural DOM edge cases.
- `core/utils.py`: `RateLimitState` with exponential backoff (30s → 300s cap), `humanized_delay()`, and `wait_for_cooldown()`. Implementation is clean; `detect_rate_limit` now records state on each detection. Module-level singleton reset is handled in `conftest.py`.
- `drivers/browser.py`: Auth-check cache (`_auth_valid_until`) to avoid redundant DOM queries on every tool call. Cache is correctly invalidated in `close_browser()` and `reset_browser_for_testing()`. `scraping_cache.clear()` is also called on close.
- `scraping/cache.py`: In-memory TTL cache storing `(value, expires_at)` tuples. Clean implementation with `get`, `put`, `invalidate`, and `clear`. Module-level singleton `scraping_cache` is well-tested in `test_cache.py`.
- `tests/test_core_browser.py`: New `import_cookies` unit tests. Covers the key scenarios: LinkedIn-only cookies, mixed-domain filtering, missing auth cookies → False, empty/missing file → False, and domain normalization.
- `tests/test_posts_scraping.py`: Tests for `_normalize_post_url`, `get_post_comments`, `find_unreplied_comments`, and `get_notifications`. However, `get_my_recent_posts` tests mock `evaluate` to return a plain list (legacy path) rather than the dict the JS actually produces, leaving the scroll-loop and dict-handling code paths untested.
- `tools/posts.py`: New MCP tools `get_my_recent_posts`, `get_post_comments`, `get_post_content`, `get_notifications`, `find_unreplied_comments`. Consistent with existing tool patterns; all have `readOnlyHint=True` and use `handle_tool_error` for error handling.
- `scraping/extractor.py`: Integrates `scraping_cache`, `humanized_delay`, `rate_limit_state`, and `wait_for_cooldown`. The fixed static `_NAV_DELAY` is replaced with a randomized delay. Both `extract_page` and `_extract_overlay` now cache successful results and call `rate_limit_state.record_success()` after navigation.
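The cache described for `scraping/cache.py` can be approximated as follows. This is a sketch based only on the API names listed in the summary (`get`, `put`, `invalidate`, `clear`, `(value, expires_at)` tuples); the real implementation may differ in details such as key construction and clock choice.

```python
import time


class ScrapingCache:
    """In-memory TTL cache storing (value, expires_at) tuples."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[object, float]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (value, time.monotonic() + self._ttl)

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)

    def clear(self) -> None:
        self._store.clear()


# Module-level singleton, mirroring the scraping_cache described above.
scraping_cache = ScrapingCache()
```

Lazy eviction on `get` keeps the implementation tiny; the trade-off is that expired entries linger until the next read or a `clear()` (which the driver calls on browser close).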
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as MCP Client
    participant Tool as tools/posts.py
    participant Driver as drivers/browser.py
    participant Scraper as scraping/posts.py
    participant Cache as scraping/cache.py
    participant Page as Patchright Page
    participant RLS as RateLimitState
    Client->>Tool: get_my_recent_posts(limit)
    Tool->>Driver: ensure_authenticated()
    Note over Driver: Return early if _auth_valid_until not expired
    Driver->>Driver: validate_session() [DOM check]
    Driver-->>Tool: ok
    Tool->>Driver: get_or_create_browser()
    Driver-->>Tool: browser
    Tool->>Scraper: get_my_recent_posts(page, limit)
    Scraper->>RLS: wait_for_cooldown()
    Scraper->>Page: goto(_MY_POSTS_URL)
    Scraper->>RLS: record_success()
    loop scroll & collect until limit or stable height
        Scraper->>Page: evaluate(JS, limit)
        Page-->>Scraper: {items, scrollHeight}
        Scraper->>Page: scroll_to_bottom()
    end
    Scraper-->>Tool: posts[]
    Tool-->>Client: {posts: [...]}
    Client->>Tool: get_post_comments(post_url)
    Tool->>Scraper: get_post_comments(page, url)
    Scraper->>Cache: get(cache_key)
    alt cache hit
        Cache-->>Scraper: cached comments[]
    else cache miss
        Scraper->>RLS: wait_for_cooldown()
        Scraper->>Page: goto(post_url)
        Scraper->>RLS: record_success()
        Scraper->>Page: evaluate(JS comments extractor)
        Page-->>Scraper: comments[]
        Scraper->>Cache: put(cache_key, comments)
    end
    Scraper-->>Tool: comments[]
    Tool-->>Client: {comments: [...]}
    Client->>Tool: find_unreplied_comments(since_days, max_posts)
    Tool->>Scraper: find_unreplied_comments(page, ...)
    Scraper->>Scraper: _unreplied_via_notifications()
    alt notifications have comment items
        Scraper-->>Tool: unreplied from notifications
    else notifications empty (loaded OK)
        Scraper-->>Tool: []
    else notifications failed (None)
        Scraper->>Scraper: get_my_recent_posts() fallback
        loop each post (max 5 navigations)
            Scraper->>Scraper: get_post_comments()
        end
        Scraper-->>Tool: unreplied from post scan
    end
    Tool-->>Client: {unreplied_comments: [...]}
```

Comments Outside Diff (3)
scripts/test_my_recent_posts.py, lines 1910-1913 (link)

**Portuguese strings in an otherwise English codebase**

This script uses Portuguese for all user-facing output (e.g. `"Erro de autenticação. Faça login uma vez:"`, `"Buscando até … posts"`, `"Encontrados:"`, `"Nenhum post encontrado"`, `"JSON completo:"`). The rest of the project — docs, comments, log messages, and other scripts — is entirely in English. This inconsistency makes the script harder to use for non-Portuguese-speaking contributors and conflicts with the project's language convention.

Please translate these strings to English for consistency.
linkedin_mcp_server/scraping/posts.py, line 1443 (link)

**Potential null-dereference in JS evaluation**

`a.closest('li') || a.closest('div')` can return `null` if the anchor element has no `<li>` or `<div>` ancestor (e.g., an anchor directly under `<article>` or `<section>`). Calling `.innerText` on `null` throws a `TypeError`, which causes `page.evaluate()` to reject. The outer `except Exception` handler then returns `None`, silently falling back to the expensive full post-scanning path even when the notifications page loaded correctly.

Add a null guard before accessing `.innerText`.
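A guarded version of that expression might look like this. It is a sketch: `containerText` is a hypothetical helper name, while the real code evaluates the expression inline inside `page.evaluate`.

```javascript
// Null-safe replacement for (a.closest('li') || a.closest('div')).innerText:
// if the anchor has neither ancestor, fall back to an empty string instead
// of throwing a TypeError that rejects the whole page.evaluate() call.
function containerText(a) {
  const container = a.closest('li') || a.closest('div');
  return container ? container.innerText : '';
}
```

An empty string keeps the item in the result set with no text rather than aborting the notifications path entirely.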
tests/test_posts_scraping.py, lines 2349-2376 (link)

**Tests exercise legacy code path, not the current JS return format**

The `get_my_recent_posts` JS now returns `{ items: [...], scrollHeight: ... }` (a dict), but these tests mock `page.evaluate` to return a plain list. This causes the tests to exercise the legacy `else` fallback branch in the Python code instead of the primary `isinstance(raw, dict)` path.

As a result, the scroll-loop termination logic (height comparison, multi-call behavior) and the dict-path deduplication are not tested at all. A regression in the dict-handling code would silently go undetected.

Consider adding a complementary test that mocks `evaluate` to return the dict format the JS actually produces.
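For example, the mock could return the dict shape on successive calls. This is a sketch: `make_evaluate_mock` is a hypothetical helper, and asserting on it stands in for driving the real `get_my_recent_posts` scroll loop.

```python
from unittest.mock import AsyncMock


def make_evaluate_mock():
    """page.evaluate stub: the first call returns one item, the second call
    repeats it plus a new one at the same scrollHeight, so the scroll loop
    should both deduplicate by URN and terminate on the stable height."""
    responses = [
        {"items": [{"urn": "urn:li:activity:1", "text": "first"}],
         "scrollHeight": 1000},
        {"items": [{"urn": "urn:li:activity:1", "text": "first"},
                   {"urn": "urn:li:activity:2", "text": "second"}],
         "scrollHeight": 1000},  # unchanged height -> loop should stop
    ]
    return AsyncMock(side_effect=responses)
```

In the test, assign `page.evaluate = make_evaluate_mock()`, run `get_my_recent_posts`, and assert that exactly two unique posts come back and that `evaluate` was not called a third time.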
Last reviewed commit: 3532114