feat(browser): implement persistent browser context for session management #128
irvingpop wants to merge 2 commits into stickerdaniel:main
Greptile Overview
Greptile Summary
Replaces manual …
Key Changes:
Benefits:
Migration:
Confidence Score: 4.5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant CLI as cli_main.py
participant Browser as browser.py
participant Persistent as PersistentBrowserManager
participant Playwright
User->>CLI: Start server
CLI->>CLI: Check needs_migration()
alt Legacy session exists
CLI->>Browser: migrate_from_legacy_session()
Browser->>Browser: Load legacy BrowserManager
Browser->>Browser: Extract storage state
Browser->>Persistent: Create new context
Browser->>Persistent: Transfer state
Browser->>Browser: Verify login
Browser-->>CLI: Migration successful
end
CLI->>Browser: get_or_create_browser()
Browser->>Persistent: Initialize with user_data_dir
Browser->>Persistent: start()
Persistent->>Playwright: Start playwright
Persistent->>Playwright: Launch persistent context
Note over Playwright: State persists automatically
Playwright-->>Persistent: BrowserContext with Page
Persistent-->>Browser: PersistentBrowserManager
Browser->>Browser: Navigate to LinkedIn
Browser->>Browser: Verify authentication
Browser-->>CLI: Authenticated browser
CLI->>CLI: Start FastMCP server
Note over CLI: Tools use singleton browser
User->>CLI: Shutdown
CLI->>Browser: close_browser()
Browser->>Persistent: close()
Persistent->>Playwright: Close context
Persistent->>Playwright: Stop playwright
Note over Persistent: Session persisted
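The `needs_migration()` branch in the diagram can be sketched as a plain filesystem check. This is a minimal sketch, not the PR's implementation: the two-argument signature and the rule that a missing or empty profile directory means "not yet migrated" are assumptions made for illustration.

```python
from pathlib import Path


def needs_migration(legacy_file: Path, profile_dir: Path) -> bool:
    """Return True when a legacy session file exists but no profile has been created.

    Assumption: an absent or empty profile directory means "not migrated yet".
    """
    if not legacy_file.is_file():
        # Nothing to migrate from.
        return False
    if not profile_dir.is_dir():
        return True
    # any() stops at the first entry, so a large profile is never fully listed.
    return not any(profile_dir.iterdir())
```

With this rule, a fresh install (no `session.json`) and an already-populated profile both skip migration, so the check is safe to run on every start.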
await persistent.start()

# Copy cookies from old session to new persistent context
storage_state = await temp_browser.context.storage_state()
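The `temp_browser.context.storage_state()` call assumes the legacy manager exposes a `context` attribute. A defensive variant could fail with a clear message instead of a bare `AttributeError`. This is only a sketch: `extract_storage_state` is a hypothetical helper, and it relies on nothing beyond `context`, when present, offering Playwright's async `storage_state()` method.

```python
from typing import Any


async def extract_storage_state(manager: Any) -> dict:
    """Read storage state from a manager, failing clearly if `context` is missing."""
    # getattr with a default avoids an opaque AttributeError deep inside migration.
    context = getattr(manager, "context", None)
    if context is None:
        raise RuntimeError(
            f"{type(manager).__name__} exposes no usable 'context'; cannot migrate session"
        )
    # Playwright's BrowserContext.storage_state() returns cookies and per-origin storage.
    return await context.storage_state()
```

Raising early keeps a change in the upstream library's internals from surfacing as a confusing failure halfway through a migration.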
Review comment on `linkedin_mcp_server/drivers/browser.py`, line 266:
Verify that the `BrowserManager.context` property exists; this relies on an undocumented interface from `linkedin_scraper`.

…ement

Replace manual session.json file management with Playwright's persistent browser context. Sessions now persist automatically in browser profile directory, eliminating need for manual save/load cycles.

**Major Changes:**
- Add PersistentBrowserManager using launch_persistent_context()
- Change session storage: session.json file → browser-profile/ directory
- Add automatic migration for existing session.json users
- Update configuration with --user-data-dir option
- Fix CLI default path (session.json → browser-profile)

**Breaking Changes:**
- Session location changed from ~/.linkedin-mcp/session.json to ~/.linkedin-mcp/browser-profile/
- Automatic migration provided for existing users
- Version bumped to 3.0.0

**Benefits:**
- More reliable cookie persistence (behaves like real browser)
- No manual save/load cycles needed
- Better Docker support with standard volume mount pattern
- More LinkedIn-friendly (reduces CAPTCHA triggers)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
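The `--user-data-dir` option and its new default from the commit message can be sketched with `argparse`. This is an illustrative sketch, not the project's actual CLI code: `build_parser` and `DEFAULT_PROFILE_DIR` are hypothetical names, and only the flag name and the `~/.linkedin-mcp/browser-profile/` location come from the commit message.

```python
import argparse
from pathlib import Path

# Default location taken from the commit message; the constant name is hypothetical.
DEFAULT_PROFILE_DIR = Path.home() / ".linkedin-mcp" / "browser-profile"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the profile-location option."""
    parser = argparse.ArgumentParser(description="Sketch of the profile-location option")
    parser.add_argument(
        "--user-data-dir",
        type=Path,  # converts a command-line string into a Path
        default=DEFAULT_PROFILE_DIR,
        help="Directory holding the persistent browser profile",
    )
    return parser
```

Because the value is a directory rather than a single file, it maps directly onto a Docker volume mount, which is the pattern the commit message refers to.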
Hey, can you explain the re-authenticating issues you had? The session management is implemented in the upstream scraper; maybe create an issue there suggesting the use of Playwright's persistent browser context.

Hey Daniel, thanks for the quick response! Maybe I got caught in a weird moment: I was getting bitten by the is_logged_in() issue and kept having to re-authenticate every time, which was getting rather tedious. But that had me thinking: the session IDs won't last forever, and the is_logged_in() detection is bound to break in the future because it is inherently fragile. So why not make it a little bit easier on myself and others by reusing the same browser session, rather than a fresh one every time? I do agree this could be more elegantly implemented in the upstream scraper, but I saw that project had a really long queue of unreviewed PRs, plus I wanted to verify this was even the right solution, so I implemented it here. Totally understand if you'd rather see it upstreamed, and if so I can work on that, but it'll be a much more circuitous route.

I see where you're coming from, but I think the upstream PR backlog is mostly stale v2 code. My recent issues there were resolved quite fast.

My main constraint is avoiding the maintenance burden of custom session management within this repository.

Fair point, and totally understandable. If I refactored this such that the persistent context stuff went into the scraper library, would you accept a PR to utilize that?

Yes, absolutely.

Upstream PR: joeyism/linkedin_scraper#270
Summary
Replaces manual session.json file management with Playwright's persistent browser context for more reliable LinkedIn authentication and session persistence.
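For readers unfamiliar with the feature, here is a minimal sketch of a persistent context using Playwright's async Python API. It is not the PR's `PersistentBrowserManager`: `profile_dir_for` and `open_profile` are hypothetical helpers, and `headless=True` and the `https://www.linkedin.com/` URL are assumptions. Playwright is imported lazily so the path helper works even where Playwright is not installed.

```python
from pathlib import Path


def profile_dir_for(base: Path) -> Path:
    """Create (if needed) and return the profile directory under `base`."""
    profile = base / "browser-profile"
    profile.mkdir(parents=True, exist_ok=True)
    return profile


async def open_profile(base: Path) -> str:
    """Open the persistent profile, load a page, and return its title."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # The browser itself writes cookies and local storage to user_data_dir.
        context = await p.chromium.launch_persistent_context(
            user_data_dir=str(profile_dir_for(base)),
            headless=True,
        )
        try:
            # A persistent context normally starts with one page already open.
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto("https://www.linkedin.com/")
            return await page.title()
        finally:
            # Closing the context flushes the profile to disk.
            await context.close()
```

Running `asyncio.run(open_profile(Path.home() / ".linkedin-mcp"))` twice would reuse the same profile directory, which is the behaviour that replaces the explicit save/load of `session.json`.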
Motivation
Changes
Core Implementation
- `PersistentBrowserManager` class using `launch_persistent_context()`
- Session stored in the `~/.linkedin-mcp/browser-profile/` directory
Migration
- Automatic migration from `session.json` on first run
- Backup kept as `session.json.backup`
Configuration
- `--user-data-dir` CLI option for custom profile locations
- CLI default path fixed (was `session.json`, now `browser-profile`)
Breaking Changes
This is a breaking change (v3.0.0):
- Session location: `~/.linkedin-mcp/session.json` → `~/.linkedin-mcp/browser-profile/`
- Use `--get-session` to re-authenticate if migration fails
Benefits
Testing
Verification Checklist
- `--get-session` creates profile
- `--session-info` reports correct status
- `--clear-session` removes profile
Migration Guide for Users
Existing users (v2.x → v3.0):
Migration is automatic! On first run with v3.0, the server will:
- Migrate the existing `session.json`
- Keep a backup as `session.json.backup`

Btw, I'm happy to go with whatever version numbering you want here.