Add search_people_with_past_company tool for advanced people filtering #205

guykwan wants to merge 2 commits into stickerdaniel:main
Conversation
**File placed in wrong directory — tool never registered**
Path: `tools/person.py`, line 1 (`@@ -0,0 +1,347 @@`)

This file is added to `tools/person.py` at the repository root, but the MCP server imports from `linkedin_mcp_server.tools.person` (see `linkedin_mcp_server/server.py`, line 20):

```python
from linkedin_mcp_server.tools.person import register_person_tools
```

The module that is actually loaded and registered is `linkedin_mcp_server/tools/person.py`. The new file at `tools/person.py` is never imported by anything, so `search_people_with_past_company` will never be registered as an MCP tool and is completely dead code. The new tool and helper functions need to be added to `linkedin_mcp_server/tools/person.py` instead.
**Wrong keyword argument name causes `TypeError` at runtime**
Path: `tools/person.py`, lines 218–222

`scrape_person` is defined with the parameter name `requested` (see `linkedin_mcp_server/scraping/extractor.py`, line 254):

```python
async def scrape_person(self, username: str, requested: set[str]) -> dict[str, Any]:
```

Calling it with the keyword argument `requested_sections` will raise `TypeError: scrape_person() got an unexpected keyword argument 'requested_sections'` at runtime, causing every profile check to fail.

Suggested change:

```python
profile_result = await extractor.scrape_person(
    username, requested={"experience"}
)
```
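The failure mode described above can be reproduced in isolation. A minimal sketch, using a hypothetical `scrape_person` stub that mirrors only the signature quoted in the review (the stub body is illustrative, not the project's actual code):

```python
import asyncio
from typing import Any

# Hypothetical stub with the same parameter name as the real method ("requested")
async def scrape_person(username: str, requested: set[str]) -> dict[str, Any]:
    return {"url": f"https://www.linkedin.com/in/{username}/", "sections": {}}

async def main() -> None:
    # Correct keyword name: works
    ok = await scrape_person("jane-doe", requested={"experience"})
    print(ok["url"])

    # The PR's spelling: the call fails before any scraping happens
    try:
        await scrape_person("jane-doe", requested_sections={"experience"})
    except TypeError as e:
        print(type(e).__name__)  # TypeError

asyncio.run(main())
```

Because Python binds keyword arguments when the coroutine function is called, the `TypeError` is raised immediately, not when the coroutine is awaited.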
**URL extraction from `innerText` will always return an empty list**
Path: `tools/person.py`, lines 196–198

`extractor.search_people()` calls `extract_page()`, which returns `main.innerText` — plain text with no HTML markup. LinkedIn profile URLs (e.g. `https://www.linkedin.com/in/username`) are rendered as hyperlinks in the DOM, not printed as visible text, so they never appear in the `innerText` string. As a result, `_extract_profile_urls` always returns `[]`, the second-step filtering never runs, and the function always returns zero matches.

To reliably extract profile URLs, the extractor would need to read `href` attributes directly from the DOM (similar to how `_extract_job_ids` does it via `page.evaluate`) rather than parsing plain text.
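The distinction the reviewer draws can be demonstrated without a browser. A minimal sketch with sample markup (the HTML snippet and regex are illustrative, not the project's actual code); the real fix would collect `href` attributes via `page.evaluate`:

```python
import re
from html.parser import HTMLParser

SAMPLE_HTML = '<main><a href="https://www.linkedin.com/in/jane-doe/">Jane Doe</a> Founder</main>'
INNER_TEXT = "Jane Doe Founder"  # what main.innerText yields: the URL is not visible text

# Scanning the visible text for profile URLs finds nothing
assert re.findall(r"linkedin\.com/in/[\w\-]+", INNER_TEXT) == []

# Reading href attributes from the markup does find the profile
class HrefCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and "/in/" in value:
                    self.hrefs.append(value)

p = HrefCollector()
p.feed(SAMPLE_HTML)
print(p.hrefs)  # ['https://www.linkedin.com/in/jane-doe/']
```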
**`profile_result.get("username")` always returns `None`**
Path: `tools/person.py`, lines 270–272

`scrape_person` returns `{"url": ..., "sections": ...}` — there is no `"username"` key in its return dict. This means every profile in `matching_profiles` and `partial_matches` will have `"username": None`, making it impossible for callers to look up or identify the matching profiles.

Suggested change:

```python
"username": url.split("/in/")[-1].rstrip("/") if url else None,
```

Or, more cleanly, pass the `username` variable (already extracted on line 210) into `_parse_profile_for_filters`.
**Non-deterministic URL ordering from `set()` deduplication**
Path: `tools/person.py`, lines 290–294

`_extract_profile_urls` returns `[f"https://linkedin.com/in/{username}" for username in set(matches)]`. The `set` conversion removes duplicates but destroys the original ordering from the search results page (where LinkedIn orders results by relevance), so each call may iterate profiles in a different order and produce inconsistent results. Use `dict.fromkeys` to preserve insertion order while deduplicating:

```python
seen = dict.fromkeys(matches)
return [f"https://linkedin.com/in/{username}" for username in seen]
```
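The behavioral difference is easy to see in isolation. A short sketch with illustrative usernames:

```python
matches = ["alice", "bob", "alice", "carol", "bob"]

# dict.fromkeys deduplicates while keeping first-seen order
# (dicts preserve insertion order since Python 3.7)
ordered = list(dict.fromkeys(matches))
print(ordered)  # ['alice', 'bob', 'carol']

# Applied as in the suggested fix
urls = [f"https://linkedin.com/in/{u}" for u in dict.fromkeys(matches)]
print(urls[0])  # https://linkedin.com/in/alice
```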
**Non-English inline comment**
Path: `tools/person.py`, line 124

The comment `# 更长超时,因为需要获取多个档案` ("longer timeout, because multiple profiles need to be fetched") is in Chinese, while the rest of the codebase uses English exclusively for comments and documentation. Please translate it to keep the codebase consistent:

```python
timeout=TOOL_TIMEOUT_SECONDS * 3,  # Longer timeout because multiple profiles need to be fetched
```
**`import re` inside function body**
Path: `tools/person.py`, lines 287–295

`re` is imported inside both `_extract_profile_urls` (line 288) and `_extract_username_from_url` (line 298). While Python caches module imports, the convention in this codebase (and in Python generally) is to place all imports at the top of the module. Move `import re` to the module-level imports alongside `import asyncio` and `import logging`.
Summary
This PR introduces a new tool, `search_people_with_past_company`, that enables advanced people search with filtering by past companies and current job titles.

New Feature: `search_people_with_past_company`

Use Cases

Parameters

- `keywords` (required): Search keywords (e.g., "founder", "CEO")
- `location` (optional): Location filter (e.g., "Beijing", "San Francisco")
- `past_companies` (optional): Comma-separated company names (e.g., "Alibaba,ByteDance,Tencent")
- `current_title` (optional): Current job title filter (e.g., "founder", "CEO")
- `max_results` (optional): Maximum results (default: 10)

Example

```shell
mcporter call linkedin.search_people_with_past_company \
  keywords="founder" \
  location="Beijing" \
  past_companies="Alibaba,ByteDance" \
  current_title="founder"
```

Implementation

Changes

- `search_people_with_past_company()` tool
- `asyncio` import

Testing

Related

Useful for talent acquisition, investment research, and competitive intelligence.
Greptile Summary
This PR introduces a `search_people_with_past_company` tool that performs a two-step search: first fetching LinkedIn people search results, then iterating over each profile to filter by past company and current title. Unfortunately, the implementation has several blocking issues that prevent it from functioning at all.

Key issues found:

- The tool lives in `tools/person.py` (repository root) instead of `linkedin_mcp_server/tools/person.py` (the actual module). The server only imports from `linkedin_mcp_server.tools.person`, so the new tool is never registered.
- `extractor.scrape_person(username, requested_sections={"experience"})` uses a non-existent parameter name — the actual parameter is `requested`. This raises a `TypeError` on every profile fetch.
- `_extract_profile_urls` searches for full `https://linkedin.com/in/...` URLs inside `innerText`, but `extract_page` returns plain text (no HTML). Profile URLs appear only in `href` attributes, never as visible text, so this function always returns an empty list.
- `username` field always `None`: `scrape_person` returns `{"url": ..., "sections": ...}` — no `"username"` key — so every matched profile's `username` field will be `None`.
- `set()` in `_extract_profile_urls` loses LinkedIn's relevance-ranked ordering.
- `import re` style issues.

Confidence Score: 1/5

- `tools/person.py` — all changes are in this single file, which needs to be moved to `linkedin_mcp_server/tools/person.py` and the logic bugs fixed before any of the new functionality can work.

Important Files Changed

- `tools/person.py` — Added in the wrong directory (`tools/` instead of `linkedin_mcp_server/tools/`), making the new tool completely unreachable. Contains multiple critical bugs: wrong keyword argument name on `scrape_person`, URL extraction from `innerText` that will always return empty, and `username` always being `None` in output.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP Server
    participant Tool as search_people_with_past_company
    participant Extractor as LinkedInExtractor
    Client->>MCP: call search_people_with_past_company(keywords, location, past_companies, current_title)
    MCP->>Tool: invoke
    Tool->>Extractor: search_people(keywords, location)
    Extractor-->>Tool: {url, sections: {search_results: innerText}}
    Note over Tool: _extract_profile_urls(innerText)<br/>⚠️ Always returns [] — URLs not in plain text
    loop For each profile URL (up to max_results × 3)
        Tool->>Extractor: scrape_person(username, requested_sections={"experience"})<br/>⚠️ TypeError: wrong kwarg name (should be 'requested')
        Extractor-->>Tool: {url, sections: {experience: text}}
        Note over Tool: _parse_profile_for_filters()<br/>profile_result.get("username") → None always
        alt matches_all
            Tool->>Tool: append to matching_profiles
        else matches_partial
            Tool->>Tool: append to partial_matches
        end
        Note over Tool: asyncio.sleep(1.5)
    end
    Tool-->>Client: {search_url, total_checked, filters, matching_profiles, partial_matches}
```

Last reviewed commit: 0192ab1