Skip to content

Add search_people_with_past_company tool for advanced people filtering#205

Open
guykwan wants to merge 2 commits intostickerdaniel:mainfrom
guykwan:main
Open

Add search_people_with_past_company tool for advanced people filtering#205
guykwan wants to merge 2 commits intostickerdaniel:mainfrom
guykwan:main

Conversation

@guykwan
Copy link

@guykwan guykwan commented Mar 6, 2026

Summary

This PR introduces a new tool search_people_with_past_company that enables advanced people search with filtering by past companies and current job titles.

New Feature: search_people_with_past_company

Use Cases

  • Find founders who previously worked at major tech companies
  • Identify executives with experience at specific companies
  • Build talent pools based on company background

Parameters

  • keywords (required): Search keywords (e.g., "founder", "CEO")
  • location (optional): Location filter (e.g., "Beijing", "San Francisco")
  • past_companies (optional): Comma-separated company names (e.g., "Alibaba,ByteDance,Tencent")
  • current_title (optional): Current job title filter (e.g., "founder", "CEO")
  • max_results (optional): Maximum results (default: 10)

Example

mcporter call linkedin.search_people_with_past_company \
    keywords="founder" \
    location="Beijing" \
    past_companies="Alibaba,ByteDance" \
    current_title="founder"

Implementation

  • Two-step search: basic search + detailed profile filtering
  • Rate limiting protection (1.5s delay between profiles)
  • Progress reporting during search
  • Returns both full and partial matches

Changes

  • Added search_people_with_past_company() tool
  • Added 5 helper functions for URL extraction and profile parsing
  • Added asyncio import

Testing

  • ✅ Code syntax validated
  • ✅ No breaking changes
  • ✅ Follows existing patterns

Related

Useful for talent acquisition, investment research, and competitive intelligence.

Greptile Summary

This PR introduces a search_people_with_past_company tool that performs a two-step search: first fetching LinkedIn people search results, then iterating each profile to filter by past company and current title. Unfortunately, the implementation has several blocking issues that prevent it from functioning at all.

Key issues found:

  • Wrong file location: The new file is created at tools/person.py (repository root) instead of linkedin_mcp_server/tools/person.py (the actual module). The server only imports from linkedin_mcp_server.tools.person, so the new tool is never registered.
  • Wrong keyword argument: extractor.scrape_person(username, requested_sections={"experience"}) uses a non-existent parameter name — the actual parameter is requested. This raises a TypeError on every profile fetch.
  • URL extraction is fundamentally broken: _extract_profile_urls searches for full https://linkedin.com/in/... URLs inside innerText, but extract_page returns plain text (no HTML). Profile URLs are only in href attributes and are never printed as visible text, so this function always returns an empty list.
  • username field always None: scrape_person returns {"url": ..., "sections": ...} — no "username" key — so every matched profile's username field will be None.
  • Non-deterministic profile ordering: Use of set() in _extract_profile_urls loses LinkedIn's relevance-ranked ordering.
  • Non-English comment and module-level import re style issues.

Confidence Score: 1/5

  • Not safe to merge — the new tool is unreachable due to wrong file placement and contains multiple critical runtime errors.
  • Three independent blocking defects (wrong module path, wrong keyword argument, broken URL extraction from plain text) each individually prevent the feature from functioning. The tool is effectively dead code in its current form.
  • tools/person.py — all changes are in this single file, which needs to be moved to linkedin_mcp_server/tools/person.py and the logic bugs fixed before any of the new functionality can work.

Important Files Changed

Filename Overview
tools/person.py New file added at wrong path (root-level tools/ instead of linkedin_mcp_server/tools/), making the new tool completely unreachable. Contains multiple critical bugs: wrong keyword argument name on scrape_person, URL extraction from innerText that will always return empty, and username always being None in output.

Sequence Diagram

sequenceDiagram
    participant Client
    participant MCP as MCP Server
    participant Tool as search_people_with_past_company
    participant Extractor as LinkedInExtractor

    Client->>MCP: call search_people_with_past_company(keywords, location, past_companies, current_title)
    MCP->>Tool: invoke
    Tool->>Extractor: search_people(keywords, location)
    Extractor-->>Tool: {url, sections: {search_results: innerText}}
    Note over Tool: _extract_profile_urls(innerText)<br/>⚠️ Always returns [] — URLs not in plain text
    loop For each profile URL (up to max_results × 3)
        Tool->>Extractor: scrape_person(username, requested_sections={"experience"})<br/>⚠️ TypeError: wrong kwarg name (should be 'requested')
        Extractor-->>Tool: {url, sections: {experience: text}}
        Note over Tool: _parse_profile_for_filters()<br/>profile_result.get("username") → None always
        alt matches_all
            Tool->>Tool: append to matching_profiles
        else matches_partial
            Tool->>Tool: append to partial_matches
        end
        Note over Tool: asyncio.sleep(1.5)
    end
    Tool-->>Client: {search_url, total_checked, filters, matching_profiles, partial_matches}
Loading

Last reviewed commit: 0192ab1

Greptile also left 7 inline comments on this PR.

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

@@ -0,0 +1,347 @@
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File placed in wrong directory — tool never registered

This file is added to tools/person.py at the repository root, but the MCP server imports from linkedin_mcp_server.tools.person (see linkedin_mcp_server/server.py line 20):

from linkedin_mcp_server.tools.person import register_person_tools

The actual module that is loaded and registered is at linkedin_mcp_server/tools/person.py. This new file at tools/person.py is never imported by anything, so search_people_with_past_company will never be registered as an MCP tool and is completely dead code. The new tool and helper functions need to be added to linkedin_mcp_server/tools/person.py instead.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 1

Comment:
**File placed in wrong directory — tool never registered**

This file is added to `tools/person.py` at the repository root, but the MCP server imports from `linkedin_mcp_server.tools.person` (see `linkedin_mcp_server/server.py` line 20):

```python
from linkedin_mcp_server.tools.person import register_person_tools
```

The actual module that is loaded and registered is at `linkedin_mcp_server/tools/person.py`. This new file at `tools/person.py` is never imported by anything, so `search_people_with_past_company` will never be registered as an MCP tool and is completely dead code. The new tool and helper functions need to be added to `linkedin_mcp_server/tools/person.py` instead.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +218 to +222

await ctx.report_progress(
progress=30 + int((idx / len(profile_urls)) * 60),
total=100,
message=f"Checking profile {idx + 1}/{len(profile_urls[:max_results * 3])}: {username}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong keyword argument name causes TypeError at runtime

scrape_person is defined with the parameter name requested (see linkedin_mcp_server/scraping/extractor.py line 254):

async def scrape_person(self, username: str, requested: set[str]) -> dict[str, Any]:

Calling it with the keyword argument requested_sections will raise a TypeError: scrape_person() got an unexpected keyword argument 'requested_sections' at runtime, causing every profile check to fail.

Suggested change
await ctx.report_progress(
progress=30 + int((idx / len(profile_urls)) * 60),
total=100,
message=f"Checking profile {idx + 1}/{len(profile_urls[:max_results * 3])}: {username}"
profile_result = await extractor.scrape_person(
username, requested={"experience"}
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 218-222

Comment:
**Wrong keyword argument name causes `TypeError` at runtime**

`scrape_person` is defined with the parameter name `requested` (see `linkedin_mcp_server/scraping/extractor.py` line 254):

```python
async def scrape_person(self, username: str, requested: set[str]) -> dict[str, Any]:
```

Calling it with the keyword argument `requested_sections` will raise a `TypeError: scrape_person() got an unexpected keyword argument 'requested_sections'` at runtime, causing every profile check to fail.

```suggestion
                    profile_result = await extractor.scrape_person(
                        username, requested={"experience"}
                    )
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +196 to +198
)

# Extract profile URLs from search results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL extraction from innerText will always return an empty list

extractor.search_people() calls extract_page(), which returns main.innerText — plain text with no HTML markup. LinkedIn profile URLs (e.g. https://www.linkedin.com/in/username) are rendered as hyperlinks in the DOM, not printed as visible text. They will never appear in the innerText string, so _extract_profile_urls will always return [], meaning the second-step filtering never runs and the function always returns zero matches.

To reliably extract profile URLs, the extractor would need to read href attributes directly from the DOM (similar to how _extract_job_ids does it via page.evaluate) rather than parsing plain text.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 196-198

Comment:
**URL extraction from `innerText` will always return an empty list**

`extractor.search_people()` calls `extract_page()`, which returns `main.innerText` — plain text with no HTML markup. LinkedIn profile URLs (e.g. `https://www.linkedin.com/in/username`) are rendered as hyperlinks in the DOM, not printed as visible text. They will never appear in the `innerText` string, so `_extract_profile_urls` will always return `[]`, meaning the second-step filtering never runs and the function always returns zero matches.

To reliably extract profile URLs, the extractor would need to read `href` attributes directly from the DOM (similar to how `_extract_job_ids` does it via `page.evaluate`) rather than parsing plain text.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +270 to +272
def _extract_profile_urls(search_text: str) -> list[str]:
"""Extract LinkedIn profile URLs from search results text."""
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

profile_result.get("username") always returns None

scrape_person returns {"url": ..., "sections": ...} — there is no "username" key in its return dict. This means every profile in matching_profiles and partial_matches will have "username": None, making it impossible for callers to look up or identify the matching profiles.

Suggested change
def _extract_profile_urls(search_text: str) -> list[str]:
"""Extract LinkedIn profile URLs from search results text."""
import re
"username": url.split("/in/")[-1].rstrip("/") if url else None,

Or more cleanly, pass the username variable (already extracted on line 210) into _parse_profile_for_filters.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 270-272

Comment:
**`profile_result.get("username")` always returns `None`**

`scrape_person` returns `{"url": ..., "sections": ...}` — there is no `"username"` key in its return dict. This means every profile in `matching_profiles` and `partial_matches` will have `"username": None`, making it impossible for callers to look up or identify the matching profiles.

```suggestion
        "username": url.split("/in/")[-1].rstrip("/") if url else None,
```

Or more cleanly, pass the `username` variable (already extracted on line 210) into `_parse_profile_for_filters`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +290 to +294
) -> dict[str, Any]:
"""Parse profile result and check if it matches filters."""
sections = profile_result.get("sections", {})
experience_text = sections.get("experience", "")
main_text = sections.get("main", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-deterministic URL ordering from set() deduplication

_extract_profile_urls returns [f"https://linkedin.com/in/{username}" for username in set(matches)]. The set conversion removes duplicates but destroys the original ordering from the search results page (where LinkedIn orders results by relevance). Each call may iterate profiles in a different order, producing inconsistent results. Use dict.fromkeys to preserve insertion order while deduplicating:

Suggested change
) -> dict[str, Any]:
"""Parse profile result and check if it matches filters."""
sections = profile_result.get("sections", {})
experience_text = sections.get("experience", "")
main_text = sections.get("main", "")
seen = dict.fromkeys(matches)
return [f"https://linkedin.com/in/{username}" for username in seen]
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 290-294

Comment:
**Non-deterministic URL ordering from `set()` deduplication**

`_extract_profile_urls` returns `[f"https://linkedin.com/in/{username}" for username in set(matches)]`. The `set` conversion removes duplicates but destroys the original ordering from the search results page (where LinkedIn orders results by relevance). Each call may iterate profiles in a different order, producing inconsistent results. Use `dict.fromkeys` to preserve insertion order while deduplicating:

```suggestion
    seen = dict.fromkeys(matches)
    return [f"https://linkedin.com/in/{username}" for username in seen]
```

How can I resolve this? If you propose a fix, please make it concise.


except Exception as e:
raise_tool_error(e, "search_people") # NoReturn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-English inline comment

The comment # 更长超时,因为需要获取多个档案 is in Chinese. The rest of the codebase uses English exclusively for comments and documentation. Please translate this to English to keep the codebase consistent:

Suggested change
timeout=TOOL_TIMEOUT_SECONDS * 3, # Longer timeout because multiple profiles need to be fetched
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 124

Comment:
**Non-English inline comment**

The comment `# 更长超时,因为需要获取多个档案` is in Chinese. The rest of the codebase uses English exclusively for comments and documentation. Please translate this to English to keep the codebase consistent:

```suggestion
        timeout=TOOL_TIMEOUT_SECONDS * 3,  # Longer timeout because multiple profiles need to be fetched
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +287 to +295
profile_result: dict[str, Any],
past_company_list: list[str],
current_title: str | None,
) -> dict[str, Any]:
"""Parse profile result and check if it matches filters."""
sections = profile_result.get("sections", {})
experience_text = sections.get("experience", "")
main_text = sections.get("main", "")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import re inside function body

re is imported inside both _extract_profile_urls (line 288) and _extract_username_from_url (line 298). While Python caches module imports, the convention in this codebase (and generally) is to place all imports at the top of the module. Move import re to the module-level imports alongside import asyncio and import logging.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/person.py
Line: 287-295

Comment:
**`import re` inside function body**

`re` is imported inside both `_extract_profile_urls` (line 288) and `_extract_username_from_url` (line 298). While Python caches module imports, the convention in this codebase (and generally) is to place all imports at the top of the module. Move `import re` to the module-level imports alongside `import asyncio` and `import logging`.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant