
Commit 82dc939

feat: curl http client selects chrome impersonation by default (#473)
### Description

- `CurlImpersonateHttpClient` now selects the Chrome browser type for impersonation by default.
- Before (response from `https://httpbin.org/get`, which echoes the request headers):

```json
{
  "url": "https://httpbin.org/get",
  "response": {
    "args": {},
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate, br",
      "Content-Type": "application/x-www-form-urlencoded",
      "Host": "httpbin.org",
      "X-Amzn-Trace-Id": "Root=1-66d04355-5df0396b08e2931c125389aa"
    },
    "origin": "78.80.81.196",
    "url": "https://httpbin.org/get"
  }
}
```

- After:

```json
{
  "url": "https://httpbin.org/get",
  "response": {
    "args": {},
    "headers": {
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
      "Accept-Encoding": "gzip, deflate, br, zstd",
      "Accept-Language": "en-US,en;q=0.9",
      "Content-Type": "application/x-www-form-urlencoded",
      "Host": "httpbin.org",
      "Priority": "u=0, i",
      "Sec-Ch-Ua": "\"Chromium\";v=\"124\", \"Google Chrome\";v=\"124\", \"Not-A.Brand\";v=\"99\"",
      "Sec-Ch-Ua-Mobile": "?0",
      "Sec-Ch-Ua-Platform": "\"macOS\"",
      "Sec-Fetch-Dest": "document",
      "Sec-Fetch-Mode": "navigate",
      "Sec-Fetch-Site": "none",
      "Sec-Fetch-User": "?1",
      "Upgrade-Insecure-Requests": "1",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
      "X-Amzn-Trace-Id": "Root=1-66d04c27-057e04e635efbbde18e0d85b"
    },
    "origin": "78.80.81.196",
    "url": "https://httpbin.org/get"
  }
}
```

### Issues

- N/A

### Testing

- N/A

### Checklist

- [x] CI passed
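The Before and After responses differ only in the request headers. The small sketch below, using header values copied from the two httpbin.org responses in the description (minus `X-Amzn-Trace-Id`, whose value changes on every request), lists which headers the Chrome default adds or changes. `diff_headers` is a hypothetical helper written for this illustration, not part of Crawlee or curl_cffi.

```python
from __future__ import annotations

# Request headers echoed by httpbin.org in the "Before" and "After" responses.
# X-Amzn-Trace-Id is left out because its value differs on every request.
BEFORE = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
}

AFTER = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "Priority": "u=0, i",
    "Sec-Ch-Ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
}


def diff_headers(
    before: dict[str, str], after: dict[str, str]
) -> tuple[list[str], list[str], list[str]]:
    """Return (added, removed, changed) header names, each sorted alphabetically."""
    added = sorted(name for name in after if name not in before)
    removed = sorted(name for name in before if name not in after)
    changed = sorted(name for name in before if name in after and before[name] != after[name])
    return added, removed, changed


added, removed, changed = diff_headers(BEFORE, AFTER)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Per these two responses, nothing is dropped: `Content-Type` and `Host` stay the same, `Accept` and `Accept-Encoding` get browser-like values, and eleven headers (including `User-Agent` and the `Sec-Ch-Ua*` / `Sec-Fetch-*` family) appear only with Chrome impersonation.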
1 parent fbb6084 commit 82dc939

1 file changed: +17 / -5 lines


src/crawlee/http_clients/curl_impersonate.py

Lines changed: 17 additions & 5 deletions
```diff
@@ -5,6 +5,7 @@
 try:
     from curl_cffi.requests import AsyncSession
     from curl_cffi.requests.errors import RequestsError
+    from curl_cffi.requests.impersonate import BrowserType
 except ImportError as exc:
     raise ImportError(
         "To import anything from this subpackage, you need to install the 'curl-impersonate' extra."
@@ -158,13 +159,24 @@ async def send_request(
     def _get_client(self, proxy_url: str | None) -> AsyncSession:
         """Helper to get a HTTP client for the given proxy URL.
 
-        If the client for the given proxy URL doesn't exist, it will be created and stored.
+        The method checks if an `AsyncSession` already exists for the provided proxy URL. If no session exists,
+        it creates a new one, configured with the specified proxy and additional session options. The new session
+        is then stored for future use.
         """
+        # Check if a session for the given proxy URL has already been created.
         if proxy_url not in self._client_by_proxy_url:
-            self._client_by_proxy_url[proxy_url] = AsyncSession(
-                proxy=proxy_url,
-                **self._async_session_kwargs,
-            )
+            # Prepare a default kwargs for the new session. A provided proxy URL and a chrome for impersonation
+            # are set as default options.
+            kwargs: dict[str, Any] = {
+                'proxy': proxy_url,
+                'impersonate': BrowserType.chrome,
+            }
+
+            # Update the default kwargs with any additional user-provided kwargs.
+            kwargs.update(self._async_session_kwargs)
+
+            # Create and store the new session with the specified kwargs.
+            self._client_by_proxy_url[proxy_url] = AsyncSession(**kwargs)
 
         return self._client_by_proxy_url[proxy_url]
 
```
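The core of the change is the merge order in `_get_client`: the defaults (`proxy` and `impersonate`) are placed in a dict first, and the user-provided session kwargs are applied on top with `dict.update`, so an explicitly passed `impersonate` still wins over the Chrome default. Below is a minimal, hypothetical sketch of that pattern. `SessionCache` and `FakeSession` are invented names; `FakeSession` stands in for curl_cffi's `AsyncSession` so the snippet runs without network access or the `curl-impersonate` extra, and it passes the plain string `"chrome"` where the real code passes `BrowserType.chrome`.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeSession:
    """Hypothetical stand-in for curl_cffi's AsyncSession: it only records its kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class SessionCache:
    """Hypothetical, dependency-free mirror of `_get_client` from the diff."""

    def __init__(self, session_factory: Callable[..., Any], **session_kwargs: Any) -> None:
        self._session_factory = session_factory
        # Plays the role of `self._async_session_kwargs` in the real client.
        self._session_kwargs = session_kwargs
        # One session per proxy URL; `None` means "no proxy".
        self._client_by_proxy_url: dict[str | None, Any] = {}

    def get(self, proxy_url: str | None) -> Any:
        if proxy_url not in self._client_by_proxy_url:
            # Defaults first (the real code uses `BrowserType.chrome`, not a plain string).
            kwargs: dict[str, Any] = {"proxy": proxy_url, "impersonate": "chrome"}
            # User-supplied kwargs are applied last, so they win on any key collision.
            kwargs.update(self._session_kwargs)
            self._client_by_proxy_url[proxy_url] = self._session_factory(**kwargs)
        return self._client_by_proxy_url[proxy_url]


# Default: Chrome impersonation, and the same session is reused for the same proxy URL.
default_cache = SessionCache(FakeSession)
session = default_cache.get(None)
print(session.kwargs)
print(default_cache.get(None) is session)

# An explicit `impersonate` value from the user overrides the default.
custom_cache = SessionCache(FakeSession, impersonate="chrome110")
print(custom_cache.get("http://proxy.example:8000").kwargs)
```

One consequence of this ordering, which also holds for the diff as written: a `proxy` key among the user kwargs would replace the per-call `proxy_url` instead of clashing with it. With the previous `AsyncSession(proxy=proxy_url, **self._async_session_kwargs)` call, the same input would have raised a `TypeError` for a repeated keyword argument.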
