yoink

Fast, fault-tolerant headless browser scraping. One browser per worker process, N async pages per browser.

pip install yoink
pipx install yoink

Playwright browsers are required on first run:

playwright install chromium

CLI

# Single URL — HTML to stdout
yoink "https://example.com"

# Text file or JSON array of URLs
yoink urls.txt
yoink urls.json

# Stdin
echo "https://example.com" | yoink -

# Stream JSONL results as each page completes
yoink --stream urls.txt

# Write one HTML file per URL to a directory
yoink --output ./results/ urls.txt

# Write a .tar.gz archive
yoink --tarball results.tar.gz urls.txt

# Control parallelism
yoink --workers 4 --pages 10 urls.txt

yk is an alias for yoink.

Python API

import yoink

# One-liner
html = yoink.get("https://example.com")

# Parallel fetch, results in completion order
results = yoink.get_all(["https://a.com", "https://b.com"], workers=4)
for r in results:
    print(r.url, r.ok, len(r.html))

# Stream results as they complete
for result in yoink.stream(urls, workers=4):
    print(result.url, result.ok)

# Engine for full control
with yoink.Engine() as engine:
    engine.submit("https://a.com")
    engine.submit(yoink.ExtractReq(url="https://b.com", screenshot=True, clean_html=True))
    for result in engine.results():
        print(result.html)
        if result.screenshot:
            open("shot.png", "wb").write(result.screenshot)

`ExtractReq` options

Field	Default	Description
`url`	required	URL to fetch
`wait_for`	`"networkidle"`	`"networkidle"`, `"domcontentloaded"`, or a CSS selector
`timeout`	`30.0`	Seconds before giving up
`retry`	`RetryPolicy()`	Retry behaviour (see below)
`proxy`	`None`	`ProxyConfig(server, username, password)`
`headers`	`{}`	Extra request headers
`actions`	`[]`	List of `Action` steps to run after page load
`screenshot`	`False`	Capture a PNG screenshot (bytes in result)
`clean_html`	`False`	Strip `<script>` and `<style>` tags before returning
`metadata`	`{}`	Pass-through dict — untouched by yoink

`RetryPolicy`

yoink.RetryPolicy(
    max_attempts=3,       # total attempts including the first
    backoff_factor=2.0,   # exponential backoff multiplier
)

`Action` steps

from yoink import Action, ExtractReq

req = ExtractReq(
    url="https://example.com/login",
    actions=[
        Action(type="setValue", selector="#username", value="user@example.com"),
        Action(type="setValue", selector="#password", value="secret"),
        Action(type="click", selector="button[type=submit]"),
        Action(type="wait", selector=".dashboard"),
    ],
)

Supported action types: click, setValue, setValueAndEnter, hover, scroll, wait.

Configuration

Config is loaded in this order (later wins):

Built-in defaults
~/.config/yoink/config.toml
--config path/to/config.toml
YK_ environment variables

TOML example

[workers]
count = 4
pages_per_worker = 10
headless = true

[rate_limit]
default_delay_ms = 0

[rate_limit.per_domain]
"news.ycombinator.com" = 1000   # 1 second between requests to this domain
"reddit.com" = 500

[log]
level = "INFO"
minimal = false

Environment variables

YK_WORKERS__COUNT=8
YK_WORKERS__PAGES_PER_WORKER=10
YK_LOG__LEVEL=DEBUG
YK_LOG__MINIMAL=true

Double underscore (__) separates section from field.

Development

# Install dev dependencies
uv sync

# Run tests
pytest

# Run integration tests only (requires Playwright browsers)
pytest tests/integration/

# Lint + format
ruff check .
ruff format .

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.github/workflows		.github/workflows
docs		docs
src/yoink		src/yoink
tests		tests
.gitignore		.gitignore
.python-version		.python-version
FUTURE_2.md		FUTURE_2.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

yoink

CLI

Python API

`ExtractReq` options

`RetryPolicy`

`Action` steps

Configuration

TOML example

Environment variables

Development

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

yoink

CLI

Python API

ExtractReq options

RetryPolicy

Action steps

Configuration

TOML example

Environment variables

Development

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`ExtractReq` options

`RetryPolicy`

`Action` steps

Packages