Commit e695368

Merge pull request #146 from jhogstrom/feat/parallel-page-export-upstream
feat: Add parallel page export with ThreadPoolExecutor
2 parents 45bf393 + 9fd5e5d commit e695368

8 files changed

Lines changed: 144 additions & 49 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ jobs:
       - name: Test CLI commands
         run: |
           uv run --with dist/*.whl --no-project confluence-markdown-exporter --help
-          uv run --with dist/*.whl --no-project cf-export --help
+          uv run --with dist/*.whl --no-project cme --help

       - name: Upload build artifacts for inspection
         uses: actions/upload-artifact@v7

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -65,7 +65,7 @@ echo 'eval "$(uv generate-shell-completion bash)"' >> ~/.bashrc

 ```bash
 uv run confluence-markdown-exporter --help
-uv run cf-export --help
+uv run cme --help
 ```

 ## Development Workflow
@@ -75,7 +75,7 @@ echo 'eval "$(uv generate-shell-completion bash)"' >> ~/.bashrc
 ```bash
 # Run with uv (recommended)
 uv run confluence-markdown-exporter [commands]
-uv run cf-export [commands]
+uv run cme [commands]

 # Or activate the virtual environment
 source .venv/bin/activate

README.md

Lines changed: 7 additions & 10 deletions
@@ -56,51 +56,48 @@ pip install confluence-markdown-exporter

 Run the exporter with the desired Confluence page ID or space key. Execute the console application by typing `confluence-markdown-exporter` and one of the commands `pages`, `pages-with-descendants`, `spaces`, `all-spaces` or `config`. If a command is unclear, you can always add `--help` to get additional information.

-> [!TIP]
-> Instead of `confluence-markdown-exporter` you can also use the shorthand `cf-export`.
-
 #### 2.1. Export Page

 Export a single Confluence page by ID:

 ```sh
-confluence-markdown-exporter pages <page-id e.g. 645208921> --output-path <output path e.g. ./output_path/>
+cme pages <page-id e.g. 645208921> --output-path <output path e.g. ./output_path/>
 ```

 or by URL:

 ```sh
-confluence-markdown-exporter pages <page-url e.g. https://company.atlassian.net/MySpace/My+Page+Title> --output-path <output path e.g. ./output_path/>
+cme pages <page-url e.g. https://company.atlassian.net/MySpace/My+Page+Title> --output-path <output path e.g. ./output_path/>
 ```

 #### 2.2. Export Page with Descendants

 Export a Confluence page and all its descendant pages by page ID:

 ```sh
-confluence-markdown-exporter pages-with-descendants <page-id e.g. 645208921> --output-path <output path e.g. ./output_path/>
+cme pages-with-descendants <page-id e.g. 645208921> --output-path <output path e.g. ./output_path/>
 ```

 or by URL:

 ```sh
-confluence-markdown-exporter pages-with-descendants <page-url e.g. https://company.atlassian.net/MySpace/My+Page+Title> --output-path <output path e.g. ./output_path/>
+cme pages-with-descendants <page-url e.g. https://company.atlassian.net/MySpace/My+Page+Title> --output-path <output path e.g. ./output_path/>
 ```

 #### 2.3. Export Space

 Export all Confluence pages of a single Space:

 ```sh
-confluence-markdown-exporter spaces <space-key e.g. MYSPACE> --output-path <output path e.g. ./output_path/>
+cme spaces <space-key e.g. MYSPACE> --output-path <output path e.g. ./output_path/>
 ```

 #### 2.4. Export all Spaces

 Export all Confluence pages across all spaces:

 ```sh
-confluence-markdown-exporter all-spaces --output-path <output path e.g. ./output_path/>
+cme all-spaces --output-path <output path e.g. ./output_path/>
 ```

 ### 3. Output
@@ -127,7 +124,7 @@ All configuration and authentication is stored in a single JSON file managed by
 To interactively view and change configuration, run:

 ```sh
-confluence-markdown-exporter config
+cme config
 ```

 This will open a menu where you can:

confluence_markdown_exporter/api_clients.py

Lines changed: 13 additions & 9 deletions
@@ -1,7 +1,6 @@
 import logging
 import os
 from functools import lru_cache
-from typing import Any

 import questionary
 import requests
@@ -10,6 +9,7 @@
 from questionary import Style

 from confluence_markdown_exporter.utils.app_data_store import ApiDetails
+from confluence_markdown_exporter.utils.app_data_store import AtlassianSdkConnectionConfig
 from confluence_markdown_exporter.utils.app_data_store import get_settings
 from confluence_markdown_exporter.utils.app_data_store import set_setting
 from confluence_markdown_exporter.utils.config_interactive import main_config_menu_loop
@@ -49,8 +49,12 @@ def response_hook(
 class ApiClientFactory:
     """Factory for creating authenticated Confluence and Jira API clients with retry config."""

-    def __init__(self, connection_config: dict[str, Any]) -> None:
-        self.connection_config = connection_config
+    def __init__(self, connection_config: AtlassianSdkConnectionConfig) -> None:
+        # Reconstruct as the base SDK type so model_dump() only yields SDK-compatible fields,
+        # even when a ConnectionConfig subclass is passed.
+        self.connection_config = AtlassianSdkConnectionConfig.model_validate(
+            connection_config.model_dump()
+        )

     def create_confluence(self, auth: ApiDetails) -> ConfluenceApiSdk:
         try:
@@ -59,7 +63,7 @@ def create_confluence(self, auth: ApiDetails) -> ConfluenceApiSdk:
                 username=auth.username.get_secret_value() if auth.api_token else None,
                 password=auth.api_token.get_secret_value() if auth.api_token else None,
                 token=auth.pat.get_secret_value() if auth.pat else None,
-                **self.connection_config,
+                **self.connection_config.model_dump(),
             )
             instance.get_all_spaces(limit=1)
         except Exception as e:
@@ -74,7 +78,7 @@ def create_jira(self, auth: ApiDetails) -> JiraApiSdk:
                 username=auth.username.get_secret_value() if auth.api_token else None,
                 password=auth.api_token.get_secret_value() if auth.api_token else None,
                 token=auth.pat.get_secret_value() if auth.pat else None,
-                **self.connection_config,
+                **self.connection_config.model_dump(),
             )
             instance.get_all_projects()
         except Exception as e:
@@ -87,11 +91,12 @@ def get_confluence_instance() -> ConfluenceApiSdk:
     """Get authenticated Confluence API client using current settings."""
     settings = get_settings()
     auth = settings.auth
-    connection_config = settings.connection_config.model_dump(exclude={"use_v2_api"})

     while True:
         try:
-            confluence = ApiClientFactory(connection_config).create_confluence(auth.confluence)
+            confluence = ApiClientFactory(settings.connection_config).create_confluence(
+                auth.confluence
+            )
             break
         except ConnectionError as e:
             questionary.print(
@@ -113,11 +118,10 @@ def get_jira_instance() -> JiraApiSdk:
     """Get authenticated Jira API client using current settings with required authentication."""
     settings = get_settings()
     auth = settings.auth
-    connection_config = settings.connection_config.model_dump(exclude={"use_v2_api"})

     while True:
         try:
-            jira = ApiClientFactory(connection_config).create_jira(auth.jira)
+            jira = ApiClientFactory(settings.connection_config).create_jira(auth.jira)
             break
         except ConnectionError:
             # Ask if user wants to use Confluence credentials for Jira
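
The `ApiClientFactory.__init__` change narrows whatever config it receives to the base SDK type, so that unpacking it into a client constructor never passes app-level fields such as `max_workers`. A minimal sketch of that idea follows, using stdlib dataclasses as a stand-in for the Pydantic models; `SdkConfig`, `AppConfig`, `narrow_to_sdk`, `fake_client`, and the field set are hypothetical names for illustration, not code from the repository.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class SdkConfig:
    """Hypothetical stand-in for the SDK-level config: client kwargs only."""

    timeout: int = 30
    backoff_and_retry: bool = True


@dataclass
class AppConfig(SdkConfig):
    """Hypothetical stand-in for the app-level config with extra settings."""

    use_v2_api: bool = False
    max_workers: int = 20


def narrow_to_sdk(config: SdkConfig) -> SdkConfig:
    """Rebuild `config` as the base type, dropping subclass-only fields."""
    allowed = {f.name for f in fields(SdkConfig)}
    kept = {k: v for k, v in asdict(config).items() if k in allowed}
    return SdkConfig(**kept)


def fake_client(**kwargs: object) -> dict:
    """Hypothetical client constructor; it simply echoes its kwargs."""
    return dict(kwargs)


# Only the base fields survive, even though an AppConfig was passed in.
sdk_kwargs = asdict(narrow_to_sdk(AppConfig(timeout=10, max_workers=4)))
client = fake_client(**sdk_kwargs)
```

Narrowing in one place means callers no longer need to remember an `exclude={...}` list each time a new app-level field is added.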

confluence_markdown_exporter/confluence.py

Lines changed: 85 additions & 6 deletions
@@ -11,16 +11,20 @@
 import re
 import urllib.parse
 from collections.abc import Set
+from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import as_completed
 from os import PathLike
 from pathlib import Path
 from string import Template
+from threading import local
 from typing import Literal
 from typing import TypeAlias
 from typing import cast
 from urllib.parse import unquote
 from urllib.parse import urlparse

 import yaml
+from atlassian import Confluence as ConfluenceApiSdk
 from atlassian.errors import ApiError
 from atlassian.errors import ApiNotFoundError
 from bs4 import BeautifulSoup
@@ -56,6 +60,23 @@
 settings = get_settings()
 confluence = get_confluence_instance()

+# Thread-local storage for API client instances (one per worker thread)
+_thread_local = local()
+
+
+def get_thread_confluence() -> ConfluenceApiSdk:
+    """Get or create Confluence instance for current thread.
+
+    The atlassian-python-api Confluence client uses requests.Session,
+    which is NOT thread-safe. Each worker thread needs its own instance.
+
+    Returns:
+        Confluence: A thread-local Confluence API client instance.
+    """
+    if not hasattr(_thread_local, "confluence"):
+        _thread_local.confluence = get_confluence_instance()
+    return _thread_local.confluence
+


 class JiraIssue(BaseModel):
     key: str
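
The `get_thread_confluence` helper added in this hunk relies on `threading.local` to give each worker thread its own client. The pattern can be sketched in isolation as follows; `FakeClient`, `get_thread_client`, and `worker` are hypothetical stand-ins, and the sketch assumes nothing about the real Confluence client beyond what the docstring in the diff states.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeClient:
    """Hypothetical stand-in for an API client that is not thread-safe."""

    def __init__(self) -> None:
        # Remember which thread created this instance.
        self.owner = threading.get_ident()


_thread_local = threading.local()


def get_thread_client() -> FakeClient:
    """Return the calling thread's client, creating it on first use."""
    if not hasattr(_thread_local, "client"):
        _thread_local.client = FakeClient()
    return _thread_local.client


def worker(_: int) -> tuple:
    """Return the client's identity and whether this thread created it."""
    client = get_thread_client()
    return id(client), client.owner == threading.get_ident()


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(20)))
```

In the sketch the client is held in a local variable inside `worker`. A module-level name rebound by one thread is visible to every other thread, so keeping the reference local is one way to make sure a worker only ever touches the client it looked up itself.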
@@ -190,6 +211,7 @@ def pages(self) -> list["Page | Descendant"]:
         return [homepage, *homepage.descendants]

     def export(self) -> None:
+        """Export all pages in this space to Markdown."""
         export_pages(self.pages)

     @classmethod
@@ -1437,9 +1459,41 @@ def sync_removed_pages() -> None:
     LockfileManager.remove_pages(deleted)


+def _export_page_worker(page: "Page | Descendant", use_thread_local: bool = False) -> None:  # noqa: FBT001, FBT002
+    """Export a single Confluence page to Markdown (worker function).
+
+    Args:
+        page: The page or descendant to export.
+        use_thread_local: If True, use thread-local Confluence instance
+            (required for parallel export).
+    """
+    if use_thread_local:
+        # Use thread-local confluence instance for thread safety
+        global confluence  # noqa: PLW0603
+        old_confluence = confluence
+        confluence = get_thread_confluence()
+        try:
+            _page = Page.from_id(page.id)
+            _page.export()
+            # Record to lockfile if enabled
+            LockfileManager.record_page(_page)
+        finally:
+            confluence = old_confluence
+    else:
+        # Serial mode - use global confluence instance
+        _page = Page.from_id(page.id)
+        _page.export()
+        # Record to lockfile if enabled
+        LockfileManager.record_page(_page)
+
+
 def export_pages(pages: list["Page | Descendant"]) -> None:
     """Export a list of Confluence pages to Markdown.

+    Pages are exported in parallel using ThreadPoolExecutor for significant
+    performance improvement. Worker count is read from
+    settings.connection_config.max_workers (default: 20).
+
     Args:
         pages: List of pages to export.
     """
@@ -1451,9 +1505,34 @@ def export_pages(pages: list["Page | Descendant"]) -> None:
         logger.info("No pages to export based on lockfile state.")
         return

-    for page in (pbar := tqdm(pages_to_export, smoothing=0.05)):
-        pbar.set_postfix_str(f"Exporting page {page.id}")
-        _page = Page.from_id(page.id)
-        _page.export()
-        # Record to lockfile if enabled
-        LockfileManager.record_page(_page)
+    # Get worker count from config
+    max_workers = settings.connection_config.max_workers
+
+    # Serial mode for debugging or single worker
+    if DEBUG or max_workers <= 1:
+        logger.info("Using serial export mode (max_workers=1)")
+        for page in (pbar := tqdm(pages_to_export, smoothing=0.05)):
+            pbar.set_postfix_str(f"Exporting page {page.id}")
+            _export_page_worker(page, use_thread_local=False)
+        return
+
+    # Parallel mode
+    logger.info(f"Using parallel export mode ({max_workers} workers)")
+    with ThreadPoolExecutor(max_workers=max_workers) as executor:
+        # Submit all export tasks
+        futures = {
+            executor.submit(_export_page_worker, page, use_thread_local=True): page
+            for page in pages_to_export
+        }
+
+        # Track progress with tqdm
+        with tqdm(total=len(pages_to_export), smoothing=0.05) as pbar:
+            for future in as_completed(futures):
+                page = futures[future]
+                try:
+                    future.result()  # Raise exception if export failed
+                    pbar.set_postfix_str(f"Completed page {page.id}")
+                except Exception:
+                    logger.exception(f"Failed to export page {page.id}")
+                finally:
+                    pbar.update(1)
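
The parallel branch of `export_pages` submits one task per page, maps each future back to its page, and logs failures without aborting the remaining exports. A reduced sketch of that control flow is shown below, with a serial fallback when `max_workers <= 1`. The names `export_one` and `export_all`, the integer page ids, and the returned success and failure lists are assumptions made for illustration; the commit itself only logs failures and does not return them.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def export_one(page_id: int) -> None:
    """Hypothetical stand-in for exporting one page; id 3 always fails."""
    if page_id == 3:
        raise RuntimeError(f"simulated failure for page {page_id}")


def export_all(page_ids: list, max_workers: int) -> tuple:
    """Export every page and return (succeeded_ids, failed_ids), both sorted."""
    succeeded, failed = [], []

    if max_workers <= 1:
        # Serial fallback: same error handling, no thread pool.
        for pid in page_ids:
            try:
                export_one(pid)
            except Exception:
                logger.exception("Failed to export page %s", pid)
                failed.append(pid)
            else:
                succeeded.append(pid)
        return sorted(succeeded), sorted(failed)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its page id so failures can be attributed.
        futures = {executor.submit(export_one, pid): pid for pid in page_ids}
        for future in as_completed(futures):
            pid = futures[future]
            try:
                future.result()  # Re-raises any exception from the worker.
            except Exception:
                logger.exception("Failed to export page %s", pid)
                failed.append(pid)
            else:
                succeeded.append(pid)
    return sorted(succeeded), sorted(failed)


ok_parallel, bad_parallel = export_all([1, 2, 3, 4, 5], max_workers=3)
ok_serial, bad_serial = export_all([1, 2, 3, 4, 5], max_workers=1)
```

Because `as_completed` yields futures in completion order, the results are sorted before being returned so that the serial and parallel paths produce the same output.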

confluence_markdown_exporter/utils/app_data_store.py

Lines changed: 21 additions & 2 deletions
@@ -30,8 +30,13 @@ def get_app_config_path() -> Path:
 APP_CONFIG_PATH = get_app_config_path()


-class ConnectionConfig(BaseModel):
-    """Configuration for the connection like retry options."""
+class AtlassianSdkConnectionConfig(BaseModel):
+    """Connection parameters forwarded directly to the Atlassian SDK client constructors.
+
+    Only fields that are valid constructor keyword arguments for
+    atlassian.Confluence (ConfluenceApiSdk) and atlassian.Jira (JiraApiSdk)
+    may be added here.
+    """

     backoff_and_retry: bool = Field(
         default=True,
@@ -76,6 +81,11 @@ class ConnectionConfig(BaseModel):
             "Timeout in seconds for API requests. Prevents hanging on slow/unresponsive servers."
         ),
     )
+
+
+class ConnectionConfig(AtlassianSdkConnectionConfig):
+    """Full connection configuration, extending the Atlassian SDK config with app-level settings."""
+
     use_v2_api: bool = Field(
         default=False,
         title="Use Confluence v2 REST API",
@@ -85,6 +95,15 @@ class ConnectionConfig(BaseModel):
             "Must be disabled for older self-hosted Confluence Server instances."
         ),
     )
+    max_workers: int = Field(
+        default=20,
+        title="Max Workers",
+        description=(
+            "Maximum number of parallel workers for page export. "
+            "Set to 1 for serial mode (useful for debugging). "
+            "Higher values improve performance but may hit API rate limits."
+        ),
+    )


 class ApiDetails(BaseModel):

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ Tracker = "https://github.com/Spenhouet/confluence-markdown-exporter/issues"

 [project.scripts]
 confluence-markdown-exporter = "confluence_markdown_exporter.main:app"
-cf-export = "confluence_markdown_exporter.main:app"
+cme = "confluence_markdown_exporter.main:app"

 [tool.hatch.build.targets.wheel]
 packages = ["confluence_markdown_exporter"]
