Commit 43c9edb

add README
1 parent 9e23b6e commit 43c9edb

1 file changed: 52 additions, 0 deletions

@@ -0,0 +1,52 @@
# Confluence Data Retrieval Example

This data job example demonstrates how to retrieve and manage data from a Confluence space using the Confluence Data Retrieval Class (`ConfluenceDataSource`). The class fetches pages, tracks updates, and flags deleted pages in a Confluence space. The resulting data is stored in a JSON file for further analysis.

## Confluence Data Retrieval Class

The `ConfluenceDataSource` class is the core of this data job. It provides the following methods for interacting with Confluence data:

- `fetch_updated_pages_in_confluence_space()`: Fetches updated pages in the Confluence space based on the last modification date.
- `fetch_all_pages_in_confluence_space()`: Retrieves all pages in the Confluence space.
- `fetch_updated_documents_by_parent_id(parent_page_id)`: Recursively fetches updated documents under a parent page ID, so that nested pages are also captured.
- `flag_deleted_pages()`: Flags pages as deleted based on the current Confluence data.
- `update_saved_documents()`: Updates the saved documents in the JSON file with the latest data.

These methods use the `last_modification.txt` file to determine the last modification date and to track changes in the Confluence space, which allows efficient data retrieval and management.
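The change tracking described above can be illustrated with a minimal sketch. It is not the `ConfluenceDataSource` implementation: the helper names (`read_last_modification`, `write_last_modification`, `flag_deleted`) and the timestamp format stored in `last_modification.txt` are assumptions made for this example, and no Confluence API is called.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Set

# All names below are hypothetical; this is not the ConfluenceDataSource code.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed on-disk format


def read_last_modification(path: Path) -> Optional[datetime]:
    """Return the stored timestamp, or None when no run has been recorded."""
    if not path.exists():
        return None
    return datetime.strptime(path.read_text().strip(), TIMESTAMP_FORMAT)


def write_last_modification(path: Path, moment: datetime) -> None:
    """Persist the timestamp so the next run can fetch only newer pages."""
    path.write_text(moment.strftime(TIMESTAMP_FORMAT))


def flag_deleted(saved_docs: List[Dict], current_ids: Set[str]) -> List[Dict]:
    """Set "deleted" to True for saved documents missing from current_ids."""
    for doc in saved_docs:
        if doc["metadata"]["id"] not in current_ids:
            doc["deleted"] = True
    return saved_docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "last_modification.txt"
        print(read_last_modification(marker))  # None on the first run
        write_last_modification(marker, datetime(2023, 11, 1, 9, 30))
        print(read_last_modification(marker))

    saved = [
        {"metadata": {"id": "1"}, "deleted": False},
        {"metadata": {"id": "2"}, "deleted": False},
    ]
    # Page "2" is absent from the current space, so it gets flagged.
    print(flag_deleted(saved, current_ids={"1"}))
```

In this sketch a missing timestamp file signals a first run, in which case a full fetch would be the natural fallback. Deleted pages are flagged rather than removed, matching the `deleted` field in the JSON output.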

## JSON Data Format

The resulting JSON data (`confluence_data.json`) is generated using the `ConfluenceDocument` class (see `confluence_document.py`). It has the following structure:

```json
[
  {
    "metadata": {
      "title": "Page Title",
      "id": "Page ID",
      "source": "Source URL"
    },
    "page_content": "Page Content Text",
    "deleted": false
  },
  {
    "metadata": {
      "title": "Another Page Title",
      "id": "Another Page ID",
      "source": "Another Source URL"
    },
    "page_content": "Another Page Content Text",
    "deleted": false
  },
  {
    "metadata": {
      "title": "Yet Another Page Title",
      "id": "Yet Another Page ID",
      "source": "Yet Another Source URL"
    },
    "page_content": "Yet Another Page Content Text",
    "deleted": false
  }
]
```
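To show how this structure might be produced and consumed, here is a rough Python sketch. The `ConfluenceDocument` dataclass below is a hypothetical stand-in that only mirrors the fields shown in the JSON; the actual class in `confluence_document.py` may differ. The helpers `to_json` and `active_documents` are likewise assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ConfluenceDocument:
    """Hypothetical stand-in; the real class lives in confluence_document.py."""

    metadata: Dict[str, str]  # expected keys: "title", "id", "source"
    page_content: str
    deleted: bool = False


def to_json(documents: List[ConfluenceDocument]) -> str:
    """Serialize documents to a JSON list in the format shown above."""
    return json.dumps([asdict(doc) for doc in documents], indent=2)


def active_documents(raw_json: str) -> List[ConfluenceDocument]:
    """Parse the JSON list and keep only documents not flagged as deleted."""
    return [
        ConfluenceDocument(**item)
        for item in json.loads(raw_json)
        if not item["deleted"]
    ]


if __name__ == "__main__":
    docs = [
        ConfluenceDocument(
            metadata={"title": "Page Title", "id": "1", "source": "Source URL"},
            page_content="Page Content Text",
        ),
        ConfluenceDocument(
            metadata={"title": "Old Page", "id": "2", "source": "Old URL"},
            page_content="Old Content",
            deleted=True,
        ),
    ]
    raw = to_json(docs)
    print(raw)
    print(len(active_documents(raw)))  # 1, since the second page is flagged
```

Downstream analysis would typically filter on the `deleted` flag, as `active_documents` does here, rather than assume that every saved page still exists in the space.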
