Databus Client Python

Quickstart Example

Commands to download the DBpedia Knowledge Graphs generated by Live Fusion. DBpedia Live Fusion publishes two different kinds of KGs:

  1. Open Core Knowledge Graphs under the CC-BY-SA license: open with copyleft/share-alike, no registration needed.
  2. Industry Knowledge Graphs under the BUSL 1.1 license: unrestricted for research and experimentation, commercial license for production use, free registration needed.

Registration (Access Token)

  1. If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
  2. Login at https://account.dbpedia.org and create your token.
  3. Save the token to a file vault-token.dat.

Docker vs. Python

The databus-python-client is available as a Docker image or a Python package; both follow the patterns below. $DOWNLOADTARGET can be any Databus URI, including collections, or a SPARQL query (or several thereof). Details are documented below.

# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET --token vault-token.dat
# Python
python3 -m pip install databusclient
databusclient download $DOWNLOADTARGET --token vault-token.dat

Download Live Fusion KG Snapshot (BUSL 1.1, registration needed)

TODO One slogan sentence. More information

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-snapshot --token vault-token.dat

Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)

DBpedia Wikipedia Extraction Enriched. TODO One slogan sentence and link. Currently EN DBpedia only.

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-snapshot --token vault-token.dat

DBpedia Wikidata Extraction Enriched. TODO One slogan sentence and link.

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikidata-kg-enriched-snapshot --token vault-token.dat

Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)

TODO One slogan sentence and link

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-snapshot 

Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)

TODO One slogan sentence and link

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-snapshot 

Docker Image Usage

A docker image is available at dbpedia/databus-python-client. See download section for details.

CLI Usage

Installation

python3 -m pip install databusclient

Running

databusclient --help
Usage: databusclient [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

Commands:
  deploy
  download

Download command

databusclient download --help
Usage: databusclient download [OPTIONS] DATABUSURIS...

Arguments:
  DATABUSURIS...  databus uris to download from https://databus.dbpedia.org,
                  or a query statement that returns databus uris from https://databus.dbpedia.org/sparql
                  to be downloaded [required]

  Download datasets from databus, optionally using vault access if vault
  options are provided.

Options:
  --localdir TEXT  Local databus folder (if not given, databus folder
                   structure is created in current working directory)
  --databus TEXT   Databus URL (if not given, inferred from databusuri, e.g.
                   https://databus.dbpedia.org/sparql)
  --token TEXT     Path to Vault refresh token file
  --authurl TEXT   Keycloak token endpoint URL  [default:
                   https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
                   connect/token]
  --clientid TEXT  Client ID for token exchange  [default: vault-token-
                   exchange]
  --help           Show this message and exit.

Examples of using download command

File: download of a single file

databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2

Version: download of all files of a specific version

databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01

Artifact: download of all files with latest version of an artifact

databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals

Group: download of all files with latest version of all artifacts of a group

databusclient download https://databus.dbpedia.org/dbpedia/mappings

If no --localdir is provided, the current working directory is used as the base directory. Downloaded files are stored in a folder structure mirroring the Databus, i.e. ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/.
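This layout can be sketched in Python; the helper name below is hypothetical and not part of the client's API:

```python
from urllib.parse import urlparse

def local_path(databus_file_uri: str) -> str:
    # Map a Databus file URI onto the ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/
    # layout described above (illustrative sketch, not the client's own code)
    account, group, artifact, version = urlparse(databus_file_uri).path.strip("/").split("/")[:4]
    return f"./{account}/{group}/{artifact}/{version}/"

print(local_path("https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2"))
# ./dbpedia/mappings/mappingbased-literals/2022.12.01/
```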

Collection: download of all files within a collection

databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12

Query: download of all files returned by a query (sparql endpoint must be provided with --databus)

databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
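Under the hood this amounts to a standard SPARQL protocol request. A minimal stdlib sketch of the same query (function names are hypothetical, not the client's API):

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://databus.dbpedia.org/sparql"

# Same query as the CLI example above
QUERY = (
    "PREFIX dcat: <http://www.w3.org/ns/dcat#> "
    "SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10"
)

def extract_urls(result_json):
    # Pull the ?x bindings out of a standard SPARQL JSON result set
    return [b["x"]["value"] for b in result_json["results"]["bindings"]]

def fetch_download_urls(endpoint=SPARQL_ENDPOINT, query=QUERY):
    # Send the query via the SPARQL protocol and return the download URLs
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_urls(json.load(resp))
```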

Deploy command

databusclient deploy --help
Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...

 Flexible deploy to databus command:

 - Classic dataset deployment

 - Metadata-based deployment

 - Upload & deploy via Nextcloud

Arguments:
  DISTRIBUTIONS...  Depending on mode:
             - Classic mode: List of distributions in the form
               URL|CV|fileext|compression|sha256sum:contentlength
               (where URL is the download URL and CV the key=value pairs,
               separated by underscores)
             - Upload mode: List of local file or folder paths (must exist)
             - Metadata mode: None
             
Options:
  --version-id TEXT   Target databus version/dataset identifier of the form
                      <https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>
                      [required]
  --title TEXT        Dataset title  [required]
  --abstract TEXT     Dataset abstract max 200 chars  [required]
  --description TEXT  Dataset description  [required]
  --license TEXT      License (see dalicc.net)  [required]
  --apikey TEXT       API key  [required]
  --metadata PATH     Path to metadata JSON file (for metadata mode)
  --webdav-url TEXT   WebDAV URL (e.g.,
                      https://cloud.example.com/remote.php/webdav)
  --remote TEXT       rclone remote name (e.g., 'nextcloud')
  --path TEXT         Remote path on Nextcloud (e.g., 'datasets/mydataset')
  --help              Show this message and exit.
  

Examples of using deploy command

Mode 1: Classic Deploy (Distributions)
databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  
databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'  

A few more notes for CLI usage:

  • The content variants can be left out ONLY IF there is just one distribution
    • To have everything inferred, just pass the plain URL: https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml
    • If other parameters are used, the content variants field must be left empty, e.g. https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116
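For illustration, the classic distribution argument can be assembled with a small helper (hypothetical, not part of the client; the field order follows the examples above):

```python
def build_distribution_arg(url, cvs=None, *fields):
    # Join the URL, the key=value content variants (underscore-separated),
    # and any further fields (file extension, compression,
    # sha256sum:contentlength) with pipes, as in the examples above
    cv = "_".join(f"{k}={v}" for k, v in (cvs or {}).items())
    return "|".join([url, cv, *fields])

# One content variant, everything else inferred
print(build_distribution_arg(
    "https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml",
    {"type": "swagger"},
))
# https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger
```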
Mode 2: Deploy with Metadata File

Use a JSON metadata file to define all distributions. The metadata.json should list all distributions and their metadata. All files referenced there will be registered on the Databus.

databusclient deploy \
  --metadata /home/metadata.json \
  --version-id https://databus.org/user/dataset/version/1.0 \
  --title "Metadata Deploy Example" \
  --abstract "This is a short abstract of the dataset." \
  --description "This dataset was uploaded using metadata.json." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY"

Metadata file structure (file_format and compression are optional):

[
  {
    "checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
    "size": 12345,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
    "file_format": "ttl"
  },
  {
    "checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
    "size": 54321,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
    "file_format": "csv",
    "compression": "gz"
  }
]
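One way to produce such entries for local files is a small helper (hypothetical, not part of the client) that hashes and sizes each file:

```python
import hashlib
import os

def metadata_entry(local_path, public_url, file_format=None, compression=None):
    # Build one metadata.json entry: sha256 checksum, size in bytes, and the
    # URL the file will be served from; file_format/compression are optional
    sha = hashlib.sha256()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    entry = {"checksum": sha.hexdigest(), "size": os.path.getsize(local_path), "url": public_url}
    if file_format:
        entry["file_format"] = file_format
    if compression:
        entry["compression"] = compression
    return entry
```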
Mode 3: Upload & Deploy via Nextcloud

Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy to DBpedia Databus. Rclone is required.

databusclient deploy \
  --webdav-url https://cloud.example.com/remote.php/webdav \
  --remote nextcloud \
  --path datasets/mydataset \
  --version-id https://databus.org/user/dataset/version/1.0 \
  --title "Test Dataset" \
  --abstract "Short abstract of dataset" \
  --description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY" \
  ./localfile1.ttl \
  ./data_folder

Authentication with vault

For downloading files from the Vault, you need to provide a vault token; see the Registration (Access Token) section above for how to obtain one. Once you have a vault-token.dat file, provide its path with --token /path/to/vault-token.dat.

Example:

databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23 --token vault-token.dat

If vault authentication is required for downloading a file, the client will use the token. If no vault authentication is required, the token will not be used.
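The --authurl and --clientid defaults above suggest a standard OAuth 2.0 refresh-token grant against Keycloak. A minimal stdlib sketch of what such an exchange looks like, assuming that grant type; the client's actual implementation may differ:

```python
import json
import urllib.parse
import urllib.request

# Defaults taken from the --authurl and --clientid options above
AUTH_URL = "https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token"
CLIENT_ID = "vault-token-exchange"

def token_request_body(refresh_token, client_id=CLIENT_ID):
    # Standard OAuth 2.0 refresh-token grant form body
    return urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "refresh_token": refresh_token,
    }).encode()

def exchange_refresh_token(token_file, auth_url=AUTH_URL):
    # Read vault-token.dat and trade the refresh token for an access token
    with open(token_file) as f:
        refresh_token = f.read().strip()
    req = urllib.request.Request(auth_url, data=token_request_body(refresh_token))
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]
```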

Usage of docker image

A docker image is available at dbpedia/databus-python-client. You can use it like this:

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01

If using vault authentication, make sure the token file is available in the container, e.g. by placing it in the current working directory.

docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat

Module Usage

Step 1: Create lists of distributions for the dataset

from databusclient import create_distribution

# create a list
distributions = []

# minimal requirements
# compression and filetype will be inferred from the path
# this will trigger the download of the file to evaluate the shasum and content length
distributions.append(
    create_distribution(url="https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml", cvs={"type": "swagger"})
)

# full parameters
# will just place parameters correctly, nothing will be downloaded or inferred
distributions.append(
    create_distribution(
        url="https://example.org/some/random/file.csv.bz2", 
        cvs={"type": "example", "realfile": "false"}, 
        file_format="csv", 
        compression="bz2", 
        sha256_length_tuple=("7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653", 367116)
    )
)

A few notes:

  • The dict for content variants can be empty ONLY IF there is just one distribution
  • Compression can only be specified if a file format is also specified

Step 2: Create dataset

from databusclient import create_dataset

# minimal way
dataset = create_dataset(
  version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
  title="Client Testing",
  abstract="Testing the client....",
  description="Testing the client....",
  license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
  distributions=distributions,
)

# with group metadata
dataset = create_dataset(
  version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
  title="Client Testing",
  abstract="Testing the client....",
  description="Testing the client....",
  license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
  distributions=distributions,
  group_title="Title of group1",
  group_abstract="Abstract of group1",
  group_description="Description of group1"
)

NOTE: Group metadata is only used if all group parameters are set; otherwise it is ignored.

Step 3: Deploy to databus

from databusclient import deploy

# to deploy something you just need the dataset from the previous step and an API key
# the API key can be found (or generated) at https://$$DATABUS_BASE$$/$$USER$$#settings
deploy(dataset, "mysterious api key")
