Analytics on FHIR 2025 Conference Presentation
This repository contains materials for the session demonstrating Helios Software, an open-source Rust implementation of SQL on FHIR. The presentation showcases real-world laboratory analytics using three deployment patterns: batch processing (CLI), microservices (HTTP server), and data science workflows (Python bindings).
Helios Software is an open source Rust implementation of SQL on FHIR that includes a simple CLI for batch transformations, a HTTP server ideal for microservices, and Python bindings for using SQL on FHIR directly in your data science and analytics projects.
In this session, Steve will demonstrate a real-world laboratory analytics challenge across different workflows, illustrating concrete patterns for integrating SQL on FHIR into pipelines for batch, microservice, and data science workloads.
The presentation for this session can be found here.
Tests Pending at Discharge represents a significant quality and safety concern in healthcare. When laboratory tests are ordered during an inpatient stay but results are not available at discharge, several risks emerge:
- Patient Safety: Critical abnormal results may go unnoticed
- Readmissions: Unresolved issues can lead to hospital readmissions
- Continuity of Care: Outpatient providers may lack essential diagnostic information
- Resource Utilization: Follow-up testing may be duplicated unnecessarily
This demonstration uses SQL on FHIR ViewDefinitions to identify and analyze laboratory results that were issued after patient discharge, providing actionable intelligence for quality improvement initiatives.
We'll use two complementary ViewDefinitions:
- LabObservationView: Extracts laboratory observations with timing and encounter references
- EncounterView: Extracts encounter timing information for discharge analysis
By joining these views, we can identify labs that were issued after discharge time - representing tests that were pending when the patient left.
- Java 11+ (for Synthea)
- Python 3.8+ (for pysof)
- curl (for downloading binaries)
macOS:
brew install openjdk@11Linux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install openjdk-11-jdkWindows: Download and install from Adoptium or use:
winget install EclipseAdoptium.Temurin.11.JDK# Clone Synthea
git clone https://github.com/synthetichealth/synthea.git
cd syntheaCopy the tpd.json module from this repository into Synthea's modules directory:
macOS/Linux:
# From the synthea directory
cp ../tpd.json src/main/resources/modules/Windows:
copy ..\tpd.json src\main\resources\modules\Generate 10000 patients with the TPD module:
macOS/Linux:
./run_synthea -m tpd -p 10000 --exporter.fhir.bulk_data=trueWindows:
.\run_synthea.bat -m tpd -p 10000 --exporter.fhir.bulk_data=trueThis creates FHIR R4 resources in NDJSON format in output/fhir/ with:
- Encounter resources (visits with varied lengths of stay)
- Observation resources (laboratory results with timing information)
The NDJSON format provides one FHIR resource per line, ideal for streaming and bulk processing.
Change directory back to the root of the project.
cd ..Download and extract the appropriate binaries for your platform from the latest hfs release:
# Download and extract the tar.gz archive
curl -L https://github.com/HeliosSoftware/hfs/releases/download/v0.1.32/hfs-0.1.32-aarch64-apple-darwin.tar.gz -o hfs.tar.gz
tar -xzf hfs.tar.gz
# Make binaries executable
chmod +x sof-cli sof-server
# Verify installation
./sof-cli --help
./sof-server --help# Download and extract the tar.gz archive
curl -L https://github.com/HeliosSoftware/hfs/releases/download/v0.1.32/hfs-0.1.32-x86_64-unknown-linux-gnu.tar.gz -o hfs.tar.gz
tar -xzf hfs.tar.gz
# Make binaries executable
chmod +x sof-cli sof-server
# Verify installation
./sof-cli --help
./sof-server --help# Download and extract the zip archive
Invoke-WebRequest -Uri "https://github.com/HeliosSoftware/hfs/releases/download/v0.1.32/hfs-0.1.32-x86_64-pc-windows-msvc.zip" -OutFile "hfs.zip"
Expand-Archive -Path "hfs.zip" -DestinationPath "."
# Verify installation
.\sof-cli.exe --help
.\sof-server.exe --helpThe sof-cli tool is ideal for batch transformations, ETL pipelines, and scheduled analytics jobs.
This ViewDefinition extracts laboratory observations with timing information:
macOS/Linux:
./sof-cli \
--view LabObservationView.json \
--source ./synthea/output/fhir/Observation.ndjson \
--output lab_observations.csv \
--format csvWindows:
.\sof-cli.exe `
--view LabObservationView.json `
--source .\synthea\output\fhir\Observation.ndjson `
--output lab_observations.csv `
--format csvOutput columns:
observation_id: Unique Observation identifierlab_code: LOINC code for the lab testlab_display: Display name of the lab teststatus: Status of the observationeffective_time: When the observation was madeissued_time: When the result was issued/availableencounter_id: Reference to the associated encounterpatient_id: Reference to the patient
This ViewDefinition extracts encounter timing for discharge analysis:
macOS/Linux:
./sof-cli \
--view EncounterView.json \
--source ./synthea/output/fhir/Encounter.ndjson \
--output encounters.csv \
--format csvWindows:
.\sof-cli.exe `
--view EncounterView.json `
--source .\synthea\output\fhir\Encounter.ndjson `
--output encounters.csv `
--format csvOutput columns:
encounter_id: Unique encounter identifierencounter_class: Class of encounter (AMB, IMP, etc.)encounter_type: Type of encounterstart_time: When the encounter startedend_time: Discharge timestamppatient_id: Reference to the patient
macOS/Linux:
# View first 10 rows
head -10 lab_observations.csv
head -10 encounters.csv
# Count total lab observations
wc -l lab_observations.csv
# Count encounters
wc -l encounters.csvWindows (PowerShell):
# View first 10 rows
Get-Content lab_observations.csv | Select-Object -First 10
Get-Content encounters.csv | Select-Object -First 10
# Count total lab observations
(Get-Content lab_observations.csv | Measure-Object -Line).Lines
# Count encounters
(Get-Content encounters.csv | Measure-Object -Line).LinesThe sof-server provides a REST API for on-demand ViewDefinition execution, ideal for integrating SQL on FHIR into microservice architectures.
macOS/Linux:
./sof-server --port 8080 &Windows (PowerShell, run in separate window):
.\sof-server.exe --port 8080The server is stateless - each request must include both the ViewDefinition and the FHIR data source. Note - the source parameter will need an absolute path in this example.
Execute LabObservationView:
macOS/Linux:
curl -X POST "http://localhost:8080/ViewDefinition/\$viewdefinition-run?source=file://\$(pwd)/synthea/output/fhir/Observation.ndjson" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d @LabObservationViewAsParameters.json \
| jq '.'Windows (PowerShell):
Invoke-RestMethod -Method POST -Uri "http://localhost:8080/ViewDefinition/`$viewdefinition-run?source=file:///$((Get-Location).Path -replace '\\','/')/synthea/output/fhir/Observation.ndjson" `
-ContentType "application/json" `
-Headers @{Accept="application/json"} `
-InFile "LabObservationViewAsParameters.json" | ConvertTo-JsonThe Python bindings enable SQL on FHIR integration into Jupyter notebooks, data pipelines, and ML workflows.
It is recommended to use a virtual environment to avoid conflicts with system packages.
macOS/Linux:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install pandas plotly pysofWindows:
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install pandas plotly pysofThe repository includes analyze_tpd.py, a Python script that demonstrates the Tests Pending at Discharge analysis. This script:
- Loads the CSV files generated by sof-cli
- Joins lab observations with encounters on encounter references
- Identifies labs where
issued_time > end_time(issued after discharge) - Calculates summary statistics (pending rate, distribution by lab code)
- Generates an interactive Plotly visualizations
- Exports detailed analysis files for further investigation
First, generate the CSV files using sof-cli (if not already done):
macOS/Linux:
./sof-cli --view LabObservationView.json --source ./synthea/output/fhir/Observation.ndjson --output lab_observations.csv --format csv
./sof-cli --view EncounterView.json --source ./synthea/output/fhir/Encounter.ndjson --output encounters.csv --format csvWindows:
.\sof-cli.exe --view LabObservationView.json --source .\synthea\output\fhir\Observation.ndjson --output lab_observations.csv --format csv
.\sof-cli.exe --view EncounterView.json --source .\synthea\output\fhir\Encounter.ndjson --output encounters.csv --format csvThen run the analysis:
macOS/Linux:
python3 analyze_tpd.pyWindows:
python analyze_tpd.pyThe script generates:
- HTML visualization (open in any web browser)
- 2 CSV files with detailed results for further analysis
Open the visualizations:
macOS:
open tests_pending_by_day.htmlLinux:
xdg-open tests_pending_by_day.htmlWindows:
Start-Process tests_pending_by_day.htmlThis ViewDefinition extracts laboratory observations with their timing and encounter information.
Key Logic:
{
"where": [
{
"path": "category.coding.where(system='http://terminology.hl7.org/CodeSystem/observation-category' and code='laboratory').exists()",
"description": "Filter for Laboratory observations only."
},
{
"path": "encounter.exists()",
"description": "Ensure the observation has an associated encounter."
}
]
}FHIRPath Techniques:
.where()- Filters collections based on conditions.exists()- Checks for presence of values- Compound conditions with
and
This ViewDefinition extracts encounter timing for discharge analysis.
Key Logic:
{
"where": [
{
"path": "period.end.exists()",
"description": "Only include encounters that have ended (patient discharged)."
}
]
}Key Fields:
period.start- When the encounter beganperiod.end- When the patient was dischargedclass.code- Type of encounter (AMB, IMP, etc.)
The analysis script demonstrates a common pattern for SQL on FHIR analytics:
- Extract data from different resource types using ViewDefinitions
- Transform timestamps and references
- Join the datasets on common keys (encounter references)
- Analyze the combined data to identify patterns
This pattern enables complex analytics that span multiple FHIR resource types while keeping each ViewDefinition simple and focused.
From a dataset of 10,000 synthetic patients, you should expect:
- Lab observations with encounter references
- Encounters with timing information
- Labs issued after discharge (based on timing comparison)
Labs issued after discharge suggest:
- Need for improved discharge coordination
- Potential for clinical decision support at discharge
- Opportunity for automated follow-up workflows
- Risk stratification for post-discharge monitoring
analytics-on-fhir-2025/
├── README.md # This file
├── LabObservationView.json # ViewDefinition for lab observations
├── EncounterView.json # ViewDefinition for encounter timing
├── tpd.json # Synthea module for test data generation
├── analyze_tpd.py # Python analysis script
├── synthea/output/fhir/ # Generated FHIR data (created in Step 1)
├── *.csv # Analysis results (created in Steps 3 & 5)
└── viz_*.html # Interactive visualizations (created in Step 5)
For questions about:
- The instructions in this README: Open an issue at analytics-on-fhir-2025 GitHub
- Helios Software: Open an issue at hfs GitHub
- SQL on FHIR: Visit the FHIR Chat #sql-on-fhir channel
The materials in this repository are provided for educational purposes as part of the Analytics on FHIR 2025 conference under the open source MIT license.
Analytics on FHIR 2025 | Presented by Steve Munini | Helios Software