SplineSketch: a new quantile sketch with uniform error guarantees and high accuracy in practice

This repository provides prototype implementations of SplineSketch in Python and in Java, together with an experimental pipeline that evaluates its accuracy on synthetic and real-world datasets, as well as its update, merge, and query times, in comparison with t-digest, KLL, GKAdaptive, and MomentSketch. DDSketch can optionally be included in the evaluation.
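As a rough illustration of what an accuracy evaluation of a quantile sketch can look like, the snippet below measures the largest rank error of a quantile estimator over a set of probe quantiles. It is a minimal sketch under stated assumptions, not the metric used by the scripts in this repository: it assumes that "uniform error" refers to the usual additive rank error, and the names normalized_rank, max_rank_error, and off_by_two are hypothetical helpers introduced only for this example.

```python
from bisect import bisect_right


def normalized_rank(sorted_data, x):
    """Fraction of items in sorted_data that are <= x."""
    return bisect_right(sorted_data, x) / len(sorted_data)


def max_rank_error(sorted_data, estimate_quantile, probes):
    """Largest |rank(estimate(q)) - q| over the probe quantiles q."""
    return max(
        abs(normalized_rank(sorted_data, estimate_quantile(q)) - q)
        for q in probes
    )


# Toy data: the integers 1..100, already sorted.
data = list(range(1, 101))


def off_by_two(q):
    # Stand-in for a sketch: an "estimator" that is always two positions too high.
    return data[int(q * 100) - 1 + 2]


err = max_rank_error(data, off_by_two, [0.25, 0.5, 0.75])
print(round(err, 4))  # prints 0.02
```

A real sketch would replace off_by_two with its own quantile query; the exact sorted data serves as the ground truth.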

SplineSketch in Java comes in two main versions:

SplineSketch.java -- faster version without frequent items filtering.
SplineSketchMG.java -- with frequent items filtering by the Misra-Gries sketch.

Additionally, SplineSketchAdjustable.java is a modification of SplineSketch.java that allows for changing the parameters and components of the sketch, intended for ablation studies.
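For readers unfamiliar with the Misra-Gries sketch mentioned above, the following is a minimal, textbook version of the summary in Python. It is not taken from SplineSketchMG.java, and how that class combines the summary with the spline-based sketch is not shown here; the function name misra_gries and the parameter k are assumptions made for this example.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary keeping at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n
    is guaranteed to remain in the returned dictionary, and each stored
    count underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop the empty ones.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# One heavy value (1.0) followed by four distinct light values.
stream = [1.0] * 6 + [2.0, 3.0, 4.0, 5.0]
summary = misra_gries(stream, k=3)
print(summary)  # prints {1.0: 4}
```

In this toy stream the value 1.0 occurs 6 times out of 10, which exceeds n / k = 10 / 3, so it survives in the summary; its stored count of 4 underestimates the true count by 2.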

The plots/ directory contains results of the experiments.

Running experiments

Setup: Clone the repository, then run make to compile the Java wrappers that run the individual sketches.

There are five experimental pipelines; their parameters are set in the individual Python source files:

Accuracy and running time experiments on synthetic datasets: run with python run_experiments_IID.py
Accuracy and running time experiments on real-world datasets: download the datasets as described below and then run with python run_experiments_datasets.py (optionally adjust the datasets in the load_<dataset>_data functions)
Update time experiment: run with python run_experiments_update_time.py
Query time experiment: run with python run_experiments_query_time.py
Ablation studies (set up inside the code): run with python run_experiments_ablation.py

All of these Python programs write a set of plots with their results into the plots/ directory.
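The setup step and the pipelines listed above can be chained from one small driver. The following is only a convenience sketch: the script names come from this README, while build_commands, run_all, and the dry_run flag are hypothetical helpers that are not part of the repository. By default it only prints the commands; pass dry_run=False to execute them from the repository root.

```python
import subprocess
import sys

# Pipeline scripts as listed in this README.
PIPELINES = [
    "run_experiments_IID.py",
    "run_experiments_datasets.py",
    "run_experiments_update_time.py",
    "run_experiments_query_time.py",
    "run_experiments_ablation.py",
]


def build_commands(scripts=PIPELINES, python=sys.executable):
    """Return the make step followed by one command per pipeline script."""
    return [["make"]] + [[python, script] for script in scripts]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run: only lists what would be executed.
run_all(build_commands(), dry_run=True)
```

Note that run_experiments_datasets.py expects the real-world datasets to be downloaded first.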

Downloading real-world datasets

HEPMASS dataset from UC Irvine ML Repository: download all_train.csv.gz and all_test.csv.gz and decompress both files into datasets/hepmass/
Power dataset from UC Irvine ML Repository: download into datasets/household_power_consumption/household_power_consumption.txt
Books dataset from SOSD (a benchmark for learned indexes): download using download_books_dataset.sh.
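The decompression step for HEPMASS can be done with the Python standard library. The snippet below is a sketch that assumes all_train.csv.gz and all_test.csv.gz have already been downloaded into the current directory; the function name decompress_gz is hypothetical, and only the target directory datasets/hepmass/ comes from the instructions above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(archive, target_dir):
    """Decompress a .gz file into target_dir and return the output path."""
    archive = Path(archive)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix: all_train.csv.gz -> all_train.csv
    out_path = target_dir / archive.stem
    with gzip.open(archive, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the data without loading it all
    return out_path


# Assumed download location: the current working directory.
for name in ("all_train.csv.gz", "all_test.csv.gz"):
    if Path(name).exists():
        print(decompress_gz(name, "datasets/hepmass"))
```

Streaming with shutil.copyfileobj avoids holding the decompressed CSV in memory, which matters for files of this size.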

PavelVesely/SplineSketch-experiments
