apache · alamb · Apr 12, 2023 · Apr 11, 2023 · Apr 11, 2023 · Apr 12, 2023
diff --git a/docs/source/contributor-guide/architecture.md b/docs/source/contributor-guide/architecture.md
@@ -20,7 +20,8 @@
 # Architecture
 
 DataFusion's code structure and organization is described in the
-[Crate Documentation], to keep it as close to the source as
-possible.
+[crates.io documentation], to keep it as close to the source as
+possible. You can find the most up-to-date version in the [source code].
 
-[crate documentation]: https://docs.rs/datafusion/latest/datafusion/index.html#code-organization
+[crates.io documentation]: https://docs.rs/datafusion/latest/datafusion/index.html#code-organization
+[source code]: https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/lib.rs
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -37,10 +37,9 @@ community.
    :maxdepth: 1
    :caption: Links
 
-   Issue tracker <https://github.com/apache/arrow-datafusion/issues>
+   GitHub and Issue Tracker <https://github.com/apache/arrow-datafusion>
    crates.io <https://crates.io/crates/datafusion>
-   API Docs <https://docs.rs/datafusion/21.1.0/datafusion/>
-   Github <https://github.com/apache/arrow-datafusion>
+   API Docs <https://docs.rs/datafusion/latest/datafusion/>
    Code of conduct <https://github.com/apache/arrow-datafusion/blob/main/CODE_OF_CONDUCT.md>
 
 .. _toc.guide:
@@ -50,22 +49,17 @@ community.
 
    user-guide/introduction
    user-guide/example-usage
-   user-guide/users
-   user-guide/comparison
-   user-guide/integration
-   user-guide/library
    user-guide/cli
    user-guide/dataframe
    user-guide/expressions
    user-guide/sql/index
    user-guide/configs
    user-guide/faq
-   Rust Crate Documentation <https://docs.rs/crate/datafusion/>
 
 .. _toc.contributor-guide:
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
    :caption: Contributor Guide
 
    contributor-guide/index

diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# DataFusion Command-line SQL Utility
+# `datafusion-cli`
 
 The DataFusion CLI is a command-line interactive SQL utility for executing
 queries against any supported data files. It is a convenient way to

diff --git a/docs/source/user-guide/comparison.md b/docs/source/user-guide/comparison.md
diff --git a/docs/source/user-guide/example-usage.md b/docs/source/user-guide/example-usage.md
@@ -26,7 +26,7 @@ In this example some simple processing is performed on the [`example.csv`](../..
 Add the following to your `Cargo.toml` file:
 
 ```toml
-datafusion = "11.0"
+datafusion = "22"
 tokio = "1.0"
 ```
 
@@ -81,7 +81,7 @@ async fn main() -> datafusion::error::Result<()> {
 +---+--------+
 ```
 
-# Identifiers and Capitalization
+## Identifiers and Capitalization
 
 Please be aware that all identifiers are effectively made lower-case in SQL, so if your csv file has capital letters (ex: `Name`) you must put your column name in double quotes or the examples won't work.
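
A minimal sketch of the rule stated above, not DataFusion's actual parser. `normalize_ident` is a hypothetical helper; it assumes unquoted identifiers are lower-cased as ASCII and that `""` escapes a quote inside a quoted identifier.

```rust
/// Hypothetical helper illustrating SQL identifier normalization:
/// unquoted identifiers are folded to lower case, while identifiers
/// wrapped in double quotes keep their original capitalization.
fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        // Quoted: strip the surrounding quotes and unescape `""` to `"`.
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        // Unquoted: fold to lower case (ASCII folding is an assumption).
        raw.to_ascii_lowercase()
    }
}

fn main() {
    // An unquoted `Name` no longer matches a CSV header spelled `Name`.
    println!("{}", normalize_ident("Name")); // prints "name"
    // Double quotes preserve the capitalization.
    println!("{}", normalize_ident("\"Name\"")); // prints "Name"
}
```

Under this rule, a query that refers to a column as `Name` is looked up as `name`, whereas `"Name"` is looked up exactly as written.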
 
@@ -141,3 +141,60 @@ async fn main() -> datafusion::error::Result<()> {
 | 1 | 2      |
 +---+--------+
 ```
+
+## Extensibility
+
+DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
+
+- [x] User Defined Functions (UDFs)
+- [x] User Defined Aggregate Functions (UDAFs)
+- [x] User Defined Table Source (`TableProvider`) for tables
+- [x] User Defined `Optimizer` passes (plan rewrites)
+- [x] User Defined `LogicalPlan` nodes
+- [x] User Defined `ExecutionPlan` nodes
+
+## Rust Version Compatibility
+
+This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+
+## Optimized Configuration
+
+An optimized build requires several steps. First, add the following to your `Cargo.toml`. Note that the
+settings in the `[profile.release]` section significantly increase build time.
+
+```toml
+[dependencies]
+datafusion = { version = "22.0", features = ["simd"] }
+tokio = { version = "^1.0", features = ["macros", "rt-multi-thread"] }
+snmalloc-rs = "0.2"
+
+[profile.release]
+lto = true
+codegen-units = 1
+```
+
+Then, in `main.rs`, set the global memory allocator after your imports:
+
+```rust
+use datafusion::prelude::*;
+
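+// Route every heap allocation in the program through snmalloc.
+#[global_allocator]
+static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
+
+// An `async fn main` needs a runtime; this attribute uses tokio's `macros` feature.
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+  Ok(())
+}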
+```
+
+Finally, building with the `simd` feature requires the nightly Rust toolchain:
+
+```shell
+rustup toolchain install nightly
+```
+
+Depending on the instruction set architecture you are building for, you will also want to set `target-cpu`, ideally
+to `native`, or at least enable AVX2 (for example with `-C target-feature=+avx2`).
+
+```shell
+RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release
+```
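
To check what the `RUSTFLAGS` setting above actually did on a given machine, the sketch below reports whether the binary was compiled with AVX2 enabled and whether the running CPU supports it. It uses only the standard library (`cfg!(target_feature = ...)` and `std::arch::is_x86_feature_detected!`); the function names are illustrative and none of this is DataFusion API.

```rust
/// True when the compiler was allowed to emit AVX2 instructions for this
/// build, e.g. because of `-C target-cpu=native` on an AVX2-capable machine.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True when the CPU executing this binary reports AVX2 support.
/// Always false on non-x86 targets, where AVX2 does not exist.
fn cpu_supports_avx2() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        std::arch::is_x86_feature_detected!("avx2")
    }
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    {
        false
    }
}

fn main() {
    println!("compiled with AVX2: {}", compiled_with_avx2());
    println!("CPU supports AVX2:  {}", cpu_supports_avx2());
}
```

Running it once with a plain `cargo run --release` and once with the `RUSTFLAGS` line above should show whether the flag changed what the compiler may emit.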
diff --git a/docs/source/user-guide/expressions.md b/docs/source/user-guide/expressions.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# Expressions
+# Expression API
 
 DataFrame methods such as `select` and `filter` accept one or more logical expressions and there are many functions
 available for creating logical expressions. These are documented below.

diff --git a/docs/source/user-guide/faq.md b/docs/source/user-guide/faq.md
@@ -29,3 +29,37 @@ model and computational kernels. It is designed to run within a single process,
 for parallel query execution.
 
 [Ballista](https://github.com/apache/arrow-ballista) is a distributed compute platform built on DataFusion.
+
+# How does DataFusion Compare with `XYZ`?
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file-based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`.
+
+Here is a comparison with similar projects that may help you understand
+when DataFusion might be suitable or unsuitable for your needs:
+
+- [DuckDB](https://www.duckdb.org) is an open-source, in-process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from Parquet files. Unlike DataFusion, it is written in C/C++ and it
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [Polars](http://pola.rs) is one of the fastest DataFrame
+  libraries at the time of writing. Like DataFusion, it is also
+  written in Rust and uses the Apache Arrow memory model, but unlike
+  DataFusion it is not designed with as many extension points.
+
+- [Facebook Velox](https://github.com/facebookincubator/velox)
+  is an execution engine. Like DataFusion, Velox aims to
+  provide a reusable foundation for building database-like systems. Unlike DataFusion,
+  it is written in C/C++ and does not include a SQL frontend or planning / optimization
+  framework.
+
+- [Databend](https://github.com/datafuselabs/databend) is a complete
+  database system. Like DataFusion, it is also written in Rust and
+  uses the Apache Arrow memory model, but unlike DataFusion it
+  targets end users rather than developers of other database systems.
diff --git a/docs/source/user-guide/integration.md b/docs/source/user-guide/integration.md
diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# Features, and Usecases
+# Introduction
 
 DataFusion is a very fast, extensible query engine for building
 high-quality data-centric systems in [Rust](http://rustlang.org),
@@ -66,6 +66,72 @@ features, and avoid reimplementing general (but still necessary)
 features such as an expression representation, standard optimizations,
 execution plans, file format support, etc.
 
+## Known Users
+
+Here are some of the projects known to use DataFusion:
+
+- [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
+- [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
+- [CeresDB](https://github.com/CeresDB/ceresdb) Distributed Time-Series Database
+- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
+- [CnosDB](https://github.com/cnosdb/cnosdb) Open Source Distributed Time Series Database
+- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
+- [Dask SQL](https://github.com/dask-contrib/dask-sql) Distributed SQL query engine in Python
+- [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
+- [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
+- [Flock](https://github.com/flock-lab/flock)
+- [GreptimeDB](https://github.com/GreptimeTeam/greptimedb) Open Source & Cloud Native Distributed Time Series Database
+- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
+- [Kamu](https://github.com/kamu-data/kamu-cli/) Planet-scale streaming data pipeline
+- [Parseable](https://github.com/parseablehq/parseable) Log storage and observability platform
+- [qv](https://github.com/timvw/qv) Quickly view your data
+- [ROAPI](https://github.com/roapi/roapi)
+- [Seafowl](https://github.com/splitgraph/seafowl) CDN-friendly analytical database
+- [Synnada](https://synnada.ai/) Streaming-first framework for data products
+- [Tensorbase](https://github.com/tensorbase/tensorbase)
+- [VegaFusion](https://vegafusion.io/) Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
+- [ZincObserve](https://github.com/zinclabs/zincobserve) Distributed cloud native observability platform
+
+[ballista]: https://github.com/apache/arrow-ballista
+[blaze]: https://github.com/blaze-init/blaze
+[ceresdb]: https://github.com/CeresDB/ceresdb
+[cloudfuse buzz]: https://github.com/cloudfuse-io/buzz-rust
+[cnosdb]: https://github.com/cnosdb/cnosdb
+[cube store]: https://github.com/cube-js/cube.js/tree/master/rust
+[dask sql]: https://github.com/dask-contrib/dask-sql
+[datafusion-tui]: https://github.com/datafusion-contrib/datafusion-tui
+[delta-rs]: https://github.com/delta-io/delta-rs
+[flock]: https://github.com/flock-lab/flock
+[kamu]: https://github.com/kamu-data/kamu-cli
+[greptime db]: https://github.com/GreptimeTeam/greptimedb
+[influxdb iox]: https://github.com/influxdata/influxdb_iox
+[parseable]: https://github.com/parseablehq/parseable
+[prql-query]: https://github.com/prql/prql-query
+[qv]: https://github.com/timvw/qv
+[roapi]: https://github.com/roapi/roapi
+[seafowl]: https://github.com/splitgraph/seafowl
+[synnada]: https://synnada.ai/
+[tensorbase]: https://github.com/tensorbase/tensorbase
+[vegafusion]: https://vegafusion.io/
+[zincobserve]: https://github.com/zinclabs/zincobserve "if you know of another project, please submit a PR to add a link!"
+
+## Integrations and Extensions
+
+There are a number of community projects that extend DataFusion or
+provide integrations with other systems.
+
+### Language Bindings
+
+- [datafusion-c](https://github.com/datafusion-contrib/datafusion-c)
+- [datafusion-python](https://github.com/apache/arrow-datafusion-python)
+- [datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby)
+- [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
+
+### Integrations
+
+- [datafusion-bigtable](https://github.com/datafusion-contrib/datafusion-bigtable)
+- [datafusion-catalogprovider-glue](https://github.com/datafusion-contrib/datafusion-catalogprovider-glue)
+
 ## Why DataFusion?
 
 - _High Performance_: Leveraging Rust and Arrow's memory model, DataFusion is very fast.