diff --git a/doc/getting_started/index.rst b/doc/getting_started/index.rst index 947bf24dd..36b3ea766 100644 --- a/doc/getting_started/index.rst +++ b/doc/getting_started/index.rst @@ -5,7 +5,7 @@ Getting Started Installation ------------ -Datashader supports Python 3.9, 3.10, 3.11 and 3.12 on Linux, Windows, or Mac +Datashader supports Python 3.10, 3.11, 3.12 and 3.13 on Linux, Windows, or Mac and can be installed with conda:: conda install datashader diff --git a/examples/FAQ.ipynb b/examples/FAQ.ipynb index a6b47f312..db0e786ff 100644 --- a/examples/FAQ.ipynb +++ b/examples/FAQ.ipynb @@ -26,7 +26,7 @@ "If you have a very small number of data points (in the hundreds\n", "or thousands) or curves (in the tens or several tens, each with\n", "hundreds or thousands of points), then conventional plotting packages\n", - "like [Bokeh](https://bokeh.pydata.org) may be more suitable. With conventional browser-based\n", + "like [Bokeh](https://bokeh.org) may be more suitable. With conventional browser-based\n", "packages, all of the data points are passed directly to the browser for\n", "display, allowing specific interaction with each curve or point,\n", "including display of metadata, linking to sources, etc. This approach\n", @@ -100,5 +100,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/examples/getting_started/1_Introduction.ipynb b/examples/getting_started/1_Introduction.ipynb index 72f337e66..45339fac0 100644 --- a/examples/getting_started/1_Introduction.ipynb +++ b/examples/getting_started/1_Introduction.ipynb @@ -8,10 +8,10 @@ "\n", "**Datashader turns even the largest datasets into images, faithfully preserving the data's distribution.**\n", "\n", - "Datashader is an [open-source](https://github.com/bokeh/datashader/) Python library for analyzing and visualizing large datasets. Specifically, Datashader is designed to \"rasterize\" or \"aggregate\" datasets into regular grids that can be analyzed further or viewed as images, making it simple and quick to see the properties and patterns of your data. Datashader can plot a billion points in a second or so on a 16GB laptop, and scales up easily to out-of-core, distributed, or GPU processing for even larger datasets.\n", + "Datashader is an [open-source](https://github.com/holoviz/datashader/) Python library for analyzing and visualizing large datasets. Specifically, Datashader is designed to \"rasterize\" or \"aggregate\" datasets into regular grids that can be analyzed further or viewed as images, making it simple and quick to see the properties and patterns of your data. Datashader can plot a billion points in a second or so on a 16GB laptop, and scales up easily to out-of-core, distributed, or GPU processing for even larger datasets.\n", "\n", "This page of the getting-started guide will give a simple example to show how it works, and the following page will show how to use Datashader as a standalone library for generating arrays or images directly\n", - "([Pipeline](2_Pipeline.ipynb)). Next we'll show how to use Datashader as a component in a larger visualization system like [HoloViews](http://holoviews.org) or [Bokeh](http://bokeh.pydata.org) that provides interactive plots with dynamic zooming, labeled axes, and overlays and layouts ([3-Interactivity](3-Interactivity.ipynb)). More detailed information about each topic is then provided in the [User Guide](../user_guide/).\n", + "([Pipeline](2_Pipeline.ipynb)). 
Next we'll show how to use Datashader as a component in a larger visualization system like [HoloViews](https://holoviews.org) or [Bokeh](https://bokeh.org) that provides interactive plots with dynamic zooming, labeled axes, and overlays and layouts ([3-Interactivity](3_Interactivity.ipynb)). More detailed information about each topic is then provided in the [User Guide](../user_guide/).\n",
 "\n",
 "## Example: NYC taxi trips\n",
 "\n", @@ -91,5 +91,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/examples/getting_started/2_Pipeline.ipynb b/examples/getting_started/2_Pipeline.ipynb index 6a2f69d81..e0951dcc0 100644 --- a/examples/getting_started/2_Pipeline.ipynb +++ b/examples/getting_started/2_Pipeline.ipynb @@ -56,7 +56,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Datashader can work many different data objects provided by different data libraries depending on the type of data involved, such as columnar data in [Pandas](http://pandas.pydata.org) or [Dask](http://dask.pydata.org) dataframes, gridded multidimensional array data using [xarray](http://xarray.pydata.org), columnar data on GPUs using [cuDF](https://github.com/rapidsai/cudf), multidimensional arrays on GPUs using [CuPy](https://cupy.chainer.org/), and ragged arrays using [SpatialPandas](https://github.com/holoviz/spatialpandas) (see the [Performance User Guide](../10_Performance.ipynb) for a guide to selecting an appropriate library). Here, we're using a Pandas dataframe, with 50,000 rows by default:" + "Datashader can work with many different data objects provided by different data libraries depending on the type of data involved, such as columnar data in [Pandas](https://pandas.pydata.org) or [Dask](https://dask.org) dataframes, gridded multidimensional array data using [xarray](https://xarray.dev), columnar data on GPUs using [cuDF](https://github.com/rapidsai/cudf), multidimensional arrays on GPUs using [CuPy](https://cupy.chainer.org/), and ragged arrays using [SpatialPandas](https://github.com/holoviz/spatialpandas) (see the [Performance User Guide](../10_Performance.ipynb) for a guide to selecting an appropriate library). Here, we're using a Pandas dataframe, with 50,000 rows by default:" ] }, { @@ -177,7 +177,7 @@ "source": [ "### 2D Reductions\n", "\n", - "One you have determined your mapping, you'll next need to choose a reduction operator to use when aggregating multiple datapoints into a given pixel. For points, each datapoint is mapped into a single pixel, while the other glyphs have spatial extent and can thus map into multiple pixels, each of which operates the same way. All glyphs act like points if the entire glyph is contained within that pixel. Here we will talk only about \"datapoints\" for simplicity, which for an area-based glyph should be interpreted as \"the part of that glyph that falls into this pixel\".\n", + "Once you have determined your mapping, you'll next need to choose a reduction operator to use when aggregating multiple datapoints into a given pixel. For points, each datapoint is mapped into a single pixel, while the other glyphs have spatial extent and can thus map into multiple pixels, each of which operates the same way. All glyphs act like points if the entire glyph is contained within that pixel.
Here we will talk only about \"datapoints\" for simplicity, which for an area-based glyph should be interpreted as \"the part of that glyph that falls into this pixel\".\n",
 "\n",
 "All of the currently supported reduction operators are incremental, which means that we can efficiently process datasets in a single pass. Given an aggregate bin to update (typically corresponding to one eventual pixel) and a new datapoint, the reduction operator updates the state of the bin in some way. (Actually, datapoints are normally processed in batches for efficiency, but it's simplest to think about the operator as being applied per data point, and the mathematical result should be the same.) A large number of useful [reduction operators](https://datashader.org/api.html#reductions) are supplied in `ds.reductions`, including:\n", "\n", @@ -213,7 +213,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The result of will be an [xarray](http://xarray.pydata.org) `DataArray` data structure containing the bin values (typically one value per bin, but more for multiple category or multiple-aggregate operators) along with axis range and type information.\n", + "The result will be an [xarray](https://xarray.dev) `DataArray` data structure containing the bin values (typically one value per bin, but more for multiple category or multiple-aggregate operators) along with axis range and type information.\n", "\n", "We can visualize this array in many different ways by customizing the pipeline stages described in the following sections, but for now we'll simply render images using the default parameters to show the effects of a few different aggregate operators:" ] }, { @@ -359,7 +359,7 @@ "\n", "Now that the data has been projected and aggregated into a 2D or 3D gridded data structure, it can be processed in any way you like, before converting it to an image as will be described in the following section. At this stage, the data is still stored as bin data, not pixels, which makes a very wide variety of operations and transformations simple to express.
\n", "\n", - "For instance, instead of plotting all the data, we can easily plot only those bins in the 99th percentile by count (left), or apply any [NumPy ufunc](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) to the bin values (whether or not it makes any sense!):" + "For instance, instead of plotting all the data, we can easily plot only those bins in the 99th percentile by count (left), or apply any [NumPy ufunc](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) to the bin values (whether or not it makes any sense!):" ] }, { @@ -388,11 +388,11 @@ "metadata": {}, "outputs": [], "source": [ - "sel1 = agg_d3_d5.where(aggc.sel(cat='d3') == aggc.sel(cat='d5')).astype('uint32')\n", - "sel2 = agg.where(aggc.sel(cat='d3') == aggc.sel(cat='d5')).astype('uint32')\n", + "sel1 = agg_d3_d5.where(aggc.sel(cat='d3') == aggc.sel(cat='d5'), other=-1).astype('uint32')\n", + "sel2 = agg.where(aggc.sel(cat='d3') == aggc.sel(cat='d5'), other=-1).astype('uint32')\n", "\n", - "tf.Images(tf.shade(sel1, name=\"d3+d5 where d3==d5\"),\n", - " tf.shade(sel2, name=\"d1+d2+d3+d4+d5 where d3==d5\"))" + "tf.Images(tf.shade(sel1, name='d3+d5 where d3==d5'),\n", + " tf.shade(sel2, name='d1+d2+d3+d4+d5 where d3==d5'))" ] }, { @@ -408,7 +408,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The [xarray documentation](http://xarray.pydata.org/en/stable/computation.html) describes all the various transformations you can apply from within xarray, and of course you can always extract the data values and operate on them outside of xarray for any transformation not directly supported by xarray, then construct a suitable xarray object for use in the following stage. Once the data is in the aggregate array, you generally don't have to worry much about optimization, because it's a fixed-sized grid regardless of your data size, and so it is very straightforward to apply arbitrary transformations to the aggregates." + "The [xarray documentation](https://docs.xarray.dev/en/stable/user-guide/computation.html) describes all the various transformations you can apply from within xarray, and of course you can always extract the data values and operate on them outside of xarray for any transformation not directly supported by xarray, then construct a suitable xarray object for use in the following stage. Once the data is in the aggregate array, you generally don't have to worry much about optimization, because it's a fixed-sized grid regardless of your data size, and so it is very straightforward to apply arbitrary transformations to the aggregates." ] }, { @@ -782,7 +782,7 @@ "source": [ "#### Colormapping with negative values\n", "\n", - "The above examples all use positive data values to avoid confusion when there is no colorbar or other explicit indication of a z (color) axis range. Negative values are also supported, in which case for a non-categorical plot you should normally use a [diverging colormap](https://colorcet.holoviz.org/user_guide/Continuous.html#Diverging-colormaps,-for-plotting-magnitudes-increasing-or-decreasing-from-a-central-point:):" + "The above examples all use positive data values to avoid confusion when there is no colorbar or other explicit indication of a z (color) axis range. 
Negative values are also supported, in which case for a non-categorical plot you should normally use a [diverging colormap](https://colorcet.holoviz.org/user_guide/Continuous.html#diverging-colormaps-for-plotting-magnitudes-increasing-or-decreasing-from-a-central-point):" ] }, { @@ -793,7 +793,7 @@ "source": [ "from colorcet import coolwarm, CET_D8\n", "dfn = df.copy()\n", - "dfn.val.replace({20:-20, 30:0, 40:-40}, inplace=True)\n", + "dfn[\"val\"] = dfn[\"val\"].replace({20: -20, 30: 0, 40: -40})\n", "aggn = ds.Canvas().points(dfn,'x','y', agg=ds.mean(\"val\"))\n", "\n", "tf.Images(tf.shade(aggn, name=\"Sequential\", cmap=[\"lightblue\",\"blue\"], how=\"linear\"),\n", @@ -881,7 +881,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "See [the API docs](https://datashader.org/api.html#transfer-functions) for more details. Image composition operators to provide for the `how` argument of `tf.stack` (e.g. `over` (default), `source`, `add`, and `saturate`) are listed in [composite.py](https://raw.githubusercontent.com/holoviz/datashader/main/datashader/composite.py) and illustrated [here](http://cairographics.org/operators).\n", + "See [the API docs](https://datashader.org/api.html#transfer-functions) for more details. Image composition operators available for the `how` argument of `tf.stack` (e.g. `over` (default), `source`, `add`, and `saturate`) are listed in [composite.py](https://raw.githubusercontent.com/holoviz/datashader/main/datashader/composite.py) and illustrated [here](https://cairographics.org/operators).\n", "\n", "## Embedding\n", "\n", @@ -896,5 +896,5 @@ } }, "nbformat": 4, - "nbformat_minor": 1 + "nbformat_minor": 4 } diff --git a/examples/getting_started/3_Interactivity.ipynb b/examples/getting_started/3_Interactivity.ipynb index 81a77d3e6..d525d087e 100644 --- a/examples/getting_started/3_Interactivity.ipynb +++ b/examples/getting_started/3_Interactivity.ipynb @@ -6,7 +6,7 @@ "source": [ "The [previous notebook](2-Pipeline.ipynb) showed all the steps required to get a Datashader rendering of your dataset, yielding raster images displayed using [Jupyter](http://jupyter.org)'s \"rich display\" support. However, these bare images do not show the data ranges or axis labels, making them difficult to interpret. Moreover, they are only static images, and datasets often need to be explored at multiple scales, which is much easier to do in an interactive program. \n", "\n", - "To get axes and interactivity, the images generated by Datashader need to be embedded into a plot using an external library like [Matplotlib](http://matplotlib.org) or [Bokeh](http://bokeh.org). As we illustrate below, the most convenient way to make Datashader plots using these libraries is via the [HoloViews](http://holoviews.org) high-level data-science API, using either [Bokeh](http://holoviews.org/user_guide/Large_Data.html) or [Plotly](https://medium.com/plotly/introducing-dash-holoviews-6a05c088ebe5). HoloViews encapsulates the Datashader pipeline in a way that lets you combine interactive datashaded plots easily with other plots without having to write explicit callbacks or event-processing code.\n", + "To get axes and interactivity, the images generated by Datashader need to be embedded into a plot using an external library like [Matplotlib](https://matplotlib.org) or [Bokeh](https://bokeh.org).
As we illustrate below, the most convenient way to make Datashader plots using these libraries is via the [HoloViews](https://holoviews.org) high-level data-science API, using either [Bokeh](https://holoviews.org/user_guide/Large_Data.html) or [Plotly](https://medium.com/plotly/introducing-dash-holoviews-6a05c088ebe5). HoloViews encapsulates the Datashader pipeline in a way that lets you combine interactive datashaded plots easily with other plots without having to write explicit callbacks or event-processing code.\n", "\n", "In this notebook, we will first look at the HoloViews API, then at Datashader's new native Matplotlib support." ] @@ -17,7 +17,7 @@ "source": [ "# Embedding Datashader with HoloViews\n", "\n", - "[HoloViews](http://holoviews.org) (1.7 and later) is a high-level data analysis and visualization library that makes it simple to generate interactive [Datashader](https://github.com/holoviz/datashader)-based plots. Here's an illustration of how this all fits together when using HoloViews+[Bokeh](http://bokeh.pydata.org):\n", + "[HoloViews](https://holoviews.org) (1.7 and later) is a high-level data analysis and visualization library that makes it simple to generate interactive [Datashader](https://github.com/holoviz/datashader)-based plots. Here's an illustration of how this all fits together when using HoloViews+[Bokeh](https://bokeh.org):\n", "\n", " \n", "\n", @@ -80,7 +80,7 @@ "source": [ "### HoloViews+Bokeh\n", "\n", - "Rather than starting out by specifying a figure or plot, in HoloViews you specify an [``Element``](http://holoviews.org/reference/index.html#elements) object to contain your data, such as `Points` for a collection of 2D x,y points. To start, let's define a Points object wrapping around a small dataframe with 10,000 random samples from the ``df`` above:" + "Rather than starting out by specifying a figure or plot, in HoloViews you specify an [``Element``](https://holoviews.org/reference/index.html#elements) object to contain your data, such as `Points` for a collection of 2D x,y points. To start, let's define a Points object wrapping around a small dataframe with 10,000 random samples from the ``df`` above:" ] }, { @@ -144,7 +144,7 @@ "\n", "### HoloViews+Datashader+Bokeh\n", "\n", - "The Matplotlib interface only produces a static plot, i.e., a PNG or SVG image, but the [Bokeh](http://bokeh.pydata.org) and Plotly interfaces of HoloViews add the dynamic zooming and panning necessary to understand datasets across scales:" + "The Matplotlib interface only produces a static plot, i.e., a PNG or SVG image, but the [Bokeh](https://bokeh.org) and Plotly interfaces of HoloViews add the dynamic zooming and panning necessary to understand datasets across scales:" ] }, { @@ -187,7 +187,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can read more about HoloViews support for Datashader at [holoviews.org](http://holoviews.org/user_guide/Large_Data.html)." + "You can read more about HoloViews support for Datashader at [holoviews.org](https://holoviews.org/user_guide/Large_Data.html)." 
] }, { @@ -196,7 +196,7 @@ "source": [ "### HoloViews+Datashader+Bokeh Legends\n", "\n", - "As explained in the [HoloViews User Guide](http://holoviews.org/user_guide/Large_Data.html), you'll want to use the HoloViews `rasterize` operation whenever you can, instead of `datashade`, because `rasterize` lets the plotting library do the final colormapping stage, allowing it to provide colorbars, legends, and interactive features like hover that reveal the actual (aggregated) data. However, plotting libraries do not yet support all of Datashader's features, such as `shade`'s categorical color mixing, and in those cases you will need to use special techniques like those listed here. \n", + "As explained in the [HoloViews User Guide](https://holoviews.org/user_guide/Large_Data.html), you'll want to use the HoloViews `rasterize` operation whenever you can, instead of `datashade`, because `rasterize` lets the plotting library do the final colormapping stage, allowing it to provide colorbars, legends, and interactive features like hover that reveal the actual (aggregated) data. However, plotting libraries do not yet support all of Datashader's features, such as `shade`'s categorical color mixing, and in those cases you will need to use special techniques like those listed here. \n", "\n", "If you are using Datashader's shading, the underlying plotting library only ever sees an image, not the individual categorical data, and so it cannot automatically show a legend. But you can work around it by building your own categorical legend by adding a suitable collection of labeled dummy points:" ] @@ -257,7 +257,7 @@ "source": [ "In the above examples, the \"fixed square hover\" plot provides coarse hover information from a square patch at a fixed spatial scale, while the \"dynamic square hover\" plot reports on a square area that scales with the zoom level so that arbitrarily small regions of data space can be examined, which is generally more useful.\n", "\n", - "As you can see, HoloViews makes it just about as simple to work with Datashader-based plots as regular Bokeh plots (at least if you don't need color keys!), letting you visualize data of any size interactively in a browser using just a few lines of code. Because Datashader-based HoloViews plots are just one or two extra steps added on to regular HoloViews plots, they support all of the same features as regular HoloViews objects, and can freely be laid out, overlaid, and nested together with them. See [holoviews.org](http://holoviews.org) for examples and documentation for how to control the appearance of these plots and how to work with them in general.\n", + "As you can see, HoloViews makes it just about as simple to work with Datashader-based plots as regular Bokeh plots (at least if you don't need color keys!), letting you visualize data of any size interactively in a browser using just a few lines of code. Because Datashader-based HoloViews plots are just one or two extra steps added on to regular HoloViews plots, they support all of the same features as regular HoloViews objects, and can freely be laid out, overlaid, and nested together with them. 
See [holoviews.org](https://holoviews.org) for examples and documentation for how to control the appearance of these plots and how to work with them in general.\n", "\n", "## HoloViews+Datashader+Panel\n", "\n", @@ -496,5 +496,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/examples/user_guide/10_Performance.ipynb b/examples/user_guide/10_Performance.ipynb index 1cf69aa12..bfac5f77b 100644 --- a/examples/user_guide/10_Performance.ipynb +++ b/examples/user_guide/10_Performance.ipynb @@ -53,7 +53,7 @@ "\n", "Datashader performance will vary significantly depending on the library and specific data object type used to represent the data in Python, because different libraries and data objects have very different abilities to use the available processing power and memory. Moreover, different libraries and objects are appropriate for different types of data, due to how they organize and store the data internally as well as the operations they provide for working with the data. The various data container objects available from the supported libraries all fall into one of the following three types of data structures:\n", "- **[Columnar (tabular) data](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html)**: Relational, table-like data consisting of arbitrarily many rows, each with data for a fixed number of columns. For example, if you track the location of a particular cell phone over time, each time sampled would be a row, and for each time there could be columns for the latitude and longitude for the location at that time.\n", - "- **[n-D arrays (multidimensional data)](http://xarray.pydata.org/en/stable/why-xarray.html)**: Data laid out in _n_ dimensions, where _n_ is typically >1. For example, you might have the precipitation measured on a latitude and longitude grid covering the whole world, for every time at which precipitation was measured. Such data could be stored columnarly, but it would be very inefficient; instead it is stored as a three dimensional array of precipitation values, indexed with time, latitude, and longitude.\n", + "- **[n-D arrays (multidimensional data)](https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html)**: Data laid out in _n_ dimensions, where _n_ is typically >1. For example, you might have the precipitation measured on a latitude and longitude grid covering the whole world, for every time at which precipitation was measured. Such data could be stored columnarly, but it would be very inefficient; instead it is stored as a three dimensional array of precipitation values, indexed with time, latitude, and longitude.\n", "- **[Ragged arrays](https://en.wikipedia.org/wiki/Jagged_array)**: Relational/columnar data where the value of at least one column is a list of values that could vary in length for each row. For example, you may have a table with one row per US state and columns for population, land area, and the geographic shape of that state. Here the shape would be stored as a polygon consisting of an arbitrarily long list of latitude and longitude coordinates, which does not fit efficiently into a standard columnar data structure due to its ragged (variable length) nature.\n", "\n", "As you can see, all three examples include latitude and longitude values, but they are very different data structures that need to be stored differently for them to be processed efficiently. \n", @@ -184,7 +184,7 @@ "