Commit 5bcf2b7

Merge pull request #55 from octue/release/0.1.7
Release/0.1.7
2 parents 0dd42ec + c2f6ff7 commit 5bcf2b7

33 files changed: 1483 additions, 347 deletions

.github/workflows/check-version-consistency.yml

Lines changed: 0 additions & 9 deletions
This file was deleted.

.github/workflows/python-ci.yml

Lines changed: 8 additions & 0 deletions
@@ -9,6 +9,14 @@ name: python-ci
 on: [push]

 jobs:
+
+  check-version-consistency:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - uses: actions/setup-python@v2
+    - run: python .github/workflows/scripts/check-version-consistency.py
+
   tests:
     runs-on: ubuntu-latest
     env:

docs/source/analysis_objects.rst

Lines changed: 44 additions & 0 deletions
.. _analysis_objects:

================
Analysis objects
================

An ``Analysis`` object is the sole argument to the ``app`` function in your ``app.py`` module. Its attributes include
every strand that can possibly be added to a ``Twine``, although only the strands specified in your ``twine.py`` file
will be populated; all others are ``None``. The attributes are:

- ``input_values``
- ``input_manifest``
- ``configuration_values``
- ``configuration_manifest``
- ``output_values``
- ``output_manifest``
- ``credentials``
- ``children``
- ``monitors``

Additionally, all input and configuration attributes are hashed using a
`BLAKE3 hash <https://github.com/BLAKE3-team/BLAKE3>`_ so that the inputs and configuration that produced a given output
in your app can always be verified. The hashes are available on the following attributes:

- ``input_values_hash``
- ``input_manifest_hash``
- ``configuration_values_hash``
- ``configuration_manifest_hash``

If an input or configuration attribute is ``None``, its hash attribute is also ``None``. For ``Manifests``, some metadata
about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when calculating
the hash:

- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:

  - ``name``
  - ``cluster``
  - ``sequence``
  - ``posix_timestamp``
  - ``tags``

- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.

- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
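
The nesting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ``hashlib.blake2b`` from the standard library stands in for BLAKE3, the three ``hash_*`` helpers are hypothetical and are not the SDK's API, and ``keys`` is assumed to be a mapping.

```python
import hashlib


def _digest(*parts):
    """Hash several byte strings together (BLAKE2b is a stand-in for BLAKE3)."""
    hasher = hashlib.blake2b(digest_size=16)
    for part in parts:
        # Length-prefix each part so that (b"ab", b"c") and (b"a", b"bc") differ.
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.hexdigest()


def hash_datafile(content, name, cluster, sequence, posix_timestamp, tags):
    """Hash a file's content together with its metadata (hypothetical helper)."""
    metadata = repr((name, cluster, sequence, posix_timestamp, sorted(tags)))
    return _digest(content, metadata.encode())


def hash_dataset(datafile_hashes, tags):
    """Combine the hashes of a dataset's files with the dataset's tags."""
    parts = [file_hash.encode() for file_hash in sorted(datafile_hashes)]
    return _digest(*parts, repr(sorted(tags)).encode())


def hash_manifest(dataset_hashes, keys):
    """Combine the hashes of a manifest's datasets with its keys (assumed to be a mapping)."""
    parts = [dataset_hash.encode() for dataset_hash in sorted(dataset_hashes)]
    return _digest(*parts, repr(sorted(keys.items())).encode())


file_hash = hash_datafile(b"a,b\n1,2\n", "my_file.csv", 0, None, 1600000000.0, {"all", "one"})
dataset_hash = hash_dataset([file_hash], {"met-mast"})
manifest_hash = hash_manifest([dataset_hash], {"my_dataset": 0})
```

Because each level folds in the hashes of the level below, changing a single tag on one file changes the file, dataset and manifest hashes in this sketch.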

docs/source/datafile.rst

Lines changed: 14 additions & 0 deletions
.. _datafile:

========
Datafile
========

A ``Datafile`` is an Octue type that corresponds to a file, which may exist on your computer or in a cloud store. It has
the following main attributes:

- ``path`` - the path of this file within the dataset, which may include folders or subfolders
- ``cluster`` - the integer cluster of files, within a dataset, to which this file belongs (default 0)
- ``sequence`` - the sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``posix_timestamp`` - a POSIX timestamp associated with the file, in seconds since the epoch; this is typically when the file was created, but it could instead relate to a relevant time point for the data

docs/source/dataset.rst

Lines changed: 44 additions & 0 deletions
.. _dataset:

=======
Dataset
=======

A ``Dataset`` contains any number of ``Datafiles`` along with the following metadata:

- ``name``
- ``tags``

The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
`Datafile <datafile.rst>`_ instances it contains.


--------------------------------
Filtering files in a ``Dataset``
--------------------------------

You can filter a ``Dataset``'s files as follows:

.. code-block:: python

    dataset = Dataset(
        files=[
            Datafile(path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
            Datafile(path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
            Datafile(path="path-within-dataset/another_file.csv", tags="three all"),
        ]
    )

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
    # <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

    dataset.files.filter(filter_name="tags__contains", filter_value="a:2")
    # <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>

You can also chain filters indefinitely:

.. code-block:: python

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter(
        filter_name="tags__contains", filter_value="a:2"
    )
    # <FilterSet({<Datafile('my_file.csv')>})>

Find out more about ``FilterSets`` `here <filter_containers.rst>`_, including all the filters available for each type of
attribute of a ``FilterSet`` member, and how to convert a ``FilterSet`` to a primitive type such as ``set`` or ``list``.
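
Chaining works because each call to ``filter`` returns another filter container instead of a plain ``set``. A minimal sketch of that idea, using a hypothetical ``ChainableSet`` class that takes a predicate function (it is not the SDK's ``FilterSet``, which takes a filter name and value):

```python
class ChainableSet(set):
    """A hypothetical stand-in for a filter container."""

    def filter(self, predicate):
        # Returning a ChainableSet (not a plain set) is what makes chaining possible.
        return ChainableSet(item for item in self if predicate(item))


names = ChainableSet({"my_file.csv", "your_file.txt", "another_file.csv"})

# Each call narrows the previous result and can itself be filtered again.
result = names.filter(lambda name: name.endswith(".csv")).filter(lambda name: name.startswith("my"))
```

Since the result is still a ``set`` subclass, it can be passed to ``set(...)`` or ``list(...)`` to get a primitive container back.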

docs/source/filter_containers.rst

Lines changed: 127 additions & 0 deletions
.. _filter_containers:

=================
Filter containers
=================

A filter container is a regular Python container that has some extra methods for filtering or ordering its
elements. It has the same interface (i.e. attributes and methods) as the built-in Python type it inherits from, with
these extra methods:

- ``filter``
- ``order_by``

There are two types of filter container currently implemented:

- ``FilterSet``
- ``FilterList``

``FilterSets`` are currently used in:

- ``Dataset.files`` to store ``Datafiles``
- ``TagSet.tags`` to store ``Tags``

You can see filtering in action on the files of a ``Dataset`` `here <dataset.rst>`_.


---------
Filtering
---------

Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
``FilterSet`` whose type or interface is supported can be filtered.

.. code-block:: python

    filter_set = FilterSet(
        {Datafile(path="my_file.csv"), Datafile(path="your_file.txt"), Datafile(path="another_file.csv")}
    )

    filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
    # <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

The following filters are implemented for the following types:

- ``bool``:

  * ``is``
  * ``is_not``

- ``str``:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``iequals`` (case-insensitive equals)
  * ``not_iequals``
  * ``lt`` (less than)
  * ``lte`` (less than or equal)
  * ``gt`` (greater than)
  * ``gte`` (greater than or equal)
  * ``contains``
  * ``not_contains``
  * ``icontains`` (case-insensitive contains)
  * ``not_icontains``
  * ``starts_with``
  * ``not_starts_with``
  * ``ends_with``
  * ``not_ends_with``

- ``NoneType``:

  * ``is``
  * ``is_not``

- ``TagSet``:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``any_tag_contains``
  * ``not_any_tag_contains``
  * ``any_tag_starts_with``
  * ``not_any_tag_starts_with``
  * ``any_tag_ends_with``
  * ``not_any_tag_ends_with``

Additionally, these filters are defined for the following *interfaces* (duck-types):

- Numbers:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``lt``
  * ``lte``
  * ``gt``
  * ``gte``

- Iterables:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``contains``
  * ``not_contains``
  * ``icontains``
  * ``not_icontains``

The interface filters are only used if the type of the attribute being filtered is not found in the first list of
types.
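
One way to picture the lookup order described above is sketched below. It is an illustration under assumptions, not the SDK's implementation: the ``TYPE_FILTERS`` and ``INTERFACE_FILTERS`` tables hold only a small subset of the filters listed, and ``get_filter`` and ``matches`` are hypothetical helpers.

```python
import numbers
from collections.abc import Iterable
from types import SimpleNamespace

# Filters keyed by exact type (a small, illustrative subset of the lists above).
TYPE_FILTERS = {
    str: {
        "equals": lambda item, value: item == value,
        "ends_with": lambda item, value: item.endswith(value),
        "icontains": lambda item, value: value.lower() in item.lower(),
    },
    bool: {"is": lambda item, value: item is value},
}

# Fallback filters keyed by interface, tried only if the exact type is not listed above.
INTERFACE_FILTERS = {
    numbers.Number: {"gt": lambda item, value: item > value},
    Iterable: {"contains": lambda item, value: value in item},
}


def get_filter(attribute, action):
    """Look up a filter by the attribute's exact type first, then by its interface."""
    if type(attribute) in TYPE_FILTERS:
        return TYPE_FILTERS[type(attribute)][action]
    for interface, filters in INTERFACE_FILTERS.items():
        if isinstance(attribute, interface):
            return filters[action]
    raise ValueError(f"No filters are defined for type {type(attribute).__name__}.")


def matches(instance, filter_name, filter_value):
    """Split "<attribute>__<action>" and apply the chosen filter to that attribute."""
    attribute_name, action = filter_name.split("__", 1)
    attribute = getattr(instance, attribute_name)
    return get_filter(attribute, action)(attribute, filter_value)


# A stand-in object with a string, a number and an iterable attribute.
file = SimpleNamespace(name="my_file.csv", sequence=3, labels=["a", "b"])
csv_match = matches(file, "name__ends_with", ".csv")  # exact type: str
later_match = matches(file, "sequence__gt", 2)  # interface: Number
label_match = matches(file, "labels__contains", "z")  # interface: Iterable
```

Checking the exact type first means a ``bool`` attribute uses the ``bool`` filters even though ``bool`` also satisfies the number interface.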

--------
Ordering
--------

As sets are inherently unordered, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
methods and behaviour as a ``FilterSet`` but is based on the ``list`` type instead, meaning it can be ordered and
indexed. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:

.. code-block:: python

    filter_set.order_by("name")
    # <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile('your_file.txt')>])>

The ordering can also be carried out in reverse (i.e. descending order) by passing ``reverse=True`` as a second argument
to the ``order_by`` method.
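
Ordering by an attribute amounts to sorting with that attribute as the key. A minimal sketch, assuming a hypothetical free function ``order_by`` and ``SimpleNamespace`` objects in place of ``Datafiles``:

```python
from operator import attrgetter
from types import SimpleNamespace


def order_by(items, attribute_name, reverse=False):
    """Return a new list of the items, sorted by the named attribute."""
    return sorted(items, key=attrgetter(attribute_name), reverse=reverse)


files = [SimpleNamespace(name=name) for name in ("my_file.csv", "your_file.txt", "another_file.csv")]

ascending = [file.name for file in order_by(files, "name")]
descending = [file.name for file in order_by(files, "name", reverse=True)]
```

The function returns a list whatever container it is given, which mirrors why ordering a ``FilterSet`` yields a ``FilterList``.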

docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ Not all of Octue's API functionality is implemented in the SDK yet, we're active
    :hidden:

    installation
+   datafile
+   dataset
+   filter_containers
+   analysis_objects
    license
    version_history
    bibliography

octue/mixins/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -1,9 +1,11 @@
 from .base import MixinBase
+from .filterable import Filterable
+from .hashable import Hashable
 from .identifiable import Identifiable
 from .loggable import Loggable
 from .pathable import Pathable
 from .serialisable import Serialisable
 from .taggable import Taggable


-__all__ = "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable"
+__all__ = ("Filterable", "Hashable", "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable")
