Merged
Commits
204 commits
2942b33
REF: Group magic methods at top of TagGroup
cortadocodes Dec 16, 2020
31a1139
IMP: Add ability to yield subtags and search them
cortadocodes Dec 16, 2020
9ab5bea
FIX: Fix and include subtags in __contains__ method of TagGroup
cortadocodes Dec 16, 2020
a6afb6a
MRG: Merge remote-tracking branch 'origin/main' into feature/search-f…
cortadocodes Dec 18, 2020
4ccd001
REV: Revert TagGroup __contains__ behaviour
cortadocodes Dec 18, 2020
21651ea
CLN: Use unittest assertions
cortadocodes Dec 18, 2020
1f20e4a
TST: Test endswith and startswith on TagGroup
cortadocodes Dec 18, 2020
b622be4
REF: Factor out repeated tag group
cortadocodes Dec 18, 2020
49b72b2
IMP: Add ability to filter tags on TagGroup
cortadocodes Dec 18, 2020
f78f725
IMP: Add ability to filter and compare TagGroups
cortadocodes Dec 18, 2020
8e3e7e3
IMP: Make TagGroup iterable; replace get_tags with filter method
cortadocodes Dec 18, 2020
e6b4e31
TST: Split filter test; add docstrings
cortadocodes Dec 18, 2020
89fae9d
FIX: Fix exact TagGroup filter
cortadocodes Dec 18, 2020
e0cbd26
CLN: Reorder methods and tests
cortadocodes Dec 18, 2020
f9f2da4
TST: Test new TagGroup magic methods
cortadocodes Dec 21, 2020
ca9a0d2
IMP: Use set to store tags; swap __contains__ for has_tag
cortadocodes Dec 21, 2020
e7cd4bd
FIX: Use set logic instead of list logic
cortadocodes Dec 21, 2020
3a531f9
TST: Update test
cortadocodes Dec 21, 2020
0d8ace3
REF: Make TagGroup.__iter__ a generator
cortadocodes Dec 21, 2020
24e67bb
REF: Use set not list
cortadocodes Dec 21, 2020
fdd1e2d
CLN: Fix typo; combine lines
cortadocodes Dec 21, 2020
0b84748
CLN: Reduce test code repetition
cortadocodes Dec 21, 2020
ddf3b9f
IMP: Add hash of input and config strands to Analysis
cortadocodes Dec 21, 2020
cc602f3
TST: Test hashing datafile and dataset
cortadocodes Dec 21, 2020
facc90c
TST: Test hashes and combined hashes are lenght 64
cortadocodes Dec 22, 2020
73087e4
IMP: Use BLAKE2b hash in Manifest
cortadocodes Dec 22, 2020
cd9af87
REF: Move test helper method to BaseTestCase
cortadocodes Dec 22, 2020
d647912
TST: Test hashing manifests
cortadocodes Dec 22, 2020
fe5c67c
IMP: Replace last usage of SHA254 with BLAKE2b
cortadocodes Dec 22, 2020
31d7b1d
TST: Update test
cortadocodes Dec 22, 2020
43e5ebc
TST: Test Analysis hash attributes are None if no strands
cortadocodes Dec 22, 2020
78294b1
FIX: Fix Analysis input value hashing
cortadocodes Dec 22, 2020
2dd1c42
TST: Test that hashes are stored on Analysis
cortadocodes Dec 22, 2020
aba4726
CLN: Avoid overriding hash builtin
cortadocodes Dec 22, 2020
cf4436b
TST: Test hashing JSON objects
cortadocodes Dec 22, 2020
c3e200a
DOC: Fix typo
cortadocodes Dec 22, 2020
b9d1f98
IMP: Cache hashes to avoid recomputation
cortadocodes Dec 22, 2020
9a768aa
FIX: Make caching hash compatible with python3.6
cortadocodes Dec 22, 2020
81d8d0e
FIX: Ensure hash of Datafiles and Datasets is order-independent
cortadocodes Dec 22, 2020
c350538
IMP: Include important Datafile attributes in hash
cortadocodes Dec 22, 2020
ac5283f
IMP: Add tags to hash of Dataset
cortadocodes Dec 22, 2020
ac01945
IMP: Include keys in Manifest hash
cortadocodes Dec 22, 2020
8d33fab
IMP: Use BLAKE3 instead of BLAKE2
cortadocodes Dec 22, 2020
b16184c
IMP: Create Hashable mixin and use for DataSet
cortadocodes Dec 22, 2020
5d4425a
FIX: Handle other basic datatypes in Hashable
cortadocodes Dec 22, 2020
09b99a8
REF: Use Hashable in Datafile
cortadocodes Dec 22, 2020
f839074
REF: Use Hashable in Manifest
cortadocodes Dec 22, 2020
f02f5fb
REF: Simplify Hashable
cortadocodes Dec 22, 2020
3295d71
FIX: Handle hashing classes with no attributes to hash
cortadocodes Dec 22, 2020
d0cfcd5
CHO: Change size of LRU cache on hashing to 1
cortadocodes Dec 22, 2020
4116e91
REF: Replace hash_json with Hashable classmethod
cortadocodes Dec 22, 2020
5e00485
FIX: Hash attributes in alphabetical order
cortadocodes Dec 22, 2020
e9c9493
REF: Rename methods
cortadocodes Dec 22, 2020
0d6fe88
CLN: Use consistent variable naming
cortadocodes Dec 22, 2020
cd4f12e
TST: Test Hashable
cortadocodes Dec 22, 2020
25258d6
TST: Fix and simplify tests
cortadocodes Dec 22, 2020
fd618ed
DOC: Document tests
cortadocodes Dec 22, 2020
02459b5
TST: Test hashes are the same for the copies of an object
cortadocodes Dec 22, 2020
8241482
DOC: Document hash attributes on Analysis
cortadocodes Dec 22, 2020
aa06d58
IMP: Add name to Dataset and include it in its hash
cortadocodes Dec 22, 2020
741145f
OPS: Move release version GitHub check into python-ci checks
cortadocodes Dec 22, 2020
65ccea9
Merge pull request #54 from octue/devops/move-release-version-check-i…
thclark Dec 23, 2020
2ab8dd5
MRG: Merge remote-tracking branch 'origin/release/0.1.7' into feature…
cortadocodes Dec 23, 2020
d7d0bce
DOC: Replace duplicate item with missing item
cortadocodes Dec 23, 2020
cd7f58e
REF: Allow hash function to be changed without breaking change
cortadocodes Dec 23, 2020
0e55127
REF: Use screaming snake case for class constants
cortadocodes Dec 23, 2020
cfd9d7b
CLN: Make argument and attribute name consistent
cortadocodes Dec 23, 2020
dd32055
DOC: Add more documentation to Hashable
cortadocodes Dec 23, 2020
b396cc5
Merge pull request #52 from octue/feature/hash-input-data
thclark Dec 23, 2020
d2859ef
MRG: Merge remote-tracking branch 'origin/release/0.1.7' into feature…
cortadocodes Dec 23, 2020
0b8e3fb
IMP: Serialise TagGroups to order list string
cortadocodes Dec 23, 2020
2d8de13
REF: Use snakecase for startswith and endswith filters
cortadocodes Dec 23, 2020
b7a7cd4
WIP: Add outline of Filterable mixin
cortadocodes Dec 23, 2020
cb8aa5f
IMP: Raise error if Filterable subclass doesn't specify attributes to…
cortadocodes Dec 23, 2020
eb7674e
TST: Make _get_nested_attribute method test independent of Filterable
cortadocodes Dec 23, 2020
d89b753
IMP: Build filters based on attributes to filter by
cortadocodes Dec 23, 2020
4cb8b4b
TST: Ensure tests don't leak
cortadocodes Dec 23, 2020
1f11428
REF: Factor out filter building into method
cortadocodes Dec 23, 2020
3a97221
IMP: Add filtering of attributes to Filterable
cortadocodes Dec 23, 2020
cc10777
FIX: Make attribute name and filter name differ
cortadocodes Dec 23, 2020
7fd6c86
REF: Use Filterable in TagGroup
cortadocodes Dec 23, 2020
b521fc7
TST: Update filter names in TagGroup tests
cortadocodes Dec 23, 2020
4d798e0
IMP: Allow conversion of FilteredSet to desired object
cortadocodes Dec 23, 2020
aa3e2cd
IMP: Allow Filterables to filter indefinitely
cortadocodes Dec 23, 2020
1e9e536
REF: Return Filterable inheriting class instance when filtering
cortadocodes Dec 28, 2020
27c8199
IMP: Combine base filters with ones provided
cortadocodes Dec 28, 2020
72db59c
REF: Move definition of TagGroup filters into a method
cortadocodes Dec 28, 2020
65f0e6b
FIX: Include other attributes when instantiating after filtering
cortadocodes Dec 28, 2020
21d61ef
CHO: Add tox.ini back to repo
cortadocodes Dec 28, 2020
0dd55a9
REF: Use Filterable in Dataset
cortadocodes Dec 28, 2020
8400eac
FIX: Add __len__ method to Dataset
cortadocodes Dec 28, 2020
e4bb808
FIX: Update get_files_by_tag; use more specific error
cortadocodes Dec 28, 2020
2e43388
FIX: Ensure Datafile is returned from Dataset.get_file_by_tag
cortadocodes Dec 28, 2020
cf1fd53
CHO: Remove accidentally added file
cortadocodes Dec 28, 2020
d36ba05
FIX: Ensure Pathable constructs path prefix properly
cortadocodes Dec 28, 2020
e6c77e3
IMP: Return None for disk properties of an unsaved Datafile
cortadocodes Dec 28, 2020
bbdab43
REF: Rename attribute
cortadocodes Dec 28, 2020
905acf8
TST: Test filtering by multiple attributes
cortadocodes Dec 28, 2020
8f81ae6
FIX: Fix exact base filter
cortadocodes Dec 28, 2020
76bfbb5
TST: Rename test local variable
cortadocodes Dec 28, 2020
b9105dc
DOC: Document Filterable methods
cortadocodes Dec 28, 2020
055dbbe
IMP: Add outline of Filteree mixin
cortadocodes Dec 29, 2020
70c3024
REF: Make Datafile a Filteree
cortadocodes Dec 29, 2020
832c35b
WIP: Add FilterSet and use for files in Datafile
cortadocodes Dec 29, 2020
0f1f62d
REF: Rename test module
cortadocodes Dec 29, 2020
c148f31
IMP: Add filter method to FilterSet
cortadocodes Dec 29, 2020
7be9440
IMP: Generalise from sets to iterables in Filteree
cortadocodes Dec 29, 2020
47e251c
IMP: Support filtering files by tags
cortadocodes Dec 29, 2020
d31c9a9
IMP: Handle incorrect filter syntax
cortadocodes Dec 29, 2020
21b4ddd
IMP: Add filtering for numbers; add (not-)None filters for non-None a…
cortadocodes Dec 29, 2020
58e8a5b
FIX: Filter on files, not dataset directly
cortadocodes Dec 29, 2020
256d877
IMP: Make FilterSets comparable
cortadocodes Dec 29, 2020
1cc5b65
TST: Update Filteree tests
cortadocodes Dec 29, 2020
a54fafd
FIX: Update template filter syntax
cortadocodes Dec 29, 2020
66d8550
REF: Make FilterSet inherit from set
cortadocodes Dec 29, 2020
e5ba95d
FIX: Fix representation of FilterSet
cortadocodes Dec 29, 2020
dd8538e
REF: Rename "field_lookup" to "filter_name" in Dataset methods
cortadocodes Dec 29, 2020
8a5c662
IMP: Add FilterList class
cortadocodes Dec 29, 2020
6b1c7ee
REF: Use FilterSet in TagGroup; create Tag class
cortadocodes Dec 29, 2020
457583a
WIP: Update and standardise Tag and TagGroup methods
cortadocodes Dec 29, 2020
3ad5d72
IMP: Allow Tags to be compared alphabetically
cortadocodes Dec 30, 2020
95f16c9
TST: Update taggable tests
cortadocodes Dec 30, 2020
2d445df
REF: Simplify definition of FilterSet and FilterList
cortadocodes Dec 30, 2020
3457933
IMP: Allow str <> Tag comparison
cortadocodes Dec 30, 2020
22306a7
TST: Fix broken import
cortadocodes Dec 30, 2020
37714e6
CLN: Remove unneeded tags arguments
cortadocodes Dec 30, 2020
156738f
REF: Clarify method name
cortadocodes Dec 30, 2020
087c000
REV: Remove Filterable class
cortadocodes Dec 30, 2020
7e3cba7
REF: Rename filterset module
cortadocodes Dec 30, 2020
133aa47
REF: Move Tag and TagGroup into resources package
cortadocodes Dec 30, 2020
0af151c
REF: Rename Dataset.append to Dataset.add
cortadocodes Dec 30, 2020
4705f51
IMP: Improve error handling in Filteree
cortadocodes Dec 30, 2020
75ae1a1
TST: Test Filteree error handling
cortadocodes Dec 30, 2020
e1eddb3
REF: Rename Filteree.check_attribute method
cortadocodes Dec 30, 2020
ed3f84a
REF: Rename FilterSet main method; add documentation
cortadocodes Dec 30, 2020
a6746d7
REF: Rename None filter
cortadocodes Dec 30, 2020
67b80fd
TST: Deepen Filtree tests and reduce code
cortadocodes Dec 30, 2020
da6e6ba
TST: Ensure each filterable type has None filters available
cortadocodes Dec 30, 2020
0ef1986
REF: Combine None filter dictionaries
cortadocodes Dec 30, 2020
47597f9
REF: Combine is and None filters
cortadocodes Dec 30, 2020
facdd65
CLN: Use less-than and greater-than filter names consistent with python
cortadocodes Dec 30, 2020
ff2619b
TST: Test filters
cortadocodes Dec 30, 2020
fec5ed5
IMP: Enable filtering of TagGroups
cortadocodes Dec 30, 2020
3c7ddc3
FIX: Fix construction of a TagGroup from an iterable
cortadocodes Dec 30, 2020
7f8cc2a
TST: Test TagGroup instantiation scenarios
cortadocodes Dec 30, 2020
9a48578
REV: Remove unused/duplicate method from Tag
cortadocodes Dec 30, 2020
59928ea
REF: Factor out contains filter from string filters
cortadocodes Dec 30, 2020
83387e0
REF: Return a Tag's subtags as a TagGroup
cortadocodes Dec 30, 2020
611867e
TST: Test Tag
cortadocodes Dec 30, 2020
87a30f6
REV: Remove TagGroup._yield_subtags method
cortadocodes Dec 30, 2020
c3afa6e
REF: Remove unnecessary subtag argument from TagGroup methods
cortadocodes Dec 30, 2020
ae257f0
REF: Factor out equals filter action
cortadocodes Dec 30, 2020
a30df5a
DOC: Update Tag docstrings
cortadocodes Dec 30, 2020
4ee27ae
REV: Revert existence checks in Datafile properties
cortadocodes Dec 30, 2020
6c554df
REF: Simplify Tag comparison methods
cortadocodes Dec 30, 2020
bc03bfb
REF: Simplify iterating over TagGroup
cortadocodes Dec 30, 2020
b5a6f66
REV: Remove redundant _FILTERABLE_ATTRIBUTES constant from Filteree
cortadocodes Dec 30, 2020
fa54d12
CHO: Remove accidentally committed file
cortadocodes Dec 30, 2020
aeacf35
CHO: Restore accidentally removed file
cortadocodes Dec 30, 2020
176a407
TST: Update tests
cortadocodes Dec 30, 2020
3135d46
REF: Re-add but deprecate Dataset.get_files method
cortadocodes Dec 30, 2020
cd91578
REF: Re-add and deprecate Dataset.append method
cortadocodes Dec 30, 2020
14683d2
REF: Rename Filteree to Filterable
cortadocodes Jan 4, 2021
e70f650
DOC: Fix typo in method name
cortadocodes Jan 4, 2021
54c6675
TST: Test comparing Tags with strings
cortadocodes Jan 4, 2021
9fb75eb
FIX: Raise standard error if using < or > between Tags and non-str types
cortadocodes Jan 4, 2021
d5a9399
TST: Test uncovered code in Dataset and Tag
cortadocodes Jan 4, 2021
0f83bef
REF: Rename Filterable test module
cortadocodes Jan 4, 2021
d80110c
MRG: Merge pull request #50 from octue/feature/search-for-subtags
cortadocodes Jan 4, 2021
0dd78cd
IMP: Add order_by method to FilterSet and FilterList
cortadocodes Jan 4, 2021
9bcb629
IMP: Provide better error message for non-existent attribute when fil…
cortadocodes Jan 4, 2021
ac1098d
DOC: Clarify that a new FilterList is returned by order_by
cortadocodes Jan 4, 2021
f3e45f8
TST: Test ordering by list attributes
cortadocodes Jan 4, 2021
e14d0e4
IMP: Add ability to reverse ordering direction
cortadocodes Jan 4, 2021
fb68d06
TST: Change test order
cortadocodes Jan 4, 2021
c12ab59
TST: Change test name; simplify test object
cortadocodes Jan 4, 2021
a3f9e30
REF: Rename TagGroup to TagSet
cortadocodes Jan 4, 2021
6f5046b
MRG: Merge pull request #62 from octue/refactor/rename-tag-group-to-t…
cortadocodes Jan 4, 2021
d440b25
IMP: Add more string comparison filters
cortadocodes Jan 4, 2021
0ef5fff
MRG: Merge pull request #63 from octue/feature/extend-str-filters
cortadocodes Jan 4, 2021
69a327e
IMP: Add representation of Datafile
cortadocodes Jan 4, 2021
7e768ec
DOC: Add documentation on Dataset
cortadocodes Jan 4, 2021
caa47d6
DOC: Document filter and order_by filter container methods
cortadocodes Jan 4, 2021
59a5da2
DOC: Add filter containers to documentation TOC
cortadocodes Jan 4, 2021
1ee0cbf
DOC: Fix RST inline code syntax
cortadocodes Jan 4, 2021
04137eb
DOC: Fix filter containers documentation link
cortadocodes Jan 4, 2021
809aafc
DOC: Add brief documentation on Datafile
cortadocodes Jan 4, 2021
d34b478
REF: Clarify TagSet interface; add extra filter
cortadocodes Jan 4, 2021
2d4dcb3
TST: Test new TagSet filter
cortadocodes Jan 4, 2021
97eb7fd
TST: Test TagSet contains method with Tag
cortadocodes Jan 4, 2021
5c71467
IMP: Add not version of all filters where appropriate
cortadocodes Jan 4, 2021
527f4e8
MRG: Merge pull request #66 from octue/feature/add-not-version-of-fil…
cortadocodes Jan 5, 2021
744c6b7
DOC: Add new filters to documentation
cortadocodes Jan 5, 2021
173e30b
MRG: Merge remote-tracking branch 'origin/release/0.1.7' into refacto…
cortadocodes Jan 5, 2021
6128ef8
MRG: Merge pull request #65 from octue/refactor/clarify-tag-set-inter…
cortadocodes Jan 5, 2021
100eab7
MRG: Merge remote-tracking branch 'origin/release/0.1.7' into doc/doc…
cortadocodes Jan 5, 2021
0a98379
DOC: Update TagSet filter names
cortadocodes Jan 5, 2021
6545ff1
REF: Use AttributError in Filterable
cortadocodes Jan 5, 2021
9a036ed
MRG: Merge pull request #61 from octue/feature/add-extra-methods-to-f…
cortadocodes Jan 5, 2021
e662e3e
DOC: Use python code-blocks in docs; add to Datafile docs
cortadocodes Jan 5, 2021
9697dc3
DOC: Add FilterSet use cases to docs
cortadocodes Jan 5, 2021
84b3ec7
DOC: Bring plurals inside preformatted RST text
cortadocodes Jan 5, 2021
a6c5e1f
MRG: Merge pull request #64 from octue/doc/document-filtering-of-data…
cortadocodes Jan 5, 2021
c2f6ff7
OPS: Increase setup.py patch version number
cortadocodes Jan 5, 2021
9 changes: 0 additions & 9 deletions .github/workflows/check-version-consistency.yml

This file was deleted.

8 changes: 8 additions & 0 deletions .github/workflows/python-ci.yml
@@ -9,6 +9,14 @@ name: python-ci
on: [push]

jobs:

  check-version-consistency:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - run: python .github/workflows/scripts/check-version-consistency.py

  tests:
    runs-on: ubuntu-latest
    env:
44 changes: 44 additions & 0 deletions docs/source/analysis_objects.rst
@@ -0,0 +1,44 @@
.. _analysis_objects:

================
Analysis objects
================

An ``Analysis`` object is the sole argument to the ``app`` function in your ``app.py`` module. It has an attribute for
every strand that can possibly be added to a ``Twine``; only the strands specified in your ``twine.py`` file are
populated, and the rest are ``None``. The attributes are:

- ``input_values``
- ``input_manifest``
- ``configuration_values``
- ``configuration_manifest``
- ``output_values``
- ``output_manifest``
- ``credentials``
- ``children``
- ``monitors``

Additionally, all input and configuration attributes are hashed using a
`BLAKE3 hash <https://github.com/BLAKE3-team/BLAKE3>`_ so the inputs and configuration that produced a given output in
your app can always be verified. These hashes exist on the following attributes:

- ``input_values_hash``
- ``input_manifest_hash``
- ``configuration_values_hash``
- ``configuration_manifest_hash``

If an input or configuration attribute is ``None``, its hash attribute is also ``None``. For ``Manifests``, some
metadata about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when
calculating the hash:

- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:

- ``name``
- ``cluster``
- ``sequence``
- ``posix_timestamp``
- ``tags``

- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.

- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
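
The nesting described above (file metadata feeding a dataset hash, which in turn feeds a manifest hash) can be sketched roughly as follows. This is an illustration only, not the SDK's implementation: it uses ``hashlib.blake2b`` from the standard library as a stand-in for BLAKE3, and the function names ``hash_attributes`` and ``hash_collection`` are hypothetical.

```python
import hashlib


def hash_attributes(attributes):
    """Hash a dict of attributes, visiting keys alphabetically so insertion order is irrelevant."""
    hasher = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3 (assumption)
    for name in sorted(attributes):
        hasher.update(f"{name}={attributes[name]!r};".encode())
    return hasher.hexdigest()


def hash_collection(member_hashes, **metadata):
    """Combine member hashes (sorted, so order-independent) with the container's own metadata."""
    hasher = hashlib.blake2b(digest_size=32)
    for member_hash in sorted(member_hashes):
        hasher.update(member_hash.encode())
    hasher.update(hash_attributes(metadata).encode())
    return hasher.hexdigest()


# Hypothetical metadata for two files (file content hashing is omitted for brevity).
file_a = hash_attributes({"name": "a.csv", "cluster": 0, "sequence": None, "tags": "one"})
file_b = hash_attributes({"name": "b.csv", "cluster": 0, "sequence": None, "tags": "two"})

dataset_hash = hash_collection([file_a, file_b], tags="all")
```

Sorting both the attribute names and the member hashes before feeding them to the hasher is what makes the result independent of the order in which files were added.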
14 changes: 14 additions & 0 deletions docs/source/datafile.rst
@@ -0,0 +1,14 @@
.. _datafile:

========
Datafile
========

A ``Datafile`` is an Octue type that corresponds to a file, which may exist on your computer or in a cloud store. It has
the following main attributes:

- ``path`` - the path of this file, which may include folders or subfolders, within the dataset.
- ``cluster`` - the integer cluster of files, within a dataset, to which this belongs (default 0)
- ``sequence`` - a sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``posix_timestamp`` - a POSIX timestamp associated with the file, in seconds since the epoch; typically its creation time, but it can instead mark a time point relevant to the data
44 changes: 44 additions & 0 deletions docs/source/dataset.rst
@@ -0,0 +1,44 @@
.. _dataset:

=======
Dataset
=======

A ``Dataset`` contains any number of ``Datafiles`` along with the following metadata:

- ``name``
- ``tags``

The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
`Datafile <datafile.rst>`_ instances it contains.


--------------------------------
Filtering files in a ``Dataset``
--------------------------------

You can filter a ``Dataset``'s files as follows:

.. code-block:: python

    dataset = Dataset(
        files=[
            Datafile(path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
            Datafile(path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
            Datafile(path="path-within-dataset/another_file.csv", tags="three all"),
        ]
    )

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

    dataset.files.filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>

You can also chain filters indefinitely:

.. code-block:: python

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>})>

Find out more about ``FilterSets`` `here <filter_containers.rst>`_, including the filters available for each type of
attribute on a ``FilterSet`` member, and how to convert a ``FilterSet`` to a primitive type such as ``set`` or ``list``.
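
Chaining works because each call to ``filter`` returns a new container of the same kind. A minimal sketch of that idea is given below; it is not the SDK's code. ``File`` is a hypothetical stand-in for ``Datafile``, tags are simplified to a tuple of strings, and only two filter actions are implemented.

```python
from collections import namedtuple

# Hypothetical, hashable stand-in for Datafile; tags simplified to a tuple of strings.
File = namedtuple("File", ["name", "tags"])

ACTIONS = {
    "ends_with": lambda value, target: value.endswith(target),
    "contains": lambda value, target: target in value,
}


class FilterSet(set):
    def filter(self, filter_name, filter_value):
        """Return a new FilterSet of the members whose attribute passes the named filter."""
        attribute_name, action = filter_name.rsplit("__", 1)
        check = ACTIONS[action]
        return FilterSet(item for item in self if check(getattr(item, attribute_name), filter_value))


files = FilterSet(
    {
        File("my_file.csv", ("one", "a:2", "b:3", "all")),
        File("your_file.txt", ("two", "a:2", "b:3", "all")),
        File("another_file.csv", ("three", "all")),
    }
)

# Each call returns a FilterSet, so calls can be chained.
csv_with_a2 = files.filter("name__ends_with", ".csv").filter("tags__contains", "a:2")
```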
127 changes: 127 additions & 0 deletions docs/source/filter_containers.rst
@@ -0,0 +1,127 @@
.. _filter_containers:

=================
Filter containers
=================

A filter container is a regular Python container with extra methods for filtering or ordering its elements. It has the
same interface (i.e. attributes and methods) as the built-in Python type it inherits from, plus these extra methods:

- ``filter``
- ``order_by``

There are two types of filter containers currently implemented:

- ``FilterSet``
- ``FilterList``

``FilterSets`` are currently used in:

- ``Dataset.files`` to store ``Datafiles``
- ``TagSet.tags`` to store ``Tags``

You can see filtering in action on the files of a ``Dataset`` `here <dataset.rst>`_.


---------
Filtering
---------

Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
``FilterSet`` whose type or interface is supported can be filtered:

.. code-block:: python

    filter_set = FilterSet(
        {Datafile(path="my_file.csv"), Datafile(path="your_file.txt"), Datafile(path="another_file.csv")}
    )

    filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

The following filters are implemented for the following types:

- ``bool``:

* ``is``
* ``is_not``

- ``str``:

* ``is``
* ``is_not``
* ``equals``
* ``not_equals``
* ``iequals``
* ``not_iequals``
* ``lt`` (less than)
* ``lte`` (less than or equal)
* ``gt`` (greater than)
* ``gte`` (greater than or equal)
* ``contains``
* ``not_contains``
* ``icontains`` (case-insensitive contains)
* ``not_icontains``
* ``starts_with``
* ``not_starts_with``
* ``ends_with``
* ``not_ends_with``

- ``NoneType``:

* ``is``
* ``is_not``

- ``TagSet``:

* ``is``
* ``is_not``
* ``equals``
* ``not_equals``
* ``any_tag_contains``
* ``not_any_tag_contains``
* ``any_tag_starts_with``
* ``not_any_tag_starts_with``
* ``any_tag_ends_with``
* ``not_any_tag_ends_with``



Additionally, these filters are defined for the following *interfaces* (duck-types):

- Numbers:

* ``is``
* ``is_not``
* ``equals``
* ``not_equals``
* ``lt``
* ``lte``
* ``gt``
* ``gte``

- Iterables:

* ``is``
* ``is_not``
* ``equals``
* ``not_equals``
* ``contains``
* ``not_contains``
* ``icontains``
* ``not_icontains``

The interface filters are only used if the type of the attribute of the element being filtered is not found in the first
list of filters.
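
The lookup order described above (exact type first, interface only as a fallback) might look something like the sketch below. It is an assumption-laden illustration rather than the SDK's implementation: the names ``TYPE_FILTERS``, ``NUMBER_FILTERS``, ``get_filter`` and ``satisfies`` are hypothetical, and only a few filter actions are included.

```python
import numbers
from types import SimpleNamespace

# Filters keyed by exact type (hypothetical subset of the lists above).
TYPE_FILTERS = {
    str: {
        "equals": lambda value, target: value == target,
        "ends_with": lambda value, target: value.endswith(target),
        "icontains": lambda value, target: target.lower() in value.lower(),
    },
    bool: {"is": lambda value, target: value is target},
}

# Filters for the number interface, consulted only when the exact type is not listed above.
NUMBER_FILTERS = {
    "equals": lambda value, target: value == target,
    "gt": lambda value, target: value > target,
}


def get_filter(attribute_value, filter_action):
    """Look up filters by exact type first, then fall back to the number interface."""
    filters = TYPE_FILTERS.get(type(attribute_value))
    if filters is None and isinstance(attribute_value, numbers.Number):
        filters = NUMBER_FILTERS
    if filters is None or filter_action not in filters:
        raise ValueError(f"No filter {filter_action!r} for type {type(attribute_value).__name__}")
    return filters[filter_action]


def satisfies(instance, filter_name, filter_value):
    """Split '<attribute>__<action>' and apply the matching filter to the attribute's value."""
    attribute_name, filter_action = filter_name.rsplit("__", 1)
    value = getattr(instance, attribute_name)
    return get_filter(value, filter_action)(value, filter_value)


csv_match = satisfies(SimpleNamespace(name="my_file.csv"), "name__ends_with", ".csv")
big_enough = satisfies(SimpleNamespace(size=5), "size__gt", 3)
```

In this sketch a ``bool`` is also a number in Python's type hierarchy, but because ``bool`` has its own entry in the type table, number filters such as ``gt`` are never applied to it.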

--------
Ordering
--------

As sets are inherently unordered, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
methods and behaviour as a ``FilterSet`` but is based on the ``list`` type instead, meaning it can be ordered and
indexed. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:

.. code-block:: python

    filter_set.order_by("name")
    >>> <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile('your_file.txt')>])>

The ordering can also be reversed (i.e. descending order) by passing ``reverse=True`` as a second argument to the
``order_by`` method.
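
A rough sketch of how ordering a set-based container can hand back a list-based one is shown below. The class bodies are assumptions made for illustration (the SDK's actual classes carry more behaviour), and ``File`` is a hypothetical stand-in for ``Datafile``.

```python
from collections import namedtuple

File = namedtuple("File", ["name"])  # hypothetical, hashable stand-in for Datafile


class FilterList(list):
    def order_by(self, attribute_name, reverse=False):
        """Return a new FilterList sorted by the given attribute of its members."""
        return FilterList(sorted(self, key=lambda item: getattr(item, attribute_name), reverse=reverse))


class FilterSet(set):
    def order_by(self, attribute_name, reverse=False):
        """Sets have no order, so ordering produces a FilterList instead."""
        return FilterList(self).order_by(attribute_name, reverse=reverse)


filter_set = FilterSet({File("my_file.csv"), File("your_file.txt"), File("another_file.csv")})

ascending = filter_set.order_by("name")
descending = filter_set.order_by("name", reverse=True)
```

Returning a new ``FilterList`` rather than sorting in place leaves the original container untouched, which matches the behaviour the text describes.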
4 changes: 4 additions & 0 deletions docs/source/index.rst
@@ -13,6 +13,10 @@ Not all of Octue's API functionality is implemented in the SDK yet, we're active
:hidden:

installation
datafile
dataset
filter_containers
analysis_objects
license
version_history
bibliography
4 changes: 3 additions & 1 deletion octue/mixins/__init__.py
@@ -1,9 +1,11 @@
 from .base import MixinBase
+from .filterable import Filterable
+from .hashable import Hashable
 from .identifiable import Identifiable
 from .loggable import Loggable
 from .pathable import Pathable
 from .serialisable import Serialisable
 from .taggable import Taggable
 
 
-__all__ = "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable"
+__all__ = ("Filterable", "Hashable", "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable")