Merged

58 commits
79762b7
Fix code spacing format issues
nfahlgren Apr 3, 2022
f6662fd
Add rotation term and rename barcode
nfahlgren Apr 15, 2022
91d2636
Change default timestamp format to ISO8601 UTC
nfahlgren Apr 15, 2022
bea622d
Replace metadata_parser
nfahlgren Apr 15, 2022
7a71e5b
Remove outdated functions
nfahlgren Apr 15, 2022
9695302
Use correct class attribute
nfahlgren Apr 16, 2022
f259fa7
Add phenodata test data
nfahlgren Apr 16, 2022
f54db50
Update metadata_parser tests
nfahlgren Apr 16, 2022
7b97db4
Add method to group metadata
nfahlgren Apr 19, 2022
0d2a55f
Update job_builder to handle grouped metadata
nfahlgren Apr 19, 2022
a2640c4
Prettify JSON output
nfahlgren Apr 20, 2022
f4c855c
Convert datetime to string before serialization
nfahlgren Apr 20, 2022
6b1c2c3
Pretty print JSON output
nfahlgren Apr 22, 2022
283150a
Update test data to match new functions
nfahlgren Apr 23, 2022
be943aa
Add new workflow argument module
nfahlgren Apr 25, 2022
10169f9
Update tests with dataframe-based metadata
nfahlgren Apr 25, 2022
fb53dd9
Add tests module for WorkflowInputs
nfahlgren Apr 26, 2022
4793a32
Lowercase all image names
nfahlgren Apr 26, 2022
ddeeb4b
Add missing names generator to job_builder
nfahlgren Apr 26, 2022
fa88a08
Check-in updated json2csv
nfahlgren Apr 26, 2022
054d61a
Simplify json2csv
nfahlgren Apr 26, 2022
c4e4509
Merge branch '4.x' into revise-parallel-parsers
nfahlgren Apr 26, 2022
a3a81d4
Fix test assertion
nfahlgren Apr 27, 2022
958414c
Update package install method
nfahlgren Apr 27, 2022
4af58bd
Add test for workflow_inputs
nfahlgren Apr 27, 2022
b78fbfc
Remove redundant import
nfahlgren Apr 27, 2022
84b3419
Remove redundant list comprehension
nfahlgren Apr 27, 2022
6c03469
Add auto naming method to job_builder
nfahlgren Apr 27, 2022
436671c
Simplify if expression
nfahlgren Apr 27, 2022
910f823
One-line docstring should be on one line
nfahlgren Apr 27, 2022
1655f38
Add line after class docstring
nfahlgren Apr 27, 2022
0ca7caa
Deprecate command-line options
nfahlgren Apr 27, 2022
29a4c20
Update csv keyword description
nfahlgren Apr 27, 2022
5ac24ff
Update docs for json2csv converter
nfahlgren Apr 27, 2022
b316fb7
Update WorkflowConfig docs
nfahlgren Apr 27, 2022
10af283
Update metadata_parser docs
nfahlgren Apr 27, 2022
734ca1d
Update input description to DataFrame
nfahlgren Apr 27, 2022
d917258
Update process_results docs
nfahlgren Apr 27, 2022
36b3ba1
Update Jupyter docs
nfahlgren Apr 27, 2022
89e2f94
Update the parallelization docs
nfahlgren Apr 27, 2022
982d287
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl May 23, 2022
9e9e51c
Merge branch '4.x' into revise-parallel-parsers
nfahlgren Jun 10, 2022
ffefef6
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jun 14, 2022
6b850c2
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jun 17, 2022
44e96bd
remove "python" from parallelization command line
HaleySchuhl Jun 27, 2022
6f83fc2
Limit numpy version
nfahlgren Jun 27, 2022
670f4c0
Add additional developer tools to conda env
nfahlgren Jun 28, 2022
d978c49
Change other_args from list to dict
nfahlgren Jun 28, 2022
67f3440
Apply user-defined keyword arguments to workflows
nfahlgren Jun 28, 2022
fab651e
Add args and kwargs to workflow input class and function
nfahlgren Jun 29, 2022
d83f21a
Add docs page for workflow inputs
nfahlgren Jun 29, 2022
9eed00e
Reference new workflow inputs doc page
nfahlgren Jun 29, 2022
55f05b1
Update test for coverage
nfahlgren Jun 29, 2022
dc63a76
Convert to regular comment
nfahlgren Jun 29, 2022
f42c93b
Add missing docstrings
nfahlgren Jun 29, 2022
1865763
Catch explicit error
nfahlgren Jun 29, 2022
1f13824
Update doc example with two custom inputs
nfahlgren Jun 30, 2022
6c88106
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jul 12, 2022
2 changes: 1 addition & 1 deletion .github/workflows/continuous-integration.yml
@@ -42,7 +42,7 @@ jobs:
- name: Test and generate coverage report
# Run coverage analysis on pytest tests
run: |
python setup.py install
pip install .
py.test --cov-report=xml --cov=plantcv tests/
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
177 changes: 51 additions & 126 deletions docs/jupyter.md
@@ -21,39 +21,34 @@ can be visualized instantly within the notebook.

![Screenshot](img/documentation_images/jupyter/jupyter_screenshot.jpg)

PlantCV is automatically set up to run in Jupyter Notebook but there
are a couple of considerations. Jupyter must be opened within the PlantCV
environment. For example, launch Jupyter from the command line from within
a PlantCV environment with `jupyter notebook`, launch Jupyter from the
Anaconda Navigator (if installed with conda) from within the PlantCV environment,
etc.
PlantCV is automatically set up to run in Jupyter Notebook but you will need to install Jupyter.
For example, with `conda`:

First, if PlantCV is installed in the global Python search path, you can
import the PlantCV library like normal:

```python
from plantcv import plantcv as pcv
```

```bash
conda install nb_conda jupyterlab
```

On the other hand, if you installed PlantCV into a local Python path,
you will need to configure the Jupyter Python kernel to find it. For
example:

```python
import sys
sys.path.append("/home/user/plantcv")
from plantcv import plantcv as pcv
```
Then you can launch Jupyter from the command line with `jupyter lab` and create a notebook using
a kernel containing your PlantCV environment.

Second, we use [matplotlib](http://matplotlib.org/) to do the
First, we use [matplotlib](http://matplotlib.org/) to do the
in-notebook plotting. To make this work, add the following to the top
of your notebook:

```python
%matplotlib inline
```

Third, PlantCV has a built-in debug mode that is set to `None` by
Second, you can import the PlantCV library like normal:

```python
from plantcv import plantcv as pcv
```

Third (optionally), utilize PlantCV's `WorkflowInputs` class to organize and name workflow
inputs for compatibility with running the workflow later in parallel.

PlantCV has a built-in debug mode that is set to `None` by
default. Setting debug to `"print"` will cause PlantCV to print debug
images to files, which is the original debug method. In Jupyter, setting
debug to `"plot"` will cause PlantCV to plot debug images directly into
@@ -64,148 +59,78 @@ would look like the following example:

```python
%matplotlib inline
import os
import sys
sys.path.append('/home/user/plantcv')
import numpy as np
import cv2
from matplotlib import pyplot as plt
from plantcv import plantcv as pcv
from plantcv.parallel import WorkflowInputs

# Set input variables
args = WorkflowInputs(images=["./input_color_img.jpg"],
names="image",
result="plantcv_results.csv",
debug="plot")

# Set variables
pcv.params.debug = 'plot' # Plot debug images to the notebook
img_file = 'input_color_img.jpg' # Example image
pcv.params.debug = args.debug

```

Not all of these imports are required; this just demonstrates that, in
addition to importing PlantCV, you can import any other useful Python
packages as well.
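Conceptually, `WorkflowInputs` pairs each path in `images` with a name from the comma-separated `names` string and exposes everything as attributes (`args.image`, `args.result`, `args.debug`). A rough pure-Python sketch of that behavior — an illustration of the idea, not PlantCV's actual implementation:

```python
# Minimal sketch of WorkflowInputs-style input handling.
# Hypothetical class for illustration only; use plantcv.parallel.WorkflowInputs in practice.
class WorkflowInputsSketch:
    def __init__(self, images, names, result, debug=None, **kwargs):
        self.result = result
        self.debug = debug
        # Pair each comma-separated name with the corresponding image path
        for name, path in zip(names.split(","), images):
            setattr(self, name, path)
        # Any extra keyword arguments become attributes too
        for key, value in kwargs.items():
            setattr(self, key, value)

args = WorkflowInputsSketch(images=["./input_color_img.jpg"],
                            names="image",
                            result="plantcv_results.csv",
                            debug="plot")
```

With two images and `names="vis,nir"`, the workflow would read them back as `args.vis` and `args.nir`.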

### Converting Jupyter Notebooks to PlantCV workflow scripts

Once a workflow has been developed, it needs to be converted into a pure
Python script if the goal is to use it on many images using the PlantCV
workflow [parallelization](pipeline_parallel.md) tools. To make a
Python script that is compatible with the `plantcv-workflow.py` program,
first use Jupyter to convert the notebook to Python. This can be done
through the web interface, or on the command line:
through the web interface (File > Save and Export Notebook As... > Executable Script),
or on the command line:

```bash
jupyter nbconvert --to python notebook.ipynb
```

The resulting Python script will be named `notebook.py` in the example
above. Next, open the Python script with a text editor. Several
modifications to the script are needed. Modify the list of imported
packages as needed, but in particular, remove
`get_ipython().magic('matplotlib inline')` and add `import argparse`.
If PlantCV is importable in your normal shell environment, you can
remove `sys.path.append('/home/user/plantcv')` also.
`get_ipython().magic('matplotlib inline')`. Change `from plantcv.parallel import WorkflowInputs`
to `from plantcv.parallel import workflow_inputs`.

All of the remaining script (other than the imports) needs to be added
to a function called `main`. To do this, add a main function and indent
the remaining code within main, for example:
Change the code for managing inputs, for example:

```python
def main():

# all the code from Jupyter

if __name__ == '__main__':
main()

args = WorkflowInputs(images=["./input_color_img.jpg"],
names="image",
result="plantcv_results.csv",
debug="plot")
```

Add a function for parsing command line options using [argparse](https://docs.python.org/2.7/library/argparse.html).
The `plantcv-workflow.py` script requires a few command-line arguments for
workflow scripts to work properly. If the script analyzes a single image
the options minimally should look like the following:
To:

```python
def options():
parser = argparse.ArgumentParser(description="Imaging processing with PlantCV.")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required= True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

```

If the script analyzes two images using co-processing, the options
should minimally look like the following:

```python
def options():
parser = argparse.ArgumentParser(description="Imaging processing with opencv")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required=True )
parser.add_argument("-r2","--coresult", help="Result file for co-processed image.", required=True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

```

Within the `main` function, call the `options` function to get the
values of the command-line options. Swap any hard-coded values with
the argument values instead:

```python
def main():
# Get options
args = options()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag
img_file = args.image # Replace the hard-coded input image with image flag

args = workflow_inputs()
```

Make any other alterations as necessary after testing. Based on the
simple Jupyter Notebook example above, the fully modified version would
look like the following:

```python
import os
import sys
import numpy as np
import cv2
from matplotlib import pyplot as plt
from plantcv import plantcv as pcv
from plantcv.parallel import workflow_inputs

def options():
parser = argparse.ArgumentParser(description="Imaging processing with PlantCV.")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required= True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

def main():
# Get options
args = options()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag
img_file = args.image # Replace the hard-coded input image with image flag

# Put workflow
# steps from
# Jupyter here

# Print data that gets collected into the Outputs
pcv.outputs.save_results(filename=args.result, outformat="json")
# Get command-line options
args = workflow_inputs()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag

img, imgpath, imgname = pcv.readimage(filename=args.image)

# Put workflow
# steps from
# Jupyter here

if __name__ == '__main__':
main()
# Print data that gets collected into the Outputs
pcv.outputs.save_results(filename=args.result, outformat="json")

```

28 changes: 21 additions & 7 deletions docs/parallel_config.md
@@ -3,6 +3,14 @@
`WorkflowConfig` is a class that stores parallel workflow configuration parameters. Configurations can be saved/imported
to run workflows in parallel.

### Quick start

Create a configuration file from a template:

```bash
plantcv-workflow.py --template my_config.txt
```
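Saved configurations are JSON, so they can be edited by hand and reloaded. A sketch of that round trip using only the standard library — the keys shown are the illustrative subset documented on this page, and a real template generated with `--template` contains additional fields:

```python
import json
import os
import tempfile

# Illustrative subset of WorkflowConfig settings (see the attribute list below)
config = {
    "start_date": None,
    "end_date": None,
    "timestampformat": "%Y-%m-%dT%H:%M:%S.%fZ",
    "metadata_filters": {"imgtype": "VIS"},
    "groupby": ["filepath"],
    "group_name": "imgtype",
    "cleanup": True,
}

path = os.path.join(tempfile.mkdtemp(), "my_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

# After hand-editing, the settings load back as a plain dictionary
with open(path) as f:
    loaded = json.load(f)
```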

*class* **plantcv.parallel.WorkflowConfig**

### Class methods
@@ -61,12 +69,12 @@ Validate parameters/structure of configuration data.


* **start_date**: (str, default = `None`): start date used to filter images. Images will be analyzed that are newer
than the start date. In the case of `None` all images prior to `end_date` are processed. String format should match
`timestampformat`.
than or equal to the start date. In the case of `None` all images prior to `end_date` are processed. String format
should match `timestampformat`.


* **end_date**: (str, default = `None`): end date used to filter images. Images will be analyzed that are older than
the end date. In the case of `None` all images after `start_date` are processed. String format should match
or equal to the end date. In the case of `None` all images after `start_date` are processed. String format should match
`timestampformat`.


@@ -82,7 +90,7 @@ Validate parameters/structure of configuration data.
`{"imgtype": "VIS", "frame": ["0", "90"]}`).


* **timestampformat**: (str, default = '%Y-%m-%d %H:%M:%S.%f'): a date format code compatible with strptime C library.
* **timestampformat**: (str, default = '%Y-%m-%dT%H:%M:%S.%fZ'): a date format code compatible with strptime C library.
See [strptime docs](https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior) for supported
codes.
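The default format corresponds to ISO 8601 UTC timestamps such as `2022-04-15T10:30:00.000Z`. A quick check with the standard library:

```python
from datetime import datetime, timezone

# Default timestampformat: ISO 8601 UTC
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
ts = datetime.strptime("2022-04-15T10:30:00.000Z", fmt)

# strptime matches the trailing "Z" literally and does not attach a
# timezone, so mark the result as UTC explicitly if one is needed
ts = ts.replace(tzinfo=timezone.utc)
```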

@@ -94,8 +102,14 @@ Validate parameters/structure of configuration data.
`["--input1", "value1", "--input2", "value2"]`).


* **coprocess** (str, default = `None`): coprocess the specified imgtype with the imgtype specified in metadata_filters
(e.g. coprocess NIR images with VIS).
* **groupby** (list, default = `["filepath"]`): a list of one or more metadata terms used to create unique groups of images
for downstream analysis. The default, `filepath`, creates groups of single images (i.e. one input image per workflow). An
example of a multi-image group could be to pair VIS and NIR images (e.g. `["timestamp", "camera", "rotation"]`). Supported
metadata terms are listed [here](pipeline_parallel.md).

* **group_name** (str, default = `"imgtype"`): either a metadata term used to create a unique name for each image in an
image group (created by `groupby`), or `"auto"` to generate a numbered image sequence `image1, image2, ...`. The resulting
names are used to access individual image filepaths in a workflow.

* **cleanup**: (bool, default =`True`): remove temporary job directory if `True`.
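How `groupby` and `group_name` combine can be sketched with the standard library — a rough illustration of the idea only, since PlantCV actually does this grouping on a pandas DataFrame:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical image metadata records for illustration
records = [
    {"filepath": "a_vis.png", "timestamp": "t1", "imgtype": "VIS"},
    {"filepath": "a_nir.png", "timestamp": "t1", "imgtype": "NIR"},
    {"filepath": "b_vis.png", "timestamp": "t2", "imgtype": "VIS"},
]

group_keys = ["timestamp"]  # plays the role of config.groupby
name_key = "imgtype"        # plays the role of config.group_name

key = itemgetter(*group_keys)
groups = {}
for k, grp in groupby(sorted(records, key=key), key=key):
    # Within each group, name every image by its group_name term
    groups[k] = {r[name_key]: r["filepath"] for r in grp}
```

Here `groups["t1"]` pairs the VIS and NIR images captured at the same timestamp, and a workflow would access them by the names `VIS` and `NIR`.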

@@ -198,7 +212,7 @@
To run `plantcv-workflow.py` with a config file you can use the following:

```shell
python plantcv-workflow.py --config my_config.json
plantcv-workflow.py --config my_config.json
```

Remember that `python` and `plantcv-workflow.py` need to be in your PATH, for example with a Conda environment. On
2 changes: 1 addition & 1 deletion docs/parallel_job_builder.md
@@ -7,7 +7,7 @@ The job builder step in [PlantCV Workflow Parallelization](pipeline_parallel.md)
**returns** none

- **Parameters:**
- meta - Dictionary of processed image metadata
- meta - Grouped Pandas DataFrame of processed image metadata
- config - plantcv.parallel.WorkflowConfig object
- **Context:**
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. It builds a list of image processing
17 changes: 1 addition & 16 deletions docs/parallel_metadata_parser.md
@@ -4,27 +4,12 @@ Reads metadata from the input data directory.

**plantcv.parallel.metadata_parser**(*config*)

**returns** meta (dictionary of image metadata, one entry per image to be processed)
**returns** dataset (grouped Pandas DataFrame of image metadata)

- **Parameters:**
- config - plantcv.parallel.WorkflowConfig object
- **Context:**
- This is one of the first steps built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature.
It reads metadata from the input data directory and uses the outputs in the [job builder](parallel_job_builder.md) step.


A helper function to convert datetimes/timestamps in string format to Unix/Epoch time (elapsed seconds from epoch).

**plantcv.parallel.convert_datetime_to_unixtime**(*timestamp_str, date_format*)

**returns** unix_time (integer value of elapsed seconds from epoch: 1970-01-01 00:00:00)

- **Parameters:**
- timestamp_str - a datetime represented as a character string (e.g. 2020-01-01 00:00:00)
- date_format - date format code for `strptime` (e.g. "%Y-%m-%d %H:%M:%S") See
[strptime docs](https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior) for supported codes.
- **Context:**
- A timestamp is often an important piece of metadata associated with automated imaging. This function is used to
convert between human and machine readable datetime formats within [Workflow Parallelization](pipeline_parallel.md).

**Source Code:** [Here](https://github.com/danforthcenter/plantcv/blob/master/plantcv/parallel/parsers.py)
11 changes: 6 additions & 5 deletions docs/parallel_process_results.md
@@ -1,6 +1,7 @@
## Process Results

Process a directory of results files from running PlantCV over as many images as needed and create a formatted, concatenated data output file.
Process a directory of results files from running PlantCV over as many images as needed and create a formatted,
concatenated data output file.

**plantcv.parallel.process_results**(*job_dir, json_file*)

@@ -10,10 +11,10 @@ Process a directory of results files from running PlantCV over as many images as
- job_dir - Path of the job directory
- json_file - Path and name of the output combined json file
- **Context:**
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. Each image will likely print
hierarchical data files if [`print_results`](print_results.md) is a step in the workflow but the `process_results` step takes place after all
images have been analyzed and combines these single image data files into one text file that can be used as input for the [`json2csv`](tools.md#convert-output-json-data-files-to-csv-tables)
function.
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. Each workflow will save
hierarchical data files using [`pcv.outputs.save_results`](outputs.md). The `process_results` step takes place after all
images have been analyzed and combines these single-workflow data files into one text file that can be used as input for
the [`json2csv`](tools.md#convert-output-json-data-files-to-csv-tables) function.
- **Example use:**
- Below
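The combining idea can also be sketched with the standard library — an illustration only, not PlantCV's actual implementation, using a hypothetical job directory of per-workflow JSON result files:

```python
import json
import os
import tempfile

# Build a fake job directory with two per-workflow result files
job_dir = tempfile.mkdtemp()
for i, area in enumerate([101, 202]):
    with open(os.path.join(job_dir, f"result_{i}.json"), "w") as f:
        json.dump({"observations": {"area": area}}, f)

# Concatenate all single-workflow results into one combined file
combined = []
for name in sorted(os.listdir(job_dir)):
    if name.endswith(".json"):
        with open(os.path.join(job_dir, name)) as f:
            combined.append(json.load(f))

with open(os.path.join(job_dir, "combined.json"), "w") as f:
    json.dump(combined, f, indent=4)
```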
