Merged

58 commits
79762b7
Fix code spacing format issues
nfahlgren Apr 3, 2022
f6662fd
Add rotation term and rename barcode
nfahlgren Apr 15, 2022
91d2636
Change default timestamp format to ISO8601 UTC
nfahlgren Apr 15, 2022
bea622d
Replace metadata_parser
nfahlgren Apr 15, 2022
7a71e5b
Remove outdated functions
nfahlgren Apr 15, 2022
9695302
Use correct class attribute
nfahlgren Apr 16, 2022
f259fa7
Add phenodata test data
nfahlgren Apr 16, 2022
f54db50
Update metadata_parser tests
nfahlgren Apr 16, 2022
7b97db4
Add method to group metadata
nfahlgren Apr 19, 2022
0d2a55f
Update job_builder to handle grouped metadata
nfahlgren Apr 19, 2022
a2640c4
Prettify JSON output
nfahlgren Apr 20, 2022
f4c855c
Convert datetime to string before serialization
nfahlgren Apr 20, 2022
6b1c2c3
Pretty print JSON output
nfahlgren Apr 22, 2022
283150a
Update test data to match new functions
nfahlgren Apr 23, 2022
be943aa
Add new workflow argument module
nfahlgren Apr 25, 2022
10169f9
Update tests with dataframe-based metadata
nfahlgren Apr 25, 2022
fb53dd9
Add tests module for WorkflowInputs
nfahlgren Apr 26, 2022
4793a32
Lowercase all image names
nfahlgren Apr 26, 2022
ddeeb4b
Add missing names generator to job_builder
nfahlgren Apr 26, 2022
fa88a08
Check-in updated json2csv
nfahlgren Apr 26, 2022
054d61a
Simplify json2csv
nfahlgren Apr 26, 2022
c4e4509
Merge branch '4.x' into revise-parallel-parsers
nfahlgren Apr 26, 2022
a3a81d4
Fix test assertion
nfahlgren Apr 27, 2022
958414c
Update package install method
nfahlgren Apr 27, 2022
4af58bd
Add test for workflow_inputs
nfahlgren Apr 27, 2022
b78fbfc
Remove redundant import
nfahlgren Apr 27, 2022
84b3419
Remove redundant list comprehension
nfahlgren Apr 27, 2022
6c03469
Add auto naming method to job_builder
nfahlgren Apr 27, 2022
436671c
Simplify if expression
nfahlgren Apr 27, 2022
910f823
One-line docstring should be on one line
nfahlgren Apr 27, 2022
1655f38
Add line after class docstring
nfahlgren Apr 27, 2022
0ca7caa
Deprecate command-line options
nfahlgren Apr 27, 2022
29a4c20
Update csv keyword description
nfahlgren Apr 27, 2022
5ac24ff
Update docs for json2csv converter
nfahlgren Apr 27, 2022
b316fb7
Update WorkflowConfig docs
nfahlgren Apr 27, 2022
10af283
Update metadata_parser docs
nfahlgren Apr 27, 2022
734ca1d
Update input description to DataFrame
nfahlgren Apr 27, 2022
d917258
Update process_results docs
nfahlgren Apr 27, 2022
36b3ba1
Update Jupyter docs
nfahlgren Apr 27, 2022
89e2f94
Update the parallelization docs
nfahlgren Apr 27, 2022
982d287
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl May 23, 2022
9e9e51c
Merge branch '4.x' into revise-parallel-parsers
nfahlgren Jun 10, 2022
ffefef6
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jun 14, 2022
6b850c2
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jun 17, 2022
44e96bd
remove "python" from parallelization command line
HaleySchuhl Jun 27, 2022
6f83fc2
Limit numpy version
nfahlgren Jun 27, 2022
670f4c0
Add additional developer tools to conda env
nfahlgren Jun 28, 2022
d978c49
Change other_args from list to dict
nfahlgren Jun 28, 2022
67f3440
Apply user-defined keyword arguments to workflows
nfahlgren Jun 28, 2022
fab651e
Add args and kwargs to workflow input class and function
nfahlgren Jun 29, 2022
d83f21a
Add docs page for workflow inputs
nfahlgren Jun 29, 2022
9eed00e
Reference new workflow inputs doc page
nfahlgren Jun 29, 2022
55f05b1
Update test for coverage
nfahlgren Jun 29, 2022
dc63a76
Convert to regular comment
nfahlgren Jun 29, 2022
f42c93b
Add missing docstrings
nfahlgren Jun 29, 2022
1865763
Catch explicit error
nfahlgren Jun 29, 2022
1f13824
Update doc example with two custom inputs
nfahlgren Jun 30, 2022
6c88106
Merge branch '4.x' into revise-parallel-parsers
HaleySchuhl Jul 12, 2022
2 changes: 1 addition & 1 deletion .github/workflows/continuous-integration.yml
@@ -42,7 +42,7 @@ jobs:
- name: Test and generate coverage report
# Run coverage analysis on pytest tests
run: |
python setup.py install
pip install .
py.test --cov-report=xml --cov=plantcv tests/
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
177 changes: 51 additions & 126 deletions docs/jupyter.md
@@ -21,39 +21,34 @@ can be visualized instantly within the notebook.

![Screenshot](img/documentation_images/jupyter/jupyter_screenshot.jpg)

PlantCV is automatically set up to run in Jupyter Notebook but there
are a couple of considerations. Jupyter must be opened within the PlantCV
environment. For example, launch Jupyter from the command line from within
a PlantCV environment with `jupyter notebook`, launch Jupyter from the
Anaconda Navigator (if installed with conda) from within the PlantCV environment,
etc.
PlantCV is automatically set up to run in Jupyter Notebook but you will need to install Jupyter.
For example, with `conda`:

First, if PlantCV is installed in the global Python search path, you can
import the PlantCV library like normal:

```python
from plantcv import plantcv as pcv
```

```bash
conda install nb_conda jupyterlab
```

On the other hand, if you installed PlantCV into a local Python path,
you will need to configure the Jupyter Python kernel to find it. For
example:

```python
import sys
sys.path.append("/home/user/plantcv")
from plantcv import plantcv as pcv
```
Then you can launch Jupyter from the command line with `jupyter lab` and create a notebook using
a kernel containing your PlantCV environment.

Second, we use [matplotlib](http://matplotlib.org/) to do the
First, we use [matplotlib](http://matplotlib.org/) to do the
in-notebook plotting. To make this work, add the following to the top
of your notebook:

```python
%matplotlib inline
```

Third, PlantCV has a built-in debug mode that is set to `None` by
Second, you can import the PlantCV library like normal:

```python
from plantcv import plantcv as pcv
```

Third (optionally), utilize PlantCV's `WorkflowInputs` class to organize and name workflow
inputs for compatibility with running the workflow later in parallel.

PlantCV has a built-in debug mode that is set to `None` by
default. Setting debug to `"print"` will cause PlantCV to print debug
images to files, which is the original debug method. In Jupyter, setting
debug to `"plot"` will cause PlantCV to plot debug images directly into
@@ -64,148 +59,78 @@ would look like the following example:

```python
%matplotlib inline
import os
import sys
sys.path.append('/home/user/plantcv')
import numpy as np
import cv2
from matplotlib import pyplot as plt
from plantcv import plantcv as pcv
from plantcv.parallel import WorkflowInputs

# Set input variables
args = WorkflowInputs(images=["./input_color_img.jpg"],
names="image",
result="plantcv_results.csv",
debug="plot")

# Set variables
pcv.params.debug = 'plot' # Plot debug images to the notebook
img_file = 'input_color_img.jpg' # Example image
pcv.params.debug = args.debug

```

Not all of these imports are required; this just demonstrates that, in
addition to importing PlantCV, you can import any other useful Python
packages as well.
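Conceptually, `WorkflowInputs` pairs each path in `images` with a name from the comma-separated `names` string and exposes everything as attributes (`args.image`, `args.result`, `args.debug`). A rough pure-Python sketch of that behavior — an illustration of the idea, not PlantCV's actual implementation:

```python
# Minimal sketch of WorkflowInputs-style input handling.
# Hypothetical class for illustration only; use plantcv.parallel.WorkflowInputs in practice.
class WorkflowInputsSketch:
    def __init__(self, images, names, result, debug=None, **kwargs):
        self.result = result
        self.debug = debug
        # Pair each comma-separated name with the corresponding image path
        for name, path in zip(names.split(","), images):
            setattr(self, name, path)
        # Any extra keyword arguments become attributes too
        for key, value in kwargs.items():
            setattr(self, key, value)

args = WorkflowInputsSketch(images=["./input_color_img.jpg"],
                            names="image",
                            result="plantcv_results.csv",
                            debug="plot")
```

With two images and `names="vis,nir"`, the workflow would read them back as `args.vis` and `args.nir`.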

### Converting Jupyter Notebooks to PlantCV workflow scripts

Once a workflow has been developed, it needs to be converted into a pure
Python script if the goal is to use it on many images using the PlantCV
workflow [parallelization](pipeline_parallel.md) tools. To make a
Python script that is compatible with the `plantcv-workflow.py` program,
first use Jupyter to convert the notebook to Python. This can be done
through the web interface, or on the command line:
through the web interface (File > Save and Export Notebook As... > Executable Script),
or on the command line:

```bash
jupyter nbconvert --to python notebook.ipynb
```

The resulting Python script will be named `notebook.py` in the example
above. Next, open the Python script with a text editor. Several
modifications to the script are needed. Modify the list of imported
packages as needed, but in particular, remove
`get_ipython().magic('matplotlib inline')` and add `import argparse`.
If PlantCV is importable in your normal shell environment, you can
remove `sys.path.append('/home/user/plantcv')` also.
`get_ipython().magic('matplotlib inline')`. Change `from plantcv.parallel import WorkflowInputs`
to `from plantcv.parallel import workflow_inputs`.

All of the remaining script (other than the imports) needs to be added
to a function called `main`. To do this, add a main function and indent
the remaining code within main, for example:
Change the code for managing inputs, for example:

```python
def main():

# all the code from Jupyter

if __name__ == '__main__':
main()

args = WorkflowInputs(images=["./input_color_img.jpg"],
names="image",
result="plantcv_results.csv",
debug="plot")
```

Add a function for parsing command line options using [argparse](https://docs.python.org/2.7/library/argparse.html).
The `plantcv-workflow.py` script requires a few command-line arguments for
workflow scripts to work properly. If the script analyzes a single image
the options minimally should look like the following:
To:

```python
def options():
parser = argparse.ArgumentParser(description="Imaging processing with PlantCV.")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required= True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

```

If the script analyzes two images using co-processing, the options
should minimally look like the following:

```python
def options():
parser = argparse.ArgumentParser(description="Imaging processing with opencv")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required=True )
parser.add_argument("-r2","--coresult", help="Result file for co-processed image.", required=True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

```

Within the `main` function, call the `options` function to get the
values of the command-line options. Swap any hard-coded values with
the argument values instead:

```python
def main():
# Get options
args = options()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag
img_file = args.image # Replace the hard-coded input image with image flag

args = workflow_inputs()
```

Make any other alterations as necessary after testing. Based on the
simple Jupyter Notebook example above, the fully modified version would
look like the following:

```python
import os
import sys
import numpy as np
import cv2
from matplotlib import pyplot as plt
from plantcv import plantcv as pcv
from plantcv.parallel import workflow_inputs

def options():
parser = argparse.ArgumentParser(description="Imaging processing with PlantCV.")
parser.add_argument("-i", "--image", help="Input image file.", required=True)
parser.add_argument("-r","--result", help="Result file.", required= True )
parser.add_argument("-o", "--outdir", help="Output directory for image files.", required=False)
parser.add_argument("-w","--writeimg", help="Write out images.", default=False, action="store_true")
parser.add_argument("-D", "--debug", help="Turn on debug, prints intermediate images.")
args = parser.parse_args()
return args

def main():
# Get options
args = options()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag
img_file = args.image # Replace the hard-coded input image with image flag

# Put workflow
# steps from
# Jupyter here

# Print data that gets collected into the Outputs
pcv.outputs.save_results(filename=args.result, outformat="json")
# Get command-line options
args = workflow_inputs()

# Set variables
pcv.params.debug = args.debug # Replace the hard-coded debug with the debug flag

img, imgpath, imgname = pcv.readimage(filename=args.image)

# Put workflow
# steps from
# Jupyter here

if __name__ == '__main__':
main()
# Print data that gets collected into the Outputs
pcv.outputs.save_results(filename=args.result, outformat="json")

```

28 changes: 21 additions & 7 deletions docs/parallel_config.md
@@ -3,6 +3,14 @@
`WorkflowConfig` is a class that stores parallel workflow configuration parameters. Configurations can be saved/imported
to run workflows in parallel.

### Quick start

Create a configuration file from a template:

```bash
plantcv-workflow.py --template my_config.txt
```
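Saved configurations are JSON, so they can be edited by hand and reloaded. A sketch of that round trip using only the standard library — the keys shown are the illustrative subset documented on this page, and a real template generated with `--template` contains additional fields:

```python
import json
import os
import tempfile

# Illustrative subset of WorkflowConfig settings (see the attribute list below)
config = {
    "start_date": None,
    "end_date": None,
    "timestampformat": "%Y-%m-%dT%H:%M:%S.%fZ",
    "metadata_filters": {"imgtype": "VIS"},
    "groupby": ["filepath"],
    "group_name": "imgtype",
    "cleanup": True,
}

path = os.path.join(tempfile.mkdtemp(), "my_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

# After hand-editing, the settings load back as a plain dictionary
with open(path) as f:
    loaded = json.load(f)
```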

*class* **plantcv.parallel.WorkflowConfig**

### Class methods
@@ -61,12 +69,12 @@ Validate parameters/structure of configuration data.


* **start_date**: (str, default = `None`): start date used to filter images. Images will be analyzed that are newer
than the start date. In the case of `None` all images prior to `end_date` are processed. String format should match
`timestampformat`.
than or equal to the start date. In the case of `None` all images prior to `end_date` are processed. String format
should match `timestampformat`.


* **end_date**: (str, default = `None`): end date used to filter images. Images will be analyzed that are older than
the end date. In the case of `None` all images after `start_date` are processed. String format should match
or equal to the end date. In the case of `None` all images after `start_date` are processed. String format should match
`timestampformat`.


@@ -82,7 +90,7 @@ Validate parameters/structure of configuration data.
`{"imgtype": "VIS", "frame": ["0", "90"]}`).


* **timestampformat**: (str, default = '%Y-%m-%d %H:%M:%S.%f'): a date format code compatible with strptime C library.
* **timestampformat**: (str, default = '%Y-%m-%dT%H:%M:%S.%fZ'): a date format code compatible with strptime C library.
See [strptime docs](https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior) for supported
codes.
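The default format corresponds to ISO 8601 UTC timestamps such as `2022-04-15T10:30:00.000Z`. A quick check with the standard library:

```python
from datetime import datetime, timezone

# Default timestampformat: ISO 8601 UTC
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
ts = datetime.strptime("2022-04-15T10:30:00.000Z", fmt)

# strptime matches the trailing "Z" literally and does not attach a
# timezone, so mark the result as UTC explicitly if one is needed
ts = ts.replace(tzinfo=timezone.utc)
```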

@@ -94,8 +102,14 @@ Validate parameters/structure of configuration data.
`["--input1", "value1", "--input2", "value2"]`).


* **coprocess** (str, default = `None`): coprocess the specified imgtype with the imgtype specified in metadata_filters
(e.g. coprocess NIR images with VIS).
* **groupby** (list, default = `["filepath"]`): a list of one or more metadata terms used to create unique groups of images
for downstream analysis. The default, `filepath`, creates groups of single images (i.e. one input image per workflow). An
example of a multi-image group could be to pair VIS and NIR images (e.g. `["timestamp", "camera", "rotation"]`). Supported
metadata terms are listed [here](pipeline_parallel.md).

* **group_name** (str, default = `"imgtype"`): either a metadata term used to create a unique name for each image in an
image group (created by `groupby`), or `"auto"` to generate a numbered image sequence `image1, image2, ...`. The resulting
names are used to access individual image filepaths in a workflow.

* **cleanup**: (bool, default =`True`): remove temporary job directory if `True`.
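How `groupby` and `group_name` combine can be sketched with the standard library — a rough illustration of the idea only, since PlantCV actually does this grouping on a pandas DataFrame:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical image metadata records for illustration
records = [
    {"filepath": "a_vis.png", "timestamp": "t1", "imgtype": "VIS"},
    {"filepath": "a_nir.png", "timestamp": "t1", "imgtype": "NIR"},
    {"filepath": "b_vis.png", "timestamp": "t2", "imgtype": "VIS"},
]

group_keys = ["timestamp"]  # plays the role of config.groupby
name_key = "imgtype"        # plays the role of config.group_name

key = itemgetter(*group_keys)
groups = {}
for k, grp in groupby(sorted(records, key=key), key=key):
    # Within each group, name every image by its group_name term
    groups[k] = {r[name_key]: r["filepath"] for r in grp}
```

Here `groups["t1"]` pairs the VIS and NIR images captured at the same timestamp, and a workflow would access them by the names `VIS` and `NIR`.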

@@ -198,7 +212,7 @@
To run `plantcv-workflow.py` with a config file you can use the following:

```shell
python plantcv-workflow.py --config my_config.json
plantcv-workflow.py --config my_config.json
```

Remember that `python` and `plantcv-workflow.py` need to be in your PATH, for example with a Conda environment. On
2 changes: 1 addition & 1 deletion docs/parallel_job_builder.md
@@ -7,7 +7,7 @@ The job builder step in [PlantCV Workflow Parallelization](pipeline_parallel.md)
**returns** none

- **Parameters:**
- meta - Dictionary of processed image metadata
- meta - Grouped Pandas DataFrame of processed image metadata
- config - plantcv.parallel.WorkflowConfig object
- **Context:**
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. It builds a list of image processing
17 changes: 1 addition & 16 deletions docs/parallel_metadata_parser.md
@@ -4,27 +4,12 @@ Reads metadata from the input data directory.

**plantcv.parallel.metadata_parser**(*config*)

**returns** meta (dictionary of image metadata, one entry per image to be processed)
**returns** dataset (grouped Pandas DataFrame of image metadata)

- **Parameters:**
- config - plantcv.parallel.WorkflowConfig object
- **Context:**
- This is one of the first steps built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature.
It reads metadata from the input data directory and uses the outputs in the [job builder](parallel_job_builder.md) step.


A helper function to convert datetimes/timestamps in string format to Unix/Epoch time (elapsed seconds from epoch).

**plantcv.parallel.convert_datetime_to_unixtime**(*timestamp_str, date_format*)

**returns** unix_time (integer value of elapsed seconds from epoch: 1970-01-01 00:00:00)

- **Parameters:**
- timestamp_str - a datetime represented as a character string (e.g. 2020-01-01 00:00:00)
- date_format - date format code for `strptime` (e.g. "%Y-%m-%d %H:%M:%S") See
[strptime docs](https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior) for supported codes.
- **Context:**
- A timestamp is often an important piece of metadata associated with automated imaging. This function is used to
convert between human and machine readable datetime formats within [Workflow Parallelization](pipeline_parallel.md).

**Source Code:** [Here](https://github.com/danforthcenter/plantcv/blob/master/plantcv/parallel/parsers.py)
11 changes: 6 additions & 5 deletions docs/parallel_process_results.md
@@ -1,6 +1,7 @@
## Process Results

Process a directory of results files from running PlantCV over as many images as needed and create a formatted, concatenated data output file.
Process a directory of results files from running PlantCV over as many images as needed and create a formatted,
concatenated data output file.

**plantcv.parallel.process_results**(*job_dir, json_file*)

@@ -10,10 +11,10 @@ Process a directory of results files from running PlantCV over as many images as
- job_dir - Path of the job directory
- json_file - Path and name of the output combined json file
- **Context:**
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. Each image will likely print
hierarchical data files if [`print_results`](print_results.md) is a step in the workflow but the `process_results` step takes place after all
images have been analyzed and combines these single image data files into one text file that can be used as input for the [`json2csv`](tools.md#convert-output-json-data-files-to-csv-tables)
function.
- This step is built into the [PlantCV Workflow Parallelization](pipeline_parallel.md) feature. Each workflow will save
hierarchical data files using [`pcv.outputs.save_results`](outputs.md). The `process_results` step takes place after all
images have been analyzed and combines these single-workflow data files into one text file that can be used as input for
the [`json2csv`](tools.md#convert-output-json-data-files-to-csv-tables) function.
- **Example use:**
- Below
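The combining idea can also be sketched with the standard library — an illustration only, not PlantCV's actual implementation, using a hypothetical job directory of per-workflow JSON result files:

```python
import json
import os
import tempfile

# Build a fake job directory with two per-workflow result files
job_dir = tempfile.mkdtemp()
for i, area in enumerate([101, 202]):
    with open(os.path.join(job_dir, f"result_{i}.json"), "w") as f:
        json.dump({"observations": {"area": area}}, f)

# Concatenate all single-workflow results into one combined file
combined = []
for name in sorted(os.listdir(job_dir)):
    if name.endswith(".json"):
        with open(os.path.join(job_dir, name)) as f:
            combined.append(json.load(f))

with open(os.path.join(job_dir, "combined.json"), "w") as f:
    json.dump(combined, f, indent=4)
```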
