Skip to content

Conversation

@kiersten-stokes
Copy link
Member

@kiersten-stokes kiersten-stokes commented Jun 17, 2022

#2760

Will also require documentation updates, either here or in a follow-up PR.

What changes were proposed in this pull request?

I'll flush this description out further after making a few changes.

The mounted_volumes property is added to the canvas properties for custom components. It will exist as a sibling-stanza to the component_parameters stanza of the pipeline JSON, which is different from how this property is handled for generic components (where it appears within the component_parameters stanza). As a result of this difference, some of the property propagation has been updated in pipeline_definition.py. Two functions -- get_property() and set_property() -- are added that deal with this difference. The logic for get_property() is below (set_property is similar). I'm not sure I like how this is handled so I may fix that later (suggestions welcome).

def get_property(self, key: str, default_value=None) -> Any:
   if self.is_generic_node():
        value = self.get_component_parameter(key, default_value)
   else:
         value = self.get(key, default_value)
   return None if value == "None" else value

A new mounted_volumes property is also added to both GenericOperation and Operation (also accessed differently depending on the type). In adding this property, I found that simplifying the Operation object construction made things more straightforward, so some of the logic previously done in parser.py has been moved to pipeline.py in the Operation constructors and some utility classes (KeyValueList and the dataclasses) have been moved to a new file, pipeline_utilities.py to prevent circular reference issues.

Finally, logic was added to both processor_kfp.py and processor_airflow.py to handle the addition of volumes for custom components. I also refactored how the Airflow processor handles volumes and secrets in order to simplify the DAG template. Now, all volumes and secrets for the whole pipeline are collected before being template to render.

How was this pull request tested?

Tests will have to be updated and added

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@kiersten-stokes kiersten-stokes added this to the 3.10.0 milestone Jun 17, 2022
@elyra-bot
Copy link

elyra-bot bot commented Jun 17, 2022

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

from elyra.pipeline.pipeline_constants import KUBERNETES_SECRETS
from elyra.pipeline.pipeline_constants import MOUNTED_VOLUMES
from elyra.pipeline.pipeline_constants import RUNTIME_IMAGE
from elyra.pipeline.pipeline_definition import Node

Check failure

Code scanning / CodeQL

Module-level cyclic import

'Node' may not be defined if module [elyra.pipeline.pipeline_definition](1) is imported before module [elyra.pipeline.pipeline](2), as the [definition](3) of Node occurs after the cyclic [import](4) of elyra.pipeline.pipeline.
# below, the pipeline module must be imported in its entirety in order to
# avoid a circular reference issue
try:
from elyra.pipeline import pipeline

Check notice

Code scanning / CodeQL

Cyclic import

Import of module [elyra.pipeline.pipeline](1) begins an import cycle.
super().__init__(node, super_node)

if not component_params.get("filename"):
if not self.filename:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in this GenericOperation is different than the approach in the Operation constructor where we use _-prefixed member variables to hold the values and use those variables in the validation. Then their "property" counterparts just reference the _-prefixed members.

It would be more maintainable to be consistent with approaches and would either recommend the Operation approach or forgo the use of properties altogether and use direct member names - validating each in their respective constructors.


@property
def mounted_volumes(self) -> List[VolumeMount]:
return self._component_params.get(MOUNTED_VOLUMES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is preventing GenericOperations from using the same location for mounted_volumes and kubernetes_secrets as Operations (i.e., in app_data rather than app_data.component_parameters) or vice versa? This difference is referenced in the opening description and it seems like it would be worth a possible migration to place these pieces of information in the same location.

Using a property to handle this difference is nice, but it would be good if we could have as few differences within the file structure as we can.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah not sure why I hadn't thought of that as an option. Thinking through this, it looks like the main thing that would have to be worked through is that this would necessitate different logic among those properties that need to be propagated. For example, we need to propagate both env_vars and mounted_volumes (and presumably env_vars should stay in the app_data.component_parameters stanza). That being said, once we got past that part, this approach would probably result in fewer differences between Operations and GenericOperations. I'm going to give it a try and hope that nothing weird comes up 🙂

@kiersten-stokes
Copy link
Member Author

While working through this PR, an issue was encountered that complicated things to a degree that made this approach a lot more complicated than it was worth. Long story short, the elyra_ prefix that we use to denote which properties are 'component parameters' or 'pipeline default' properties was affecting both how the component properties appeared in the GUI and how they were processed. Will close this and open a new PR in which the mounted_volumes property is added as a child to the 'component_parameters' stanza.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants