Commit a83c3aa

Add support for 'file' URI scheme to URL-based connectors (#2873)
Add support for the 'file' URI scheme to the URL component catalog connector, the Apache Airflow package connector, and Apache Airflow provider package connector. This enables those connectors to load resources from the local file system, which can lead to significant performance improvements in the Visual Pipeline Editor.
1 parent 6ee5aea commit a83c3aa

9 files changed

Lines changed: 403 additions & 57 deletions

docs/source/user_guide/pipeline-components.md

Lines changed: 33 additions & 7 deletions
@@ -59,7 +59,7 @@ Elyra includes connectors for the following component catalog types:

 Example: A directory component catalog that is configured using the `/users/jdoe/kubeflow_components/test` path makes all component files in that directory available to Elyra.

-- [_URL component catalogs_](#url-component-catalog) provide access to components that are stored on the web and can be retrieved using anonymous HTTP `GET` requests.
+- [_URL component catalogs_](#url-component-catalog) provide access to components that are stored on the web and can be retrieved using HTTP `GET` requests.

 Example: A URL component catalog that is configured using the `http://myserver:myport/mypath/my_component.yaml` URL makes the `my_component.yaml` component file available to Elyra.

@@ -395,36 +395,62 @@ Examples (CLI):

 The URL component catalog connector provides access to components that are stored on the web:
 - You can specify one or more URL resources.
-- The specified URLs must be retrievable using an HTTP `GET` request.
+- The specified URLs must be retrievable using an HTTP `GET` request. `http`, `https`, and `file` [URI schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml) are supported.
 - If the resources are secured, provide credentials, such as a user id and password or API key.

 Examples (GUI):
-- `https://raw.githubusercontent.com/elyra-ai/examples/main/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml`
+- HTTPS URL
+```
+https://raw.githubusercontent.com/elyra-ai/examples/main/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml
+```
+- Local file URL
+```
+file:///absolute/path/to/component.yaml
+```

 Examples (CLI):
-- `['https://raw.githubusercontent.com/elyra-ai/examples/main/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml']`
-- `['<URL_1>','<URL_2>']`
+- HTTPS URL
+```
+['https://raw.githubusercontent.com/elyra-ai/examples/main/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml']
+```
+- Local file URL
+```
+['file:///absolute/path/to/component.yaml']
+```
+- Multiple URLs
+```
+['<URL_1>','<URL_2>']
+```


 #### Apache Airflow package catalog

 The [Apache Airflow package catalog connector](https://github.com/elyra-ai/elyra/tree/main/elyra/pipeline/airflow/package_catalog_connector) provides access to operators that are stored in Apache Airflow [built distributions](https://packaging.python.org/en/latest/glossary/#term-built-distribution):
 - Only the [wheel distribution format](https://packaging.python.org/en/latest/glossary/#term-Wheel) is supported.
-- The specified URL must be retrievable using an HTTP `GET` request.
+- The specified URL must be retrievable using an HTTP `GET` request. `http`, `https`, and `file` [URI schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml) are supported.

 Examples:
 - [Apache Airflow](https://pypi.org/project/apache-airflow/) (v1.10.15):
 ```
 https://files.pythonhosted.org/packages/f0/3a/f5ce74b2bdbbe59c925bb3398ec0781b66a64b8a23e2f6adc7ab9f1005d9/apache_airflow-1.10.15-py2.py3-none-any.whl
+```
+- Local copy of a downloaded Apache Airflow package
+```
+file:///absolute/path/to/apache_airflow-1.10.15-py2.py3-none-any.whl
 ```

 #### Apache Airflow provider package catalog
 The [Apache Airflow provider package catalog connector](https://github.com/elyra-ai/elyra/tree/main/elyra/pipeline/airflow/provider_package_catalog_connector) provides access to operators that are stored in [Apache Airflow provider packages](https://airflow.apache.org/docs/apache-airflow-providers/):
 - Only the [wheel distribution format](https://packaging.python.org/en/latest/glossary/#term-Wheel) is supported.
-- The specified URL must be retrievable using an HTTP `GET` request.
+- The specified URL must be retrievable using an HTTP `GET` request. `http`, `https`, and `file` [URI schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml) are supported.

 Examples:
 - [apache-airflow-providers-http](https://airflow.apache.org/docs/apache-airflow-providers-http/stable/index.html) (v2.0.2):
 ```
 https://files.pythonhosted.org/packages/a1/08/91653e9f394cbefe356ac07db809be7e69cc89b094379ad91d6cef3d2bc9/apache_airflow_providers_http-2.0.2-py3-none-any.whl
 ```
+
+- Local copy of a downloaded provider package
+```
+file:///absolute/path/to/apache_airflow_providers_http-2.0.2-py3-none-any.whl
+```
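
The documentation change above asks for absolute `file:///...` URLs. As a minimal sketch of how such a URL could be produced from a local path and checked against the three supported schemes, the following uses only the Python standard library. The helper names `to_file_url` and `is_supported` and the `SUPPORTED_SCHEMES` constant are illustrative assumptions and are not part of Elyra.

```python
from pathlib import Path
from urllib.parse import urlparse

# Schemes named in the documentation change; the constant itself is hypothetical.
SUPPORTED_SCHEMES = {"http", "https", "file"}


def to_file_url(path: str) -> str:
    """Return a 'file' URL for a local path.

    Path.as_uri() only accepts absolute paths, so relative
    paths are made absolute against the current directory first.
    """
    return Path(path).absolute().as_uri()


def is_supported(url: str) -> bool:
    """Check whether the URL uses one of the schemes the connectors accept."""
    return urlparse(url).scheme in SUPPORTED_SCHEMES


if __name__ == "__main__":
    url = to_file_url("component.yaml")
    print(url, is_supported(url))
```

The resulting string can be pasted into the GUI field or wrapped in a list for the CLI, as in the documented examples.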

elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py

Lines changed: 22 additions & 16 deletions
@@ -26,12 +26,13 @@
 from urllib.parse import urlparse
 import zipfile

-from requests import get
+from requests import session
 from requests.auth import HTTPBasicAuth

 from elyra.pipeline.catalog_connector import AirflowEntryData
 from elyra.pipeline.catalog_connector import ComponentCatalogConnector
 from elyra.pipeline.catalog_connector import EntryData
+from elyra.util.url import FileTransportAdapter


 class AirflowPackageCatalogConnector(ComponentCatalogConnector):
@@ -74,20 +75,22 @@ def get_catalog_entries(self, catalog_metadata: Dict[str, Any]) -> List[Dict[str
             )
             return operator_key_list

-        # determine whether authentication needs to be performed
-        auth_id = catalog_metadata.get("auth_id")
-        auth_password = catalog_metadata.get("auth_password")
-        if auth_id and auth_password:
-            auth = HTTPBasicAuth(auth_id, auth_password)
-        elif auth_id or auth_password:
-            self.log.error(
-                f"Error. Airflow connector '{catalog_metadata.get('display_name')}' "
-                "is not configured properly. "
-                "Authentication requires a user id and password or API key."
-            )
-            return operator_key_list
-        else:
-            auth = None
+        pr = urlparse(airflow_package_download_url)
+        auth = None
+
+        if pr.scheme != "file":
+            # determine whether authentication needs to be performed
+            auth_id = catalog_metadata.get("auth_id")
+            auth_password = catalog_metadata.get("auth_password")
+            if auth_id and auth_password:
+                auth = HTTPBasicAuth(auth_id, auth_password)
+            elif auth_id or auth_password:
+                self.log.error(
+                    f"Error. Airflow connector '{catalog_metadata.get('display_name')}' "
+                    "is not configured properly. "
+                    "Authentication requires a user id and password or API key."
+                )
+                return operator_key_list

         # tmp_archive_dir is used to store the downloaded archive and as working directory
         if hasattr(self, "tmp_archive_dir"):
@@ -100,7 +103,10 @@ def get_catalog_entries(self, catalog_metadata: Dict[str, Any]) -> List[Dict[str

         # download archive; abort after 30 seconds
         try:
-            response = get(
+            requests_session = session()
+            if pr.scheme == "file":
+                requests_session.mount("file://", FileTransportAdapter())
+            response = requests_session.get(
                 airflow_package_download_url,
                 timeout=AirflowPackageCatalogConnector.REQUEST_TIMEOUT,
                 allow_redirects=True,
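
The hunks above mount `FileTransportAdapter` from `elyra.util.url` on the `file://` prefix, but that module's source is not part of this excerpt. The sketch below is therefore not Elyra's implementation. It is a hypothetical stand-in, named `LocalFileAdapter`, showing how a `requests` transport adapter might serve local files so that `Session.get` behaves the same for `http`, `https`, and `file` URLs. The status codes and reason strings are assumptions chosen to mimic HTTP semantics, and `fetch` is an illustrative helper.

```python
import io
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

from requests import Response, Session
from requests.adapters import BaseAdapter


class LocalFileAdapter(BaseAdapter):
    """Hypothetical adapter that answers 'file://' requests from the local disk."""

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        # Translate the path component of the URL into a local file system path.
        path = Path(url2pathname(urlparse(request.url).path))

        response = Response()
        response.request = request
        response.url = request.url

        # Assumed mapping of file system conditions to HTTP-like status codes.
        if path.is_dir():
            response.status_code, response.reason = 400, "Not a file"
            body = b""
        elif not path.is_file():
            response.status_code, response.reason = 404, "File not found"
            body = b""
        else:
            response.status_code, response.reason = 200, "OK"
            body = path.read_bytes()

        # Response.content reads from the file-like object stored in 'raw'.
        response.raw = io.BytesIO(body)
        return response

    def close(self):
        # No connection pool is kept, so there is nothing to release.
        pass


def fetch(url: str, timeout: float = 30.0) -> Response:
    """Fetch a resource, mounting the file adapter only for 'file' URLs."""
    session = Session()
    if urlparse(url).scheme == "file":
        session.mount("file://", LocalFileAdapter())
    return session.get(url, timeout=timeout, allow_redirects=True)
```

Mounting the adapter only when the scheme is `file` mirrors the structure of the diff and leaves the default HTTP and HTTPS adapters untouched for every other URL.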

elyra/pipeline/airflow/provider_package_catalog_connector/airflow_provider_package_catalog_connector.py

Lines changed: 22 additions & 16 deletions
@@ -28,12 +28,13 @@
 from urllib.parse import urlparse
 import zipfile

-from requests import get
+from requests import session
 from requests.auth import HTTPBasicAuth

 from elyra.pipeline.catalog_connector import AirflowEntryData
 from elyra.pipeline.catalog_connector import ComponentCatalogConnector
 from elyra.pipeline.catalog_connector import EntryData
+from elyra.util.url import FileTransportAdapter


 class AirflowProviderPackageCatalogConnector(ComponentCatalogConnector):
@@ -79,20 +80,22 @@ def get_catalog_entries(self, catalog_metadata: Dict[str, Any]) -> List[Dict[str
             )
             return operator_key_list

-        # determine whether authentication needs to be performed
-        auth_id = catalog_metadata.get("auth_id")
-        auth_password = catalog_metadata.get("auth_password")
-        if auth_id and auth_password:
-            auth = HTTPBasicAuth(auth_id, auth_password)
-        elif auth_id or auth_password:
-            self.log.error(
-                f"Error. Airflow provider package connector '{catalog_metadata.get('display_name')}' "
-                "is not configured properly. "
-                "Authentication requires a user id and password or API key."
-            )
-            return operator_key_list
-        else:
-            auth = None
+        pr = urlparse(airflow_provider_package_download_url)
+        auth = None
+
+        if pr.scheme != "file":
+            # determine whether authentication needs to be performed
+            auth_id = catalog_metadata.get("auth_id")
+            auth_password = catalog_metadata.get("auth_password")
+            if auth_id and auth_password:
+                auth = HTTPBasicAuth(auth_id, auth_password)
+            elif auth_id or auth_password:
+                self.log.error(
+                    f"Error. Airflow provider package connector '{catalog_metadata.get('display_name')}' "
+                    "is not configured properly. "
+                    "Authentication requires a user id and password or API key."
+                )
+                return operator_key_list

         # tmp_archive_dir is used to store the downloaded archive and as working directory
         if hasattr(self, "tmp_archive_dir"):
@@ -105,7 +108,10 @@ def get_catalog_entries(self, catalog_metadata: Dict[str, Any]) -> List[Dict[str

         # download archive
         try:
-            response = get(
+            requests_session = session()
+            if pr.scheme == "file":
+                requests_session.mount("file://", FileTransportAdapter())
+            response = requests_session.get(
                 airflow_provider_package_download_url,
                 timeout=AirflowProviderPackageCatalogConnector.REQUEST_TIMEOUT,
                 allow_redirects=True,

elyra/pipeline/catalog_connector.py

Lines changed: 23 additions & 17 deletions
@@ -26,10 +26,11 @@
 from typing import Dict
 from typing import List
 from typing import Optional
+from urllib.parse import urlparse

 from deprecation import deprecated
 from jupyter_core.paths import ENV_JUPYTER_PATH
-from requests import get
+from requests import session
 from requests.auth import HTTPBasicAuth
 from traitlets.config import LoggingConfigurable
 from traitlets.traitlets import default
@@ -40,6 +41,7 @@
 from elyra.pipeline.component import Component
 from elyra.pipeline.component import ComponentParameter
 from elyra.pipeline.runtime_type import RuntimeProcessorType
+from elyra.util.url import FileTransportAdapter


 class EntryData(object):
@@ -636,24 +638,28 @@ def get_entry_data(
             individual catalog entries
         """
         url = catalog_entry_data.get("url")
-
-        # determine whether authentication needs to be performed
-        auth_id = catalog_metadata.get("auth_id")
-        auth_password = catalog_metadata.get("auth_password")
-        if auth_id and auth_password:
-            auth = HTTPBasicAuth(auth_id, auth_password)
-        elif auth_id or auth_password:
-            self.log.error(
-                f"Error. URL catalog connector '{catalog_metadata.get('display_name')}' "
-                "is not configured properly. "
-                "Authentication requires a user id and password or API key."
-            )
-            return None
-        else:
-            auth = None
+        pr = urlparse(url)
+        auth = None
+
+        if pr.scheme != "file":
+            # determine whether authentication needs to be performed
+            auth_id = catalog_metadata.get("auth_id")
+            auth_password = catalog_metadata.get("auth_password")
+            if auth_id and auth_password:
+                auth = HTTPBasicAuth(auth_id, auth_password)
+            elif auth_id or auth_password:
+                self.log.error(
+                    f"Error. URL catalog connector '{catalog_metadata.get('display_name')}' "
+                    "is not configured properly. "
+                    "Authentication requires a user id and password or API key."
+                )
+                return None

         try:
-            res = get(
+            requests_session = session()
+            if pr.scheme == "file":
+                requests_session.mount("file://", FileTransportAdapter())
+            res = requests_session.get(
                 url,
                 timeout=UrlComponentCatalogConnector.REQUEST_TIMEOUT,
                 allow_redirects=True,
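
All three connectors now share the same decision: credentials are ignored for `file` URLs, and for any other scheme either both values or neither must be present. A minimal standalone sketch of that rule follows. The function `resolve_basic_auth` and the exception `CatalogConfigError` are hypothetical names introduced for illustration. Unlike the connectors, which log an error and return early, this sketch raises an exception, and it returns a plain `(user, password)` tuple, which `requests` accepts as basic authentication, instead of an `HTTPBasicAuth` object.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


class CatalogConfigError(ValueError):
    """Hypothetical error for an incomplete credential configuration."""


def resolve_basic_auth(
    url: str, auth_id: Optional[str], auth_password: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return credentials for the request, or None if none apply."""
    # Local files are read directly, so configured credentials are not used.
    if urlparse(url).scheme == "file":
        return None
    # Both values present: authenticate.
    if auth_id and auth_password:
        return (auth_id, auth_password)
    # Exactly one value present: the configuration is incomplete.
    if auth_id or auth_password:
        raise CatalogConfigError("Authentication requires a user id and password or API key.")
    # Neither value present: anonymous access.
    return None
```

Keeping the rule in one function would also remove the three near-identical copies introduced by the diff, although the commit itself keeps the logic inline in each connector.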

elyra/tests/pipeline/airflow/test_airflow_package_connector.py

Lines changed: 26 additions & 0 deletions
@@ -15,6 +15,7 @@
 #

 import io
+from pathlib import Path
 import zipfile

 from elyra.pipeline.airflow.package_catalog_connector.airflow_package_catalog_connector import (
@@ -84,6 +85,31 @@ def test_invalid_download_input(requests_mock):
     assert len(ce) == 0


+def test_invalid_get_entry_data():
+    """
+    Validate that AirflowPackageCatalogConnector.get_entry_data(...) returns
+    the expected results for invalid inputs
+    """
+    apc = AirflowPackageCatalogConnector(AIRFLOW_SUPPORTED_FILE_TYPES)
+
+    # Test invalid "file://" inputs ...
+    # ... input refers to a directory
+    resource_location = Path(__file__).parent / ".." / "resources" / "components"
+    resource_url = resource_location.as_uri()
+    ce = apc.get_catalog_entries({"airflow_package_download_url": resource_url, "display_name": "file://is-a-dir-test"})
+    assert isinstance(ce, list), resource_url
+    assert len(ce) == 0
+
+    # ... input refers to a non-existing whl file
+    resource_location = Path(__file__).parent / ".." / "resources" / "components" / "no-such.whl"
+    resource_url = resource_location.as_uri()
+    ce = apc.get_catalog_entries(
+        {"airflow_package_download_url": resource_url, "display_name": "file://no-such-file-test"}
+    )
+    assert isinstance(ce, list), resource_url
+    assert len(ce) == 0
+
+
 # -----------------------------------
 # Long running test(s)
 # ----------------------------------
