21 Apr 15:12

idanov

ae9f15c

0.17.3

Release 0.17.3

Major features and improvements

Kedro plugins can now override built-in CLI commands.
Added a before_command_run hook for plugins to add extra behaviour before Kedro CLI commands run.
Pipelines from pipeline_registry.py and register_pipelines hooks are now loaded lazily, when they are first accessed rather than on startup:

from kedro.framework.project import pipelines

print(pipelines["__default__"])  # pipeline loading is only triggered here
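The lazy behaviour can be pictured with a small stand-alone sketch. LazyPipelines and load_registry are hypothetical names used only for illustration. This is not Kedro's implementation of kedro.framework.project.pipelines, just the same idea of deferring the expensive work until the first key lookup and caching the result afterwards.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator, Optional


class LazyPipelines(Mapping):
    """Hypothetical read-only mapping that defers loading until first access."""

    def __init__(self, loader: Callable[[], Dict[str, object]]) -> None:
        self._loader = loader
        self._content: Optional[Dict[str, object]] = None  # not loaded yet

    def _load(self) -> Dict[str, object]:
        # Run the loader once, on first access, and cache what it returns.
        if self._content is None:
            self._content = self._loader()
        return self._content

    def __getitem__(self, key: str) -> object:
        return self._load()[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._load())

    def __len__(self) -> int:
        return len(self._load())


calls = []


def load_registry() -> Dict[str, object]:
    # Hypothetical stand-in for importing pipeline_registry.py and running hooks.
    calls.append("loaded")
    return {"__default__": "default pipeline", "data_processing": "dp pipeline"}


pipelines = LazyPipelines(load_registry)
assert calls == []               # creating the mapping loads nothing
print(pipelines["__default__"])  # loading is only triggered here
```

Because the loader runs only once, later lookups reuse the cached dictionary, which is what keeps startup cheap when no pipeline is ever requested.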

Bug fixes and other changes

TemplatedConfigLoader now correctly inserts default values when no globals are supplied.
Fixed a bug where the KEDRO_ENV environment variable had no effect on instantiating the context variable in an IPython session or a Jupyter notebook.
Plugins with empty CLI groups are no longer displayed in the Kedro CLI help screen.
Duplicate commands will no longer appear twice in the Kedro CLI help screen.
CLI commands from sources with the same name will show under one list in the help screen.
The setup of a Kedro project, including adding src to path and configuring settings, is now handled via the bootstrap_project method.
configure_project is invoked if a package_name is supplied to KedroSession.create. This was added for backward compatibility, to support workflows that create a Session manually. It will be removed in 0.18.0.
Stopped swallowing all ModuleNotFoundError exceptions when register_pipelines is not found, so that a more helpful error message appears when a dependency is missing, e.g. Issue #722.
When kedro new is invoked using a configuration yaml file, output_dir is no longer a required key; by default the current working directory will be used.
When kedro new is invoked using a configuration yaml file, the appropriate prompts.yml file is now used for validating the provided configuration. Previously, validation was always performed against the kedro project template prompts.yml file.
When a relative path to a starter template is provided, kedro new now generates user prompts to obtain configuration rather than supplying empty configuration.
Fixed error when using starters on Windows with Python 3.7 (Issue #722).
Fixed decoding error of config files that contain accented characters by opening them for reading in UTF-8.
Fixed an issue where the after_dataset_loaded hook could finish running before a dataset was actually loaded when using the --async flag.
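The ordering that the after_dataset_loaded fix is meant to guarantee can be sketched as follows, assuming datasets are loaded in a thread pool when --async is used. load_dataset, after_dataset_loaded and load_inputs_async are hypothetical stand-ins (the hook name only mirrors Kedro's), not Kedro's runner code: the point is that the hook runs in the same task as the load, strictly after the load returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

events: List[str] = []


def load_dataset(name: str) -> str:
    # Hypothetical slow loader standing in for a real dataset's load().
    time.sleep(0.05)
    events.append(f"loaded {name}")
    return f"data for {name}"


def after_dataset_loaded(name: str, data: str) -> None:
    # Hypothetical hook: it must only ever see fully loaded data.
    events.append(f"hook {name}")


def load_and_notify(name: str) -> str:
    data = load_dataset(name)         # wait for the load to finish first...
    after_dataset_loaded(name, data)  # ...then fire the hook with the result
    return data


def load_inputs_async(names: List[str]) -> Dict[str, str]:
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(load_and_notify, name) for name in names}
        # result() blocks until each load, and its hook, has completed.
        return {name: future.result() for name, future in futures.items()}


inputs = load_inputs_async(["cars", "boats"])
print(inputs)
```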

Upcoming deprecations for Kedro 0.18.0

kedro.versioning.journal.Journal will be removed.
The following properties on kedro.framework.context.KedroContext will be removed:
- io in favour of KedroContext.catalog
- pipeline (equivalent to pipelines["__default__"])
- pipelines in favour of kedro.framework.project.pipelines

15 Mar 18:10

idanov

0.17.2

eda6762

0.17.2

Release 0.17.2

Major features and improvements

Added support for the compress_pickle backend to PickleDataSet.
Enabled loading pipelines without creating a KedroContext instance:

from kedro.framework.project import pipelines

print(pipelines)

Projects generated with kedro>=0.17.2:
- should define pipelines in pipeline_registry.py rather than hooks.py.
- when run as a package, will behave the same as kedro run.

Bug fixes and other changes

If settings.py is not importable, the errors will be surfaced earlier in the process, rather than at runtime.

Minor breaking changes to the API

kedro pipeline list and kedro pipeline describe no longer accept the redundant --env parameter.
The cli object obtained via from kedro.framework.cli.cli import cli no longer includes the new and starter commands.

Upcoming deprecations for Kedro 0.18.0

kedro.framework.context.KedroContext.run will be removed in release 0.18.0.

Thanks for supporting contributions

Sasaki Takeru

04 Mar 14:31

idanov

0.17.1

535570a

0.17.1

Release 0.17.1

Major features and improvements

Added env and extra_params to the reload_kedro() line magic.
Extended the pipeline() API to allow strings and sets of strings as inputs and outputs, to specify when a dataset name remains the same (not namespaced).
Added the ability to add custom prompts with a regexp validator for starters, by repurposing default_config.yml as prompts.yml.
Added the env and extra_params arguments to register_config_loader hook.
Refactored the way settings are loaded. You will now be able to run:

from kedro.framework.project import settings

print(settings.CONF_ROOT)
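The naming rule for pipeline() inputs and outputs passed as a string or a set of strings can be sketched in plain Python. namespaced_name is a hypothetical helper; it assumes namespaced names take the form <namespace>.<name> and ignores everything else pipeline() does (explicit renaming via a dict and any other special cases).

```python
from typing import Iterable, Optional, Set, Union


def namespaced_name(
    name: str,
    namespace: Optional[str],
    keep: Union[str, Iterable[str], None] = None,
) -> str:
    """Hypothetical helper: prefix `name` with `namespace` unless it is in `keep`."""
    if keep is None:
        keep_set: Set[str] = set()
    elif isinstance(keep, str):
        keep_set = {keep}  # a single string keeps exactly one name
    else:
        keep_set = set(keep)  # a set (or other iterable) keeps several names
    if namespace is None or name in keep_set:
        return name  # left as-is, i.e. not namespaced
    return f"{namespace}.{name}"


inputs = ["companies", "shuttles"]
print([namespaced_name(n, "data_processing", keep="shuttles") for n in inputs])
# ['data_processing.companies', 'shuttles']
```

Accepting a bare string as well as a set mirrors the convenience described in the note: one shared dataset can be named directly, several can be listed together.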

Bug fixes and other changes

The version of a packaged modular pipeline now defaults to the version of the project package.
Added a fix to prevent new lines being added to pandas CSV datasets.
Fixed issue with loading a versioned SparkDataSet in the interactive workflow.
Kedro CLI now checks pyproject.toml for a tool.kedro section before treating the project as a Kedro project.
Fixed DataCatalog::shallow_copy so that it now also copies layers.
kedro pipeline pull now uses pip download for protocols that are not supported by fsspec.
Cleaned up documentation to fix broken links and rewrite permanently redirected ones.
Added a jsonschema schema definition for the Kedro 0.17 catalog.
kedro install now waits on Windows until all the requirements are installed.
Exposed --to-outputs option in the CLI, throughout the codebase, and as part of hooks specifications.
Fixed a bug where ParquetDataSet wasn't creating parent directories on the fly.
Updated documentation.

Breaking changes to the API

This release has broken the kedro ipython and kedro jupyter workflows. To fix this, follow the instructions in the migration guide below.

Note: If you're using the ipython extension instead, you will not encounter this problem.

Migration guide

You will have to update the file <your_project>/.ipython/profile_default/startup/00-kedro-init.py in order to make kedro ipython and/or kedro jupyter work. Add the following line before the KedroSession is created:

configure_project(metadata.package_name)  # to add

session = KedroSession.create(metadata.package_name, path)

Make sure that the associated import is provided in the same place as others in the file:

from kedro.framework.project import configure_project  # to add
from kedro.framework.session import KedroSession

Thanks for supporting contributions

Mariana Silva,
Kiyohito Kunii,
noklam,
Ivan Doroshenko,
Zain Patel,
Deepyaman Datta,
Sam Hiscox,
Pascal Brokmeier

17 Dec 13:29

idanov

0.17.0

fb88cc2

0.17.0

Release 0.17.0

Major features and improvements

In a significant change, we have introduced KedroSession which is responsible for managing the lifecycle of a Kedro run.
Created a new Kedro starter: kedro new --starter=mini-kedro. With it, you can use the DataCatalog as a standalone component in a Jupyter notebook and later transition into the rest of the Kedro framework.
Added DatasetSpecs with Hooks to run before and after datasets are loaded from/saved to the catalog.
Added a command: kedro catalog create. For a registered pipeline, it creates a <conf_root>/<env>/catalog/<pipeline_name>.yml configuration file with MemoryDataSet datasets for each dataset that is missing from DataCatalog.
Added settings.py and pyproject.toml (to replace .kedro.yml) for project configuration, in line with Python best practice.
ProjectContext is no longer needed, except for very complex customisations. KedroContext, ProjectHooks and settings.py together implement sensible default behaviour. As a result, context_path is also now an optional key in pyproject.toml.
Removed ProjectContext from src/<package_name>/run.py.
TemplatedConfigLoader now supports Jinja2 template syntax alongside its original syntax.
Made registration Hooks mandatory, as the only way to customise the ConfigLoader or the DataCatalog used in a project. If no such Hook is provided in src/<package_name>/hooks.py, a KedroContextError is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.
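The core of what kedro catalog create does for a registered pipeline, as described above, is a set difference between the datasets the pipeline uses and those already in the DataCatalog. The sketch below assumes both are available as plain lists of names; missing_dataset_config is a hypothetical helper, and writing the result out as YAML is left out to avoid a PyYAML dependency.

```python
from typing import Dict, Iterable, Set


def missing_dataset_config(
    pipeline_datasets: Iterable[str], catalog_datasets: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: one MemoryDataSet entry per dataset missing from the catalog."""
    missing: Set[str] = set(pipeline_datasets) - set(catalog_datasets)
    # Sort the names so the generated configuration has a stable order.
    return {name: {"type": "MemoryDataSet"} for name in sorted(missing)}


config = missing_dataset_config(
    pipeline_datasets=["companies", "preprocessed_companies", "model_input_table"],
    catalog_datasets=["companies"],
)
print(config)
```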

Bug fixes and other changes

ParallelRunner no longer causes a run failure when triggered from a notebook, provided the run is started using KedroSession (session.run()).
before_node_run can now overwrite node inputs by returning a dictionary with the corresponding updates.
Added minimal, black-compatible flake8 configuration to the project template.
Moved isort and pytest configuration from <project_root>/setup.cfg to <project_root>/pyproject.toml.
Extra parameters are no longer incorrectly passed from KedroSession to KedroContext.
Relaxed pyspark requirements to allow for installation of pyspark 3.0.
Added a --fs-args option to the kedro pipeline pull command to specify configuration options for the fsspec filesystem arguments used when pulling modular pipelines from non-PyPI locations.
Bumped maximum required fsspec version to 0.9.
Bumped maximum supported s3fs version to 0.5 (S3FileSystem interface has changed since 0.4.1 version).
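The before_node_run change noted above (a returned dictionary overwrites the corresponding node inputs) can be modelled with a simplified runner step. run_node and scale_hook are hypothetical; the real hook also receives other arguments such as the node and the catalog, whereas here only the inputs are passed.

```python
from typing import Any, Callable, Dict, List, Optional

Hook = Callable[[Dict[str, Any]], Optional[Dict[str, Any]]]


def run_node(func: Callable[..., Any], inputs: Dict[str, Any], hooks: List[Hook]) -> Any:
    """Hypothetical runner step: let hooks overwrite inputs before calling the node."""
    for hook in hooks:
        updates = hook(inputs)
        if updates:  # a returned dict replaces only the matching inputs
            inputs = {**inputs, **updates}
    return func(**inputs)


def scale_hook(inputs: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    # Overwrite only the "factor" input; returning None would leave inputs alone.
    return {"factor": 10}


result = run_node(lambda value, factor: value * factor, {"value": 2, "factor": 3}, [scale_hook])
print(result)  # 20
```

Merging into a new dictionary, rather than mutating the original, keeps the caller's inputs untouched if the same mapping is reused elsewhere.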

Deprecations

In Kedro 0.17.0 we have deleted the deprecated kedro.cli and kedro.context modules in favour of kedro.framework.cli and kedro.framework.context respectively.

Other breaking changes to the API

kedro.io.DataCatalog.exists() returns False when the dataset does not exist, as opposed to raising an exception.
The pipeline-specific catalog.yml file is no longer automatically created for modular pipelines when running kedro pipeline create. Use kedro catalog create to replace this functionality.
Removed include_examples prompt from kedro new. To generate boilerplate example code, you should use a Kedro starter.
Changed the --verbose flag from a global command to a project-specific command flag (e.g. kedro --verbose new becomes kedro new --verbose).
Dropped support for the dataset_credentials key in credentials in PartitionedDataSet.
get_source_dir() was removed from kedro/framework/cli/utils.py.
Dropped support for the get_config, create_catalog, create_pipeline, template_version, project_name and project_path keys in the get_project_context() function (kedro/framework/cli/cli.py).
kedro new --starter now defaults to fetching the starter template matching the installed Kedro version.
Renamed kedro_cli.py to cli.py and moved it inside the Python package (src/<package_name>/), for a better packaging and deployment experience.
Removed .kedro.yml from the project template and replaced it with pyproject.toml.
Removed KEDRO_CONFIGS constant (previously residing in kedro.framework.context.context).
Modified kedro pipeline create CLI command to add a boilerplate parameter config file in conf/<env>/parameters/<pipeline_name>.yml instead of conf/<env>/pipelines/<pipeline_name>/parameters.yml. CLI commands kedro pipeline delete / package / pull were updated accordingly.
Removed get_static_project_data from kedro.framework.context.
Removed KedroContext.static_data.
The KedroContext constructor now takes package_name as first argument.
Replaced context property on KedroSession with load_context() method.
Renamed _push_session and _pop_session in kedro.framework.session.session to _activate_session and _deactivate_session respectively.
Custom context class is set via CONTEXT_CLASS variable in src/<your_project>/settings.py.
Removed KedroContext.hooks attribute. Instead, hooks should be registered in src/<your_project>/settings.py under the HOOKS key.
Restricted names given to nodes to match the regex pattern [\w\.-]+$.
Removed KedroContext._create_config_loader() and KedroContext._create_data_catalog(). They have been replaced by registration hooks, namely register_config_loader() and register_catalog() (see also upcoming deprecations).
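A quick way to check existing node names against the restriction above is to apply the quoted pattern yourself. This sketch uses re.fullmatch so that the whole name must consist of word characters, dots and hyphens; whether Kedro itself applies the pattern with match or fullmatch is an assumption, and is_valid_node_name is a hypothetical helper.

```python
import re

# Pattern quoted in the release note; applied here with fullmatch so the
# entire name has to match.
NODE_NAME_PATTERN = re.compile(r"[\w\.-]+$")


def is_valid_node_name(name: str) -> bool:
    return NODE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["preprocess_companies", "split-data.v2", "train model", "load/data"]:
    print(f"{candidate!r}: {is_valid_node_name(candidate)}")
```

Names containing spaces or slashes, like the last two candidates, would need renaming before upgrading.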

Upcoming deprecations for Kedro 0.18.0

kedro.framework.context.load_context will be removed in release 0.18.0.
kedro.framework.cli.get_project_context will be removed in release 0.18.0.
We've added a DeprecationWarning to the decorator API for both node and pipeline. These will be removed in release 0.18.0. Use Hooks to extend a node's behaviour instead.
We've added a DeprecationWarning to the Transformers API when adding a transformer to the catalog. These will be removed in release 0.18.0. Use Hooks to customise the load and save methods.

Thanks for supporting contributions

Deepyaman Datta, Zach Schuster

Migration guide from Kedro 0.16.* to 0.17.*

Reminder: Our documentation on how to upgrade Kedro covers a few key things to remember when updating any Kedro version.

The Kedro 0.17.0 release contains some breaking changes. If you update Kedro to 0.17.0 and then try to work with projects created against earlier versions of Kedro, you may encounter some issues when trying to run kedro commands in the terminal for that project. Here's a short guide to getting your projects running against the new version of Kedro.

Note: As always, if you hit any problems, please check out our documentation:

How can I find out more about Kedro?

How can I get my questions answered?

To get an existing Kedro project to work after you upgrade to Kedro 0.17.0, we recommend that you create a new project against Kedro 0.17.0 and move the code from your existing project into it. Let's go through the changes, but first, note that if you create a new Kedro project with Kedro 0.17.0 you will not be asked whether you want to include the boilerplate code for the Iris dataset example. We've removed this option (you should now use a Kedro starter if you want to create a project that is pre-populated with code).

To create a new, blank Kedro 0.17.0 project to drop your existing code into, you can create one, as always, with kedro new. We also recommend creating a new virtual environment for your new project, or you might run into conflicts with existing dependencies.

Update pyproject.toml: Copy the following three keys from the .kedro.yml of your existing Kedro project into the pyproject.toml file of your new Kedro 0.17.0 project:

[tool.kedro]
package_name = "<package_name>"
project_name = "<project_name>"
project_version = "0.17.0"

Check your source directory. If you defined a different source directory (source_dir), make sure you also move that to pyproject.toml.

Copy files from your existing project:
- Copy subfolders of project/src/project_name/pipelines from existing to new project
- Copy subfolders of project/src/test/pipelines from existing to new project
- Copy the requirements your project needs into requirements.txt and/or requirements.in.
- Copy your project configuration from the conf folder. Take note of the new locations needed for modular pipeline configuration (move it from conf/<env>/pipeline_name/catalog.yml to conf/<env>/catalog/pipeline_name.yml and likewise for parameters.yml).
- Copy from the data/ folder of your existing project, if needed, into the same location in your new project.
- Copy any Hooks from src/<package_name>/hooks.py.
Update your new project's README and docs as necessary.
Update settings.py: For example, if you specified additional Hook implementations in hooks, or listed plugins under disable_hooks_by_plugin in your .kedro.yml, you will need to move them to settings.py accordingly:

from <package_name>.hooks import MyCustomHooks, ProjectHooks

HOOKS = (ProjectHooks(), MyCustomHooks())

DISABLE_HOOKS_FOR_PLUGINS = ("my_plugin1",)

**Mig...

23 Oct 10:42

idanov

0.16.6

029d40a

0.16.6

Major features and improvements

Added documentation with a focus on single-machine and distributed-environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch and AWS SageMaker, and extends our section on Databricks.
Added kedro-starter-spaceflights alias for generating a project: kedro new --starter spaceflights.

Bug fixes and other changes

Fixed TypeError when converting dict inputs to a node made from a wrapped partial function.
PartitionedDataSet improvements:
- Added support for passing arguments to the underlying filesystem.
Improved handling of non-ASCII word characters in dataset names.
- For example, a dataset named jalapeño will be accessible as DataCatalog.datasets.jalapeño rather than DataCatalog.datasets.jalape__o.
Fixed kedro install for an Anaconda environment defined in environment.yml.
Fixed backwards compatibility with templates generated with Kedro versions older than 0.16.5. You no longer need to update .kedro.yml to use kedro lint and kedro jupyter notebook convert.
Improved documentation.
Added documentation on using MinIO with Kedro.
Improved error messages for incorrect parameters passed into a node.
Fixed issue with saving a TensorFlowModelDataset in the HDF5 format with versioning enabled.
Added missing run_result argument in after_pipeline_run Hooks spec.
Fixed a bug in IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in this PR to your 00-kedro-init.py file.
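The jalapeño example above comes down to what counts as a "non-word" character when dataset names are turned into attribute names on DataCatalog.datasets. The sketch assumes the substitution is a regular-expression replacement of runs of non-word characters with __; attribute_name is a hypothetical helper, not Kedro's code. Under ASCII-only matching ñ is a non-word character, while under Python's default Unicode matching it is a letter.

```python
import re


def attribute_name(dataset_name: str, ascii_only: bool = False) -> str:
    """Hypothetical helper: replace runs of non-word characters with '__'."""
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\W+", "__", dataset_name, flags=flags)


print(attribute_name("cars@spark"))                 # cars__spark
print(attribute_name("jalapeño"))                   # jalapeño
print(attribute_name("jalapeño", ascii_only=True))  # jalape__o
```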

Thanks for supporting contributions

Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker

09 Sep 11:21

idanov

0.16.5

f9100f8

0.16.5

Major features and improvements

Added the following new datasets.

| Type | Description | Location |
| --- | --- | --- |
| `email.EmailMessageDataSet` | Manage email messages using the Python standard library | `kedro.extras.datasets.email` |

Added support for pyproject.toml to configure Kedro. pyproject.toml is used if .kedro.yml doesn't exist (Kedro configuration should be under the [tool.kedro] section).
Projects created with this version no longer have a pipeline.py file; it has been replaced by hooks.py.
Added a set of registration hooks, as the new way of registering library components with a Kedro project:
- register_pipelines(), to replace _get_pipelines()
- register_config_loader(), to replace _create_config_loader()
- register_catalog(), to replace _create_catalog()
  These can be defined in src/<package-name>/hooks.py and added to .kedro.yml (or pyproject.toml). The order of execution is: plugin hooks, .kedro.yml hooks, hooks in ProjectContext.hooks.
Added ability to disable auto-registered Hooks using .kedro.yml (or pyproject.toml) configuration file.

Bug fixes and other changes

Added an option to run asynchronously via the Kedro CLI.
Absorbed .isort.cfg settings into setup.cfg.
project_name, project_version and package_name now have to be defined in .kedro.yml for projects generated using Kedro 0.16.5+.
Packaging a modular pipeline raises an error if the pipeline directory is empty or non-existent.

Thanks for supporting contributions

Deepyaman Datta, Bas Nijholt, Sebastian Bertoli

30 Jul 10:22

idanov

0.16.4

e7cf14d

0.16.4

Release 0.16.4

Major features and improvements

Enabled auto-discovery of hook implementations coming from installed plugins.

Bug fixes and other changes

Fixed a bug when using ParallelRunner on Windows.
Modified GBQTableDataSet to load customised results using customised queries from Google BigQuery tables.
Documentation improvements.

Thanks for supporting contributions

Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm

13 Jul 11:37

idanov

0.16.3

7152b41

0.16.3

Release 0.16.3

15 Jun 14:36

idanov

0.16.2

a4fe8d1

0.16.2

Major features and improvements

Added the following new datasets.

| Type | Description | Location |
| --- | --- | --- |
| `pandas.AppendableExcelDataSet` | Works with `Excel` file opened in append mode | `kedro.extras.datasets.pandas` |
| `tensorflow.TensorFlowModelDataset` | Works with `TensorFlow` models using TensorFlow 2.X | `kedro.extras.datasets.tensorflow` |
| `holoviews.HoloviewsWriter` | Works with `Holoviews` objects (saves as image file) | `kedro.extras.datasets.holoviews` |

kedro install will now compile project dependencies (by running kedro build-reqs behind the scenes) before the installation if the src/requirements.in file doesn't exist.
Added only_nodes_with_namespace to the Pipeline class to select only the nodes with a specified namespace.
Added the kedro pipeline delete command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your create_pipelines() code).
Added the kedro pipeline package command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.
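The namespace filter introduced above can be sketched over a stripped-down node type. Node and only_nodes_with_namespace here are hypothetical stand-ins for illustration, not Kedro's classes. This version also keeps nodes in nested namespaces (for example data_science.split when filtering on data_science); whether the real Pipeline method matches nested namespaces in exactly this way is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical, stripped-down node: just a name and an optional namespace."""

    name: str
    namespace: Optional[str] = None


def only_nodes_with_namespace(nodes: List[Node], namespace: str) -> List[Node]:
    """Keep nodes whose namespace equals `namespace` or is nested under it."""
    return [
        n
        for n in nodes
        if n.namespace is not None
        and (n.namespace == namespace or n.namespace.startswith(namespace + "."))
    ]


nodes = [
    Node("clean", "data_processing"),
    Node("train", "data_science"),
    Node("split", "data_science.split"),
    Node("report"),
]
print([n.name for n in only_nodes_with_namespace(nodes, "data_science")])
```

Requiring the "." separator avoids accidentally selecting a sibling namespace that merely shares a prefix, such as data_science_extra.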

Bug fixes and other changes

Improvement in DataCatalog:
- Introduced regex filtering to the DataCatalog.list() method.
- Non-alphanumeric characters (except underscore) in dataset name are replaced with __ in DataCatalog.datasets, for ease of access to transcoded datasets.
Improvement in Datasets:
- Improved initialization speed of spark.SparkHiveDataSet.
- Improved S3 cache in spark.SparkDataSet.
- Added support for options for building a pyarrow table in pandas.ParquetDataSet.
Improvement in kedro build-reqs CLI command:
- kedro build-reqs is now called with -q option and will no longer print out compiled requirements to the console for security reasons.
- All unrecognized CLI options in kedro build-reqs command are now passed to pip-compile call (e.g. kedro build-reqs --generate-hashes).
Improvement in kedro jupyter CLI command:
- Improved the error message shown when running kedro jupyter notebook, kedro jupyter lab or kedro ipython without the Jupyter/IPython dependencies installed.
- Fixed %run_viz line magic for showing kedro viz inside a Jupyter notebook. For the fix to be applied on existing Kedro project, please see the migration guide.
- Fixed the bug in IPython startup script (issue 298).
Documentation improvements:
- Updated community-generated content in FAQ.
- Added find-kedro and kedro-static-viz to the list of community plugins.
- Added the missing pillow.ImageDataSet entry to the documentation.
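The regex filtering added to DataCatalog.list() in this release can be approximated over a plain list of dataset names. list_datasets is a hypothetical helper, and the sketch assumes search semantics (the pattern may match anywhere in the name, so use ^ to anchor at the start).

```python
import re
from typing import Iterable, List, Optional


def list_datasets(names: Iterable[str], regex_search: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return dataset names, optionally filtered by a regex search."""
    names = list(names)
    if regex_search is None:
        return names
    pattern = re.compile(regex_search)
    # pattern.search matches anywhere in the name, not only at the start.
    return [name for name in names if pattern.search(name)]


names = ["raw_companies", "raw_shuttles", "model_input_table", "params:test_size"]
print(list_datasets(names, "^raw_"))  # ['raw_companies', 'raw_shuttles']
```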

Breaking changes to the API

Migration guide from Kedro 0.16.1 to 0.16.2

Guide to apply the fix for `%run_viz` line magic in existing project

Even though this release ships a fix for projects generated with kedro==0.16.2, you will still need to update your existing project after upgrading if it was generated with kedro>=0.16.0,<=0.16.1, for the fix to take effect. Specifically, replace the content of your project's IPython init script, located at .ipython/profile_default/startup/00-kedro-init.py, with the content of this file. You will also need kedro-viz>=3.3.1.

Thanks for supporting contributions

Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky

21 May 12:46

idanov

0.16.1

d291a21

0.16.1

Bug fixes and other changes

Fixed deprecation warnings from kedro.cli and kedro.context when running kedro jupyter notebook.
Fixed a bug where catalog and context were not available in Jupyter Lab and Notebook.
Fixed a bug where kedro build-reqs would fail if you didn't have your project dependencies installed.
