Releases: kedro-org/kedro
1.0.0
Major features and improvements
Data Catalog
- The previously experimental `KedroDataCatalog` has been renamed to `DataCatalog` and is now the default catalog implementation.
- It retains the dict-like interface, supports lazy dataset initialisation, and delivers improved performance.
- While this change is seamless for users following standard Kedro workflows, it introduces a richer API for programmatic use:
- New pipeline-aware commands, available via both the CLI and interactive environments.
- Simplified handling of dataset factories.
- Centralised pattern resolution via the `CatalogConfigResolver` property.
- Ability to serialise the catalog to configuration and reconstruct it from that configuration.
Read more in the Kedro documentation.
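To make "dict-like interface" and "lazy dataset initialisation" concrete, here is a minimal stdlib-only sketch. The `LazyCatalog` class and its `materialised()` helper are hypothetical names invented for illustration; this is not Kedro's `DataCatalog` implementation, only a toy showing the general idea of building a dataset the first time it is accessed.

```python
from collections.abc import Callable, Iterator, MutableMapping
from typing import Any


class LazyCatalog(MutableMapping):
    """Toy dict-like catalog that builds each dataset on first access.

    Hypothetical illustration only; this is not Kedro's DataCatalog.
    """

    def __init__(self, factories: dict[str, Callable[[], Any]]) -> None:
        self._factories = dict(factories)    # name -> zero-argument builder
        self._datasets: dict[str, Any] = {}  # name -> materialised dataset

    def __getitem__(self, name: str) -> Any:
        if name not in self._datasets:
            # Unknown names raise KeyError, like a plain dict.
            self._datasets[name] = self._factories[name]()
        return self._datasets[name]

    def __setitem__(self, name: str, dataset: Any) -> None:
        self._datasets[name] = dataset
        self._factories.pop(name, None)

    def __delitem__(self, name: str) -> None:
        if name not in self:
            raise KeyError(name)
        self._datasets.pop(name, None)
        self._factories.pop(name, None)

    def __contains__(self, name: object) -> bool:
        # Membership checks must not trigger materialisation.
        return name in self._datasets or name in self._factories

    def __iter__(self) -> Iterator[str]:
        # Iterate over names without building anything.
        yield from dict.fromkeys([*self._factories, *self._datasets])

    def __len__(self) -> int:
        return len(set(self._factories) | set(self._datasets))

    def materialised(self) -> list[str]:
        """Names of datasets that have already been built."""
        return sorted(self._datasets)
```

Listing names or checking membership leaves every builder untouched; only `catalog["name"]` runs the builder, and it runs at most once per name.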
Namespaces
- Added support for running multiple namespaces within a single session with the `--namespaces` CLI option and the `namespaces` argument in the `KedroSession.run()` method.
- Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.
- Added stricter validation to dataset names in the `Node` class, ensuring `.` characters are reserved to be used as part of a namespace.
- Added a `prefix_datasets_with_namespace` argument to the `Pipeline` class which allows users to turn on or off the prefixing of the namespace to the node inputs, outputs, and parameters.
- Changed pipeline filtering for namespace to return exact namespace matches instead of partial matches.
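The difference between exact and partial namespace matching can be sketched as below. Both function names are hypothetical, and the sketch assumes that nested child namespaces (for example `data.raw` under `data`) still count as matches; the notes above do not state that detail, so treat it as an assumption rather than Kedro's actual rule.

```python
from typing import Optional


def matches_namespace(node_namespace: Optional[str], target: str) -> bool:
    """Exact match: the node is in `target` or in a namespace nested under it.

    Assumption: '.' separates namespace levels, so 'data.raw' is nested
    under 'data', while 'data_science' is a different namespace.
    """
    if node_namespace is None:
        return False
    return node_namespace == target or node_namespace.startswith(target + ".")


def matches_prefix(node_namespace: Optional[str], target: str) -> bool:
    """Looser partial (prefix) match, shown only for contrast."""
    return node_namespace is not None and node_namespace.startswith(target)
```

With the prefix check, filtering on `data` would also pick up a node in `data_science`; the exact check rejects it.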
Other features and improvements
- Changed the default node name to be formed of the function name used in the node suffixed by a secure hash (SHA-256) based on the function, inputs, and outputs, ensuring uniqueness and improved readability.
- Added an option to select which multiprocessing start method is going to be used on `ParallelRunner` via the `KEDRO_MP_CONTEXT` environment variable.
- Added `--only-missing-outputs` CLI flag to `kedro run`. This flag skips nodes when all their persistent outputs exist.
- Updated `kedro registry describe` to return the node name property instead of creating its own name for the node.
- Removed `pre-commit-hooks` dependency for new project creation.
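As a rough illustration of a default node name built from the function name plus a SHA-256 suffix, consider the sketch below. The helper `default_node_name`, the `__` separator, the payload layout, and the 8-character truncation are all assumptions made for this example; the notes do not specify the exact format Kedro uses, and this sketch hashes only the function's name rather than the function itself.

```python
import hashlib
from typing import Sequence


def default_node_name(
    func_name: str,
    inputs: Sequence[str],
    outputs: Sequence[str],
    digest_chars: int = 8,  # hypothetical truncation length
) -> str:
    """Return '<func_name>__<short sha256>' derived from name, inputs, outputs."""
    payload = "|".join([func_name, ",".join(inputs), ",".join(outputs)])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{func_name}__{digest[:digest_chars]}"
```

The same function wired to different inputs or outputs gets a different suffix, which is what keeps names unique when a function is reused across several nodes.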
Breaking changes to the API
CLI
- The `kedro catalog create` command has been removed.
- The `kedro catalog list`, `kedro catalog rank`, and `kedro catalog resolve` commands have been replaced with the `kedro catalog describe-datasets`, `kedro catalog list-patterns`, and `kedro catalog resolve-patterns` commands, respectively.
- The `kedro run` option `--namespace` has been removed and replaced with `--namespaces`.
- The `kedro micropkg` CLI command has been removed as part of the micro-packaging feature deprecation.
API
- Private methods `_is_project` and `_find_kedro_project` are changed to `is_kedro_project` and `find_kedro_project`.
- Renamed instances of `extra_params` and `_extra_params` to `runtime_params`.
- Removed the `modular_pipeline` module and moved functionality to the `pipeline` module instead.
- Renamed `ModularPipelineError` to `PipelineError`.
- `Pipeline.grouped_nodes_by_namespace()` was replaced with `group_nodes_by(group_by)`, which supports multiple strategies and returns a list of `GroupedNodes`, improving type safety and consistency for deployment plugin integrations.
- Renamed `session_id` parameter to `run_id` in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
- Removed the following `DataCatalog` methods: `_get_dataset()`, `add_all()`, `add_feed_dict()`, `list()`, and `shallow_copy()`.
- Changed the output of `runner.run()` and `session.run()`: it now always returns all pipeline outputs, regardless of catalog configuration.
- Removed the `AbstractRunner.run_only_missing()` method, an older and underused API for partial runs. Please use the `--only-missing-outputs` CLI flag instead.
Documentation changes
- Revamped the look and feel of the Kedro documentation, including a new theme and improved navigation with `mkdocs` as the documentation engine.
- Updated the `DataCatalog` documentation with improved structure and detailed description of new features. See the `DataCatalog` documentation for details.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
Migration guide from Kedro 0.19.* to 1.*
See the migration guide for 1.0.0 in the Kedro documentation.
1.0.0rc3
Major features and improvements
- Changed `DataCatalog.__getitem__` to raise `DatasetNotFoundError` for missing datasets, aligning with expected dictionary behavior.
Bug fixes and other changes
Breaking changes to the API
Upcoming deprecations for Kedro 1.0.0
Documentation changes
Community contributions
1.0.0rc2
Major features and improvements
- Added `--only-missing-outputs` CLI flag to `kedro run`. This flag skips nodes when all their persistent outputs exist.
- Removed the `AbstractRunner.run_only_missing()` method, an older and underused API for partial runs. Please use the `--only-missing-outputs` CLI flag instead.
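The skipping rule described for `--only-missing-outputs` can be sketched in a few lines. This is a simplified stand-in, not Kedro's runner logic: `select_nodes_to_run` is a hypothetical helper, nodes are modelled as a plain mapping from node name to output names, and the sketch assumes that a node with no persistent outputs always runs. It also ignores dependencies between nodes.

```python
from typing import Callable


def select_nodes_to_run(
    nodes: dict[str, list[str]],    # node name -> names of its outputs
    persistent: set[str],           # outputs that are saved outside memory
    exists: Callable[[str], bool],  # reports whether a saved output exists
) -> list[str]:
    """Keep a node unless every one of its persistent outputs already exists."""
    to_run = []
    for name, outputs in nodes.items():
        persisted = [out for out in outputs if out in persistent]
        if persisted and all(exists(out) for out in persisted):
            continue  # everything this node persists is already there
        to_run.append(name)
    return to_run
```

For example, a node writing `model.pkl` and `metrics.json` is skipped only when both files exist; if either is missing, the node is kept.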
Bug fixes and other changes
- Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.
Breaking changes to the API
Upcoming deprecations for Kedro 1.0.0
Documentation changes
Community contributions
1.0.0rc1
Major features and improvements
- Added stricter validation to dataset names in the `Node` class, ensuring `.` characters are reserved to be used as part of a namespace.
- Added a `prefix_datasets_with_namespace` argument to the `Pipeline` class which allows users to turn on or off the prefixing of the namespace to the node inputs, outputs, and parameters.
- Changed the default node name to be formed of the function name used in the node suffixed by a secure hash (SHA-256) based on the function, inputs, and outputs, ensuring uniqueness and improved readability.
- Added an option to select which multiprocessing start method is going to be used on `ParallelRunner` via the `KEDRO_MP_CONTEXT` environment variable.
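Selecting a multiprocessing start method from an environment variable can be done with the standard library alone. The sketch below is an assumption about how such a switch might work, not Kedro's code: `resolve_mp_context` is a hypothetical helper, and both the `spawn` fallback and the validation against `multiprocessing.get_all_start_methods()` are choices made for this example.

```python
import multiprocessing
import os
from collections.abc import Mapping
from typing import Optional


def resolve_mp_context(
    env: Optional[Mapping[str, str]] = None,
    default: str = "spawn",  # hypothetical fallback; available on all platforms
):
    """Return a multiprocessing context chosen via KEDRO_MP_CONTEXT."""
    env = os.environ if env is None else env
    method = env.get("KEDRO_MP_CONTEXT", default)
    available = multiprocessing.get_all_start_methods()
    if method not in available:
        raise ValueError(
            f"Unsupported start method {method!r}; expected one of {available}"
        )
    return multiprocessing.get_context(method)
```

A context obtained this way exposes the usual `Process`, `Queue`, and `Pool` constructors bound to the chosen start method, so the rest of the code does not need to know which method was selected.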
Bug fixes and other changes
- Changed pipeline filtering for namespace to return exact namespace matches instead of partial matches.
- Added support for running multiple namespaces within a single session.
- Updated `kedro registry describe` to return the node name property instead of creating its own name for the node.
Documentation changes
- Updated the `DataCatalog` documentation with improved structure and detailed description of new features.
Community contributions
Breaking changes to the API
- Private methods `_is_project` and `_find_kedro_project` are changed to `is_kedro_project` and `find_kedro_project`.
- Renamed instances of `extra_params` and `_extra_params` to `runtime_params`.
- Removed the `modular_pipeline` module and moved functionality to the `pipeline` module instead.
- Renamed `ModularPipelineError` to `PipelineError`.
- `Pipeline.grouped_nodes_by_namespace()` was replaced with `group_nodes_by(group_by)`, which supports multiple strategies and returns a list of `GroupedNodes`, improving type safety and consistency for deployment plugin integrations.
- The micro-packaging feature and the corresponding `micropkg` CLI command have been removed.
- Renamed `session_id` parameter to `run_id` in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
- Removed the following `DataCatalog` methods: `_get_dataset()`, `add_all()`, `add_feed_dict()`, `list()`, and `shallow_copy()`.
- Removed the CLI command `kedro catalog create`.
- Changed the output of `runner.run()`: it now always returns all pipeline outputs, regardless of catalog configuration.
Migration guide from Kedro 0.19.* to 1.*
See the migration guide for 1.0.0 in the Kedro documentation.
0.19.14
Major features and improvements
- Added execution time to pipeline completion log.
Bug fixes and other changes
- Fixed a recursion error in custom datasets when `_describe()` accessed `self.__dict__`.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.13
Major features and improvements
- Unified `pipeline()` and `Pipeline` into a single module (`kedro.pipeline`), aligning with the `node()`/`Node` design pattern and improving namespace handling.
Bug fixes and other changes
- Fixed bug where project creation workflow would use the `main` branch version of `kedro-starters` instead of the respective release version.
- Fixed namespacing for `confirms` during pipeline creation to support `IncrementalDataset`.
- Fixed bug where `OmegaConf` caused an error during config resolution with runtime parameters.
- Cached `inputs` in `Node` when created from dictionary for better performance.
- Enabled pluggy tracing only when logging level is set to `DEBUG` to speed up the execution of project runs.
Upcoming deprecations for Kedro 1.0.0
- Added a deprecation warning for catalog CLI commands. The `kedro catalog rank`, `kedro catalog list`, and `kedro catalog resolve` commands will be replaced with their alternatives, and the `kedro catalog create` command will be removed.
- Added a deprecation warning for `KedroDataCatalog` that will replace `DataCatalog` while adopting the original `DataCatalog` name.
- Added a deprecation warning for the `--namespace` option for `kedro run`. It will be replaced with the `--namespaces` option, which will allow for running multiple namespaces together.
- The `modular_pipeline` module is deprecated and will be removed in Kedro 1.0.0. Use the `pipeline` module instead.
Note: On March 20th, a security vulnerability, CVE-2024-12215, was identified in Kedro. This issue stems from the deprecated micropackaging functionality, which is scheduled for removal in the upcoming Kedro 1.0 release. While we agree with the CVE assigned, this vulnerability only poses a risk if you pull a malicious micropackage from an untrusted source. If you're concerned, we recommend avoiding the micropackaging feature for now and upgrading to Kedro 1.0 once it's released.
Documentation changes
- Updated Dask deployment docs.
- Added non-jupyter environment integration page (for example Marimo) with dynamic Kedro session loading.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.12
Major features and improvements
- Added `KedroDataCatalog.filter()` to filter datasets by name and type.
- Added `Pipeline.grouped_nodes_by_namespace` property which returns a dictionary of nodes grouped by namespace, intended to be used by plugins to facilitate deployment of namespaced nodes together.
- Added support for cloud storage protocols in `--conf-source`, allowing configuration to be loaded from remote locations such as S3.
Bug fixes and other changes
- Added `DataCatalog` deprecation warning.
- Updated `_LazyDataset` representation when printing `KedroDataCatalog`.
- Fixed `MemoryDataset` to infer `assign` copy mode for Ibis Tables, which previously would be inferred as `deepcopy`.
- Fixed pipeline packaging issue by ensuring `pipelines/__init__.py` exists when creating new pipelines.
- Changed the execution of `SequentialRunner` to not use an executor pool to ensure it's single threaded.
- Fixed `%load_node` magic command to work with Jupyter Notebook `>=7.2.0`.
- Removed `7: Kedro Viz` from Kedro tools.
- Updated node grouping API to only group on first level of namespace.
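Grouping on the first level of a namespace, as mentioned in the last item above, might look like the following sketch. `group_by_top_level_namespace` is a hypothetical helper operating on plain `(node_name, namespace)` tuples rather than Kedro `Node` objects, and placing un-namespaced nodes in a group named after the node itself is an assumption made for this example.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Optional


def group_by_top_level_namespace(
    nodes: Iterable[tuple[str, Optional[str]]],
) -> dict[str, list[str]]:
    """Group node names by the first segment of their dotted namespace."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, namespace in nodes:
        # 'etl.raw.ingest' and 'etl.clean' both land in the 'etl' group.
        key = namespace.split(".", 1)[0] if namespace else name
        groups[key].append(name)
    return dict(groups)
```

A deployment plugin could then map each group to one task or container, so that deeply nested namespaces do not multiply the number of deployable units.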
Documentation changes
- Added documentation for Kedro's support for Delta Lake versioning.
- Added documentation for Kedro's support for Iceberg versioning.
- Added documentation for Kedro's nodes grouping in deployment.
- Fixed a minor grammatical error in Kedro-Viz installation instructions to improve documentation clarity.
- Improved the Kedro VSCode extension documentation.
- Updated the recommendations for nesting namespaces.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.11
Major features and improvements
- Implemented `KedroDataCatalog.to_config()` method that converts the catalog instance into a configuration format suitable for serialization.
- Improved `OmegaConfigLoader` performance.
- Replaced `trufflehog` with `detect-secrets` for detecting secrets within a code base.
- Added support for `%load_ext kedro`.
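The idea behind converting a catalog into a serialisable configuration and rebuilding it later can be shown with a toy round trip. Everything here is hypothetical and independent of Kedro: `CsvDatasetSpec`, `TYPE_REGISTRY`, and the module-level `to_config`/`from_config` functions are stand-ins, and the catalog is just a plain dictionary.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CsvDatasetSpec:
    """Hypothetical dataset description used only for this example."""
    filepath: str
    sep: str = ","


# Maps the 'type' string stored in the config back to a class.
TYPE_REGISTRY = {"CsvDatasetSpec": CsvDatasetSpec}


def to_config(catalog: dict) -> dict:
    """Turn each dataset into a plain dict with a 'type' key."""
    return {
        name: {"type": type(dataset).__name__, **asdict(dataset)}
        for name, dataset in catalog.items()
    }


def from_config(config: dict) -> dict:
    """Rebuild datasets from the plain-dict form produced by to_config."""
    catalog = {}
    for name, entry in config.items():
        params = dict(entry)  # copy so the caller's config is not mutated
        dataset_cls = TYPE_REGISTRY[params.pop("type")]
        catalog[name] = dataset_cls(**params)
    return catalog
```

Because the intermediate form contains only strings and dictionaries, it can be written out with `json.dumps` (or YAML) and loaded back without losing information.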
Bug fixes and other changes
- Added validation to ensure dataset versions consistency across catalog.
- Fixed a bug in project creation when using a custom starter template offline.
- Added `node` import to the pipeline template.
- Updated error message when executing `kedro run` without pipeline.
- Safeguarded hooks when user incorrectly registers a hook class in `settings.py`.
- Fixed parsing paths with query and fragment.
- Removed lowercase transformation in regex validation.
- Moved `kedro-catalog` JSON schema to `kedro-datasets`.
- Updated `Partitioned dataset lazy saving` docs page.
- Fixed `KedroDataCatalog` mutation after pipeline run.
- Made `KedroDataCatalog._datasets` compatible with `DataCatalog._datasets`.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.10
Major features and improvements
- Add official support for Python 3.13.
- Implemented dict-like interface for `KedroDataCatalog`.
- Implemented lazy dataset initializing for `KedroDataCatalog`.
- Project dependencies on both the default template and on starter templates are now explicitly declared in the `pyproject.toml` file, allowing Kedro projects to work with project management tools like `uv`, `pdm`, and `rye`.
Note: KedroDataCatalog is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the KedroDataCatalog or ideas for new features.
Bug fixes and other changes
- Added I/O support for Oracle Cloud Infrastructure (OCI) Object Storage filesystem.
- Fixed `DatasetAlreadyExistsError` for `ThreadRunner` when running a Kedro project and using the runner separately.
Breaking changes to the API
Documentation changes
- Added Databricks Asset Bundles deployment guide.
- Added a new minimal Kedro project creation guide.
- Added example to explain how dataset factories work.
- Updated CLI autocompletion docs with new Click syntax.
- Standardised `.parquet` suffix in docs and tests.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.9
Major features and improvements
- Dropped Python 3.8 support.
- Implemented `KedroDataCatalog` repeating `DataCatalog` functionality with a few API enhancements:
  - Removed `_FrozenDatasets` and access datasets as properties;
  - Added get dataset by name feature;
  - `add_feed_dict()` was simplified to only add raw data;
  - Datasets' initialisation was moved out from `from_config()` method to the constructor.
- Moved development requirements from `requirements.txt` to the dedicated section in `pyproject.toml` for project template.
- Implemented `Protocol` abstraction for the current `DataCatalog` and adding new catalog implementations.
- Refactored `kedro run` and `kedro catalog` commands.
- Moved pattern resolution logic from `DataCatalog` to a separate component, `CatalogConfigResolver`. Updated `DataCatalog` to use `CatalogConfigResolver` internally.
- Made packaged Kedro projects return `session.run()` output to be used when running it in the interactive environment.
- Enhanced `OmegaConfigLoader` configuration validation to detect duplicate keys at all parameter levels, ensuring comprehensive nested key checking.
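Pattern resolution of the kind handled by a dedicated resolver component can be illustrated with a small regex-based sketch. This is not how `CatalogConfigResolver` is implemented: `pattern_to_regex` and `resolve` are hypothetical helpers, patterns are tried in insertion order with no specificity ranking, and the `pandas.CSVDataset` type string is used purely as example data.

```python
import re
from typing import Any, Optional

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '{name}_csv' into a regex with one named group per placeholder."""
    parts = []
    last = 0
    for match in _PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[last:match.start()]))  # literal text
        parts.append(f"(?P<{match.group(1)}>.+)")             # placeholder
        last = match.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))


def resolve(
    dataset_name: str, patterns: dict[str, dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the first matching template with its placeholders filled in."""
    for pattern, template in patterns.items():
        match = pattern_to_regex(pattern).fullmatch(dataset_name)
        if match:
            values = match.groupdict()
            return {
                key: value.format(**values) if isinstance(value, str) else value
                for key, value in template.items()
            }
    return None  # no pattern matched this dataset name
```

Given the pattern `{name}_csv` with a template whose `filepath` is `data/{name}.csv`, the dataset name `cars_csv` resolves to a configuration whose `filepath` is `data/cars.csv`.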
Note: KedroDataCatalog is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the KedroDataCatalog or ideas for new features.
Bug fixes and other changes
- Fixed bug where using dataset factories breaks with `ThreadRunner`.
- Fixed a bug where `SharedMemoryDataset.exists` would not call the underlying `MemoryDataset`.
- Fixed template projects example tests.
- Made credentials loading consistent between `KedroContext._get_catalog()` and `resolve_patterns` so that both use `_get_config_credentials()`.
Breaking changes to the API
- Removed `ShelveStore` to address a security vulnerability.
Documentation changes
- Fix logo on PyPI page.
- Minor language/styling updates.