Merged
ea78881
PEP 9999: Recording provenance of installed packages
Mar 27, 2023
81a9dd7
Rename to PEP-710
Mar 27, 2023
29b86f8
Add PEP-710 to CODEOWNERS
Mar 28, 2023
ac86eda
Apply suggestions from code review
Mar 28, 2023
51ccbed
Apply suggestions from code review
Mar 28, 2023
1d394c4
Apply suggestions from code review
Mar 28, 2023
8a86906
Remove duplicate topic
Mar 28, 2023
3f0478b
Add Christopher A. M. Gerlach to the Acknowledgements section
Mar 28, 2023
c99e676
Fix name in the Acknowledgements section
Mar 28, 2023
d2cb745
Move Backwards Compatibility after Specification
Mar 29, 2023
a4334fb
Add How to Teach This section
Mar 29, 2023
e1b3106
Add Security Implications section
Mar 29, 2023
28d93a0
Add Reference Implementation section
Mar 29, 2023
8f2e4e4
Fix reference to pip-preserve
Mar 29, 2023
96f0a5e
Apply suggestions from code review
Mar 30, 2023
9eb94f8
s/*.dist-info/.dist-info/
Mar 30, 2023
2356439
Add Rationale section
Mar 30, 2023
ca729f8
Fix reference to a term
Mar 30, 2023
00ec0ea
Use a reference to the pip installation report thraed
Mar 30, 2023
bc55397
Apply suggestions from code review
Mar 30, 2023
de7cf45
Adjust Backwards Compatibility section
Mar 31, 2023
2a29627
State main difference between direct_url.json and provenance_url.json
Mar 31, 2023
3b09caf
State Conda's conda-meta directory created by Conda
Mar 31, 2023
8cb9ce9
Mention compatibility considerations with direct_url.json
Mar 31, 2023
7939192
Remove a leftover from review
Mar 31, 2023
b400b39
Fix links to project sites
Mar 31, 2023
eb3efa9
Apply suggestions from code review
Mar 31, 2023
6c9e95c
Create appendix for the tools survey
Mar 31, 2023
dfb21eb
Apply suggestions from code review
Apr 2, 2023
357 changes: 357 additions & 0 deletions pep-0710.rst
PEP: 710
Title: Recording provenance of installed packages
Author: Fridolín Pokorný <fridolin.pokorny at gmail.com>
Sponsor: Donald Stufft <[email protected]>
PEP-Delegate: Paul Moore <[email protected]>
Discussions-To: https://discuss.python.org/t/draft-pep-recording-provenance-of-installed-packages/24838
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 27-Mar-2023
Post-History:

Abstract
========

This PEP describes a way to record the provenance of installed Python
distributions. The record is created by an installer and is available to users
in the form of a JSON file, ``provenance_url.json``, in the ``.dist-info``
directory. The file captures additional metadata that records the URL of a
Python distribution together with the hash of the installed distribution. This
proposal is built on top of :pep:`610`, following `its corresponding canonical
PyPA spec
<https://packaging.python.org/en/latest/specifications/direct-url/>`__, and
complements ``direct_url.json`` with a ``provenance_url.json`` file for
packages that are identified by a name and, optionally, a version.

Motivation
==========

Installing a Python package involves downloading a distribution from an index
and extracting its content to an appropriate place. After the installation
process is done, information about the distribution used as well as its source
is generally lost. Nevertheless, there are use cases for keeping records of
distributions used for installing packages and their provenance.

Python wheels can be built with different compiler flags or support different
wheel tags. In both cases, installers might consider multiple wheels (possibly
from different package indexes), and it can be helpful to find out immediately
which wheel file was actually used during the installation. Developers can then
use information about the wheel to debug issues and to make sure the desired
wheel was actually installed. Another use case is tools that report installed
software, such as tools generating an SBOM (Software Bill of Materials), which
could give more accurate reports. Yet another use case is reconstructing a
Python environment by pinning each installed package to the specific
distribution consumed from a Python package index.

The motivation described in this PEP is an extension to :pep:`610`. Besides
stating information about packages installed using a direct URL, installers
SHOULD also record information for packages installed from Python package
indexes when they are identified by their name, and optionally their version.

Specification
=============

The ``provenance_url.json`` file SHOULD be created in the ``*.dist-info``
directory by installers when installing a distribution identified by its
name and, optionally, its version specifier.

This file MUST NOT be created when installing a distribution from a requirement
specifying a direct URL reference (including a VCS URL).

Only one of the files ``provenance_url.json`` and ``direct_url.json`` (from
:pep:`610`) MAY be present in a given ``*.dist-info`` directory.

The ``provenance_url.json`` JSON file MUST be a dictionary, compliant with
:rfc:`8259` and UTF-8 encoded.

If present, it MUST contain exactly two keys. The first one is ``url``, with
type ``string``. The second key MUST be ``archive_info`` with a value defined
below.

The ``url`` field MUST state a URL to the installed distribution. If a wheel is
built from a source distribution, the ``url`` field MUST point to the used
source distribution. On the other hand, when a wheel is installed, the
``url`` field MUST keep a URL of the installed wheel. Following :pep:`610`, the
``url`` field MUST be stripped of any sensitive authentication information, for
security reasons.

The user:password section of the URL MAY however be composed of environment
variables, matching the following regular expression::

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?

Additionally, the user:password section of the URL MAY be a well-known,
non-security sensitive string. A typical example is ``git`` in the case of a
URL such as ``ssh://git@gitlab.com``.
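
The two rules above can be illustrated with a minimal sketch. It is not part
of this PEP: the helper name ``userinfo_is_allowed`` and the set
``WELL_KNOWN_USERINFO`` are assumptions made for the example, and an installer
may decide differently which strings count as well known.

.. code-block:: python

    import re
    from urllib.parse import urlsplit

    # Regular expression quoted from the specification text above.
    ENV_VAR_USERINFO = re.compile(
        r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?"
    )

    # Assumption: the only "well-known" user name accepted by this sketch.
    WELL_KNOWN_USERINFO = {"git"}


    def userinfo_is_allowed(url):
        """Return True if the user:password part of ``url`` may be recorded."""
        netloc = urlsplit(url).netloc
        userinfo, separator, _host = netloc.rpartition("@")
        if not separator:
            # No user:password section at all.
            return True
        if userinfo in WELL_KNOWN_USERINFO:
            return True
        # Only references to environment variables are acceptable.
        return ENV_VAR_USERINFO.fullmatch(userinfo) is not None

A URL for which this check fails would need its user:password section removed
before being written to the ``url`` field.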

The value of ``archive_info`` MUST be a dictionary with a single key
``hashes``. The ``hashes`` key is a dictionary mapping a hash name to a
hex-encoded digest of the file referenced by the ``url`` field. Multiple hashes
can be included, and it is up to the consumer to decide what to do with
multiple hashes (it may validate all of them or a subset of them, or nothing at
all).

Each hash MUST be one of the single-argument hashes provided by
``hashlib.algorithms_guaranteed``, except for ``sha1`` and ``md5``. At the time
of writing this PEP, the only multi-argument hashes are ``shake_128`` and
``shake_256``, so the permitted hash names are:

.. code-block:: python

    >>> import hashlib
    >>> sorted(hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"})
    ['blake2b', 'blake2s', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512']

Each hash MUST be referenced by the canonical name of the hash, always lower case.

Hashes ``sha1`` and ``md5`` MUST NOT be present, respecting security
limitations of these hash algorithms. On the other hand, hash ``sha256`` SHOULD
be included.
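
As an illustration of the rules above, an installer could compute the
``hashes`` dictionary for a downloaded file roughly as follows. This is a
sketch under stated assumptions, not the reference implementation: the helper
name ``compute_hashes`` and the chunk size are chosen for the example only.

.. code-block:: python

    import hashlib

    # Hash names permitted by the specification text above.
    ALLOWED_HASHES = hashlib.algorithms_guaranteed - {
        "shake_128", "shake_256", "sha1", "md5",
    }


    def compute_hashes(path, names=("sha256",)):
        """Return a ``hashes`` mapping for the file at ``path``."""
        for name in names:
            if name not in ALLOWED_HASHES:
                raise ValueError(f"hash name {name!r} is not allowed")
        hashers = {name: hashlib.new(name) for name in names}
        with open(path, "rb") as stream:
            # Read in chunks so large distributions need not fit in memory.
            for chunk in iter(lambda: stream.read(65536), b""):
                for hasher in hashers.values():
                    hasher.update(chunk)
        return {name: hasher.hexdigest() for name, hasher in hashers.items()}

The default of ``("sha256",)`` follows the recommendation that ``sha256``
SHOULD be included; non-canonical spellings such as ``SHA-256`` are rejected
because they are not members of ``hashlib.algorithms_guaranteed``.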

Installers that cache distributions from an index SHOULD keep the information
related to each cached distribution, so that the ``provenance_url.json`` file
can be created even when a distribution is installed from the installer's
cache.

Examples
========

Examples of a valid provenance_url.json
---------------------------------------

A valid ``provenance_url.json`` stating multiple hashes:

.. code:: json

    {
      "archive_info": {
        "hashes": {
          "blake2s": "fffeaf3d0bd71dc960ca2113af890a2f2198f2466f8cd58ce4b77c1fc54601ff",
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
          "sha3_256": "c856930e0f707266d30e5b48c667a843d45e79bb30473c464e92dfa158285eab",
          "sha512": "6bad5536c30a0b2d5905318a1592948929fbac9baf3bcf2e7faeaf90f445f82bc2b656d0a89070d8a6a9395761f4793c83187bd640c64b2656a112b5be41f73d"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

A valid ``provenance_url.json`` stating a single hash entry:

.. code:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

A valid ``provenance_url.json`` stating a source distribution which was used to
build and install a wheel:

.. code:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "8bfe29f17c10e2f2e619de8033a07a224058d96b3bfe2ed61777596f7ffd7fa9"
        }
      },
      "url": "https://files.pythonhosted.org/packages/1d/43/ad8ae671de795ec2eafd86515ef9842ab68455009d864c058d0c3dcf680d/micropipenv-0.0.1.tar.gz"
    }

Examples of an invalid provenance_url.json
------------------------------------------

The following example includes a ``hash`` key in the ``archive_info``
dictionary, as originally designed in :pep:`610` and in the data structure
documented in [3]_. The ``hash`` key MUST NOT be present, to prevent any
possible confusion with ``hashes`` and to avoid the additional checks that
would be required to keep hash values in sync.

.. code:: json

    {
      "archive_info": {
        "hash": "sha256=236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }

Another example demonstrates an invalid hash name. The referenced hash name
does not correspond to the canonical hash name described in this PEP and in
the `Python docs
<https://docs.python.org/3/library/hashlib.html#hashlib.hash.name>`__.

.. code:: json

    {
      "archive_info": {
        "hashes": {
          "SHA-256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
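
The structural rules that separate the valid examples from the invalid ones
can be expressed as a small checker. The following is a minimal sketch, not a
normative validator: the function name ``find_problems`` and its messages are
hypothetical, and it checks only the key layout and hash names, not the URL or
the digest values.

.. code-block:: python

    import hashlib
    import json

    # Hash names permitted by the Specification section.
    ALLOWED_HASHES = hashlib.algorithms_guaranteed - {
        "shake_128", "shake_256", "sha1", "md5",
    }


    def find_problems(text):
        """Return a list of problems found in a provenance_url.json text."""
        data = json.loads(text)
        if not isinstance(data, dict):
            return ["top-level value is not a dictionary"]
        problems = []
        if set(data) != {"url", "archive_info"}:
            problems.append("expected exactly the keys url and archive_info")
        if not isinstance(data.get("url"), str):
            problems.append("url is not a string")
        archive_info = data.get("archive_info")
        if not isinstance(archive_info, dict) or set(archive_info) != {"hashes"}:
            # Also catches the legacy "hash" key shown above.
            problems.append("archive_info must have the single key hashes")
            return problems
        hashes = archive_info["hashes"]
        if not isinstance(hashes, dict):
            problems.append("hashes is not a dictionary")
            return problems
        for name in hashes:
            if name not in ALLOWED_HASHES:
                problems.append(f"hash name {name!r} is not allowed")
        return problems

An empty list means that no problem was found by these particular checks.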


Example pip commands and their effect on provenance_url.json and direct_url.json
--------------------------------------------------------------------------------

The following commands generate a ``direct_url.json`` file but do not generate
a ``provenance_url.json`` file. These examples follow the examples from
:pep:`610`:

* ``pip install https://example.com/app-1.0.tgz``
* ``pip install https://example.com/app-1.0.whl``
* ``pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"``
* ``pip install ./app``
* ``pip install file:///home/user/app``
* ``pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"`` (in which case, ``url`` will be the local directory where the git repository has been cloned to, and ``dir_info`` will be present with ``"editable": true`` and no ``vcs_info`` will be set)
* ``pip install -e ./app``

The following commands generate a ``provenance_url.json`` file but do not
generate a ``direct_url.json`` file:

* ``pip install app``
* ``pip install app~=2.2.0``
* ``pip install app --no-index --find-links "https://example.com/"``

This behaviour can be tested using changes to pip introduced in [1]_.
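
One way a consumer could find out which of the two files an installer
recorded is to read them through ``importlib.metadata``. The sketch below is
an illustration rather than part of this PEP; the helper name
``recorded_origin`` is hypothetical.

.. code-block:: python

    import json
    from importlib import metadata


    def recorded_origin(dist):
        """Return (file name, parsed content) of the origin record, or None.

        ``dist`` is an ``importlib.metadata.Distribution`` instance.
        """
        for filename in ("provenance_url.json", "direct_url.json"):
            # read_text() returns None when the file is not present.
            text = dist.read_text(filename)
            if text is not None:
                return filename, json.loads(text)
        return None

For an installed package, the function could be called as
``recorded_origin(metadata.distribution("app"))``, where ``app`` stands for the
name of the installed distribution.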

Rejected Ideas
==============

Naming the file direct_url.json instead of provenance_url.json
--------------------------------------------------------------

To preserve backwards compatibility with :pep:`610`, the file cannot be named
``direct_url.json``. :pep:`610` restricts when that file may be created:

    This file MUST NOT be created when installing a distribution from an other
    type of requirement (i.e. name plus version specifier).

Creating ``direct_url.json`` for such requirements might introduce backwards
compatibility issues for consumers of ``direct_url.json`` who rely on its
presence only when distributions are installed using a direct URL reference.

Deprecate direct_url.json and use only provenance_url.json
----------------------------------------------------------

File ``direct_url.json`` is already well established in :pep:`610` and is
already used by installers. For example, ``pip`` uses ``direct_url.json`` to
report a direct URL reference on ``pip freeze``. Deprecating
``direct_url.json`` would require additional changes to the ``pip freeze``
implementation in pip (see [2]_) and could introduce backwards compatibility
issues for already existing ``direct_url.json`` consumers.

Keeping the hash key in the archive_info dictionary
---------------------------------------------------

:pep:`610` and `its corresponding canonical PyPA spec
<https://packaging.python.org/en/latest/specifications/direct-url/>`__ discuss
the possibility to state ``hash`` key alongside the ``hashes`` key in the
``archive_info`` dictionary. This PEP explicitly discards the ``hash`` key in
the ``provenance_url.json`` file and expects only ``hashes`` key to be present.
By doing so we eliminate possible redundancy in the file, possible confusion,
and any additional checks that would need to be done to make sure hashes are in
sync.

Making the hashes field optional
--------------------------------

:pep:`610` and `its corresponding canonical PyPA spec
<https://packaging.python.org/en/latest/specifications/direct-url/>`__
recommend stating the ``hashes`` field of the ``archive_info`` in the
``direct_url.json`` file but allow omitting it under certain circumstances,
following :rfc:`2119`:

    A hashes key SHOULD be present as a dictionary mapping a hash name to a hex
    encoded digest of the file.

This PEP requires the ``hashes`` field of the ``archive_info`` to be present
in the ``provenance_url.json`` file whenever that file is created:

    The value of ``archive_info`` MUST be a dictionary with a single key
    ``hashes``.

By doing so, consumers of the ``provenance_url.json`` file can always check
artifact digests when the file is created by installers.
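
Such a check could look like the following minimal sketch. It assumes the
consumer already holds the artifact content as bytes and the parsed
``provenance_url.json`` as a dictionary; the helper name ``digests_match`` is
hypothetical, and verifying every recorded hash is only one of the policies a
consumer may choose.

.. code-block:: python

    import hashlib


    def digests_match(artifact, provenance):
        """Return True if every recorded digest matches ``artifact``."""
        hashes = provenance["archive_info"]["hashes"]
        if not hashes:
            # Nothing recorded means nothing was verified.
            return False
        return all(
            hashlib.new(name, artifact).hexdigest() == expected.lower()
            for name, expected in hashes.items()
        )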

Open Issues
===========

Availability of the provenance_url.json file in Conda
-----------------------------------------------------

We would like to get feedback on the ``provenance_url.json`` file from Conda
maintainers or developers. It is not clear whether Conda would like to adopt
the ``provenance_url.json`` file.

Using provenance_url.json in downstream installers
--------------------------------------------------

The proposed ``provenance_url.json`` file is meant to be adopted primarily by
Python installers. Other installers, such as apt or dnf, might record the
provenance of installed downstream Python distributions in their own way,
specific to downstream package management. The proposed file is not expected
to be created by these downstream package installers, and thus they were
intentionally left out of this PEP. However, any input from developers or
maintainers of these installers is valuable and could enrich the
``provenance_url.json`` file with additional useful information.

Backwards Compatibility
=======================

Since this PEP specifies a new file in the ``*.dist-info`` directory, there are
no backwards compatibility implications to consider in the ``provenance_url.json``
file itself. Also, this proposal does not make any changes to the
``direct_url.json`` described in :pep:`610` and `its corresponding canonical
PyPA spec
<https://packaging.python.org/en/latest/specifications/direct-url/>`__.

The content of the ``provenance_url.json`` file was designed to eventually
allow installers to reuse some of the logic supporting :pep:`610` when a
direct URL refers to a source archive or a wheel.

References
==========

The following changes were made to pip to support this PEP:

.. [1] `A patch to pip introducing provenance_url.json as discussed in this PEP
   <https://github.com/fridex/pip/pull/1/>`__

.. [2] `Changes to pip to support the decision for creating
   provenance_url.json instead of stating provenance in already existing
   direct_url.json <https://github.com/fridex/pip/pull/2/>`__

.. [3] `Direct URL Data Structure
   <https://packaging.python.org/en/latest/specifications/direct-url-data-structure/>`__

Acknowledgements
================

Thanks to Dustin Ingram, Brett Cannon, and Paul Moore for the initial
discussion in which this idea originated.

Thanks to Donald Stufft, Ofek Lev, and Trishank Kuppusamy for early feedback
and support to work on this PEP.

Thanks to Gregory P. Smith and Stéphane Bidoul for reviewing this PEP and
providing valuable suggestions.

Thanks to Stéphane Bidoul and Chris Jerdonek for :pep:`610`.

Last, but not least, thanks to Donald Stufft for sponsoring this PEP.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.