Skip to content
Merged
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 75 additions & 0 deletions maintainer_guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
## Torchvision maintainers guide

This document aims at documenting user-facing policies / principles used when
developing and maintaining torchvision. Other maintainer info (e.g. release
process) can be found in the internal wiki.

### What is public and what is private?

For the Python API, torchvision largely follows the [torch core
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm never sure how we should call core: torch or PyTorch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

over here it's usually "torch core" or "core", but "PyTorch" isn't ambiguous either. Happy with whatever you prefer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I'm in favor of "torch" since with that we don't have a disconnect in the naming scheme when talking about "torchvision". (I really hate that in conda install pytorch torchvision -c pytorch)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be pedantic but "torch" may refer to the original Lua torch ? https://en.wikipedia.org/wiki/Torch_(machine_learning)

policy](https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation)
which is consistent with other major packages (numpy, scikit-learn etc.).
We recognize that his policy is somewhat imperfect for some edge cases, and that
it's difficult to come up with an accurate technical definition. In broad terms,
which are usually well understood by user, the policy is that:

- modules that can be accessed without leading underscore are public
- objects in a public file that don't have a leading underscore are public
- objects that start with a leading underscore are private unless they're
exposed / aliased as public in a public `__init__.py` file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the situation discussed offline, i.e. defining public function in a private namespace and expose them in a public one? If so, can we clarify the text? The "exposed / aliased" implies for me that if we expose _foo in a public __init__.py, it can be considered public. Was that your intention?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to clarify, I tried "exposed / aliased as public" but maybe it's not clear enough. Perhaps

exposed / aliased as public (i.e. without a leading underscore) in a public __init__.py file

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simpler way is to just say:

Any object exposed in a public __init__.py is public unless it has a leading underscore in its immediate name

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like that a lot better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's just a special case of what we have above:

  • objects in a public file that don't have a leading underscore are public

Do you think it's worth re-iterating that with __init__.py ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, since __init__.py itself is not a public file given its underscores. But since have stated above that our definition might technically not be perfect and this should be common knowledge for Python devs, I'm ok to merge the two.

- class attributes are public iff they have no leading underscore

The public API has backward-compatible (BC) guarantees defined in our
deprecation policy (see below). The private API has not BC guarantees.

For C++, code is private. If a change breaks fbcode, fix fbcode or revert the
change. We should be careful about models running in prod and relying on
torchvision ops.

The `test` folder is not importable and is **private.** Even meta-internal
projects should *not* rely on it (it has happned in the past and is now
programmatically impossible.)

The training references do not have BC guarantees. Breaking changes are
possible, but we should make sure that the tutorials are still running properly,
and that their intended narrative is preserved (by e.g. checking outputs,
etc.).

The rest of the folders (build, android, ios) are private and have no BC
guarantees.

### Deprecation policy.

Because they're disruptive, **deprecations should only be used sparingly**.

We largely follow the [torch core
policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy):
breaking changes require a deprecation period of at least 2 versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that this document is not intended to users. However this part and other part about C++ being private, it worth exposing to the users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree...
At the same time, I don't know where we could expose it properly. Pytorch typically doesn't have that in the online docs (I think?), it's rather in the Wiki which most users wouldn't unless they're actively searching for it.

Perhaps changing the title from maintiner_guie to policies or something like that? Happy to hear any other suggestion


Deprecations should clearly indicate their deadline in the docs and warning
messages. Avoid not committing to a deadline, or keeping deprecated APIs for too
long: it gives no incentive for users to update their code, sends conflicting
messages ("why was this API removed while this other one is still around?"), and
accumulates debt in the project.

### Should this attribute be public? Should this function be private?

When designing an API it’s not always obvious what should be exposed as public,
and what should be kept as a private implementation detail. The following
guidelines can be useful:

* Functional consistency throughout the library is a top priority, for users and
developers’ sake. In doubt and unless it’s clearly wrong, expose what other
similar classes expose.
* Think really hard about the users and their use-cases, and try to expose what
they would need to address those use-cases. Aggressively keep everything else
private. Remember that the “private -> public” direction is way smoother than
the “public -> private” one: in doubt, keep it private.
* When thinking about use-cases, the general API motto applies: make what’s
simple and common easy, and make what’s complex possible (80% / 20% rule).
There might be a ~1% left that’s not addressed: that’s OK. Also, **make what’s
wrong very hard**, if not impossible.

As a good practice, always create new files and even classes with a leading
underscore in their name. This way, everything is private by default and the
only public surface is explicitly present in an `__init__.py` file.