Description
The general goal of this proposal is to formalize what it means to be considered API in julia, to give a framework to talk about what even is a "breaking" change to make it easier for developers to provide more documentation for their packages as well as avoiding breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as type theoretic requirements of the type system, though having a background in type theory is not required to participate in the discussion.
Please do reply if you have something to add/comment on!
Motivation
The current stance of Base and much of the package ecosystem of what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring it's API", "If it's exported, it's API", "If it's in the Internal submodule it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:
- Discoverability
- As a user writing code, it is nigh impossible to figure out whether a given function they found being used in some other code is stable under semver or not, unnecessarily increasing the chances for accidental breakage later on.
- Tooling can't reliably query/show whether some symbol/object is stable API or not, because the inconsistencies between the approaches only lend themselves to bespoke solutions on a package by package basis.
- Maintainability
- The fuzziness of some of these approaches disincentivizes writing documentation for internal functionality, for fear of others taking the existence of a docstring as guarantor for the function in question being API.
- This fuzziness and lack of internal documentation increases the "time to first contribution" by newcomers and returning developers as well, due to having to reverse engineer the package to contribute in the first place.
- "Unofficial" API surfaces grow that are using internals (often because the developers in question wrote the internal in the first place), leading to a sort-of tacit implication that it is ok to use internal functionality of Base/a package.
- Accidental exposure of an internal as API (even just perceived API!) generally means that removing that internal would necessitate a breaking change - this doesn't scale well, and would quickly lead to not being able to change any code.
- Due to the fuzziness of what exactly is considered API, deprecations are difficult - @deprecate is hard to use correctly and often not used at all.
- Reviewability
- It is all too easy to accidentally rely on an internal of another package due to the bad discoverability of what actually is API. This increases the burden on reviewers to check contributions for whether or not the functionality a PR is introducing is relying on/bringing in internal uses, even if they may not be familiar with the dependency in question.
- The lax stance around what exactly is considered API and the fact that internals rarely get docstrings or even comments explaining what the code does means that contributions don't usually get checked for whether they (accidentally or not) increase the API surface or whether a particular aspect of a contribution or new feature ought to have a documented API in the first place.
This proposal has three parts - first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above, and second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier. Finally, the third part is focused on a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.
There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.
The api keyword
The main user-facing mechanism proposed for declaring whether an object is API is the new api keyword. The keyword can be placed in front of definitions in global scope, such as struct, abstract type, module, function and const/type annotated global variables. Using api declares the intent of a developer about which of the accessible symbols they consider API under semver and plan to support in newer versions.
When a project/environment importing a package accesses a symbol not marked as API (and it is not the same project/environment that originally defined the symbol), a warning is displayed. The warning makes the user aware of the unsupported access but doesn't otherwise hinder it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar as those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or encourage a user to start a discussion with the developer to provide an API for what they would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome than accessing internals and suffering breakage later on that such a discussion would have avoided.
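The warn-but-allow behaviour described above can be imitated in today's Julia. This is a minimal sketch only: `API_REGISTRY`, `mark_api!`, `checked_access` and the `Demo` module are hypothetical names invented for illustration, standing in for whatever the compiler would track internally.

```julia
# Hypothetical registry standing in for the proposed `api` markers.
const API_REGISTRY = Set{Tuple{Module,Symbol}}()

mark_api!(m::Module, name::Symbol) = push!(API_REGISTRY, (m, name))

# Look up `name` in `m`, warning (but not failing) on non-API access.
function checked_access(m::Module, name::Symbol; warn::Bool=true)
    if warn && (m, name) ∉ API_REGISTRY
        @warn "Accessing $(m).$(name), which is not marked as API"
    end
    return getfield(m, name)
end

module Demo
supported() = 1
internal() = 2
end

mark_api!(Demo, :supported)

checked_access(Demo, :supported)()             # no warning
checked_access(Demo, :internal)()              # warns, but still works
checked_access(Demo, :internal; warn=false)()  # silenced, yet still unsupported
```

Silencing the warning changes nothing about the registry, which mirrors the caveat that a silenced access is no more supported than a noisy one.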
The following sections explain how api interacts with various uses, what the interactions ought to mean semantically as well as the reasoning for choosing the semantics to be so.
function
Consider this example:
api function foo(arg1, arg2)
    arg1 + arg2
end
# equivalently:
# api foo(arg1, arg2) = arg1 + arg2
This declares the function foo, with a single method taking two arguments of any type. The api keyword, when written in front of such a method definition, only declares the given method as API. If we were to later on define a new method, taking different arguments like so
function foo(arg1::Int, arg2::Float64)
    arg1 * arg2
end
the new method would not be considered API under semver. The reasoning for this is simple: once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.
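To make the per-method granularity concrete, here is a rough sketch in current Julia. The list `API_SIGNATURES` and the helper `is_api_method` are assumptions made for illustration; the proposal would store the mark in the method table instead.

```julia
foo(arg1, arg2) = arg1 + arg2                 # would be `api foo(...)` under the proposal
foo(arg1::Int, arg2::Float64) = arg1 * arg2   # added later, not marked

# Hypothetical stand-in for a per-method `api` mark: a list of signatures.
const API_SIGNATURES = Any[Tuple{typeof(foo), Any, Any}]

is_api_method(m::Method) = any(sig -> sig == m.sig, API_SIGNATURES)

is_api_method(which(foo, (Int, Int)))       # true: dispatches to the marked method
is_api_method(which(foo, (Int, Float64)))   # false: dispatches to the unmarked method
```

Both calls go through the same function `foo`, but only one of them lands on a method that carries the mark.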
Being declared on a per-method case means the following:
- A method annotated with api MAY NOT be removed in a non-breaking release, without splitting the existing method into multiple definitions that are able to fully take on the existing dispatches of the previous single method. In type parlance, this means that the type union of the signatures of the replacement methods MUST cover at least the signature of the original method (i.e. be at least as general), but MAY be more general. This is to prevent introducing accidental MethodErrors where there were none before.
- A function/method annotated with api MAY NOT introduce an error where there was none before, without that being done in a breaking release.
- A function/method annotated with api MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to MAY NOT if they feel it appropriate for their function/method.
- A function/method annotated with api MAY remove an error and introduce a non-throwing result.
- A function/method annotated with api MAY change one error type to another, but developers are free to strengthen this to MAY NOT if they feel it appropriate for their function.
This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to be able to anticipate breaking changes and allow developers to plan around and test for them more easily. The exact semantics a method that is marked as API must obey to be considered API, apart from its signature and the above points, are up to the developers of a function.
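The first rule in the list (replacement methods must still accept every call the original method accepted) is the kind of thing third-party tooling might check. A minimal sketch, assuming signatures are compared without the leading function type; `covers` is a hypothetical helper, not an existing API:

```julia
# True if every call matching `old` still matches at least one of `news`,
# i.e. the union of the replacement signatures is a supertype of the old one.
covers(old::Type, news::Vector) = old <: Union{news...}

old_sig = Tuple{Any, Any}   # argument types of foo(arg1, arg2)

# Non-breaking: a specialised method is split off, the general one is kept.
covers(old_sig, [Tuple{Int, Any}, Tuple{Any, Any}])

# Breaking: calls such as foo("a", "b") would now throw a MethodError.
covers(old_sig, [Tuple{Int, Any}, Tuple{Float64, Any}])
```

This only covers the signature rule; whether a method starts throwing for inputs it used to handle is a behavioural question that a type-level check cannot answer.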
Depending on the use case (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax
api function bar end
declares ALL methods of bar to be public API as an escape hatch - i.e., the above syntax declares the function bar to be API, not just individual methods. In Base-internal parlance, the api keyword on a single method only marks an entry in the method table as API, while the use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method - it effectively acts as if there were a method taking a sole ::Vararg{Any} argument and marking that as API.
struct
The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.
Consider this example:
abstract type AbstractFoo end
api struct MyStruct{T, S} <: AbstractFoo
    a::T
    b::S
end
A struct like the above annotated with api guarantees that the default constructor methods are marked as api. The subtyping relationship is considered API under semver, up to and including Any. The existence of the fields a and b is considered API, as well as their relationship to the type parameters T and S.
In the example above, the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver, which means that methods declared as taking either an AbstractFoo or an Any argument must continue to also take objects of type MyStruct. This means that changing a definition like the one above into one like this
abstract type AbstractBar end
abstract type AbstractFoo end
api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
is a breaking change under semver. It is, however, legal to do the following:
abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end
api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
because the old subtyping chain MyStruct{T,S} <: AbstractFoo <: Any is a subchain of the new chain MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any. That is, it is legal to grow the subtyping chain downwards.
Notably, making MyStruct API does not mean that AbstractFoo itself is API, i.e. adding new subtypes to AbstractFoo is not supported and is not considered API purely by annotating a subtype as API.
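The subchain argument can be checked mechanically with plain Julia reflection. A small sketch, leaving out the proposed api keyword so that it runs today; `supertype_chain` is a helper written for this example:

```julia
abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end   # newly inserted intermediate type

struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

# Collect the supertype chain of `T` up to and including `Any`.
function supertype_chain(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

old_chain = [AbstractFoo, Any]                  # supertypes promised by the old release
new_chain = supertype_chain(MyStruct{Int,Int})

# Every previously promised supertype must still be present.
issubset(old_chain, new_chain)
```

Had MyStruct been moved under an unrelated abstract type instead, AbstractFoo would be missing from `new_chain` and the check would fail.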
Since the new type in a changing release must be usable in all places where the old type was used, the only additional restriction placed on MyStruct as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like MyStruct{S,T} in structs containing objects of this type type-unstable).
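The trade-off of adding a type parameter can be observed directly. In this sketch the modules `V1` and `V2` are hypothetical stand-ins for two releases of the same package, and the extra parameter `R` is an unused (phantom) parameter chosen purely for illustration:

```julia
module V1   # hypothetical old release
struct MyStruct{T,S}
    a::T
    b::S
end
end

module V2   # hypothetical new release with an added type parameter
struct MyStruct{T,S,R}
    a::T
    b::S
end
end

# Dispatching on a prefix of the parameters keeps working in both versions ...
first_is_int(::V1.MyStruct{Int}) = true
first_is_int(::V2.MyStruct{Int}) = true

# ... but `MyStruct{Int,Float64}` is no longer a concrete type in V2,
# so fields or containers annotated with it lose type stability.
isconcretetype(V1.MyStruct{Int,Float64})   # true
isconcretetype(V2.MyStruct{Int,Float64})   # false
```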
With regard to whether field access is considered API or not, it is possible to annotate individual fields as api:
api struct MyStruct{T,S}
    a::T
    api b::S
end
This requires the main struct to be annotated as api as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type MyStruct via getfield(::MyStruct, :b) or getproperty(::MyStruct, :b) is covered under semver and considered API. The same is not true of the field a, its type or the connection to the first type parameter, the layout of MyStruct or the internal padding bytes that may be inserted into instances of MyStruct.
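A linter could use such field marks to flag accesses to unsupported fields. A rough sketch, where `api_fields` and `internal_fields` are hypothetical helpers standing in for information the `api b::S` annotation would provide:

```julia
struct MyStruct{T,S}
    a::T
    b::S
end

# Hypothetical stand-in for `api b::S`: the field names covered by semver.
api_fields(::Type{<:MyStruct}) = (:b,)

# Fields a linter could flag when accessed from outside the defining package.
internal_fields(T::Type) = [f for f in fieldnames(T) if f ∉ api_fields(T)]

internal_fields(MyStruct)
```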
abstract type
abstract type behaves similarly to struct, in that it is illegal to remove a type from a subtype chain, while it is legal to extend the chain downwards or to introduce new supertypes in the supertype chain.
Consider this example:
abstract type AbstractBar end
api abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any
The following changes do not require a breaking version:
# introducing a supertype
abstract type AbstractBar end
abstract type AbstractFoo <: AbstractBar end
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: AbstractBar <: Any
The following changes require a new breaking version:
# removing a supertype
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: <: Any
# removing the `api` keyword
abstract type AbstractBar end
abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any
What the api keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like api function foo(a::MyAbstract) are expected to work with new objects MyConcrete <: MyAbstract defined by a user, but methods like function bar(a::MyAbstract) (note the lack of api) are not.
At the same time, a lack of api can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.
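In practice this is the familiar interface pattern. The sketch below runs in current Julia, so the api keyword is only mentioned in comments; the function names `payload` and `describe` and the `value` field are made up for illustration:

```julia
abstract type MyAbstract end

# Under the proposal, both definitions would carry a leading `api`.
# The fallback assumes subtypes store their data in a `value` field.
payload(x::MyAbstract) = x.value
describe(x::MyAbstract) = "payload: $(payload(x))"

# User code in another package opting into the interface by subtyping.
struct MyConcrete <: MyAbstract
    value::Int
end

describe(MyConcrete(3))   # "payload: 3"
```

Because `describe` would be marked as API on `MyAbstract`, the user-defined `MyConcrete` can rely on it continuing to work across non-breaking releases.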
type annotated/const global variables
api MyFoo::Foo = ...
api const MyBar = ...
Marking a global const or type annotated variable as api means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for const global variables only reading is considered API, writing is not API.
The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.
Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write api Bar::Any = ....
It should be noted that it is the variable binding that is considered API, not the object the variable refers to. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as api.
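The narrowing rule for type annotated globals can be illustrated with two hypothetical releases, modelled here as modules `V1` and `V2`. The sketch assumes Julia 1.8 or newer (typed globals) and only looks at the reading side:

```julia
module V1              # hypothetical old release
MyFoo::Real = 1
end

module V2              # hypothetical new release, annotation narrowed to Int
MyFoo::Int = 1
end

# Reader code written against the old, weaker guarantee.
double(x::Real) = 2x

double(V1.MyFoo)
double(V2.MyFoo)
```

Code that only assumed `Real` keeps working, since every `Int` is also a `Real`.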
module
Annotating an entire module expression with api means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.
Consider this example:
api module Foo
    module Bar
        api const baz = 1
        const bak = 42
    end
    f() = "hello"
end
In this example, Foo, Foo.f, Foo.Bar, Foo.Bar.baz are considered API, while Foo.Bar.bak is not.
Consider this other example:
module Foo
    api module Bar
        const baz = 1
        const bak = 42
    end
    f() = "hello"
    api g() =
end
In this example, Foo.g, Foo.Bar, Foo.Bar.baz and Foo.Bar.bak are considered API, while Foo and Foo.f are not.
Consider this third example:
module Foo
    module Bar
        api const baz = 1
        const bak = 42
    end
    f() = "hello"
end
Only Foo.Bar.baz is considered API, the other names in and from Foo and Foo.Bar are not.
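The three examples follow one rule: a binding is API if it carries its own mark, or if the module directly containing it is marked. A sketch of that rule over a hand-built tree, where the `Binding` struct and `api_names` function are hypothetical and exist only to model the examples:

```julia
# Hypothetical model of a binding: its name, whether it carries an `api`
# mark of its own, and (for modules) the bindings defined inside it.
struct Binding
    name::Symbol
    api::Bool
    children::Vector{Binding}
end
Binding(name::Symbol, api::Bool) = Binding(name, api, Binding[])

# A binding is API if it is marked itself or its direct parent module is.
# Only the parent's own mark is passed down, so the effect is one level deep.
function api_names(b::Binding, prefix::String="", parent_marked::Bool=false)
    path = isempty(prefix) ? String(b.name) : prefix * "." * String(b.name)
    names = String[]
    (b.api || parent_marked) && push!(names, path)
    for child in b.children
        append!(names, api_names(child, path, b.api))
    end
    return names
end

# First example: `api module Foo` containing an unmarked `module Bar`.
first_example = Binding(:Foo, true, [
    Binding(:Bar, false, [Binding(:baz, true), Binding(:bak, false)]),
    Binding(:f, false),
])

api_names(first_example)
```

For the first example this yields Foo, Foo.Bar, Foo.Bar.baz and Foo.f, matching the list given in the text.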
Uses
This is a list of imagined uses of this functionality:
- Linting hints in LSP.jl, to make users aware of accidentally using internals of a package/Base.
- API surface tracking over time, especially in regards to test coverage and breaking changes.
- Tree-shaking for Pkg images/static compilation, to only make api bindings available in the final image/binary/shared object
- This is primarily thought of in the context of compilation to a shared object - the api marker could be used for not mangling the names of julia functions when compiling an .so, as currently all names are mangled by default (unless marked as @ccallable, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).
- Automatic generation of documentation hints about API surface in Documenter.jl
- Easier tracking of deprecated functionality before it is ultimately removed in a breaking change
- Have a PR template mentioning "Does this PR introduce new API?"
- CI Check enforcing that new API bindings have a docstring associated with them
Do you have ideas? Mention them and I'll edit them in here!
Required Implementation Steps
- Parse the api keyword in the correct places and produce an expression the compiler can use later on
- Add api marker handling to methods and the method table implementation, as well as to binding lookup in modules
- Make the REPL help> mode aware of api tags.
- Go through all of Base and the stdlibs and mark the bindings currently residing in the manual as api
Known Difficulties with the proposal
- Expr(:function) does not have space in its first two arguments for additional metadata, so this would need to be added to either a third argument, or create a new Expr(:api_function). Analogous issues exist for Expr(:struct), Expr(:=), Expr(:module) etc. Both approaches potentially require macros to be changed, to be aware of api in their expansion.
- There is a lot of doc churn, but sadly there's no way around that. My hope is that this can make it easier to just write more docstrings, by virtue of not "promoting" a function to API status just by virtue of having a docstring.
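The shape of Expr(:function) can be inspected in any current Julia session. The last two lines show one conceivable wrapper encoding with an :api head, which is purely an assumption made for illustration and not a decided design:

```julia
ex = :(function foo(arg1, arg2)
    arg1 + arg2
end)

ex.head            # :function
length(ex.args)    # 2: the call signature and the body
ex.args[1]         # :(foo(arg1, arg2))

# One conceivable encoding (an assumption): wrap the definition in a new head.
wrapped = Expr(:api, ex)
wrapped.args[1] === ex
```

A macro that pattern-matches on `ex.head == :function` would not recognise `wrapped`, which is the kind of macro breakage the first point above warns about.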
- This proposal requires quite a bit of julia-internals expertise, touching the parser, probably lowering as well, lots of internal objects/state, the REPL and external packages. It's a very large amount of work, and there's likely no chance of any one person being able to implement all of it - this will be a group effort.
FAQ
- Why not a macro instead of new syntax?
- A macro has the disadvantage of not composing nicely with existing macros. Additionally, since this requires quite deep changes to Method and other (internal?) objects of Base, exposing this as a macro would also mean exposing this as an API to the runtime, even though this api distinction is not about dynamicness - the api surface of a package really ought to be fixed in a given version, and not change dynamically at runtime.
- What about annotating bindings as private instead?
- There is no intention of preventing access to an internal object or otherwise introducing access modifiers into the language. Additionally, marking things as private, internal or similar instead of api means that any time a developer accidentally forgets to add that modifier, adding it later is a technically breaking change in a release. The whole point of this proposal is to avoid this kind of breakage.
- What about naming the keyword public?
- As there is no intention to provide access modifiers, I feel like naming this public overloads this already overloaded term in the wider programming community too much. public/private are commonly associated with access modifiers, which is decidedly not what this proposal is about.
- What about naming the keyword ?
- Bikeshedding the name is always welcome, though I think it hard to compete with the short conciseness of api, which makes its intent very clear. It would also be prudent to have that discussion after we've come to a compromise about the desired semantics.
- How does this proposal interact with export?
- export is a bit tricky, since it doesn't distinguish between methods the way api does. I think it could work to mark all exported symbols with api as well (this is certainly not without its own pitfalls), though I also think that export is a bit of an orthogonal concept to api, due to the former being about namespacing and the latter being exclusively about what is considered to be actually supported. I think a good example is the way save/load are implemented with FileIO.jl. While the parent interface package exports save and load, packages wishing to register a new file format define new, private functions for these and register those on loading with FileIO (or FileIO calls into them if they're in the environment). This means that MyPkg.save is not exported from MyPkg, but is nevertheless a supported API provided by MyPkg. The intention is to support these kinds of use cases, where export is undesirable for various reasons, while still wishing to provide a documented/supported API surface to a package.
- Why not prototype this in a package?
- There are various prototypes of similar things in some packages, none of which has been widely adopted as far as I know and I think this is something Base itself could really use as well. Not to mention that starting this in a package is IMO going to splinter the "this is how we define API" discussion further.
I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully some other version of more formalized API semantics being merged at some point.