
Taking API surface seriously: Proposing Syntax for declaring API #49973

@Seelengrab

The general goal of this proposal is to formalize what it means for something to be considered API in julia, and to give a framework for talking about what even is a "breaking" change. This should make it easier for developers to provide more documentation for their packages, as well as to avoid breakage in the first place. The reasoning in this document comes from the current practices in the ecosystem as well as from type-theoretic requirements of the type system, though a background in type theory is not required to participate in the discussion.

Please do reply if you have something to add/comment on!

Motivation

The current stance of Base and much of the package ecosystem on what is considered API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring, it's API", "If it's exported, it's API", "If it's in the Internal submodule it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:

  • Discoverability
    • As a user writing code, it is nigh impossible to figure out whether a given function they found being used in some other code is stable under semver or not, unnecessarily increasing the chances for accidental breakage later on.
    • Tooling can't reliably query/show whether some symbol/object is stable API or not, because the inconsistencies between the approaches only lend themselves to bespoke solutions on a package by package basis.
  • Maintainability
    • The fuzziness of some of these approaches disincentivizes writing documentation for internal functionality, for fear of others taking the existence of a docstring as a guarantee that the function in question is API.
    • This fuzziness and lack of internal documentation increases the "time to first contribution" by newcomers and returning developers as well, due to having to reverse engineer the package to contribute in the first place.
    • "Unofficial" API surfaces that use internals grow over time (often because the developers in question wrote the internal in the first place), leading to a sort of tacit implication that it is ok to use internal functionality of Base/a package.
    • Accidental exposure of an internal as API (even just perceived API!) generally means that removing that internal would necessitate a breaking change - this doesn't scale well, and would quickly lead to not being able to change any code.
    • Due to the fuzziness of what exactly is considered API, deprecations are difficult - @deprecate is hard to use correctly and often not used at all.
  • Reviewability
    • It is all too easy to accidentally rely on an internal of another package due to the bad discoverability of what actually is API. This increases the burden on reviewers to check contributions for whether or not the functionality a PR is introducing is relying on/bringing in internal uses, even if they may not be familiar with the dependency in question.
    • The lax stance around what exactly is considered API and the fact that internals rarely get docstrings or even comments explaining what the code does means that contributions don't usually get checked for whether they (accidentally or not) increase the API surface or whether a particular aspect of a contribution or new feature ought to have a documented API in the first place.

This proposal has three parts - first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above, and second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier. Finally, the third part is focused on a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.

There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.

The api keyword

The main user-facing part of this proposal is the new api keyword, which declares whether an object is API. The keyword can be placed in front of definitions in global scope, like struct, abstract type, module, function and const/type annotated global variables. Using api declares the intent of a developer about which of the accessible symbols they consider API under semver and plan to support in newer versions.

When a project/environment importing a package wants to access a symbol not marked as API (and it is not the same project/environment that originally defined the symbol), a warning is displayed, making the user aware of the unsupported access without otherwise hindering it. This behavior should be configurable to an extent, to support legitimate accesses of internals (insofar as those exist). The caveat is that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or to start a discussion with the developer about providing an API for what the user would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome than accessing internals and fearing breakage later on, when that breakage could have been avoided by such a discussion.
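The warning behavior described above can be modeled in today's julia. This is a minimal sketch under stated assumptions, not how the proposal would be implemented: `DECLARED_API` and `checked_getproperty` are hypothetical names standing in for the metadata the `api` keyword would record and for the binding lookup, and the sketch ignores the exemption for the project/environment that defines the symbol.

```julia
# Hypothetical package; under the proposal, `foo` would be written `api foo() = ...`.
module MyPkg
foo() = "supported"
_helper() = "internal"     # not marked api
end

# Hypothetical stand-in for the metadata the `api` keyword would record.
const DECLARED_API = Dict{Module,Set{Symbol}}(MyPkg => Set([:foo]))

# Warns on access to a binding that is not declared as API, but never blocks the access.
function checked_getproperty(mod::Module, name::Symbol; warn::Bool = true)
    declared = get(DECLARED_API, mod, Set{Symbol}())
    if warn && !(name in declared)
        @warn "$(nameof(mod)).$name is not declared as API; relying on it is unsupported"
    end
    return getproperty(mod, name)
end

checked_getproperty(MyPkg, :foo)()                     # no warning
checked_getproperty(MyPkg, :_helper)()                 # warns, still returns "internal"
checked_getproperty(MyPkg, :_helper; warn = false)()   # silenced, but still unsupported
```

Silencing only removes the message; the access is exactly as unsupported as before, which matches the caveat above.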

The following sections explain how api interacts with various uses, what the interactions ought to mean semantically as well as the reasoning for choosing the semantics to be so.

function

Consider this example:

api function foo(arg1, arg2)
   arg1 + arg2
end

# equivalently:
# api foo(arg1, arg2) = arg1 + arg2

This declares the function foo, with a single method taking two arguments of any type. The api keyword, when written in front of such a method definition, declares only the given method as API. If we were to later define a new method taking different arguments, like so

function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end

the new method would not be considered API under semver. The reasoning for this is simple: once a method (any object, really) is included as API in a new release, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.

Being declared on a per-method case means the following:

  • A method annotated with api MAY NOT be removed in a non-breaking release, unless it is split into multiple definitions that together fully take over the existing dispatches of the previous single method. In type parlance, this means that the signature of the original method MUST be a subtype of the type union of the signatures of the replacement methods; that union MAY be wider (less specific) than the original signature. This is to prevent introducing accidental MethodErrors where there were none before.
  • A function/method annotated with api MAY NOT introduce an error where there was none before, without that being done in a breaking release.
  • A function/method annotated with api MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to MAY NOT if they feel it appropriate for their function/method.
  • A function/method annotated with api MAY remove an error and introduce a non-throwing result.
  • A function/method annotated with api MAY change one error type to another, but developers are free to strengthen this to MAY NOT if they feel it appropriate for their function.
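The covering rule for splitting an API method can be checked with ordinary subtyping on argument-tuple types. The sketch below is illustrative only: `covers` and the example signatures are hypothetical, the function type is left out of the tuples for brevity, and ambiguities between the replacement methods are not checked.

```julia
# Argument signature of the original API method `foo(arg1, arg2)`.
original = Tuple{Any, Any}

# Two hypothetical sets of replacement methods in a later, non-breaking release.
split_ok  = [Tuple{Int, Float64}, Tuple{Any, Any}]
split_bad = [Tuple{Int, Float64}, Tuple{Number, Number}]

# The union of the replacement signatures must contain the original signature;
# otherwise some previously valid call would now throw a MethodError.
covers(orig::Type, replacements) = orig <: Union{replacements...}

covers(original, split_ok)    # true
covers(original, split_bad)   # false: e.g. foo("a", "b") would no longer dispatch
```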

This is not enforced by the compiler (it can't be without versioned consistency checking between executions/compilations, though some third-party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline for anticipating breaking changes, allowing developers to plan around and test for them more easily. Apart from its signature and the above points, the exact semantics a method marked as API must obey are up to the developers of a function.

Depending on the use case (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax

api function bar end

declares ALL methods of bar to be public API, as an escape hatch - i.e., the above syntax declares the function bar itself to be API, not just individual methods. In Base-internal parlance, the api keyword on a single method only marks an entry in the method table as API, while its use on a zero-method definition like the one above marks the whole method table as API. An API mark on the whole method table trumps a missing mark on a single method - it effectively acts as if there were a method taking a sole ::Vararg{Any} argument that is marked as API.
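A zero-method definition already exists in julia today; only the `api` prefix is new. The sketch below assumes the proposal's reading of a function-level mark as an implicit `::Vararg{Any}` method and shows why every later method falls under it.

```julia
function bar end                          # `api function bar end` under the proposal
length(methods(bar))                      # 0: the method table exists but is empty

# A function-level mark acts like a method with signature Tuple{Vararg{Any}}:
# every argument tuple is a subtype of it, so methods added later are covered too.
Tuple{} <: Tuple{Vararg{Any}}             # true
Tuple{Int, String} <: Tuple{Vararg{Any}}  # true

bar(x::Int) = x + 1                       # a later method, covered by the function-level mark
bar(1)                                    # 2
```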

struct

The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.

Consider this example:

abstract type AbstractFoo end

api struct MyStruct{T, S} <: AbstractFoo
    a::T
    b::S
end

A struct like the above, annotated with api, guarantees that the default constructor methods are marked as api. The subtyping relationship is considered API under semver, up to and including Any. The existence of the fields a and b is considered API, as is their relationship to the type parameters T and S.
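Each of these guarantees corresponds to something that can be queried with existing reflection functions. The definitions below are the example from above without the (not yet existing) `api` keyword.

```julia
abstract type AbstractFoo end

struct MyStruct{T, S} <: AbstractFoo        # `api struct ...` under the proposal
    a::T
    b::S
end

x = MyStruct(1, 2.0)                              # default constructor infers T and S
typeof(x) === MyStruct{Int, Float64}              # true
x isa AbstractFoo && x isa Any                    # true: the subtyping chain up to Any
fieldnames(MyStruct{Int, Float64})                # (:a, :b)
fieldtype(MyStruct{Int, String}, :b) === String   # true: field b is tied to parameter S
```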

In the example above, the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver, which means that methods declared as taking either an AbstractFoo or an Any argument must continue to also take objects of type MyStruct. This means that changing a definition like the one above into one like this

abstract type AbstractBar end
abstract type AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

is a breaking change under semver. It is however legal to do the following:

abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

because the old subtyping chain MyStruct{T,S} <: AbstractFoo <: Any is a subchain of the new chain MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any. That is, it is legal to grow the subtyping chain downwards.
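Whether a new release only grows the chain downwards can be checked mechanically. In this sketch the three releases are hypothetical modules kept side by side; because types from different modules are always distinct, the chains are compared by type name, a simplification a real tool would need to refine. `chain_names` and `is_subchain` are hypothetical helpers.

```julia
module Release1
abstract type AbstractFoo end
struct MyStruct{T,S} <: AbstractFoo
    a::T
    b::S
end
end

module Release2Ok           # inserts AbstractBar below AbstractFoo
abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end
struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
end

module Release2Breaking     # MyStruct is no longer below AbstractFoo
abstract type AbstractBar end
abstract type AbstractFoo end
struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
end

# Names along the supertype chain, from the type itself up to Any.
function chain_names(T::DataType)
    chain = Symbol[]
    S = T
    while true
        push!(chain, nameof(S))
        S === Any && break
        S = supertype(S)
    end
    return chain
end

# The older chain must appear, in order, inside the newer chain.
function is_subchain(older::Vector{Symbol}, newer::Vector{Symbol})
    i = 1
    for name in newer
        if i <= length(older) && older[i] == name
            i += 1
        end
    end
    return i > length(older)
end

old_chain = chain_names(Release1.MyStruct{Int,Float64})
is_subchain(old_chain, chain_names(Release2Ok.MyStruct{Int,Float64}))        # true
is_subchain(old_chain, chain_names(Release2Breaking.MyStruct{Int,Float64}))  # false
```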

Notably, making MyStruct API does not mean that AbstractFoo itself is API, i.e. adding new subtypes to AbstractFoo is not supported and is not considered API purely by annotating a subtype as API.

Since the type in a new release must be usable in all places where the old type was used, the only additional restriction placed on MyStruct as defined above is that no type parameters may be removed. Because dispatch matches type parameters lazily, it is legal to add more type parameters without making a breaking change (even if this makes structs whose fields are declared as something like MyStruct{T,S} type-unstable).
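Adding a type parameter keeps existing dispatch working because a partially applied type such as `MyStruct{Int,Float64}` becomes a `UnionAll` over the new parameter, which is also where the type instability comes from. The modules below are hypothetical releases; the extra outer constructor in the second release is an assumption of this sketch, needed because the default two-argument constructor can no longer infer the new parameter.

```julia
module TwoParams            # hypothetical release 1
struct MyStruct{T,S}
    a::T
    b::S
end
end

module ThreeParams          # hypothetical release 2 adds a parameter U
struct MyStruct{T,S,U}
    a::T
    b::S
    c::U
end
# Assumption: keeps the two-argument call working, since U cannot be inferred from a and b.
MyStruct(a::T, b::S) where {T,S} = MyStruct{T,S,Nothing}(a, b, nothing)
end

# Downstream code spelled as it was against release 1, now loaded against release 2.
describe(x::ThreeParams.MyStruct{Int,Float64}) = "Int/Float64 MyStruct"

y = ThreeParams.MyStruct(1, 2.0)
describe(y)                                         # still dispatches
isconcretetype(TwoParams.MyStruct{Int,Float64})     # true
isconcretetype(ThreeParams.MyStruct{Int,Float64})   # false: U is left free
```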

In regards to whether field access is considered API or not, it is possible to annotate individual fields as api:

api struct MyStruct{T,S}
    a::T
    api b::S
end

This requires the main struct to be annotated as api as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type MyStruct via getfield(::MyStruct, :b) or getproperty(::MyStruct, :b) is covered under semver and considered API. The same is not true of the field a, its type or the connection to the first type parameter, the layout of MyStruct or the internal padding bytes that may be inserted into instances of MyStruct.
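Field-level API can be treated as a list of field names that must survive across releases. The sketch below uses hypothetical releases and a hypothetical `api_fields_present` check to show a non-breaking release that renames the non-API field `a`: the positional default constructor keeps its shape, and both `getfield` and property access to `b` still work. `hasfield` requires julia 1.2 or newer.

```julia
module FieldsV1             # hypothetical release 1
struct MyStruct{T,S}
    a::T                    # not API
    b::S                    # `api b::S` under the proposal
end
end

module FieldsV2             # hypothetical release 2: the non-API field `a` is renamed
struct MyStruct{T,S}
    _a::T
    b::S
end
end

# Hypothetical CI check: every field declared API must still exist on the new type.
api_fields_present(T::Type, api_fields) = all(f -> hasfield(T, f), api_fields)

api_fields_present(FieldsV1.MyStruct{Int,Float64}, (:b,))   # true
api_fields_present(FieldsV2.MyStruct{Int,Float64}, (:b,))   # true
hasfield(FieldsV2.MyStruct{Int,Float64}, :a)                # false, but `a` was never API

x2 = FieldsV2.MyStruct(1, 2.0)
getfield(x2, :b) == x2.b == 2.0                             # both covered access paths still work
```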

abstract type

abstract type behaves similarly to struct, in that it is illegal to remove a type from a subtype chain, while it is legal to extend the chain downwards or to introduce new supertypes into the supertype chain.

Consider this example:

abstract type AbstractBar end
api abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any

The following changes do not require a breaking version:

# introducing a supertype
abstract type AbstractBar end
abstract type AbstractFoo <: AbstractBar end
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: AbstractBar <: Any

The following changes require a new breaking version:

# removing a supertype
abstract type AbstractBar end
api abstract type MyAbstract end
# MyAbstract <: Any
# removing the `api` keyword
abstract type AbstractBar end
abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any

What the api keyword on an abstract type effectively means for the users of a package is that subtyping the abstract type is considered API, as a way to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like api function foo(a::MyAbstract) are expected to work with objects of a new, user-defined type MyConcrete <: MyAbstract, but methods like function bar(a::MyAbstract) (note the lack of api) are not.
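The difference between an `api` method and a non-API method on an abstract type shows up as soon as a user defines their own subtype. The names `kind` and `internal_id` are hypothetical, and the comments mark where the proposal's `api` keyword would appear.

```julia
abstract type AbstractBar end
abstract type MyAbstract <: AbstractBar end   # `api abstract type ...` under the proposal

# Would carry `api`: relies only on the documented semantics of MyAbstract.
kind(a::MyAbstract) = nameof(typeof(a))

# Not API: silently assumes every MyAbstract has an `id` field.
internal_id(a::MyAbstract) = a.id

# User code opting into the package's behavior by subtyping the abstract type.
struct MyConcrete <: MyAbstract end

kind(MyConcrete())     # :MyConcrete

# The non-API method was never promised to work for user subtypes, and here it does not.
works = try
    internal_id(MyConcrete())
    true
catch
    false
end
```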

At the same time, a lack of api can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.

type annotated/const global variables

api MyFoo::Foo = ...
api const MyBar = ...

Marking a const or type annotated global variable as api means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package are considered API, while for const global variables only reading is considered API; writing is not.

The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.

Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write api Bar::Any = ....

It should be noted that it is the variable binding that is considered API, not the object the variable refers to. It is legal to document additional guarantees or requirements of an object referred to through a binding marked as api.
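The effect of narrowing a typed global on readers can be sketched with two hypothetical releases side by side. Typed globals require julia 1.8 or newer, and this sketch only exercises reads, which is the case the narrowing rule argues about.

```julia
module GlobalsV1            # hypothetical release 1
level::Real = 1.5           # `api level::Real = 1.5` under the proposal
const LIMIT = 10            # `api const LIMIT = 10`
end

module GlobalsV2            # hypothetical release 2 narrows Real to Float64
level::Float64 = 1.5
const LIMIT = 10
end

# A reader written against release 1's guarantee that `level isa Real`.
scaled(m::Module) = 2 * m.level

scaled(GlobalsV1) == scaled(GlobalsV2)   # true: reads keep working after narrowing
GlobalsV2.level isa Real                 # true: Float64 <: Real
isconst(GlobalsV1, :LIMIT)               # true: only reading LIMIT is API
```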

module

Annotating an entire module expression with api means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.

Consider this example:

api module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

In this example, Foo, Foo.f, Foo.Bar, Foo.Bar.baz are considered API, while Foo.Bar.bak is not.

Consider this other example:

module Foo
    api module Bar
      const baz = 1
      const bak = 42
    end

    f() = "hello"
    api g() = 
end

In this example, Foo.g, Foo.Bar, Foo.Bar.baz and Foo.Bar.bak are considered API, while Foo and Foo.f are not.

Consider this third example:

module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

Only Foo.Bar.baz is considered API; the other names in and from Foo and Foo.Bar are not.
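The three module examples can be checked against a small toy model. This encodes one reading of the examples, stated as an assumption: a binding is API if it is marked `api` itself or if its directly enclosing module is marked, and a mark on a module does not reach any further down. `Node` and `api_paths` are hypothetical names, and `marked` stands in for the `api` keyword.

```julia
# A toy module tree; only modules have children.
struct Node
    name::Symbol
    marked::Bool
    children::Vector{Node}
end
Node(name, marked) = Node(name, marked, Node[])

# Collects the qualified names of all bindings that count as API.
function api_paths(node::Node, prefix::String = "", parent_marked::Bool = false)
    path = isempty(prefix) ? string(node.name) : prefix * "." * string(node.name)
    out = String[]
    (node.marked || parent_marked) && push!(out, path)
    for child in node.children
        append!(out, api_paths(child, path, node.marked))
    end
    return out
end

# The three examples from the text.
ex1 = Node(:Foo, true, [Node(:Bar, false, [Node(:baz, true), Node(:bak, false)]), Node(:f, false)])
ex2 = Node(:Foo, false, [Node(:Bar, true, [Node(:baz, false), Node(:bak, false)]),
                         Node(:f, false), Node(:g, true)])
ex3 = Node(:Foo, false, [Node(:Bar, false, [Node(:baz, true), Node(:bak, false)]), Node(:f, false)])

api_paths(ex1)   # ["Foo", "Foo.Bar", "Foo.Bar.baz", "Foo.f"]
api_paths(ex2)   # ["Foo.Bar", "Foo.Bar.baz", "Foo.Bar.bak", "Foo.g"]
api_paths(ex3)   # ["Foo.Bar.baz"]
```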

Uses

This is a list of imagined uses of this functionality:

  • Linting hints in LSP.jl, to make users aware of accidentally using internals of a package/Base.
  • API surface tracking over time, especially in regards to test coverage and breaking changes.
  • Tree-shaking for Pkg images/static compilation, to only make api bindings available in the final image/binary/shared object
    • This is primarily thought of in the context of compilation to a shared object - the api marker could be used for not mangling the names of julia functions when compiling an .so, as currently all names are mangled by default (unless marked as @ccallable, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).
  • Automatic generation of documentation hints about API surface in Documenter.jl
  • Easier tracking of deprecated functionality before it is ultimately removed in a breaking change
  • Have a PR template mentioning "Does this PR introduce new API?"
  • CI Check enforcing that new API bindings have a docstring associated with them

Do you have ideas? Mention them and I'll edit them in here!

Required Implementation Steps

  • Parse the api keyword in the correct places and produce an expression the compiler can use later on
  • Add api marker handling to methods and the method table implementation, as well as to binding lookup in modules
  • Make the REPL help> mode aware of api tags.
  • Go through all of Base and the stdlibs and mark the bindings currently residing in the manual as api

Known Difficulties with the proposal

  • Expr(:function) does not have space in its first two arguments for additional metadata, so this would need to be added to either a third argument, or create a new Expr(:api_function). Analogous issues exist for Expr(:struct), Expr(:=), Expr(:module) etc. Both approaches potentially require macros to be changed, to be aware of api in their expansion.
  • There is a lot of doc churn, but sadly there's no way around that. My hope is that this can make it easier to just write more docstrings, since a function would no longer be "promoted" to API status just by having a docstring.
  • This proposal requires quite a bit of julia-internals expertise, touching the parser, probably lowering as well, lots of internal objects/state, the REPL and external packages. It's a very large amount of work, and there's likely no chance of any one person being able to implement all of it - this will be a group effort.

FAQ

  • Why not a macro instead of new syntax?
    • A macro has the disadvantage of not composing nicely with existing macros. Additionally, since this requires quite deep changes to Method and other (internal?) objects of Base, exposing this as a macro would also mean exposing this as an API to the runtime, even though this api distinction is not about dynamicness - the api surface of a package really ought to be fixed in a given version, and not change dynamically at runtime.
  • What about annotating bindings as private instead?
    • There is no intention of preventing access to an internal object or otherwise introducing access modifiers into the language. Additionally, marking things as private, internal or similar instead of api means that any time a developer forgets to add that modifier, adding it later is technically a breaking change. The whole point of this proposal is to avoid this kind of breakage.
  • What about naming the keyword public?
    • As there is no intention to provide access modifiers, I feel like naming this public overloads this already overloaded term in the wider programming community too much. public/private are commonly associated with access modifiers, which is decidedly not what this proposal is about.
  • What about naming the keyword something else?
    • Bikeshedding the name is always welcome, though I think it's hard to compete with the conciseness of api, which makes its intent very clear. It would also be prudent to have that discussion after we've come to a compromise about the desired semantics.
  • How does this proposal interact with export?
    • export is a bit tricky, since it doesn't distinguish between methods the way api does. I think it could work to mark all exported symbols with api as well (this is certainly not without its own pitfalls...), though I also think that export is a bit of an orthogonal concept to api, due to the former being about namespacing and the latter being exclusively about what is considered to be actually supported. I think a good example is the way save/load are implemented with FileIO.jl. While the parent interface package exports save and load, packages wishing to register a new file format define new, private functions for these and register those with FileIO on loading (or FileIO calls into them if they're in the environment). This means that MyPkg.save is not exported from MyPkg, but is nevertheless a supported API provided by MyPkg. The intention is to support these kinds of use cases, where export is undesirable for various reasons, while still providing a documented/supported API surface for a package.
  • Why not prototype this in a package?
    • There are various prototypes of similar things in some packages, none of which has been widely adopted as far as I know and I think this is something Base itself could really use as well. Not to mention that starting this in a package is IMO going to splinter the "this is how we define API" discussion further.

I hope this proposal leads to at least some discussion around the issues we face or, if it isn't implemented directly, to some other version of more formalized API semantics being merged at some point.

Metadata


Assignees

No one assigned

    Labels

    design: Design of APIs or of the language itself; feature: Indicates new feature / enhancement requests

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
