Taking API surface seriously: Proposing Syntax for declaring API

The general goal of this proposal is to formalize what it means to be considered API in Julia and to give a framework for talking about what a "breaking" change even is. This should make it easier for developers to provide more documentation for their packages and to avoid breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as from type theoretic requirements of the type system, though a background in type theory is not required to participate in the discussion.

Please do reply if you have something to add/comment on!

## Motivation

The current stance of Base and much of the package ecosystem on what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring, it's API", "If it's exported, it's API", "If it's not in the `Internal` submodule, it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:

 * Discoverability
   * As a user writing code, it is nigh impossible to figure out whether a given function they found being used in some other code is stable under semver or not, unnecessarily increasing the chances for accidental breakage later on.
   * Tooling can't reliably query/show whether some symbol/object is stable API or not, because the inconsistencies between the approaches only lend themselves to bespoke solutions on a package by package basis.
 * Maintainability
   * The fuzziness of some of these approaches disincentivizes writing documentation for internal functionality, for fear of others taking the existence of a docstring as guarantor for the function in question being API.
   * This fuzziness and lack of internal documentation increases the "time to first contribution" by newcomers and returning developers as well, due to having to reverse engineer the package to contribute in the first place.
   * "Unofficial" API surfaces grow that are using internals (often because the developers in question wrote the internal in the first place), leading to a sort-of tacit implication that it is ok to use internal functionality of Base/a package.
   * Accidental exposure of an internal as API (even just perceived API!) generally means that removing that internal would necessitate a breaking change - this doesn't scale well, and would quickly lead to not being able to change any code.
   * Due to the fuzziness of what exactly is considered API, deprecations are difficult - `@deprecate` is hard to use correctly and often not used at all.
 * Reviewability
   * It is all too easy to accidentally rely on an internal of another package due to the bad discoverability of what actually is API. This increases the burden on reviewers to check contributions for whether or not the functionality a PR is introducing is relying on/bringing in internal uses, even if they may not be familiar with the dependency in question.
   * The lax stance around what exactly is considered API and the fact that internals rarely get docstrings or even comments explaining what the code does means that contributions don't usually get checked for whether they (accidentally or not) increase the API surface or whether a particular aspect of a contribution or new feature ought to have a documented API in the first place.

This proposal has three parts - first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above, and second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier. Finally, the third part is focused on a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.

There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design. 

## The `api` keyword

The main proposal for user-facing interaction with declaring whether an object is API is the new `api` keyword. The keyword can be placed in front of definitions in global scope, like `struct`, `abstract type`, `module`, `function` and `const`/type annotated global variables. Using `api` declares the _intent_ of a developer about which parts of the accessible symbols they consider API under semver and plan to support in newer versions.

When a project/environment importing a package wants to access a symbol not marked as API (if it is not the same project/environment originally defining the symbol), a warning is displayed, which makes the user aware of the unsupported access but doesn't otherwise hinder it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar as those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or to start a discussion with the developer about providing an API for what the user would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome than accessing internals and fearing breakage later on, when that breakage could have been avoided by such a discussion.

The following sections explain how `api` interacts with various uses, what the interactions ought to mean semantically as well as the reasoning for choosing the semantics to be so.

### `function`

Consider this example:

```julia
api function foo(arg1, arg2)
   arg1 + arg2
end

# equivalently:
# api foo(arg1, arg2) = arg1 + arg2
```

This declares the function `foo`, with a single method taking two arguments of any type. The `api` keyword, when written in front of such a method definition, only declares the given method as API. If we were to later on define a new method, taking different arguments like so

```julia
function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end
```

the new method would not be considered API under semver. The reasoning for this is simple - once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.

Being declared on a per-method case means the following: 

 * A method annotated with `api` MAY NOT be removed in a non-breaking release, unless the existing method is split into multiple definitions that are able to fully take on the existing dispatches of the previous single method. In type parlance, this means that the type union of the signatures of the replacement methods MUST be at least as general as the signature of the original method (i.e. a supertype of it), and MAY be less specific. This is to prevent introducing accidental `MethodError`s where there were none before.
 * A function/method annotated with `api` MAY NOT introduce an error where there was none before, without that being done in a breaking release.
 * A function/method annotated with `api` MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to `MAY NOT` if they feel it appropriate for their function/method.
 * A function/method annotated with `api` MAY remove an error and introduce a non-throwing result.
 * A function/method annotated with `api` MAY change one error type to another, but developers are free to strengthen this to `MAY NOT` if they feel it appropriate for their function.
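
The first rule in the list above can be checked mechanically with today's type system. The following is only a minimal sketch and not part of the proposal: `covers` is a hypothetical helper, and the signatures are written as plain tuple types (without the function type) for brevity.

```julia
# Hypothetical helper: do the replacement signatures accept every argument
# tuple that the original signature accepted?
covers(old_sig, new_sigs) = old_sig <: Union{new_sigs...}

# Signature of the original `api` method, e.g. foo(::Real, ::Real)
old_sig = Tuple{Real, Real}

# Splitting into a specialized method plus a fallback keeps all dispatches.
ok_split = covers(old_sig, [Tuple{Integer, Real}, Tuple{Real, Real}])

# Widening the signature keeps all dispatches as well.
ok_wider = covers(old_sig, [Tuple{Number, Any}])

# Dropping the fallback loses e.g. (::Rational, ::Real), which would
# surface as a new MethodError for users.
bad_split = covers(old_sig, [Tuple{Integer, Real}, Tuple{AbstractFloat, Real}])

println((ok_split, ok_wider, bad_split))
```

A CI tool comparing two releases could presumably run a check of this shape over all methods marked as API, though it would additionally have to account for method ambiguities.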

This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to be able to anticipate breaking changes and to allow developers to plan around and test for them more easily. The exact semantics a method that is marked as API must obey to be considered API, apart from its signature and the above points, are up to the developers of a function.

Depending on usecase (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax

```julia
api function bar end
```

declares ALL methods of `bar` to be public API as an escape hatch - i.e., the above syntax declares the _function_ `bar` to be API, not just individual methods. In Base-internal parlance, the `api` keyword on a single method only marks an entry in the method table as API, while the use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method - it effectively acts as if there were a method taking a sole `::Vararg{Any}` argument and marking that as API.
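
To make the "catch-all" reading concrete, here is a small sketch in current Julia (without the proposed keyword). It shows that every method added to `bar` has a signature that is a subtype of the signature a sole `::Vararg{Any}` method would have. The methods of `bar` are made up for illustration, and `sig` is an internal field of `Method` that is used here only to inspect the method table.

```julia
function bar end                      # declares `bar` without any methods
n_before = length(methods(bar))       # no methods exist yet

# Two illustrative methods, added later (possibly by other packages).
bar(x::Int) = x + 1
bar(x::String, y) = x * string(y)

# The signature a catch-all method of `bar` would have.
catch_all = Tuple{typeof(bar), Vararg{Any}}

# NOTE: `sig` is an internal field of `Method`, used only for illustration.
all_covered = all(m -> m.sig <: catch_all, methods(bar))

println((n_before, length(methods(bar)), all_covered))
```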

### `struct`

The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.

Consider this example:

```julia
abstract type AbstractFoo end

api struct MyStruct{T, S} <: AbstractFoo
    a::T
    b::S
end
```

A struct like the above annotated with `api` guarantees that the default constructor methods are marked as `api`. The subtyping relationship is considered API under semver, up to and including `Any`. The existence of the fields `a` and `b` is considered API, as is their relationship to the type parameters `T` and `S`.

In the example above, the full chain `MyStruct{T,S} <: AbstractFoo <: Any` is considered API under semver, which means that methods declared as taking either an `AbstractFoo` or an `Any` argument must continue to also take objects of type `MyStruct`. This means that changing a definition like the one above into one like this

```julia
abstract type AbstractBar end
abstract type AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
```

is _a breaking change under semver_. It is however legal to do the following:

```julia
abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end
```

because the old subtyping chain `MyStruct{T,S} <: AbstractFoo <: Any` is a subchain of the new chain `MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any`. That is, it is legal to grow the subtyping chain downwards.
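
The subchain rule can be sketched as a check in current Julia. This is an illustration under assumptions, not a proposed implementation: the modules `V1`, `V2` and `V3` stand in for three releases of the same package, `chain` and `is_subchain` are hypothetical helpers, and types are compared by name only.

```julia
module V1   # original release
    abstract type AbstractFoo end
    struct MyStruct{T,S} <: AbstractFoo
        a::T
        b::S
    end
end

module V2   # non-breaking: AbstractBar inserted below AbstractFoo
    abstract type AbstractFoo end
    abstract type AbstractBar <: AbstractFoo end
    struct MyStruct{T,S} <: AbstractBar
        a::T
        b::S
    end
end

module V3   # breaking: AbstractFoo dropped from the chain
    abstract type AbstractBar end
    struct MyStruct{T,S} <: AbstractBar
        a::T
        b::S
    end
end

# Names of the types from `T` up to and including `Any`.
function chain(T)
    result = Symbol[]
    while true
        push!(result, nameof(T))
        T === Any && break
        T = supertype(T)
    end
    return result
end

# Is `old` an in-order subsequence of `new`? Compares by name only.
function is_subchain(old, new)
    i = 1
    for name in new
        if i <= length(old) && name == old[i]
            i += 1
        end
    end
    return i > length(old)
end

old = chain(V1.MyStruct{Int,Int})
v2_compatible = is_subchain(old, chain(V2.MyStruct{Int,Int}))
v3_compatible = is_subchain(old, chain(V3.MyStruct{Int,Int}))
println((v2_compatible, v3_compatible))
```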

Notably, making `MyStruct` API does not mean that `AbstractFoo` itself is API, i.e. adding new subtypes to `AbstractFoo` is not supported and is not considered API purely by annotating a subtype as `api`.

Since the new type in a changed release must be usable in all places where the old type was used, the only additional restriction placed on `MyStruct` as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like `MyStruct{S,T}` in structs containing objects of this type type-unstable).

In regards to whether field access is considered API or not, it is possible to annotate individual fields as `api`:

```julia
api struct MyStruct{T,S}
    a::T
    api b::S
end
```

This requires the main struct to be annotated as `api` as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type `MyStruct` via `getfield(::MyStruct, :b)` or `getproperty(::MyStruct, :b)` is covered under semver and considered API. The same is not true of the field `a`, its type or the connection to the first type parameter, the layout of `MyStruct` or the internal padding bytes that may be inserted into instances of `MyStruct`.

### `abstract type`

`abstract type` behaves similarly to `struct`, in that it is illegal to remove a type from a subtype chain, while it is legal to extend the chain downwards or to introduce new supertypes in the supertype chain.

Consider this example:

```julia
abstract type AbstractBar end
api abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any
```

The following changes do not require a breaking version:

```julia
# introducing a supertype
abstract type AbstractBar end
abstract type AbstractFoo <: AbstractBar end
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: AbstractBar <: Any
```

The following changes require a new breaking version:

```julia
# removing a supertype
abstract type AbstractFoo end  # no longer a subtype of AbstractBar
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: Any
```

```julia
# removing the `api` keyword
abstract type AbstractBar end
abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any
```

What the `api` keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like `api function foo(a::MyAbstract)` are expected to work with new objects `MyConcrete <: MyAbstract` defined by a user, but methods like `function bar(a::MyAbstract)` (note the lack of `api`) are not.
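
As a sketch of what this looks like from both sides, written in current Julia without the proposed keyword: the interface function `value` and the implementation of `foo` are assumptions made up for this illustration; only the names `MyAbstract`, `MyConcrete` and `foo` come from the text above.

```julia
# "Package" side. Under the proposal, `MyAbstract` and `foo` would carry `api`.
# Assumed interface (hypothetical): subtypes of MyAbstract implement `value(x)`.
abstract type MyAbstract end
function value end
foo(a::MyAbstract) = 2 * value(a)

# "User" side: opting into the package's behavior by subtyping and
# following the documented semantics of the abstract type.
struct MyConcrete <: MyAbstract
    v::Int
end
value(x::MyConcrete) = x.v

result = foo(MyConcrete(21))
println(result)
```

Because `foo` only relies on the documented interface of `MyAbstract`, it keeps working for types the package author has never seen, which is the guarantee the `api` mark on the abstract type is meant to express.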

At the same time, a lack of `api` can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.

### type annotated/`const` global variables

```julia
api MyFoo::Foo = ...
api const MyBar = ...
```

Marking a global `const` or type annotated variable as `api` means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for `const` global variables only reading is considered API, writing is not API.

The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.
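
A minimal sketch of why narrowing keeps existing readers working, assuming Julia 1.8 or newer for type annotated globals. The modules `OldRelease` and `NewRelease` stand in for two versions of the same package, and `double` is hypothetical user code; none of these names are part of the proposal.

```julia
# Requires Julia 1.8 or newer (type annotated globals).
module OldRelease
    MyFoo::Real = 1      # would be `api MyFoo::Real = 1` under the proposal
end

module NewRelease
    MyFoo::Int = 1       # narrowed: Int <: Real
end

# Hypothetical user code written against the old, weaker guarantee.
double(x::Real) = 2x

old_result = double(OldRelease.MyFoo)
new_result = double(NewRelease.MyFoo)
println((old_result, new_result))
```

The sketch only covers reads of the binding.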

Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write `api Bar::Any = ...`.

It should be noted that it is the _variable binding_ that is considered API, not the object the variable refers to. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as `api`.

### `module`

Annotating an entire `module` expression with `api` means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.

Consider this example:

```julia
api module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end
```

In this example, `Foo`, `Foo.f`, `Foo.Bar`, `Foo.Bar.baz` are considered API, while `Foo.Bar.bak` is not.

Consider this other example:

```julia
module Foo
    api module Bar
      const baz = 1
      const bak = 42
    end

    f() = "hello"
    api g() = "world"  # placeholder body, only the `api` mark matters here
end
```

In this example, `Foo.g`, `Foo.Bar`, `Foo.Bar.baz` and `Foo.Bar.bak` are considered API, while `Foo` and `Foo.f` are not.

Consider this third example:

```julia
module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end
```

Only `Foo.Bar.baz` is considered API; the other names in and from `Foo` and `Foo.Bar` are not.


## Uses

This is a list of imagined uses of this functionality:

 * Linting hints in LSP.jl, to make users aware of accidentally using internals of a package/Base.
 * API surface tracking over time, especially in regards to test coverage and breaking changes.
 * Tree-shaking for Pkg images/static compilation, to only make `api` bindings available in the final image/binary/shared object
    * This is primarily thought of in the context of compilation to a shared object - the `api` marker could be used for not mangling the names of julia functions when compiling an `.so`, as currently all names are mangled by default (unless marked as `@ccallable`, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).
 * Automatic generation of documentation hints about API surface in Documenter.jl
 * Easier tracking of deprecated functionality before it is ultimately removed in a breaking change
 * Have a PR template mentioning "Does this PR introduce new API?"
 * CI Check enforcing that new API bindings have a docstring associated with them

Do you have ideas? Mention them and I'll edit them in here!

## Required Implementation Steps

 * Parse the `api` keyword in the correct places and produce an expression the compiler can use later on
 * Add `api` marker handling to methods and the method table implementation, as well as to binding lookup in modules
 * Make the REPL `help>` mode aware of `api` tags.
 * Go through all of Base and the stdlibs and mark the bindings currently residing in the manual as `api`

### Known Difficulties with the proposal

 * `Expr(:function)` does not have space in its first two arguments for additional metadata, so this would need to be added to either a third argument, or create a new `Expr(:api_function)`. Analogous issues exist for `Expr(:struct)`, `Expr(:=)`, `Expr(:module)` etc. Both approaches potentially require macros to be changed, to be aware of `api` in their expansion.
 * There is a lot of doc churn, but sadly there's no way around that. My hope is that this makes it easier to just write more docstrings, since a function would no longer be "promoted" to API status merely by having one.
 * This proposal requires quite a bit of julia-internals expertise, touching the parser, probably lowering as well, lots of internal objects/state, the REPL and external packages. It's a very large amount of work, and there's likely no chance of any one person being able to implement all of it - this will be a group effort.

## FAQ

 * Why not a macro instead of new syntax?
    * A macro has the disadvantage of not composing nicely with existing macros. Additionally, since this
      requires quite deep changes to `Method` and other (internal?) objects of `Base`, exposing this
      as a macro would also mean exposing this as an API to the runtime, even though this `api` distinction is
      not about dynamicness - the `api` surface of a package really ought to be fixed in a given version,
      and not change dynamically at runtime.
 * What about annotating bindings as `private` instead?
    * There is no intention of preventing access to an internal object or otherwise introducing access modifiers
      into the language. Additionally, marking things as `private`, `internal` or similar instead of `api`
      means that any time a developer accidentally _forgets_ to add that modifier, adding it later is technically
      a breaking change in a release. The whole point of this proposal is to avoid this kind of breakage.
 * What about naming the keyword `public`?
    * As there is no intention to provide access modifiers, I feel like naming this `public` overloads this
      already overloaded term in the wider programming community too much. `public`/`private` are commonly
      associated with access modifiers, which is decidedly not what this proposal is about.
 * What about naming the keyword <insert favorite here>?
    * Bikeshedding the name is always welcome, though I think it hard to compete with the short conciseness of
      `api`, which makes its intent very clear. It would also be prudent to have that discussion after we've
      come to a compromise about the desired semantics.
 * How does this proposal interact with `export`?
    * `export` is a bit tricky, since it doesn't distinguish between methods the way `api` does. I think
      it could work to mark all `export`ed symbols with `api` as well (this is certainly not without its
      own pitfalls...), though I also think that `export`
      is a bit of an orthogonal concept to `api`, due to the former being about namespacing and the latter
      being exclusively about what is considered to be actually supported. I think a good example is
      the way `save`/`load` are implemented with FileIO.jl. While the parent interface package exports `save`
      and `load`, packages wishing to register a new file format define _new, private_ functions for these
      and register those on loading with FileIO (or FileIO calls into them if they're in the environment).
      This means that `MyPkg.save` is _not_ exported from `MyPkg`, but is nevertheless a supported API
      provided by `MyPkg`. The intention is to support these kinds of usecases, where `export` is
      undesirable for various reasons, while still wishing to provide a documented/supported API surface
      to a package.
 * Why not prototype this in a package?
   * There are various prototypes of similar things in some packages, none of which has been widely adopted as far as I know and I think this is something Base itself could really use as well. Not to mention that starting this in a package is IMO going to splinter the "this is how we define API" discussion further. 


---

I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully some other version of more formalized API semantics being merged at some point.
