Commit e3a60d3

Merge pull request #2454 from youngnick/implementers-guide
Add implementer's guide
2 parents d031acf + 1653156 commit e3a60d3

4 files changed

+341
-1
lines changed

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ nav:
7777
- gRPC Routing: guides/grpc-routing.md
7878
- Migrating from Ingress: guides/migrating-from-ingress.md
7979
- Reference:
80+
- Implementer's Guide: reference/implementers-guide.md
8081
- API Types:
8182
GatewayClass: api-types/gatewayclass.md
8283
Gateway: api-types/gateway.md

site-src/concepts/guidelines.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# Implementation guidelines
1+
# Design guidelines
22

33
There are some general design guidelines used throughout this API.
44

Lines changed: 337 additions & 0 deletions
@@ -0,0 +1,337 @@
1+
# Gateway API Implementer's Guide
2+
3+
Everything you wanted to know about building a Gateway API implementation
4+
but were too afraid to ask.
5+
6+
This document is a place to collect tips and tricks for _writing a Gateway API
7+
implementation_ that have no straightforward place within the godoc fields of the
8+
underlying types.
9+
10+
It's also intended to be a place to write down some guidelines to
11+
help implementers of this API avoid common mistakes.
12+
13+
It may not be very relevant if you are intending to _use_ this API as an end
14+
user as opposed to _building_ something that uses it.
15+
16+
This is a living document; if you see something missing, PRs are welcome!
17+
18+
## Important things to remember about Gateway API
19+
20+
Hopefully most of these are not surprising, but they sometimes have non-obvious
21+
implications that we'll try and lay out here.
22+
23+
### Gateway API is a `kubernetes.io` API
24+
25+
Gateway API uses the `gateway.networking.k8s.io` API group. This means that
26+
each release of its APIs has been reviewed by upstream Kubernetes reviewers,
27+
just like the APIs delivered in the core Kubernetes binaries.
29+
30+
### Gateway API is delivered using CRDs
31+
32+
Gateway API is supplied as a set of CRDs, version controlled using our [versioning
33+
policy][versioning].
34+
35+
The most important part of that versioning policy is that what _appears to be_
36+
the same object (that is, it has the same `group`, `version`, and `kind`) may have
37+
a slightly different schema. We make changes in ways that are _compatible_, so
38+
things should generally "just work", but there are some actions implementations
39+
need to take to make that more reliable; these are detailed below.
40+
41+
The CRD-based delivery also means that if an implementation tries to use (that is
42+
get, list, watch, etc.) Gateway API objects when the CRDs have _not_ been installed,
43+
then it's likely that your Kubernetes client code will return serious errors.
44+
Tips to deal with this are also detailed below.
45+
46+
The CRD definitions for Gateway API objects all contain two specific
47+
annotations:
48+
49+
- `gateway.networking.k8s.io/bundle-version: <semver-release-version>`
50+
- `gateway.networking.k8s.io/channel: <channel-name>`
51+
52+
The concepts of "bundle version" and "channel" (short for "release channel") are
53+
explained in our [versioning][versioning] documentation.
54+
55+
Implementations may use these to determine what schema versions are installed in
56+
the cluster, if any.
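
As a minimal sketch of reading these annotations, the Go snippet below parses them from a CRD's `metadata.annotations` map. The `bundleInfo` type and `parseBundleInfo` function are hypothetical names introduced for illustration, and the sketch assumes the bundle version is a plain `vMAJOR.MINOR.PATCH` value; how the annotations map is fetched from the cluster is left to whatever Kubernetes client the implementation already uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Annotation keys as documented above.
const (
	bundleVersionAnnotation = "gateway.networking.k8s.io/bundle-version"
	channelAnnotation       = "gateway.networking.k8s.io/channel"
)

// bundleInfo is a hypothetical helper type; it is not part of Gateway API.
type bundleInfo struct {
	Major, Minor, Patch int
	Channel             string
}

// parseBundleInfo reads both annotations from a CRD's metadata.annotations.
// Assumption: the version looks like "v1.0.0" or "1.0.0". Anything else,
// including pre-release suffixes, is reported as an error rather than guessed at.
func parseBundleInfo(annotations map[string]string) (bundleInfo, error) {
	raw, ok := annotations[bundleVersionAnnotation]
	if !ok {
		return bundleInfo{}, fmt.Errorf("annotation %q is not set", bundleVersionAnnotation)
	}
	parts := strings.Split(strings.TrimPrefix(raw, "v"), ".")
	if len(parts) != 3 {
		return bundleInfo{}, fmt.Errorf("unsupported bundle version %q", raw)
	}
	var nums [3]int
	for i, part := range parts {
		n, err := strconv.Atoi(part)
		if err != nil || n < 0 {
			return bundleInfo{}, fmt.Errorf("unsupported bundle version %q", raw)
		}
		nums[i] = n
	}
	return bundleInfo{
		Major:   nums[0],
		Minor:   nums[1],
		Patch:   nums[2],
		Channel: annotations[channelAnnotation],
	}, nil
}

func main() {
	// Example values only; real values come from the installed CRDs.
	info, err := parseBundleInfo(map[string]string{
		bundleVersionAnnotation: "v1.0.0",
		channelAnnotation:       "standard",
	})
	if err != nil {
		fmt.Println("cannot determine installed bundle:", err)
		return
	}
	fmt.Printf("bundle v%d.%d.%d, channel %q\n", info.Major, info.Minor, info.Patch, info.Channel)
}
```

Returning an error for anything unexpected, rather than guessing, lets the caller decide whether to disable Gateway API support or exit with a clear message.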
57+
58+
[versioning]: /concepts/versioning
59+
60+
### Changes to the Gateway API CRDs are backwards compatible
61+
62+
Part of the contract for Gateway API CRDs is that changes _within an API version_
63+
must be _compatible_.
64+
65+
"Within an API Version" means changes to a CRD that occur while the same API version
66+
(`v1alpha2` or `v1` for example) is in use, and "compatible" means that any new
67+
fields, values, or validation will be added in a way that ensures _previous_
68+
objects _will still be valid objects_ after the change.
69+
70+
This means that once Gateway API objects move to the `v1` API version, then _all_
71+
changes must be compatible.
72+
73+
This contract also means that an implementation will not fail with a higher version
74+
of the API than the version it was written with, because the newer schema being
75+
stored by Kubernetes will definitely be able to be serialized into the older version
76+
used in code by the implementation.
77+
78+
Similarly, if an implementation was written with a _higher_ version, the newer
79+
values that it understands will simply _never be used_, as they are not present
80+
in the older version.
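
To illustrate both directions, here is a small Go sketch using only the standard library. The `olderSpec` and `newerSpec` types and the `timeout` field are hypothetical stand-ins, not real Gateway API types, and the sketch assumes a decoder that ignores unknown fields, which is the default behaviour of Go's `encoding/json`. A client configured for strict decoding would behave differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// olderSpec stands in for the schema an implementation was compiled against.
// Both types here are hypothetical and exist only for this illustration.
type olderSpec struct {
	Hostname string `json:"hostname"`
}

// newerSpec adds an optional field, as a compatible change would.
type newerSpec struct {
	Hostname string  `json:"hostname"`
	Timeout  *string `json:"timeout,omitempty"`
}

// decodeOlder decodes an object using the older Go type.
func decodeOlder(raw []byte) (olderSpec, error) {
	var spec olderSpec
	err := json.Unmarshal(raw, &spec)
	return spec, err
}

// decodeNewer decodes an object using the newer Go type.
func decodeNewer(raw []byte) (newerSpec, error) {
	var spec newerSpec
	err := json.Unmarshal(raw, &spec)
	return spec, err
}

func main() {
	// Newer object, older code: the unknown "timeout" field is ignored.
	older, err := decodeOlder([]byte(`{"hostname":"example.com","timeout":"5s"}`))
	fmt.Println(older.Hostname, err)

	// Older object, newer code: the new field is simply left unset.
	newer, err := decodeNewer([]byte(`{"hostname":"example.com"}`))
	fmt.Println(newer.Hostname, newer.Timeout == nil, err)
}
```

Modelling new fields as pointers makes "not present" distinguishable from an empty value, which is why the newer type uses `*string` here.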
81+
82+
## Implementation Rules and Guidelines
83+
84+
### CRD Management
85+
86+
For a Gateway API implementation to work, the Gateway API CRDs must be installed
87+
in the Kubernetes cluster the implementation is watching.
88+
89+
Implementations have two possible options: installing CRDs themselves (implementation
90+
controlled) or requiring installation by some other mechanism before working
91+
(externally controlled). Both have tradeoffs, but implementation controlled has
92+
significantly more, and so we DO NOT recommend using implementation controlled
93+
methods at this time.
94+
95+
Whichever option is chosen, certain things SHOULD be true:
96+
97+
Whatever method is used, infra and cluster admins SHOULD attempt to ensure that
98+
the Bundle version of the CRDs is not _downgraded_. Although we ensure that
99+
API changes are backwards compatible, changing CRD definitions can change the
100+
storage version of the resource, which could have unforeseen effects. Most of the
101+
time, things will probably work, but if they don't, they will most likely
102+
break in weird ways.
103+
104+
Additionally, older versions of the API may be missing fields or features, which
105+
could be very disruptive for users.
106+
107+
Try your best to ensure that the bundle version doesn't roll backwards. It's safer.
108+
109+
Implementations SHOULD also handle the Gateway API CRDs _not_ being present in
110+
the cluster without crashing or panicking. Exiting with a clear fatal error is
111+
acceptable in this case, as is disabling Gateway API support even if enabled in
112+
configuration.
113+
114+
Practically, implementations using `controller-runtime` or similar tooling
115+
may need to check for the _presence_ of the CRDs by
116+
getting the list of installed CRDs before attempting to watch those resources.
117+
(Note that this will require the implementation to have `read` access to those
118+
resources though.)
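
The presence check itself can be sketched in Go as below. This is a minimal sketch, assuming the list of installed CRD names has already been fetched with whatever Kubernetes client the implementation uses; `requiredCRDs` and `missingCRDs` are hypothetical names, and the set of required CRDs will differ between implementations.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredCRDs lists the CRDs this hypothetical implementation watches,
// named as <plural>.<group>.
var requiredCRDs = []string{
	"gatewayclasses.gateway.networking.k8s.io",
	"gateways.gateway.networking.k8s.io",
	"httproutes.gateway.networking.k8s.io",
}

// missingCRDs returns the required CRD names that are absent from installed,
// sorted so that log output is stable.
func missingCRDs(installed, required []string) []string {
	present := make(map[string]bool, len(installed))
	for _, name := range installed {
		present[name] = true
	}
	var missing []string
	for _, name := range required {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Assumption: this slice was produced by listing the cluster's CRDs.
	installed := []string{"gatewayclasses.gateway.networking.k8s.io"}

	if missing := missingCRDs(installed, requiredCRDs); len(missing) > 0 {
		// Disable Gateway API support (or exit with a clear error)
		// instead of starting watches that are bound to fail.
		fmt.Println("Gateway API support disabled, missing CRDs:", missing)
		return
	}
	fmt.Println("all required CRDs present, starting watches")
}
```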
119+
120+
#### Implementation-controlled CRD installation
121+
122+
Implementation-controlled CRD installation also includes automatic installation
123+
mechanisms such as Helm, if the CRDs are included in a Helm chart with the
124+
implementation's installation.
125+
126+
Because of significant caveats we DO NOT recommend doing implementation-controlled
127+
CRD management at this time.
128+
129+
However, if you really must, CRD definitions MAY be installed by implementations,
130+
but if they do, they MUST have a way to ensure:
131+
132+
- there are no other Gateway API CRDs installed in the cluster before starting, or
133+
- that the CRD definitions are only installed if they are a higher bundle version
134+
than any existing Gateway API CRDs. Note that even this may not be safe if there
135+
are breaking changes in the experimental channel resources, so implementations
136+
should be _very_ careful with doing this.
137+
138+
This avoids problems if another implementation is also installed in the cluster
139+
and expects a higher version of the CRDs to be installed.
140+
141+
The worst outcome here would be two implementations trying to do automatic install
142+
of _different_ CRD versions, resulting in the CRD versions flapping between
143+
versions or channels. This would _not_ produce good outcomes.
144+
145+
The safer method for an automatic installation would require the implementation
146+
to:
147+
148+
- Check if there are any Gateway API CRDs installed in the cluster.
149+
- If not, install its most compatible version of the CRDs.
150+
- If so, only install its version of the CRDs if the bundle version is higher
151+
than the existing one, and the mechanism will also need to check if there are
152+
incompatible changes included in any versions as well.
153+
154+
This is going to be _very_ difficult to pull off in practice.
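
To make the version-comparison part of those steps concrete, here is a minimal Go sketch. The names `bundleVersion`, `compare`, and `shouldInstall` are hypothetical, and the sketch deliberately covers only the "not installed" and "higher bundle version" checks plus an opt-out switch; it does not attempt the much harder check for incompatible changes between versions or channels, so it is not sufficient on its own.

```go
package main

import "fmt"

// bundleVersion holds major, minor, and patch numbers.
// It is a hypothetical helper type, not part of Gateway API.
type bundleVersion [3]int

// compare returns -1, 0, or 1 when a is lower than, equal to, or higher than b.
func compare(a, b bundleVersion) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// shouldInstall applies the checks described above:
//   - never install when the installation owner has disabled it;
//   - install when no Gateway API CRDs exist (installed == nil);
//   - otherwise install only when our bundle version is strictly higher.
//
// It does NOT check for incompatible changes between versions or channels.
func shouldInstall(installed *bundleVersion, own bundleVersion, enabled bool) bool {
	if !enabled {
		return false
	}
	if installed == nil {
		return true
	}
	return compare(own, *installed) > 0
}

func main() {
	own := bundleVersion{1, 0, 0}
	existing := bundleVersion{1, 1, 0}

	fmt.Println(shouldInstall(nil, own, true))       // nothing installed yet
	fmt.Println(shouldInstall(&existing, own, true)) // would be a downgrade
	fmt.Println(shouldInstall(nil, own, false))      // disabled by the owner
}
```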
155+
156+
It should also be noted that many infra and cluster admins manage CRDs using
157+
externally controlled methods that will not be visible to a Gateway
158+
implementation, so if you still proceed with automatic installation, it MUST be
159+
able to be disabled by the installation owner (whether that is the infra or cluster
160+
admin).
161+
162+
Because of all these caveats, we DO NOT recommend doing automatic CRD management
163+
at this time.
164+
165+
#### Externally controlled CRD installation
166+
167+
Because of all of the complexities mentioned in the "Implementation controlled"
168+
section of this document, we recommend that implementations supply documentation
169+
on how to check if CRDs are installed and upgrade versions if required.
170+
171+
Additions to this document with suggested commands are welcome.
172+
173+
### Conformance and Version compatibility
174+
175+
A conformant Gateway API implementation is one that passes the conformance tests
176+
that are included in each Gateway API bundle version release.
177+
178+
An implementation MUST pass the conformance suite with _no_ skipped tests to be
179+
conformant. Tests may be skipped during development, but a version you want to
180+
be conformant MUST have no skipped tests.
181+
182+
Extended features may, as per the contract for Extended status, be disabled.
183+
184+
Gateway API conformance is version-specific. An implementation that passes
185+
conformance for version N may not pass conformance for version N+1 without changes.
186+
187+
Implementations SHOULD submit a report from the conformance testing suite back
188+
to the Gateway API GitHub repo containing details of their testing.
189+
190+
The conformance suite output includes the Gateway API version supported.
191+
192+
#### Version compatibility
193+
194+
Once v1.0 is released, implementations supporting Gateway and GatewayClass
195+
MUST set a new Condition, `SupportedVersion`, with `status: true` meaning
196+
that the installed CRD version is supported, and `status: false` meaning that it
197+
is not.
198+
199+
### Standard Status fields and Conditions
200+
201+
Gateway API has many resources, but in designing it we've worked to keep
202+
the status experience as consistent as possible across objects, using the
203+
Condition type and the `status.conditions` field.
204+
205+
Most resources have a `status.conditions` field, but some also have a namespaced
206+
field that _contains_ a `conditions` field.
207+
208+
For the latter, Gateway's `status.listeners` and the Route `status.parents`
209+
fields are examples where each item in the slice identifies the Conditions
210+
associated with some subset of configuration.
211+
212+
For the Gateway case, it's to allow Conditions per _Listener_, and in the Route
213+
case, it's to allow Conditions per _implementation_ (since Route objects can
214+
be used in multiple Gateways, and those Gateways can be reconciled by different
215+
implementations).
216+
217+
In all of these cases, there are some relatively-common Condition types that have
218+
similar meanings:
219+
220+
- `Accepted` - the resource or part thereof contains acceptable config that will
221+
produce some configuration in the underlying data plane that the implementation
222+
controls. This does not mean that the _whole_ configuration is valid, just that
223+
_enough_ is valid to produce some effect.
224+
- `Programmed` - this represents a later phase of operation, after `Accepted`,
225+
when the resource or part thereof has been Accepted and programmed into the
226+
underlying dataplane. Users should expect the configuration to be ready for
227+
traffic to flow _at some point in the near future_. This Condition does _not_
228+
say that the dataplane is ready _when it's set_, just that everything is valid
229+
and it _will become ready soon_. "Soon" may have different meanings depending
230+
on the implementation.
231+
- `ResolvedRefs` - this Condition indicates that all references in the resource
232+
or part thereof were valid and pointed to an object that both exists and allows
233+
that reference. If this Condition is set to `status: false`, then _at least one_
234+
reference in the resource or part thereof is invalid for some reason, and the
235+
`message` field should indicate which ones are invalid.
236+
237+
Implementers should check the godoc for each type to see the exact details of
238+
these Conditions on each resource or part thereof.
239+
240+
Additionally, the upstream `Condition` struct contains an optional
241+
`observedGeneration` field - implementations MUST use this field and set it to
242+
the `metadata.generation` field of the object at the time the status is generated.
243+
This allows users of the API to determine if the status is relevant to the current
244+
version of the object.
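
A minimal Go sketch of this pattern follows. The `condition` type is a local stand-in that mirrors a subset of the upstream Condition struct's fields so the example stays self-contained; `setCondition` and `isCurrent` are hypothetical helper names. A real implementation would use the upstream type and its helpers instead.

```go
package main

import "fmt"

// condition is a local stand-in mirroring a subset of the upstream
// Condition struct. It exists only to keep this sketch self-contained.
type condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	ObservedGeneration int64
	Reason             string
	Message            string
}

// setCondition replaces the condition with the same Type, or appends it.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// isCurrent reports whether a condition was computed from the object's
// current metadata.generation. This is the check a user of the API performs.
func isCurrent(c condition, generation int64) bool {
	return c.ObservedGeneration == generation
}

func main() {
	// Assumption: generation was read from metadata.generation of the
	// object being reconciled, at the time the status was computed.
	var generation int64 = 3

	conds := setCondition(nil, condition{
		Type:               "Accepted",
		Status:             "True",
		ObservedGeneration: generation,
		Reason:             "Accepted",
	})

	fmt.Println(isCurrent(conds[0], generation))   // status matches the object
	fmt.Println(isCurrent(conds[0], generation+1)) // object changed since
}
```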
245+
246+
247+
### Resource details
248+
249+
For each currently available conformance profile, there is a set of resources
250+
that implementations are expected to reconcile.
251+
252+
The following section goes through each Gateway API object and indicates expected
253+
behaviors.
254+
255+
#### GatewayClass
256+
257+
GatewayClass has one main `spec` field - `controllerName`. Each implementation
258+
is expected to claim a domain-prefixed string value (like
259+
`example.com/example-ingress`) as its `controllerName`.
260+
261+
Implementations MUST watch _all_ GatewayClasses, and reconcile GatewayClasses
262+
that have a matching `controllerName`. The implementation must choose at least
263+
one compatible GatewayClass out of the set of GatewayClasses that have a matching
264+
`controllerName`, and indicate that it accepts processing of that GatewayClass
265+
by setting an `Accepted` Condition to `status: true` in each. Any GatewayClasses
266+
that have a matching `controllerName` but are _not_ Accepted must have the
267+
`Accepted` Condition set to `status: false`.
268+
269+
Implementations MAY choose only one GatewayClass out of the pool of otherwise
270+
acceptable GatewayClasses if they can only reconcile one, or, if they are capable
271+
of reconciling multiple GatewayClasses, they may also choose as many as they like.
272+
273+
If something in the GatewayClass renders it incompatible (at the time of writing,
274+
the only possible reason for this is that there is a pointer to a `paramsRef`
275+
object that is not supported by the implementation), then the implementation
276+
SHOULD mark the incompatible GatewayClass as not `Accepted`.
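
The selection rules in this section can be sketched in Go as follows. This is a minimal sketch under stated assumptions: `gatewayClass` is a hypothetical, cut-down stand-in for the real type, `acceptedStatus` is a hypothetical helper, and the implementation shown is one that is able to reconcile every compatible GatewayClass rather than only one.

```go
package main

import "fmt"

// gatewayClass is a hypothetical, cut-down stand-in for the real type.
type gatewayClass struct {
	Name           string
	ControllerName string
	ParamsRefKind  string // empty when no paramsRef is set
}

// acceptedStatus returns the Accepted decision for every GatewayClass whose
// controllerName matches ours. Classes owned by other controllers are left
// out of the result entirely, so their status is never touched.
func acceptedStatus(classes []gatewayClass, controllerName string, supportedParams map[string]bool) map[string]bool {
	result := make(map[string]bool)
	for _, gc := range classes {
		if gc.ControllerName != controllerName {
			continue // belongs to another implementation
		}
		// Incompatible when it points at a paramsRef kind we don't support.
		result[gc.Name] = gc.ParamsRefKind == "" || supportedParams[gc.ParamsRefKind]
	}
	return result
}

func main() {
	classes := []gatewayClass{
		{Name: "default", ControllerName: "example.com/example-ingress"},
		{Name: "tuned", ControllerName: "example.com/example-ingress", ParamsRefKind: "ConfigMap"},
		{Name: "other", ControllerName: "example.net/another-controller"},
	}
	// "ExampleParams" is a made-up kind standing in for whatever parameter
	// object this implementation understands.
	supported := map[string]bool{"ExampleParams": true}

	for name, accepted := range acceptedStatus(classes, "example.com/example-ingress", supported) {
		fmt.Printf("GatewayClass %s: Accepted=%t\n", name, accepted)
	}
}
```

An implementation that can only reconcile a single GatewayClass would need an extra tie-breaking step on top of this, marking the remaining matching classes as not `Accepted`.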
277+
278+
#### Gateway
279+
280+
Gateway objects MUST refer in the `spec.gatewayClassName` field to a GatewayClass
281+
that exists and is `Accepted` by an implementation for that implementation to
282+
reconcile them.
283+
284+
Gateway objects that fall out of scope (for example, because the GatewayClass
285+
they reference was deleted) for reconciliation MAY have their status removed by
286+
the implementation as part of the delete process, but this is not required.
287+
288+
#### General Route information
289+
290+
All Route objects share some properties:
291+
292+
- They MUST be attached to an in-scope parent for the implementation to consider
293+
them reconcilable.
294+
- The implementation MUST update the status for each in-scope Route with the
295+
relevant Conditions, using the namespaced `parents` field. See the specific Route
296+
types for details, but this usually includes `Accepted`, `Programmed` and
297+
`ResolvedRefs` Conditions.
298+
- Routes that fall out of scope SHOULD NOT have status updated, since it's possible
299+
that these updates may overwrite status set by any new owners. The `observedGeneration` field
300+
will indicate that any remaining status is out of date.
301+
302+
303+
#### HTTPRoute
304+
305+
HTTPRoutes route HTTP traffic that is _unencrypted_ and available for inspection.
306+
This includes HTTPS traffic that's terminated at the Gateway (since that is then
307+
decrypted), and allows the HTTPRoute to use HTTP properties, like path, method,
308+
or headers in its routing directives.
309+
310+
#### TLSRoute
311+
312+
TLSRoutes route encrypted TLS traffic using the SNI header, _without decrypting
313+
the traffic stream_, to the relevant backends.
314+
315+
#### TCPRoute
316+
317+
TCPRoutes route a TCP stream that arrives at a Listener to one of the given
318+
backends.
319+
320+
#### UDPRoute
321+
322+
UDPRoutes route UDP packets that arrive at a Listener to one of the given
323+
backends.
324+
325+
#### ReferenceGrant
326+
327+
ReferenceGrant is a special resource that is used by resource owners in one
328+
namespace to _selectively_ allow references from Gateway API objects in other
329+
namespaces.
330+
331+
A ReferenceGrant is created in the same namespace as the thing it's granting
332+
reference access to, and allows access from other namespaces, from other Kinds,
333+
or both.
334+
335+
Implementations that support cross-namespace references MUST watch ReferenceGrant
336+
and reconcile any ReferenceGrant that points to an object that's referred to by
337+
an in-scope Gateway API object.
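
As a rough illustration of the check an implementation performs when resolving a cross-namespace reference, consider the Go sketch below. The struct types are hypothetical, simplified stand-ins for the real resource, and `permitted` is a hypothetical helper. The sketch assumes that references within a single namespace need no grant, and that a grant target with an empty name covers every object of that kind in the grant's namespace.

```go
package main

import "fmt"

// The types below are simplified, hypothetical stand-ins for ReferenceGrant.
type grantFrom struct{ Group, Kind, Namespace string }

// An empty Name is assumed to mean "every object of this kind".
type grantTo struct{ Group, Kind, Name string }

type referenceGrant struct {
	Namespace string // namespace the grant lives in, same as the target's
	From      []grantFrom
	To        []grantTo
}

// reference describes one reference from a Gateway API object to a target.
type reference struct {
	FromGroup, FromKind, FromNamespace   string
	ToGroup, ToKind, ToNamespace, ToName string
}

// permitted reports whether ref is allowed, given the known grants.
func permitted(ref reference, grants []referenceGrant) bool {
	if ref.FromNamespace == ref.ToNamespace {
		return true // same-namespace references need no grant
	}
	for _, g := range grants {
		if g.Namespace != ref.ToNamespace {
			continue // a grant only covers targets in its own namespace
		}
		fromOK, toOK := false, false
		for _, f := range g.From {
			if f.Group == ref.FromGroup && f.Kind == ref.FromKind && f.Namespace == ref.FromNamespace {
				fromOK = true
			}
		}
		for _, t := range g.To {
			if t.Group == ref.ToGroup && t.Kind == ref.ToKind && (t.Name == "" || t.Name == ref.ToName) {
				toOK = true
			}
		}
		if fromOK && toOK {
			return true
		}
	}
	return false
}

func main() {
	grants := []referenceGrant{{
		Namespace: "backend",
		From:      []grantFrom{{Group: "gateway.networking.k8s.io", Kind: "HTTPRoute", Namespace: "apps"}},
		To:        []grantTo{{Group: "", Kind: "Service"}},
	}}
	ref := reference{
		FromGroup: "gateway.networking.k8s.io", FromKind: "HTTPRoute", FromNamespace: "apps",
		ToGroup: "", ToKind: "Service", ToNamespace: "backend", ToName: "api",
	}
	fmt.Println(permitted(ref, grants))
}
```

When a reference is not permitted, the `ResolvedRefs` Condition described earlier is the natural place to report it.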

site-src/references/spec.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
11
# API Specification
22

3+
This page contains the API field specification for Gateway API.
4+
35
REPLACE_WITH_GENERATED_CONTENT
