|
| 1 | +# Gateway API Implementer's Guide |
| 2 | + |
| 3 | +Everything you wanted to know about building a Gateway API implementation |
| 4 | +but were too afraid to ask. |
| 5 | + |
| 6 | +This document is a place to collect tips and tricks for _writing a Gateway API |
| 7 | +implementation_ that have no straightforward place within the godoc fields of the |
| 8 | +underlying types. |
| 9 | + |
| 10 | +It's also intended to be a place to write down some guidelines to |
| 11 | +help implementers of this API avoid common mistakes. |
| 12 | + |
| 13 | +It may not be very relevant if you are intending to _use_ this API as an end |
| 14 | +user as opposed to _building_ something that uses it. |
| 15 | + |
| 16 | +This is a living document; if you see something missing, PRs are welcome! |
| 17 | + |
| 18 | +## Important things to remember about Gateway API |
| 19 | + |
| 20 | +Hopefully most of these are not surprising, but they sometimes have non-obvious |
| 21 | +implications that we'll try and lay out here. |
| 22 | + |
| 23 | +### Gateway API is a `kubernetes.io` API |
| 24 | + |
| 25 | +Gateway API uses the `gateway.networking.k8s.io` API group. This means that, |
| 26 | +each time a release happens, the APIs have been reviewed by upstream |
| 27 | +Kubernetes reviewers, just like the APIs delivered in the core Kubernetes |
| 28 | +binaries. |
| 29 | + |
| 30 | +### Gateway API is delivered using CRDs |
| 31 | + |
| 32 | +Gateway API is supplied as a set of CRDs, version controlled using our [versioning |
| 33 | +policy][versioning]. |
| 34 | + |
| 35 | +The most important part of that versioning policy is that what _appears to be_ |
| 36 | +the same object (that is, it has the same `group`, `version`, and `kind`) may have |
| 37 | +a slightly different schema. We make changes in ways that are _compatible_, so |
| 38 | +things should generally "just work", but there are some actions implementations |
| 39 | +need to take to make this more reliable; these are detailed below. |
| 40 | + |
| 41 | +The CRD-based delivery also means that if an implementation tries to use (that is, |
| 42 | +get, list, watch, and so on) Gateway API objects when the CRDs have _not_ been |
| 43 | +installed, then its Kubernetes client code will likely return serious errors. |
| 44 | +Tips to deal with this are also detailed below. |
| 45 | + |
| 46 | +The CRD definitions for Gateway API objects all contain two specific |
| 47 | +annotations: |
| 48 | + |
| 49 | +- `gateway.networking.k8s.io/bundle-version: <semver-release-version>` |
| 50 | +- `gateway.networking.k8s.io/channel: <channel-name>` |
| 51 | + |
| 52 | +The concepts of "bundle version" and "channel" (short for "release channel") are |
| 53 | +explained in our [versioning][versioning] documentation. |
| 54 | + |
| 55 | +Implementations may use these to determine what schema versions are installed in |
| 56 | +the cluster, if any. |
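As a minimal sketch of reading these two annotations, the Go snippet below takes a CRD's annotation map and extracts the bundle version and channel. The `bundleInfo` type and `bundleInfoFromAnnotations` helper are hypothetical names for illustration, and the sample values in `main` are examples only; how the CRD object is fetched is left to whichever Kubernetes client the implementation uses.

```go
package main

import "fmt"

// Annotation keys named in the text above.
const (
	bundleVersionAnnotation = "gateway.networking.k8s.io/bundle-version"
	channelAnnotation       = "gateway.networking.k8s.io/channel"
)

// bundleInfo is a hypothetical helper type for this sketch.
type bundleInfo struct {
	Version string
	Channel string
}

// bundleInfoFromAnnotations reads both annotations from a CRD's metadata
// annotations. ok is false if either annotation is missing or empty.
func bundleInfoFromAnnotations(annotations map[string]string) (info bundleInfo, ok bool) {
	version := annotations[bundleVersionAnnotation]
	channel := annotations[channelAnnotation]
	if version == "" || channel == "" {
		return bundleInfo{}, false
	}
	return bundleInfo{Version: version, Channel: channel}, true
}

func main() {
	// Example values only; real values come from the installed CRD objects.
	crdAnnotations := map[string]string{
		bundleVersionAnnotation: "v1.0.0",
		channelAnnotation:       "standard",
	}
	if info, ok := bundleInfoFromAnnotations(crdAnnotations); ok {
		fmt.Printf("bundle %s, channel %s\n", info.Version, info.Channel)
	} else {
		fmt.Println("Gateway API bundle annotations not found")
	}
}
```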
| 57 | + |
| 58 | +[versioning]: /concepts/versioning |
| 59 | + |
| 60 | +### Changes to the Gateway API CRDs are backwards compatible |
| 61 | + |
| 62 | +Part of the contract for Gateway API CRDs is that changes _within an API version_ |
| 63 | +must be _compatible_. |
| 64 | + |
| 65 | +"Within an API version" means changes to a CRD that occur while the same API version |
| 66 | +(`v1alpha2` or `v1` for example) is in use, and "compatible" means that any new |
| 67 | +fields, values, or validation will be added in a way that ensures that _previous_ |
| 68 | +objects _will still be valid objects_ after the change. |
| 69 | + |
| 70 | +This means that once Gateway API objects move to the `v1` API version, then _all_ |
| 71 | +changes must be compatible. |
| 72 | + |
| 73 | +This contract also means that an implementation will not fail with a higher version |
| 74 | +of the API than the version it was written with, because objects stored by |
| 75 | +Kubernetes under the newer schema can still be deserialized into the older types |
| 76 | +used in code by the implementation. |
| 77 | + |
| 78 | +Similarly, if an implementation was written with a _higher_ version, the newer |
| 79 | +values that it understands will simply _never be used_, as they are not present |
| 80 | +in the older version. |
| 81 | + |
| 82 | +## Implementation Rules and Guidelines |
| 83 | + |
| 84 | +### CRD Management |
| 85 | + |
| 86 | +For a Gateway API implementation to work, the Gateway API CRDs must be installed |
| 87 | +in the Kubernetes cluster the implementation is watching. |
| 88 | + |
| 89 | +Implementations have two possible options: installing CRDs themselves (implementation |
| 90 | +controlled) or requiring installation by some other mechanism before working |
| 91 | +(externally controlled). Both have tradeoffs, but implementation controlled has |
| 92 | +significantly more, and so we DO NOT recommend using implementation controlled |
| 93 | +methods at this time. |
| 94 | + |
| 95 | +Either way, certain things SHOULD be true. |
| 96 | + |
| 97 | +Whatever method is used, infra and cluster admins SHOULD attempt to ensure that |
| 98 | +the bundle version of the CRDs is not _downgraded_. Although we ensure that |
| 99 | +API changes are backwards compatible, changing CRD definitions can change the |
| 100 | +storage version of the resource, which could have unforeseen effects. Most of the |
| 101 | +time, things will probably work, but when they don't, they will most likely |
| 102 | +break in weird ways. |
| 103 | + |
| 104 | +Additionally, older versions of the API may be missing fields or features, which |
| 105 | +could be very disruptive for users. |
| 106 | + |
| 107 | +Try your best to ensure that the bundle version doesn't roll backwards. It's safer. |
| 108 | + |
| 109 | +Implementations SHOULD also handle the Gateway API CRDs _not_ being present in |
| 110 | +the cluster without crashing or panicking. Exiting with a clear fatal error is |
| 111 | +acceptable in this case, as is disabling Gateway API support even if enabled in |
| 112 | +configuration. |
| 113 | + |
| 114 | +Practically, implementations using `controller-runtime` or similar tooling |
| 115 | +may need to check for the _presence_ of the CRDs by |
| 116 | +getting the list of installed CRDs before attempting to watch those resources. |
| 117 | +(Note that this requires the implementation to have `read` access to those |
| 118 | +resources.) |
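The presence check can be sketched as a plain comparison between the CRD names found in the cluster and the ones the implementation needs. This is an illustration under assumptions: `requiredCRDs` and `missingCRDs` are hypothetical names, the list of required CRDs will differ per implementation, and obtaining the installed names (for example, via a discovery or CRD list call in your client library) is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredCRDs lists the CRDs this hypothetical implementation needs.
var requiredCRDs = []string{
	"gatewayclasses.gateway.networking.k8s.io",
	"gateways.gateway.networking.k8s.io",
	"httproutes.gateway.networking.k8s.io",
}

// missingCRDs returns the required CRD names absent from installed, sorted.
func missingCRDs(installed, required []string) []string {
	present := make(map[string]struct{}, len(installed))
	for _, name := range installed {
		present[name] = struct{}{}
	}
	var missing []string
	for _, name := range required {
		if _, ok := present[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Example input only; a real implementation would list CRDs from the cluster.
	installed := []string{"gatewayclasses.gateway.networking.k8s.io"}
	if missing := missingCRDs(installed, requiredCRDs); len(missing) > 0 {
		// Disable Gateway API support (or exit with a clear error) instead of panicking.
		fmt.Println("Gateway API support disabled; missing CRDs:", missing)
		return
	}
	fmt.Println("all required Gateway API CRDs are installed")
}
```

Running the check before any watch is started lets the implementation choose between a clear fatal error and disabling Gateway API support.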
| 119 | + |
| 120 | +#### Implementation-controlled CRD installation |
| 121 | + |
| 122 | +Implementation-controlled CRD installation also includes automatic installation |
| 123 | +mechanisms such as Helm, if the CRDs are included in a Helm chart with the |
| 124 | +implementation's installation. |
| 125 | + |
| 126 | +Because of significant caveats we DO NOT recommend doing implementation-controlled |
| 127 | +CRD management at this time. |
| 128 | + |
| 129 | +However, if you really must, CRD definitions MAY be installed by implementations, |
| 130 | +but implementations that do so MUST have a way to ensure: |
| 131 | + |
| 132 | +- that there are no other Gateway API CRDs installed in the cluster before starting, or |
| 133 | +- that the CRD definitions are only installed if they are a higher bundle version |
| 134 | + than any existing Gateway API CRDs. Note that even this may not be safe if there |
| 135 | + are breaking changes in the experimental channel resources, so implementations |
| 136 | + should be _very_ careful with doing this. |
| 137 | + |
| 138 | +This avoids problems if another implementation is also installed in the cluster |
| 139 | +and expects a higher version of the CRDs to be installed. |
| 140 | + |
| 141 | +The worst outcome here would be two implementations trying to do automatic install |
| 142 | +of _different_ CRD versions, resulting in the CRD versions flapping between |
| 143 | +versions or channels. This would _not_ produce good outcomes. |
| 144 | + |
| 145 | +The safer method for an automatic installation would require the implementation |
| 146 | +to: |
| 147 | + |
| 148 | +- Check if there are any Gateway API CRDs installed in the cluster. |
| 149 | +- If not, install its most compatible version of the CRDs. |
| 150 | +- If so, only install its version of the CRDs if the bundle version is higher |
| 151 | + than the existing one. The mechanism will also need to check whether any of |
| 152 | + the versions involved include incompatible changes. |
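The version comparison in the steps above could be sketched roughly as follows. This is a simplified illustration, not a complete mechanism: `parseBundleVersion` and `shouldInstall` are hypothetical helpers, only plain `vMAJOR.MINOR.PATCH` strings are handled, and the sketch does not attempt the incompatible-change or channel checks, which are the hard part.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBundleVersion parses a "vMAJOR.MINOR.PATCH" string. Pre-release and
// build suffixes are not handled in this sketch.
func parseBundleVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected bundle version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("unexpected bundle version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// shouldInstall reports whether the bundled CRDs may be installed.
// installed is the bundle version found in the cluster, or "" if no
// Gateway API CRDs are present.
func shouldInstall(installed, bundled string) (bool, error) {
	want, err := parseBundleVersion(bundled)
	if err != nil {
		return false, err
	}
	if installed == "" {
		return true, nil // nothing installed yet
	}
	cur, err := parseBundleVersion(installed)
	if err != nil {
		return false, err
	}
	for k := range want {
		if want[k] != cur[k] {
			return want[k] > cur[k], nil // never downgrade
		}
	}
	return false, nil // same version already installed
}

func main() {
	// Example versions only.
	for _, installed := range []string{"", "v0.8.1", "v1.0.0", "v1.1.0"} {
		ok, err := shouldInstall(installed, "v1.0.0")
		fmt.Printf("installed=%q install=%v err=%v\n", installed, ok, err)
	}
}
```

Returning an error for unparseable versions, instead of guessing, keeps the sketch on the side of not touching CRDs it does not understand.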
| 153 | + |
| 154 | +This is going to be _very_ difficult to pull off in practice. |
| 155 | + |
| 156 | +It should also be noted that many infra and cluster admins manage CRDs using |
| 157 | +externally controlled methods that will not be visible to a Gateway |
| 158 | +implementation, so if you still proceed with automatic installation, it MUST be |
| 159 | +able to be disabled by the installation owner (whether that is the infra or cluster |
| 160 | +admin). |
| 161 | + |
| 162 | +Because of all these caveats, we DO NOT recommend doing automatic CRD management |
| 163 | +at this time. |
| 164 | + |
| 165 | +#### Externally controlled CRD installation |
| 166 | + |
| 167 | +Because of all of the complexities mentioned in the "Implementation-controlled CRD |
| 168 | +installation" section of this document, we recommend that implementations supply documentation |
| 169 | +on how to check if CRDs are installed and upgrade versions if required. |
| 170 | + |
| 171 | +Additions to this document with suggested commands are welcome. |
| 172 | + |
| 173 | +### Conformance and Version compatibility |
| 174 | + |
| 175 | +A conformant Gateway API implementation is one that passes the conformance tests |
| 176 | +that are included in each Gateway API bundle version release. |
| 177 | + |
| 178 | +An implementation MUST pass the conformance suite with _no_ skipped tests to be |
| 179 | +conformant. Tests may be skipped during development, but a version you want to |
| 180 | +be conformant MUST have no skipped tests. |
| 181 | + |
| 182 | +Extended features may, as per the contract for Extended status, be disabled. |
| 183 | + |
| 184 | +Gateway API conformance is version-specific. An implementation that passes |
| 185 | +conformance for version N may not pass conformance for version N+1 without changes. |
| 186 | + |
| 187 | +Implementations SHOULD submit a report from the conformance testing suite back |
| 188 | +to the Gateway API GitHub repo containing details of their testing. |
| 189 | + |
| 190 | +The conformance suite output includes the Gateway API version supported. |
| 191 | + |
| 192 | +#### Version compatibility |
| 193 | + |
| 194 | +Once v1.0 is released, implementations supporting Gateway and GatewayClass |
| 195 | +MUST set a new Condition, `SupportedVersion`, with `status: true` meaning |
| 196 | +that the installed CRD version is supported, and `status: false` meaning that it |
| 197 | +is not. |
| 198 | + |
| 199 | +### Standard Status fields and Conditions |
| 200 | + |
| 201 | +Gateway API has many resources, but in designing them, we've worked to keep |
| 202 | +the status experience as consistent as possible across objects, using the |
| 203 | +Condition type and the `status.conditions` field. |
| 204 | + |
| 205 | +Most resources have a `status.conditions` field, but some also have a namespaced |
| 206 | +field that _contains_ a `conditions` field. |
| 207 | + |
| 208 | +For the latter, Gateway's `status.listeners` and the Route `status.parents` |
| 209 | +fields are examples where each item in the slice identifies the Conditions |
| 210 | +associated with some subset of configuration. |
| 211 | + |
| 212 | +For the Gateway case, it's to allow Conditions per _Listener_, and in the Route |
| 213 | +case, it's to allow Conditions per _implementation_ (since Route objects can |
| 214 | +be used in multiple Gateways, and those Gateways can be reconciled by different |
| 215 | +implementations). |
| 216 | + |
| 217 | +In all of these cases, there are some relatively-common Condition types that have |
| 218 | +similar meanings: |
| 219 | + |
| 220 | +- `Accepted` - the resource or part thereof contains acceptable config that will |
| 221 | +produce some configuration in the underlying data plane that the implementation |
| 222 | +controls. This does not mean that the _whole_ configuration is valid, just that |
| 223 | +_enough_ is valid to produce some effect. |
| 224 | +- `Programmed` - this represents a later phase of operation, after `Accepted`, |
| 225 | +when the resource or part thereof has been Accepted and programmed into the |
| 226 | +underlying dataplane. Users should expect the configuration to be ready for |
| 227 | +traffic to flow _at some point in the near future_. This Condition does _not_ |
| 228 | +say that the dataplane is ready _when it's set_, just that everything is valid |
| 229 | +and it _will become ready soon_. "Soon" may have different meanings depending |
| 230 | +on the implementation. |
| 231 | +- `ResolvedRefs` - this Condition indicates that all references in the resource |
| 232 | +or part thereof were valid and pointed to an object that both exists and allows |
| 233 | +that reference. If this Condition is set to `status: false`, then _at least one_ |
| 234 | +reference in the resource or part thereof is invalid for some reason, and the |
| 235 | +`message` field should indicate which ones are invalid. |
| 236 | + |
| 237 | +Implementers should check the godoc for each type to see the exact details of |
| 238 | +these Conditions on each resource or part thereof. |
| 239 | + |
| 240 | +Additionally, the upstream `Condition` struct contains an optional |
| 241 | +`observedGeneration` field - implementations MUST use this field and set it to |
| 242 | +the `metadata.generation` field of the object at the time the status is generated. |
| 243 | +This allows users of the API to determine if the status is relevant to the current |
| 244 | +version of the object. |
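To illustrate the `observedGeneration` rule, here is a small sketch that stamps a Condition with the generation of the object it was computed from, and shows how a consumer could detect stale status. The `condition` struct is a local stand-in that mirrors a few fields of the upstream type so the example stays self-contained; `newCondition` and `isCurrent` are hypothetical helpers, and the values in `main` are examples only.

```go
package main

import "fmt"

// condition mirrors a few fields of the upstream Kubernetes Condition type;
// it is redefined here only to keep the sketch self-contained.
type condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	ObservedGeneration int64
}

// newCondition builds a condition stamped with the metadata.generation of the
// object that was read when the status was computed.
func newCondition(condType string, ok bool, reason, message string, generation int64) condition {
	status := "False"
	if ok {
		status = "True"
	}
	return condition{
		Type:               condType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: generation,
	}
}

// isCurrent reports whether c was computed from the object's current spec.
func isCurrent(c condition, generation int64) bool {
	return c.ObservedGeneration == generation
}

func main() {
	// Pretend the object was at metadata.generation 3 when status was computed.
	c := newCondition("Accepted", true, "Accepted", "example message", 3)
	fmt.Println(isCurrent(c, 3)) // status matches the current spec
	fmt.Println(isCurrent(c, 4)) // spec changed since; status is stale
}
```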
| 245 | + |
| 246 | + |
| 247 | +### Resource details |
| 248 | + |
| 249 | +For each currently available conformance profile, there is a set of resources |
| 250 | +that implementations are expected to reconcile. |
| 251 | + |
| 252 | +The following section goes through each Gateway API object and indicates expected |
| 253 | +behaviors. |
| 254 | + |
| 255 | +#### GatewayClass |
| 256 | + |
| 257 | +GatewayClass has one main `spec` field - `controllerName`. Each implementation |
| 258 | +is expected to claim a domain-prefixed string value (like |
| 259 | +`example.com/example-ingress`) as its `controllerName`. |
| 260 | + |
| 261 | +Implementations MUST watch _all_ GatewayClasses, and reconcile GatewayClasses |
| 262 | +that have a matching `controllerName`. The implementation must choose at least |
| 263 | +one compatible GatewayClass out of the set of GatewayClasses that have a matching |
| 264 | +`controllerName`, and indicate that it accepts processing of that GatewayClass |
| 265 | +by setting an `Accepted` Condition to `status: true` in each. Any GatewayClasses |
| 266 | +that have a matching `controllerName` but are _not_ Accepted must have the |
| 267 | +`Accepted` Condition set to `status: false`. |
| 268 | + |
| 269 | +Implementations MAY choose only one GatewayClass out of the pool of otherwise |
| 270 | +acceptable GatewayClasses if they can only reconcile one, or, if they are capable |
| 271 | +of reconciling multiple GatewayClasses, they may also choose as many as they like. |
| 272 | + |
| 273 | +If something in the GatewayClass renders it incompatible (at the time of writing, |
| 274 | +the only possible reason for this is that there is a pointer to a `parametersRef` |
| 275 | +object that is not supported by the implementation), then the implementation |
| 276 | +SHOULD mark the incompatible GatewayClass as not `Accepted`. |
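The GatewayClass selection rules above can be sketched as a filter over all watched GatewayClasses. This assumes an implementation able to reconcile several GatewayClasses at once; the `gatewayClass` struct is a pared-down, hypothetical stand-in for the real type, and `ParamsSupported` is an invented field standing in for whatever compatibility check the implementation performs.

```go
package main

import "fmt"

// gatewayClass is a pared-down, hypothetical stand-in for the GatewayClass type.
type gatewayClass struct {
	Name            string
	ControllerName  string
	ParamsSupported bool // false if the class points at unsupported parameters
}

// acceptedStatuses returns, for every GatewayClass whose controllerName
// matches, the value the Accepted Condition should be set to. Classes that
// belong to other controllers are left out of the result and must not be
// touched.
func acceptedStatuses(classes []gatewayClass, controllerName string) map[string]bool {
	result := make(map[string]bool)
	for _, gc := range classes {
		if gc.ControllerName != controllerName {
			continue // owned by another implementation
		}
		result[gc.Name] = gc.ParamsSupported
	}
	return result
}

func main() {
	// Example data only.
	classes := []gatewayClass{
		{Name: "public", ControllerName: "example.com/example-ingress", ParamsSupported: true},
		{Name: "custom", ControllerName: "example.com/example-ingress", ParamsSupported: false},
		{Name: "other", ControllerName: "example.org/other-controller", ParamsSupported: true},
	}
	for name, accepted := range acceptedStatuses(classes, "example.com/example-ingress") {
		fmt.Printf("GatewayClass %s: Accepted=%v\n", name, accepted)
	}
}
```

An implementation that can only reconcile one GatewayClass would add its own tie-breaking step on top of this and mark the remaining matching classes as not `Accepted`.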
| 277 | + |
| 278 | +#### Gateway |
| 279 | + |
| 280 | +Gateway objects MUST refer in the `spec.gatewayClassName` field to a GatewayClass |
| 281 | +that exists and is `Accepted` by an implementation for that implementation to |
| 282 | +reconcile them. |
| 283 | + |
| 284 | +Gateway objects that fall out of scope (for example, because the GatewayClass |
| 285 | +they reference was deleted) for reconciliation MAY have their status removed by |
| 286 | +the implementation as part of the delete process, but this is not required. |
| 287 | + |
| 288 | +#### General Route information |
| 289 | + |
| 290 | +All Route objects share some properties: |
| 291 | + |
| 292 | +- They MUST be attached to an in-scope parent for the implementation to consider |
| 293 | +them reconcilable. |
| 294 | +- The implementation MUST update the status for each in-scope Route with the |
| 295 | +relevant Conditions, using the namespaced `parents` field. See the specific Route |
| 296 | +types for details, but this usually includes `Accepted`, `Programmed` and |
| 297 | +`ResolvedRefs` Conditions. |
| 298 | +- Routes that fall out of scope SHOULD NOT have status updated, since it's possible |
| 299 | +that these updates may overwrite status written by any new owners. The `observedGeneration` |
| 300 | +field will indicate that any remaining status is out of date. |
| 301 | + |
| 302 | + |
| 303 | +#### HTTPRoute |
| 304 | + |
| 305 | +HTTPRoutes route HTTP traffic that is _unencrypted_ and available for inspection. |
| 306 | +This includes HTTPS traffic that's terminated at the Gateway (since that is then |
| 307 | +decrypted), and allows the HTTPRoute to use HTTP properties, like path, method, |
| 308 | +or headers in its routing directives. |
| 309 | + |
| 310 | +#### TLSRoute |
| 311 | + |
| 312 | +TLSRoutes route encrypted TLS traffic using the SNI header, _without decrypting |
| 313 | +the traffic stream_, to the relevant backends. |
| 314 | + |
| 315 | +#### TCPRoute |
| 316 | + |
| 317 | +TCPRoutes route a TCP stream that arrives at a Listener to one of the given |
| 318 | +backends. |
| 319 | + |
| 320 | +#### UDPRoute |
| 321 | + |
| 322 | +UDPRoutes route UDP packets that arrive at a Listener to one of the given |
| 323 | +backends. |
| 324 | + |
| 325 | +#### ReferenceGrant |
| 326 | + |
| 327 | +ReferenceGrant is a special resource that is used by resource owners in one |
| 328 | +namespace to _selectively_ allow references from Gateway API objects in other |
| 329 | +namespaces. |
| 330 | + |
| 331 | +A ReferenceGrant is created in the same namespace as the thing it's granting |
| 332 | +reference access to, and allows access from other namespaces, from other Kinds, |
| 333 | +or both. |
| 334 | + |
| 335 | +Implementations that support cross-namespace references MUST watch ReferenceGrant |
| 336 | +and reconcile any ReferenceGrant that points to an object that's referred to by |
| 337 | +an in-scope Gateway API object. |
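As a rough sketch of how a cross-namespace reference might be checked against ReferenceGrants, the snippet below uses simplified local types. The struct and function names are hypothetical, an empty `Name` in a `to` entry is treated as matching any name, and the namespaces and object names in `main` are examples only. Same-namespace references are outside the scope of this check.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the ReferenceGrant types.
type grantFrom struct{ Group, Kind, Namespace string }
type grantTo struct{ Group, Kind, Name string } // empty Name matches any name

type referenceGrant struct {
	Namespace string // namespace the grant (and the target object) lives in
	From      []grantFrom
	To        []grantTo
}

// reference describes one cross-namespace reference to be checked.
type reference struct {
	FromGroup, FromKind, FromNamespace   string
	ToGroup, ToKind, ToNamespace, ToName string
}

// permitted reports whether any grant in the target namespace allows ref.
func permitted(ref reference, grants []referenceGrant) bool {
	for _, g := range grants {
		if g.Namespace != ref.ToNamespace {
			continue // grants only apply to objects in their own namespace
		}
		fromOK := false
		for _, f := range g.From {
			if f.Group == ref.FromGroup && f.Kind == ref.FromKind && f.Namespace == ref.FromNamespace {
				fromOK = true
				break
			}
		}
		if !fromOK {
			continue
		}
		for _, t := range g.To {
			if t.Group == ref.ToGroup && t.Kind == ref.ToKind && (t.Name == "" || t.Name == ref.ToName) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Example: an HTTPRoute in "apps" refers to a Service in "infra".
	grants := []referenceGrant{{
		Namespace: "infra",
		From:      []grantFrom{{Group: "gateway.networking.k8s.io", Kind: "HTTPRoute", Namespace: "apps"}},
		To:        []grantTo{{Group: "", Kind: "Service"}},
	}}
	ref := reference{
		FromGroup: "gateway.networking.k8s.io", FromKind: "HTTPRoute", FromNamespace: "apps",
		ToGroup: "", ToKind: "Service", ToNamespace: "infra", ToName: "backend",
	}
	fmt.Println("reference permitted:", permitted(ref, grants))
}
```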