@ubergesundheit
Member

No description provided.

brb and others added 30 commits March 12, 2024 18:12
[ upstream commit 26f8349 ]

Marco reported that the following L7 proxy traffic is leaked (bypasses
the WireGuard encryption):

    1. WG: tunnel, L7 egress policy: forward traffic is leaked
    2. WG: tunnel, DNS: all DNS traffic is leaked
    3. WG: native routing, DNS: all DNS traffic is leaked

This was reported before the introduction of --wireguard-encapsulate
[1].

The tunneling leak cases are straightforward. The L7 proxy traffic was
encapsulated by Cilium's tunneling device, which made it bypass the
redirection to Cilium's WireGuard device. However, [1] fixed this
behavior. For Cilium v1.15 (upcoming) nothing needs to be configured.
Meanwhile, for v1.14.4 users need to set
--wireguard-encapsulate=true.

The native routing case is trickier. The L7 proxy traffic got the source
IP of the host instead of the client pod, so the redirection was
bypassed. To fix this, we extended the redirection check to identify L7
proxy traffic.

[1]: cilium#28917

Reported-by: Marco Iorio <[email protected]>
Signed-off-by: Martynas Pumputis <[email protected]>
[ upstream commit 96e01ad ]

Use marks set by the proxy instead of assuming that each pkt from
HOST_ID w/o MARK_MAGIC_HOST belongs to the proxy.

In addition, in the tunneling mode the mark might get reset before
entering wg_maybe_redirect_to_encrypt(), as the proxy packets are
instead routed to from_host@cilium_host. The latter calls
inherit_identity_from_host() which resets the mark. In this case, rely
on the TC index.

Suggested-by: Gray Lian <[email protected]>
Signed-off-by: Martynas Pumputis <[email protected]>
The result of running

```
images/scripts/update-cni-version.sh 1.4.1
```

Signed-off-by: André Martins <[email protected]>
[ upstream commit 2764994 ]

[ backporter's note: Discarded document changes. We'll backport it
  together with other recent document changes. ]

PodCIDR shouldn't take any effect for the unsupported IPAM modes. Modify
ExportPodCIDRReconciler's constructor to not provide ConfigReconciler
for unsupported IPAMs.

Signed-off-by: Yutaro Hayakawa <[email protected]>
Signed-off-by: Jarno Rajahalme <[email protected]>
Generated from https://github.com/cilium/cilium/actions/runs/8266651120.

`quay.io/cilium/cilium:v1.15.2@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746`
`quay.io/cilium/cilium:stable@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746`

`quay.io/cilium/clustermesh-apiserver:v1.15.2@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2`
`quay.io/cilium/clustermesh-apiserver:stable@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2`

`quay.io/cilium/docker-plugin:v1.15.2@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c`
`quay.io/cilium/docker-plugin:stable@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c`

`quay.io/cilium/hubble-relay:v1.15.2@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0`
`quay.io/cilium/hubble-relay:stable@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0`

`quay.io/cilium/operator-alibabacloud:v1.15.2@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650`
`quay.io/cilium/operator-alibabacloud:stable@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650`

`quay.io/cilium/operator-aws:v1.15.2@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb`
`quay.io/cilium/operator-aws:stable@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb`

`quay.io/cilium/operator-azure:v1.15.2@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a`
`quay.io/cilium/operator-azure:stable@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a`

`quay.io/cilium/operator-generic:v1.15.2@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088`
`quay.io/cilium/operator-generic:stable@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088`

`quay.io/cilium/operator:v1.15.2@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08`
`quay.io/cilium/operator:stable@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08`

Signed-off-by: Jarno Rajahalme <[email protected]>
[ upstream commit e17cf21 ]

This variable used to be used in combination with the Sibz/github-status-action
action, which we replaced with myrotvorets/set-commit-status-action when
reworking the workflows to be triggered by Ariane [1]. Given it is now
unused, let's get rid of the leftover environment variable, so that we
also stop copying it to new workflows.

[1]: 9949c5a ("ci: rework workflows to be triggered by Ariane")

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 394b3de ]

Let's define kind-related variables (i.e., version, k8s image and k8s
version) inside the set-env-variables action. Once all consumers have
been migrated in the subsequent commit, this will ensure consistency
across workflows, and simplify version bumps as well as the
introduction of new workflows depending on them. One extra byproduct
is that renovate updates will also stop requesting reviews from all
the different teams owning each specific workflow.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit aabdfa7 ]

Let's switch all the workflows over to using the globally defined
kind-related variables, and remove the workflow specific definitions.
This also addresses a few cases which didn't specify any version.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 39637d6 ]

They will never, because no CNI is present at that point. Hence, let's
just avoid wasting one minute waiting for the timeout to expire.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 6716a9c ]

Currently, the GHA workflows running tests triggered on pull_request
and/or push events initially checkout the default branch to configure
the environment variables, before retrieving the PR head. However,
this is problematic on stable branches, as we then end up using the
variables from the default (i.e., main) branch (e.g., Kubernetes
version, Cilium CLI version), which may not be appropriate here.

Hence, let's change the initial checkout to retrieve the target (i.e.,
base) branch, falling back to the commit in case of push events.
This ensures that we retrieve the variables from the correct branch,
and matches the behavior of Ariane triggered workflows.

Signed-off-by: Marco Iorio <[email protected]>
Kind has released stable versions for k8s 1.29 so we can use this
image instead of the cilium kindest for ginkgo tests. The same
version has already been configured for the rest of the workflows
in the previous commits.

Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 70b405f ]

On Linux/Unix based implementations, exec.Cmd.Run will return either context.Canceled or the error "signal: killed", depending on whether the cancellation occurred while the process was running.

There are several places where we check `errors.Is(err, context.Canceled)` to decide whether to emit high-level logs about failed program compilations. Because an already running cmd.Run() doesn't return an error that satisfies this check, this results in spurious error logs about failed compilation (i.e. "signal: killed").

This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as

msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed"

This can occur occasionally in CI, which does not tolerate error logs, causing failures.

example:
```
	// Cancelled while the process is running: Run reports "signal: killed".
	ctx, c := context.WithTimeout(context.Background(), time.Second)
	go func() {
		time.Sleep(time.Second)
		c()
	}()
	cmd := exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())

	// Cancelled before the process starts: Run reports "context canceled".
	ctx, c = context.WithTimeout(context.Background(), time.Second)
	c()
	cmd = exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())
```

To fix this, join ctx.Err() into the returned error if all of the following hold:
* ctx.Err() is context.Canceled.
* The process has not exited by itself.
* The process appears to have been SIGKILL'ed.

Addresses: cilium#30991

Signed-off-by: Tom Hadlaw <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit a099bf1 ]

Unlike runtime agent/operator logs, CNI logs are just written to disk so we have no way to attach timestamps to them.
This makes it harder to debug CNI issues as we have no way to correlate when things happened between Agent logs and CNI events.

This switches CNI to use the same default logger, except with timestamps enabled.

Signed-off-by: Tom Hadlaw <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit d50c2e4 ]

This is mainly to pick up the new GRPC Conformance tests added recently.

Relates: kubernetes-sigs/gateway-api#2745
Signed-off-by: Tam Mach <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit f77f3c3 ]

This is to remove our manual GRPCRoute test, in favour of more
comprehensive tests added recently in upstream.

Signed-off-by: Tam Mach <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 927969b ]

This commit slightly changes the behavior of the "encrypt flush"
command in case of errors when trying to delete XFRM rules. The tool
currently lists rules, filters them based on user-given arguments, and
then deletes them. If an XFRM rule is deleted by the agent or the user
while we're filtering, the deletion will fail.

The current behavior in that case is to exit with a fatal error. On busy
clusters, that might mean that we always fail because XFRM states and
policies are constantly added and removed.

This commit changes the behavior to proceed with subsequent deletions in
case one fails.

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 5c2a67f ]

This commit refactors the code a bit to simplify a later commit. No
functional changes.

This may be a bit excessive in commit splitting, but at least I can
claim my last commit is free of any refactoring 😅

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 5eb27e2 ]

This new flag will allow users to clean stale XFRM states and policies
based on the node ID map contents. If XFRM states or policies are found
with a node ID that is not in the BPF map, then we probably have a leak
somewhere.

Such leaks can, in extreme cases, lead to performance degradation when
the number of XFRM states and policies grows large (and if using ENI or
Azure IPAM). Having a tool to clean up these XFRM states and policies
until the leak is fixed can therefore be critical.

The new flag is incompatible with the --spi and --node-id filter flags.

We first dump the XFRM rules and then dump the map content. In that way,
if a new node ID is allocated while we're running the tool, we will
simply ignore the corresponding XFRM rules. If a node ID is removed
while running the tool, we will fail to remove the corresponding XFRM
rules and continue with the others.
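
The stale-rule selection could be sketched as below. The `xfrmRule` type and the `staleRules` function are hypothetical simplifications for illustration. The real tool works on netlink XFRM objects and the BPF node ID map.

```go
package main

import "fmt"

// xfrmRule is a simplified, hypothetical stand-in for an XFRM state or policy.
type xfrmRule struct {
	Name   string
	NodeID uint16
}

// staleRules returns the rules whose node ID is absent from the node ID map.
func staleRules(rules []xfrmRule, nodeIDs map[uint16]struct{}) []xfrmRule {
	var stale []xfrmRule
	for _, r := range rules {
		if _, ok := nodeIDs[r.NodeID]; !ok {
			stale = append(stale, r)
		}
	}
	return stale
}

func main() {
	// The rules are dumped first and the map second, so rules for a node ID
	// allocated in between were never dumped and are left alone.
	rules := []xfrmRule{{"state-a", 1}, {"state-b", 7}, {"policy-c", 2}}
	nodeIDs := map[uint16]struct{}{1: {}, 2: {}, 3: {}}
	for _, r := range staleRules(rules, nodeIDs) {
		fmt.Println("stale:", r.Name)
	}
}
```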

Tested on a GKE cluster by adding fake XFRM states and policies that the
tool was able to remove.

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit f92b528 ]

The CNI version should be specified so that, if we have to fall back to
installing k8s via binaries, the installation doesn't fail with the error:

```
10:29:25      k8s1-1.25: gzip: stdin: not in gzip format
10:29:25      k8s1-1.25: tar: Child returned status 1
10:29:25      k8s1-1.25: tar: Error is not recoverable: exiting now
```

Fixes: ce69afd ("add support for k8s 1.25.0")
Signed-off-by: André Martins <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 070e4de ]

The optimization to reuse the last used "indexReadTxn" was broken
when a WriteTxn was first used for a read and then for a write
followed by a read (the last read re-used an old transaction).

Add test to observe this bug.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 52944a0 ]

[ backporter's note: minor conflicts due to different variables' names]

The optimization in *txn to hold the last used index read transaction
to avoid a hash map lookup was broken as the write txn used for reading
was cloned, leading to not being able to read any of the future writes
in the same transaction.

Benchmarks show that this optimization does not actually help, so solve
the problem by removing the optimization.

Before:
BenchmarkDB_RandomLookup-8       1000000              1471 ns/op
After:
BenchmarkDB_RandomLookup-8       1000000              1485 ns/op

Fixes: d0d4d46 ("statedb: Store index unique info in the index tree entry")
Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 7a301a4 ]

This commit adds the GH workflow to run on arm machines. This
effectively means that we can remove our travis integration and only use
GH actions from now on.

Signed-off-by: André Martins <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit f31cbef ]

Prior to commit c49ef45, the prometheus metrics server was
disabled by default in Cilium. Typically we would expect users to
specify in helm both `prometheus.enabled` and `prometheus.port` to
determine how to configure the prometheus server to ensure that the
prometheus port doesn't also conflict with other services that the user
is running on their nodes in their clusters.

With the refactor in the aforementioned commit, the default was set to
`:9962`. This means that even if the user installed Cilium without
prometheus settings, or explicitly configured helm with
`prometheus.enabled: false`, the prometheus metrics server would be
enabled.

This patch reverts the default back to the pre-v1.14 default.

Fixes: c49ef45 ("metrics: Modularize daemon metrics registry")
Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit b68cf99 ]

This extends the existing basic ingress docs with an external lockdown
CCNP that still allows in-cluster traffic to the Ingress LB IP.

Relates: cilium#28126
Signed-off-by: Tam Mach <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit db7b3ef ]

Suppress the "Unable to determine next hop address" logs. While it shows
the L2 neighbor resolution failure, it does not always indicate a
datapath connectivity issue.

For example, when "devices=eth+" is specified and the device
naming/purposing is not consistent across the nodes in the cluster, in
some nodes "eth1" is a device reachable to other nodes, but in some
nodes, it is not. As a result, L2 Discovery generates an "Unable to
determine next hop address".

Another example is ENI mode with automatic device detection. When
secondary interfaces are added, they are used for L2 Neighbor Discovery
as well. However, nodes can only be reached via the primary interface
through the default route in the main routing table. Thus, L2 Neighbor
Discovery generates "Unable to determine next hop address" for secondary
interfaces.

In both cases, it does not always mean the datapath has an issue for KPR
functionality. However, the logs appear repeatedly, are noisy, and the
message is error-ish, causing confusion.

This log started to appear for some users who did not see it before from
v1.14.3 (cilium#28858) and v1.15.0 (in theory). For v1.14.3, it affects KPR +
Tunnel users because of f2dcc86. Before the commit, we did not perform
L2 Neighbor Discovery in tunnel mode, so even if users had an interface
unreachable to other nodes, the log did not appear.

For v1.15.0, it affects users who already had an unreachable
interface. 2058ed6 made it visible. Before the commit, some errors
such as EHOSTUNREACH and ENETUNREACH were not caught because the
FIBMatch option wasn't specified. After v1.15.0, users started to see
the log.

Fixes: cilium#28858

Signed-off-by: Yutaro Hayakawa <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 77a0c6b ]

Let's additionally enable host firewall on a couple of existing matrix
entries associated with KPR disabled, so that we can additionally cover
this configuration and prevent regressions.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 777b580 ]

Set ConnectionRetryTimeSeconds to 1s in the component tests unless it
is specified explicitly. Otherwise, when the initial connect fails, we
need to wait 120s for the next connection attempt by default, which
may be longer than the timeout of the test itself.

Fixes: cilium#31217

Signed-off-by: Yutaro Hayakawa <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 81f14bb ]

This commit adjusts the usage of send_trace_notify in bpf_network.c to
enable monitor aggregation for all events emitted at this observation
point in the datapath. This change helps improve resource usage by
reducing the overall number of events that the datapath emits, while
still enabling packet observability with Hubble.

The events in bpf_network.c enable observability into the IPSec
processing of the datapath. Before this commit, multiple other efforts
have been made to increase the aggregation of events related to IPSec to
reduce resource usage, see cilium#29616 and cilium#27168. These efforts were related
to packets that were specifically marked as encrypted or decrypted by
IPSec and did not include events in bpf_network.c that were emitted when
either: (a) a plaintext packet has been received from the network, or
(b) a packet was decrypted and reinserted into the stack by XFRM. Both
of these events are candidates for aggregation because similar to-stack
events will be emitted down the line in the datapath anyways.
Additionally, these events are mainly useful for root-cause
analysis or debugging and are not necessarily helpful from an overall
observability standpoint.

Signed-off-by: Ryan Drew <[email protected]>
Signed-off-by: Gilberto Bertin <[email protected]>
rgo3 and others added 28 commits April 18, 2024 11:32
Signed-off-by: renovate[bot] <[email protected]>
Signed-off-by: Julian Wiedmann <[email protected]>
[ upstream commit 7da6514 ]

The firstGlobalAddr in pkg/node tried to pick public IPs over private IPs
even after picking by scope. Include this logic in the address sorting
and add a test case to check the different sorting predicates.

For NodePort pick the first private address if any, otherwise pick
the first public address.

Fixes: 5342d01 ("datapath/tables: Add Table[NodeAddress]")
Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 100e625 ]

This prevents possible shenanigans caused by search domains possibly
configured on the runner, and propagated to the pods.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 302604d ]

Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 6f0a059 ]

GitHub changed the URL for the classic projects that we are currently
using to track patch releases. Fix the link.

Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 804d5f0 ]

When the number of concurrent DNS requests was moderately high, there
was a chance that some of the goroutines would get stuck waiting for a
response. Contains the fix from cilium/dns#10

Signed-off-by: Marcel Zieba <[email protected]>
Signed-off-by: Zhichuan Liang <[email protected]>
[ upstream commit c869e6c ]

Do not return an error from xds server when the context is cancelled, as
this is part of normal operation, and we test for this in
server_e2e_test.

This resolves a test flake:

panic: Fail in goroutine after  has completed

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 3cde59c ]

The main test goroutine might complete before the checks on the server
goroutine are completed, causing the panic below. This commit defers
closing the streamDone channel to make sure the error check on the
stream server is done before returning from the test. We keep the time
check on the wait at the end of each test so as not to stall the tests
in case the stream server fails to exit.

Panic error
```
panic: Fail in goroutine after Test/ServerSuite/TestRequestStaleNonce has completed
```

Testing was done as per below:
```
$ go test -count 500 -run Test/ServerSuite/TestRequestStaleNonce ./pkg/envoy/xds/...
ok      github.com/cilium/cilium/pkg/envoy/xds  250.866s
```

Fixes: cilium#31855
Signed-off-by: Tam Mach <[email protected]>
Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit df6afbd ]

[ backporter's note: moved changes from pkg/envoy/xds/stream_test.go to
pkg/envoy/xds/stream.go as v1.15 doesn't have the former file ]

Return io.EOF if test channel was closed, rather than returning a nil
request. This mimics the behavior of generated gRPC code, which never
returns nil request with a nil error.

This resolves a test flake with this error log:

time="2024-04-16T08:46:23+02:00" level=error msg="received nil request from xDS stream; stopping xDS stream handling" subsys=xds xdsClientNode="node0~10.0.0.0~node0~bar" xdsStreamID=1

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Zhichuan Liang <[email protected]>
[ upstream commit d3c1fee ]

Speed up tests by eliminating CacheUpdateDelay, as it is generally not
needed.

When needed, replace with IsCompletedInTimeChecker that waits for up to
MaxCompletionDuration before returning, in contrast with
IsCompletedChecker that only returns the current state without any wait.

This change makes the server_e2e_test tests run >1000x faster.

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 4efe9dd ]

TestRequestStaleNonce test code was written with the assumption that no
response would be received for a request with a stale nonce, and a second
SendRequest was done right after with the correct nonce value. This
caused two responses to be returned, and the first one could have been
with the old version of the resources.

Remove this duplicate SendRequest.

This resolves test flakes like this:

        --- FAIL: Test/ServerSuite/TestRequestStaleNonce (0.00s)
            server_e2e_test.go:784:
                ... response *discoveryv3.DiscoveryResponse = &discoveryv3.DiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), VersionInfo:"3", Resources:[]*anypb.Any{(*anypb.Any)(0x40003a63c0), (*anypb.Any)(0x40003a6410)}, Canary:false, TypeUrl:"type.googleapis.com/envoy.config.v3.DummyConfiguration", Nonce:"3", ControlPlane:(*corev3.ControlPlane)(nil)} ("version_info:\"3\" resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource1\"}} resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource0\"}} type_url:\"type.googleapis.com/envoy.config.v3.DummyConfiguration\" nonce:\"3\"")
                ... VersionInfo string = "4"
                ... Resources []protoreflect.ProtoMessage = []protoreflect.ProtoMessage{(*routev3.RouteConfiguration)(0xe45380)}
                ... Canary bool = false
                ... TypeUrl string = "type.googleapis.com/envoy.config.v3.DummyConfiguration"

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 75d144c ]

Stream timeout is a duration we use in tests to make sure the stream does
not stall for too long. In production we do not have such a timeout at
all, and in fact the requests are long-lived and responses are only sent
when there is something (new) to send.

Test stream timeout was 2 seconds, and it would occasionally cause a test
flake, especially if debug logging is enabled. This seems to happen due
to goroutine scheduling, and for this reason debug logging should not be
on for these tests.

Bump the test stream timeout to 4 seconds to further reduce the chance of
a test flake due to it.

Signed-off-by: Jarno Rajahalme <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 350e9d3 ]

The goal being to slow down the rollout process, to better highlight
possible connection disruption occurring in the meanwhile. At the same
time, this also reduces the overall CPU load caused by datapath
recompilation, which is a possible additional cause for connection
disruption flakiness.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 7d2505e ]

The default IPAM mode is cluster-pool, which gets automatically
overwritten by the Cilium CLI to kubernetes when running on kind.
However, the default helm value gets restored upon upgrade due to
--reset-values, causing confusion and possible issues. Hence, let's
explicitly configure it to kubernetes, to prevent changes.

Similarly, let's configure a single replica for the operator.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit a0d7d37 ]

So that it gets actually executed.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 1e28a10 ]

Hubble relay is not deployed in this workflow, hence it doesn't make
sense to wait for the image availability.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 0c211e1 ]

As it simplifies troubleshooting possible connection disruptions.
However, let's configure monitor aggregation to medium (i.e., the
maximum, and default value) to avoid the performance penalty due
to the relatively high traffic load.

Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit aef6814 ]

[ backporter's note: minor conflicts in pkg/k8s/apis/cilium.io/const.go ]

Having a uid in security labels significantly increases the number of
identities, not to mention the high cardinality in metrics. This commit
adds the *controller-uid related labels to the default exclusion list.

Signed-off-by: Tam Mach <[email protected]>
Signed-off-by: Zhichuan Liang <[email protected]>
[ upstream commit 9dc89f7 ]

Fixes: 5c06c8e ("ci-eks: Add IPsec key rotation tests")
Signed-off-by: Marco Iorio <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 9b35bc5 ]

bpffs directory paths cannot contain the character ".", thus we must
sanitize device names that contain any "." characters. Our solution is
to replace "." with "-". This introduces a risk of naming collisions,
e.g. "eth.0" and "eth-0", but in practice the probability of this
happening should be very small.
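
The replacement rule and the collision it allows can be shown with a short sketch. `sanitizeDeviceName` is a hypothetical name, not necessarily the helper used in the patch.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeDeviceName (hypothetical) replaces every "." in a device name with
// "-" so that the result can be used in a bpffs directory path.
func sanitizeDeviceName(name string) string {
	return strings.ReplaceAll(name, ".", "-")
}

func main() {
	fmt.Println(sanitizeDeviceName("eth0.100"))

	// Collision risk: two distinct device names map to the same path.
	fmt.Println(sanitizeDeviceName("eth.0") == sanitizeDeviceName("eth-0"))
}
```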

Fixes: cilium#31813

Signed-off-by: Robin Gögge <[email protected]>
Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 1d56157 ]

[ backporter's notes: replaced the nonroot base image with the root one,
  to avoid requiring the Helm changes to configure the fsGroup, which
  could cause issues if users only updated the image version, without
  a full helm upgrade. ]

gops needs to write data (e.g., the PID file) to the file-system, which
turned out to be tricky when using scratch as base image, in case the
container is then run using a non-root UID.

Let's use the most basic version of a distroless image instead, which
contains:

- ca-certificates
- An /etc/passwd entry for the root, nonroot and nobody users
- A /tmp directory
- tzdata

This aligns the clustermesh-apiserver image with the Hubble Relay one,
and removes the need for manually importing the CA certificates. The
GOPS_CONFIG_DIR is explicitly configured to use a temporary directory,
to prevent permission issues depending on the UID configured to run
the entrypoint.

Finally, we explicitly configure the fsGroup as part of the
podSecurityContext, to ensure that mounted files are accessible
by the non-root user as well.

Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 8521880 ]

Configure the specified clustermesh-apiserver etcd container security
context for the etcd-init container as well, to make sure that they
always match, and prevent issues caused by the init container creating
files that cannot be read/written by the main instance later on.

Signed-off-by: Marco Iorio <[email protected]>
@ubergesundheit ubergesundheit merged commit 266bc7a into v1.15 Apr 25, 2024
@ubergesundheit ubergesundheit deleted the update-from-upstream-v1.15.4 branch April 25, 2024 09:30