forked from cilium/cilium
Update from upstream v1.15.4 #25
Merged
[ upstream commit 26f8349 ] Marco reported that the following L7 proxy traffic is leaked (bypasses the WireGuard encryption):
1. WG: tunnel, L7 egress policy: forward traffic is leaked
2. WG: tunnel, DNS: all DNS traffic is leaked
3. WG: native routing, DNS: all DNS traffic is leaked
This was reported before the introduction of --wireguard-encapsulate [1]. The tunneling leak cases are obvious. The L7 proxy traffic got encapsulated by Cilium's tunneling device. This caused it to bypass the redirection to Cilium's WireGuard device. However, [1] fixed this behavior. For Cilium v1.15 (upcoming) nothing needs to be configured. Meanwhile, for v1.14.4 users need to set --wireguard-encapsulate=true. The native routing case is trickier. The L7 proxy traffic got the src IP of a host instead of a client pod, so the redirection was bypassed. To fix this, we extended the redirection check to identify L7 proxy traffic. [1]: cilium#28917 Reported-by: Marco Iorio <[email protected]> Signed-off-by: Martynas Pumputis <[email protected]>
[ upstream commit 96e01ad ] Use marks set by the proxy instead of assuming that each pkt from HOST_ID w/o MARK_MAGIC_HOST belongs to the proxy. In addition, in the tunneling mode the mark might get reset before entering wg_maybe_redirect_to_encrypt(), as the proxy packets are instead routed to from_host@cilium_host. The latter calls inherit_identity_from_host() which resets the mark. In this case, rely on the TC index. Suggested-by: Gray Lian <[email protected]> Signed-off-by: Martynas Pumputis <[email protected]>
The result of running ``` images/scripts/update-cni-version.sh 1.4.1 ``` Signed-off-by: André Martins <[email protected]>
Signed-off-by: André Martins <[email protected]>
[ upstream commit 2764994 ] [ backporter's note: Discarded document changes. We'll backport it together with other recent document changes. ] PodCIDR shouldn't take any effect for the unsupported IPAM modes. Modify ExportPodCIDRReconciler's constructor to not provide ConfigReconciler for unsupported IPAMs. Signed-off-by: Yutaro Hayakawa <[email protected]>
Signed-off-by: Jarno Rajahalme <[email protected]>
Generated from https://github.com/cilium/cilium/actions/runs/8266651120. `quay.io/cilium/cilium:v1.15.2@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746` `quay.io/cilium/cilium:stable@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746` `quay.io/cilium/clustermesh-apiserver:v1.15.2@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2` `quay.io/cilium/clustermesh-apiserver:stable@sha256:478c77371f34d6fe5251427ff90c3912567c69b2bdc87d72377e42a42054f1c2` `quay.io/cilium/docker-plugin:v1.15.2@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c` `quay.io/cilium/docker-plugin:stable@sha256:ba4df0d63b48ba6181b6f3df3b747e15f5dfba06ff9ee83f34dd0143c1a9a98c` `quay.io/cilium/hubble-relay:v1.15.2@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0` `quay.io/cilium/hubble-relay:stable@sha256:48480053930e884adaeb4141259ff1893a22eb59707906c6d38de2fe01916cb0` `quay.io/cilium/operator-alibabacloud:v1.15.2@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650` `quay.io/cilium/operator-alibabacloud:stable@sha256:e2dafa4c04ab05392a28561ab003c2894ec1fcc3214a4dfe2efd6b7d58a66650` `quay.io/cilium/operator-aws:v1.15.2@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb` `quay.io/cilium/operator-aws:stable@sha256:3f459999b753bfd8626f8effdf66720a996b2c15c70f4e418011d00de33552eb` `quay.io/cilium/operator-azure:v1.15.2@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a` `quay.io/cilium/operator-azure:stable@sha256:568293cebc27c01a39a9341b1b2578ebf445228df437f8b318adbbb2c4db842a` `quay.io/cilium/operator-generic:v1.15.2@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088` `quay.io/cilium/operator-generic:stable@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088` `quay.io/cilium/operator:v1.15.2@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08` 
`quay.io/cilium/operator:stable@sha256:e592ceba377985eb4225b0da9121d0f8c68a564ea38e5732bd6d59005eb87c08` Signed-off-by: Jarno Rajahalme <[email protected]>
[ upstream commit e17cf21 ] This variable used to be used in combination with the Sibz/github-status-action action, which we replaced with myrotvorets/set-commit-status-action when reworking the workflows to be triggered by Ariane [1]. Given it is now unused, let's get rid of the leftover environment variable, so that we also stop copying it to new workflows. [1]: 9949c5a ("ci: rework workflows to be triggered by Ariane") Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 394b3de ] Let's define kind-related variables (i.e., version, k8s image and k8s version) inside the set-env-variables action. Once all consumers have been migrated in the subsequent commit, this will ensure consistency across workflows, and simplify version bumps as well as the introduction of new workflows depending on them. One extra byproduct is that renovate updates will also stop requesting reviews from all the different teams owning each specific workflow. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit aabdfa7 ] Let's switch all the workflows over to using the globally defined kind-related variables, and remove the workflow specific definitions. This also addresses a few cases which didn't specify any version. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 39637d6 ] They will never, because no CNI is present at that point. Hence, let's just avoid wasting one minute waiting for the timeout to expire. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 6716a9c ] Currently, the GHA workflows running tests triggered on pull_request and/or push events initially check out the default branch to configure the environment variables, before retrieving the PR head. However, this is problematic on stable branches, as we then end up using the variables from the default (i.e., main) branch (e.g., Kubernetes version, Cilium CLI version), which may not be appropriate here. Hence, let's change the initial checkout to retrieve the target (i.e., base) branch, falling back to the commit in case of push events. This ensures that we retrieve the variables from the correct branch, and matches the behavior of Ariane triggered workflows. Signed-off-by: Marco Iorio <[email protected]>
Kind has released stable versions for k8s 1.29 so we can use this image instead of the cilium kindest for ginkgo tests. The same version has already been configured for the rest of the workflows in the previous commits. Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 70b405f ] On Linux/Unix based implementations, os/exec's Cmd.Run will return either context.Canceled or the error "signal: killed", depending on whether the cancellation occurred while the process was running. There are several places where we check `errors.Is(err, context.Canceled)` to decide whether to emit high level logs about failed program compilations. Because a cmd.Run() whose process is already running does not return an error that satisfies this check, we get spurious error logs about failed compilation (i.e. "signal: killed"). This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed". This can occur occasionally in CI, which enforces that no error is logged, causing failures. Example:
```
// Case 1: the context is cancelled while the process is already running.
ctx, c := context.WithTimeout(context.Background(), time.Second)
go func() {
	time.Sleep(time.Second)
	c()
}()
cmd := exec.CommandContext(ctx, "sleep", "2")
fmt.Println(cmd.Run())

// Case 2: the context is cancelled before the process is started.
ctx, c = context.WithTimeout(context.Background(), time.Second)
c()
cmd = exec.CommandContext(ctx, "sleep", "2")
fmt.Println(cmd.Run())
```
To fix this, ctx.Err() is joined into the returned error if:
* ctx.Err() is context.Canceled.
* The process has not exited by itself.
* The process appears to have been SIGKILL'ed.
Addresses: cilium#30991 Signed-off-by: Tom Hadlaw <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
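The three conditions listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `runJoiningCancel` and `demoCancelWhileRunning` are hypothetical names, the `sleep` binary is assumed to be on the PATH, and the `syscall.WaitStatus` check is Unix-specific.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runJoiningCancel (hypothetical helper) runs cmd and, when the context was
// cancelled and the process was terminated by SIGKILL rather than exiting on
// its own, joins ctx.Err() into the returned error so that callers can match
// it with errors.Is(err, context.Canceled).
func runJoiningCancel(ctx context.Context, cmd *exec.Cmd) error {
	err := cmd.Run()
	if err == nil {
		return nil
	}
	var exitErr *exec.ExitError
	if errors.Is(ctx.Err(), context.Canceled) && errors.As(err, &exitErr) {
		// Unix-only: inspect the wait status to see how the process ended.
		ws, ok := exitErr.Sys().(syscall.WaitStatus)
		if ok && ws.Signaled() && ws.Signal() == syscall.SIGKILL {
			return errors.Join(ctx.Err(), err)
		}
	}
	return err
}

// demoCancelWhileRunning cancels the context while "sleep 2" is running and
// reports whether the resulting error matches context.Canceled.
func demoCancelWhileRunning() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go func() {
		time.Sleep(100 * time.Millisecond)
		cancel()
	}()
	err := runJoiningCancel(ctx, exec.CommandContext(ctx, "sleep", "2"))
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println("matches context.Canceled:", demoCancelWhileRunning())
}
```

Because the original "signal: killed" error is kept inside the joined error, callers that log the error still see why the process died, while cancellation checks start to succeed.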
[ upstream commit a099bf1 ] Unlike runtime agent/operator logs, CNI logs are just written to disk so we have no way to attach timestamps to them. This makes it harder to debug CNI issues as we have no way to correlate when things happened between Agent logs and CNI events. This switches CNI to use the same default logger, except with timestamps enabled. Signed-off-by: Tom Hadlaw <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit d50c2e4 ] This is mainly to pick up the new GRPC Conformance tests added recently. Relates: kubernetes-sigs/gateway-api#2745 Signed-off-by: Tam Mach <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit f77f3c3 ] This is to remove our manual GRPCRoute test, in favour of more comprehensive tests added recently in upstream. Signed-off-by: Tam Mach <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 927969b ] This commit slightly changes the behavior of the "encrypt flush" command in case of errors when trying to delete XFRM rules. The tool currently lists rules, filters them based on user-given arguments, and then deletes them. If an XFRM rule is deleted by the agent or the user while we're filtering, the deletion will fail. The current behavior in that case is to fatal. On busy clusters, that might mean that we always fatal because XFRM states and policies are constantly added and removed. This commit changes the behavior to proceed with subsequent deletions in case one fails. Signed-off-by: Paul Chaignon <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
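The "proceed with subsequent deletions" behavior can be illustrated with a short sketch. It does not touch real XFRM state: `deleteAll`, `errGone` and the integer rule IDs are hypothetical stand-ins for the tool's actual types and netlink calls.

```go
package main

import (
	"errors"
	"fmt"
)

// errGone (hypothetical) stands in for the error returned when a rule was
// already removed by the agent or the user between listing and deletion.
var errGone = errors.New("rule no longer exists")

// deleteAll tries to delete every rule. A failed deletion is recorded and the
// loop continues, instead of aborting on the first error.
func deleteAll(ids []int, del func(id int) error) (deleted int, err error) {
	var errs []error
	for _, id := range ids {
		if e := del(id); e != nil {
			errs = append(errs, fmt.Errorf("rule %d: %w", id, e))
			continue
		}
		deleted++
	}
	// errors.Join returns nil when no deletion failed.
	return deleted, errors.Join(errs...)
}

func main() {
	del := func(id int) error {
		if id == 2 {
			return errGone // simulate a rule that vanished concurrently
		}
		return nil
	}
	n, err := deleteAll([]int{1, 2, 3}, del)
	fmt.Printf("deleted %d rules, errors: %v\n", n, err)
}
```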
[ upstream commit 5c2a67f ] This commit refactors the code a bit to simplify a later commit. No functional changes. This may be a bit excessive in commit splitting, but at least I can claim my last commit is free of any refactoring 😅 Signed-off-by: Paul Chaignon <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 5eb27e2 ] This new flag will allow users to clean stale XFRM states and policies based on the node ID map contents. If XFRM states or policies are found with a node ID that is not in the BPF map, then we probably have a leak somewhere. In extreme cases, such leaks can lead to performance degradation when the number of XFRM states and policies grows large (and if using ENI or Azure IPAM). Having a tool to clean up these XFRM states and policies until the leak is fixed can therefore be critical. The new flag is incompatible with the --spi and --node-id filter flags. We first dump the XFRM rules and then dump the map content. That way, if a new node ID is allocated while we're running the tool, we will simply ignore the corresponding XFRM rules. If a node ID is removed while running the tool, we will fail to remove the corresponding XFRM rules and continue with the others. Tested on a GKE cluster by adding fake XFRM states and policies that the tool was able to remove. Signed-off-by: Paul Chaignon <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
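The selection logic described here (treat a rule as stale when its node ID is absent from the node ID map) can be sketched as below. The `xfrmRule` type, the `staleRules` function and the sample data are assumptions made for illustration; the real tool reads XFRM rules over netlink and node IDs from a BPF map.

```go
package main

import "fmt"

// xfrmRule is a hypothetical, simplified view of an XFRM state or policy.
type xfrmRule struct {
	NodeID uint16
	Desc   string
}

// staleRules returns the rules whose node ID does not appear in nodeIDs.
// The rules are assumed to have been dumped before the map, so rules for a
// node ID allocated in between are simply not in the list and are ignored.
func staleRules(rules []xfrmRule, nodeIDs map[uint16]bool) []xfrmRule {
	var stale []xfrmRule
	for _, r := range rules {
		if !nodeIDs[r.NodeID] {
			stale = append(stale, r)
		}
	}
	return stale
}

func main() {
	rules := []xfrmRule{
		{NodeID: 1, Desc: "state towards node A"},
		{NodeID: 7, Desc: "leaked state"},
		{NodeID: 2, Desc: "policy towards node B"},
	}
	nodeIDs := map[uint16]bool{1: true, 2: true}
	for _, r := range staleRules(rules, nodeIDs) {
		fmt.Printf("would delete: node ID %d (%s)\n", r.NodeID, r.Desc)
	}
}
```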
[ upstream commit f92b528 ] The CNI version should be specified so that, in case we have to fall back to installing k8s via binaries, it doesn't fail with the error:
```
10:29:25 k8s1-1.25: gzip: stdin: not in gzip format
10:29:25 k8s1-1.25: tar: Child returned status 1
10:29:25 k8s1-1.25: tar: Error is not recoverable: exiting now
```
Fixes: ce69afd ("add support for k8s 1.25.0") Signed-off-by: André Martins <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 070e4de ] The optimization to reuse the last used "indexReadTxn" was broken when a WriteTxn was first used for a read and then for a write followed by a read (the last read re-used an old transaction). Add test to observe this bug. Signed-off-by: Jussi Maki <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 52944a0 ] [ backporter's note: minor conflicts due to different variables' names] The optimization in *txn to hold the last used index read transaction to avoid a hash map lookup was broken as the write txn used for reading was cloned, leading to not being able to read any of the future writes in the same transaction. Benchmarks show that this optimization does not actually help, so solve the problem by removing the optimization. Before: BenchmarkDB_RandomLookup-8 1000000 1471 ns/op After: BenchmarkDB_RandomLookup-8 1000000 1485 ns/op Fixes: d0d4d46 ("statedb: Store index unique info in the index tree entry") Signed-off-by: Jussi Maki <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 7a301a4 ] This commit adds the GH workflow to run on arm machines. This effectively means that we can remove our travis integration and only use GH actions from now on. Signed-off-by: André Martins <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit f31cbef ] Prior to commit c49ef45, the prometheus metrics server was disabled by default in Cilium. Typically we would expect users to specify in helm both `prometheus.enabled` and `prometheus.port` to determine how to configure the prometheus server to ensure that the prometheus port doesn't also conflict with other services that the user is running on their nodes in their clusters. With the refactor in the aforementioned commit, the default was set to `:9962`. This means that even if the user installed Cilium without prometheus settings, or explicitly configured helm with `prometheus.enabled: false`, the prometheus metrics server would be enabled. This patch reverts the default back to the pre-v1.14 default. Fixes: c49ef45 ("metrics: Modularize daemon metrics registry") Signed-off-by: Joe Stringer <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit b68cf99 ] This is to extend the existing basic ingress docs with an external lockdown CCNP, while still allowing in-cluster traffic to the Ingress LB IP. Relates: cilium#28126 Signed-off-by: Tam Mach <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit db7b3ef ] Suppress the "Unable to determine next hop address" logs. While it shows the L2 neighbor resolution failure, it does not always indicate a datapath connectivity issue. For example, when "devices=eth+" is specified and the device naming/purposing is not consistent across the nodes in the cluster, on some nodes "eth1" is a device that can reach other nodes, but on other nodes it is not. As a result, L2 Discovery generates an "Unable to determine next hop address" log. Another example is ENI mode with automatic device detection. When secondary interfaces are added, they are used for L2 Neighbor Discovery as well. However, nodes can only be reached via the primary interface through the default route in the main routing table. Thus, L2 Neighbor Discovery generates "Unable to determine next hop address" for secondary interfaces. In both cases, it does not always mean the datapath has an issue for KPR functionality. However, the logs appear repeatedly, are noisy, and the message is error-ish, causing confusion. Starting from v1.14.3 (cilium#28858) and v1.15.0 (in theory), this log began to appear for some users who had not seen it before. For v1.14.3, it affects KPR + Tunnel users because of f2dcc86. Before that commit, we did not perform L2 Neighbor Discovery in tunnel mode, so even if users had an interface unreachable to other nodes, the log did not appear. For v1.15.0, it affects users who already had an unreachable interface. 2058ed6 made it visible. Before that commit, some kinds of errors, such as EHOSTUNREACH and ENETUNREACH, were not caught because the FIBMatch option was not specified. After v1.15.0, users started to see the log. Fixes: cilium#28858 Signed-off-by: Yutaro Hayakawa <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 77a0c6b ] Let's additionally enable host firewall on a couple of existing matrix entries associated with KPR disabled, so that we can additionally cover this configuration and prevent regressions. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 777b580 ] Set ConnectionRetryTimeSeconds to 1s in the component tests unless it is specified explicitly. Otherwise, when the initial connect fails, we need to wait 120s for the next connection attempt by default, which may be longer than the timeout of the test itself. Fixes: cilium#31217 Signed-off-by: Yutaro Hayakawa <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
[ upstream commit 81f14bb ] This commit adjusts the usage of send_trace_notify in bpf_network.c to enable monitor aggregation for all events emitted at this observation point in the datapath. This change helps improve resource usage by reducing the overall number of events that the datapath emits, while still enabling packet observability with Hubble. The events in bpf_network.c enable observability into the IPSec processing of the datapath. Before this commit, multiple other efforts have been made to increase the aggregation of events related to IPSec to reduce resource usage, see cilium#29616 and cilium#27168. These efforts were related to packets that were specifically marked as encrypted or decrypted by IPSec and did not include events in bpf_network.c that were emitted when either: (a) a plaintext packet has been received from the network, or (b) a packet was decrypted and reinserted into the stack by XFRM. Both of these events are candidates for aggregation because similar to-stack events will be emitted down the line in the datapath anyways. Additionally, these events are mainly useful for root-cause analysis or debugging and are not necessarily helpful from an overall observability standpoint. Signed-off-by: Ryan Drew <[email protected]> Signed-off-by: Gilberto Bertin <[email protected]>
Fixes: cilium#31944 Signed-off-by: Robin Gögge <[email protected]>
Signed-off-by: renovate[bot] <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Cilium Imagebot <[email protected]>
Signed-off-by: renovate[bot] <[email protected]>
Signed-off-by: renovate[bot] <[email protected]>
This is mainly to address the below CVE GHSA-3mh5-6q8v-25wj Related release: https://github.com/envoyproxy/envoy/releases/tag/v1.27.5 Signed-off-by: Tam Mach <[email protected]>
[ upstream commit 7da6514 ] The firstGlobalAddr in pkg/node tried to pick public IPs over private IPs even after picking by scope. Include this logic in the address sorting and add a test case to check the different sorting predicates. For NodePort pick the first private address if any, otherwise pick the first public address. Fixes: 5342d01 ("datapath/tables: Add Table[NodeAddress]") Signed-off-by: Jussi Maki <[email protected]> Signed-off-by: Gray Liang <[email protected]>
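The NodePort rule stated in this commit ("pick the first private address if any, otherwise pick the first public address") can be sketched with the standard `net/netip` package. `pickNodePortAddr` and `mustAddrs` are hypothetical helpers, the input is assumed to be already sorted by the other predicates (such as scope), and "private" is taken to mean what `netip.Addr.IsPrivate` reports, which may differ from the classification Cilium uses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// mustAddrs parses the given strings into addresses, panicking on bad input.
func mustAddrs(ss ...string) []netip.Addr {
	addrs := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		addrs = append(addrs, netip.MustParseAddr(s))
	}
	return addrs
}

// pickNodePortAddr returns the first private address in addrs. If there is
// none, it returns the first address in the list, which is then necessarily
// public. The boolean is false when addrs is empty.
func pickNodePortAddr(addrs []netip.Addr) (netip.Addr, bool) {
	var firstPublic netip.Addr
	for _, a := range addrs {
		if a.IsPrivate() {
			return a, true
		}
		if !firstPublic.IsValid() {
			firstPublic = a
		}
	}
	return firstPublic, firstPublic.IsValid()
}

func main() {
	addr, ok := pickNodePortAddr(mustAddrs("203.0.113.10", "10.0.0.5", "192.168.1.2"))
	fmt.Println(addr, ok)
}
```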
[ upstream commit 100e625 ] This prevents possible shenanigans caused by search domains possibly configured on the runner, and propagated to the pods. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 302604d ] Signed-off-by: Joe Stringer <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 6f0a059 ] GitHub changed the URL for the classic projects that we are currently using to track patch releases. Fix the link. Signed-off-by: Joe Stringer <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 804d5f0 ] When the number of concurrent DNS requests was moderately high, there was a chance that some of the goroutines would get stuck waiting for a response. Contains fix from cilium/dns#10 Signed-off-by: Marcel Zieba <[email protected]> Signed-off-by: Zhichuan Liang <[email protected]>
[ upstream commit c869e6c ] Do not return an error from xds server when the context is cancelled, as this is part of normal operation, and we test for this in server_e2e_test. This resolves a test flake: panic: Fail in goroutine after has completed Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 3cde59c ] The main test goroutine might complete before the checks on the server goroutine are completed, causing the panic below. This commit defers closing the streamDone channel to make sure the error check on the stream server is done before returning from the test. We keep the time check on the wait at the end of each test so as not to stall the tests in case the stream server fails to exit.
Panic error:
```
panic: Fail in goroutine after Test/ServerSuite/TestRequestStaleNonce has completed
```
Testing was done as follows:
```
$ go test -count 500 -run Test/ServerSuite/TestRequestStaleNonce ./pkg/envoy/xds/...
ok github.com/cilium/cilium/pkg/envoy/xds 250.866s
```
Fixes: cilium#31855 Signed-off-by: Tam Mach <[email protected]> Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit df6afbd ] [ backporter's note: moved changes from pkg/envoy/xds/stream_test.go to pkg/envoy/xds/stream.go as v1.15 doesn't have the former file ] Return io.EOF if test channel was closed, rather than returning a nil request. This mimics the behavior of generated gRPC code, which never returns nil request with a nil error. This resolves a test flake with this error logs: time="2024-04-16T08:46:23+02:00" level=error msg="received nil request from xDS stream; stopping xDS stream handling" subsys=xds xdsClientNode="node0~10.0.0.0~node0~bar" xdsStreamID=1 Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Zhichuan Liang <[email protected]>
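The convention this commit adopts, returning io.EOF instead of a nil request with a nil error once the test channel is closed, can be sketched as follows. `mockStream` and `request` are simplified, hypothetical stand-ins for the test stream and the discovery request type in pkg/envoy/xds.

```go
package main

import (
	"fmt"
	"io"
)

// request is a hypothetical stand-in for an xDS discovery request.
type request struct {
	Nonce string
}

// mockStream is a hypothetical channel-backed test stream.
type mockStream struct {
	recv chan *request
}

// Recv returns the next request. When the channel has been closed it returns
// io.EOF, so callers never observe a nil request paired with a nil error.
func (s *mockStream) Recv() (*request, error) {
	req, ok := <-s.recv
	if !ok {
		return nil, io.EOF
	}
	return req, nil
}

func main() {
	s := &mockStream{recv: make(chan *request, 1)}
	s.recv <- &request{Nonce: "1"}
	close(s.recv)

	for {
		req, err := s.Recv()
		if err == io.EOF {
			fmt.Println("stream closed")
			return
		}
		fmt.Println("received request with nonce", req.Nonce)
	}
}
```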
[ upstream commit d3c1fee ] Speed up tests by eliminating CacheUpdateDelay, as it is generally not needed. When needed, replace with IsCompletedInTimeChecker that waits for up to MaxCompletionDuration before returning, in contrast with IsCompletedChecker that only returns the current state without any wait. This change makes the server_e2e_test tests run >1000x faster. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Gray Liang <[email protected]>
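The difference between checking the current state once and waiting for up to a maximum duration can be sketched as a plain polling function. This is not the checker API used in the tests: `completedInTime`, the 250ms value of `maxCompletionDuration` and the 1ms poll interval are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// maxCompletionDuration is an assumed value for this sketch only.
const maxCompletionDuration = 250 * time.Millisecond

// completedInTime polls isCompleted until it reports true or until max has
// elapsed. It returns as soon as completion is observed, so the common case
// does not pay for a fixed delay.
func completedInTime(isCompleted func() bool, max time.Duration) bool {
	deadline := time.Now().Add(max)
	for {
		if isCompleted() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	start := time.Now()
	// Simulated completion that becomes true about 20ms after start.
	done := func() bool { return time.Since(start) > 20*time.Millisecond }
	fmt.Println("completed in time:", completedInTime(done, maxCompletionDuration))
}
```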
[ upstream commit 4efe9dd ] TestRequestStaleNonce test code was written with the assumption that no response would be received for a request with a stale nonce, and a second SendRequest was done right after with the correct nonce value. This caused two responses to be returned, and the first one could have been with the old version of the resources. Remove this duplicate SendRequest. This resolves test flakes like this: --- FAIL: Test/ServerSuite/TestRequestStaleNonce (0.00s) server_e2e_test.go:784: ... response *discoveryv3.DiscoveryResponse = &discoveryv3.DiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), VersionInfo:"3", Resources:[]*anypb.Any{(*anypb.Any)(0x40003a63c0), (*anypb.Any)(0x40003a6410)}, Canary:false, TypeUrl:"type.googleapis.com/envoy.config.v3.DummyConfiguration", Nonce:"3", ControlPlane:(*corev3.ControlPlane)(nil)} ("version_info:\"3\" resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource1\"}} resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource0\"}} type_url:\"type.googleapis.com/envoy.config.v3.DummyConfiguration\" nonce:\"3\"") ... VersionInfo string = "4" ... Resources []protoreflect.ProtoMessage = []protoreflect.ProtoMessage{(*routev3.RouteConfiguration)(0xe45380)} ... Canary bool = false ... TypeUrl string = "type.googleapis.com/envoy.config.v3.DummyConfiguration" Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 75d144c ] Stream timeout is a duration we use in tests to make sure the stream does not stall for too long. In production we do not have such a timeout at all, and in fact the requests are long-lived and responses are only sent when there is something (new) to send. Test stream timeout was 2 seconds, and it would occasionally cause a test flake, especially if debug logging is enabled. This seems to happen due to goroutine scheduling, and for this reason debug logging should not be on for these tests. Bump the test stream timeout to 4 seconds to further reduce the chance of a test flake due to it. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 350e9d3 ] The goal being to slow down the rollout process, to better highlight possible connection disruption occurring in the meanwhile. At the same time, this also reduces the overall CPU load caused by datapath recompilation, which is a possible additional cause for connection disruption flakiness. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 7d2505e ] The default IPAM mode is cluster-pool, which gets automatically overwritten by the Cilium CLI to kubernetes when running on kind. However, the default helm value gets restored upon upgrade due to --reset-values, causing confusion and possible issues. Hence, let's explicitly configure it to kubernetes, to prevent changes. Similarly, let's configure a single replica for the operator. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit a0d7d37 ] So that it gets actually executed. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 1e28a10 ] Hubble relay is not deployed in this workflow, hence it doesn't make sense to wait for the image availability. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 0c211e1 ] As it simplifies troubleshooting possible connection disruptions. However, let's configure monitor aggregation to medium (i.e., the maximum, and default value) to avoid the performance penalty due to the relatively high traffic load. Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit aef6814 ] [ backporter's note: minor conflicts in pkg/k8s/apis/cilium.io/const.go ] Having a uid in security labels will significantly increase the number of identities, not to mention the high cardinality in metrics. This commit adds *controller-uid related labels to the default exclusion list. Signed-off-by: Tam Mach <[email protected]> Signed-off-by: Zhichuan Liang <[email protected]>
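The effect of excluding controller-uid labels can be sketched as a simple filter. The suffix match below is an assumption made for illustration (the commit only says "*controller-uid related labels"), and `securityLabels` is a hypothetical helper rather than Cilium's label filter.

```go
package main

import (
	"fmt"
	"strings"
)

// securityLabels returns a copy of labels without any key ending in
// "controller-uid". Such values are unique per controller, so keeping them
// would yield a distinct identity for otherwise identical workloads.
func securityLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if strings.HasSuffix(k, "controller-uid") {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	labels := map[string]string{
		"app":                               "worker",
		"controller-uid":                    "0f4c1d2e",
		"batch.kubernetes.io/controller-uid": "0f4c1d2e",
	}
	fmt.Println(securityLabels(labels))
}
```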
[ upstream commit 9dc89f7 ] Fixes: 5c06c8e ("ci-eks: Add IPsec key rotation tests") Signed-off-by: Marco Iorio <[email protected]> Signed-off-by: Gray Liang <[email protected]>
[ upstream commit 9b35bc5 ] bpffs directory paths cannot contain the character ".", thus we must sanitize device names that contain any "." characters. Our solution is to replace "." with "-". This introduces a risk of naming collisions, e.g. "eth.0" and "eth-0"; in practice, the probability of this happening should be very small. Fixes: cilium#31813 Signed-off-by: Robin Gögge <[email protected]> Signed-off-by: Gray Liang <[email protected]>
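The replacement rule and the collision risk it mentions can be shown in a few lines. `sanitizeDeviceName` is a hypothetical name for this sketch, not necessarily the function added by the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeDeviceName replaces every "." in a device name with "-" so that the
// result can be used as a bpffs path component.
func sanitizeDeviceName(name string) string {
	return strings.ReplaceAll(name, ".", "-")
}

func main() {
	for _, dev := range []string{"eth0", "eth.0", "eth-0", "bond0.100"} {
		fmt.Printf("%s -> %s\n", dev, sanitizeDeviceName(dev))
	}
}
```

Both "eth.0" and "eth-0" map to "eth-0" here, which is the collision case the commit message accepts as unlikely in practice.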
[ upstream commit 1d56157 ] [ backporter's notes: replaced the nonroot base image with the root one, to avoid requiring the Helm changes to configure the fsGroup, which could cause issues if users only updated the image version, without a full helm upgrade. ] gops needs to write data (e.g., the PID file) to the file-system, which turned out to be tricky when using scratch as base image, in case the container is then run using a non-root UID. Let's use the most basic version of a distroless image instead, which contains:
- ca-certificates
- An /etc/passwd entry for the root, nonroot and nobody users
- A /tmp directory
- tzdata
This aligns the clustermesh-apiserver image with the Hubble Relay one, and removes the need for manually importing the CA certificates. The GOPS_CONFIG_DIR is explicitly configured to use a temporary directory, to prevent permission issues depending on the UID configured to run the entrypoint. Finally, we explicitly configure the fsGroup as part of the podSecurityContext, to ensure that mounted files are accessible by the non-root user as well. Signed-off-by: Marco Iorio <[email protected]>
[ upstream commit 8521880 ] Configure the specified clustermesh-apiserver etcd container security context for the etcd-init container as well, to make sure that they always match, and prevent issues caused by the init container creating files that cannot be read/written by the main instance later on. Signed-off-by: Marco Iorio <[email protected]>
No description provided.