Commit 4344049
[NDMII-3603] Dynamic batch size for SNMP check (#41689)
### What does this PR do?
This PR implements a dynamic batch size for the SNMP check. It works as follows:
At start, the batch size is the configured batch size (or default if not configured).
For each fetch in an SNMP check:
- If the fetch fails:
  - The batch size is divided by a [`decreaseFactor`](https://github.com/DataDog/datadog-agent/pull/41689/files#diff-7e1c5e2d2bee2b336f1316cf0bba6c2f50d17461d9ff6b8997d4c8ef32ed202aR16), and we retry the fetch with the new batch size in the same check. If it still fails, we decrease the batch size again, and so on until it reaches 1.
- If the fetch succeeds:
  - The batch size is increased by an [`increaseValue`](https://github.com/DataDog/datadog-agent/pull/41689/files#diff-7e1c5e2d2bee2b336f1316cf0bba6c2f50d17461d9ff6b8997d4c8ef32ed202aR16) for the next check. The increased batch size cannot exceed the configured batch size.
We also keep a map that associates each batch size with its number of fetch failures. If a batch size has failed too many times ([`maxFailuresPerWindow`](https://github.com/DataDog/datadog-agent/pull/41689/files#diff-7e1c5e2d2bee2b336f1316cf0bba6c2f50d17461d9ff6b8997d4c8ef32ed202aR19)) during a given time window ([`failuresWindowDuration`](https://github.com/DataDog/datadog-agent/pull/41689/files#diff-7e1c5e2d2bee2b336f1316cf0bba6c2f50d17461d9ff6b8997d4c8ef32ed202aR18)), we do not increase to this batch size and keep the current one instead.
I think this is useful to avoid repeatedly retrying batch sizes that will always fail for some devices, while still retrying them after the time window in case a device only lacked capacity temporarily.
I also chose to keep a separate batch size for each SNMP operation (Get, GetBulk, GetNext), because I think a given batch size might work for one operation but not for another (I do not have proof of that). Let me know if you think keeping a single batch size for all operations would be better.
### Motivation
### Describe how you validated your changes
- [Unit tests](https://github.com/DataDog/datadog-agent/pull/41689/files#diff-1e048e3bd44ca70b6119fbcd02b2890d0eeb568c7ec448e557e227c289ddb933R1004) with multiple fetch iterations.
- Ran QA with a Python script that sets up an SNMPSim device with a custom max batch size for each SNMP operation; it worked as expected.
### Additional Notes
1 parent 48e764c commit 4344049
10 files changed
Lines changed: 1178 additions & 46 deletions
File tree
- pkg/collector/corechecks/snmp
  - internal
    - devicecheck
    - fetch
Lines changed: 4 additions & 2 deletions
Lines changed: 122 additions & 0 deletions