Fix conversion of non-utf8 sequences to AnyValue#1253
Fix conversion of non-utf8 sequences to AnyValue#1253brettmc merged 1 commit intoopen-telemetry:mainfrom
AnyValue#1253Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1253 +/- ##
=========================================
Coverage 84.62% 84.62%
- Complexity 2136 2140 +4
=========================================
Files 284 284
Lines 6054 6062 +8
=========================================
+ Hits 5123 5130 +7
- Misses 931 932 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
f107d6e to
228102c
Compare
|
|
||
| private static function isUtf8(string $value): bool | ||
| { | ||
| return \extension_loaded('mbstring') |
There was a problem hiding this comment.
Technically this condition might not be necessary since symfony/polyfill-mbstring is pulled in transitively from the SDK. Not sure if it's best practice to rely on that though.
There was a problem hiding this comment.
@Nevay should we rely on the polyfill, or leave it as-is. Otherwise, I'm happy to approve and merge.
There was a problem hiding this comment.
We should leave it as-is. The polyfill is slower and exports can contain a large number of attributes.
Creating and exporting spans with 10 string attributes (16 bytes ea.; 8 valid and 2 invalid utf8 sequences) to local collector w/ default batch size was around 10% slower with the polyfill compared to the preg_match() fallback (fallback was incorrect until latest push).
Benchmark results for creating AnyValue from a 16 bytes string (polyfill is esp. costly for invalid utf8 sequences due to its iconv fallback):
+------------------------------------+---------+---------+--------+---------+
| subject | memory | mode | rstdev | stdev |
+------------------------------------+---------+---------+--------+---------+
| benchCheckEncoding (valid) | 1.959mb | 0.245μs | ±1.95% | 0.005μs |
| benchCheckEncoding (invalid-begin) | 1.959mb | 0.248μs | ±1.68% | 0.004μs |
| benchCheckEncoding (invalid-end) | 1.959mb | 0.250μs | ±1.94% | 0.005μs |
| benchPregMatch (valid) | 1.959mb | 0.307μs | ±2.06% | 0.006μs |
| benchPregMatch (invalid-begin) | 1.959mb | 0.262μs | ±1.53% | 0.004μs |
| benchPregMatch (invalid-end) | 1.959mb | 0.270μs | ±1.62% | 0.004μs |
| benchIsUtf8 (valid) | 1.959mb | 0.250μs | ±1.62% | 0.004μs |
| benchIsUtf8 (invalid-begin) | 1.959mb | 0.252μs | ±1.64% | 0.004μs |
| benchIsUtf8 (invalid-end) | 1.959mb | 0.253μs | ±1.31% | 0.003μs |
| benchPolyfill (valid) | 1.959mb | 0.383μs | ±1.67% | 0.006μs |
| benchPolyfill (invalid-begin) | 1.959mb | 0.830μs | ±1.51% | 0.013μs |
| benchPolyfill (invalid-end) | 1.959mb | 0.945μs | ±1.51% | 0.014μs |
+------------------------------------+---------+---------+--------+---------+
> String values which are not valid Unicode sequences SHOULD be converted to AnyValue's bytes_value with the bytes representing the string in the original order and format of the source string.
228102c to
a9b76b2
Compare
Converting to AnyValue: