Fix conversion of non-utf8 sequences to `AnyValue` by Nevay · Pull Request #1253 · open-telemetry/opentelemetry-php

Nevay · 2024-03-08T19:50:26Z

String values which are not valid Unicode sequences SHOULD be converted to AnyValue's bytes_value with the bytes representing the string in the original order and format of the source string.

codecov · 2024-03-08T19:52:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.62%. Comparing base (bb07aca) to head (a9b76b2).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #1253   +/-   ##
=========================================
  Coverage     84.62%   84.62%           
- Complexity     2136     2140    +4     
=========================================
  Files           284      284           
  Lines          6054     6062    +8     
=========================================
+ Hits           5123     5130    +7     
- Misses          931      932    +1

Flag	Coverage Δ
8.0	`84.57% <100.00%> (+<0.01%)`	⬆️
8.1	`84.60% <100.00%> (+<0.01%)`	⬆️
8.2	`84.60% <100.00%> (+<0.01%)`	⬆️
8.3	`84.60% <100.00%> (+<0.01%)`	⬆️
8.4	`84.60% <100.00%> (+<0.01%)`	⬆️

Files	Coverage Δ
src/Contrib/Otlp/AttributesConverter.php	`100.00% <100.00%> (ø)`

... and 1 file with indirect coverage changes

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

agoallikmaa · 2024-03-11T03:19:16Z

src/Contrib/Otlp/AttributesConverter.php


+    private static function isUtf8(string $value): bool
+    {
+        return \extension_loaded('mbstring')


Technically this condition might not be necessary since symfony/polyfill-mbstring is pulled in transitively from the SDK. Not sure if it's best practice to rely on that though.

brettmc · 2024-03-13

@Nevay should we rely on the polyfill, or leave it as-is? Otherwise, I'm happy to approve and merge.

Nevay · 2024-03-13

We should leave it as-is. The polyfill is slower and exports can contain a large number of attributes.
Creating and exporting spans with 10 string attributes (16 bytes each; 8 valid and 2 invalid UTF-8 sequences) to a local collector with the default batch size was around 10% slower with the polyfill than with the preg_match() fallback (the fallback was incorrect until the latest push).

Benchmark results for creating AnyValue from a 16-byte string (the polyfill is especially costly for invalid UTF-8 sequences due to its iconv fallback):

+------------------------------------+---------+---------+--------+---------+
| subject                            | memory  | mode    | rstdev | stdev   |
+------------------------------------+---------+---------+--------+---------+
| benchCheckEncoding (valid)         | 1.959mb | 0.245μs | ±1.95% | 0.005μs |
| benchCheckEncoding (invalid-begin) | 1.959mb | 0.248μs | ±1.68% | 0.004μs |
| benchCheckEncoding (invalid-end)   | 1.959mb | 0.250μs | ±1.94% | 0.005μs |
| benchPregMatch (valid)             | 1.959mb | 0.307μs | ±2.06% | 0.006μs |
| benchPregMatch (invalid-begin)     | 1.959mb | 0.262μs | ±1.53% | 0.004μs |
| benchPregMatch (invalid-end)       | 1.959mb | 0.270μs | ±1.62% | 0.004μs |
| benchIsUtf8 (valid)                | 1.959mb | 0.250μs | ±1.62% | 0.004μs |
| benchIsUtf8 (invalid-begin)        | 1.959mb | 0.252μs | ±1.64% | 0.004μs |
| benchIsUtf8 (invalid-end)          | 1.959mb | 0.253μs | ±1.31% | 0.003μs |
| benchPolyfill (valid)              | 1.959mb | 0.383μs | ±1.67% | 0.006μs |
| benchPolyfill (invalid-begin)      | 1.959mb | 0.830μs | ±1.51% | 0.013μs |
| benchPolyfill (invalid-end)        | 1.959mb | 0.945μs | ±1.51% | 0.014μs |
+------------------------------------+---------+---------+--------+---------+
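
For readers following the thread, the check under discussion can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the function names `isValidUtf8()` and `anyValueFieldFor()` are hypothetical, and the fallback shown is one common way to validate UTF-8 with `preg_match()`, assumed here rather than taken from the diff. It uses the native mbstring extension when it is loaded and otherwise relies on PCRE's `/u` modifier, then picks `string_value` or `bytes_value` accordingly.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the PR's implementation): reports whether
 * $value is a well-formed UTF-8 byte sequence.
 */
function isValidUtf8(string $value): bool
{
    // Use the native extension when it is loaded; a userland polyfill
    // does not make extension_loaded('mbstring') return true.
    if (\extension_loaded('mbstring')) {
        return \mb_check_encoding($value, 'UTF-8');
    }

    // Fallback: with the /u modifier PCRE validates the subject as UTF-8.
    // The empty pattern matches any valid string (result 1); malformed
    // input makes preg_match() return false.
    return \preg_match('//u', $value) === 1;
}

/**
 * Hypothetical stand-in for the conversion step: names the AnyValue
 * field a string attribute would be written to.
 */
function anyValueFieldFor(string $value): string
{
    return isValidUtf8($value) ? 'string_value' : 'bytes_value';
}
```

With this sketch, `anyValueFieldFor("caf\u{e9}")` selects `string_value`, while a string containing a stray `"\xFF"` byte selects `bytes_value`, so the original bytes are carried through unchanged.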

Nevay requested a review from a team March 8, 2024 19:50

Nevay force-pushed the fix/non-utf8-sequences branch from f107d6e to 228102c Compare March 8, 2024 19:54

weslenteche approved these changes Mar 9, 2024

agoallikmaa approved these changes Mar 11, 2024

brettmc approved these changes Mar 12, 2024

bobstrecansky approved these changes Mar 13, 2024

Fix conversion of non-utf8 sequences to AnyValue

a9b76b2

> String values which are not valid Unicode sequences SHOULD be converted to AnyValue's bytes_value with the bytes representing the string in the original order and format of the source string.

Nevay force-pushed the fix/non-utf8-sequences branch from 228102c to a9b76b2 Compare March 13, 2024 14:00

brettmc merged commit 1753fbe into open-telemetry:main Mar 13, 2024
