Skip to content

Conversation

@mabdinur
Copy link
Contributor

@mabdinur mabdinur commented Sep 25, 2023

Description

This PR ensures the instrumentation telemetry client is compatible with python3.12's threading module by:

  • Ensuring the telemetry writer thread is only started once per application (and app-started is sent once).
  • Ensuring the telemetry writer thread is disabled when an application is shutdown.
  • Ensuring telemetry metrics are queued SpanAggregator.shutdown without restarting the telemetry writer thread.

Motivation

The following change failed to support the telemetry client in python3.12: #6859. This PR will hopefully fix this 🤞.

Reproduction for python3.12 runtime errors

docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"

Output

Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

Risk

This PR reverts an optimization that ensured telemetry span creation metrics were queued in batches of 100. Without this optimization we can expect a 5-10% increase to span creation and span finish.

Checklist

  • Change(s) are motivated and described in the PR description.
  • Testing strategy is described if automated tests are not included in the PR.
  • Risk is outlined (performance impact, potential for breakage, maintainability, etc).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Library release note guidelines are followed. If no release note is required, add label changelog/no-changelog.
  • Documentation is included (in-code, generated user docs, public corp docs).
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Title is accurate.
  • No unnecessary changes are introduced.
  • Description motivates each change.
  • Avoids breaking API changes unless absolutely necessary.
  • Testing strategy adequately addresses listed risk(s).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Release note makes sense to a user of the library.
  • Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy
  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

@mabdinur mabdinur marked this pull request as ready for review September 25, 2023 21:07
@mabdinur mabdinur requested a review from a team as a code owner September 25, 2023 21:07
emmettbutler
emmettbutler previously approved these changes Sep 25, 2023
@emmettbutler emmettbutler added changelog/no-changelog A changelog entry is not required for this PR. backport 2.0 labels Sep 25, 2023
@mabdinur mabdinur enabled auto-merge (squash) September 25, 2023 21:24
ZStriker19
ZStriker19 previously approved these changes Sep 25, 2023
@mabdinur mabdinur force-pushed the munir/telemetry-py3.12-shutdown-exception branch from f3111b2 to b76c158 Compare September 25, 2023 23:21
@mabdinur mabdinur requested a review from ZStriker19 September 25, 2023 23:41
@DataDog DataDog deleted a comment from pr-commenter bot Sep 26, 2023
@majorgreys majorgreys added this to the v2.0.0 milestone Sep 26, 2023
@emmettbutler emmettbutler self-requested a review September 26, 2023 15:49
emmettbutler
emmettbutler previously approved these changes Sep 26, 2023
@pr-commenter
Copy link

pr-commenter bot commented Sep 26, 2023

Benchmarks

Benchmark execution time: 2023-09-26 17:14:27

Comparing candidate commit 28adebd in PR branch munir/telemetry-py3.12-shutdown-exception with baseline commit 7ccb806 in branch 2.x.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 90 metrics, 0 unstable metrics.

@mabdinur mabdinur merged commit 065784b into 2.x Sep 26, 2023
@mabdinur mabdinur deleted the munir/telemetry-py3.12-shutdown-exception branch September 26, 2023 18:11
github-actions bot pushed a commit that referenced this pull request Sep 26, 2023
## Description

This PR ensures the instrumentation telemetry client is compatible with
python3.12's threading module by:
- Removing `telemetry_writer.add_count_metric` from
`SpanAggregator.shutdown`. This ensures telemetry events are not queued
on tracer shutdown.
- Ensuring the telemetry writer thread is only started once per
application.
- Ensures the telemetry writer thread is disabled when an application is
shutdown.

## Motivation

The following change failed to support the telemetry client in
python3.12: #6859. This PR
will hopefully fix this 🤞.

### Reproduction for python3.12 runtime errors
```
docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"
```

### Output
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
```

## Risk

This PR reverts an optimization that ensured telemetry span creation
metrics were queued in batches of 100. Without this optimization we can
expect a 2-5% increase to span creation and span finish.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

(cherry picked from commit 065784b)
@github-actions github-actions bot modified the milestones: v2.0.0, v2.1.0 Sep 26, 2023
mabdinur added a commit that referenced this pull request Sep 26, 2023
Backport 065784b from #7043 to 2.0.

## Description

This PR ensures the instrumentation telemetry client is compatible with
python3.12's threading module by:
- Ensuring the telemetry writer thread is only started once per
application.
- Ensures the telemetry writer thread is disabled when an application is
shutdown.
- Ensures telemetry metrics are queued `SpanAggregator.shutdown` without
restarting the telemetry writer thread.

## Motivation

The following change failed to support the telemetry client in
python3.12: #6859. This PR
will hopefully fix this 🤞.

### Reproduction for python3.12 runtime errors
```
docker run --rm -it python:3.12.0rc3 bash
root@a1f3c1d307ec:/# pip install ddtrace==2.0.0rc2
root@a1f3c1d307ec:/# python -c "import ddtrace; _ = ddtrace.tracer.trace('foo'); raise Exception"
```

### Output
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Exception
Exception ignored in atexit callback: <bound method Tracer._atexit of <ddtrace.tracer.Tracer object at 0xffffbd967260>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 293, in _atexit
    self.shutdown(timeout=self.SHUTDOWN_TIMEOUT)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/tracer.py", line 1024, in shutdown
    processor.shutdown(timeout)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 270, in shutdown
    self._queue_span_count_metrics("spans_created", "integration_name", None)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/processor/trace.py", line 291, in _queue_span_count_metrics
    telemetry_writer.add_count_metric(
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 514, in add_count_metric
    if self.status == ServiceStatus.RUNNING or self.enable():
                                               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/telemetry/writer.py", line 218, in enable
    self.start()
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/service.py", line 58, in start
    self._start_service(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/ddtrace/internal/periodic.py", line 135, in _start_service
    self._worker.start()
  File "/usr/local/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
```

## Risk

~~This PR reverts an optimization that ensured telemetry span creation
metrics were queued in batches of 100. Without this optimization we can
expect a 5-10% increase to span creation and span finish.~~


## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Munir Abdinur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/no-changelog A changelog entry is not required for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants