Technical Note on DTrace

DTrace was also used on the same code and data that was used to test cPyMemTrace. See Test Program and Data 1. This was traced with DTrace using toolkit/py_flow_malloc_free.d:

sudo dtrace -s toolkit/py_flow_malloc_free.d -p <PID> -C

Python Builds

Python 3.9 was configured and built with DTrace support.

Debug Build

configure --with-pydebug --without-pymalloc --with-valgrind --with-dtrace

Note

Since this bypasses Python’s small object allocator (pymalloc) then every malloc() and free() can be seen by DTrace. This makes the DTrace logs very large.

Release Build

configure --with-dtrace

Baseline: Python 3.9

This is using a standard build of Python 3.9 without DTrace support. It establishes a benchmark baseline:

Basic Python 3.9 (release) performance.

Using time gives:

real        35.49
user        29.72
sys          2.03

Python 3.9 Release with DTrace support, no Tracing

Python 3.9 (release) with DTrace support but not tracing with DTrace:

Python 3.9 (release) with DTrace capability.

Using time gives:

real        49.54
user        35.56
sys          2.45

So a DTrace capable build has roughly a 40% premium in real time even when not tracing.

Python 3.9 Release with DTrace support, DTrace Tracing

Python 3.9 (release) with DTrace support and DTrace running:

Python 3.9 (release) with DTrace capability, DTrace runnning.

Using time gives:

real      3220.38
user       902.51
sys       1949.83

Note the increase in sys time caused by DTrace. This is a x65 increase in runtime over a release build (not tracing) and a x91 increase over the non-DTrace baseline.

DTrace Log File

The log file 2 has 243,285 lines of which:

  • 94,882 calls to malloc()

  • 144,684 calls to free(). 74,254 of these are to free(0x0).

Python 3.9 Debug with DTrace support, no Tracing

This is running a debug, DTrace capable build:

Python 3.9 (debug) with DTrace capability, DTrace not tracing.

Using time gives:

real       148.55
user       139.99
sys          1.93

This is a x3 increase of runtime over a release DTrace capable build. This is typical for CPython debug builds.

Python 3.9 Debug with DTrace support, DTrace Tracing

This is running a debug DTrace capable build and tracing with DTrace:

Python 3.9 (debug) with DTrace capability, DTrace tracing.

Using time gives:

real      3520.61
user      1183.36
sys       2127.22

This is a x24 increase in runtime over a debug build not tracing or a x99 increase in a non-DTrace build.

DTrace Log File

This has 16m lines of which there are:

  • 8m calls to malloc()

  • 8m calls to free(). 39,000 of these are to free(0x0).

Summary

Here is a summary of the performance cost of using different builds and tracing with DTrace:

Task

real

user

sys

real ratio

Baseline

35.5

29.7

2.03

1.0

DTrace, no tracing Python release build using pymalloc.

49.5

35.6

2.45

x1.4

DTrace, trace malloc(), free(). Python release build using pymalloc.

3220

903

1950

x91

DTrace, no tracing. Debug, not using pymalloc

148

134

1.93

x4.2

DTrace, trace malloc(), free(). Python debug build, not using pymalloc.

3520

1180

2130

x99

DTrace Log File

Piping the DTrace output to a log file gives the following log files for this job.

Build

Release

Debug

Ratio Debug/Release

Size

16 Mb

11,000 Mb

x68

Lines

243k

16m

x68

malloc() entries

94,880

8,096,729

x85

free() entries

144,684

8,054,421

x56

free(0x0) entries

74,254

38,849

x0.52

Footnotes

1

This uses the LASToHTML from the TotalDepth project.

2

Removing garbage from the DTrace log can be done with grep -o "[[:print:][:space:]]*" <LOG>