Technical Note on DTrace¶
DTrace was also used on the same code and data that was used to test cPyMemTrace
.
See Test Program and Data [1].
This was traced with DTrace using toolkit/py_flow_malloc_free.d
:
sudo dtrace -s toolkit/py_flow_malloc_free.d -p <PID> -C
Python Builds¶
Python 3.9 was configured and built with DTrace support.
Debug Build¶
configure --with-pydebug --without-pymalloc --with-valgrind --with-dtrace
Note
Since this bypasses Python’s small object allocator (pymalloc
) then every malloc()
and free()
can be
seen by DTrace.
This makes the DTrace logs very large.
Release Build¶
configure --with-dtrace
Baseline: Python 3.9¶
This is using a standard build of Python 3.9 without DTrace support. It establishes a benchmark baseline:
Using time
gives:
real 35.49
user 29.72
sys 2.03
Python 3.9 Release with DTrace support, no Tracing¶
Python 3.9 (release) with DTrace support but not tracing with DTrace:
Using time
gives:
real 49.54
user 35.56
sys 2.45
So a DTrace capable build has roughly a 40% premium in real
time even when not tracing.
Python 3.9 Release with DTrace support, DTrace Tracing¶
Python 3.9 (release) with DTrace support and DTrace running:
Using time
gives:
real 3220.38
user 902.51
sys 1949.83
Note the increase in sys
time caused by DTrace.
This is a x65 increase in runtime over a release build (not tracing) and a x91 increase over the non-DTrace baseline.
DTrace Log File¶
The log file [2] has 243,285 lines of which:
94,882 calls to
malloc()
144,684 calls to
free()
. 74,254 of these are tofree(0x0)
.
Python 3.9 Debug with DTrace support, no Tracing¶
This is running a debug, DTrace capable build:
Using time
gives:
real 148.55
user 139.99
sys 1.93
This is a x3 increase of runtime over a release DTrace capable build. This is typical for CPython debug builds.
Python 3.9 Debug with DTrace support, DTrace Tracing¶
This is running a debug DTrace capable build and tracing with DTrace:
Using time
gives:
real 3520.61
user 1183.36
sys 2127.22
This is a x24 increase in runtime over a debug build not tracing or a x99 increase in a non-DTrace build.
DTrace Log File¶
This has 16m lines of which there are:
8m calls to
malloc()
8m calls to
free()
. 39,000 of these are tofree(0x0)
.
Summary¶
Here is a summary of the performance cost of using different builds and tracing with DTrace:
Task |
|
|
|
|
---|---|---|---|---|
Baseline |
35.5 |
29.7 |
2.03 |
1.0 |
DTrace, no tracing
Python release build using |
49.5 |
35.6 |
2.45 |
x1.4 |
DTrace, trace |
3220 |
903 |
1950 |
x91 |
DTrace, no tracing. Debug, not using |
148 |
134 |
1.93 |
x4.2 |
DTrace, trace |
3520 |
1180 |
2130 |
x99 |
DTrace Log File¶
Piping the DTrace output to a log file gives the following log files for this job.
Build |
Release |
Debug |
Ratio Debug/Release |
---|---|---|---|
Size |
16 Mb |
11,000 Mb |
x68 |
Lines |
243k |
16m |
x68 |
|
94,880 |
8,096,729 |
x85 |
|
144,684 |
8,054,421 |
x56 |
|
74,254 |
38,849 |
x0.52 |
Footnotes