the memory isn't ever actually lost -- a pointer remains to it --
but it's not in use. Programs that have leaks like this can
unnecessarily increase the amount of memory they are using over
50
<sect2 id="ms-manual.heapprof"
51
xreflabel="Why Use a Heap Profiler?">
52
<title>Why Use a Heap Profiler?</title>
54
<para>Everybody knows how useful time profilers are for speeding
55
up programs. They are particularly useful because people are
56
notoriously bad at predicting where the bottlenecks are in their
59
<para>But the story is different for heap profilers. Some
60
programming languages, particularly lazy functional languages
61
like <ulink url="http://www.haskell.org">Haskell</ulink>, have
62
quite sophisticated heap profilers. But there are few tools as
63
powerful for profiling C and C++ programs.</para>
65
<para>Why is this? Maybe it's because C and C++ programmers must
66
think that they know where the memory is being allocated. After
67
all, you can see all the calls to
68
<computeroutput>malloc()</computeroutput> and
69
<computeroutput>new</computeroutput> and
70
<computeroutput>new[]</computeroutput>, right? But, in a big
71
program, do you really know which heap allocations are being
72
executed, how many times, and how large each allocation is? Can
73
you give even a vague estimate of the memory footprint for your
74
program? Do you know this for all the libraries your program
75
uses? What about administration bytes required by the heap
76
allocator to track heap blocks -- have you thought about them?
77
What about the stack? If you are unsure about any of these
78
things, maybe you should think about heap profiling.</para>
80
<para>Massif can tell you these things.</para>
82
<para>Or maybe it's because it's relatively easy to add basic
83
heap profiling functionality into a program, to tell you how many
84
bytes you have allocated for certain objects, or similar. But
85
this information may be limited to simple figures, such as total counts for the
86
whole program's execution. What about space usage at different
87
points in the program's execution, for example? And
88
reimplementing heap profiling code for each project is a
91
<para>Massif can save you this effort.</para>
47
time. Massif can help identify these leaks.</para>
49
<para>Importantly, Massif tells you not only how much heap memory your
50
program is using, it also gives very detailed information that indicates
51
which parts of your program are responsible for allocating the heap memory.
<sect1 id="ms-manual.using" xreflabel="Using Massif">
<title>Using Massif</title>
103
<sect2 id="ms-manual.overview" xreflabel="Overview">
104
<title>Overview</title>
106
<para>First off, as for normal Valgrind use, you probably want to
107
compile with debugging info (the
108
<computeroutput>-g</computeroutput> flag). But, as opposed to
109
Memcheck, you probably <command>do</command> want to turn
110
optimisation on, since you should profile your program as it will
111
be normally run.</para>
113
<para>Then, run your program with <computeroutput>valgrind
114
--tool=massif</computeroutput> in front of the normal command
115
line invocation. When the program finishes, Massif will print
116
summary space statistics. It also creates a graph representing
117
the program's heap usage in a file called
118
<filename>massif.pid.ps</filename>, which can be read by any
119
PostScript viewer, such as Ghostview.</para>
121
<para>It also puts detailed information about heap consumption in
122
a file <filename>massif.pid.txt</filename> (text format) or
123
<filename>massif.pid.html</filename> (HTML format), where
124
<emphasis>pid</emphasis> is the program's process id.</para>
129
<sect2 id="ms-manual.basicresults" xreflabel="Basic Results of Profiling">
130
<title>Basic Results of Profiling</title>
132
<para>To gather heap profiling information about the program
61
<para>First off, as for the other Valgrind tools, you should compile with
62
debugging info (the <computeroutput>-g</computeroutput> flag). It shouldn't
63
matter much what optimisation level you compile your program with, as this
64
is unlikely to affect the heap memory usage.</para>
66
<para>Then, to gather heap profiling information about the program
<computeroutput>prog</computeroutput>, type:</para>
135
% valgrind --tool=massif prog]]></screen>
137
<para>The program will execute (slowly). Upon completion,
138
summary statistics that look like this will be printed:</para>
139
<programlisting><![CDATA[
==27519== Total spacetime:   2,258,106 ms.B
==27519== heap:              24.0%
==27519== heap admin:         2.2%
==27519== stack(s):          73.7%]]></programlisting>
145
<para>All measurements are done in
146
<emphasis>spacetime</emphasis>, i.e. space (in bytes) multiplied
147
by time (in milliseconds). Note that because Massif slows a
148
program down a lot, the actual spacetime figure is fairly
149
meaningless; it's the relative values that are
152
<para>Which entries you see in the breakdown depends on the
153
command line options given. The above example measures all the
154
possible parts of memory:</para>
157
<listitem><para>Heap: number of words allocated on the heap, via
158
<computeroutput>malloc()</computeroutput>,
159
<computeroutput>new</computeroutput> and
160
<computeroutput>new[]</computeroutput>.</para>
163
<para>Heap admin: each heap block allocated requires some
164
administration data, which lets the allocator track certain
165
things about the block. It is easy to forget about this, and
166
if your program allocates lots of small blocks, it can add
167
up. This value is an estimate of the space required for this
168
administration data.</para>
171
<para>Stack(s): the spacetime used by the program's stack(s).
172
(Threaded programs can have multiple stacks.) This includes
173
signal handler stacks.</para>
180
<sect2 id="ms-manual.graphs" xreflabel="Spacetime Graphs">
181
<title>Spacetime Graphs</title>
183
<para>As well as printing summary information, Massif also
184
creates a file representing a spacetime graph,
185
<filename>massif.pid.hp</filename>. It will produce a file
186
called <filename>massif.pid.ps</filename>, which can be viewed in
187
a PostScript viewer.</para>
189
<para>Massif uses a program called
190
<computeroutput>hp2ps</computeroutput> to convert the raw data
191
into the PostScript graph. It's distributed with Massif, but
192
came originally from the
193
<ulink url="http://www.haskell.org/ghc/">Glasgow Haskell
194
Compiler</ulink>. You shouldn't need to worry about this at all.
195
However, if the graph creation fails for any reason, Massif will
196
tell you, and will leave behind a file named
197
<filename>massif.pid.hp</filename>, containing the raw heap
198
profiling data.</para>
200
<para>Here's an example graph:</para>
201
<mediaobject id="spacetime-graph">
203
<imagedata fileref="images/massif-graph-sm.png" format="PNG"/>
206
<phrase>Spacetime Graph</phrase>
210
<para>The graph is broken into several bands. Most bands
211
represent a single line of your program that does some heap
212
allocation; each such band represents all the allocations and
213
deallocations done from that line. Up to twenty bands are shown;
214
less significant allocation sites are merged into "other" and/or
215
"OTHER" bands. The accompanying text/HTML file produced by
216
Massif has more detail about these heap allocation bands. Then
217
there are single bands for the stack(s) and heap admin
222
<para>it's the height of a band that's important. Don't let the
223
ups and downs caused by other bands confuse you. For example,
224
the <computeroutput>read_alias_file</computeroutput> band in the
225
example has the same height all the time it's in existence.</para>
228
<para>The triangles on the x-axis show each point at which a
229
memory census was taken. These aren't necessarily evenly spread;
230
Massif only takes a census when memory is allocated or
231
deallocated. The time on the x-axis is wallclock time, which is
232
not ideal because you can get different graphs for different
233
executions of the same program, due to random OS delays. But
234
it's not too bad, and it becomes less of a problem the longer a
237
<para>Massif takes censuses at an appropriate timescale; censuses
238
take place less frequently as the program runs for longer. There
239
is no point having more than 100-200 censuses on a single
242
<para>The graphs give a good overview of where your program's
243
space use comes from, and how that varies over time. The
244
accompanying text/HTML file gives a lot more information about
253
<sect1 id="ms-manual.heapdetails"
254
xreflabel="Details of Heap Allocations">
255
<title>Details of Heap Allocations</title>
257
<para>The text/HTML file contains information to help interpret
258
the heap bands of the graph. It also contains a lot of extra
259
information about heap allocations that you don't see in the
263
<para>Here's part of the information that accompanies the above
267
<literallayout>== 0 ===========================</literallayout>
269
<para>Heap allocation functions accounted for 50.8% of measured
272
<para>Called from:</para>
274
<listitem id="a401767D1"><para>
275
<ulink url="#b401767D1">22.1%</ulink>: 0x401767D0:
276
_nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)</para>
278
<listitem id="a4017C394"><para>
279
<ulink url="#b4017C394">8.6%</ulink>: 0x4017C393:
280
read_alias_file (in /lib/i686/libc-2.3.2.so)</para>
283
<para>... ... <emphasis>(several entries omitted)</emphasis></para>
286
<para>and 6 other insignificant places</para>
291
<para>The first part shows the total spacetime due to heap
292
allocations, and the places in the program where most memory was
293
allocated (Nb: if this program had been compiled with
294
<computeroutput>-g</computeroutput>, actual line numbers would be
295
given). These places are sorted, from most significant to least,
296
and correspond to the bands seen in the graph. Insignificant
297
sites (accounting for less than 0.5% of total spacetime) are
300
<para>That alone can be useful, but often isn't enough. What if
301
one of these functions was called from several different places
302
in the program? Which one of these is responsible for most of
304
<computeroutput>_nl_intern_locale_data()</computeroutput>, this
305
question is answered by clicking on the
306
<ulink url="#b401767D1">22.1%</ulink> link, which takes us to the
307
following part of the file:</para>
309
<blockquote id="b401767D1">
310
<literallayout>== 1 ===========================</literallayout>
312
<para>Context accounted for <ulink url="#a401767D1">22.1%</ulink>
313
of measured spacetime</para>
315
<para><computeroutput> 0x401767D0: _nl_intern_locale_data (in
316
/lib/i686/libc-2.3.2.so)</computeroutput></para>
318
<para>Called from:</para>
320
<listitem id="a40176F96"><para>
321
<ulink url="#b40176F96">22.1%</ulink>: 0x40176F95:
322
_nl_load_locale_from_archive (in
323
/lib/i686/libc-2.3.2.so)</para>
328
<para>At this level, we can see all the places from which
329
<computeroutput>_nl_intern_locale_data()</computeroutput>
330
was called such that it allocated memory at 0x401767D0. (We can
331
click on the top <ulink url="#a40176F96">22.1%</ulink> link to go back
332
to the parent entry.) At this level, we have moved beyond the
333
information presented in the graph. In this case, it is only
334
called from one place. We can again follow the link for more
335
detail, moving to the following part of the file.</para>
338
<literallayout>== 2 ===========================</literallayout>
339
<para id="b40176F96">
340
Context accounted for <ulink url="#a40176F96">22.1%</ulink> of
341
measured spacetime</para>
343
<para><computeroutput> 0x401767D0: _nl_intern_locale_data (in
344
/lib/i686/libc-2.3.2.so)</computeroutput> <computeroutput>
345
0x40176F95: _nl_load_locale_from_archive (in
346
/lib/i686/libc-2.3.2.so)</computeroutput></para>
348
<para>Called from:</para>
350
<listitem id="a40176185">
351
<para>22.1%: 0x40176184: _nl_find_locale (in
352
/lib/i686/libc-2.3.2.so)</para>
357
<para>In this way we can dig deeper into the call stack, to work
358
out exactly what sequence of calls led to some memory being
359
allocated. At this point, with a call depth of 3, the
360
information runs out (thus the address of the child entry,
361
0x40176184, isn't a link). We could rerun the program with a
362
greater <computeroutput>--depth</computeroutput> value if we
363
wanted more information.</para>
365
<para>Sometimes you will get a code location like this:</para>
366
<programlisting><![CDATA[
30.8% : 0xFFFFFFFF: ???]]></programlisting>
369
<para>The code address isn't really 0xFFFFFFFF -- that's
370
impossible. This is what Massif does when it can't work out what
371
the real code address is.</para>
373
<para>Massif produces this information in a plain text file by
374
default, or HTML with the
375
<computeroutput>--format=html</computeroutput> option. The plain
376
text version obviously doesn't have the links, but a similar
377
effect can be achieved by searching on the code addresses. (In
378
Vim, the '*' and '#' searches are ideal for this.)</para>
381
<sect2 id="ms-manual.accuracy" xreflabel="Accuracy">
382
<title>Accuracy</title>
384
<para>The information should be pretty accurate. Some
385
approximations made might cause some allocation contexts to be
386
attributed with less memory than they actually allocated, but the
387
amounts should be minuscule.</para>
389
<para>The heap admin spacetime figure is an approximation, as
390
described above. If anyone knows how to improve its accuracy,
391
please let us know.</para>
69
% valgrind --tool=massif prog
72
<para>The program will execute (slowly). Upon completion, no summary
73
statistics are printed to Valgrind's commentary; all of Massif's profiling
74
data is written to a file. By default, this file is called
75
<filename>massif.out.&lt;pid&gt;</filename>, where
<filename>&lt;pid&gt;</filename> is the process ID.</para>
78
<para>To see the information gathered by Massif in an easy-to-read form, use
79
the ms_print script. If the output file's name is
80
<filename>massif.out.12345</filename>, type:</para>
82
% ms_print massif.out.12345]]></screen>
84
<para>ms_print will produce (a) a graph showing the memory consumption over
85
the program's execution, and (b) detailed information about the responsible
86
allocation sites at various points in the program, including the point of
87
peak memory allocation. The use of a separate script for presenting the
88
results is deliberate: it separates the data gathering from its
89
presentation, and means that new methods of presenting the data can be added in
92
<sect2 id="ms-manual.anexample" xreflabel="An Example">
93
<title>An Example Program</title>
95
<para>An example will make things clear. Consider the following C program
96
(annotated with line numbers) which allocates a number of different blocks
100
1 #include <stdlib.h>
118
19 for (i = 0; i < 10; i++) {
119
20 a[i] = malloc(1000);
126
27 for (i = 0; i < 10; i++) {
137
<sect2 id="ms-manual.theoutputpreamble" xreflabel="The Output Preamble">
138
<title>The Output Preamble</title>
140
<para>After running this program under Massif, the first part of ms_print's
141
output contains a preamble which just states how the program, Massif and
142
ms_print were each invoked:</para>
145
--------------------------------------------------------------------------------
Massif arguments:   (none)
ms_print arguments: massif.out.12797
--------------------------------------------------------------------------------
155
<sect2 id="ms-manual.theoutputgraph" xreflabel="The Output Graph">
156
<title>The Output Graph</title>
158
<para>The next part is the graph that shows how memory consumption occurred
159
as the program executed:</para>
183
0 +----------------------------------------------------------------------->ki
Number of snapshots: 25
 Detailed snapshots: [9, 14 (peak), 24]
190
<para>Why is most of the graph empty, with only a couple of bars at the very
191
end? By default, Massif uses "instructions executed" as the unit of time.
192
For very short-run programs such as the example, most of the executed
193
instructions involve the loading and dynamic linking of the program. The
194
execution of <computeroutput>main</computeroutput> (and thus the heap
allocations) occurs only at the very end. For a short-running program like
196
this, we can use the <computeroutput>--time-unit=B</computeroutput> option
197
to specify that we want the time unit to instead be the number of bytes
198
allocated/deallocated on the heap and stack(s).</para>
200
<para>If we re-run the program under Massif with this option, and then
201
re-run ms_print, we get this more useful graph:</para>
213
| : : # : : : : : : : .
| : : # : : : : : : : : .
| : : : # : : : : : : : : : ,
| @ : : : # : : : : : : : : : @
| : @ : : : # : : : : : : : : : @
| : : @ : : : # : : : : : : : : : @
| : : : @ : : : # : : : : : : : : : @
| : : : : @ : : : # : : : : : : : : : @
| : : : : : @ : : : # : : : : : : : : : @
| : : : : : : @ : : : # : : : : : : : : : @
| : : : : : : : @ : : : # : : : : : : : : : @
| : : : : : : : : @ : : : # : : : : : : : : : @
0 +----------------------------------------------------------------------->KB
Number of snapshots: 25
 Detailed snapshots: [9, 14 (peak), 24]
232
<para>Each vertical bar represents a snapshot, i.e. a measurement of the
233
memory usage at a certain point in time. The text at the bottom shows that
234
25 snapshots were taken for this program, which is one per heap
235
allocation/deallocation, plus a couple of extras. Massif starts by taking
236
snapshots for every heap allocation/deallocation, but as a program runs for
237
longer, it takes snapshots less frequently. It also discards older
238
snapshots as the program goes on; when it reaches the maximum number of
239
snapshots (100 by default, although changeable with the
240
<computeroutput>--max-snapshots</computeroutput> option) half of them are
241
deleted. This means that a reasonable number of snapshots are always
244
<para>Most snapshots are <emphasis>normal</emphasis>, and only basic
245
information is recorded for them. Normal snapshots are represented in the
246
graph by bars consisting of ':' and '.' characters.</para>
248
<para>Some snapshots are <emphasis>detailed</emphasis>. Information about
249
where allocations happened is recorded for these snapshots, as we will see
250
shortly. Detailed snapshots are represented in the graph by bars consisting
251
of '@' and ',' characters. The text at the bottom shows that 3 detailed
252
snapshots were taken for this program (snapshots 9, 14 and 24). By default,
253
every 10th snapshot is detailed, although this can be changed via the
254
<computeroutput>--detailed-freq</computeroutput> option.</para>
256
<para>Finally, there is at most one <emphasis>peak</emphasis> snapshot. The
257
peak snapshot is a detailed snapshot, and records the point where memory
258
consumption was greatest. The peak snapshot is represented in the graph by
259
a bar consisting of '#' and ',' characters. The text at the bottom shows
260
that snapshot 14 was the peak. Note that for tiny programs that never
261
deallocate heap memory, Massif will not record a peak snapshot.</para>
263
<para>Some more details about the peak: the peak is determined by looking
264
at every allocation, i.e. it is <emphasis>not</emphasis> just the peak among
265
the regular snapshots. However, recording the true peak is expensive, and
266
so by default Massif records a peak whose size is within 1% of the size of
267
the true peak. See the description of the
268
<computeroutput>--peak-inaccuracy</computeroutput> option below for more
271
<para>The following graph is from an execution of Konqueror, the KDE web
272
browser. It shows what graphs for larger programs look like.</para>
282
| : :@ :@@@ :: :@@#::
| ,: :@ :@@@ :: :@@#::
| ,:@: :@ :@@@ :: :@@#::.
| @@:@: :@ :@@@ :: :@@#:::
| ,,: .:: . , .::@@:@: :@ :@@@ :: :@@#:::
| .:@@: .: ::: ::: @ :::@@:@: :@ :@@@ :: :@@#:::
| ,: ::@@: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
| @: ::@@: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#::.
| @: ::@@: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
| , @: ::@@:: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
| ::@ @: ::@@:: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
| , :::::@ @: ::@@:: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
| ..@ :::::@ @: ::@@:: ::: ::::::: @ :::@@:@: :@ :@@@ :: :@@#:::
0 +----------------------------------------------------------------------->Mi
Number of snapshots: 63
 Detailed snapshots: [3, 4, 10, 11, 15, 16, 29, 33, 34, 36, 39, 41,
                      42, 43, 44, 49, 50, 51, 53, 55, 56, 57 (peak)]
303
<para>Note that the larger size units are KB, MB, GB, etc. As is typical
304
for memory measurements, these are based on a multiplier of 1024, rather
305
than the standard SI multiplier of 1000. Strictly speaking, they should be
306
written KiB, MiB, GiB, etc.</para>
311
<sect2 id="ms-manual.thesnapshotdetails" xreflabel="The Snapshot Details">
312
<title>The Snapshot Details</title>
314
<para>Returning to our example, the graph is followed by the detailed
315
information for each snapshot. The first nine snapshots are normal, so only
316
a small amount of information is recorded for each one:</para>
318
--------------------------------------------------------------------------------
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  1          1,008            1,008            1,000             8            0
  2          2,016            2,016            2,000            16            0
  3          3,024            3,024            3,000            24            0
  4          4,032            4,032            4,000            32            0
  5          5,040            5,040            5,000            40            0
  6          6,048            6,048            6,000            48            0
  7          7,056            7,056            7,000            56            0
  8          8,064            8,064            8,000            64            0
332
<para>Each normal snapshot records several things.</para>
335
<listitem><para>Its number.</para></listitem>
337
<listitem><para>The time it was taken. In this case, the time unit is
338
bytes, due to the use of
339
<computeroutput>--time-unit=B</computeroutput>.</para></listitem>
341
<listitem><para>The total memory consumption at that point.</para></listitem>
343
<listitem><para>The number of useful heap bytes allocated at that point.
344
This reflects the number of bytes asked for by the
345
program.</para></listitem>
347
<listitem><para>The number of extra heap bytes allocated at that point.
348
This reflects the number of bytes allocated in excess of what the program
349
asked for. There are two sources of extra heap bytes.</para>
351
<para>First, every heap block has administrative bytes associated with it.
352
The exact number of administrative bytes depends on the details of the
353
allocator. By default Massif assumes 8 bytes per block, as can be seen
354
from the example, but this number can be changed via the
355
<computeroutput>--heap-admin</computeroutput> option.</para>
357
<para>Second, allocators often round up the number of bytes asked for to a
358
larger number. By default, if N bytes are asked for, Massif rounds N up
359
to the nearest multiple of 8 that is equal to or greater than N. This is
360
typical behaviour for allocators, and is required to ensure that elements
361
within the block are suitably aligned. The rounding size can be changed
362
with the <computeroutput>--alignment</computeroutput> option, although it
363
cannot be less than 8, and must be a power of two.</para></listitem>
365
<listitem><para>The size of the stack(s). By default, stack profiling is
366
off as it slows Massif down greatly. Therefore, the stack column is zero
367
in the example.</para></listitem>
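<para>The two sources of extra heap bytes described above can be sketched in
C. This is only an illustration of the arithmetic, assuming the defaults of
8 administrative bytes per block and an alignment of 8. The function names
<function>round_up</function> and <function>extra_heap_bytes</function> are
hypothetical and are not part of Massif.</para>

```c
#include <stddef.h>

/* Round n up to the nearest multiple of align that is >= n.
   Assumes align is a power of two, as --alignment requires. */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Extra-heap bytes charged to one block of n requested bytes:
   the per-block admin bytes plus the padding added by rounding up.
   Hypothetical helper for illustration only. */
size_t extra_heap_bytes(size_t n, size_t admin, size_t align)
{
    return admin + (round_up(n, align) - n);
}
```

<para>Under these assumptions a request for 1,000 bytes costs 8 extra bytes
(no padding, since 1,000 is already a multiple of 8), which matches the
growth of 8 bytes per snapshot in the extra-heap column of the
example.</para>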
370
<para>The next snapshot is detailed. As well as the basic counts, it gives
371
an allocation tree which indicates exactly which pieces of code were
372
responsible for allocating heap memory:</para>
375
  9          9,072            9,072            9,000            72            0
99.21% (9,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.21% (9,000B) 0x804841A: main (example.c:20)
380
<para>The allocation tree can be read from the top down. The first line
381
indicates all heap allocation functions such as <function>malloc</function>
382
and C++ <function>new</function>. All heap allocations go through these
383
functions, and so all 9,000 useful bytes (which is 99.21% of all allocated
384
bytes) go through them. But how were <function>malloc</function> and new
385
called? At this point, every allocation so far has been due to line 20
386
inside <function>main</function>, hence the second line in the tree. The
387
<computeroutput>-></computeroutput> indicates that main (line 20) called
388
<function>malloc</function>.</para>
390
<para>Let's see what the subsequent output shows happened next:</para>
393
--------------------------------------------------------------------------------
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 10         10,080           10,080           10,000            80            0
 11         12,088           12,088           12,000            88            0
 12         16,096           16,096           16,000            96            0
 13         20,104           20,104           20,000           104            0
 14         20,104           20,104           20,000           104            0
99.48% (20,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.74% (10,000B) 0x804841A: main (example.c:20)
->39.79% (8,000B) 0x80483C2: g (example.c:5)
| ->19.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->19.90% (4,000B) 0x8048431: main (example.c:23)
| ->19.90% (4,000B) 0x8048436: main (example.c:25)
->09.95% (2,000B) 0x80483DA: f (example.c:10)
  ->09.95% (2,000B) 0x8048431: main (example.c:23)
414
<para>The first four snapshots are similar to the previous ones. But then
415
the global allocation peak is reached, and a detailed snapshot is taken.
416
Its allocation tree shows that 20,000B of useful heap memory has been
417
allocated, and the lines and arrows indicate that this is from three
418
different code locations: line 20, which is responsible for 10,000B
419
(49.74%); line 5, which is responsible for 8,000B (39.79%); and line 10,
420
which is responsible for 2,000B (9.95%).</para>
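<para>The percentages in the tree can be checked by hand. The sketch below
assumes, as the figures in the example suggest, that each percentage is the
entry's useful bytes divided by the snapshot's total(B) value (20,104 at the
peak). The function name <function>percent_of_total</function> is
hypothetical.</para>

```c
/* Share of the snapshot's total bytes, in percent.
   Hypothetical helper; returns 0 for an empty snapshot. */
double percent_of_total(unsigned long bytes, unsigned long total)
{
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * (double)bytes / (double)total;
}
```

<para>For the peak snapshot, 10,000 of 20,104 bytes is about 49.74%, 8,000 is
about 39.79%, and 2,000 is about 9.95%. The three shares do not add up to
100% because the 104 extra-heap bytes are part of the total but are not
attributed to any code location.</para>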
422
<para>We can then drill down further in the allocation tree. For example,
423
of the 8,000B asked for by line 5, half of it was due to a call from line
424
11, and half was due to a call from line 25.</para>
426
<para>In short, Massif collates the stack trace of every single allocation
427
point in the program into a single tree, which gives a complete picture of
428
how and why all heap memory was allocated.</para>
430
<para>Note that the tree entries correspond not to functions, but to
431
individual code locations. For example, if function <function>A</function>
432
calls <function>malloc</function>, and function <function>B</function> calls
433
<function>A</function> twice, once on line 10 and once on line 11, then
434
the two calls will result in two distinct stack traces in the tree. In
435
contrast, if <function>B</function> calls <function>A</function> repeatedly
436
from line 15 (e.g. due to a loop), then each of those calls will be
437
represented by the same stack trace in the tree.</para>
439
<para>Note also that each tree entry with children in the example satisfies an
440
invariant: the entry's size is equal to the sum of its children's sizes.
441
For example, the first entry has size 20,000B, and its children have sizes
442
10,000B, 8,000B, and 2,000B. In general, this invariant almost always
443
holds. However, in rare circumstances stack traces can be malformed, in
444
which case a stack trace can be a sub-trace of another stack trace. This
445
means that some entries in the tree may not satisfy the invariant -- the
446
entry's size will be greater than the sum of its children's sizes. Massif
447
can sometimes detect when this happens; if it does, it issues a
451
Warning: Malformed stack trace detected. In Massif's output,
         the size of an entry's child entries may not sum up
         to the entry's size as they normally do.
456
<para>However, Massif does not detect and warn about every such occurrence.
457
Fortunately, malformed stack traces are rare in practice.</para>
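<para>The invariant described above can be expressed as a short recursive
check. This is a minimal sketch over a hypothetical in-memory tree; the
<computeroutput>struct tree_entry</computeroutput> type and the
<function>invariant_holds</function> function are assumptions made for
illustration and do not reflect Massif's internal data structures.</para>

```c
#include <stddef.h>

/* Hypothetical tree node: one code location and the bytes attributed to it. */
struct tree_entry {
    size_t bytes;                       /* size shown for this entry */
    size_t n_children;                  /* number of child entries */
    const struct tree_entry *children;  /* array of n_children entries */
};

/* Returns 1 if every entry that has children has a size equal to the sum
   of its children's sizes, and 0 otherwise. Leaf entries always pass. */
int invariant_holds(const struct tree_entry *e)
{
    size_t sum = 0;
    size_t i;

    if (e->n_children == 0) {
        return 1;
    }
    for (i = 0; i < e->n_children; i++) {
        if (!invariant_holds(&e->children[i])) {
            return 0;
        }
        sum += e->children[i].bytes;
    }
    return sum == e->bytes;
}
```

<para>Applied to the peak snapshot of the example (20,000B at the root, with
children of 10,000B, 8,000B and 2,000B), the check succeeds. A tree built
from a malformed stack trace, where an entry is larger than the sum of its
children, would fail it.</para>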
459
<para>Returning now to ms_print's output, the final part is similar:</para>
462
--------------------------------------------------------------------------------
  n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 15         21,112           19,096           19,000            96            0
 16         22,120           18,088           18,000            88            0
 17         23,128           17,080           17,000            80            0
 18         24,136           16,072           16,000            72            0
 19         25,144           15,064           15,000            64            0
 20         26,152           14,056           14,000            56            0
 21         27,160           13,048           13,000            48            0
 22         28,168           12,040           12,000            40            0
 23         29,176           11,032           11,000            32            0
 24         30,184           10,024           10,000            24            0
99.76% (10,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->79.81% (8,000B) 0x80483C2: g (example.c:5)
| ->39.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->39.90% (4,000B) 0x8048431: main (example.c:23)
| ->39.90% (4,000B) 0x8048436: main (example.c:25)
->19.95% (2,000B) 0x80483DA: f (example.c:10)
| ->19.95% (2,000B) 0x8048431: main (example.c:23)
->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
488
<para>The final detailed snapshot shows how the heap looked at termination.
489
The 00.00% entry represents the code locations for which memory was
490
allocated and then freed (line 20 in this case, the memory for which was
491
freed on line 28). However, no code location details are given for this
492
entry; by default, Massif only records the details for code locations
493
responsible for more than 1% of useful memory bytes, and ms_print likewise
494
only prints the details for code locations responsible for more than 1%.
495
The entries that do not meet this threshold are aggregated. This avoids
496
filling up the output with large numbers of unimportant entries. The
497
thresholds can be changed with the
498
<computeroutput>--threshold</computeroutput> option that both Massif and
499
ms_print support.</para>
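<para>The time(B) and total(B) columns in the snapshot tables can be
reproduced with a small model of
<computeroutput>--time-unit=B</computeroutput>. The sketch assumes, as the
tables suggest, that every block is counted together with its 8
administrative bytes, that time advances on both allocation and
deallocation, and that the total only shrinks on deallocation. The
<computeroutput>struct byte_clock</computeroutput> type and both functions
are hypothetical names, not Massif code.</para>

```c
/* Hypothetical model of the two running counters shown by ms_print. */
struct byte_clock {
    unsigned long time_b;   /* bytes allocated plus bytes deallocated so far */
    unsigned long total_b;  /* bytes currently allocated */
};

/* An allocation advances the clock and grows the live total. */
void on_alloc(struct byte_clock *c, unsigned long bytes)
{
    c->time_b += bytes;
    c->total_b += bytes;
}

/* A deallocation also advances the clock, but shrinks the live total. */
void on_free(struct byte_clock *c, unsigned long bytes)
{
    c->time_b += bytes;
    c->total_b -= bytes;
}
```

<para>Feeding this model ten blocks of 1,008 bytes, then blocks of 2,008,
4,008 and 4,008 bytes, gives a time and total of 20,104 at the peak.
Freeing the ten 1,008-byte blocks afterwards gives a time of 30,184 and a
total of 10,024, the values shown for snapshot 24.</para>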
503
<sect2 id="ms-manual.forkingprograms" xreflabel="Forking Programs">
504
<title>Forking Programs</title>
505
<para>If your program forks, the child will inherit all the profiling data that
506
has been gathered for the parent.</para>
508
<para>If the output file format string (controlled by
509
<option>--massif-out-file</option>) does not contain <option>%p</option>, then
510
the outputs from the parent and child will be intermingled in a single output
511
file, which will almost certainly make it unreadable by ms_print.</para>
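<para>To see why <option>%p</option> keeps the parent's and child's output
apart, consider a sketch of how such a format string could be expanded. This
is not Massif's implementation; <function>expand_pid</function> is a
hypothetical helper that replaces only the first <option>%p</option> with the
process ID.</para>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Massif's own code: copy fmt into out, replacing
   the first "%p" with the decimal process ID. Returns 0 on success, or -1
   if the result does not fit in out_len bytes. */
int expand_pid(const char *fmt, long pid, char *out, size_t out_len)
{
    const char *hit = strstr(fmt, "%p");
    int n;

    if (hit == NULL) {
        /* No %p: parent and child would share this one file name. */
        n = snprintf(out, out_len, "%s", fmt);
    } else {
        n = snprintf(out, out_len, "%.*s%ld%s",
                     (int)(hit - fmt), fmt, pid, hit + 2);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

<para>With a format of <filename>massif.out.%p</filename>, two processes
with different process IDs get two different file names. With a format that
lacks <option>%p</option>, both get the same name and so write to the same
file.</para>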