A fairly random collection of things to work on next...

1) Coming up with a catchy or at least somewhat interesting name.

I suck at names. Currently "memory_dump" is the library, pymemdump is
the project. I don't mind a functional name, but I don't want people
going "ugh" when they think of using the tool. :)

When this happens, create an official project on Launchpad, and host it

2) (DONE @ revno 58) Tracking the memory consumed by the GC overhead.

Objects allocated in the garbage collector (just about everything,
strings being the notable exception) actually have a PyGC_Head
structure allocated first. So while a 1-entry tuple *looks* like it
is only 16 bytes, each one actually carries an additional PyGC_Head
structure, probably another 16 bytes.

I haven't quite figured out how to tell if a given object is in the
gc. It may just be a bit-field in the type object.
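
A rough sketch of the "bit-field in the type object" idea. CPython marks
GC-aware types with the Py_TPFLAGS_HAVE_GC bit in tp_flags, which Python
code can read as ``type.__flags__``. The bit value (1 << 14) is taken
from CPython's object.h and is an assumption about the interpreter in
use; the helper name type_has_gc is made up for this example, and
gc.is_tracked() needs Python 2.7 or 3.1 and later.

```python
import gc

# Assumption: value of Py_TPFLAGS_HAVE_GC in CPython's Include/object.h.
PY_TPFLAGS_HAVE_GC = 1 << 14


def type_has_gc(obj):
    """Return True if obj's type is GC-aware, so instances get a PyGC_Head."""
    return bool(type(obj).__flags__ & PY_TPFLAGS_HAVE_GC)


samples = [[], (1,), {}, "a string", 42, 1.5]
for obj in samples:
    # gc.is_tracked() reports whether this particular object is tracked
    # right now; some containers can be untracked by the interpreter even
    # though their type is GC-aware.
    print("%-6s type has GC: %-5s tracked now: %s"
          % (type(obj).__name__, type_has_gc(obj), gc.is_tracked(obj)))
```

The type flag says whether the extra header is allocated at all, which
is what matters for size accounting; gc.is_tracked() only says whether
the collector is currently looking at the object.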

3) Generating a Calltree output.
1) Generating a Calltree output.

I haven't yet understood the calltree syntax, nor how I want to
exactly match things. Certainly you don't have FILE/LINE to put into

.. _runsnakerun: http://www.vrplumber.com/programming/runsnakerun/

4) Other analysis tools, like walking the ref graph.
2) Other analysis tools, like walking the ref graph.

I'm thinking something similar to PDB, which could let you walk
up-and-down the reference graph, to let you figure out why that one

At the moment, you can do this using '*' in Vim, which is at least a
start, and one reason to use a text-compatible dump format.
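
A minimal sketch of what walking "up" the graph could look like. It
assumes the dump has already been loaded into a plain dict mapping each
object address to the list of addresses it references; that layout and
the names build_parents and path_to_root are invented for this example,
not part of the current library.

```python
from collections import defaultdict, deque


def build_parents(refs):
    """Invert {address: [referenced addresses]} into {address: [referrers]}."""
    parents = defaultdict(list)
    for addr, children in refs.items():
        for child in children:
            parents[child].append(addr)
    return parents


def path_to_root(refs, target, roots):
    """Breadth-first walk upward from target until one of roots is reached.

    Returns the chain of addresses from the root down to target, or None
    if no root refers to it (directly or indirectly).
    """
    parents = build_parents(refs)
    seen = {target}
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        if path[-1] in roots:
            return path[::-1]
        for parent in parents.get(path[-1], ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


# Hypothetical toy graph: 0x10 -> 0x20 -> 0x40, and 0x10 -> 0x30.
refs = {0x10: [0x20, 0x30], 0x20: [0x40], 0x30: [], 0x40: []}
chain = path_to_root(refs, 0x40, roots={0x10})
print([hex(addr) for addr in chain])
```

An interactive, PDB-style shell would mostly be a loop around
build_parents() (for "up") and the original mapping (for "down").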

5) Easier ways to hook this into existing processes...

I'm not really sure what to do here, but adding a function to make it
easier to write-out and load-in the memory info, when you aren't as

The dump file currently takes roughly the same amount of memory as the
actual objects in RAM, both on disk, and then when loaded back into memory.

6) Dump differencing utilities.
3) Dump differencing utilities.

This probably will make it a bit easier to see where memory is
increasing, rather than just where it is at right now.
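
One possible shape for such a utility, as a sketch only: summarize each
dump as total bytes per type name and report the types whose total
changed. It assumes each dump can be read as (type_name, size) pairs;
that input shape and the function names are hypothetical.

```python
from collections import Counter


def bytes_by_type(objects):
    """Sum object sizes per type name for one dump."""
    totals = Counter()
    for type_name, size in objects:
        totals[type_name] += size
    return totals


def diff_dumps(before, after):
    """Return (type_name, byte_delta) pairs, largest growth first."""
    old, new = bytes_by_type(before), bytes_by_type(after)
    deltas = ((name, new[name] - old[name]) for name in set(old) | set(new))
    changed = [(name, delta) for name, delta in deltas if delta]
    return sorted(changed, key=lambda item: (-item[1], item[0]))


# Hypothetical data standing in for two loaded dumps.
before = [("dict", 280), ("str", 40), ("tuple", 64)]
after = [("dict", 280), ("dict", 280), ("str", 40), ("list", 72)]
for name, delta in diff_dumps(before, after):
    print("%-6s %+d bytes" % (name, delta))
```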

7) Cheaper "dict" of MemObjects.

At the moment, loading a 2M object dump costs 50MB for just the dict
holding them. However, each entry uses a simple object address as the
key, which it maintains on the object itself. So instead of 3 words
per entry, you could use 1. Further, the address isn't all that great
as a hash key. Namely 90% of your objects are aligned on a 16-byte
boundary, another 9% or so on an 8-byte boundary, and the random
Integer is allocated on a 4-byte boundary. Regardless, just using
"address & 0xFF" is going to have ~16x more collisions than doing
something a bit more sensible. (Rotate the bits a bit.)
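
To make the ~16x figure concrete, here is a small sketch comparing the
two bucket functions on 16-byte-aligned addresses. The 64-bit word
width, the 4-bit rotation, and the function names are assumptions made
for the example.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def naive_bucket(address):
    """Bucket index taken straight from the low 8 bits of the address."""
    return address & 0xFF


def rotated_bucket(address, rotate=4):
    """Rotate right by `rotate` bits first, so the alignment zeros
    end up at the top of the word instead of in the bucket index."""
    rotated = ((address >> rotate) | (address << (WORD_BITS - rotate))) & WORD_MASK
    return rotated & 0xFF


# 4096 made-up addresses, all aligned on a 16-byte boundary.
addresses = [0x1000 + 16 * i for i in range(4096)]
naive_used = len(set(naive_bucket(a) for a in addresses))
rotated_used = len(set(rotated_bucket(a) for a in addresses))
print("buckets used: naive=%d rotated=%d" % (naive_used, rotated_used))
```

With every address a multiple of 16, the low 4 bits are always zero, so
the naive version can only ever reach 16 of the 256 buckets.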

Also, I'm thinking to allow you to load a dump file, and strip off
things that may not be as interesting. Like whether you want values
or not, or if you wanted to limit the maximum reference list to 100
or so. I figure at more than 100, you aren't all that interested in
an individual reference. And it might be nice to be able to analyze
big dump files without consuming all of your memory.
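
A sketch of the stripping step, assuming each loaded object is a dict
with optional "value" and "refs" keys. That record layout, the option
names, and both functions are hypothetical; the real loader may look
quite different.

```python
def strip_record(record, keep_value=False, max_refs=100):
    """Return a slimmed-down copy of one dump record.

    Drops the stored value unless keep_value is set, and truncates the
    reference list to at most max_refs entries.
    """
    slim = dict(record)
    if not keep_value:
        slim.pop("value", None)
    refs = slim.get("refs", [])
    if len(refs) > max_refs:
        slim["refs"] = refs[:max_refs]
    return slim


def load_stripped(records, **options):
    """Lazily strip records one at a time, so the full-size dump never
    has to sit in memory all at once."""
    for record in records:
        yield strip_record(record, **options)


# Hypothetical record with a long reference list.
big = {"address": 0x1000, "type": "list", "size": 2072,
       "value": "not interesting", "refs": list(range(250))}
slim = next(load_stripped([big]))
print(sorted(slim), len(slim["refs"]))
```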

8) Full cross-platform and version compatibility.
4) Full cross-platform and version compatibility testing.

I'd like to support Python 2.4+, 32/64-bit, Win/Linux/Mac. I've tested
a couple of variants, but I don't have all of them to make sure it works