~ubuntu-branches/debian/sid/meliae/sid

Viewing changes to TODO.txt

Committer: Bazaar Package Importer
Author(s): Jelmer Vernooij
Date: 2009-12-19 18:23:37 UTC
Revision ID: james.westby@ubuntu.com-20091219182337-t09txw6ca1yfysn9

Tags: upstream-0.2.0

Import upstream version 0.2.0

TODO.txt

============
Things to do
============

A fairly random collection of things to work on next...

1) Coming up with a catchy or at least somewhat interesting name.

I suck at names. Currently "memory_dump" is the library, pymemdump is

the project. I don't mind a functional name, but I don't want people

going "ugh" when they think of using the tool. :)

When this happens, create an official project on Launchpad, and host it

there.

2) (DONE @ revno 58) Tracking the memory consumed by the GC overhead.

Objects allocated in the garbage collector (just about everything,

strings being the notable exception) actually have a PyGC_Head

structure allocated first. So while a 1 entry tuple *looks* like it

is only 16 bytes, it actually has another probably 16-byte PyGC_Head

structure allocated for each one.

I haven't quite figured out how to tell if a given object is in the

gc. It may just be a bit-field in the type object.
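
The bit-field guess can be checked from Python itself. What follows is a minimal sketch, assuming the flag in question is CPython's ``Py_TPFLAGS_HAVE_GC`` (bit 14 of the type flags, exposed as ``type.__flags__``); the helper name ``type_has_gc`` is made up for illustration and is not part of this project. Later Python versions also offer ``gc.is_tracked()`` as a per-object runtime check.

```python
# Assumption: Py_TPFLAGS_HAVE_GC is bit 14 of the CPython type flags.
Py_TPFLAGS_HAVE_GC = 1 << 14


def type_has_gc(obj):
    """Return True if obj's type allocates a PyGC_Head in front of each instance.

    Hypothetical helper, not part of the library: it only inspects the
    type flags, so it answers per type rather than per object.
    """
    return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)


if __name__ == "__main__":
    for sample in ((1,), [], {}, "abc", 1, 1.5):
        print(type(sample).__name__, type_has_gc(sample))
```

Containers such as tuples, lists and dicts should report the flag, while strings and numbers should not, which matches the note above about strings being the exception.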

3) Generating a Calltree output.

I haven't yet understood the calltree syntax, nor exactly how I want to

match things. Certainly you don't have FILE/LINE to put into

the output.

Also, look at generating `runsnakerun`_ output.

.. _runsnakerun: http://www.vrplumber.com/programming/runsnakerun/

4) Other analysis tools, like walking the ref graph.

I'm thinking something similar to PDB, which could let you walk

up-and-down the reference graph, to let you figure out why that one

string is being cached, by going through the 10 layers of references.

At the moment, you can do this using '*' in Vim, which is at least a

start, and one reason to use a text-compatible dump format.
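
As a rough sketch of what walking the reference graph might look like, the snippet below does a breadth-first search for a chain of references from one object to another. It assumes the dump has already been loaded into a plain dict mapping an object address to the list of addresses it references; that layout, and the name ``find_reference_path``, are assumptions for illustration, not the tool's actual data model.

```python
from collections import deque


def find_reference_path(refs, start, target):
    """Return the shortest chain of addresses from start to target, or None.

    refs is assumed to map address -> list of referenced addresses.
    """
    came_from = {start: None}  # address -> the address we reached it from
    queue = deque([start])
    while queue:
        addr = queue.popleft()
        if addr == target:
            # Walk the came_from links backwards to rebuild the path.
            path = []
            while addr is not None:
                path.append(addr)
                addr = came_from[addr]
            return path[::-1]
        for child in refs.get(addr, ()):
            if child not in came_from:
                came_from[child] = addr
                queue.append(child)
    return None


if __name__ == "__main__":
    # Toy graph with made-up addresses: 1 -> 2 -> 4 -> 5 and 1 -> 3 -> 4.
    toy_refs = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
    print(find_reference_path(toy_refs, 1, 5))
```

An interactive, PDB-like tool could build on the same lookup, stepping one reference at a time instead of searching for the whole path.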

5) Easier ways to hook this into existing processes...

I'm not really sure what to do here, but adding a function would make it

easier to write-out and load-in the memory info, when you aren't as

memory constrained.

The dump file currently takes ~ the same amount of memory as the actual

objects in ram, both on disk, and then when loaded back into memory.

6) Dump differencing utilities.

This probably will make it a bit easier to see where memory is

increasing, rather than just where it is at right now.
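
One possible shape for such a utility is sketched below: total the bytes per type in each dump, then report the types whose totals changed. It assumes each dump can be read as an iterable of ``(address, type_name, size)`` tuples; that record layout and both function names are hypothetical.

```python
from collections import Counter


def summarize(objects):
    """Total the size in bytes per type name for one dump."""
    totals = Counter()
    for _address, type_name, size in objects:
        totals[type_name] += size
    return totals


def diff_summaries(before, after):
    """Return (type_name, byte_delta) pairs for changed types, largest growth first."""
    names = set(before) | set(after)
    deltas = ((name, after.get(name, 0) - before.get(name, 0)) for name in names)
    changed = [(name, delta) for name, delta in deltas if delta != 0]
    # Sort by delta descending, then by name so the order is deterministic.
    return sorted(changed, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up records purely for illustration.
    first = [(1, "str", 40), (2, "dict", 280), (3, "tuple", 64)]
    second = [(1, "str", 40), (2, "dict", 280), (4, "dict", 280), (5, "list", 88)]
    for name, delta in diff_summaries(summarize(first), summarize(second)):
        print("%-8s %+d bytes" % (name, delta))
```

Comparing per-type totals keeps the diff cheap; matching individual objects by address would be more precise but addresses can be reused between dumps.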

7) Cheaper "dict" of MemObjects.

At the moment, loading a 2M object dump costs 50MB for just the dict

holding them. However each entry uses a simple object address as the

key, which it maintains on the object itself. So instead of 3-words

per entry, you could use 1. Further, the address isn't all that great

as a hash key. Namely 90% of your objects are aligned on a 16-byte

boundary, another 9% or so on a 8-byte boundary, and the random

Integer is allocated on a 4-byte boundary. Regardless, just using

"address & 0xFF" is going to have ~16x more collisions than doing

something a bit more sensible. (Rotate the bits a bit.)

Also, I'm thinking to allow you to load a dump file, and strip off

things that may not be as interesting. Like whether you want values

or not, or if you wanted to limit the maximum reference list to 100

or so. I figure at more than 100, you aren't all that interested in

an individual reference. And it might be nice to be able to analyze

big dump files without consuming all of your memory.
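
The hash-key point earlier in this item can be illustrated with a small experiment. The sketch below assumes 64-bit addresses that are all aligned on a 16-byte boundary, and compares "address & 0xFF" against the same mask applied after rotating the address right by 4 bits. The function names and the synthetic addresses are made up for the demonstration.

```python
def naive_bucket(address, mask=0xFF):
    """Bucket index taken straight from the low bits of the address."""
    return address & mask


def rotated_bucket(address, mask=0xFF, shift=4, bits=64):
    """Rotate the address right by `shift` bits before masking.

    This moves the always-zero alignment bits out of the low end.
    """
    word = (1 << bits) - 1
    rotated = ((address >> shift) | (address << (bits - shift))) & word
    return rotated & mask


if __name__ == "__main__":
    # Synthetic, 16-byte-aligned addresses (not taken from a real dump).
    addresses = [0x7F0000000000 + 16 * i for i in range(4096)]
    naive_used = len({naive_bucket(a) for a in addresses})
    rotated_used = len({rotated_bucket(a) for a in addresses})
    print("buckets used without rotation:", naive_used)
    print("buckets used with rotation:   ", rotated_used)
```

With perfectly aligned addresses the unrotated mask can only ever reach 16 of the 256 buckets, which is where the roughly 16x figure comes from.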

8) Full cross-platform and version compatibility.

I'd like to support python2.4+, 32/64-bit, Win/Linux/Mac. I've tested

a couple variants, but I don't have all of them to make sure it works

everywhere.

vim: ft=rst