Viewing all changes in revision 34.

Committer: John Arbash Meinel
Date: 2009-03-05 17:20:17 UTC
mfrom: (32.1.16 internal_index)
Revision ID: john@arbash-meinel.com-20090305172017-mefnbegtuk4vt99i

Another disk-format bump.

Move the labels/sha1 information into a pre-header. This also makes it
easier to decide to enable/disable the headers, as we can support
both with the same deserialising code (at least until we remove
the extra info from the indexes.)

This also makes a fulltext record stream start with 'f' and a delta
record stream start with 'd', which makes them more self describing.
The next step would probably be to write the base128 length of the
encoded bytes, which would make them fully independent, though
you wouldn't know what content they refer to.

This also brings in an update to .compress() which allows us to
see that we overflowed our group, roll back and start a new one.
This seems to give better compression in a 'more stable' manner.
Still open to tweaking, though.

Also introduce the 'gcc-chk255-big' which uses 64k leaf pages
rather than 4k leaf pages. Initial results show smaller compressed
size at a small (10%) increase in uncompressed size. Also shows
a full level decrease in the tree depth.

No-labels decreases the inv size approx 300kB, and big-page decreases
the inv size another 300kB, not to mention the 116k decrease in the
.cix index, just from not having the extra pages.

Having both no-labels and big inv pages brings a total drop of
11023k down to 9847k for the repo (1176kB savings, or 10% overall).

For now, leave the default with labels, but consider changing it.