HACKING Harvest

The Harvest code lives in Launchpad and is written in Python using Django.

Harvest regularly pulls data from URLs stored in this branch. The file layout is pretty simple:

daniel@bert:~/bzr/harvest-data$ ls
clues  opportunities

Before downloading a CSV (comma-separated values) file, Harvest checks the Last-Modified field in the HTTP response headers to see whether the file has been modified, and only downloads it if it has. This is done to reduce traffic.
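
The check described above could be sketched roughly as follows. This is only an illustration, not Harvest's actual code: the function name needs_download and the idea of remembering the time of the last successful fetch are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(last_modified: Optional[str],
                   last_fetch: Optional[datetime]) -> bool:
    """Decide whether a remote CSV file should be downloaded again.

    last_modified is the raw Last-Modified header value (or None if the
    server did not send one). last_fetch is the timezone-aware time of the
    previous successful download (or None if there was none). Both names
    are hypothetical and not taken from the Harvest code base.
    """
    # Without a header or a previous fetch we cannot compare, so download.
    if last_modified is None or last_fetch is None:
        return True
    try:
        remote_time = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Unparseable header: download to be on the safe side.
        return True
    # A header without usable zone information is treated as UTC (assumption).
    if remote_time.tzinfo is None:
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    return remote_time > last_fetch
```

A caller would typically issue an HTTP HEAD request first, pass the Last-Modified value to this function, and only issue the GET request when it returns True.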

Opportunities?

The opportunities file is a CSV file with the following format:

<url>,<description>

Each URL points to a CSV file and must be reachable via HTTP or HTTPS. The description is optional.

Each of those CSV files, in turn, must have the following format:

<sourcepackage>,<url>,<description>

For example:

vdrift,http://launchpad.net/bugs/106854,106854
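
As an illustration of the format, the following sketch reads such a file with Python's csv module. The names Opportunity and parse_opportunities are made up for this example and do not come from Harvest itself. Skipping rows with fewer than two fields and accepting a missing description are also assumptions.

```python
import csv
import io
from typing import List, NamedTuple


class Opportunity(NamedTuple):
    """One row of a <sourcepackage>,<url>,<description> file (hypothetical type)."""
    sourcepackage: str
    url: str
    description: str


def parse_opportunities(text: str) -> List[Opportunity]:
    """Parse the contents of a per-package CSV file into Opportunity rows."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        # Blank lines and rows without at least a package and a URL are
        # skipped (assumption: such rows carry no usable data).
        if len(fields) < 2:
            continue
        # Assumption: a missing description becomes an empty string.
        description = fields[2] if len(fields) > 2 else ""
        rows.append(Opportunity(fields[0].strip(),
                                fields[1].strip(),
                                description.strip()))
    return rows
```

Feeding it the example line above yields a single row whose sourcepackage is "vdrift" and whose description is "106854".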

Opportunities can be anything. Let your imagination go wild. :-)

Clues?

The clues file is a CSV file with the following format:

<url>,<score>,<description>

The URL points to another CSV file that should be pulled regularly. The score is a float value that describes how good or bad it is for a package to be on that list (e.g. if a package is uninstallable, that might be worth -500; if 50% of its bugs are forwarded upstream, that might be worth +300). The scores are summed up every time the HTML pages are generated, and the total can indicate whether the package is in good shape.
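
The summing step might look roughly like the sketch below. It is not taken from the Harvest source: the function name total_scores and the input shape (one score plus the package names found in that clue's CSV file) are assumptions, and "hello" is just a placeholder package name. A package that appears twice in the same list is counted twice here, which is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def total_scores(
    clue_lists: Iterable[Tuple[float, Iterable[str]]]
) -> Dict[str, float]:
    """Sum the clue scores per source package.

    Each item pairs the score from the clues file with the source package
    names listed in the CSV file that clue points to.
    """
    totals = defaultdict(float)
    for score, packages in clue_lists:
        for package in packages:
            # Every listing of a package adds that clue's score to its total.
            totals[package] += score
    return dict(totals)


# Example: vdrift is on an "uninstallable" list (-500) and on a
# "bugs forwarded upstream" list (+300); hello is only on the first.
example = total_scores([
    (-500.0, ["vdrift", "hello"]),
    (300.0, ["vdrift"]),
])
```

With these inputs vdrift ends up at -200 and hello at -500.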

The format of the CSV file containing the clues is the same as that of the opportunities. Right now, only the source package name is used.