The Harvest code lives in Launchpad and uses Python and Django.
Harvest regularly pulls data from URLs stored in this branch. The file layout is pretty simple:
daniel@bert:~/bzr/harvest-data$ ls
clues  opportunities
daniel@bert:~/bzr/harvest-data$
Before attempting to download a CSV (comma-separated values) file, Harvest checks the Last-Modified entry in the HTTP header to see whether the file has changed since it was last fetched. This is done to reduce traffic.
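The Last-Modified check can be sketched as follows. This is a minimal illustration, not Harvest's actual code: the function name needs_download and the idea of keeping a timezone-aware timestamp of the last fetch are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(last_modified: Optional[str],
                   last_fetched: Optional[datetime]) -> bool:
    """Return True if the remote file should be downloaded again.

    `last_modified` is the raw Last-Modified header value (or None).
    `last_fetched` is assumed to be a timezone-aware datetime (or None).
    """
    # No header or no previous fetch: we cannot tell, so download.
    if last_modified is None or last_fetched is None:
        return True
    try:
        remote_time = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True  # unparseable header: play safe and download
    if remote_time.tzinfo is None:
        # Treat a header without zone information as UTC.
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    return remote_time > last_fetched
```

The header value itself could be obtained with a HEAD request, for example via urllib.request.Request(url, method="HEAD") and reading headers.get("Last-Modified") from the response, so that the file body is only transferred when this function returns True.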
The opportunities file is a CSV file with the following format:
<url>,<description>
The URLs of the CSV files must be reachable via HTTP(S). The description is optional.
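As a rough sketch of how such a file could be read, the following uses Python's csv module. The function name parse_index and the example.com URLs are hypothetical, and rejecting non-HTTP(S) URLs with an exception is an assumption about how the rule above might be enforced.

```python
import csv
import io
from urllib.parse import urlparse


def parse_index(text):
    """Yield (url, description) pairs from '<url>,<description>' lines."""
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        url = row[0].strip()
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"URL must use HTTP(S): {url}")
        # The description is optional, so fall back to an empty string.
        description = row[1].strip() if len(row) > 1 else ""
        yield url, description


# Hypothetical file contents, for illustration only.
sample = (
    "http://example.com/patches.csv,Bugs with patches\n"
    "https://example.com/other.csv\n"
)
for url, description in parse_index(sample):
    print(url, "|", description)
```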
Each CSV file listed there in turn needs to have the following form:
<sourcepackage>,<url>,<description>
For example:
vdrift,http://launchpad.net/bugs/106854,106854
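A file in this form could be read and grouped by source package roughly like this. The function name group_by_package is hypothetical, and treating a missing description as an empty string is an assumption; the text above does not say whether the description may be omitted here.

```python
import csv
import io
from collections import defaultdict


def group_by_package(text):
    """Map each source package to its list of (url, description) entries."""
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        package, url = row[0].strip(), row[1].strip()
        # Assumption: a missing description becomes an empty string.
        description = row[2].strip() if len(row) > 2 else ""
        grouped[package].append((url, description))
    return dict(grouped)


opportunities = group_by_package("vdrift,http://launchpad.net/bugs/106854,106854\n")
print(opportunities["vdrift"])
```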
Opportunities can be anything:
Let your imagination go wild. :-)
The clues file is a CSV file with the following format:
<url>,<score>,<description>
The URL specifies the link to another CSV file that should be pulled regularly. The score is a float value that describes how good or bad it is for a package to be on the list (e.g. if a package is uninstallable, that might be worth -500; if 50% of its bugs are forwarded upstream, that might be worth +300). The scores are summed up every time the HTML pages are generated and can indicate whether the package is in good shape.
The format of the CSV file containing the clues is the same as that of the opportunities. Right now, only the source package name is used.
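The score summing described above might look like the sketch below. The function name sum_scores and the package names are made up for illustration, and counting a package only once per clue list, even if it appears there several times, is an assumption rather than documented behaviour.

```python
from collections import defaultdict


def sum_scores(clues):
    """Sum clue scores per source package.

    `clues` is an iterable of (score, packages) pairs: the score from the
    clues file and the source package names found in the linked CSV file.
    """
    totals = defaultdict(float)
    for score, packages in clues:
        # Assumption: a package counts once per clue, even if listed twice.
        for package in set(packages):
            totals[package] += float(score)
    return dict(totals)


# Hypothetical data: "foo" is uninstallable but forwards its bugs upstream.
totals = sum_scores([
    (-500.0, ["foo", "bar"]),
    (300.0, ["foo"]),
])
print(totals["foo"], totals["bar"])
```

With these made-up numbers, foo ends up at -200 and bar at -500, so both would still be flagged as being in poor shape.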
© 2008-2009 Canonical Ltd.