~oubiwann/webxcreta/trunk

Viewing changes to TODO

Committer: Duncan McGreggor
Date: 2008-12-19 17:01:58 UTC
mfrom: (38.1.8 trunk-orig)
Revision ID: duncan@canonical.com-20081219170158-84rwpofpk26ymg9g

* Updated the defs with the new location for this branch (old trunk).
* Updated defs with location for new branch (new trunk).
* Merged recent changes from old trunk to new trunk.
* Consolidated exception logging logic.
* Added skip feeds.
* More TODO updates.

files modified:
ChangeLog
TODO
admin/defs.sh
webxcreta/clients.py
webxcreta/harvesters.py
webxcreta/utils.py

 # General
+<<<<<<< TREE
 * Clean up code.
 * Reduce the number of times data is copied/looped, etc.
 * Break all the logic up into functional units, for example, the page-getting

   . visit the URL and get the RSS link
   . read post data from the most recent entry on RSS
 * Add unit testing.
+  . add tests for figuring out word weights
 * Add support for the following:
   . Individual RSS feeds
   . Web page scraping (content)
   . Getting content from emails tagged in Gmail
 * Turn off logging output during unit tests/doctests.
 * Clean up word filtering
+* Add the ability to continue getting source data when interrupted
+  . need to track feed list of dicts (url, order, rank)
+  . will require handler that is called during any fatal exception
+  . handler will need to pickle all data collected to that point
+  . a command line option -c --continue

 # Scripts
 * Unify the mbox and post-generating scripts into a single one
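
The "Turn off logging output during unit tests/doctests" item above could be handled roughly as follows. This is only a sketch, assuming the project logs through the standard `logging` module; the logger name, the `harvest` stand-in, and the `QuietTestCase` base class are hypothetical and not taken from the webxcreta code.

```python
import logging
import unittest

# Hypothetical logger name; the real project may use a different one.
log = logging.getLogger("webxcreta.example")


def harvest(url):
    """Hypothetical stand-in for code that logs while it works."""
    log.error("could not fetch %s", url)
    return None


class QuietTestCase(unittest.TestCase):
    """Base class that silences logging for the duration of each test."""

    def setUp(self):
        # Suppress every record at CRITICAL level and below.
        logging.disable(logging.CRITICAL)

    def tearDown(self):
        # Restore normal logging so code run after the test is unaffected.
        logging.disable(logging.NOTSET)


class HarvestTest(QuietTestCase):
    def test_harvest_returns_none_on_failure(self):
        self.assertIsNone(harvest("http://example.com/feed"))
```

For doctests, the same `logging.disable(logging.CRITICAL)` call could be made once before the doctest runner starts.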


 # Add more clients
 * blogsearch.google.com client
-* bloglines top 1000: http://beta.bloglines.com/topfeeds
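
The "continue getting source data when interrupted" item added to the General section lists four pieces: a feed list of dicts (url, order, rank), a handler for fatal exceptions, pickling of the data collected so far, and a `-c`/`--continue` option. One possible shape for that, written for current Python 3, is sketched below. The state file location, the function names, and the `fetch` callback are all assumptions made for illustration; nothing here is taken from the actual harvesters.

```python
import argparse
import os
import pickle
import tempfile

# Hypothetical location for the checkpoint file.
STATE_FILE = os.path.join(tempfile.gettempdir(), "webxcreta-state.pickle")


def save_state(feeds, collected, path=STATE_FILE):
    """Pickle the feed list and everything collected so far."""
    with open(path, "wb") as f:
        pickle.dump({"feeds": feeds, "collected": collected}, f)


def load_state(path=STATE_FILE):
    """Load a checkpoint written by save_state()."""
    with open(path, "rb") as f:
        return pickle.load(f)


def harvest(feeds, fetch, collected=None, path=STATE_FILE):
    """Fetch each pending feed in order; checkpoint on any fatal exception."""
    collected = {} if collected is None else collected
    try:
        for feed in sorted(feeds, key=lambda f: f["order"]):
            if feed["url"] not in collected:  # skip feeds already fetched
                collected[feed["url"]] = fetch(feed["url"])
    except BaseException:
        # The "handler": also covers KeyboardInterrupt. Save, then re-raise.
        save_state(feeds, collected, path)
        raise
    return collected


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # "continue" is a Python keyword, so the value is stored as "resume".
    parser.add_argument("-c", "--continue", dest="resume",
                        action="store_true",
                        help="resume from the last saved state")
    return parser.parse_args(argv)
```

A caller would check `parse_args().resume`, and if it is set, pass the `feeds` and `collected` values from `load_state()` back into `harvest()` so that only the remaining feeds are fetched.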
