~ubuntu-branches/debian/experimental/calibre/experimental

Viewing changes to recipes/tagesspiegel.recipe

  • Committer: Package Import Robot
  • Author(s): Martin Pitt
  • Date: 2012-02-10 07:35:00 UTC
  • mfrom: (1.3.30)
  • Revision ID: package-import@ubuntu.com-20120210073500-9hx5hpketc9hb59i
Tags: 0.8.38+dfsg-1
* New upstream release.
* debian/control: Bump Standards-Version to 3.9.2. No changes necessary.

--- recipes/tagesspiegel.recipe
+++ recipes/tagesspiegel.recipe
@@ -14,6 +14,7 @@
     language = 'de'
     oldest_article = 7
     max_articles_per_feed = 100
+    publication_type = 'newspaper'
 
     extra_css = '''
                 .hcf-overline{color:#990000; font-family:Arial,Helvetica,sans-serif;font-size:xx-small;display:block}
@@ -33,13 +34,16 @@
     no_javascript = True
     remove_empty_feeds = True
     encoding = 'utf-8'
-    remove_tags = [{'class':'hcf-header'}]
+    remove_tags = [{'class':'hcf-header'}, {'class':'hcf-atlas'}, {'class':'hcf-date hcf-separate'}]
 
     def print_version(self, url):
         url = url.split('/')
         url[-1] = 'v_print,%s?p='%url[-1]
         return '/'.join(url)
 
+    def get_masthead_url(self):
+        return 'http://www.tagesspiegel.de/images/tsp_logo/3114/6.png'
+
     def parse_index(self):
         soup = self.index_to_soup('http://www.tagesspiegel.de/zeitung/')
 
@@ -51,7 +55,7 @@
         ans = []
         maincol = soup.find('div', attrs={'class':re.compile('hcf-main-col')})
 
-        for div in maincol.findAll(True, attrs={'class':['hcf-teaser', 'hcf-header', 'story headline']}):
+        for div in maincol.findAll(True, attrs={'class':['hcf-teaser', 'hcf-header', 'story headline', 'hcf-teaser hcf-last']}):
 
              if div['class'] == 'hcf-header':
                  try:
@@ -61,7 +65,7 @@
                  except:
                      continue
 
-             elif div['class'] == 'hcf-teaser' and getattr(div.contents[0],'name','') == 'h2':
+             elif div['class'] in ['hcf-teaser', 'hcf-teaser hcf-last'] and getattr(div.contents[0],'name','') == 'h2':
                  a = div.find('a', href=True)
                  if not a:
                      continue
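
The `print_version` method that appears as unchanged context in the diff rewrites an article URL by prefixing its last path segment with `v_print,` and appending `?p=`. A minimal standalone sketch of that transformation follows. It is written as a plain function rather than a recipe method, and the article URL is a made-up example for illustration, not one taken from the site.

```python
def print_version(url):
    """Rewrite an article URL the same way the recipe's print_version does."""
    parts = url.split('/')
    # Prefix the last path segment with 'v_print,' and append '?p='.
    parts[-1] = 'v_print,%s?p=' % parts[-1]
    return '/'.join(parts)


# Hypothetical article URL, used only to illustrate the rewrite.
example = 'http://www.tagesspiegel.de/politik/example-article/1234567.html'
print(print_version(example))
```

Note that a URL ending in `/` has an empty last segment, so the result would end in `v_print,?p=`. Whether such URLs occur in the feed is not something the diff shows.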
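
The two changed lines in `parse_index` extend exact comparisons on `div['class']` so that the value `'hcf-teaser hcf-last'` is accepted alongside `'hcf-teaser'`. The sketch below illustrates why an exact comparison does not match a multi-class value on its own, and shows a token-based check as one possible alternative. The helper names `is_teaser_exact` and `is_teaser_by_token` are hypothetical and not part of the recipe. The sketch assumes the class attribute is available as a single space-separated string, as the comparisons in the diff imply.

```python
def is_teaser_exact(class_attr, accepted=('hcf-teaser', 'hcf-teaser hcf-last')):
    """Exact-match check, mirroring the updated elif condition in the diff."""
    return class_attr in accepted


def is_teaser_by_token(class_attr):
    """Hypothetical alternative: match on the individual class token."""
    return 'hcf-teaser' in class_attr.split()


# Compare both checks on a few class attribute values.
for value in ('hcf-teaser', 'hcf-teaser hcf-last', 'hcf-header'):
    print(value, is_teaser_exact(value), is_teaser_by_token(value))
```

The token-based variant would also accept any other combination that contains `hcf-teaser`, which may be broader than the recipe intends. The exact-match list in the diff is the more conservative choice.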