~ubuntu-branches/ubuntu/oneiric/calibre/oneiric


Viewing changes to resources/recipes/nytimes_sub.recipe

  • Committer: Bazaar Package Importer
  • Author(s): Martin Pitt
  • Date: 2010-06-21 10:18:08 UTC
  • mfrom: (1.3.12 upstream)
  • Revision ID: james.westby@ubuntu.com-20100621101808-aue828f532tmo4zt
Tags: 0.7.2+dfsg-1
* New major upstream version. See http://calibre-ebook.com/new-in/seven for
  details.
* Refresh patches to apply cleanly.
* debian/control: Bump python-cssutils to >= 0.9.7~ to ensure the existence
  of the CSSRuleList.rulesOfType attribute. This makes epub conversion work
  again. (Closes: #584756)
* Add debian/local/calibre-mount-helper: Simple and safe replacement for
  upstream's calibre-mount-helper, using udisks --mount and eject.
  (Closes: #584915, LP: #561958)
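
The mount-helper replacement described in the last changelog entry boils down to dispatching two actions to existing, unprivileged system tools, as the entry itself says ("using udisks --mount and eject"). The Debian package ships this as a shell script; the Python sketch below is purely illustrative, and the names build_command and mount_helper are invented here, not taken from the package:

```python
import subprocess
import sys

def build_command(action, device):
    """Return the argv list for a supported action, or None otherwise.

    Only the two actions upstream's calibre-mount-helper performs are
    allowed, each delegated to an existing tool (udisks --mount / eject)
    rather than invoking mount directly."""
    if action == 'mount':
        return ['udisks', '--mount', device]
    if action == 'eject':
        return ['eject', device]
    return None

def mount_helper(argv):
    """Entry point: calibre-mount-helper mount|eject DEVICE."""
    if len(argv) != 3:
        sys.stderr.write('Usage: calibre-mount-helper mount|eject DEVICE\n')
        return 1
    cmd = build_command(argv[1], argv[2])
    if cmd is None:
        sys.stderr.write('Unsupported action: %s\n' % argv[1])
        return 1
    return subprocess.call(cmd)
```

Restricting the helper to a fixed whitelist of argv lists is what makes it "simple and safe": arbitrary arguments can never reach a privileged mount command.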

@@ -74,6 +74,7 @@
                             'relatedSearchesModule',
                             'side_tool',
                             'singleAd',
+                            'subNavigation tabContent active',
                             'subNavigation tabContent active clearfix',
                             ]}),
                    dict(id=[
@@ -82,6 +83,7 @@
                             'articleExtras',
                             'articleInline',
                             'blog_sidebar',
+                            'businessSearchBar',
                             'cCol',
                             'entertainmentSearchBar',
                             'footer',
@@ -278,17 +280,18 @@
         return ans
 
     def preprocess_html(self, soup):
-        '''
-        refresh = soup.find('meta', {'http-equiv':'refresh'})
-        if refresh is None:
-            return soup
-        content = refresh.get('content').partition('=')[2]
-        raw = self.browser.open('http://www.nytimes.com'+content).read()
-        return BeautifulSoup(raw.decode('cp1252', 'replace'))
-        '''
+        # Skip ad pages served before actual article
+        skip_tag = soup.find(True, {'name':'skip'})
+        if skip_tag is not None:
+            self.log.error("Found forwarding link: %s" % skip_tag.parent['href'])
+            url = 'http://www.nytimes.com' + re.sub(r'\?.*', '', skip_tag.parent['href'])
+            url += '?pagewanted=all'
+            self.log.error("Skipping ad to article at '%s'" % url)
+            soup = self.index_to_soup(url)
         return self.strip_anchors(soup)
 
     def postprocess_html(self,soup, True):
+        print "\npostprocess_html()\n"
 
         if self.one_picture_per_article:
             # Remove all images after first
@@ -411,6 +414,7 @@
         return soup
 
     def postprocess_book(self, oeb, opts, log) :
+        print "\npostprocess_book()\n"
 
         def extract_byline(href) :
            # <meta name="byline" content=
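
The new ad-skip branch in preprocess_html() hinges on one URL rewrite: everything from the query string onward is stripped from the forwarding link, then ?pagewanted=all is appended so the refetch returns the full article. That rewrite can be exercised on its own with nothing but the stdlib; the helper name ad_skip_url and the sample hrefs below are stand-ins, not part of the recipe:

```python
import re

def ad_skip_url(href):
    # Mirror the two lines from the diff: drop any existing query
    # string, then request the single-page version of the article.
    url = 'http://www.nytimes.com' + re.sub(r'\?.*', '', href)
    return url + '?pagewanted=all'
```

Since re.sub(r'\?.*', '', href) matches nothing when the link has no query string, the same helper handles both clean and ad-tracked hrefs.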