~jbhannah/+junk/python-loadtest


Viewing changes to src/PageParser.py

  • Committer: Jesse B. Hannah
  • Date: 2008-12-30 07:38:05 UTC
  • Revision ID: jesse@jbhannah.net-20081230073805-hrttiw1xeyudopw0
PageParser now compares directly against base URL given to LoadTest in finding internal links
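To illustrate the change, here is a minimal sketch contrasting the old hard-coded host regex with the new comparison against the base URL's network location. It is written for Python 3 (`urllib.parse`); the project itself uses Python 2's `urlparse` module. The function names `old_is_internal` and `new_is_internal` and the example hosts are hypothetical.

```python
import re
from urllib.parse import urlparse


def old_is_internal(link):
    # Before this revision: the site's host was hard-coded as a regex
    # matched against the start of the link's netloc.
    return re.match(r"^meteorites-dev\.asu\.edu", urlparse(link).netloc) is not None


def new_is_internal(base_url, link):
    # After this revision: a link is internal when its netloc equals
    # the netloc of the base URL passed in by the caller.
    return urlparse(base_url).netloc == urlparse(link).netloc


if __name__ == "__main__":
    print(new_is_internal("http://example.com/", "http://example.com/about"))
    print(new_is_internal("http://example.com/", "http://other.org/"))
```

The netloc comparison has two side effects. A relative link such as `/about` has an empty `netloc`, so it is not counted as internal. Also, `example.com:8080` does not equal `example.com`. The old regex, anchored only at the start, also accepted hosts like `meteorites-dev.asu.edu.example.org`.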

```diff
@@ -4,12 +4,13 @@
 from urlparse import urlparse
 
 class PageParser(HTMLParser):
-    def __init__(self):
+    def __init__(self, base_url):
+        self.base_url = urlparse(base_url)
         self.links = []
     
     def handle_starttag(self, tag, attrs):
         if tag == 'a':
             url = urlparse(attrs[href])
-            if re.match("^meteorites-dev\.asu\.edu", url.netloc()) and not\
+            if base_url.netloc() == url.netloc() and not\
                     re.search("\.(pdf|jpg)", url.path()):
                 self.links.append(url)
```
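The diffed code would not run as written:

- `attrs[href]` refers to an undefined name `href`, and `attrs` is a list of `(name, value)` pairs, not a mapping.
- `netloc` and `path` are attributes of the parse result rather than methods.
- The new condition uses `base_url` without `self.`.
- `HTMLParser.__init__` is never called, so the parser's internal state is never initialised.

Below is a minimal corrected sketch that keeps the same filtering rule. It targets Python 3 (`html.parser`, `urllib.parse`) rather than the Python 2 modules the project imports, and it is not the author's code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageParser(HTMLParser):
    """Collects links on the same host as base_url (Python 3 sketch)."""

    def __init__(self, base_url):
        # HTMLParser must initialise its own state before feed() is used.
        super().__init__()
        self.base_url = urlparse(base_url)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; an attribute written
        # without a value (e.g. <a href>) arrives as None.
        href = dict(attrs).get('href')
        if not href:
            return
        url = urlparse(href)
        # netloc and path are plain string attributes of ParseResult.
        # Same rule as the diff: same host, and no ".pdf"/".jpg" in the path.
        if (self.base_url.netloc == url.netloc
                and not re.search(r"\.(pdf|jpg)", url.path)):
            self.links.append(url)


if __name__ == "__main__":
    parser = PageParser("http://example.com/")
    parser.feed('<a href="http://example.com/a">A</a>'
                '<a href="http://example.com/f.pdf">F</a>'
                '<a href="http://other.org/">O</a>')
    print([u.geturl() for u in parser.links])
```

Storing the parsed base URL once in `__init__` means each `<a>` tag costs only one `urlparse` call, on the link itself.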