Soupmatchers
============

This is a library to make writing tests for HTML content straightforward and
robust.

The naïve way of doing this would be to do things such as assert that your HTML
contains the string

  >>> html = ('<a href="https://launchpad.net/testtools" '
  ...     'class="awesome">testtools <b>rocks</b></a>')

which can easily break if you make small changes such as adding a CSS class
which is irrelevant to the test, or your templating library changes to
sort attributes in alphabetical order.
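
To make the fragility concrete, here is a minimal sketch using only the
Python 3 standard library (not BeautifulSoup or soupmatchers, and not the
Python 2 syntax used by the doctests in this README). The AnchorCollector
class and anchor_attrs helper are hypothetical names invented for this
illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def anchor_attrs(markup):
    """Return a list of attribute dicts, one per <a> tag in markup."""
    collector = AnchorCollector()
    collector.feed(markup)
    collector.close()
    return collector.anchors


original = '<a href="https://launchpad.net/testtools" class="awesome">testtools</a>'
reordered = '<a class="awesome" href="https://launchpad.net/testtools">testtools</a>'

# The raw strings differ, so a substring assertion would fail...
print(original == reordered)
# ...but the parsed attributes compare equal regardless of their order.
print(anchor_attrs(original) == anchor_attrs(reordered))
```

Comparing parsed attributes rather than raw text is the idea the rest of
this document builds on.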

Obviously working on the parse tree would be better, and BeautifulSoup is
part of the way to do that.

BeautifulSoup
-------------

  >>> import BeautifulSoup
  >>> root = BeautifulSoup.BeautifulSoup(html)

It is an HTML parsing library that includes
a way to search the document for matching tags. If you had a parsed
representation of your document you could find the above part by doing

  >>> import re
  >>> anchor_tags = root.findAll(
  ...    "a", attrs={"href": "https://launchpad.net/testtools",
  ...        "class": "awesome"})
  >>> print anchor_tags
  [<a href="https://launchpad.net/testtools" class="awesome">testtools <b>rocks</b></a>]

which would return you a list with (let's assume) a single entry, the
BeautifulSoup.Tag for the <a>. You can locate the nested tag with:

  >>> anchor_tag = anchor_tags[0]
  >>> anchor_tag.findAll("b")
  [<b>rocks</b>]

which will again return a single item list.

While this is useful for identifying parts of the document more robustly,
it doesn't exactly lend itself to testing. For that we need some methods for
checking a document against a specification.

Matchers
--------

Here's where the beauty of testtools comes in. Instead of providing yet
another TestCase subclass that you somehow have to work into your test
class hierarchy, we just define a set of testtools.Matcher classes.

If you use testtools then you can easily make use of these in your tests
with assertThat. If not then they have a simple interface that is easy to
make use of in your test classes.

Let's demonstrate.

First we'll show how to create a matcher that will check that our document
contains at least a link to the testtools Launchpad page, and that this link
has a certain CSS class. Checking the anchor text comes a little later.

  >>> import soupmatchers
  >>> print soupmatchers.Tag(
  ...     "link to testtools", "a",
  ...     attrs={"href": "https://launchpad.net/testtools",
  ...         "class": "awesome"})
  Tag("link to testtools",
  <a class='awesome' href='https://launchpad.net/testtools' ...>...</a>)

This may look rather familiar.

Note that the text representation of the soupmatchers.Tag object isn't
what will be literally matched, it is just an attempt to express the things
that will be matched.

Beyond that, soupmatchers allows you to specify text that the
tag must contain to match.

  >>> print soupmatchers.Tag(
  ...     "link to testtools", "a",
  ...     attrs={"href": "https://launchpad.net/testtools",
  ...            "class": "awesome"}, text=re.compile(r"testtools"))
  Tag("link to testtools",
  <a class='awesome' href='https://launchpad.net/testtools'
  ...>re.compile('testtools') ...</a>)

Now let's create a matcher that will match the bold tag from above.

  >>> print soupmatchers.Tag("bold rocks", "b", text="rocks")
  Tag("bold rocks", <b ...>rocks ...</b>)

Obviously this would allow the bold tag to be outside of the anchor tag, but
no fear, we can create a matcher that will check that one is inside the
other: simply use the Within matcher to combine the two.

  >>> print soupmatchers.Within(
  ...     soupmatchers.Tag(
  ...         "link to testtools", "a",
  ...         attrs={"href": "https://launchpad.net/testtools",
  ...                "class": "awesome"}, text=re.compile(r"testtools")),
  ...     soupmatchers.Tag("bold rocks", "b", text="rocks"))
  Tag("bold rocks", <b ...>rocks ...</b>) within Tag("link to testtools",
  <a class='awesome' href='https://launchpad.net/testtools'
  ...>re.compile('testtools') ...</a>)

This means that the first matcher will only match if the second matcher
matches the part of the parse tree rooted at the first match.

These matchers are working on the parsed representation, but that doesn't
mean you have to go to the trouble of parsing every time you want to use
them. To simplify that you can use

  >>> print soupmatchers.HTMLContains(soupmatchers.Tag("some image", "image"))
  HTML contains [Tag("some image", <image ...>...</image>)]

to create a matcher that will parse the string before checking the tag
against it.

Given that you will often want to check multiple things about the HTML
you can pass multiple soupmatchers.Tag objects to the constructor of
soupmatchers.HTMLContains, and the resulting matcher will only match
if all of the passed matchers match.

Using Matchers
--------------

This hasn't explained how to use the matcher objects though. For that you
need to make use of their match() method.

  >>> import testtools
  >>> matcher = testtools.matchers.Equals(1)
  >>> match = matcher.match(1)
  >>> print match
  None

The returned match will be None if the matcher matches the content that
you passed, otherwise it will be a testtools.Mismatch object. To put
this in unittest language

  match = matcher.match(content)
  self.assertEquals(None, match)

or, if you subclass testtools.TestCase,

  self.assertThat(content, matcher)
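
The protocol described above is small enough to sketch in plain Python 3.
The classes below are hypothetical stand-ins written for this illustration,
not the testtools or soupmatchers implementations; they only assume what
the text states, namely that match() returns None on success and otherwise
an object with a describe() method.

```python
class SimpleMismatch:
    """Hypothetical stand-in for a mismatch object."""

    def __init__(self, description):
        self._description = description

    def describe(self):
        return self._description


class ContainsText:
    """Hypothetical matcher: succeeds when needle occurs in the content."""

    def __init__(self, needle):
        self.needle = needle

    def match(self, content):
        if self.needle in content:
            return None  # None signals a successful match
        return SimpleMismatch("%r not found in %r" % (self.needle, content))


matcher = ContainsText("rocks")
print(matcher.match("testtools rocks"))

mismatch = matcher.match("testtools")
print(mismatch.describe())
```

Because a successful match is None, a plain unittest assertion on the
return value is enough when assertThat is not available.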

Testing Responses
-----------------

For those that use a framework that has test response objects, you can even
go a step further and check the whole response in one go.

The soupmatchers.ResponseHas matcher class will check the status_code
attribute of the passed object against an expected value, and also check
the content attribute against any matcher you wish to specify.

  >>> print soupmatchers.ResponseHas(
  ...     status_code=404,
  ...     content_matches=soupmatchers.HTMLContains(soupmatchers.Tag(
  ...         "an anchor", "a")))
  ResponseHas(status_code=404, content_matches=HTML contains
  [Tag("an anchor", <a ...>...</a>)])

where the status_code parameter defaults to 200.

As working with HTML is very common, there's an easier way to write the
above.

  >>> print soupmatchers.HTMLResponseHas(
  ...     status_code=404, html_matches=soupmatchers.Tag("an anchor", "a"))
  HTMLResponseHas(status_code=404, content_matches=HTML contains
  [Tag("an anchor", <a ...>...</a>)])

Later similar objects will be added for dealing with XML and JSON.

This matcher is designed to work with Django, but will work with any object
that has those two attributes.
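
The "any object that has those two attributes" point is ordinary duck
typing. As a rough Python 3 sketch, assuming nothing beyond a status_code
and a content attribute, the hypothetical check_response function below
(not part of soupmatchers) performs the same two checks on simple stand-in
objects.

```python
from types import SimpleNamespace


def check_response(response, expected_status=200, content_check=None):
    """Return None on success, otherwise a string describing the problem."""
    if response.status_code != expected_status:
        return "expected status %d, got %d" % (
            expected_status, response.status_code)
    if content_check is not None and not content_check(response.content):
        return "content check failed"
    return None


# Any object with status_code and content attributes will do.
ok = SimpleNamespace(status_code=200, content="<a href='/'>home</a>")
missing = SimpleNamespace(status_code=404, content="not found")

print(check_response(ok, content_check=lambda body: "<a" in body))
print(check_response(missing))
```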

Putting it all together we could do the original check using

  >>> class ExpectedResponse(object):
  ...     status_code = 200
  ...     content = html
  >>> class UnexpectedResponse(object):
  ...     status_code = 200
  ...     content = "<h1>This is some other response<h1>"

  >>> child_matcher = soupmatchers.Tag("bold rocks", "b", text="rocks")
  >>> anchor_matcher = soupmatchers.Tag(
  ...     "testtools link", "a",
  ...     attrs={"href": "https://launchpad.net/testtools",
  ...            "class": "awesome"},
  ...     text=re.compile(r"testtools"))
  >>> combined_matcher = soupmatchers.Within(anchor_matcher, child_matcher)
  >>> response_matcher = soupmatchers.HTMLResponseHas(
  ...     html_matches=combined_matcher)
  >>> #self.assertThat(response, response_matcher)
  >>> match = response_matcher.match(ExpectedResponse())
  >>> print match
  None
  >>> match = response_matcher.match(UnexpectedResponse())
  >>> print repr(match) #doctest: +ELLIPSIS
  <soupmatchers.TagMismatch object at ...>
  >>> print match.describe()
  Matched 0 times
  Here is some information that may be useful:
    0 matches for "bold rocks" in the document.
    0 matches for "testtools link" in the document.

which, while verbose, checks many things and stays maintainable because it
is not overly tied to particular textual output.

Checking the number of times a pattern is matched
-------------------------------------------------

Remember how findAll returned a list, and we just assumed that it only found
one tag in the example? Well, the matchers allow you to not just assume that,
they allow you to assert that. That means that you can assert that
a particular tag only occurs once by passing

  count=1

in the constructor.

  >>> tag_matcher = soupmatchers.Tag("testtools link", "a",
  ...    attrs={"href": "https://launchpad.net/testtools"}, count=1)
  >>> html_matcher = soupmatchers.HTMLContains(tag_matcher)
  >>> content = '<a href="https://launchpad.net/testtools"></a>'
  >>> match = html_matcher.match(content)
  >>> print match
  None
  >>> match = html_matcher.match(content * 2)
  >>> print match.describe()
  Matched 2 times
  The matches were:
    <a href="https://launchpad.net/testtools"></a>
    <a href="https://launchpad.net/testtools"></a>

Similarly you can assert that a particular tag isn't present by
creating a soupmatchers.Tag with

  count=0

  >>> tag_matcher = soupmatchers.Tag("testtools link", "a",
  ...    attrs={"href": "https://launchpad.net/testtools"}, count=0)
  >>> html_matcher = soupmatchers.HTMLContains(tag_matcher)
  >>> content = '<a href="https://launchpad.net/testtools"></a>'
  >>> match = html_matcher.match(content)
  >>> print match.describe()
  Matched 1 time
  The match was:
    <a href="https://launchpad.net/testtools"></a>

If you wish to assert only that a tag matches at least a given number of
times, or at most a given number of times, then you will have to propose
a change to the code to allow that.
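
For readers who want to see the exact-count idea in isolation, here is a
small Python 3 sketch built on the standard library's html.parser rather
than on soupmatchers. TagCounter and count_tags are hypothetical names for
this illustration only; the sketch counts start tags with the required
attributes and leaves the comparison to the caller, which is also one way
to express an "at least" check outside the library.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name and required attributes."""

    def __init__(self, wanted_tag, required_attrs):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.required_attrs = required_attrs
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        if tag == self.wanted_tag and all(
                found.get(key) == value
                for key, value in self.required_attrs.items()):
            self.matches += 1


def count_tags(markup, wanted_tag, required_attrs):
    counter = TagCounter(wanted_tag, required_attrs)
    counter.feed(markup)
    counter.close()
    return counter.matches


content = '<a href="https://launchpad.net/testtools"></a>'
wanted = {"href": "https://launchpad.net/testtools"}

# Exact count, in the spirit of count=1.
print(count_tags(content, "a", wanted) == 1)
print(count_tags(content * 2, "a", wanted) == 1)
# An "at least one" check done by hand.
print(count_tags(content * 2, "a", wanted) >= 1)
```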

Failure Messages
----------------

As Tag only specifies a pattern to match, when something goes wrong it is
hard to know what information will be useful to someone reading the output.

A bad thing to do is to print the entire HTML document, as it can often be
large and so obscure the failure message. Sometimes, though, looking at
the HTML is the best way to find the problem. For that reason the Mismatch
can provide the entire document to you. If you call get_details() on the
Mismatch you will get a dict that contains the HTML as the value for
the "html" key.

  >>> matcher = soupmatchers.HTMLContains(soupmatchers.Tag("bold", "b"))
  >>> mismatch = matcher.match("<image></image>")
  >>> print mismatch.get_details().keys()
  ['html']
  >>> print ''.join(list(mismatch.get_details()["html"].iter_bytes()))
  <image></image>

If you use assertThat then it will automatically call addDetails with this
information, so it is available to the TestResult. Your test runner can
then do something useful with this if it likes.

That leaves the question of what to print in the failure message though.

If there are any matches at all then you want to see the string that matched.
This is particularly useful when there are too many matches, but also when
you expect multiple matches and fewer are found: knowing which ones matched
can narrow the search.

  >>> matcher = soupmatchers.HTMLContains(soupmatchers.Tag(
  ...        "no bold", "b", count=0))
  >>> mismatch = matcher.match("<b>rocks</b>")
  >>> print mismatch.describe()
  Matched 1 time
  The match was:
      <b>rocks</b>

If there aren't enough matches then the failure message will attempt to
tell you about the closest matches, in the hope that one of them gives a
clue as to the problem.

  >>> matcher = soupmatchers.HTMLContains(
  ...    soupmatchers.Tag("testtools link", "a",
  ...        attrs={"href": "https://launchpad.net/testtools",
  ...               "class": "awesome"}))
  >>> mismatch = matcher.match(
  ...    "<a href='https://launchpad.net/testtools'></a>")
  >>> print mismatch.describe()
  Matched 0 times
  Here is some information that may be useful:
     1 matches for "testtools link" when attribute class="awesome" is not a
     requirement.

  >>> matcher = soupmatchers.HTMLContains(
  ...    soupmatchers.Tag("bold rocks", "b", text="rocks"))
  >>> mismatch = matcher.match(
  ...    "<b>is awesome</b>")
  >>> print mismatch.describe()
  Matched 0 times
  Here is some information that may be useful:
    1 matches for "bold rocks" when text="rocks" is not a requirement.

While this will often fail to tell you much that will help you diagnose the
problem, it should be possible to write your matchers in such a way that the
output is generally useful.
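
One plausible way to produce "close match" hints like those shown above is
to drop one requirement at a time and count what would then match. The
Python 3 sketch below illustrates that idea on plain attribute dicts; it is
an assumption about the general technique, not the soupmatchers
implementation, and attrs_match and close_match_report are hypothetical
names.

```python
def attrs_match(tag_attrs, required):
    """True if every required attribute is present with the given value."""
    return all(tag_attrs.get(key) == value for key, value in required.items())


def close_match_report(tags, required):
    """Relax each required attribute in turn and report any new matches.

    tags is a list of attribute dicts, one per candidate tag.
    """
    report = []
    for dropped in sorted(required):
        relaxed = {k: v for k, v in required.items() if k != dropped}
        count = sum(1 for attrs in tags if attrs_match(attrs, relaxed))
        if count:
            report.append(
                '%d matches when attribute %s="%s" is not a requirement.'
                % (count, dropped, required[dropped]))
    return report


tags = [{"href": "https://launchpad.net/testtools"}]
required = {"href": "https://launchpad.net/testtools", "class": "awesome"}

for line in close_match_report(tags, required):
    print(line)
```

Relaxing only one requirement at a time keeps the hints short; relaxing
combinations would find more near misses at the cost of noisier output.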

Restricting matches to particular areas of the document
-------------------------------------------------------

Often you want to assert that some HTML is contained within a particular
part of the document. At the simplest level you may want to check that
the HTML is within the <body> tag.

It is possible to specify that some Tag is within another by combining
them in the Within matcher.

  >>> child_matcher = soupmatchers.Tag("bold rocks", "b", text="rocks")
  >>> body_matcher = soupmatchers.Tag("the body", "body")
  >>> matcher = soupmatchers.HTMLContains(
  ...     soupmatchers.Within(body_matcher, child_matcher))
  >>> print matcher
  HTML contains [Tag("bold rocks", <b ...>rocks ...</b>)
  within Tag("the body", <body ...>...</body>)]
  >>> mismatch = matcher.match("<b>rocks</b><body></body>")
  >>> print mismatch.describe()
  Matched 0 times
  Here is some information that may be useful:
    1 matches for "bold rocks" in the document.
    1 matches for "the body" in the document.
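
The nesting requirement can be illustrated without soupmatchers. The
Python 3 sketch below uses the standard library's html.parser, which only
tokenizes and does not repair the tree the way BeautifulSoup does, so treat
it as an approximation of the idea. NestingChecker and is_within are
hypothetical names for this illustration.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record whether an inner start tag appears while an outer tag is open."""

    def __init__(self, outer, inner):
        super().__init__()
        self.outer = outer
        self.inner = inner
        self.open_outer = 0  # how many outer tags are currently open
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == self.inner and self.open_outer > 0:
            self.found = True
        if tag == self.outer:
            self.open_outer += 1

    def handle_endtag(self, tag):
        if tag == self.outer and self.open_outer > 0:
            self.open_outer -= 1


def is_within(markup, outer, inner):
    checker = NestingChecker(outer, inner)
    checker.feed(markup)
    checker.close()
    return checker.found


# The bold tag is inside the body here...
print(is_within("<body><b>rocks</b></body>", "body", "b"))
# ...but not here, mirroring the mismatch in the doctest above.
print(is_within("<b>rocks</b><body></body>", "body", "b"))
```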