Soupmatchers
============

This is a library to make writing tests for HTML content straightforward and
robust.

The naïve way of doing this would be to assert that your HTML contains a
string such as

    >>> html = ('<a href="https://launchpad.net/testtools" '
    ...     'class="awesome">testtools <b>rocks</b></a>')

which can easily break if you make small changes such as adding a CSS class
which is irrelevant to the test, or your templating library changes to
sort attributes in alphabetical order.
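
To see the problem concretely, here is a small sketch that uses only the
standard library's HTMLParser rather than BeautifulSoup or soupmatchers.
StartTagCollector and start_tags are names made up for this illustration.
Two renderings of the same link differ as strings but parse to the same tags.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class StartTagCollector(HTMLParser):
    """Record every start tag as a (name, attribute dict) pair."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; a dict drops the order.
        self.tags.append((tag, dict(attrs)))


def start_tags(markup):
    """Parse markup and return its start tags in document order."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


original = ('<a href="https://launchpad.net/testtools" '
            'class="awesome">testtools <b>rocks</b></a>')
reordered = ('<a class="awesome" href="https://launchpad.net/testtools">'
             'testtools <b>rocks</b></a>')

# The raw strings differ, so an assertion written against one ordering
# fails on the other.
strings_equal = original == reordered

# The parsed start tags compare equal, because attribute order no longer matters.
parsed_equal = start_tags(original) == start_tags(reordered)
```

Here strings_equal is False while parsed_equal is True, which is the
robustness that working on the parse tree buys you.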

Obviously working on the parse tree would be better, and BeautifulSoup is
part of the way to do that.

BeautifulSoup
-------------

    >>> import BeautifulSoup
    >>> root = BeautifulSoup.BeautifulSoup(html)

BeautifulSoup is an HTML parsing library that includes a way to search the
document for matching tags. If you had a parsed representation of your
document you could find the above part by doing

    >>> import re
    >>> anchor_tags = root.findAll(
    ...     "a", attrs={"href": "https://launchpad.net/testtools",
    ...         "class": "awesome"})
    >>> print anchor_tags
    [<a href="https://launchpad.net/testtools" class="awesome">testtools <b>rocks</b></a>]

which would return a list with (let's assume) a single entry, the
BeautifulSoup.Tag for the <a>. You can locate the nested tag with:

    >>> anchor_tag = anchor_tags[0]
    >>> anchor_tag.findAll("b")
    [<b>rocks</b>]

which will again return a single-item list.

While this lets you identify parts of the document more robustly, it doesn't
exactly lend itself to testing. For that we need some methods for checking a
document against a specification.

Matchers
--------

Here's where the beauty of testtools comes in. Instead of providing yet
another TestCase subclass that you somehow have to work into your test
class hierarchy, we just define a set of testtools.Matcher classes.

If you use testtools then you can easily make use of these in your tests
with assertThat. If not, they have a simple interface that is easy to
make use of in your test classes.

Let's demonstrate.

First we'll show how to create a matcher that will check that our document
contains at least one link to the testtools Launchpad page, and that this
link has a certain CSS class. (Checking that the anchor text mentions
testtools is added a little further on.)

    >>> import soupmatchers
    >>> print soupmatchers.Tag(
    ...     "link to testtools", "a",
    ...     attrs={"href": "https://launchpad.net/testtools",
    ...         "class": "awesome"})
    Tag("link to testtools",
    <a class='awesome' href='https://launchpad.net/testtools' ...>...</a>)

This may look rather familiar.

Note that the text representation of the soupmatchers.Tag object isn't
what will be literally matched; it is just an attempt to express the things
that will be matched.

Beyond that, soupmatchers allows you to specify text that the
tag must contain to match.

    >>> print soupmatchers.Tag(
    ...     "link to testtools", "a",
    ...     attrs={"href": "https://launchpad.net/testtools",
    ...         "class": "awesome"}, text=re.compile(r"testtools"))
    Tag("link to testtools",
    <a class='awesome' href='https://launchpad.net/testtools'
    ...>re.compile('testtools') ...</a>)

Now let's create a matcher that will match the bold tag from above.

    >>> print soupmatchers.Tag("bold rocks", "b", text="rocks")
    Tag("bold rocks", <b ...>rocks ...</b>)

Obviously this would allow the bold tag to be outside of the anchor tag, but
no fear: we can create a matcher that will check that one is inside the
other. Simply use the Within matcher to combine the two.

    >>> print soupmatchers.Within(
    ...     soupmatchers.Tag(
    ...         "link to testtools", "a",
    ...         attrs={"href": "https://launchpad.net/testtools",
    ...             "class": "awesome"}, text=re.compile(r"testtools")),
    ...     soupmatchers.Tag("bold rocks", "b", text="rocks"))
    Tag("bold rocks", <b ...>rocks ...</b>) within Tag("link to testtools",
    <a class='awesome' href='https://launchpad.net/testtools'
    ...>re.compile('testtools') ...</a>)

This means that the combined matcher will only match if the second matcher
matches within the part of the parse tree rooted at a match for the first.

These matchers work on the parsed representation, but that doesn't
mean you have to go to the trouble of parsing every time you want to use
them. To simplify that you can use

    >>> print soupmatchers.HTMLContains(soupmatchers.Tag("some image", "image"))
    HTML contains [Tag("some image", <image ...>...</image>)]

to create a matcher that will parse the string before checking the tag
against it.

Given that you will often want to check multiple things about the HTML,
you can pass multiple soupmatchers.Tag objects to the constructor of
soupmatchers.HTMLContains, and the resulting matcher will only match
if all of the passed matchers match.

Using Matchers
--------------

This hasn't explained how to use the matcher objects, though. For that you
need to make use of their match() method.

    >>> import testtools
    >>> matcher = testtools.matchers.Equals(1)
    >>> match = matcher.match(1)
    >>> print match
    None

The returned match will be None if the matcher matches the content that
you passed; otherwise it will be a testtools.Mismatch object. To put
this in unittest language

    match = matcher.match(content)
    self.assertEqual(None, match)

or, if you subclass testtools.TestCase,

    self.assertThat(content, matcher)
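
If your test classes don't subclass testtools.TestCase, the match()
convention is small enough to wrap in a helper of your own. What follows is
a minimal sketch that assumes only what is described above: match() returns
None on success, or an object with a describe() method otherwise.
ContainsText, Mismatch and assert_matches are hypothetical names for this
illustration and are not part of soupmatchers or testtools.

```python
class Mismatch(object):
    """Hypothetical stand-in for a mismatch: it only describes itself."""

    def __init__(self, description):
        self.description = description

    def describe(self):
        return self.description


class ContainsText(object):
    """Hypothetical matcher that follows the match() convention."""

    def __init__(self, needle):
        self.needle = needle

    def match(self, content):
        if self.needle in content:
            return None  # None signals a successful match
        return Mismatch("%r not found in %r" % (self.needle, content))


def assert_matches(content, matcher):
    """Raise AssertionError carrying the mismatch description, if any."""
    mismatch = matcher.match(content)
    if mismatch is not None:
        raise AssertionError(mismatch.describe())


# A matching document passes silently.
assert_matches("<b>rocks</b>", ContainsText("rocks"))

# A non-matching document fails with the description as the message.
try:
    assert_matches("<b>is awesome</b>", ContainsText("rocks"))
except AssertionError as error:
    failure_message = str(error)
```

The same helper should work with any object that follows this convention,
so a plain unittest.TestCase can call it from a test method.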

Testing Responses
-----------------

For those that use a framework that has test response objects, you can even
go a step further and check the whole response in one go.

The soupmatchers.ResponseHas matcher class will check the status_code
attribute of the passed object against an expected value, and also check
the content attribute against any matcher you wish to specify.

    >>> print soupmatchers.ResponseHas(
    ...     status_code=404,
    ...     content_matches=soupmatchers.HTMLContains(soupmatchers.Tag(
    ...         "an anchor", "a")))
    ResponseHas(status_code=404, content_matches=HTML contains
    [Tag("an anchor", <a ...>...</a>)])

The status_code parameter defaults to 200.

As working with HTML is very common, there's an easier way to write the
above.

    >>> print soupmatchers.HTMLResponseHas(
    ...     status_code=404, html_matches=soupmatchers.Tag("an anchor", "a"))
    HTMLResponseHas(status_code=404, content_matches=HTML contains
    [Tag("an anchor", <a ...>...</a>)])

Later similar objects will be added for dealing with XML and JSON.

This matcher is designed to work with Django, but will work with any object
that has those two attributes.

Putting it all together we could do the original check using

    >>> class ExpectedResponse(object):
    ...     status_code = 200
    ...     content = html
    >>> class UnexpectedResponse(object):
    ...     status_code = 200
    ...     content = "<h1>This is some other response</h1>"

    >>> child_matcher = soupmatchers.Tag("bold rocks", "b", text="rocks")
    >>> anchor_matcher = soupmatchers.Tag(
    ...     "testtools link", "a",
    ...     attrs={"href": "https://launchpad.net/testtools",
    ...         "class": "awesome"},
    ...     text=re.compile(r"testtools"))
    >>> combined_matcher = soupmatchers.Within(anchor_matcher, child_matcher)
    >>> response_matcher = soupmatchers.HTMLResponseHas(
    ...     html_matches=combined_matcher)
    >>> #self.assertThat(response, response_matcher)
    >>> match = response_matcher.match(ExpectedResponse())
    >>> print match
    None
    >>> match = response_matcher.match(UnexpectedResponse())
    >>> print repr(match) #doctest: +ELLIPSIS
    <soupmatchers.TagMismatch object at ...>
    >>> print match.describe()
    Matched 0 times
    Here is some information that may be useful:
    0 matches for "bold rocks" in the document.
    0 matches for "testtools link" in the document.

which, while verbose, checks lots of things and stays maintainable because
it is not overly tied to particular textual output.

Checking the number of times a pattern is matched
-------------------------------------------------

Remember how findAll returned a list, and we just assumed that it only found
one tag in the example? Well, the matchers allow you to not just assume that,
but to assert it. That means that you can assert that a particular tag occurs
exactly once by passing

    count=1

in the constructor.

    >>> tag_matcher = soupmatchers.Tag("testtools link", "a",
    ...     attrs={"href": "https://launchpad.net/testtools"}, count=1)
    >>> html_matcher = soupmatchers.HTMLContains(tag_matcher)
    >>> content = '<a href="https://launchpad.net/testtools"></a>'
    >>> match = html_matcher.match(content)
    >>> print match
    None
    >>> match = html_matcher.match(content * 2)
    >>> print match.describe()
    Matched 2 times
    The matches were:
    <a href="https://launchpad.net/testtools"></a>
    <a href="https://launchpad.net/testtools"></a>

Similarly you can assert that a particular tag isn't present by
creating a soupmatchers.Tag with

    count=0

    >>> tag_matcher = soupmatchers.Tag("testtools link", "a",
    ...     attrs={"href": "https://launchpad.net/testtools"}, count=0)
    >>> html_matcher = soupmatchers.HTMLContains(tag_matcher)
    >>> content = '<a href="https://launchpad.net/testtools"></a>'
    >>> match = html_matcher.match(content)
    >>> print match.describe()
    Matched 1 time
    The match was:
    <a href="https://launchpad.net/testtools"></a>

If you wish to assert only that a tag matches at least a given number of
times, or at most a given number of times, then you will have to propose
a change to the code to allow that.
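
In the meantime, one possible workaround is to count the tags yourself and
compare the count against bounds. The sketch below is not part of
soupmatchers: it uses the standard library's HTMLParser, matches on tag name
only (no attributes or text), and count_tags and tag_count_in_range are
hypothetical helper names.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TagCounter(HTMLParser):
    """Count the start tags that have a given name."""

    def __init__(self, name):
        HTMLParser.__init__(self)
        self.name = name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.name:
            self.count += 1


def count_tags(markup, name):
    """Return how many start tags called `name` appear in `markup`."""
    counter = TagCounter(name)
    counter.feed(markup)
    counter.close()
    return counter.count


def tag_count_in_range(markup, name, minimum=0, maximum=None):
    """True if the tag count lies between minimum and maximum, inclusive.

    A maximum of None means there is no upper bound.
    """
    count = count_tags(markup, name)
    if count < minimum:
        return False
    return maximum is None or count <= maximum


content = '<a href="https://launchpad.net/testtools"></a>'
at_least_one = tag_count_in_range(content * 2, "a", minimum=1)
at_most_one = tag_count_in_range(content * 2, "a", maximum=1)
```

With two anchors in the document, at_least_one is True and at_most_one is
False. You lose the failure messages that the matchers give you, which is a
good reason to propose the change properly.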

Failure Messages
----------------

As Tag only specifies a pattern to match, when something goes wrong it is
hard to know what information will be useful to someone reading the output.

A bad thing to do is to print the entire HTML document, as it can often be
large and so obscure the failure message. Sometimes, though, looking at
the HTML is the best way to find the problem. For that reason the Mismatch
can provide the entire document to you. If you call get_details() on the
Mismatch you will get a dict that contains the HTML as the value for
the "html" key.

    >>> matcher = soupmatchers.HTMLContains(soupmatchers.Tag("bold", "b"))
    >>> mismatch = matcher.match("<image></image>")
    >>> print mismatch.get_details().keys()
    ['html']
    >>> print ''.join(list(mismatch.get_details()["html"].iter_bytes()))
    <image></image>

If you use assertThat then it will automatically add this information to
the test with addDetail, so it is available to the TestResult. Your test
runner can then do something useful with this if it likes.

That leaves the question of what to print in the failure message, though.

If there are any matches at all then you want to see the strings that matched.
This is particularly useful when there are too many matches, but also when
you expect multiple matches and fewer are found, since knowing which ones
matched can narrow the search.

    >>> matcher = soupmatchers.HTMLContains(soupmatchers.Tag(
    ...     "no bold", "b", count=0))
    >>> mismatch = matcher.match("<b>rocks</b>")
    >>> print mismatch.describe()
    Matched 1 time
    The match was:
    <b>rocks</b>

If there aren't enough matches then the failure message will attempt to
tell you about the closest matches, in the hope that one of them gives a
clue as to the problem.

    >>> matcher = soupmatchers.HTMLContains(
    ...     soupmatchers.Tag("testtools link", "a",
    ...         attrs={"href": "https://launchpad.net/testtools",
    ...             "class": "awesome"}))
    >>> mismatch = matcher.match(
    ...     "<a href='https://launchpad.net/testtools'></a>")
    >>> print mismatch.describe()
    Matched 0 times
    Here is some information that may be useful:
    1 matches for "testtools link" when attribute class="awesome" is not a
    requirement.

    >>> matcher = soupmatchers.HTMLContains(
    ...     soupmatchers.Tag("bold rocks", "b", text="rocks"))
    >>> mismatch = matcher.match(
    ...     "<b>is awesome</b>")
    >>> print mismatch.describe()
    Matched 0 times
    Here is some information that may be useful:
    1 matches for "bold rocks" when text="rocks" is not a requirement.

While this will often fail to tell you much that will help you diagnose the
problem, it should be possible to write your matchers in such a way that the
output is generally useful.
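
The idea behind the close matches above can be sketched as "drop one
requirement at a time and see what matches then". The code below only
illustrates that idea on plain attribute dicts; it is not how soupmatchers
implements it, and matches and close_match_report are hypothetical names.

```python
def matches(tag_attrs, required):
    """True if every required attribute is present with the required value."""
    return all(tag_attrs.get(key) == value for key, value in required.items())


def close_match_report(tags, required):
    """For each requirement, count the tags that match once it is dropped.

    Returns (dropped attribute, count) pairs, keeping only non-zero counts.
    """
    report = []
    for dropped in sorted(required):
        relaxed = dict(
            (key, value) for key, value in required.items() if key != dropped)
        count = sum(1 for attrs in tags if matches(attrs, relaxed))
        if count:
            report.append((dropped, count))
    return report


# Attribute dicts for the <a> tags of a hypothetical parsed document.
anchors = [{"href": "https://launchpad.net/testtools"}]
required = {"href": "https://launchpad.net/testtools", "class": "awesome"}

exact = [attrs for attrs in anchors if matches(attrs, required)]
report = close_match_report(anchors, required)
```

Nothing matches exactly, so exact is empty, but report is [("class", 1)]:
one anchor would match if the class requirement were dropped. That is the
kind of clue the failure message is trying to give you.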

Restricting matches to particular areas of the document
-------------------------------------------------------

Often you want to assert that some HTML is contained within a particular
part of the document. At the simplest level you may want to check that
the HTML is within the <body> tag.

It is possible to specify that some Tag is within another by combining
them in the Within matcher.

    >>> child_matcher = soupmatchers.Tag("bold rocks", "b", text="rocks")
    >>> body_matcher = soupmatchers.Tag("the body", "body")
    >>> matcher = soupmatchers.HTMLContains(
    ...     soupmatchers.Within(body_matcher, child_matcher))
    >>> print matcher
    HTML contains [Tag("bold rocks", <b ...>rocks ...</b>)
    within Tag("the body", <body ...>...</body>)]
    >>> mismatch = matcher.match("<b>rocks</b><body></body>")
    >>> print mismatch.describe()
    Matched 0 times
    Here is some information that may be useful:
    1 matches for "bold rocks" in the document.
    1 matches for "the body" in the document.