~leonardr/beautifulsoup/bs4

Additions
---------

More of the jQuery API: nextUntil?
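
One possible shape for nextUntil, as a minimal sketch. The helper name
next_until and its stop predicate are hypothetical and not part of
Beautiful Soup. It accepts any iterable of siblings, so plain strings
stand in for nodes here.

```python
from itertools import takewhile


def next_until(siblings, stop):
    """Yield siblings in order, ending just before the first one that stop() accepts.

    Hypothetical helper, loosely modeled on jQuery's nextUntil.
    """
    return takewhile(lambda node: not stop(node), siblings)


# Plain strings stand in for sibling nodes in this illustration.
following = ["p1", "p2", "h2", "p3"]
print(list(next_until(following, lambda node: node == "h2")))
```

With a real tree this could presumably be fed tag.next_siblings, with a
predicate such as lambda n: getattr(n, "name", None) == "h2".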

Optimizations
-------------

The html5lib tree builder doesn't use the standard tree-building API,
which worries me and has resulted in a number of bugs.

markup_attr_map can be optimized since it's always a map now.

Upon encountering UTF-16LE data or some other uncommon serialization
of Unicode, UnicodeDammit will convert the data to Unicode, then
encode it as UTF-8. This is wasteful because it will just get decoded
back to Unicode.
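
The redundant round trip can be illustrated with the standard codecs
alone. This is a sketch of the pattern described above, not
UnicodeDammit's actual code. The function name round_trip is made up
for illustration.

```python
def round_trip(data, encoding):
    """Mimic the wasteful path: decode, re-encode as UTF-8, decode again."""
    text = data.decode(encoding)   # first decode: bytes -> Unicode
    utf8 = text.encode("utf-8")    # re-encode: Unicode -> UTF-8 bytes
    return utf8.decode("utf-8")    # second decode: back to Unicode


original = "caf\u00e9 \u2603"
data = original.encode("utf-16-le")

# The final result equals what the first decode already produced,
# so the encode/decode pair in the middle is pure overhead.
assert round_trip(data, "utf-16-le") == data.decode("utf-16-le") == original
```

Keeping the Unicode string from the first decode would skip the extra
encode and decode steps.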

CDATA
-----

The elementtree XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)

Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.