~leonardr/beautifulsoup/bs4 : contents of CHANGELOG at revision 644

Beautiful Soup's official support for Python 2 ended on January 1st,
2021. The final release to support Python 2 was Beautiful Soup
4.9.3. In the Launchpad Bazaar repository, the final revision to support
Python 2 was revision 605.

= Unreleased

* Fixed a test failure when cchardet is not installed but
charset_normalizer is. [bug=1973072]

* Fixed another crash when overriding multi_valued_attributes and using the
html5lib parser. [bug=1948488]

= 4.11.1 (20220408)

This release was done to ensure that the unit tests are packaged along
with the released source. There are no functionality changes in this
release, but there are a few other packaging changes:

* The Japanese and Korean translations of the documentation are included.
* The changelog is now packaged as CHANGELOG, and the license file is
packaged as LICENSE. NEWS.txt and COPYING.txt are still present,
but may be removed in the future.
* TODO.txt is no longer packaged, since a TODO is not relevant for released
code.

= 4.11.0 (20220407)

* Ported unit tests to use pytest.

* Added special string classes, RubyParenthesisString and RubyTextString,
to make it possible to treat ruby text specially in get_text() calls.
[bug=1941980]

* It's now possible to customize the way output is indented by
providing a value for the 'indent' argument to the Formatter
constructor. The 'indent' argument works very similarly to the
argument of the same name in the Python standard library's
json.dump() function. [bug=1955497]

* If the charset-normalizer Python module
(https://pypi.org/project/charset-normalizer/) is installed, Beautiful
Soup will use it to detect the character sets of incoming documents.
This is also the module used by newer versions of the Requests library.
For the sake of backwards compatibility, chardet and cchardet both take
precedence if installed. [bug=1955346]

* Added a workaround for an lxml bug
(https://bugs.launchpad.net/lxml/+bug/1948551) that causes
problems when parsing a Unicode string beginning with BYTE ORDER MARK.
[bug=1947768]

* Issue a warning when an HTML parser is used to parse a document that
looks like XML but not XHTML. [bug=1939121]

* Do a better job of keeping track of namespaces as an XML document is
parsed, so that CSS selectors that use namespaces will do the right
thing more often. [bug=1946243]

* Some time ago, the misleadingly named "text" argument to find-type
methods was renamed to the more accurate "string." But this supposed
"renaming" didn't make it into important places like the method
signatures or the docstrings. That's corrected in this
version. "text" still works, but will give a DeprecationWarning.
[bug=1947038]

* Fixed a crash when pickling a BeautifulSoup object that has no
tree builder. [bug=1934003]

* Fixed a crash when overriding multi_valued_attributes and using the
html5lib parser. [bug=1948488]

* Standardized the wording of the MarkupResemblesLocatorWarning
warnings to omit untrusted input and make the warnings less
judgmental about what you ought to be doing. [bug=1955450]

* Removed support for the iconv_codec library, which doesn't seem
to exist anymore and was never put up on PyPI. (The closest
replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
it--it's also quite old.)

= 4.10.0 (20210907)

* This is the first release of Beautiful Soup to only support Python
3. I dropped Python 2 support to maintain support for newer versions
(58 and up) of setuptools. See:
https://github.com/pypa/setuptools/issues/2769 [bug=1942919]

* The behavior of methods like .get_text() and .strings now differs
depending on the type of tag. The change is visible with HTML tags
like <script>, <style>, and <template>. Starting in 4.9.0, methods
like get_text() returned no results on such tags, because the
contents of those tags are not considered 'text' within the document
as a whole.

But a user who calls script.get_text() is working from a different
definition of 'text' than a user who calls div.get_text()--otherwise
there would be no need to call script.get_text() at all. In 4.10.0,
the contents of (e.g.) a <script> tag are considered 'text' during a
get_text() call on the tag itself, but not considered 'text' during
a get_text() call on the tag's parent.

Because of this change, calling get_text() on each child of a tag
may now return a different result than calling get_text() on the tag
itself. That's because different tags now have different
understandings of what counts as 'text'. [bug=1906226] [bug=1868861]

* NavigableString and its subclasses now implement the get_text()
method, as well as the properties .strings and
.stripped_strings. These methods will either return the string
itself, or nothing, so the only reason to use this is when iterating
over a list of mixed Tag and NavigableString objects. [bug=1904309]

* The 'html5' formatter now treats attributes whose values are the
empty string as HTML boolean attributes. Previously (and in other
formatters), an attribute value must be set as None to be treated as
a boolean attribute. In a future release, I plan to also give this
behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]

* The 'replace_with()' method now takes a variable number of arguments,
and can be used to replace a single element with a sequence of elements.
Patch by Bill Chandos. [rev=605]

* Corrected output when the namespace prefix associated with a
namespaced attribute is the empty string, as opposed to
None. [bug=1915583]

* Performance improvement when processing tags that speeds up overall
tree construction by 2%. Patch by Morotti. [bug=1899358]

* Corrected the use of special string container classes in cases when a
single tag may contain strings with different containers; such as
the <template> tag, which may contain both TemplateString objects
and Comment objects. [bug=1913406]

* The html.parser tree builder can now handle named entities
found in the HTML5 spec in much the same way that the html5lib
tree builder does. Note that the lxml HTML tree builder doesn't handle
named entities this way. [bug=1924908]

* Added a second way to specify encodings to UnicodeDammit and
EncodingDetector, based on the order of precedence defined in the
HTML5 spec, starting at:
https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding

Encodings in 'known_definite_encodings' are tried first, then
byte-order-mark sniffing is run, then encodings in 'user_encodings'
are tried. The old argument, 'override_encodings', is now a
deprecated alias for 'known_definite_encodings'.

This changes the default behavior of the html.parser and lxml tree
builders, in a way that may slightly improve encoding
detection but will probably have no effect. [bug=1889014]

* Improve the warning issued when a directory name (as opposed to
the name of a regular file) is passed as markup into the BeautifulSoup
constructor. [bug=1913628]

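The get_text() and replace_with() entries for 4.10.0 are easier to follow with a small example. This is only a sketch, assuming Beautiful Soup 4.10.0 or later and the html.parser tree builder; the markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the changelog.
markup = "<div><p>Hi</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# Called on the parent, the script's contents are not counted as text.
parent_text = soup.div.get_text()

# Called on the <script> tag itself, its contents are the text.
script_text = soup.script.get_text()

print(repr(parent_text), repr(script_text))

# replace_with() accepts several replacements for a single element:
# here the <p> is swapped for a new <hr> tag followed by a string.
soup.p.replace_with(soup.new_tag("hr"), "Bye")
print(soup.div)
```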
= 4.9.3 (20201003)

* Implemented a significant performance optimization to the process of
searching the parse tree. Patch by Morotti. [bug=1898212]

= 4.9.2 (20200926)

* Fixed a bug that caused too many tags to be popped from the tag
stack during tree building, when encountering a closing tag that had
no matching opening tag. [bug=1880420]

* Fixed a bug that inconsistently moved elements over when passing
a Tag, rather than a list, into Tag.extend(). [bug=1885710]

* Specify the soupsieve dependency in a way that complies with
PEP 508. Patch by Mike Nerone. [bug=1893696]

* Change the signatures for BeautifulSoup.insert_before and insert_after
(which are not implemented) to match PageElement.insert_before and
insert_after, quieting warnings in some IDEs. [bug=1897120]

= 4.9.1 (20200517)

* Added a keyword argument 'on_duplicate_attribute' to the
BeautifulSoupHTMLParser constructor (used by the html.parser tree
builder) which lets you customize the handling of markup that
contains the same attribute more than once, as in:
<a href="url1" href="url2"> [bug=1878209]

* Added a distinct subclass, GuessedAtParserWarning, for the warning
issued when BeautifulSoup is instantiated without a parser being
specified. [bug=1873787]

* Added a distinct subclass, MarkupResemblesLocatorWarning, for the
warning issued when BeautifulSoup is instantiated with 'markup' that
actually seems to be a URL or the path to a file on
disk. [bug=1873787]

* The new NavigableString subclasses (Stylesheet, Script, and
TemplateString) can now be imported directly from the bs4 package.

* If you encode a document with a Python-specific encoding like
'unicode_escape', that encoding is no longer mentioned in the final
XML or HTML document. Instead, encoding information is omitted or
left blank. [bug=1874955]

* Fixed test failures when run against soupselect 2.0. Patch by Tomáš
Chvátal. [bug=1872279]

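A minimal sketch of the 'on_duplicate_attribute' argument and the new warning subclasses from 4.9.1, assuming the html.parser tree builder. The variable names and the example.com URL are illustrative only.

```python
import warnings

from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning

markup = '<a href="url1" href="url2">link</a>'

# Default behavior: the last value of a repeated attribute wins.
last_wins = BeautifulSoup(markup, "html.parser")

# With on_duplicate_attribute="ignore", the first value is kept instead.
first_wins = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute="ignore"
)

print(last_wins.a["href"], first_wins.a["href"])

# Because the warning has its own class, it can be filtered on its own.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", MarkupResemblesLocatorWarning)
    BeautifulSoup("http://example.com/", "html.parser")
```

GuessedAtParserWarning can be filtered the same way when a parser is deliberately left unspecified.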
= 4.9.0 (20200405)

* Added PageElement.decomposed, a new property which lets you
check whether you've already called decompose() on a Tag or
NavigableString.

* Embedded CSS and Javascript is now stored in distinct Stylesheet and
Script objects, which are ignored by methods like get_text() since most
people don't consider this sort of content to be 'text'. This
feature is not supported by the html5lib treebuilder. [bug=1868861]

* Added a Russian translation by 'authoress' to the repository.

* Fixed an unhandled exception when formatting a Tag that had been
decomposed. [bug=1857767]

* Fixed a bug that happened when passing a Unicode filename containing
non-ASCII characters as markup into Beautiful Soup, on a system that
allows Unicode filenames. [bug=1866717]

* Added a performance optimization to PageElement.extract(). Patch by
Arthur Darcet.

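The 'decomposed' property from 4.9.0 can be checked before and after a decompose() call. A short sketch with made-up markup, assuming the html.parser tree builder:

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the changelog.
soup = BeautifulSoup("<p>keep <b>drop</b></p>", "html.parser")
b = soup.b

before = b.decomposed   # the tag is still part of the tree
b.decompose()           # destroy the tag and its contents
after = b.decomposed    # the same object now reports it was decomposed

print(before, after)
print(soup.p)
```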
= 4.8.2 (20191224)

* Added Python docstrings to all public methods of the most commonly
used classes.

* Added a Chinese translation by Deron Wang and a Brazilian Portuguese
translation by Cezar Peixeiro to the repository.

* Fixed two deprecation warnings. Patches by Colin
Watson and Nicholas Neumann. [bug=1847592] [bug=1855301]

* The html.parser tree builder now correctly handles DOCTYPEs that are
not uppercase. [bug=1848401]

* PageElement.select() now returns a ResultSet rather than a regular
list, making it consistent with methods like find_all().

= 4.8.1 (20191006)

* When the html.parser or html5lib parsers are in use, Beautiful Soup
will, by default, record the position in the original document where
each tag was encountered. This includes line number (Tag.sourceline)
and position within a line (Tag.sourcepos). Based on code by Chris
Mayo. [bug=1742921]

* When instantiating a BeautifulSoup object, it's now possible to
provide a dictionary ('element_classes') of the classes you'd like to be
instantiated instead of Tag, NavigableString, etc.

* Fixed the definition of the default XML namespace when using
lxml 4.4. Patch by Isaac Muse. [bug=1840141]

* Fixed a crash when pretty-printing tags that were not created
during initial parsing. [bug=1838903]

* Copying a Tag preserves information that was originally obtained from
the TreeBuilder used to build the original Tag. [bug=1838903]

* Raise an explanatory exception when the underlying parser
completely rejects the incoming markup. [bug=1838877]

* Avoid a crash when trying to detect the declared encoding of a
Unicode document. [bug=1838877]

* Avoid a crash when unpickling certain parse trees generated
using html5lib on Python 3. [bug=1843545]

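To show what the position tracking added in 4.8.1 looks like in practice, here is a small sketch using html.parser. The two-line markup is invented for the example; with html.parser, line numbers start at 1 and positions within a line start at 0.

```python
from bs4 import BeautifulSoup

# Two paragraphs on two separate lines (illustrative markup).
markup = "<p>first</p>\n<p>second</p>"
soup = BeautifulSoup(markup, "html.parser")

# Collect (line number, position within the line) for each <p> tag.
positions = [(tag.sourceline, tag.sourcepos) for tag in soup.find_all("p")]
print(positions)
```

Tags created after parsing (for example with new_tag()) were never seen by the parser, so they should not be expected to carry meaningful position information.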
= 4.8.0 (20190720, "One Small Soup")

This release focuses on making it easier to customize Beautiful Soup's
input mechanism (the TreeBuilder) and output mechanism (the Formatter).

* You can customize the TreeBuilder object by passing keyword
arguments into the BeautifulSoup constructor. Those keyword
arguments will be passed along into the TreeBuilder constructor.

The main reason to do this right now is to change which
attributes are treated as multi-valued attributes (the way 'class'
is treated by default). You can do this with the
'multi_valued_attributes' argument. [bug=1832978]

* The role of Formatter objects has been greatly expanded. The Formatter
class now controls the following:

- The function to call to perform entity substitution. (This was
previously Formatter's only job.)
- Which tags should be treated as containing CDATA and have their
contents exempt from entity substitution.
- The order in which a tag's attributes are output. [bug=1812422]
- Whether or not to put a '/' inside a void element, e.g. '<br/>' vs '<br>'

All preexisting code should work as before.

* Added a new method to the API, Tag.smooth(), which consolidates
multiple adjacent NavigableString elements. [bug=1697296]

* &apos; (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is always
recognized as a named entity and converted to a single quote. [bug=1818721]

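The 'multi_valued_attributes' argument and Tag.smooth() from 4.8.0 can be sketched as follows. This assumes the html.parser tree builder; the markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup

markup = '<p class="a b">text</p>'

# By default 'class' is multi-valued, so its value is split into a list.
default = BeautifulSoup(markup, "html.parser")

# Passing multi_valued_attributes=None through to the tree builder
# turns that treatment off; the value stays a single string.
single = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)

print(default.p["class"], single.p["class"])

# Appending a string leaves two adjacent NavigableStrings in the tag...
default.p.append(" more")
pieces_before = len(default.p.contents)

# ...which smooth() consolidates into one.
default.p.smooth()
pieces_after = len(default.p.contents)

print(pieces_before, pieces_after)
```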
= 4.7.1 (20190106)

* Fixed a significant performance problem introduced in 4.7.0. [bug=1810617]

* Fixed an incorrectly raised exception when inserting a tag before or
after an identical tag. [bug=1810692]

* Beautiful Soup will no longer try to keep track of namespaces that
are not defined with a prefix; this can confuse soupselect. [bug=1810680]

* Tried even harder to avoid the deprecation warning originally fixed in
4.6.1. [bug=1778909]

= 4.7.0 (20181231)

* Beautiful Soup's CSS Selector implementation has been replaced by a
dependency on Isaac Muse's SoupSieve project (the soupsieve package
on PyPI). The good news is that SoupSieve has a much more robust and
complete implementation of CSS selectors, resolving a large number
of longstanding issues. The bad news is that from this point onward,
SoupSieve must be installed if you want to use the select() method.

You don't have to change anything if you installed Beautiful Soup
through pip (SoupSieve will be automatically installed when you
upgrade Beautiful Soup) or if you don't use CSS selectors from
within Beautiful Soup.

SoupSieve documentation: https://facelessuser.github.io/soupsieve/

* Added the PageElement.extend() method, which works like list.extend().
[bug=1514970]

* PageElement.insert_before() and insert_after() now take a variable
number of arguments. [bug=1514970]

* Fixed a number of problems with the tree builder that produced
trees that were superficially okay, but which fell apart when bits
were extracted. Patch by Isaac Muse. [bug=1782928,1809910]

* Fixed a problem with the tree builder in which elements that
contained no content (such as empty comments and all-whitespace
elements) were not being treated as part of the tree. Patch by Isaac
Muse. [bug=1798699]

* Fixed a problem with multi-valued attributes where the value
contained whitespace. Thanks to Jens Svalgaard for the
fix. [bug=1787453]

* Clarified ambiguous license statements in the source code. Beautiful
Soup is released under the MIT license, and has been since 4.4.0.

* This file has been renamed from NEWS.txt to CHANGELOG.

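The tree-modification additions in 4.7.0 and the SoupSieve-backed select() can be tried with a few lines. This is a sketch only: it assumes the soupsieve package is installed (pip installs it alongside Beautiful Soup) and uses invented markup.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the changelog.
soup = BeautifulSoup("<p><b>bold</b></p>", "html.parser")

# extend() appends every item of a list to the tag, in order.
soup.p.extend([" one", " two"])

# insert_before() accepts several elements; they end up in the
# given order, immediately before the <b> tag.
soup.b.insert_before("x", "y")
print(soup.p)

# select() is now backed by SoupSieve.
bold_tags = soup.select("p > b")
print(bold_tags)
```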
= 4.6.3 (20180812)

* Exactly the same as 4.6.2. Re-released to make the README file
render properly on PyPI.

= 4.6.2 (20180812)

* Fix an exception when a custom formatter was asked to format a void
element. [bug=1784408]

= 4.6.1 (20180728)

* Stop data loss when encountering an empty numeric entity, and
possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]

* Preserve XML namespaces introduced inside an XML document, not just
the ones introduced at the top level. [bug=1718787]

* Added a new formatter, "html5", which represents void elements
as "<element>" rather than "<element/>". [bug=1716272]

* Fixed a problem where the html.parser tree builder interpreted
a string like "&foo " as the character entity "&foo;" [bug=1728706]

* Correctly handle invalid HTML numeric character entities
which reference code points that are not Unicode code points. Note
that this is only fixed when Beautiful Soup is used with the
html.parser parser -- html5lib already worked and I couldn't fix it
with lxml. [bug=1782933]

* Improved the warning given when no parser is specified. [bug=1780571]

* When markup contains duplicate elements, a select() call that
includes multiple match clauses will match all relevant
elements. [bug=1770596]

* Fixed code that was causing deprecation warnings in recent Python 3
versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496]

* Fixed a Windows crash in diagnose() when checking whether a long
markup string is a filename. [bug=1737121]

* Stopped HTMLParser from raising an exception in very rare cases of
bad markup. [bug=1708831]

* Fixed a bug where find_all() was not working when asked to find a
tag with a namespaced name in an XML document that was parsed as
HTML. [bug=1723783]

* You can get finer control over formatting by subclassing
bs4.element.Formatter and passing a Formatter instance into (e.g.)
encode(). [bug=1716272]

* You can pass a dictionary of `attrs` into
BeautifulSoup.new_tag. This makes it possible to create a tag with
an attribute like 'name' that would otherwise be masked by another
argument of new_tag. [bug=1779276]

* Clarified the deprecation warning when accessing tag.fooTag, to cover
the possibility that you might really have been looking for a tag
called 'fooTag'.

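Two of the 4.6.1 entries, the "html5" formatter and the `attrs` argument to new_tag, lend themselves to a short sketch. The markup and the "input"/"q" values are made up for illustration, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the changelog.
soup = BeautifulSoup("<p>line<br/>break</p>", "html.parser")

# The default formatter writes void elements with a trailing slash;
# the "html5" formatter leaves it out.
default_out = soup.p.decode()
html5_out = soup.p.decode(formatter="html5")
print(default_out)
print(html5_out)

# 'name' is already the first argument of new_tag (the tag's name), so an
# HTML attribute called 'name' has to be supplied through 'attrs'.
tag = soup.new_tag("input", attrs={"name": "q"})
print(tag.name, tag["name"])
```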
450 by Leonard Richardson Prep for 4.6.0 release.	425	= 4.6.0 (20170507) =
444 by Leonard Richardson Added the method, which acts like for	426
447 by Leonard Richardson Replace get_attribute_text with get_attribute_list.	427	* Added the `Tag.get_attribute_list` method, which acts like `Tag.get` for
	428	getting the value of an attribute, but which always returns a list,
	429	whether or not the attribute is a multi-value attribute. [bug=1678589]
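A short illustration of the difference between `Tag.get` and `Tag.get_attribute_list`, assuming the html.parser tree builder. The markup is invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="intro" class="lead wide">Hi</p>', "html.parser")
p = soup.p

# get() returns a string for a single-valued attribute and a list for a
# multi-valued one, so callers have to handle both shapes.
single = p.get("id")
multi = p.get("class")

# get_attribute_list() wraps the single value in a list, so the result
# can always be iterated the same way.
id_values = p.get_attribute_list("id")
class_values = p.get_attribute_list("class")

print(single, id_values)
print(multi, class_values)
```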
442 by Leonard Richardson It's now possible to use a tag's namespace prefix when searching,	430
443 by Leonard Richardson HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909]	431	* It's now possible to use a tag's namespace prefix when searching,
	432	e.g. soup.find('namespace:tag') [bug=1655332]
	433
446 by Leonard Richardson Improved the handling of empty-element tags like <br> when using the	434	* Improved the handling of empty-element tags like <br> when using the
	435	html.parser parser. [bug=1676935]
	436
443 by Leonard Richardson HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909]	437	* HTML parsers treat all HTML4 and HTML5 empty element tags (aka void
	438	element tags) correctly. [bug=1656909]
442 by Leonard Richardson It's now possible to use a tag's namespace prefix when searching,	439
449 by Leonard Richardson Namespace prefix is preserved when an XML tag is copied. Thanks	440	* Namespace prefix is preserved when an XML tag is copied. Thanks
	441	to Vikas for a patch and test. [bug=1685172]
	442
439 by Leonard Richardson I need to do another release because of an error while running the release script.	443	= 4.5.3 (20170102) =
434 by Leonard Richardson Fixed yet another problem that caused the html5lib tree builder to	444
436 by Leonard Richardson Fixed foster parenting when html5lib is the tree builder. Thanks to Geoffrey Sneddon for a patch and test.	445	* Fixed foster parenting when html5lib is the tree builder. Thanks to
	446	Geoffrey Sneddon for a patch and test.
439 by Leonard Richardson I need to do another release because of an error while running the release script.	447
434 by Leonard Richardson Fixed yet another problem that caused the html5lib tree builder to	448	* Fixed yet another problem that caused the html5lib tree builder to
	449	create a disconnected parse tree. [bug=1629825]
	450
439 by Leonard Richardson I need to do another release because of an error while running the release script.	451	= 4.5.2 (20170102) =
	452
	453	* Apart from the version number, this release is identical to
	454	4.5.3. Due to user error, it could not be completely uploaded to
	455	PyPI. Use 4.5.3 instead.
	456
430 by Leonard Richardson Bump version number.	457	= 4.5.1 (20160802) =
428 by Leonard Richardson Fixed a reported (but not duplicated) bug involving processing instructions fed into the lxml HTML parser.	458
429 by Leonard Richardson Explained why we test both unicode and bytestring processing instructions.	459	* Fixed a crash when passing Unicode markup that contained a
	460	processing instruction into the lxml HTML parser on Python
	461	3. [bug=1608048]
428 by Leonard Richardson Fixed a reported (but not duplicated) bug involving processing instructions fed into the lxml HTML parser.	462
419 by Leonard Richardson Updated NEWS in preparation for release.	463	= 4.5.0 (20160719) =
	464
	465	* Beautiful Soup is no longer compatible with Python 2.6. This
	466	actually happened a few releases ago, but it's now official.
400 by Leonard Richardson Fixed a Python 3 ByteWarning when a URL was passed in as though it	467
406 by Leonard Richardson Beautiful Soup will now work with versions of html5lib greater than	468	* Beautiful Soup will now work with versions of html5lib greater than
	469	0.99999999. [bug=1603299]
	470
417 by Leonard Richardson If a search against each individual value of a multi-valued	471	* If a search against each individual value of a multi-valued
	472	attribute fails, the search will be run one final time against the
	473	complete attribute value considered as a single string. That is, if
	474	a tag has class="foo bar" and neither "foo" nor "bar" matches, but
	475	"foo bar" does, the tag is now considered a match.
	476
	477	This happened in previous versions, but only when the value being
419 by Leonard Richardson Updated NEWS in preparation for release.	478	searched for was a string. Now it also works when that value is
	479	a regular expression, a list of strings, etc. [bug=1476868]
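The fallback can be seen with a regular expression that only matches the joined attribute value. This is a sketch with made-up markup, assuming the html.parser tree builder, which stores `class` as a list of values.

```python
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="foo bar">both</p><p class="foo">one</p>', "html.parser"
)

# Neither "foo" nor "bar" contains "o b"; only the complete value
# "foo bar" does, so the match comes from the final, joined attempt.
by_regex = soup.find_all("p", class_=re.compile("o b"))

# The plain-string case worked in earlier versions as well.
by_string = soup.find_all("p", class_="foo bar")

print([tag.get_text() for tag in by_regex])
print([tag.get_text() for tag in by_string])
```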
417 by Leonard Richardson If a search against each individual value of a multi-valued	480
410 by Leonard Richardson Although the previously fixed problem only occurs when using the html5lib tree builder, it's not actually a problem with the tree builder itself.	481	* Fixed a bug that deranged the tree when a whitespace element was
	482	reparented into a tag that contained an identical whitespace
	483	element. [bug=1505351]
409 by Leonard Richardson Fixed a bug in the html5lib treebuilder that deranged the tree	484
415 by Leonard Richardson Added support for CSS selector values that contain quoted spaces,	485	* Added support for CSS selector values that contain quoted spaces,
	486	such as tag[style="display: foo"]. [bug=1540588]
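For example, an attribute selector whose quoted value contains a space might be used like this. The markup is hypothetical, and recent Beautiful Soup versions delegate `select()` to the soupsieve package, which is assumed to be installed.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div style="display: none">hidden</div>'
    '<div style="display: block">shown</div>',
    "html.parser",
)

# The quoted value contains a space after the colon.
hidden = soup.select('div[style="display: none"]')

print([tag.get_text() for tag in hidden])
```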
	487
400 by Leonard Richardson Fixed a Python 3 ByteWarning when a URL was passed in as though it	488	* Corrected handling of XML processing instructions. [bug=1504393]
	489
416 by Leonard Richardson Corrected an encoding error that happened when a BeautifulSoup	490	* Corrected an encoding error that happened when a BeautifulSoup
	491	object was copied. [bug=1554439]
	492
401 by Leonard Richardson The contents of <textarea> tags will no longer be modified when the	493	* The contents of <textarea> tags will no longer be modified when the
	494	tree is prettified. [bug=1555829]
	495
411 by Leonard Richardson When a BeautifulSoup object is pickled but its tree builder cannot	496	* When a BeautifulSoup object is pickled but its tree builder cannot
	497	be pickled, its .builder attribute is set to None instead of being
	498	destroyed. This avoids a performance problem once the object is
	499	unpickled. [bug=1523629]
	500
402 by Leonard Richardson Specify the file and line number when warning about a	501	* Specify the file and line number when warning about a
	502	BeautifulSoup object being instantiated without a parser being
	503	specified. [bug=1574647]
	504
414 by Leonard Richardson The argument to now works correctly, though it's	505	* The `limit` argument to `select()` now works correctly, though it's
	506	not implemented very efficiently. [bug=1520530]
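A small sketch of `limit` with `select()`, using invented markup and assuming soupsieve is available for CSS selection in current releases.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li><li>b</li><li>c</li></ul>", "html.parser")

# Stop after the first two matches instead of collecting every <li>.
first_two = soup.select("li", limit=2)

print([li.get_text() for li in first_two])
```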
	507
400 by Leonard Richardson Fixed a Python 3 ByteWarning when a URL was passed in as though it	508	* Fixed a Python 3 ByteWarning when a URL was passed in as though it
	509	were markup. Thanks to James Salter for a patch and
	510	test. [bug=1533762]
	511
405 by Leonard Richardson We don't run the check for a filename passed in as markup if the	512	* We don't run the check for a filename passed in as markup if the
	513	'filename' contains a less-than character; the less-than character
	514	indicates it's most likely a very small document. [bug=1577864]
	515
392 by Leonard Richardson Fixed a bug that deranged the tree when part of it was	516	= 4.4.1 (20150928) =
390 by Leonard Richardson Fixed the test_detect_utf8 test so that it works when chardet is	517
392 by Leonard Richardson Fixed a bug that deranged the tree when part of it was	518	* Fixed a bug that deranged the tree when part of it was
	519	removed. Thanks to Eric Weiser for the patch and John Wiseman for a
	520	test. [bug=1481520]
	521
395 by Leonard Richardson Fixed a parse bug with the html5lib tree-builder. Thanks to Roel	522	* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
	523	Kramer for the patch. [bug=1483781]
	524
394 by Leonard Richardson Improved the implementation of CSS selector grouping. Thanks to Orangain for the patch. [bug=1484543]	525	* Improved the implementation of CSS selector grouping. Thanks to
	526	Orangain for the patch. [bug=1484543]
	527
393 by Leonard Richardson Corrected the output of Declaration objects. [bug=1477847]	528	* Fixed the test_detect_utf8 test so that it works when chardet is
	529	installed. [bug=1471359]
	530
	531	* Corrected the output of Declaration objects. [bug=1477847]
	532
394 by Leonard Richardson Improved the implementation of CSS selector grouping. Thanks to Orangain for the patch. [bug=1484543]	533
386 by Leonard Richardson Change setup.py to focus on creating wheels.	534	= 4.4.0 (20150703) =
358 by Leonard Richardson Started using a standard MIT license. [bug=1294662]	535
379 by Leonard Richardson Reorganized changelog.	536	Especially important changes:
	537
	538	* Added a warning when you instantiate a BeautifulSoup object without
	539	explicitly naming a parser. [bug=1398866]
	540
366 by Leonard Richardson In Python 3, __str__ now returns a Unicode string instead	541	* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
	542	string in Python 3, instead of a UTF8-encoded bytestring in both
	543	versions. In Python 3, __str__ now returns a Unicode string instead
	544	of a bytestring. [bug=1420131]
	545
379 by Leonard Richardson Reorganized changelog.	546	* The `text` argument to the find_* methods is now called `string`,
	547	which is more accurate. `text` still works, but `string` is the
	548	argument described in the documentation. `text` may eventually
	549	change its meaning, but not for a very long time. [bug=1366856]
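The `string` argument can be used on its own or together with a tag name. A minimal sketch with made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>alpha</p><p>beta</p>", "html.parser")

# With no tag name, the search returns the matching strings themselves.
strings = soup.find_all(string="beta")

# With a tag name, it returns tags whose .string matches.
tags = soup.find_all("p", string="beta")

print(strings)
print(tags)
```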
	550
381 by Leonard Richardson Changed the way soup objects work under copy.copy(). Copying a	551	* Changed the way soup objects work under copy.copy(). Copying a
	552	NavigableString or a Tag will give you a new object that's
	553	equal to the old one but not connected to the parse tree. Patch by
	554	Martijn Pieters. [bug=1307490]
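The copy semantics can be checked directly. This sketch uses invented markup and only the standard `copy` module alongside Beautiful Soup.

```python
import copy

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><b>bold</b></div>", "html.parser")
original = soup.b

# The copy compares equal to the original but has no parent, so it can
# be inserted elsewhere without disturbing the original tree.
duplicate = copy.copy(original)
string_copy = copy.copy(original.string)

print(duplicate == original, duplicate is original)
print(original.parent.name, duplicate.parent, string_copy.parent)
```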
380 by Leonard Richardson Copying a NavigableString will give you a new NavigableString that is not connected to the parse tree.	555
379 by Leonard Richardson Reorganized changelog.	556	* Started using a standard MIT license. [bug=1294662]
	557
	558	* Added a Chinese translation of the documentation by Delong .w.
	559
	560	New features:
	561
371 by Leonard Richardson Introduced the select_one() method, which uses a CSS selector but	562	* Introduced the select_one() method, which uses a CSS selector but
	563	only returns the first match, instead of a list of
	564	matches. [bug=1349367]
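A brief comparison of `select()` and `select_one()`, with made-up markup and class names:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="a">1</p><p class="a">2</p>', "html.parser")

every_match = soup.select("p.a")     # a list of all matching tags
first_match = soup.select_one("p.a") # just the first tag
no_match = soup.select_one("p.zzz")  # None when nothing matches

print(len(every_match), first_match.get_text(), no_match)
```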
	565
376 by Leonard Richardson Raise a NotImplementedError whenever an unsupported CSS pseudoclass	566	* You can now create a Tag object without specifying a
	567	TreeBuilder. Patch by Martijn Pieters. [bug=1307471]
	568
	569	* You can now create a NavigableString or a subclass just by invoking
	570	the constructor. [bug=1294315]
	571
373 by Leonard Richardson Added an exclude_encodings argument to UnicodeDammit and to the	572	* Added an `exclude_encodings` argument to UnicodeDammit and to the
	573	Beautiful Soup constructor, which lets you prohibit the detection of
	574	an encoding that you know is wrong. [bug=1469408]
	575
379 by Leonard Richardson Reorganized changelog.	576	* The select() method now supports selector grouping. Patch by
	577	Francisco Canas [bug=1191917]
	578
	579	Bug fixes:
	580
338 by Leonard Richardson Fixed yet another problem that caused the html5lib tree builder to	581	* Fixed yet another problem that caused the html5lib tree builder to
	582	create a disconnected parse tree. [bug=1237763]
	583
359 by Leonard Richardson Improved docstring for encode_contents() and decode_contents(). [bug=1441543]	584	* Force object_was_parsed() to keep the tree intact even when an element
	585	from later in the document is moved into place. [bug=1430633]
	586
	587	* Fixed yet another bug that caused a disconnected tree when html5lib
	588	copied an element from one part of the tree to another. [bug=1270611]
	589
378 by Leonard Richardson Fixed a bug where Element.extract() could create an infinite loop in	590	* Fixed a bug where Element.extract() could create an infinite loop in
	591	the remaining tree.
	592
352 by Leonard Richardson The select() method can now find tags whose names contain	593	* The select() method can now find tags whose names contain
360 by Leonard Richardson The select() method can now find tags with attributes whose names	594	dashes. Patch by Francisco Canas. [bug=1276211]
	595
	596	* The select() method can now find tags with attributes whose names
	597	contain dashes. Patch by Marek Kapolka. [bug=1304007]
352 by Leonard Richardson The select() method can now find tags whose names contain	598
353 by Leonard Richardson Improved the lxml tree builder's handling of processing	599	* Improved the lxml tree builder's handling of processing
	600	instructions. [bug=1294645]
	601
337 by Leonard Richardson Restored the helpful syntax error that happens when you try to	602	* Restored the helpful syntax error that happens when you try to
	603	import the Python 2 edition of Beautiful Soup under Python
	604	3. [bug=1213387]
	605
347 by Leonard Richardson In Python 3.4 and above, set the new convert_charrefs argument to	606	* In Python 3.4 and above, set the new convert_charrefs argument to
	607	the html.parser constructor to avoid a warning and future
	608	failures. Patch by Stefano Revera. [bug=1375721]
	609
350 by Leonard Richardson The warning when you pass in a filename or URL as markup will now be	610	* The warning when you pass in a filename or URL as markup will now be
	611	displayed correctly even if the filename or URL is a Unicode
	612	string. [bug=1268888]
342 by Leonard Richardson Added a Chinese translation of the documentation by Delong .w.	613
360.1.1 by Leonard Richardson If the initial <html> tag contains a CDATA list attribute such as	614	* If the initial <html> tag contains a CDATA list attribute such as
	615	'class', the html5lib tree builder will now turn its value into a
	616	list, as it would with any other tag. [bug=1296481]
	617
360.1.3 by Leonard Richardson Fixed an import error in Python 3.5 caused by the removal of the	618	* Fixed an import error in Python 3.5 caused by the removal of the
	619	HTMLParseError class. [bug=1420063]
	620
359 by Leonard Richardson Improved docstring for encode_contents() and decode_contents(). [bug=1441543]	621	* Improved docstring for encode_contents() and
	622	decode_contents(). [bug=1441543]
357 by Leonard Richardson Fixed yet another bug that caused a disconnected tree when html5lib	623
364 by Leonard Richardson Fixed a crash in Unicode, Dammit's encoding detector when the name	624	* Fixed a crash in Unicode, Dammit's encoding detector when the name
	625	of the encoding itself contained invalid bytes. [bug=1360913]
	626
367 by Leonard Richardson Improved the exception raised when you call .unwrap() or	627	* Improved the exception raised when you call .unwrap() or
	628	.replace_with() on an element that's not attached to a tree.
	629
376 by Leonard Richardson Raise a NotImplementedError whenever an unsupported CSS pseudoclass	630	* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
	631	is used in select(). Previously some cases did not result in a
	632	NotImplementedError.
368 by Leonard Richardson You can now create a NavigableString or a subclass just by invoking	633
382 by Leonard Richardson It's now possible to pickle a BeautifulSoup object no matter which	634	* It's now possible to pickle a BeautifulSoup object no matter which
	635	tree builder was used to create it. However, the only tree builder
	636	that survives the pickling process is the HTMLParserTreeBuilder
	637	('html.parser'). If you unpickle a BeautifulSoup object created with
	638	some other tree builder, soup.builder will be None. [bug=1231545]
	639
336 by Leonard Richardson Prep for release.	640	= 4.3.2 (20131002) =
331 by Leonard Richardson Combined two tests to stop a spurious test failure when tests are	641
333 by Leonard Richardson Fixed a bug in which short Unicode input was improperly encoded to ASCII when checking whether or not it was a file on	642	* Fixed a bug in which short Unicode input was improperly encoded to
336 by Leonard Richardson Prep for release.	643	ASCII when checking whether or not it was the name of a file on
333 by Leonard Richardson Fixed a bug in which short Unicode input was improperly encoded to ASCII when checking whether or not it was a file on	644	disk. [bug=1227016]
	645
334 by Leonard Richardson Fixed a crash when a short input contains data not valid in	646	* Fixed a crash when a short input contains data not valid in
	647	filenames. [bug=1232604]
	648
335 by Leonard Richardson Fixed a bug that caused Unicode data put into UnicodeDammit to	649	* Fixed a bug that caused Unicode data put into UnicodeDammit to
	650	return None instead of the original data. [bug=1214983]
	651
331 by Leonard Richardson Combined two tests to stop a spurious test failure when tests are	652	* Combined two tests to stop a spurious test failure when tests are
332 by Leonard Richardson Fixed typo.	653	run by nosetests. [bug=1212445]
331 by Leonard Richardson Combined two tests to stop a spurious test failure when tests are	654
329 by Leonard Richardson Updated NEWS.	655	= 4.3.1 (20130815) =
327 by Leonard Richardson * Fixed yet another problem with the html5lib tree builder, caused by	656
	657	* Fixed yet another problem with the html5lib tree builder, caused by
	658	html5lib's tendency to rearrange the tree during
	659	parsing. [bug=1189267]
	660
329 by Leonard Richardson Updated NEWS.	661	* Fixed a bug that caused the optimized version of find_all() to
	662	return nothing. [bug=1212655]
	663
326 by Leonard Richardson Prep for release.	664	= 4.3.0 (20130812) =
305 by Leonard Richardson Merged in big encoding-detection refactoring branch.	665
	666	* Instead of converting incoming data to Unicode and feeding it to the
324 by Leonard Richardson All find_all calls should now return a ResultSet object. Patch by	667	lxml tree builder in chunks, Beautiful Soup now makes successive
	668	guesses at the encoding of the incoming data, and tells lxml to
	669	parse the data as that encoding. Giving lxml more control over the
	670	parsing process improves performance and avoids a number of bugs and
	671	issues with the lxml parser which had previously required elaborate
	672	workarounds:
323 by Leonard Richardson A little cleanup.	673
324 by Leonard Richardson All find_all calls should now return a ResultSet object. Patch by	674	- An issue in which lxml refuses to parse Unicode strings on some
	675	systems. [bug=1180527]
323 by Leonard Richardson A little cleanup.	676
	677	- A returning bug that truncated documents longer than a (very
	678	small) size. [bug=963880]
	679
	680	- A returning bug in which extra spaces were added to a document if
	681	the document defined a charset other than UTF-8. [bug=972466]
305 by Leonard Richardson Merged in big encoding-detection refactoring branch.	682
	683	This required a major overhaul of the tree builder architecture. If
	684	you wrote your own tree builder and didn't tell me, you'll need to
	685	modify your prepare_markup() method.
	686
	687	* The UnicodeDammit code that makes guesses at encodings has been
	688	split into its own class, EncodingDetector. A lot of apparently
	689	redundant code has been removed from Unicode, Dammit, and some
	690	undocumented features have also been removed.
	691
306 by Leonard Richardson Beautiful Soup will issue a warning if instead of markup you pass it	692	* Beautiful Soup will issue a warning if instead of markup you pass it
324 by Leonard Richardson All find_all calls should now return a ResultSet object. Patch by	693	a URL or the name of a file on disk (a common beginner's mistake).
306 by Leonard Richardson Beautiful Soup will issue a warning if instead of markup you pass it	694
317 by Leonard Richardson Added raw html5lib to the list of parsers that get tested.	695	* A number of optimizations improve the performance of the lxml tree
322 by Leonard Richardson Updated NEWS.	696	builder by about 33%, the html.parser tree builder by about 20%, and
322 by Leonard Richardson Updated NEWS.	697	the html5lib tree builder by about 15%.
317 by Leonard Richardson Added raw html5lib to the list of parsers that get tested.	698
324 by Leonard Richardson All find_all calls should now return a ResultSet object. Patch by	699	* All find_all calls should now return a ResultSet object. Patch by
	700	Aaron DeVore. [bug=1194034]
	701
302 by Leonard Richardson Reverted the patch that gives NavigableString a .name property, because that's too big an API change for a bugfix release.	702	= 4.2.1 (20130531) =
295 by Leonard Richardson html5lib now supports Python 3. Fixed some Python 2-specific	703
301 by Leonard Richardson The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;".[bug=1182183]	704	* The default XML formatter will now replace ampersands even if they
	705	appear to be part of entities. That is, "&lt;" will become
	706	"&amp;lt;". The old code was left over from Beautiful Soup 3, which
	707	didn't always turn entities into Unicode characters.
	708
	709	If you really want the old behavior (maybe because you add new
	710	strings to the tree, those strings include entities, and you want
	711	the formatter to leave them alone on output), it can be found in
	712	EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
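The two substitution behaviors can be compared side by side through the `EntitySubstitution` class in `bs4.dammit`. The input string below is invented for illustration.

```python
from bs4.dammit import EntitySubstitution

text = "AT&T &lt; 5"

# Default behavior: every ampersand is escaped, including the one that
# begins the existing "&lt;" entity.
strict = EntitySubstitution.substitute_xml(text)

# Old behavior: ampersands that already start an entity are left alone.
lenient = EntitySubstitution.substitute_xml_containing_entities(text)

print(strict)   # AT&amp;T &amp;lt; 5
print(lenient)  # AT&amp;T &lt; 5
```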
	713
296 by Leonard Richardson Gave new_string() the ability to create subclasses of	714	* Gave new_string() the ability to create subclasses of
	715	NavigableString. [bug=1181986]
	716
297 by Leonard Richardson Fixed another bug by which the html5lib tree builder could create a	717	* Fixed another bug by which the html5lib tree builder could create a
	718	disconnected tree. [bug=1182089]
	719
299 by Leonard Richardson The .previous_element of a BeautifulSoup object is now always None,	720	* The .previous_element of a BeautifulSoup object is now always None,
	721	not the last element to be parsed. [bug=1182089]
	722
295 by Leonard Richardson html5lib now supports Python 3. Fixed some Python 2-specific	723	* Fixed test failures when lxml is not installed. [bug=1181589]
	724
	725	* html5lib now supports Python 3. Fixed some Python 2-specific
	726	code in the html5lib test suite. [bug=1181624]
	727
303 by Leonard Richardson The html.parser treebuilder can now handle numeric attributes in	728	* The html.parser treebuilder can now handle numeric attributes in
	729	text when the hexadecimal name of the attribute starts with a
	730	capital X. Patch by Tim Shirley. [bug=1186242]
	731
288.1.1 by Leonard Richardson Added a deprecation warning to has_key().	732	= 4.2.0 (20130514) =
272 by Leonard Richardson In an HTML document, the contents of a <script> or <style> tag will	733
282.1.12 by Leonard Richardson Updated news.	734	* The Tag.select() method now supports a much wider variety of CSS
282.1.12 by Leonard Richardson Updated news.	735	selectors.
282.1.11 by Leonard Richardson Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement.	736
	737	- Added support for the adjacent sibling combinator (+) and the
	738	general sibling combinator (~). Tests by "liquider". [bug=1082144]
	739
282.1.13 by Leonard Richardson Fixed terminology.	740	- The combinators (>, +, and ~) can now combine with any supported
282.1.12 by Leonard Richardson Updated news.	741	selector, not just one that selects based on tag name.
282.1.12 by Leonard Richardson Updated news.	742
282.1.11 by Leonard Richardson Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement.	743	- Added limited support for the "nth-of-type" pseudo-class. Code
	744	by Sven Slootweg. [bug=1109952]
	745
274.1.3 by Leonard Richardson Aliased the BeautifulSoup class to the easier-to-type "_s" and "_soup".	746	* The BeautifulSoup class is now aliased to "_s" and "_soup", making
278 by Leonard Richardson Added support for the "nth-of-type" CSS selector. The CSS selector ">" can now find a tag by means other than the tag name. Code by Sven Slootweg.	747	it quicker to type the import statement in an interactive session:
274.1.3 by Leonard Richardson Aliased the BeautifulSoup class to the easier-to-type "_s" and "_soup".	748
	749	  from bs4 import _s
	750	or
	751	  from bs4 import _soup
	752
282 by Leonard Richardson Fixed up diagnose() and added it to the docs.	753	The alias may change in the future, so don't use this in code you're
	754	going to run more than once.
	755
	756	* Added the 'diagnose' submodule, which includes several useful
	757	functions for reporting problems and doing tech support.
	758
282.1.11 by Leonard Richardson Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement.	759	- diagnose(data) tries the given markup on every installed parser,
282 by Leonard Richardson Fixed up diagnose() and added it to the docs.	760	reporting exceptions and displaying successes. If a parser is not
	761	installed, diagnose() mentions this fact.
	762
282.1.11 by Leonard Richardson Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement.	763	- lxml_trace(data, html=True) runs the given markup through lxml's
282 by Leonard Richardson Fixed up diagnose() and added it to the docs.	764	XML parser or HTML parser, and prints out the parser events as
	765	they happen. This helps you quickly determine whether a given
	766	problem occurs in lxml code or Beautiful Soup code.
	767
282.1.11 by Leonard Richardson Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement.	768	- htmlparser_trace(data) is the same thing, but for Python's
282 by Leonard Richardson Fixed up diagnose() and added it to the docs.	769	built-in HTMLParser class.
278 by Leonard Richardson Added support for the "nth-of-type" CSS selector. The CSS selector ">" can now find a tag by means other than the tag name. Code by Sven Slootweg.	770
282.1.12 by Leonard Richardson Updated news.	771	* In an HTML document, the contents of a <script> or <style> tag will
	772	no longer undergo entity substitution by default. XML documents work
	773	the same way they did before. [bug=1085953]
	774
	775	* Methods like get_text() and properties like .strings now only give
	776	you strings that are visible in the document--no comments or
	777	processing instructions. [bug=1050164]
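A quick check of the visible-strings behavior, using made-up markup that contains a comment:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello<!-- hidden note --> world</p>", "html.parser")

# The comment is still part of the tree, but it is skipped when
# collecting the text a reader would see.
visible_text = soup.p.get_text()
visible_strings = list(soup.p.strings)

print(repr(visible_text))
print(visible_strings)
```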
	778
277 by Leonard Richardson The prettify() method now leaves the contents of <pre> tags	779	* The prettify() method now leaves the contents of <pre> tags
	780	alone. [bug=1095654]
	781
264 by Leonard Richardson Added bug reference.	782	* Fix a bug in the html5lib treebuilder which sometimes created
	783	disconnected trees. [bug=1039527]
	784
265.1.1 by Leonard Richardson Fix a bug in the lxml treebuilder which crashed when a tag included	785	* Fix a bug in the lxml treebuilder which crashed when a tag included
	786	an attribute from the predefined "xml:" namespace. [bug=1065617]
	787
273 by Leonard Richardson Fix a bug by which keyword arguments to find_parent() were not being passed on. [bug=1126734]	788	* Fix a bug by which keyword arguments to find_parent() were not
	789	being passed on. [bug=1126734]
	790
275 by Leonard Richardson Stop a crash when unwisely messing with a tag that's been	791	* Stop a crash when unwisely messing with a tag that's been
	792	decomposed. [bug=1097699]
	793
288.1.1 by Leonard Richardson Added a deprecation warning to has_key().	794	* Now that lxml's segfault on invalid doctype has been fixed, fixed a
274.1.1 by Leonard Richardson Now that lxml's segfault on invalid doctype has been fixed, fix a	795	corresponding problem on the Beautiful Soup end that was previously
	796	invisible. [bug=984936]
	797
279 by Leonard Richardson Fixed an exception when an overspecified CSS selector didn't match	798	* Fixed an exception when an overspecified CSS selector didn't match
	799	anything. Code by Stefaan Lippens. [bug=1168167]
	800
258 by Leonard Richardson Skipped a test under Python 2.6 to avoid a spurious test failure. [bug=1038503]	801	= 4.1.3 (20120820) =
	802
260 by Leonard Richardson Python 3.1 also needs to skip the unicode attribute name test.	803	* Skipped a test under Python 2.6 and Python 3.1 to avoid a spurious
	804	test failure caused by the lousy HTMLParser in those
	805	versions. [bug=1038503]
258 by Leonard Richardson Skipped a test under Python 2.6 to avoid a spurious test failure. [bug=1038503]	806
259 by Leonard Richardson Raise a more specific error (FeatureNotFound) when a requested	807	* Raise a more specific error (FeatureNotFound) when a requested
	808	parser or parser feature is not installed. Raise NotImplementedError
	809	instead of ValueError when the user calls insert_before() or
	810	insert_after() on the BeautifulSoup object itself. Patch by Aaron
	811	DeVore. [bug=1038301]
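Both exceptions can be triggered on purpose. In this sketch, "no-such-parser" is a deliberately bogus feature name and the markup is invented.

```python
from bs4 import BeautifulSoup, FeatureNotFound

try:
    # No installed tree builder provides this (bogus) feature.
    BeautifulSoup("<p>x</p>", "no-such-parser")
    parser_error = None
except FeatureNotFound as exc:
    parser_error = exc

soup = BeautifulSoup("<p>x</p>", "html.parser")
try:
    # The BeautifulSoup object is the root of the tree, so nothing can
    # be inserted before it.
    soup.insert_before(soup.new_tag("b"))
    insert_error = None
except NotImplementedError as exc:
    insert_error = exc

print(type(parser_error).__name__)
print(type(insert_error).__name__)
```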
258 by Leonard Richardson Skipped a test under Python 2.6 to avoid a spurious test failure. [bug=1038503]	812
252 by Leonard Richardson Prep for release.	813	= 4.1.2 (20120817) =
245 by Leonard Richardson Use logging.warning() instead of warning.warn() to notify the user that characters were replaced with REPLACEMENT CHARACTER. [bug=1013862]	814
251 by Leonard Richardson As per PEP-8, allow searching by CSS class using the 'class_'	815	* As per PEP-8, allow searching by CSS class using the 'class_'
	816	keyword argument. [bug=1037624]
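Because `class` is a reserved word in Python, the keyword argument carries a trailing underscore. A minimal sketch with invented class names:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="intro">a</p><p class="body">b</p>', "html.parser"
)

# Equivalent to the older spelling find_all("p", attrs={"class": "intro"}).
intro = soup.find_all("p", class_="intro")

print([p.get_text() for p in intro])
```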
	817
255 by Leonard Richardson Fixed a crash on encoding when an attribute name contained	818	* Display namespace prefixes for namespaced attribute names, instead of
250 by Leonard Richardson Use namespace prefixes for namespaced attribute names, instead of	819	the fully-qualified names given by the lxml parser. [bug=1037597]
	820
255 by Leonard Richardson Fixed a crash on encoding when an attribute name contained	821	* Fixed a crash on encoding when an attribute name contained
	822	non-ASCII characters.
	823
251 by Leonard Richardson As per PEP-8, allow searching by CSS class using the 'class_'	824	* When sniffing encodings, if the cchardet library is installed,
258 by Leonard Richardson Skipped a test under Python 2.6 to avoid a spurious test failure. [bug=1038503]	825	Beautiful Soup uses it instead of chardet. cchardet is much
251 by Leonard Richardson As per PEP-8, allow searching by CSS class using the 'class_'	826	faster. [bug=1020748]
246 by Leonard Richardson When sniffing encodings, if the cchardet library is installed, use it instead of chardet. It's much faster. [bug=1020748]	827
245 by Leonard Richardson Use logging.warning() instead of warning.warn() to notify the user that characters were replaced with REPLACEMENT CHARACTER. [bug=1013862]	828	* Use logging.warning() instead of warnings.warn() to notify the user
	829	that characters were replaced with REPLACEMENT
	830	CHARACTER. [bug=1013862]
	831
243 by Leonard Richardson get_text() now returns an empty Unicode string if there is no text, rather than an empty bytestring. [bug=1020387]	832	= 4.1.1 (20120703) =
239 by Leonard Richardson Fixed an html5lib tree builder crash which happened when html5lib	833
241 by Leonard Richardson Fixed a typo that made parsing much slower than it should have been. [bug=1020268]	834	* Fixed an html5lib tree builder crash which happened when html5lib
243 by Leonard Richardson get_text() now returns an empty Unicode string if there is no text, rather than an empty bytestring. [bug=1020387]	835	moved a tag with a multivalued attribute from one part of the tree
	836	to another. [bug=1019603]
239 by Leonard Richardson Fixed an html5lib tree builder crash which happened when html5lib	837
243 by Leonard Richardson get_text() now returns an empty Unicode string if there is no text, rather than an empty bytestring. [bug=1020387]	838	* Correctly display closing tags with an XML namespace declared. Patch
241 by Leonard Richardson Fixed a typo that made parsing much slower than it should have been. [bug=1020268]	839	by Andreas Kostyrka. [bug=1019635]
	840
* Fixed a typo that made parsing significantly slower than it should
have been, and that also delayed the closing of tags with XML
namespaces. [bug=1020268]
	844
	845	* get_text() now returns an empty Unicode string if there is no text,
	846	rather than an empty bytestring. [bug=1020387]
241 by Leonard Richardson Fixed a typo that made parsing much slower than it should have been. [bug=1020268]	847
236 by Leonard Richardson Prep for release.	848	= 4.1.0 (20120529) =
228 by Leonard Richardson Added experimental support for fixing Windows-1252 characters embedded in UTF-8 documents.	849
	850	* Added experimental support for fixing Windows-1252 characters
232 by Leonard Richardson Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.	851	embedded in UTF-8 documents. (UnicodeDammit.detwingle())
228 by Leonard Richardson Added experimental support for fixing Windows-1252 characters embedded in UTF-8 documents.	852
* Fixed the handling of &quot; with the built-in parser. [bug=993871]
	854
231 by Leonard Richardson Comments, processing instructions, document type declarations, and markup declarations are now treated as preformatted strings, the way CData blocks are. [bug=1001025] Also in this commit: renamed detwingle method to detwingle().	855	* Comments, processing instructions, document type declarations, and
	856	markup declarations are now treated as preformatted strings, the way
	857	CData blocks are. [bug=1001025]
	858
232 by Leonard Richardson Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.	859	* Fixed a bug with the lxml treebuilder that prevented the user from
	860	adding attributes to a tag that didn't originally have
236 by Leonard Richardson Prep for release.	861	attributes. [bug=1002378] Thanks to Oliver Beattie for the patch.
232 by Leonard Richardson Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.	862
233 by Leonard Richardson Fixed some edge-case bugs having to do with inserting an element	863	* Fixed some edge-case bugs having to do with inserting an element
	864	into a tag it's already inside, and replacing one of a tag's
	865	children with another. [bug=997529]
	866
236 by Leonard Richardson Prep for release.	867	* Added the ability to search for attribute values specified in UTF-8. [bug=1003974]
235 by Leonard Richardson Fixed the inability to search for non-ASCII attribute	868
	869	This caused a major refactoring of the search code. All the tests
	870	pass, but it's possible that some searches will behave differently.
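As a rough illustration of the UnicodeDammit.detwingle() entry above, the
sketch below builds a byte string that mixes UTF-8 with Windows-1252 smart
quotes and then repairs it. The sample text is invented for the example, and
it assumes the bs4 package is installed.

```python
from bs4 import UnicodeDammit

# Three snowmen encoded as UTF-8, followed by a quotation whose curly
# quotes are encoded as Windows-1252: a mixed-encoding byte string.
snowmen = "\N{SNOWMAN}" * 3
quote = (
    "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!"
    "\N{RIGHT DOUBLE QUOTATION MARK}"
)
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# detwingle() rewrites the Windows-1252 bytes as UTF-8, so the whole
# byte string can afterwards be decoded as UTF-8.
fixed = UnicodeDammit.detwingle(doc)
text = fixed.decode("utf8")
print(text)
```

Decoding the original bytes as UTF-8 would fail, and decoding them as
Windows-1252 would garble the snowmen, which is why the repair happens on
the byte string before any decoding.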
234 by Leonard Richardson Fixed the basic failure in [bug=1003974], but not more advanced cases.	871
225 by Leonard Richardson Prep for release.	872	= 4.0.5 (20120427) =
214 by Leonard Richardson Fixed a bug that made the HTMLParser treebuilder generate XML definitions ending with two question marks instead of one. [bug=984258]	873
229 by Leonard Richardson Fixed NEWS.	874	* Added a new method, wrap(), which wraps an element in a tag.
224 by Leonard Richardson Added a new method, wrap().	875
223 by Leonard Richardson Renamed replace_with_children() to the jQuery name, unwrap().	876	* Renamed replace_with_children() to unwrap(), which is easier to
	877	understand and also the jQuery name of the function.
	878
217 by Leonard Richardson Made encoding substitution in <meta> tags completely transparent (no more %SOUP-ENCODING%).	879	* Made encoding substitution in <meta> tags completely transparent (no
	880	more %SOUP-ENCODING%).
	881
222 by Leonard Richardson Fixed a bug in decoding data that contained a byte-order mark, such as data encoded in UTF-16LE. [bug=988980]	882	* Fixed a bug in decoding data that contained a byte-order mark, such
	883	as data encoded in UTF-16LE. [bug=988980]
	884
214 by Leonard Richardson Fixed a bug that made the HTMLParser treebuilder generate XML definitions ending with two question marks instead of one. [bug=984258]	885	* Fixed a bug that made the HTMLParser treebuilder generate XML
	886	definitions ending with two question marks instead of
	887	one. [bug=984258]
	888
221 by Leonard Richardson Upon document generation, CData objects are no longer run through the formatter. [bug=988905]	889	* Upon document generation, CData objects are no longer run through
	890	the formatter. [bug=988905]
	891
220 by Leonard Richardson The test suite now passes when lxml is not installed, whether or not html5lib is installed. [bug=987004]	892	* The test suite now passes when lxml is not installed, whether or not
	893	html5lib is installed. [bug=987004]
	894
215 by Leonard Richardson Print a warning on HTMLParseErrors to let people know they should install an external parser.	895	* Print a warning on HTMLParseErrors to let people know they should
	896	install a better parser library.
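A minimal sketch of the wrap() and unwrap() entries above, assuming the bs4
package is installed and using Python's built-in parser. The markup is made
up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I wish I was bold.</p>", "html.parser")

# wrap() encloses an element (here, the paragraph's string) in a new tag.
soup.p.string.wrap(soup.new_tag("b"))
wrapped = str(soup.p)

# unwrap() is the inverse: the tag is removed and its children stay put.
soup.p.b.unwrap()
unwrapped = str(soup.p)

print(wrapped)
print(unwrapped)
```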
	897
213 by Leonard Richardson Prep for release.	898	= 4.0.4 (20120416) =
205 by Leonard Richardson Have objects_was_parsed set the previous element's next_element if possible. [bug=975926]	899
	900	* Fixed a bug that sometimes created disconnected trees.
	901
209 by Leonard Richardson Fixed a bug with the string setter that moved a string around the	902	* Fixed a bug with the string setter that moved a string around the
	903	tree instead of copying it. [bug=983050]
	904
210 by Leonard Richardson Attribute values are now run through the provided output formatter. Previously they were always run through the 'minimal' formatter. [bug=980237]	905	* Attribute values are now run through the provided output formatter.
	906	Previously they were always run through the 'minimal' formatter. In
	907	the future I may make it possible to specify different formatters
	908	for attribute values and strings, but for now, consistent behavior
	909	is better than inconsistent behavior. [bug=980237]
	910
206 by Leonard Richardson Added renderContents back.	911	* Added the missing renderContents method from Beautiful Soup 3. Also
	912	added an encode_contents() method to go along with decode_contents().
	913
208 by Leonard Richardson Give a more useful error when the user tries to run the Python 2 version of BS under Python 3.	914	* Give a more useful error when the user tries to run the Python 2
	915	version of BS under Python 3.
	916
211 by Leonard Richardson Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters.	917	* UnicodeDammit can now convert Microsoft smart quotes to ASCII with
	918	UnicodeDammit(markup, smart_quotes_to="ascii").
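The smart_quotes_to="ascii" option from the last entry above can be tried
out roughly as follows. This is a sketch that assumes the bs4 package is
installed; the markup is invented, and Windows-1252 is passed explicitly as
the suggested encoding so the result does not depend on encoding detection.

```python
from bs4 import UnicodeDammit

# \x93, \x94 and \x92 are Windows-1252 curly quotes and an apostrophe.
markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"

dammit = UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="ascii")
ascii_markup = dammit.unicode_markup
print(ascii_markup)
```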
	919
204 by Leonard Richardson Prep for release.	920	= 4.0.3 (20120403) =
197 by Leonard Richardson Fixed a typo that caused some versions of Python 3 to convert the Beautiful Soup codebase incorrectly.	921
	922	* Fixed a typo that caused some versions of Python 3 to convert the
	923	Beautiful Soup codebase incorrectly.
	924
203 by Leonard Richardson Got rid of the 4.0.2 workaround for HTML documents--it was unnecessary and the workaround was triggering a (possibly different, but related) bug in lxml. [bug=972466]	925	* Got rid of the 4.0.2 workaround for HTML documents--it was
	926	unnecessary and the workaround was triggering a (possibly different,
	927	but related) bug in lxml. [bug=972466]
	928
196 by Leonard Richardson Prep for release.	929	= 4.0.2 (20120326) =
194 by Leonard Richardson Fixed a bug where specifying 'text' while searching for a tag only worked if 'text' specified an exact string match. [bug=955942]	930
195 by Leonard Richardson Pass data into XMLParser.feed() in chunks. [bug=963880]	931	* Worked around a possible bug in lxml that prevents non-tiny XML
	932	documents from being parsed. [bug=963880, bug=963936]
	933
196 by Leonard Richardson Prep for release.	934	* Fixed a bug where specifying `text` while also searching for a tag
196 by Leonard Richardson Prep for release.	935	only worked if `text` wanted an exact string match. [bug=955942]
194 by Leonard Richardson Fixed a bug where specifying 'text' while searching for a tag only worked if 'text' specified an exact string match. [bug=955942]	936
188 by Leonard Richardson Bumped version number.	937	= 4.0.1 (20120314) =
	938
	939	* This is the first official release of Beautiful Soup 4. There is no
	940	4.0.0 release, to eliminate any possibility that packaging software
	941	might treat "4.0.0" as being an earlier version than "4.0.0b10".
187 by Leonard Richardson Brought the soupselect port up to date.	942
	943	* Brought BS up to date with the latest release of soupselect, adding
	944	CSS selector support for direct descendant matches and multiple CSS
	945	class matches.
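The two selector features named in the last entry above, direct descendant
matches and multiple CSS class matches, look roughly like this. The sketch
assumes a current bs4 installation, where select() is implemented by the
separate soupsieve package rather than by the original soupselect port; the
markup is invented for the example.

```python
from bs4 import BeautifulSoup

markup = (
    '<div id="main">'
    '<p class="intro lead"><a href="/one">one</a></p>'
    '<a href="/two">two</a>'
    "</div>"
)
soup = BeautifulSoup(markup, "html.parser")

# "p > a" matches only <a> tags that are direct children of a <p>.
direct = [a["href"] for a in soup.select("p > a")]

# "div a" matches <a> tags anywhere beneath the <div>.
nested = [a["href"] for a in soup.select("div a")]

# "p.intro.lead" requires both CSS classes to be present.
both_classes = soup.select("p.intro.lead")

print(direct, nested, len(both_classes))
```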
	946
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	947	= 4.0.0b10 (20120302) =
179.1.3 by Leonard Richardson Test that CSS selectors work within the tree as well as at the top level.	948
179.1.4 by Leonard Richardson Updated docs.	949	* Added support for simple CSS selectors, taken from the soupselect project.
179.1.3 by Leonard Richardson Test that CSS selectors work within the tree as well as at the top level.	950
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	951	* Fixed a crash when using html5lib. [bug=943246]
	952
182 by Leonard Richardson In HTML5-style <meta charset="foo"> tags, the value of the "charset" attribute is now replaced with the appropriate encoding on output. [bug=942714]	953	* In HTML5-style <meta charset="foo"> tags, the value of the "charset"
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	954	attribute is now replaced with the appropriate encoding on
	955	output. [bug=942714]
	956
	957	* Fixed a bug that caused calling a tag to sometimes call find_all()
	958	with the wrong arguments. [bug=944426]
182 by Leonard Richardson In HTML5-style <meta charset="foo"> tags, the value of the "charset" attribute is now replaced with the appropriate encoding on output. [bug=942714]	959
184 by Leonard Richardson For backwards compatibility, brought back the BeautifulStoneSoup class as a deprecated wrapper around BeautifulSoup.	960	* For backwards compatibility, brought back the BeautifulStoneSoup
	961	class as a deprecated wrapper around BeautifulSoup.
	962
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	963	= 4.0.0b9 (20120228) =
175 by Leonard Richardson Renamed Tag.nsprefix to Tag.prefix, for consistency with NamespacedAttribute.	964
177 by Leonard Richardson Fixed DOCTYPE handling.	965	* Fixed the string representation of DOCTYPEs that have both a public
	966	ID and a system ID.
	967
179 by Leonard Richardson Fixed the generated XML declaration.	968	* Fixed the generated XML declaration.
	969
175 by Leonard Richardson Renamed Tag.nsprefix to Tag.prefix, for consistency with NamespacedAttribute.	970	* Renamed Tag.nsprefix to Tag.prefix, for consistency with
	971	NamespacedAttribute.
	972
421.1.1 by Ville Skyttä Spelling fixes	973	* Fixed a test failure that occurred on Python 3.x when chardet was
176 by Leonard Richardson Fixed a test failure that occured on Python 3.x when chardet was installed.	974	installed.
	975
178 by Leonard Richardson Make prettify() return Unicode by default, so it will look nice when passed into print() under Python 3.	976	* Made prettify() return Unicode by default, so it will look nice on
	977	Python 3 when passed into print().
	978
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	979	= 4.0.0b8 (20120224) =
158.1.10 by Leonard Richardson Bumped version number.	980
158.1.10 by Leonard Richardson Bumped version number.	981	* All tree builders now preserve namespace information in the
174 by Leonard Richardson I keep typing assertEquals.	982	documents they parse. If you use the html5lib parser or lxml's XML
174 by Leonard Richardson I keep typing assertEquals.	983	parser, you can access the namespace URL for a tag as tag.namespace.
158.1.10 by Leonard Richardson Bumped version number.	984
	985	However, there is no special support for namespace-oriented
	986	searching or tree manipulation. When you search the tree, you need
	987	to use namespace prefixes exactly as they're used in the original
	988	document.
	989
158.1.11 by Leonard Richardson Fixed handling of the closing of namespaced tags.	990	* The string representation of a DOCTYPE always ends in a newline.
	991
173 by Leonard Richardson Warn when SoupStrainer is used with the html5lib tree builder.	992	* Issue a warning if the user tries to use a SoupStrainer in
	993	conjunction with the html5lib tree builder, which doesn't support
	994	them.
	995
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	996	= 4.0.0b7 (20120223) =
157 by Leonard Richardson Issue a warning if characters were replaced with REPLACEMENT CHARACTER during Unicode conversion.	997
* Upon encoding a document to a bytestring, any characters that can't be
represented in your chosen encoding will be converted into numeric XML
entity references.
	1001
157 by Leonard Richardson Issue a warning if characters were replaced with REPLACEMENT CHARACTER during Unicode conversion.	1002	* Issue a warning if characters were replaced with REPLACEMENT
	1003	CHARACTER during Unicode conversion.
	1004
160 by Leonard Richardson Added code from 2.7's standard library so that the tests will run on Python 2.6.	1005	* Restored compatibility with Python 2.6.
	1006
421.1.1 by Ville Skyttä Spelling fixes	1007	* The install process no longer installs docs or auxiliary text files.
169 by Leonard Richardson It's now possible to copy a BeautifulSoup object created with the html.parser treebuilder.	1008
	1009	* It's now possible to deepcopy a BeautifulSoup object created with
	1010	Python's built-in HTML parser.
	1011
169.1.6 by Leonard Richardson Updated NEWS.	1012	* About 100 unit tests that "test" the behavior of various parsers on
	1013	invalid markup have been removed. Legitimate changes to those
	1014	parsers caused these tests to fail, indicating that perhaps
	1015	Beautiful Soup should not test the behavior of foreign
	1016	libraries.
	1017
	1018	The problematic unit tests have been reformulated as informational
	1019	comparisons generated by the script
	1020	scripts/demonstrate_parser_differences.py.
	1021
	1022	This makes Beautiful Soup compatible with html5lib version 0.95 and
	1023	future versions of HTMLParser.
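The numeric entity reference behavior described in the first entry of this
release can be seen with a character that ASCII cannot represent. This is a
small sketch that assumes the bs4 package is installed; the one-tag document
is invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>\N{SNOWMAN}</b>", "html.parser")

# UTF-8 can represent the snowman, so it is encoded directly.
as_utf8 = soup.b.encode("utf-8")

# ASCII cannot, so the character becomes a numeric XML entity reference.
as_ascii = soup.b.encode("ascii")

print(as_utf8)
print(as_ascii)
```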
	1024
185 by Leonard Richardson Fixed a bug that caused calling a tag to sometimes call find_all() with the wrong arguments. [bug=944426]	1025	= 4.0.0b6 (20120216) =
150.1.8 by Leonard Richardson Added to NEWS.	1026
157 by Leonard Richardson Issue a warning if characters were replaced with REPLACEMENT CHARACTER during Unicode conversion.	1027	* Multi-valued attributes like "class" always have a list of values,
	1028	even if there's only one value in the list.
	1029
	1030	* Added a number of multi-valued attributes defined in HTML5.
154 by Leonard Richardson The value of multi-valued attributes like class are always turned into a list, even if there's only one value.	1031
155 by Leonard Richardson Added a kind of hacky way to interpret the restriction class='foo bar'. Stop generating a space before the slash that closes an empty-element tag.	1032	* Stopped generating a space before the slash that closes an
	1033	empty-element tag. This may come back if I add a special XHTML mode
	1034	(http://www.w3.org/TR/xhtml1/#C_2), but right now it's pretty
	1035	useless.
	1036
152 by Leonard Richardson Better defined behavior when the user wants to search for a combination of text and tag-specific arguments. [bug=695312]	1037	* Passing text along with tag-specific arguments to a find* method:
	1038
	1039	find("a", text="Click here")
	1040
	1041	will find tags that contain the given text as their
	1042	.string. Previously, the tag-specific arguments were ignored and
	1043	only strings were searched.
	1044
150.1.8 by Leonard Richardson Added to NEWS.	1045	* Fixed a bug that caused the html5lib tree builder to build a
	1046	partially disconnected tree. Generally cleaned up the html5lib tree
	1047	builder.
	1048
155 by Leonard Richardson Added a kind of hacky way to interpret the restriction class='foo bar'. Stop generating a space before the slash that closes an empty-element tag.	1049	* If you restrict a multi-valued attribute like "class" to a string
	1050	that contains spaces, Beautiful Soup will only consider it a match
	1051	if the values correspond to that specific string.
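The multi-valued attribute rules in this release can be sketched as below,
assuming the bs4 package is installed. The markup is invented, and the
attrs dictionary is used for the searches because that form predates the
later class_ keyword.

```python
from bs4 import BeautifulSoup

markup = '<p class="body">one</p><p class="body strikeout">two</p>'
soup = BeautifulSoup(markup, "html.parser")
first, second = soup.find_all("p")

# "class" is always a list, even when it holds a single value.
single = first["class"]
double = second["class"]

# Restricting "class" to one value matches any tag that has that value.
by_one = soup.find_all("p", attrs={"class": "body"})

# A restriction containing spaces must match the attribute string exactly.
exact = soup.find_all("p", attrs={"class": "body strikeout"})
reordered = soup.find_all("p", attrs={"class": "strikeout body"})

print(single, double, len(by_one), len(exact), len(reordered))
```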
	1052
149 by Leonard Richardson Bumped version number.	1053	= 4.0.0b5 (20120209) =
138 by Leonard Richardson Rationalized the treatment of multi-valued HTML attributes such as 'class'	1054
	1055	* Rationalized Beautiful Soup's treatment of CSS class. A tag
	1056	belonging to multiple CSS classes is treated as having a list of
	1057	values for the 'class' attribute. Searching for a CSS class will
	1058	match any of the CSS classes.
	1059
	1060	This actually affects all attributes that the HTML standard defines
	1061	as taking multiple values (class, rel, rev, archive, accept-charset,
148 by Leonard Richardson Added bug reference.	1062	and headers), but 'class' is by far the most common. [bug=41034]
138 by Leonard Richardson Rationalized the treatment of multi-valued HTML attributes such as 'class'	1063
	1064	* If you pass anything other than a dictionary as the second argument
	1065	to one of the find* methods, it'll assume you want to use that
	1066	object to search against a tag's CSS classes. Previously this only
	1067	worked if you passed in a string.
	1068
140 by Leonard Richardson Fixed a bug that caused a crash when you passed a dictionary as an attribute value (possibly because you mistyped attrs). [bug=842419]	1069	* Fixed a bug that caused a crash when you passed a dictionary as an
	1070	attribute value (possibly because you mistyped "attrs"). [bug=842419]
	1071
144 by Leonard Richardson Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags like <meta charset="utf-8" />. [bug=837268]	1072	* Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags
	1073	like <meta charset="utf-8" />. [bug=837268]
	1074
146 by Leonard Richardson As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict.	1075	* If Unicode, Dammit can't figure out a consistent encoding for a
	1076	page, it will try each of its guesses again, with errors="replace"
	1077	instead of errors="strict". This may mean that some data gets
	1078	replaced with REPLACEMENT CHARACTER, but at least most of it will
	1079	get turned into Unicode. [bug=754903]
	1080
145 by Leonard Richardson Patched over a bug in html5lib (?) that was crashing Beautiful Soup on certain kinds of markup. [bug=838800]	1081	* Patched over a bug in html5lib (?) that was crashing Beautiful Soup
	1082	on certain kinds of markup. [bug=838800]
	1083
141 by Leonard Richardson Fixed a bug that wrecked the tree if you replaced an element with an empty string. [bug=728697]	1084	* Fixed a bug that wrecked the tree if you replaced an element with an
	1085	empty string. [bug=728697]
	1086
142 by Leonard Richardson Improved Unicode, Dammit's behavior when you give it Unicode to begin with.	1087	* Improved Unicode, Dammit's behavior when you give it Unicode to
	1088	begin with.
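The difference between errors="strict" and errors="replace" mentioned in
the Unicode, Dammit entry above comes from Python's own codec error
handlers. The sketch below uses only the standard library and is not
Unicode, Dammit's actual code path; the sample bytes are invented.

```python
# "café" encoded as Latin-1; the final byte is not valid UTF-8.
data = b"caf\xe9"

# With errors="strict", a wrong guess raises and nothing is decoded.
try:
    data.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", the undecodable byte becomes U+FFFD
# (REPLACEMENT CHARACTER) and the rest of the data survives.
lenient = data.decode("utf-8", errors="replace")

print(strict_failed, repr(lenient))
```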
	1089
134 by Leonard Richardson Moved the historical changelog into NEWS.	1090	= 4.0.0b4 (20120208) =
131 by Leonard Richardson Moved around a bunch of metadata.	1091
	1092	* Added BeautifulSoup.new_string() to go along with BeautifulSoup.new_tag()
	1093
	1094	* BeautifulSoup.new_tag() will follow the rules of whatever
	1095	tree-builder was used to create the original BeautifulSoup object. A
	1096	new <p> tag will look like "<p />" if the soup object was created to
	1097	parse XML, but it will look like "<p></p>" if the soup object was
	1098	created to parse HTML.
	1099
	1100	* We pass in strict=False to html.parser on Python 3, greatly
	1101	improving html.parser's ability to handle bad HTML.
	1102
	1103	* We also monkeypatch a serious bug in html.parser that made
	1104	strict=False disastrous on Python 3.2.2.
	1105
	1106	* Replaced the "substitute_html_entities" argument with the
133 by Leonard Richardson Added more detail to the NEWS.	1107	more general "formatter" argument.
131 by Leonard Richardson Moved around a bunch of metadata.	1108
	1109	* Bare ampersands and angle brackets are always converted to XML
	1110	entities unless the user prevents it.
	1111
133 by Leonard Richardson Added more detail to the NEWS.	1112	* Added PageElement.insert_before() and PageElement.insert_after(),
	1113	which let you put an element into the parse tree with respect to
	1114	some other element.
131 by Leonard Richardson Moved around a bunch of metadata.	1115
	1116	* Raise an exception when the user tries to do something nonsensical
	1117	like insert a tag into itself.
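A short sketch of new_tag(), new_string(), insert_before() and
insert_after() from the entries above, assuming the bs4 package is
installed. The markup and the inserted text are invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>stop</b>", "html.parser")

# Build a new <i> tag and place it just before the existing string.
tag = soup.new_tag("i")
tag.string = "Don't"
soup.b.string.insert_before(tag)
after_first = str(soup.b)

# Build a new string and place it just after the <i> tag.
soup.b.i.insert_after(soup.new_string(" ever "))
after_second = str(soup.b)

print(after_first)
print(after_second)
```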
	1118
122 by Leonard Richardson Documented today's changes.	1119
134 by Leonard Richardson Moved the historical changelog into NEWS.	1120	= 4.0.0b3 (20120203) =
126 by Leonard Richardson Package the docs with the code.	1121
	1122	Beautiful Soup 4 is a nearly-complete rewrite that removes Beautiful
	1123	Soup's custom HTML parser in favor of a system that lets you write a
	1124	little glue code and plug in any HTML or XML parser you want.
	1125
	1126	Beautiful Soup 4.0 comes with glue code for four parsers:
	1127
	1128	* Python's standard HTMLParser (html.parser in Python 3)
	1129	* lxml's HTML and XML parsers
	1130	* html5lib's HTML parser
	1131
	1132	HTMLParser is the default, but I recommend you install lxml if you
	1133	can.
	1134
	1135	For complete documentation, see the Sphinx documentation in
	1136	bs4/doc/source/. What follows is a summary of the changes from
	1137	Beautiful Soup 3.
	1138
	1139	=== The module name has changed ===
	1140
	1141	Previously you imported the BeautifulSoup class from a module also
	1142	called BeautifulSoup. To save keystrokes and make it clear which
	1143	version of the API is in use, the module is now called 'bs4':
	1144
	1145	>>> from bs4 import BeautifulSoup
	1146
	1147	=== It works with Python 3 ===
	1148
	1149	Beautiful Soup 3.1.0 worked with Python 3, but the parser it used was
	1150	so bad that it barely worked at all. Beautiful Soup 4 works with
	1151	Python 3, and since its parser is pluggable, you don't sacrifice
	1152	quality.
	1153
	1154	Special thanks to Thomas Kluyver and Ezio Melotti for getting Python 3
	1155	support to the finish line. Ezio Melotti is also to thank for greatly
	1156	improving the HTML parser that comes with Python 3.2.
	1157
	1158	=== CDATA sections are normal text, if they're understood at all. ===
	1159
	1160	Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
	1161	markup:
	1162
	1163	<p><![CDATA[foo]]></p> => <p></p>
	1164
	1165	A future version of html5lib will turn CDATA sections into text nodes,
	1166	but only within tags like <svg> and <math>:
	1167
<svg><![CDATA[foo]]></svg> => <svg>foo</svg>
	1169
	1170	The default XML parser (which uses lxml behind the scenes) turns CDATA
	1171	sections into ordinary text elements:
	1172
	1173	<p><![CDATA[foo]]></p> => <p>foo</p>
	1174
	1175	In theory it's possible to preserve the CDATA sections when using the
	1176	XML parser, but I don't see how to get it to work in practice.
	1177
	1178	=== Miscellaneous other stuff ===
	1179
	1180	If the BeautifulSoup instance has .is_xml set to True, an appropriate
	1181	XML declaration will be emitted when the tree is transformed into a
	1182	string:
	1183
<?xml version="1.0" encoding="utf-8"?>
<markup>
...
</markup>
1188
1189	The ['lxml', 'xml'] tree builder sets .is_xml to True; the other tree
1190	builders set it to False. If you want to parse XHTML with an HTML
1191	parser, you can set it manually.
1192
75.1.4 by Leonard Richardson Emit an XML declaration when appropriate.	1193
92 by Leonard Richardson Prep for beta release.	1194	= 3.2.0 =
	1195
	1196	The 3.1 series wasn't very useful, so I renamed the 3.0 series to 3.2
	1197	to make it obvious which one you should use.
	1198
1 by Leonard Richardson Initial (manual) import.	1199	= 3.1.0 =
	1200
A hybrid version that supports Python 2.4 and can be automatically
converted to run under Python 3.0. There are three
backwards-incompatible changes you should be aware of, but no new
features or deliberate behavior changes.
	1205
	1206	1. str() may no longer do what you want. This is because the meaning
	1207	of str() inverts between Python 2 and 3; in Python 2 it gives you a
	1208	byte string, in Python 3 it gives you a Unicode string.
	1209
The effect of this is that you can't pass an encoding to .__str__
anymore. Use encode() to get a byte string and decode() to get Unicode,
and you'll be ready (well, readier) for Python 3.
	1213
	1214	2. Beautiful Soup is now based on HTMLParser rather than SGMLParser,
	1215	which is gone in Python 3. There's some bad HTML that SGMLParser
	1216	handled but HTMLParser doesn't, usually to do with attribute values
	1217	that aren't closed or have brackets inside them:
	1218
	1219	<a href="foo</a>, </a><a href="bar">baz</a>
	1220	<a b="<a>">', '<a b="<a>"></a><a>"></a>
	1221
	1222	A later version of Beautiful Soup will allow you to plug in different
	1223	parsers to make tradeoffs between speed and the ability to handle bad
	1224	HTML.
	1225
3. In Python 3 (but not Python 2), HTMLParser converts entities within
attributes to the corresponding Unicode characters. In Python 2 it's
possible to parse this string and leave the &eacute; intact.

<a href="http://crummy.com?sacr&eacute;&bleu">

In Python 3, the &eacute; is always converted to \xe9 during
parsing.
	1234
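The encode()/decode() advice in point 1 can be illustrated as follows. This
sketch runs against the later bs4 package on Python 3, where methods with
the same two names exist; it does not reproduce Beautiful Soup 3.1.0 itself,
and the markup is invented.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>caf\xe9</p>", "html.parser")

# encode() always gives bytes in the encoding you name.
as_bytes = soup.p.encode("utf-8")

# decode() always gives a Unicode string.
as_text = soup.p.decode()

print(type(as_bytes).__name__, as_bytes)
print(type(as_text).__name__, as_text)
```

Calling the explicit methods avoids depending on what str() means in the
Python version at hand.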
	1235
	1236	= 3.0.7a =
	1237
	1238	Added an import that makes BS work in Python 2.3.
	1239
	1240
	1241	= 3.0.7 =
	1242
	1243	Fixed a UnicodeDecodeError when unpickling documents that contain
	1244	non-ASCII characters.
	1245
421.1.1 by Ville Skyttä Spelling fixes	1246	Fixed a TypeError that occurred in some circumstances when a tag
1 by Leonard Richardson Initial (manual) import.	1247	contained no text.
	1248
	1249	Jump through hoops to avoid the use of chardet, which can be extremely
	1250	slow in some circumstances. UTF-8 documents should never trigger the
	1251	use of chardet.
	1252
	1253	Whitespace is preserved inside <pre> and <textarea> tags that contain
	1254	nothing but whitespace.
	1255
	1256	Beautiful Soup can now parse a doctype that's scoped to an XML namespace.
	1257
	1258
	1259	= 3.0.6 =
	1260
	1261	Got rid of a very old debug line that prevented chardet from working.
	1262
Added a Tag.decompose() method that completely disconnects a tree or a
subset of a tree, breaking it up into bite-sized pieces that are
easy for the garbage collector to collect.
	1266
	1267	Tag.extract() now returns the tag that was extracted.
	1268
	1269	Tag.findNext() now does something with the keyword arguments you pass
	1270	it instead of dropping them on the floor.
	1271
	1272	Fixed a Unicode conversion bug.
	1273
	1274	Fixed a bug that garbled some <meta> tags when rewriting them.
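The extract() and decompose() entries above can be sketched like this. The
example uses the later bs4 package, which kept both method names, and
assumes it is installed; the markup is invented.

```python
from bs4 import BeautifulSoup

markup = '<a href="#">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

# extract() removes the tag from the tree and returns it.
i_tag = soup.i.extract()
extracted = str(i_tag)
remaining = str(soup.a)

# decompose() removes a tag and destroys it along with its contents.
soup.a.decompose()
emptied = str(soup)

print(extracted, remaining, repr(emptied))
```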
	1275
	1276
	1277	= 3.0.5 =
	1278
	1279	Soup objects can now be pickled, and copied with copy.deepcopy.
	1280
	1281	Tag.append now works properly on existing BS objects. (It wasn't
	1282	originally intended for outside use, but it can be now.) (Giles
	1283	Radford)
	1284
	1285	Passing in a nonexistent encoding will no longer crash the parser on
	1286	Python 2.4 (John Nagle).
	1287
Fixed an underlying bug in SGMLParser, which treats character codes up
to 255 as ASCII instead of stopping at 127 (John Nagle).
	1290
	1291	Entities are converted more consistently to Unicode characters.
	1292
	1293	Entity references in attribute values are now converted to Unicode
	1294	characters when appropriate. Numeric entities are always converted,
	1295	because SGMLParser always converts them outside of attribute values.
	1296
	1297	ALL_ENTITIES happens to just be the XHTML entities, so I renamed it to
	1298	XHTML_ENTITIES.
	1299
	1300	The regular expression for bare ampersands was too loose. In some
	1301	cases ampersands were not being escaped. (Sam Ruby?)
	1302
	1303	Non-breaking spaces and other special Unicode space characters are no
	1304	longer folded to ASCII spaces. (Robert Leftwich)
	1305
	1306	Information inside a TEXTAREA tag is now parsed literally, not as HTML
	1307	tags. TEXTAREA now works exactly the same way as SCRIPT. (Zephyr Fang)
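Pickling and deep-copying a soup object, as described in the first entry of
this release, might look like the sketch below. It uses the later bs4
package with Python's built-in parser rather than Beautiful Soup 3.0.5
itself, and the markup is invented.

```python
import copy
import pickle

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Some <b>bold</b> text</p>", "html.parser")

# A deep copy is an independent tree: changing it leaves the original alone.
clone = copy.deepcopy(soup)
clone.b.string = "changed"

# A pickle round trip rebuilds an equivalent document.
restored = pickle.loads(pickle.dumps(soup))

print(soup)
print(clone)
print(restored)
```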
	1308
	1309	= 3.0.4 =
	1310
1311	Fixed a bug that crashed Unicode conversion in some cases.
1312
1313	Fixed a bug that prevented UnicodeDammit from being used as a
1314	general-purpose data scrubber.
1315
1316	Fixed some unit test failures when running against Python 2.5.
1317
1318	When considering whether to convert smart quotes, UnicodeDammit now
1319	looks at the original encoding in a case-insensitive way.
134 by Leonard Richardson Moved the historical changelog into NEWS.	1320
	1321	= 3.0.3 (20060606) =
	1322
	1323	Beautiful Soup is now usable as a way to clean up invalid XML/HTML (be
	1324	sure to pass in an appropriate value for convertEntities, or XML/HTML
	1325	entities might stick around that aren't valid in HTML/XML). The result
	1326	may not validate, but it should be good enough to not choke a
	1327	real-world XML parser. Specifically, the output of a properly
	1328	constructed soup object should always be valid as part of an XML
	1329	document, but parts may be missing if they were missing in the
	1330	original. As always, if the input is valid XML, the output will also
	1331	be valid.
	1332
	1333	= 3.0.2 (20060602) =
	1334
	1335	Previously, Beautiful Soup correctly handled attribute values that
	1336	contained embedded quotes (sometimes by escaping), but not other kinds
	1337	of XML character. Now, it correctly handles or escapes all special XML
	1338	characters in attribute values.
	1339
	1340	I aliased methods to the 2.x names (fetch, find, findText, etc.) for
	1341	backwards compatibility purposes. Those names are deprecated and if I
	1342	ever do a 4.0 I will remove them. I will, I tell you!
	1343
	1344	Fixed a bug where the findAll method wasn't passing along any keyword
	1345	arguments.
	1346
	1347	When run from the command line, Beautiful Soup now acts as an HTML
	1348	pretty-printer, not an XML pretty-printer.
	1349
	1350	= 3.0.1 (20060530) =
	1351
	1352	Reintroduced the "fetch by CSS class" shortcut. I thought keyword
	1353	arguments would replace it, but they don't. You can't call soup('a',
	1354	class='foo') because class is a Python keyword.
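The "fetch by CSS class" shortcut can be sketched as below: a second
positional argument that is not a dictionary is matched against the class
attribute. The example assumes the later bs4 package, where calling a soup
object still runs a search; the markup is invented.

```python
from bs4 import BeautifulSoup

markup = '<a class="foo">one</a><a class="bar">two</a>'
soup = BeautifulSoup(markup, "html.parser")

# soup("a", class="foo") would be a SyntaxError, so the class name is
# passed as the second positional argument instead.
by_shortcut = [a.string for a in soup("a", "foo")]

# The equivalent long form uses an attribute dictionary.
by_dict = [a.string for a in soup("a", {"class": "foo"})]

print(by_shortcut, by_dict)
```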
	1355
	1356	If Beautiful Soup encounters a meta tag that declares the encoding,
	1357	but a SoupStrainer tells it not to parse that tag, Beautiful Soup will
	1358	no longer try to rewrite the meta tag to mention the new
	1359	encoding. Basically, this makes SoupStrainers work in real-world
	1360	applications instead of crashing the parser.
	1361
	1362	= 3.0.0 "Who would not give all else for two p" (20060528) =
	1363
	1364	This release is not backward-compatible with previous releases. If
	1365	you've got code written with a previous version of the library, go
	1366	ahead and keep using it, unless one of the features mentioned here
	1367	really makes your life easier. Since the library is self-contained,
	1368	you can include an old copy of the library in your old applications,
	1369	and use the new version for everything else.
	1370
	1371	The documentation has been rewritten and greatly expanded with many
	1372	more examples.
	1373
	1374	Beautiful Soup autodetects the encoding of a document (or uses the one
	1375	you specify), and converts it from its native encoding to
	1376	Unicode. Internally, it only deals with Unicode strings. When you
	1377	print out the document, it converts to UTF-8 (or another encoding you
	1378	specify). [Doc reference]
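
The decode-to-Unicode, encode-on-output flow can be sketched roughly
as follows. This is not Beautiful Soup's detection logic, which isn't
described here; to_unicode and its list of candidate encodings are
assumptions made for illustration.

```python
def to_unicode(data, candidates=("utf-8", "windows-1252")):
    """Decode bytes using the first candidate encoding that works.

    Returns (text, encoding). A sketch only: real detection would use
    more evidence than trial decoding.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails.
    return data.decode("latin-1"), "latin-1"

raw = b"caf\xe9"                 # not valid UTF-8
text, detected = to_unicode(raw)
print(detected)
print(text.encode("utf-8"))      # output as UTF-8
```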
	1379
	1380	It's now easy to make large-scale changes to the parse tree without
	1381	screwing up the navigation members. The methods are extract,
	1382	replaceWith, and insert. [Doc reference. See also Improving Memory
	1383	Usage with extract]
1384
1385	Passing True in as an attribute value gives you tags that have any
1386	value for that attribute. You don't have to create a regular
1387	expression. Passing None for an attribute value gives you tags that
1388	don't have that attribute at all.
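
The True and None rules can be sketched on plain dictionaries. The
attribute_matches function is hypothetical and only models the
behavior described above.

```python
def attribute_matches(tag_attrs, name, wanted):
    """Check one attribute of a tag (a plain dict here) against a criterion."""
    if wanted is True:
        return name in tag_attrs          # any value will do
    if wanted is None:
        return name not in tag_attrs      # attribute must be absent
    return tag_attrs.get(name) == wanted  # exact string match

link = {"href": "http://example.com/", "id": "home"}
print(attribute_matches(link, "href", True))
print(attribute_matches(link, "title", None))
print(attribute_matches(link, "id", "home"))
```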
1389
1390	Tag objects now know whether or not they're self-closing. This avoids
1391	the problem where Beautiful Soup thought that tags like <BR /> were
1392	self-closing even in XML documents. You can customize the self-closing
1393	tags for a parser object by passing them in as a list of
1394	selfClosingTags: you don't have to subclass anymore.
1395
1396	There's a new built-in parser, MinimalSoup, which has most of
1397	BeautifulSoup's HTML-specific rules, but no tag nesting rules. [Doc
1398	reference]
1399
1400	You can use a SoupStrainer to tell Beautiful Soup to parse only part
1401	of a document. This saves time and memory, often making Beautiful Soup
1402	about as fast as a custom-built SGMLParser subclass. [Doc reference,
1403	SoupStrainer reference]
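
The idea of handling only the part of a document you care about can
be illustrated without Beautiful Soup. The sketch below uses
html.parser from the standard library; LinkCollector is hypothetical
and is not how SoupStrainer is implemented.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> tags are of interest; nothing is stored for the rest.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

collector = LinkCollector()
collector.feed('<p>See <a href="/one">one</a> and <b>not</b> '
               '<a href="/two">two</a>.</p>')
collector.close()
print(collector.links)
```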
1404
1405	You can (usually) use keyword arguments instead of passing a
1406	dictionary of attributes to a search method. That is, you can replace
1407	soup(args={"id" : "5"}) with soup(id="5"). You can still use args if
1408	(for instance) you need to find an attribute whose name clashes with
1409	the name of an argument to findAll. [Doc reference: **kwargs attrs]
1410
1411	The method names have changed to the better method names used in
1412	Rubyful Soup. Instead of find methods and fetch methods, there are
1413	only find methods. Instead of a scheme where you can't remember which
1414	method finds one element and which one finds them all, we have find
1415	and findAll. In general, if the method name mentions All or a plural
noun (eg. findNextSiblings), then it finds many elements.
Otherwise, it only finds one element. [Doc reference]
1418
Some of the arguments have been renamed for clarity. For instance,
avoidParserProblems is now parserMassage.
1421
Beautiful Soup no longer implements a feed method. You need to pass a
string or a filehandle into the soup constructor; you can't supply it
with feed after the soup has been created. There is still a feed
method, but it's the one implemented by SGMLParser, and calling it
will bypass Beautiful Soup and cause problems.
1427
1428	The NavigableText class has been renamed to NavigableString. There is
1429	no NavigableUnicodeString anymore, because every string inside a
1430	Beautiful Soup parse tree is a Unicode string.
1431
1432	findText and fetchText are gone. Just pass a text argument into find
1433	or findAll.
1434
1435	Null was more trouble than it was worth, so I got rid of it. Anything
1436	that used to return Null now returns None.
1437
1438	Special XML constructs like comments and CDATA now have their own
1439	NavigableString subclasses, instead of being treated as oddly-formed
1440	data. If you parse a document that contains CDATA and write it back
1441	out, the CDATA will still be there.
1442
1443	When you're parsing a document, you can get Beautiful Soup to convert
1444	XML or HTML entities into the corresponding Unicode characters. [Doc
1445	reference]
1446
1447	= 2.1.1 (20050918) =
1448
1449	Fixed a serious performance bug in BeautifulStoneSoup which was
1450	causing parsing to be incredibly slow.
1451
1452	Corrected several entities that were previously being incorrectly
1453	translated from Microsoft smart-quote-like characters.
1454
1455	Fixed a bug that was breaking text fetch.
1456
1457	Fixed a bug that crashed the parser when text chunks that look like
1458	HTML tag names showed up within a SCRIPT tag.
1459
1460	THEAD, TBODY, and TFOOT tags are now nestable within TABLE
1461	tags. Nested tables should parse more sensibly now.
1462
1463	BASE is now considered a self-closing tag.
1464
1465	= 2.1.0 "Game, or any other dish?" (20050504) =
1466
1467	Added a wide variety of new search methods which, given a starting
1468	point inside the tree, follow a particular navigation member (like
1469	nextSibling) over and over again, looking for Tag and NavigableText
1470	objects that match certain criteria. The new methods are findNext,
1471	fetchNext, findPrevious, fetchPrevious, findNextSibling,
1472	fetchNextSiblings, findPreviousSibling, fetchPreviousSiblings,
1473	findParent, and fetchParents. All of these use the same basic code
1474	used by first and fetch, so you can pass your weird ways of matching
1475	things into these methods.
1476
1477	The fetch method and its derivatives now accept a limit argument.
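
A limit argument of this kind can be sketched with a generator and
itertools.islice. The fetch function here is a hypothetical stand-in
that works on a flat list, not the real method.

```python
from itertools import islice

def fetch(items, matches, limit=None):
    """Return the items for which matches(item) is true, at most `limit`."""
    found = (item for item in items if matches(item))
    # islice stops pulling from the generator once `limit` results have
    # been produced; limit=None means no cap.
    return list(islice(found, limit))

tags = ["a", "b", "a", "p", "a"]
print(fetch(tags, lambda t: t == "a"))
print(fetch(tags, lambda t: t == "a", limit=2))
```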
1478
1479	You can now pass keyword arguments when calling a Tag object as though
1480	it were a method.
1481
1482	Fixed a bug that caused all hand-created tags to share a single set of
1483	attributes.
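
The changelog doesn't say what caused this bug. One common way for
objects to end up sharing state in Python is a mutable default
argument, sketched below with hypothetical SharedTag and Tag classes.

```python
class SharedTag:
    # Bug: the default dict is created once and reused by every call.
    def __init__(self, name, attrs={}):
        self.name = name
        self.attrs = attrs

class Tag:
    # Fix: build a fresh dict for each instance.
    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs) if attrs is not None else {}

a, b = SharedTag("a"), SharedTag("b")
a.attrs["href"] = "/"
print(b.attrs)        # b picked up a's attribute

c, d = Tag("a"), Tag("b")
c.attrs["href"] = "/"
print(d.attrs)        # d is unaffected
```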
1484
1485	= 2.0.3 (20050501) =
1486
1487	Fixed Python 2.2 support for iterators.
1488
1489	Fixed a bug that gave the wrong representation to tags within quote
1490	tags like <script>.
1491
1492	Took some code from Mark Pilgrim that treats CDATA declarations as
1493	data instead of ignoring them.
1494
1495	Beautiful Soup's setup.py will now do an install even if the unit
1496	tests fail. It won't build a source distribution if the unit tests
1497	fail, so I can't release a new version unless they pass.
1498
1499	= 2.0.2 (20050416) =
1500
1501	Added the unit tests in a separate module, and packaged it with
1502	distutils.
1503
1504	Fixed a bug that sometimes caused renderContents() to return a Unicode
1505	string even if there was no Unicode in the original string.
1506
1507	Added the done() method, which closes all of the parser's open
1508	tags. It gets called automatically when you pass in some text to the
1509	constructor of a parser class; otherwise you must call it yourself.
1510
1511	Reinstated some backwards compatibility with 1.x versions: referencing
1512	the string member of a NavigableText object returns the NavigableText
1513	object instead of throwing an error.
1514
1515	= 2.0.1 (20050412) =
1516
1517	Fixed a bug that caused bad results when you tried to reference a tag
1518	name shorter than 3 characters as a member of a Tag, eg. tag.table.td.
1519
1520	Made sure all Tags have the 'hidden' attribute so that an attempt to
1521	access tag.hidden doesn't spawn an attempt to find a tag named
1522	'hidden'.
1523
1524	Fixed a bug in the comparison operator.
1525
= 2.0.0 "Who cares for fish?" (20050410) =
1527
1528	Beautiful Soup version 1 was very useful but also pretty stupid. I
1529	originally wrote it without noticing any of the problems inherent in
1530	trying to build a parse tree out of ambiguous HTML tags. This version
1531	solves all of those problems to my satisfaction. It also adds many new
1532	clever things to make up for the removal of the stupid things.
1533
1534	== Parsing ==
1535
1536	The parser logic has been greatly improved, and the BeautifulSoup
1537	class should much more reliably yield a parse tree that looks like
1538	what the page author intended. For a particular class of odd edge
1539	cases that now causes problems, there is a new class,
1540	ICantBelieveItsBeautifulSoup.
1541
1542	By default, Beautiful Soup now performs some cleanup operations on
1543	text before parsing it. This is to avoid common problems with bad
1544	definitions and self-closing tags that crash SGMLParser. You can
1545	provide your own set of cleanup operations, or turn it off
1546	altogether. The cleanup operations include fixing self-closing tags
1547	that don't close, and replacing Microsoft smart quotes and similar
1548	characters with their HTML entity equivalents.
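
A cleanup pass of this kind can be sketched as a list of pattern and
replacement pairs applied in order. The rules below are assumptions
chosen for illustration, not the library's actual defaults, and
clean_markup is a hypothetical name.

```python
import re

# Hypothetical cleanup rules: (compiled pattern, replacement) pairs.
CLEANUP_RULES = [
    # "<br/>" -> "<br />": put a space before the slash of a self-closing tag.
    (re.compile(r"<([^<>/\s]+)/>"), r"<\1 />"),
    # Curly double quotes -> HTML entities.
    (re.compile("\u201c"), "&ldquo;"),
    (re.compile("\u201d"), "&rdquo;"),
]

def clean_markup(markup, rules=CLEANUP_RULES):
    """Apply each cleanup rule in order; pass rules=[] to turn cleanup off."""
    for pattern, replacement in rules:
        markup = pattern.sub(replacement, markup)
    return markup

print(clean_markup("<p>\u201cHi\u201d<br/></p>"))
```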
1549
1550	You can now get a pretty-print version of parsed HTML to get a visual
1551	picture of how Beautiful Soup parses it, with the Tag.prettify()
1552	method.
1553
1554	== Strings and Unicode ==
1555
1556	There are separate NavigableText subclasses for ASCII and Unicode
1557	strings. These classes directly subclass the corresponding base data
1558	types. This means you can treat NavigableText objects as strings
1559	instead of having to call methods on them to get the strings.
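
The benefit of subclassing the string type directly can be sketched
like this. The example assumes Python 3, where there is a single str
type, so it shows one class rather than separate ASCII and Unicode
subclasses; the parent member is included only for illustration.

```python
class NavigableText(str):
    """A string that also remembers where it sits in a parse tree."""

    def __new__(cls, value, parent=None):
        text = super().__new__(cls, value)
        text.parent = parent   # navigation member; hypothetical here
        return text

title = NavigableText("Beautiful Soup", parent="title")
# Ordinary string operations work directly, with no accessor method.
print(title.upper())
print(title.startswith("Beautiful"))
print(title.parent)
```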
1560
1561	str() on a Tag always returns a string, and unicode() always returns
1562	Unicode. Previously it was inconsistent.
1563
1564	== Tree traversal ==
1565
1566	In a first() or fetch() call, the tag name or the desired value of an
1567	attribute can now be any of the following:
1568
1569	* A string (matches that specific tag or that specific attribute value)
1570	* A list of strings (matches any tag or attribute value in the list)
1571	* A compiled regular expression object (matches any tag or attribute
1572	value that matches the regular expression)
1573	* A callable object that takes the Tag object or attribute value as a
1574	string. It returns None/false/empty string if the given string
1575	doesn't match, and any other value if it does.
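
The four kinds of criteria listed above can be sketched as a single
dispatch function. This is a simplified model that only compares
strings; matches is a hypothetical name, and using search() for
regular expressions is an assumption.

```python
import re

def matches(criterion, value):
    """Return True if value satisfies criterion, in any of the four forms."""
    if callable(criterion):
        return bool(criterion(value))
    if hasattr(criterion, "search"):            # compiled regular expression
        return criterion.search(value) is not None
    if isinstance(criterion, (list, tuple)):
        return value in criterion
    return value == criterion                   # plain string

print(matches("td", "td"))
print(matches(["td", "th"], "th"))
print(matches(re.compile("^t[dh]$"), "tr"))
print(matches(lambda name: len(name) == 2, "td"))
```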
1576
1577	This is much easier to use than SQL-style wildcards (see, regular
1578	expressions are good for something). Because of this, I took out
1579	SQL-style wildcards. I'll put them back if someone complains, but
1580	their removal simplifies the code a lot.
1581
1582	You can use fetch() and first() to search for text in the parse tree,
1583	not just tags. There are new alias methods fetchText() and firstText()
1584	designed for this purpose. As with searching for tags, you can pass in
1585	a string, a regular expression object, or a method to match your text.
1586
1587	If you pass in something besides a map to the attrs argument of
1588	fetch() or first(), Beautiful Soup will assume you want to match that
1589	thing against the "class" attribute. When you're scraping
1590	well-structured HTML, this makes your code a lot cleaner.
1591
1592	1.x and 2.x both let you call a Tag object as a shorthand for
1593	fetch(). For instance, foo("bar") is a shorthand for
1594	foo.fetch("bar"). In 2.x, you can also access a specially-named member
1595	of a Tag object as a shorthand for first(). For instance, foo.barTag
1596	is a shorthand for foo.first("bar"). By chaining these shortcuts you
1597	traverse a tree in very little code: for header in
1598	soup.bodyTag.pTag.tableTag('th'):
1599
1600	If an element relationship (like parent or next) doesn't apply to a
tag, it'll now show up as Null instead of None. first() will also return
1602	Null if you ask it for a nonexistent tag. Null is an object that's
1603	just like None, except you can do whatever you want to it and it'll
1604	give you Null instead of throwing an error.
1605
1606	This lets you do tree traversals like soup.htmlTag.headTag.titleTag
1607	without having to worry if the intermediate stages are actually
1608	there. Previously, if there was no 'head' tag in the document, headTag
1609	in that instance would have been None, and accessing its 'titleTag'
1610	member would have thrown an AttributeError. Now, you can get what you
1611	want when it exists, and get Null when it doesn't, without having to
1612	do a lot of conditionals checking to see if every stage is None.
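
The behavior described here is the null object pattern. A minimal
sketch follows, assuming Python 3; NullType and Node are hypothetical
names, and the real Null may have differed in its details.

```python
class NullType:
    """Falsy placeholder that absorbs attribute access and calls."""

    def __getattr__(self, name):
        return self          # Null.anything is still Null

    def __call__(self, *args, **kwargs):
        return self          # Null(...) is still Null

    def __bool__(self):
        return False

    def __repr__(self):
        return "Null"

Null = NullType()

class Node:
    def __init__(self, **children):
        self._children = children

    def __getattr__(self, name):
        # Missing children come back as Null rather than raising.
        return self._children.get(name, Null)

soup = Node(htmlTag=Node(bodyTag=Node()))   # a document with no head
title = soup.htmlTag.headTag.titleTag
print(title)
if not title:
    print("no title")
```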
1613
1614	There are two new relations between page elements: previousSibling and
1615	nextSibling. They reference the previous and next element at the same
1616	level of the parse tree. For instance, if you have HTML like this:
1617
1618	<p><ul><li>Foo<br /><li>Bar</ul>
1619
1620	The first 'li' tag has a previousSibling of Null and its nextSibling
1621	is the second 'li' tag. The second 'li' tag has a nextSibling of Null
1622	and its previousSibling is the first 'li' tag. The previousSibling of
1623	the 'ul' tag is the first 'p' tag. The nextSibling of 'Foo' is the
1624	'br' tag.
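
The sibling relations above can be reproduced on a hand-built tree.
This sketch uses a hypothetical Element class for tags and text
alike, assumes the tree shape implied by the description (the 'p' tag
closes before the 'ul' tag opens), and uses None where 2.0 would give
Null.

```python
class Element:
    def __init__(self, name, children=()):
        self.name = name
        self.contents = list(children)
        self.previousSibling = None
        self.nextSibling = None
        # Link each child to its neighbours at the same level.
        for left, right in zip(self.contents, self.contents[1:]):
            left.nextSibling = right
            right.previousSibling = left

foo, br, bar = Element("Foo"), Element("br"), Element("Bar")
li1 = Element("li", [foo, br])
li2 = Element("li", [bar])
ul = Element("ul", [li1, li2])
p = Element("p")
root = Element("[document]", [p, ul])

print(li1.previousSibling, li1.nextSibling is li2)
print(ul.previousSibling is p)
print(foo.nextSibling is br)
```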
1625
1626	I took out the ability to use fetch() to find tags that have a
1627	specific list of contents. See, I can't even explain it well. It was
1628	really difficult to use, I never used it, and I don't think anyone
1629	else ever used it. To the extent anyone did, they can probably use
1630	fetchText() instead. If it turns out someone needs it I'll think of
1631	another solution.
1632
1633	== Tree manipulation ==
1634
1635	You can add new attributes to a tag, and delete attributes from a
1636	tag. In 1.x you could only change a tag's existing attributes.
1637
1638	== Porting Considerations ==
1639
1640	There are three changes in 2.0 that break old code:
1641
1642	In the post-1.2 release you could pass in a function into fetch(). The
1643	function took a string, the tag name. In 2.0, the function takes the
1644	actual Tag object.
1645
It's no longer possible to pass in SQL-style wildcards to fetch(). Use a
regular expression instead.
1648
1649	The different parsing algorithm means the parse tree may not be shaped
1650	like you expect. This will only actually affect you if your code uses
1651	one of the affected parts. I haven't run into this problem yet while
1652	porting my code.
1653
1654	= Between 1.2 and 2.0 =
1655
1656	This is the release to get if you want Python 1.5 compatibility.
1657
1658	The desired value of an attribute can now be any of the following:
1659
1660	* A string
1661	* A string with SQL-style wildcards
1662	* A compiled RE object
1663	* A callable that returns None/false/empty string if the given value
1664	doesn't match, and any other value otherwise.
1665
1666	This is much easier to use than SQL-style wildcards (see, regular
1667	expressions are good for something). Because of this, I no longer
1668	recommend you use SQL-style wildcards. They may go away in a future
1669	release to clean up the code.
1670
1671	Made Beautiful Soup handle processing instructions as text instead of
1672	ignoring them.
1673
1674	Applied patch from Richie Hindle (richie at entrian dot com) that
1675	makes tag.string a shorthand for tag.contents[0].string when the tag
1676	has only one string-owning child.
1677
1678	Added still more nestable tags. The nestable tags thing won't work in
1679	a lot of cases and needs to be rethought.
1680
1681	Fixed an edge case where searching for "%foo" would match any string
1682	shorter than "foo".
1683
1684	= 1.2 "Who for such dainties would not stoop?" (20040708) =
1685
1686	Applied patch from Ben Last (ben at benlast dot com) that made
1687	Tag.renderContents() correctly handle Unicode.
1688
Made BeautifulStoneSoup even dumber: it no longer implicitly closes a
tag when another tag of the same type is encountered, only when an
actual closing tag is encountered. This change is courtesy of Fuzzy
(mike at pcblokes dot com). BeautifulSoup still works as before.
1693
1694	= 1.1 "Swimming in a hot tureen" =
1695
1696	Added more 'nestable' tags. Changed popping semantics so that when a
1697	nestable tag is encountered, tags are popped up to the previously
1698	encountered nestable tag (of whatever kind). I will revert this if
1699	enough people complain, but it should make more people's lives easier
1700	than harder. This enhancement was suggested by Anthony Baxter (anthony
1701	at interlink dot com dot au).
1702
1703	= 1.0 "So rich and green" (20040420) =
1704
1705	Initial release.