  >>> for kind, data, pos in stream:
  ...     print kind, `data`, pos
  START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0)
  TEXT u'Some text and ' (None, 1, 17)
  START (QName(u'a'), Attrs([(QName(u'href'), u'http://example.org/')])) (None, 1, 31)
  TEXT u'a link' (None, 1, 61)
  END QName(u'a') (None, 1, 67)
  TEXT u'.' (None, 1, 71)
  START (QName(u'br'), Attrs()) (None, 1, 72)
  END QName(u'br') (None, 1, 77)
  END QName(u'p') (None, 1, 77)
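The loop above relies only on each event being a ``(kind, data, pos)`` tuple. As a rough, library-free illustration, the same unpacking can be used to tally events by kind or to collect the text. This sketch is written for Python 3 (unlike the Python 2 sessions in this document), and plain strings and tuples stand in for Genshi's kind constants, ``QName`` and ``Attrs`` objects; the helper names are made up for the example.

```python
from collections import Counter

# Hypothetical stand-in events: plain strings and tuples replace Genshi's
# kind constants, QName and Attrs objects.
events = [
    ("START", ("p", [("class", "intro")]), (None, 1, 0)),
    ("TEXT", "Some text and ", (None, 1, 17)),
    ("START", ("a", [("href", "http://example.org/")]), (None, 1, 31)),
    ("TEXT", "a link", (None, 1, 61)),
    ("END", "a", (None, 1, 67)),
    ("TEXT", ".", (None, 1, 71)),
    ("START", ("br", []), (None, 1, 72)),
    ("END", "br", (None, 1, 77)),
    ("END", "p", (None, 1, 77)),
]


def count_kinds(stream):
    """Count how many events of each kind the stream contains."""
    return Counter(kind for kind, data, pos in stream)


def text_of(stream):
    """Concatenate the data of all TEXT events."""
    return "".join(data for kind, data, pos in stream if kind == "TEXT")


counts = count_kinds(events)
plain_text = text_of(events)
```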
  stream = stream | noop | HTMLSanitizer()

For more information about the built-in filters, see `Stream Filters`_.

.. _`Stream Filters`: filters.html
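To show how chaining filters with ``|`` can work in principle, here is a minimal sketch in Python 3. ``PipeStream`` and ``drop_comments`` are hypothetical names invented for this example, and the class is not Genshi's actual ``Stream`` implementation; it only assumes that a filter is a callable taking a stream and yielding events.

```python
class PipeStream:
    """Hypothetical minimal stream wrapper; not Genshi's actual Stream class."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, func):
        # Applying a filter wraps its (lazy) result in a new stream, so that
        # further filters can be chained with another "|".
        return PipeStream(func(self))


def noop(stream):
    """A filter that passes every event through unchanged."""
    for event in stream:
        yield event


def drop_comments(stream):
    """A filter that removes COMMENT events (illustrative only)."""
    for kind, data, pos in stream:
        if kind != "COMMENT":
            yield kind, data, pos


events = [
    ("TEXT", "kept", (None, 1, 0)),
    ("COMMENT", "dropped", (None, 1, 4)),
]
filtered = list(PipeStream(events) | noop | drop_comments)
```

Because each filter is a generator, nothing is processed until the final stream is iterated.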
Serialization means producing some kind of textual output from a stream of
events, which you'll need when you want to transmit or store the results of
generating or otherwise processing markup.

The ``Stream`` class provides two methods for serialization: ``serialize()`` and
``render()``. The former is a generator that yields chunks of ``Markup`` objects
(which are basically unicode strings that are considered safe for output on the
web). The latter returns a single string, by default UTF-8 encoded.
Here's the output from ``serialize()``::
  >>> print stream | HTMLSanitizer() | TextSerializer()
  Some text and a link.
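To make the difference between a chunk-yielding ``serialize()`` and a string-returning ``render()`` more concrete, the following Python 3 sketch implements both as plain functions over stand-in event tuples. It is an assumption-laden illustration, not Genshi's serializer code: the functions only handle ``START``, ``END`` and ``TEXT`` events and use ``html.escape`` from the standard library for escaping.

```python
from html import escape


def serialize(events):
    """Yield the output piece by piece, one chunk per event."""
    for kind, data, pos in events:
        if kind == "START":
            tag, attrs = data
            attr_text = "".join(
                ' %s="%s"' % (name, escape(value)) for name, value in attrs
            )
            yield "<%s%s>" % (tag, attr_text)
        elif kind == "END":
            yield "</%s>" % data
        elif kind == "TEXT":
            yield escape(data)


def render(events, encoding="utf-8"):
    """Join all chunks and return a single encoded byte string."""
    return "".join(serialize(events)).encode(encoding)


events = [
    ("START", ("a", [("href", "http://example.org/")]), (None, 1, 0)),
    ("TEXT", "a link", (None, 1, 30)),
    ("END", "a", (None, 1, 36)),
]
chunks = list(serialize(events))
output = render(events)
```

The generator form lets a caller write chunks to a socket or file as they are produced, while the joined form is convenient when the whole document is needed at once.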
Serialization Options
---------------------

Both ``serialize()`` and ``render()`` support additional keyword arguments that
are passed through to the initializer of the serializer class. The following
options are supported by the built-in serializers:
  Whether the serializer should remove trailing spaces and empty lines. Defaults

  (This option is not available for serialization to plain text.)
  A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and
  system identifier of a ``DOCTYPE`` declaration to prepend to the generated
  output. If provided, this declaration will override any ``DOCTYPE``
  declaration in the stream.

  (This option is not available for serialization to plain text.)
``namespace_prefixes``
  The namespace prefixes to use for namespaces that are not bound to a prefix
  in the stream itself.

  (This option is not available for serialization to HTML or plain text.)
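The pass-through of keyword arguments to the serializer's initializer can be sketched as follows in Python 3. Everything here is hypothetical: ``SketchSerializer`` is not one of the built-in serializers, and ``strip_whitespace`` is an assumed option name chosen to match the whitespace behaviour described above.

```python
class SketchSerializer:
    """Hypothetical serializer; not one of Genshi's built-in classes."""

    def __init__(self, strip_whitespace=True):
        # Assumed option name for the whitespace behaviour described above.
        self.strip_whitespace = strip_whitespace

    def __call__(self, events):
        text = "".join(data for kind, data, pos in events if kind == "TEXT")
        if self.strip_whitespace:
            # Remove trailing spaces, then drop lines that are empty.
            lines = (line.rstrip() for line in text.splitlines())
            text = "\n".join(line for line in lines if line)
        yield text


def serialize(events, serializer_class=SketchSerializer, **kwargs):
    # Extra keyword arguments go straight to the serializer's initializer.
    return serializer_class(**kwargs)(events)


def render(events, encoding="utf-8", **kwargs):
    # render() forwards the same options and joins the chunks.
    return "".join(serialize(events, **kwargs)).encode(encoding)


events = [
    ("TEXT", "first   \n\n", (None, 1, 0)),
    ("TEXT", "second", (None, 3, 0)),
]
stripped = render(events)
untouched = render(events, strip_whitespace=False)
```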
  >>> from genshi import Stream
  >>> substream = Stream(list(stream.select('a')))
  <genshi.core.Stream object at ...>
  >>> print substream
  <a href="http://example.org/">a link</a>
  >>> print substream.select('@href')
  http://example.org/
  >>> print substream.select('text()')
See `Using XPath in Genshi`_ for more information about the XPath support in

.. _`Using XPath in Genshi`: xpath.html
Every event in a stream is of one of several *kinds*, which also determines
what the ``data`` item of the event tuple looks like. The different kinds of
events are documented below.

.. note:: The ``data`` item is generally immutable. If the data is to be
   modified when processing a stream, it must be replaced by a new tuple.
   Effectively, this means the entire event tuple is immutable.
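In practice, the note above means a filter yields fresh tuples instead of changing events in place. The following Python 3 sketch shows that pattern with stand-in events (plain strings and tuples rather than Genshi's own types); ``upper_text`` and ``add_attr`` are hypothetical filter names used only for illustration.

```python
def upper_text(stream):
    """Yield a new event tuple for every TEXT event, passing others through."""
    for kind, data, pos in stream:
        if kind == "TEXT":
            yield kind, data.upper(), pos
        else:
            yield kind, data, pos


def add_attr(stream, name, value):
    """Append an attribute to every START event without mutating its data."""
    for kind, data, pos in stream:
        if kind == "START":
            tag, attrs = data
            # Build a new data tuple rather than modifying the existing one.
            data = (tag, attrs + ((name, value),))
        yield kind, data, pos


events = [
    ("START", ("p", (("class", "intro"),)), (None, 1, 0)),
    ("TEXT", "hi", (None, 1, 3)),
    ("END", "p", (None, 1, 5)),
]
result = list(add_attr(upper_text(events), "id", "x"))
```

The original ``events`` list is left untouched, so the same source stream can be fed through other filters afterwards.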
The opening tag of an element.

For this kind of event, the ``data`` item is a tuple of the form
``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the
qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing
the attribute names and values associated with the tag (excluding namespace

  START, (QName(u'p'), Attrs([(u'class', u'intro')])), pos
The closing tag of an element.

The ``data`` item of end events consists of just a ``QName`` instance
describing the qualified name of the tag::

  END, QName(u'p'), pos

Character data outside of elements and comments.

For text events, the ``data`` item should be a unicode object::

  TEXT, u'Hello, world!', pos
The start of a namespace mapping, binding a namespace prefix to a URI.

The ``data`` item of this kind of event is a tuple of the form
``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the
full URI to which the prefix is bound. Both should be unicode objects. If the
namespace is not bound to any prefix, the ``prefix`` item is an empty string::

  START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos
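As a small illustration of the ``(prefix, uri)`` data layout, this Python 3 sketch collects the namespace bindings announced in a stream into a dictionary. The events are plain stand-in tuples, ``collect_namespaces`` is a hypothetical helper, and the XHTML namespace is included only to show the empty-string prefix case.

```python
def collect_namespaces(events):
    """Map each prefix to its URI, using "" for a namespace with no prefix."""
    bindings = {}
    for kind, data, pos in events:
        if kind == "START_NS":
            prefix, uri = data
            bindings[prefix] = uri
    return bindings


events = [
    ("START_NS", ("", "http://www.w3.org/1999/xhtml"), (None, 1, 0)),
    ("START_NS", ("svg", "http://www.w3.org/2000/svg"), (None, 1, 0)),
    ("TEXT", "ignored", (None, 1, 60)),
]
bindings = collect_namespaces(events)
```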
The end of a namespace mapping.

The ``data`` item of such events consists of only the namespace prefix (a
A document type declaration.

For this type of event, the ``data`` item is a tuple of the form
``(name, pubid, sysid)``, where ``name`` is the name of the root element,
``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is
the system identifier of the DTD (or ``None``)::

  DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \
           u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos
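To illustrate how the three items of a ``DOCTYPE`` event map onto markup, here is a minimal Python 3 sketch that turns a ``(name, pubid, sysid)`` tuple into a declaration string, treating ``None`` as "not given". ``format_doctype`` is a hypothetical helper written for this example and is not how Genshi's serializers are necessarily implemented.

```python
def format_doctype(data):
    """Build a DOCTYPE declaration from a (name, pubid, sysid) tuple."""
    name, pubid, sysid = data
    parts = ["<!DOCTYPE", name]
    if pubid is not None:
        parts.append('PUBLIC "%s"' % pubid)
        if sysid is not None:
            parts.append('"%s"' % sysid)
    elif sysid is not None:
        # A system identifier without a public identifier uses SYSTEM.
        parts.append('SYSTEM "%s"' % sysid)
    return " ".join(parts) + ">"


xhtml = format_doctype((
    "html",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
))
bare = format_doctype(("html", None, None))
```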
For such events, the ``data`` item is a unicode object containing all character
data between the comment delimiters::

  COMMENT, u'Commented out', pos

A processing instruction.

The ``data`` item is a tuple of the form ``(target, data)`` for processing
instructions, where ``target`` is the target of the PI (used to identify the
application by which the instruction should be processed), and ``data`` is
the text following the target (excluding the terminating question mark)::

  PI, (u'php', u'echo "Yo" '), pos
Marks the beginning of a ``CDATA`` section.

The ``data`` item for such events is always ``None``::

  START_CDATA, None, pos

Marks the end of a ``CDATA`` section.

The ``data`` item for such events is always ``None``::