<p>There is one operation that Python <em>can</em> do with a non-ascii bytestring, and it's a great source of confusion: it can dump the bytestring straight out to a stream or a file, with nary a care what the encoding is. To Python, this is pretty much like dumping any other kind of binary data (like an image) to a stream somewhere. So in a lot of cases, programs that embed all kinds of international characters and encodings into plain bytestrings (i.e. using <code>"hello world"</code> style literals) can fly right through their run, sending reams of strings out to wherever they are going. The programmer sees the same output as was expressed in the input and comes away with the illusion that his or her program is Unicode-compliant. In fact, the program has no unicode awareness whatsoever, and likewise has no ability to interact with libraries that <em>are</em> unicode aware.
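<p>The pass-through behavior is easy to demonstrate. What follows is a minimal sketch written for Python 3, where the bytestring type described above is spelled <code>bytes</code>. The variable names are illustrative only. Writing encoded bytes to a binary stream succeeds without any knowledge of the encoding. However, the bytes alone cannot answer even a simple question such as "how many characters is this?":

```python
import io

# A UTF-8 encoded bytestring: to Python this is just a sequence of byte values.
raw = "réveillé".encode("utf-8")

# Dumping it to a binary stream works; the bytes pass through untouched.
out = io.BytesIO()
out.write(raw)
assert out.getvalue() == raw

# The bytes carry no encoding information, so character-level questions
# only get a correct answer once the encoding is stated explicitly.
byte_count = len(raw)                  # counts bytes: each "é" is two bytes
char_count = len(raw.decode("utf-8"))  # counts characters
print(byte_count, char_count)          # 10 8
```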
<p>The "pass through encoded data" scheme is what template languages like Cheetah and earlier versions of Myghty do by default, and there's nothing "incorrect" about it. Mako as of version 0.2 also supports this mode of operation via the <code>disable_unicode=True</code> flag. However, in its default, unicode-aware mode, Mako requires explicitness when dealing with non-ascii encodings. Additionally, if you ever need to handle unicode strings and other kinds of encoding conversions more intelligently, raw bytestrings quickly become a nightmare. You are handing the Python interpreter collections of bytes about which it can make no intelligent decisions with regard to encoding.
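<p>The cost of raw bytestrings shows up as soon as something has to interpret them. In this Python 3 sketch (names are illustrative, nothing here is Mako-specific), the same bytes decode cleanly with the right codec. With the wrong codec they silently turn into mojibake, and nothing in the bytes tells the interpreter which result is correct:

```python
raw = "réveillé".encode("utf-8")

# Decoding with the encoding the bytes were actually written in recovers the text.
correct = raw.decode("utf-8")

# Python cannot tell which encoding was meant. Every byte sequence is valid
# Latin-1, so a wrong guess "succeeds" silently and produces mojibake.
wrong = raw.decode("latin-1")

print(correct)  # réveillé
print(wrong)    # rÃ©veillÃ©
```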
<p>In normal Mako operation, all parsed template constructs and output streams are handled internally as Python <code>unicode</code> objects. It's only at the point of <code>render()</code> that this unicode stream is encoded into whatever the desired output encoding is. The template developer must therefore ensure three things: the encoding of every non-ascii template is explicit, all non-ascii-encoded expressions are in one way or another converted to unicode, and the output stream of the template is handled as a unicode stream being encoded to some encoding.
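<p>The decode / process-as-text / encode pattern can be sketched in a few lines of Python 3. The functions <code>to_text</code> and <code>render</code> below are hypothetical and are not Mako's API. <code>str.format()</code> merely stands in for a real template engine, and the keyword names only loosely echo Mako's <code>input_encoding</code> and <code>output_encoding</code> options:

```python
import io

def to_text(value, encoding="utf-8"):
    """Convert an expression result to text (hypothetical helper)."""
    # Encoded bytestrings must be decoded with an explicitly stated encoding;
    # other objects are converted with str(), much like a "unicode" filter.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)

def render(template_bytes, context, input_encoding="utf-8", output_encoding="utf-8"):
    """Hypothetical toy renderer illustrating the decode/process/encode pattern."""
    # 1. The template source is decoded using its declared encoding.
    source = template_bytes.decode(input_encoding)
    # 2. Every expression becomes text, and output is collected as text.
    buf = io.StringIO()
    buf.write(source.format(**{k: to_text(v) for k, v in context.items()}))
    # 3. Only at the very end is the text stream encoded for output.
    return buf.getvalue().encode(output_encoding)

page = render("Bonjour, {name} !".encode("utf-8"),
              {"name": "Zoë".encode("utf-8")},
              output_encoding="latin-1")
print(page)  # b'Bonjour, Zo\xeb !'
```

<p>Because the text is only encoded in the last step, the same template can be emitted as UTF-8 or Latin-1 without touching anything in steps 1 and 2.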
<A name="unicode_saying"></a>
<div class="subsection">
<h3>Saying to Heck with it: Disabling the usage of Unicode entirely</h3>
<p>Some segments of Mako's userbase choose to make no use of Unicode whatsoever, and would instead prefer the "passthru" approach: all string expressions in their templates return encoded bytestrings, and they would like these strings to pass right through. In this mode the generated template module is in the same encoding as the template and carries Python's "magic encoding comment" at the top. The only advantage to this approach is that templates need not use <code>u""</code> for literal strings. There's an arguable speed improvement as well, since raw bytestrings generally perform slightly faster than unicode objects in Python. These users will have to get used to Unicode once Python 3000 becomes the standard. For now, they can use the <code>disable_unicode=True</code> flag, introduced in version 0.2 of Mako, as follows:
<div class="highlight"><pre><span class="c"># -*- encoding:utf-8 -*-</span>
372
<span class="k">from</span> <span class="nn">mako.template</span> <span class="k">import</span> <span class="n">Template</span>
374
<span class="n">t</span> <span class="o">=</span> <span class="n">Template</span><span class="p">(</span><span class="s">"drôle de petit voix m’a réveillé."</span><span class="p">,</span> <span class="n">disable_unicode</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">input_encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>
375
<span class="k">print</span> <span class="n">t</span><span class="o">.</span><span class="n">code</span>
<p>The generated module source code will contain elements like these:
<div class="highlight"><pre><span class="c"># -*- encoding:utf-8 -*-</span>
387
<span class="c"># ...more generated code ...</span>
389
<span class="k">def</span> <span class="nf">render_body</span><span class="p">(</span><span class="n">context</span><span class="p">,</span><span class="o">**</span><span class="n">pageargs</span><span class="p">):</span>
390
<span class="n">context</span><span class="o">.</span><span class="n">caller_stack</span><span class="o">.</span><span class="n">push_frame</span><span class="p">()</span>
391
<span class="k">try</span><span class="p">:</span>
392
<span class="n">__M_locals</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">pageargs</span><span class="o">=</span><span class="n">pageargs</span><span class="p">)</span>
393
<span class="c"># SOURCE LINE 1</span>
394
<span class="n">context</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">'dr</span><span class="se">\xc3\xb4</span><span class="s">le de petit voix m</span><span class="se">\xe2\x80\x99</span><span class="s">a r</span><span class="se">\xc3\xa9</span><span class="s">veill</span><span class="se">\xc3\xa9</span><span class="s">.'</span><span class="p">)</span>
395
<span class="k">return</span> <span class="s">''</span>
396
<span class="k">finally</span><span class="p">:</span>
397
<span class="n">context</span><span class="o">.</span><span class="n">caller_stack</span><span class="o">.</span><span class="n">pop_frame</span><span class="p">()</span>
<p>Above, you can see that the <code>encoding</code> magic source comment is at the top, and that the string literal used within <code>context.write</code> is a regular bytestring.
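<p>The escaped literal in the generated code is simply the UTF-8 encoding of the template text. This can be checked with a short Python 3 sketch (not part of Mako), which encodes the same sentence and compares it with the bytes shown in <code>context.write</code>:

```python
# The literal text of the template, as a Python 3 str.
text = "drôle de petit voix m’a réveillé."

# Encoding it as UTF-8 produces exactly the escaped bytestring
# that appears in the generated context.write() call.
encoded = text.encode("utf-8")
print(encoded)
# b'dr\xc3\xb4le de petit voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'
```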
<p>When <code>disable_unicode=True</code> is turned on, the <code>default_filters</code> argument, which normally defaults to <code>["unicode"]</code>, defaults to <code>["str"]</code> instead. Setting <code>default_filters</code> to the empty list <code>[]</code> removes the overhead of the <code>str</code> call on each expression. Also, in this mode you <strong>cannot</strong> safely call <code>render_unicode()</code>; you'll get unicode decode errors.
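<p>Why does converting encoded bytestrings to unicode fail without an explicit encoding? The following Python 3 sketch models a filter list as a chain of callables. Both <code>apply_filters</code> and <code>py2_style_unicode</code> are hypothetical helpers, not Mako internals. The second one roughly emulates Python 2's <code>unicode()</code> builtin, which used the ASCII codec by default:

```python
def py2_style_unicode(value):
    """Rough emulation of Python 2's unicode() builtin under the default ASCII codec."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    return str(value)

def apply_filters(value, filters):
    """Apply each filter in order (a hypothetical model of default_filters)."""
    for f in filters:
        value = f(value)
    return value

raw = "réveillé".encode("utf-8")

# With an empty filter list the bytestring passes through untouched, with no extra call.
assert apply_filters(raw, []) is raw

# An ASCII-only bytestring survives the implicit conversion...
assert apply_filters(b"plain", [py2_style_unicode]) == "plain"

# ...but a non-ASCII one fails: the class of error the text warns about.
try:
    apply_filters(raw, [py2_style_unicode])
except UnicodeDecodeError as exc:
    print(exc.reason)  # ordinal not in range(128)
```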
<p><strong>Rules for using disable_unicode=True</strong>
<ul>
<li>Don't use this mode unless you really, really want to and you absolutely understand what you're doing.</li>
<li>Don't use this option just because you don't want to learn to use Unicode properly; we aren't supporting user issues in this mode of operation. We will, however, offer generous help for the vast majority of users who stick to the Unicode program.</li>
<li>It's extremely unlikely this mode of operation will be present in the Python 3000 version of Mako, since P3K strings are unicode objects by default; bytestrings are relegated to a "bytes" type that is not intended for dealing with text.</li>
</ul>
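<p>The last point reflects how Python 3 separates text from binary data. The short Python 3 sketch below is not Mako-specific. It shows that <code>str</code> and <code>bytes</code> no longer combine implicitly, which is why a bytestring pass-through mode fits poorly there:

```python
text = "réveillé"            # str: a sequence of Unicode code points
data = text.encode("utf-8")  # bytes: encoded binary data

# Python 3 refuses to mix the two implicitly...
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# ...so every boundary needs an explicit decode (or encode).
combined = text + data.decode("utf-8")
print(combined)  # réveilléréveillé
```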