<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>6.5. unicodedata — Unicode Database — Python 3.5.1 documentation</title>
<link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/sidebar.js"></script>
<link rel="search" type="application/opensearchdescription+xml"
title="Search within Python 3.5.1 documentation"
href="../_static/opensearch.xml"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="copyright" title="Copyright" href="../copyright.html" />
<link rel="top" title="Python 3.5.1 documentation" href="../contents.html" />
<link rel="up" title="6. Text Processing Services" href="text.html" />
<link rel="next" title="6.6. stringprep — Internet String Preparation" href="stringprep.html" />
<link rel="prev" title="6.4. textwrap — Text wrapping and filling" href="textwrap.html" />
<link rel="shortcut icon" type="image/png" href="../_static/py.png" />
<script type="text/javascript" src="../_static/copybutton.js"></script>
<script type="text/javascript" src="../_static/version_switch.js"></script>
<body role="document">
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="module-unicodedata">
<span id="unicodedata-unicode-database"></span><h1>6.5. <a class="reference internal" href="#module-unicodedata" title="unicodedata: Access the Unicode Database."><code class="xref py py-mod docutils literal"><span class="pre">unicodedata</span></code></a> — Unicode Database<a class="headerlink" href="#module-unicodedata" title="Permalink to this headline">¶</a></h1>
<p id="index-0">This module provides access to the Unicode Character Database (UCD), which
defines character properties for all Unicode characters. The data contained in
this database is compiled from the <a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd">UCD version 8.0.0</a>.</p>
<p>The module uses the same names and symbols as defined by Unicode
Standard Annex #44, <a class="reference external" href="http://www.unicode.org/reports/tr44/tr44-6.html">“Unicode Character Database”</a>. It defines the
following functions:</p>
<dt id="unicodedata.lookup">
<code class="descclassname">unicodedata.</code><code class="descname">lookup</code><span class="sig-paren">(</span><em>name</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.lookup" title="Permalink to this definition">¶</a></dt>
<dd><p>Look up a character by name. If a character with the given name is found, return
the corresponding character. If not found, <a class="reference internal" href="exceptions.html#KeyError" title="KeyError"><code class="xref py py-exc docutils literal"><span class="pre">KeyError</span></code></a> is raised.</p>
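<p>A minimal sketch of a name lookup with a fallback, assuming the caller prefers a default value over an exception. The helper <code class="docutils literal"><span class="pre">lookup_or_default</span></code> is hypothetical and not part of the module:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

def lookup_or_default(name, default=None):
    """Return the character called *name*, or *default* if the name is unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() reports an unknown name by raising KeyError.
        return default

print(lookup_or_default('LEFT CURLY BRACKET'))           # {
print(lookup_or_default('NO SUCH CHARACTER NAME', '?'))  # ?
</pre></div></div>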
<div class="versionchanged">
<p><span class="versionmodified">Changed in version 3.3: </span>Support for name aliases <a class="footnote-reference" href="#id3" id="id1">[1]</a> and named sequences <a class="footnote-reference" href="#id4" id="id2">[2]</a> has been added.</p>
<dt id="unicodedata.name">
<code class="descclassname">unicodedata.</code><code class="descname">name</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.name" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the name assigned to the character <em>chr</em> as a string. If no
name is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is
raised.</p>
<dl class="function">
<dt id="unicodedata.decimal">
<code class="descclassname">unicodedata.</code><code class="descname">decimal</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.decimal" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the decimal value assigned to the character <em>chr</em> as an integer.
If no such value is defined, <em>default</em> is returned, or, if not given,
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</p>
<dl class="function">
<dt id="unicodedata.digit">
<code class="descclassname">unicodedata.</code><code class="descname">digit</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.digit" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the digit value assigned to the character <em>chr</em> as an integer.
If no such value is defined, <em>default</em> is returned, or, if not given,
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</p>
<dl class="function">
<dt id="unicodedata.numeric">
<code class="descclassname">unicodedata.</code><code class="descname">numeric</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.numeric" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the numeric value assigned to the character <em>chr</em> as a float.
If no such value is defined, <em>default</em> is returned, or, if not given,
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</p>
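<p>The three numeric lookups differ in which characters they accept. The following sketch passes <code class="docutils literal"><span class="pre">None</span></code> as <em>default</em> so that a missing value does not raise; the helper <code class="docutils literal"><span class="pre">numeric_properties</span></code> and the sample characters are illustrative choices, not part of the module:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

# '9' is DIGIT NINE, U+00B2 is SUPERSCRIPT TWO, U+00BD is VULGAR FRACTION ONE HALF.
samples = ['9', '\u00b2', '\u00bd', 'a']

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for *ch*, with None where no value is defined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

for ch in samples:
    print(unicodedata.name(ch), numeric_properties(ch))
</pre></div></div>
<p>With these samples, only DIGIT NINE has all three values, SUPERSCRIPT TWO has a digit and a numeric value but no decimal value, and VULGAR FRACTION ONE HALF has only a numeric value.</p>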
<dl class="function">
<dt id="unicodedata.category">
<code class="descclassname">unicodedata.</code><code class="descname">category</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.category" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the general category assigned to the character <em>chr</em> as
string.</p>
<dl class="function">
<dt id="unicodedata.bidirectional">
<code class="descclassname">unicodedata.</code><code class="descname">bidirectional</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.bidirectional" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the bidirectional class assigned to the character <em>chr</em> as a
string. If no such value is defined, an empty string is returned.</p>
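<p>A short sketch that prints the bidirectional class of a few characters. The sample characters are chosen here purely for illustration:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

# LATIN CAPITAL LETTER A, HEBREW LETTER ALEF, ARABIC-INDIC DIGIT ZERO, SPACE
samples = ['A', '\u05d0', '\u0660', ' ']

bidi_classes = {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in samples}

for name, bidi in bidi_classes.items():
    print(name, bidi)
</pre></div></div>
<p>The classes reported for these four characters are <code class="docutils literal"><span class="pre">L</span></code> (left-to-right), <code class="docutils literal"><span class="pre">R</span></code> (right-to-left), <code class="docutils literal"><span class="pre">AN</span></code> (Arabic number) and <code class="docutils literal"><span class="pre">WS</span></code> (whitespace).</p>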
<dl class="function">
<dt id="unicodedata.combining">
<code class="descclassname">unicodedata.</code><code class="descname">combining</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.combining" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the canonical combining class assigned to the character <em>chr</em>
as an integer. Returns <code class="docutils literal"><span class="pre">0</span></code> if no combining class is defined.</p>
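<p>One common use of the combining class is to drop accents after decomposition. The sketch below assumes that removing every character with a nonzero combining class is acceptable for the text at hand; <code class="docutils literal"><span class="pre">strip_combining_marks</span></code> is a hypothetical helper, and the approach is not a complete transliteration scheme:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

def strip_combining_marks(text):
    """Decompose *text* (NFD) and drop characters with a nonzero combining class."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

print(unicodedata.combining('\u0301'))   # COMBINING ACUTE ACCENT: 230
print(unicodedata.combining('a'))        # no combining class: 0
print(strip_combining_marks('cr\u00e8me br\u00fbl\u00e9e'))   # creme brulee
</pre></div></div>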
<dl class="function">
<dt id="unicodedata.east_asian_width">
<code class="descclassname">unicodedata.</code><code class="descname">east_asian_width</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.east_asian_width" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the East Asian width assigned to the character <em>chr</em> as
string.</p>
<dl class="function">
<dt id="unicodedata.mirrored">
<code class="descclassname">unicodedata.</code><code class="descname">mirrored</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.mirrored" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the mirrored property assigned to the character <em>chr</em> as an
integer. Returns <code class="docutils literal"><span class="pre">1</span></code> if the character has been identified as a “mirrored”
character in bidirectional text, <code class="docutils literal"><span class="pre">0</span></code> otherwise.</p>
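<p>For instance, brackets are mirrored while ordinary letters are not. A small sketch, with sample characters chosen for illustration:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

# Paired punctuation is flagged as mirrored; a plain letter is not.
mirrored_flags = {ch: unicodedata.mirrored(ch) for ch in ['(', '[', 'a']}

for ch, flag in mirrored_flags.items():
    print(repr(ch), flag)
</pre></div></div>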
<dl class="function">
<dt id="unicodedata.decomposition">
<code class="descclassname">unicodedata.</code><code class="descname">decomposition</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.decomposition" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the character decomposition mapping assigned to the character
<em>chr</em> as string. An empty string is returned in case no such mapping is
defined.</p>
<dl class="function">
<dt id="unicodedata.normalize">
<code class="descclassname">unicodedata.</code><code class="descname">normalize</code><span class="sig-paren">(</span><em>form</em>, <em>unistr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.normalize" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the normal form <em>form</em> for the Unicode string <em>unistr</em>. Valid values for
<em>form</em> are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.</p>
<p>The Unicode standard defines various normalization forms of a Unicode string,
based on the definition of canonical equivalence and compatibility equivalence.
In Unicode, some characters can be expressed in more than one way. For example, the
character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as
the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).</p>
<p>For each character, there are two normal forms: normal form C and normal form D.
Normal form D (NFD) is also known as canonical decomposition, and translates
each character into its decomposed form. Normal form C (NFC) first applies a
canonical decomposition, then composes pre-combined characters again.</p>
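<p>The two canonical forms can be tried on the cedilla example. This is a minimal sketch; the variable names are illustrative:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

composed = '\u00c7'      # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = 'C\u0327'   # LATIN CAPITAL LETTER C followed by COMBINING CEDILLA

nfd = unicodedata.normalize('NFD', composed)    # splits into two code points
nfc = unicodedata.normalize('NFC', decomposed)  # recombines into one code point

print([unicodedata.name(ch) for ch in nfd])
print([unicodedata.name(ch) for ch in nfc])
print(unicodedata.decomposition(composed))      # 0043 0327
</pre></div></div>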
<p>In addition to these two forms, there are two additional normal forms based on
compatibility equivalence. In Unicode, certain characters are supported which
normally would be unified with other characters. For example, U+2160 (ROMAN
NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
However, it is supported in Unicode for compatibility with existing character
sets (e.g. gb2312).</p>
<p>The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
replace all compatibility characters with their equivalents. The normal form KC
(NFKC) first applies the compatibility decomposition, followed by the canonical
composition.</p>
<p>Even if two Unicode strings are normalized and look the same to
a human reader, if one has combining characters and the other
doesn’t, they may not compare equal.</p>
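<p>A sketch of the comparison pitfall, assuming the two strings were normalized to different forms. The helper <code class="docutils literal"><span class="pre">same_text</span></code> is hypothetical; it simply normalizes both operands to one form before comparing:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

def same_text(a, b, form='NFC'):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

nfc_text = unicodedata.normalize('NFC', 'C\u0327a')   # precomposed cedilla
nfd_text = unicodedata.normalize('NFD', '\u00c7a')    # C plus combining cedilla

print(nfc_text == nfd_text)            # False: different code point sequences
print(same_text(nfc_text, nfd_text))   # True once both use the same form

# Compatibility forms also fold characters such as ROMAN NUMERAL ONE (U+2160).
print(same_text('\u2160', 'I', form='NFKC'))   # True
print(same_text('\u2160', 'I', form='NFC'))    # False
</pre></div></div>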
<p>In addition, the module exposes the following constant:</p>
<dt id="unicodedata.unidata_version">
<code class="descclassname">unicodedata.</code><code class="descname">unidata_version</code><a class="headerlink" href="#unicodedata.unidata_version" title="Permalink to this definition">¶</a></dt>
<dd><p>The version of the Unicode database used in this module.</p>
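<p>The value is a dotted version string. The sketch below splits it into integers, assuming the usual three-part form such as <code class="docutils literal"><span class="pre">8.0.0</span></code>; the exact value depends on the Python build in use:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

version = unicodedata.unidata_version
# Assumes a 'major.minor.patch' layout, for example '8.0.0'.
version_info = tuple(int(part) for part in version.split('.'))

print(version, version_info)
</pre></div></div>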
<dt id="unicodedata.ucd_3_2_0">
<code class="descclassname">unicodedata.</code><code class="descname">ucd_3_2_0</code><a class="headerlink" href="#unicodedata.ucd_3_2_0" title="Permalink to this definition">¶</a></dt>
<dd><p>This is an object that has the same methods as the entire module, but uses the
Unicode database version 3.2 instead, for applications that require this
specific version of the Unicode database (such as IDNA).</p>
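<p>A brief sketch contrasting the two databases on a character that was assigned after Unicode 3.2. The sample character is an illustrative choice:</p>
<div class="highlight-python3"><div class="highlight"><pre>import unicodedata

old = unicodedata.ucd_3_2_0
ch = '\U0001F600'   # GRINNING FACE, assigned well after Unicode 3.2

print(old.unidata_version)                          # 3.2.0
print(unicodedata.category(ch), old.category(ch))   # So Cn
print(old.name(ch, 'unassigned in 3.2'))            # falls back to the default
</pre></div></div>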
<div class="highlight-python3"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">unicodedata</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="s1">'LEFT CURLY BRACKET'</span><span class="p">)</span>
<span class="go">'{'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">'SOLIDUS'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'9'</span><span class="p">)</span>
<span class="go">9</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in <span class="n">&lt;module&gt;</span>
<span class="gr">ValueError</span>: <span class="n">not a decimal</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">category</span><span class="p">(</span><span class="s1">'A'</span><span class="p">)</span> <span class="c1"># 'L'etter, 'u'ppercase</span>
<span class="go">'Lu'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">bidirectional</span><span class="p">(</span><span class="s1">'</span><span class="se">\u0660</span><span class="s1">'</span><span class="p">)</span> <span class="c1"># 'A'rabic, 'N'umber</span>
<span class="go">'AN'</span>
<p class="rubric">Footnotes</p>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt">http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt</a></td></tr>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt">http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt</a></td></tr>
<div class="clearer"></div>
© <a href="../copyright.html">Copyright</a> 1990-2016, Python Software Foundation.
The Python Software Foundation is a non-profit corporation.
<a href="https://www.python.org/psf/donations/">Please donate.</a>
Last updated on Jan 22, 2016.
<a href="../bugs.html">Found a bug</a>?
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.3.3.