1
by Bernd Zeimetz
Import upstream version 0.8 |
Metadata-Version: 1.0
Name: PyICU
1.1.1
by Bernd Zeimetz
Import upstream version 0.8.1 |
Version: 0.8.1
Summary: Python extension wrapping the ICU C++ API
Home-page: http://pyicu.osafoundation.org/
Author: Open Source Applications Foundation
Author-email: UNKNOWN
License: UNKNOWN
Description:

README file for PyICU
---------------------

Contents
--------

- Welcome
- Building PyICU
- Running PyICU
- What's available
- API Documentation


Welcome
-------

Welcome to PyICU, a Python extension wrapping IBM's International
Components for Unicode C++ library (ICU).

PyICU is a project maintained by the Open Source Applications Foundation.

IBM's ICU homepage is: http://www-306.ibm.com/software/globalization/icu/


Building PyICU
--------------

Before building PyICU, the ICU 3.6 or 3.8 libraries must be built and
installed. Refer to each system's instructions for more information.

As of version 0.5, PyICU no longer uses SWIG.

As of version 0.8, PyICU is built with distutils or setuptools:
- verify that the INCLUDES, LFLAGS, CFLAGS and LIBRARIES dictionaries in
  setup.py contain correct values for your platform
- python setup.py build
- sudo python setup.py install


Running PyICU
-------------

. Mac OS X
  Make sure that DYLD_LIBRARY_PATH includes the directory (or directories)
  containing the ICU libs.

. Linux
  Make sure that LD_LIBRARY_PATH includes the directory (or directories)
  containing the ICU libs or that you added the corresponding -rpath
  argument to LFLAGS.

. Windows
  Make sure that PATH includes the directory (or directories)
  containing the ICU DLLs.
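
As a quick pre-flight check, the platform rules above can be sketched in
Python. The helper names loader_path_variable and directory_on_loader_path
are made up for illustration and are not part of PyICU. The sketch only
inspects the environment; it does not try to load ICU.

```python
import os
import sys


def loader_path_variable(platform=None):
    """Return the environment variable the dynamic loader consults."""
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        return "DYLD_LIBRARY_PATH"   # Mac OS X
    if platform.startswith("win"):
        return "PATH"                # Windows
    return "LD_LIBRARY_PATH"         # Linux and other Unix-like systems


def directory_on_loader_path(directory, environ=None, platform=None):
    """Report whether `directory` is listed in the loader path variable."""
    environ = os.environ if environ is None else environ
    value = environ.get(loader_path_variable(platform), "")
    entries = [os.path.normpath(p) for p in value.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries
```

On Linux a negative answer is not necessarily a problem, since the ICU
directory may have been recorded with -rpath at build time instead.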


What's available
----------------

PyICU is under active development. Currently, the string, locale, format,
calendar, timezone, charset and various iterator classes are available.
See the CHANGES file for an up-to-date log of changes and additions.


API Documentation
-----------------

At the moment, there is no API documentation for PyICU. The API for ICU is
documented at http://icu.sourceforge.net/apiref/icu4c/ and the following
patterns can be used to translate from the C++ APIs to the corresponding
Python APIs.

- strings

The ICU string type, UnicodeString, wraps a mutable array of UChar, ICU's
16-bit wide Unicode character type. The Python unicode type
is an immutable string of 16-bit or 32-bit wide Unicode characters.
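
The difference in character width matters for text outside the Basic
Multilingual Plane. The sketch below uses only the standard library, not
PyICU, and assumes a Python whose strings are indexed by code point (Python
3.3 or later). The helper name utf16_length is illustrative. It counts the
16-bit units that a UChar array would need for a given string.

```python
def utf16_length(text):
    """Count the 16-bit code units needed to store `text` as UTF-16."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" emits no byte order mark.
    return len(text.encode("utf-16-le")) // 2


bmp_text = "Brazil"              # every character fits in one 16-bit unit
astral_text = "\U0001D11E"       # MUSICAL SYMBOL G CLEF needs a surrogate pair

print(utf16_length(bmp_text))     # 6
print(utf16_length(astral_text))  # 2
print(len(astral_text))           # 1
```

Lengths and indexes can therefore disagree between the two representations
for such text.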

Because of these differences, UnicodeString and Python's unicode type
are not merged into the same type when crossing the C++ boundary.
ICU APIs taking UnicodeString arguments have been overloaded to also
accept Python str or unicode type arguments. In the case of str objects,
utf-8 encoding is assumed when converting them to UnicodeString
objects.

To convert a Python str encoded in an encoding other than utf-8 to an ICU
UnicodeString, use the UnicodeString(str, encodingName) constructor.

ICU's C++ APIs accept and return UnicodeString arguments in several
ways: by value, by pointer or by reference.
When an ICU C++ API is documented to accept a UnicodeString & parameter,
it is safe to assume that there are several corresponding PyICU Python
APIs making it accessible in simpler ways.
For example, the 'UnicodeString &Locale::getDisplayName(UnicodeString &)'
API, documented here:
http://icu.sourceforge.net/apiref/icu4c/classLocale.html#a19
can be invoked from Python in several ways:

1. The ICU way

    >>> from PyICU import UnicodeString, Locale
    >>> locale = Locale('pt_BR')
    >>> string = UnicodeString()
    >>> name = locale.getDisplayName(string)
    >>> name
    <UnicodeString: Portuguese (Brazil)>
    >>> name is string
    True                  <-- string arg was returned, modified in place

2. The Python way

    >>> from PyICU import Locale
    >>> locale = Locale('pt_BR')
    >>> name = locale.getDisplayName()
    >>> name
    <UnicodeString: Portuguese (Brazil)>

A UnicodeString object was allocated for Python and returned.

A UnicodeString can be coerced to a Python unicode string with Python's
unicode() constructor. The usual len(), str(), comparison, [] and [:]
operators are all available. Because a UnicodeString is mutable, there are
two additions: slices can be assigned to, and += is available.
For example:

    >>> name = locale.getDisplayName()
    >>> name
    <UnicodeString: Portuguese (Brazil)>
    >>> unicode(name)
    u'Portuguese (Brazil)'
    >>> len(name)
    19
    >>> str(name)           <-- works when chars fit with default encoding
    'Portuguese (Brazil)'
    >>> name[3]
    u't'
    >>> name[12:18]
    <UnicodeString: Brazil>
    >>> name[12:18] = 'the country of Brasil'
    >>> name
    <UnicodeString: Portuguese (the country of Brasil)>
    >>> name += ' oh joy'
    >>> name
    <UnicodeString: Portuguese (the country of Brasil) oh joy>

- error reporting

The C++ ICU library does not use C++ exceptions to report errors. ICU
C++ APIs return errors via a UErrorCode reference argument. All such
APIs are wrapped by Python APIs that omit this argument and raise an
ICUError Python exception instead. The same is true for ICU APIs taking
both a UParseError and a UErrorCode: both arguments are omitted.
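
The wrapping pattern described above can be illustrated without ICU. In
this sketch everything is a stand-in: c_style_parse mimics an API that
reports failure through a status slot, and WrappedError plays the role that
ICUError plays in PyICU. The status names are borrowed from ICU, but the
values and the parsing logic are for illustration only.

```python
U_ZERO_ERROR = 0                # success
U_ILLEGAL_ARGUMENT_ERROR = 1    # failure; value used here for illustration


class WrappedError(Exception):
    """Stand-in for the exception a wrapper raises on a failed status."""

    def __init__(self, code):
        Exception.__init__(self, "status code %d" % code)
        self.code = code


def c_style_parse(text, status):
    """Mimic a C++ API: report failure through the `status` out-parameter."""
    if not text.isdigit():
        status[0] = U_ILLEGAL_ARGUMENT_ERROR
        return 0
    status[0] = U_ZERO_ERROR
    return int(text)


def parse(text):
    """Wrapper: hide the status argument and raise on failure instead."""
    status = [U_ZERO_ERROR]
    value = c_style_parse(text, status)
    if status[0] != U_ZERO_ERROR:
        raise WrappedError(status[0])
    return value
```

Callers of parse() never see the status argument; they use try/except, which
is the same shape PyICU gives to the wrapped ICU calls.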

For example, the 'UnicodeString &DateFormat::format(const Formattable &,
UnicodeString &, UErrorCode &)' API, documented here
http://icu.sourceforge.net/apiref/icu4c/classDateFormat.html#a6
is invoked from Python with:

    >>> from PyICU import DateFormat, Formattable
    >>> df = DateFormat.createInstance()
    >>> df
    <SimpleDateFormat: M/d/yy h:mm a>
    >>> f = Formattable(940284258.0, Formattable.kIsDate)
    >>> df.format(f)
    <UnicodeString: 10/18/99 3:04 PM>

Of course, the simpler 'UnicodeString &DateFormat::format(UDate,
UnicodeString &)' documented here:
http://icu.sourceforge.net/apiref/icu4c/classDateFormat.html#a5
can be used too:

    >>> from PyICU import DateFormat
    >>> df = DateFormat.createInstance()
    >>> df
    <SimpleDateFormat: M/d/yy h:mm a>
    >>> df.format(940284258.0)
    <UnicodeString: 10/18/99 3:04 PM>

- dates

ICU uses a double floating point type called UDate to represent dates as
the number of milliseconds elapsed since 1970-jan-01 UTC.

In Python, the value returned by the time module's time() function is
the number of seconds since 1970-jan-01 UTC. Because of this difference,
floating point values are multiplied by 1000 when passed to APIs taking
UDate, and UDate values returned by ICU are divided by 1000.

Python's datetime objects, with or without timezone information, can
also be used with APIs taking UDate arguments. The datetime objects get
converted to UDate when crossing into the C++ layer.
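
The unit conversion can be sketched with the standard library alone. The
helpers to_udate and from_udate are hypothetical names, not PyICU
functions; PyICU performs the equivalent scaling internally. The sketch
assumes Python 3 and timezone-aware datetime objects, and says nothing about
how PyICU itself treats naive ones.

```python
from datetime import datetime, timezone


def to_udate(value):
    """Convert epoch seconds or an aware datetime to epoch milliseconds."""
    if isinstance(value, datetime):
        value = value.timestamp()   # seconds since 1970-01-01 UTC
    return float(value) * 1000.0


def from_udate(udate):
    """Convert epoch milliseconds back to epoch seconds."""
    return udate / 1000.0


seconds = 940284258.0               # the timestamp used in the examples above
print(to_udate(seconds))            # 940284258000.0
print(from_udate(to_udate(seconds)) == seconds)  # True

moment = datetime(1999, 10, 18, 22, 4, 18, tzinfo=timezone.utc)
print(to_udate(moment) == to_udate(seconds))     # True
```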

- arrays

Many ICU APIs take array arguments. From Python, pass a list whose
elements are of the array's element type.

- StringEnumeration

An ICU StringEnumeration has three 'next' methods: next(), which returns
'str' objects, unext(), which returns 'unicode' objects, and snext(),
which returns 'UnicodeString' objects.
Any of these methods can be used as an iterator, using the Python
built-in 'iter' function.

For example, let e be a StringEnumeration instance:

    [s for s in e] is a list of 'str' objects
    [s for s in iter(e.unext, None)] is a list of 'unicode' objects
    [s for s in iter(e.snext, None)] is a list of 'UnicodeString' objects
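
The two-argument form of iter() used above calls a function repeatedly
until it returns the sentinel value. The sketch below demonstrates the idiom
with FakeEnumeration, a made-up stand-in that is not a PyICU class. It
assumes, as the examples above imply, that unext() returns None once the
enumeration is exhausted.

```python
class FakeEnumeration(object):
    """Made-up stand-in for an enumeration with a unext()-style method."""

    def __init__(self, items):
        self._items = list(items)
        self._position = 0

    def unext(self):
        """Return the next item, or None when nothing is left."""
        if self._position >= len(self._items):
            return None
        item = self._items[self._position]
        self._position += 1
        return item


e = FakeEnumeration(["en_US", "pt_BR", "fr_FR"])

# iter(callable, sentinel) keeps calling e.unext() until it returns None.
names = [s for s in iter(e.unext, None)]
print(names)   # ['en_US', 'pt_BR', 'fr_FR']
```

Once the list has been built the enumeration is used up, so a second pass
over the same object yields nothing.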

- timezones

The ICU TimeZone type may be wrapped with an ICUtzinfo type for usage
with Python's datetime type. For example:

    tz = ICUtzinfo(TimeZone.createTimeZone('US/Mountain'))
    datetime.now(tz)

or, even simpler:

    tz = ICUtzinfo.getInstance('Pacific/Fiji')
    datetime.now(tz)

To get the default time zone use:

    defaultTZ = ICUtzinfo.getDefault()

To get the time zone's id, use the 'tzid' attribute or coerce the time
zone to a string:

    ICUtzinfo.getInstance('Pacific/Fiji').tzid -> 'Pacific/Fiji'
    str(ICUtzinfo.getInstance('Pacific/Fiji')) -> 'Pacific/Fiji'

Platform: UNKNOWN