~ubuntu-branches/ubuntu/raring/libencode-perl/raring

Committer: Bazaar Package Importer
Author(s): Dominic Hargreaves, gregor herrmann
Date: 2011-01-04 21:46:25 UTC
mfrom: (1.2.1 upstream) (2.1.2 intrepid)
Revision ID: james.westby@ubuntu.com-20110104214625-hbpxr6egctsifiqu

Tags: 2.42-1

http://bugs.debian.org/608294

* New upstream release (Closes: #608294)

[ Dominic Hargreaves ]
* Added myself as an uploader
* Bumped Standards-Version (no changes)

[ gregor herrmann ]
* Switch to source format 3.0 (quilt).
* debian/control: change Vcs-Browser field to ViewSVN.
* debian/watch: use extended regexp to match upstream releases.
* debian/copyright: switch to DEP5 formatting.
* Use debhelper 7 (debian/{rules,control,compat}).
* Add a patch to deal with spelling/grammar problems, and a lintian override
where lintian and /me don't agree on English grammar.
* Add /me to Uploaders.

files added:
.pc
.pc/.version
.pc/applied-patches
.pc/spelling.patch
.pc/spelling.patch/lib
.pc/spelling.patch/lib/Encode
.pc/spelling.patch/lib/Encode/Supported.pod
debian/libencode-perl.lintian-overrides
debian/patches/series
debian/patches/spelling.patch
debian/source
debian/source/format
t/piconv.t
t/utf8ref.t
ucm/cp858.ucm

files removed:
debian/dirs
debian/docs
debian/patches/00-installmans-Makefile.PL.patch

files modified:
AUTHORS
Byte/Byte.pm
CN/CN.pm
Changes
Encode.pm
Encode.xs
JP/JP.pm
KR/KR.pm
MANIFEST
META.yml
Makefile.PL
TW/TW.pm
Unicode/Unicode.pm
Unicode/Unicode.xs
bin/enc2xs
bin/piconv
bin/ucmlint
debian/changelog
debian/compat
debian/control
debian/copyright
debian/rules
debian/watch
encoding.pm
lib/Encode/Alias.pm
lib/Encode/CN/HZ.pm
lib/Encode/Config.pm
lib/Encode/GSM0338.pm
lib/Encode/Guess.pm
lib/Encode/JP/JIS7.pm
lib/Encode/MIME/Header.pm
lib/Encode/Supported.pod
lib/Encode/Unicode/UTF7.pm
t/Aliases.t
t/CJKT.t
t/Unicode.t
t/fallback.t
t/guess.t
t/mime-header.t
t/mime-name.t
t/mime_header_iso2022jp.t
t/perlio.t
t/utf8strict.t
ucm/cp850.ucm
ucm/cp852.ucm
ucm/cp855.ucm
ucm/cp856.ucm
ucm/cp857.ucm
ucm/cp860.ucm
ucm/cp861.ucm
ucm/cp862.ucm
ucm/cp863.ucm
ucm/cp864.ucm
ucm/cp865.ucm
ucm/cp866.ucm
ucm/cp869.ucm
ucm/cp874.ucm
ucm/cp875.ucm
ucm/macJapanese.ucm
ucm/nextstep.ucm


Unicode/Unicode.pm

use warnings;

no warnings 'redefine';

-our $VERSION = do { my @r = ( q$Revision: 2.4 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };
+our $VERSION = do { my @r = ( q$Revision: 2.7 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use XSLoader;

XSLoader::load( __PACKAGE__, $VERSION );

=head1 SYNOPSIS

use Encode qw/encode decode/;

$ucs2 = encode("UCS-2BE", $utf8);

$utf8 = decode("UCS-2BE", $ucs2);

                     Decodes from ord(N)        Encodes chr(N) to...
            octet/char BOM S.P d800-dfff  ord > 0xffff     \x{1abcd} ==
   ---------------+-----------------+------------------------------
   UCS-2BE       2   N   N  is bogus                  Not Available
   UCS-2LE       2   N   N     bogus                  Not Available
   UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
   UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
-  UTF-16LE      2   N   Y       S.P           S.P    0x2ad8,0xcddf
+  UTF-16LE    2/4   N   Y       S.P           S.P    0x2ad8,0xcddf
   UTF-32        4   Y   -  is bogus         As is            BE/LE
   UTF-32BE      4   N   -     bogus         As is       0x0001abcd
   UTF-32LE      4   N   -     bogus         As is       0xcdab0100
   UTF-8       1-4   -   -     bogus   >= 4 octets \xf0\x9a\xaf\x8d
   ---------------+-----------------+------------------------------
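The last column of the table (how U+1ABCD is serialised by each encoding) can be checked against Encode itself. The sketch below is illustrative and not part of Encode::Unicode: the helper `hex_octets` is hypothetical, and it simply hex-dumps whatever `Encode::encode` returns. UCS-2 is left out because, per the table, U+1ABCD is not available there.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode chr($cp) with $enc and return the octets
# as a lowercase hex string, e.g. "d82adfcd".
sub hex_octets {
    my ($enc, $cp) = @_;
    return unpack 'H*', encode($enc, chr $cp);
}

# U+1ABCD is the sample character used in the table's last column.
for my $enc (qw(UTF-16 UTF-16BE UTF-16LE UTF-32BE UTF-32LE UTF-8)) {
    printf "%-9s %s\n", $enc, hex_octets($enc, 0x1ABCD);
}
```

If the table is right, UTF-16BE prints d82adfcd (the surrogate pair 0xd82a,0xdfcd), UTF-32LE prints cdab0100 and UTF-8 prints f09aaf8d; the unsuffixed UTF-16 line should additionally start with a byte order mark, since that row is marked BE/LE.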

$uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
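The formula above recombines a surrogate pair into a code point. A minimal sketch of both directions follows; the sub names are hypothetical, and the splitting direction is the standard inverse (as described in RFC 2781), not code taken from Encode::Unicode.

```perl
use strict;
use warnings;

# Split a code point above U+FFFF into a UTF-16 surrogate pair.
sub to_surrogates {
    my ($uni) = @_;
    my $v  = $uni - 0x10000;          # 20-bit offset
    my $hi = 0xD800 + ($v >> 10);     # top 10 bits
    my $lo = 0xDC00 + ($v & 0x3FF);   # bottom 10 bits
    return ($hi, $lo);
}

# Recombine a pair, using exactly the formula quoted in the POD.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

my ($hi, $lo) = to_surrogates(0x1ABCD);
printf "0x%04x,0x%04x\n", $hi, $lo;           # 0xd82a,0xdfcd
printf "U+%X\n", from_surrogates($hi, $lo);   # U+1ABCD
```

The pair printed matches the UTF-16BE entry for \x{1abcd} in the table.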

Note this move has made \x{D800}-\x{DFFF} into a forbidden zone but
perl does not prohibit the use of characters within this range. To perl,
every one of \x{0000_0000} up to \x{ffff_ffff} (*) is I<a character>.

(*) or \x{ffff_ffff_ffff_ffff} if your perl is compiled with 64-bit

Unlike most encodings which accept various ways to handle errors,
Unicode encodings simply croak.

-  % perl -MEncode -e '$_ = "\xfe\xff\xd8\xd9\xda\xdb\0\n"' \
-         -e 'Encode::from_to($_, "utf16","shift_jis", 0); print'
+  % perl -MEncode -e'$_ = "\xfe\xff\xd8\xd9\xda\xdb\0\n"' \
+         -e'Encode::from_to($_, "utf16","shift_jis", 0); print'
   UTF-16:Malformed LO surrogate d8d9 at /path/to/Encode.pm line 184.
-  % perl -MEncode -e '$a = "BOM missing"' \
-         -e ' Encode::from_to($a, "utf16", "shift_jis", 0); print'
+  % perl -MEncode -e'$a = "BOM missing"' \
+         -e' Encode::from_to($a, "utf16", "shift_jis", 0); print'
   UTF-16:Unrecognised BOM 424f at /path/to/Encode.pm line 184.
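To trap such an error in a program instead of letting it end the process, the call can be wrapped in eval. This is a sketch under an assumption: the handling of CHECK = 0 for Unicode encodings has not necessarily stayed the same across Encode releases, so it passes Encode::FB_CROAK explicitly to make malformed input croak either way. The helper `try_decode` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $octets as $enc with FB_CROAK.
# Returns (1, $string) on success or (0, $error_message) if Encode croaked.
sub try_decode {
    my ($enc, $octets) = @_;
    my $copy = $octets;    # decode() may consume its input when CHECK is set
    my $str  = eval { decode($enc, $copy, Encode::FB_CROAK()) };
    return defined $str ? (1, $str) : (0, $@);
}

# BOM FE FF, a high surrogate D8D9, then DADB where a low surrogate
# (DC00..DFFF) is required: the malformed input from the example above.
my ($ok, $result) = try_decode('UTF-16', "\xfe\xff\xd8\xd9\xda\xdb\0\n");
print $ok ? "decoded\n" : "croaked: $result";
```

A well-formed pair such as "\xfe\xff\xd8\x2a\xdf\xcd" should instead come back as (1, "\x{1abcd}").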


Unlike other encodings where mappings are not one-to-one against

L<Encode>, L<Encode::Unicode::UTF7>, L<http://www.unicode.org/glossary/>,
L<http://www.unicode.org/unicode/faq/utf_bom.html>,

-RFC 2781 L<http://rfc.net/rfc2781.html>,
+RFC 2781 L<http://www.ietf.org/rfc/rfc2781.txt>,

The whole Unicode standard L<http://www.unicode.org/unicode/uni2book/u2.html>

Ch. 15, pp. 403 of C<Programming Perl (3rd Edition)>
by Larry Wall, Tom Christiansen, Jon Orwant;
O'Reilly & Associates; ISBN 0-596-00027-8

=cut
