~ubuntu-branches/ubuntu/wily/libsereal-encoder-perl/wily

Viewing changes to lib/Sereal/Encoder.pm

Committer: Package Import Robot
Author(s): Alexandre Mestiashvili, gregor herrmann
Date: 2015-04-29 11:12:18 UTC
mfrom: (17.1.6 sid)
Revision ID: package-import@ubuntu.com-20150429111218-v3ghc7ck5gcr38fu

Tags: 3.005.001-1

[ Alexandre Mestiashvili ]
* Imported Upstream version 3.005.001
* d/control: cme fix dpkg
* d/copyright: updated debian/* copyright year

[ gregor herrmann ]
* Mark package as autopkgtest-able.

files added:
author_tools/decode.pl

debian/tests

debian/tests/pkg-perl

debian/tests/pkg-perl/test-files

srl_compress.h

srl_error.h

srl_taginfo.h

t/022_canonical_refs.t

t/030_canonical_vs_test_deep.t

t/170_cyclic_weakrefs.t

files removed:
const-c.inc

const-xs.inc

files modified:
Changes

Encoder.xs

MANIFEST

META.json

META.yml

Makefile.PL

author_tools/bench.pl

author_tools/hobodecoder.pl

author_tools/update_from_header.pl

debian/changelog

debian/control

debian/copyright

inc/Sereal/BuildTools.pm

lib/Sereal/Encoder.pm

lib/Sereal/Encoder/Constants.pm

ptable.h

srl_buffer.h

srl_buffer_types.h

srl_common.h

srl_encoder.c

srl_encoder.h

srl_protocol.h

t/lib/Sereal/TestSet.pm

typemap


lib/Sereal/Encoder.pm

use Carp qw/croak/;

use XSLoader;

-our $VERSION = '3.003'; # Don't forget to update the TestCompat set for testing against installed decoders!
+our $VERSION = '3.005_001'; # Don't forget to update the TestCompat set for testing against installed decoders!

our $XS_VERSION = $VERSION; $VERSION= eval $VERSION;

# not for public consumption, just for testing.

override this optimization and use a standard REFN ARRAY style tag output. This
is primarily useful for producing canonical output and for testing Sereal itself.

See L</CANONICAL REPRESENTATION> for why you might want to use this, and
for the various caveats involved.

=head3 sort_keys

Normally C<Sereal::Encoder> will output hashes in whatever order is convenient,

=back

There are also a few cases where Sereal will produce different documents
for values that you might think are the same thing, because if you
compared them with C<eq> or C<==>, Perl itself would consider them
equivalent. However, for the purposes of serialization they're not
the same value.

A good example of these cases is where L<Test::Deep> and Sereal's
canonical mode differ. We have tests for some of these cases in
F<t/030_canonical_vs_test_deep.t>. Here are the issues we've noticed so
far:

=over 4

=item Sereal considers ASCII strings with the UTF-8 flag to be different from the same string without the UTF-8 flag

Consider:

    my $language_code = "en";

vs.:

    my $language_code = "en";
    utf8::upgrade($language_code);

Sereal's canonical mode will encode these strings differently, as it
should, since the UTF-8 flag will be passed along on interpolation.

But this can be confusing if you're just getting some user-supplied
ASCII strings that you may inadvertently toggle the UTF-8 flag on,
e.g. because you're comparing an ASCII value in a database to a value
submitted in a UTF-8 web form.

=item Sereal will encode strings that look like numbers as strings, unless they've been used in numeric context

I.e. the two values in each of these pairs will be encoded differently:

    my $IV_x = "12345";
    my $IV_y = "12345" + 0;
    my $NV_x = "12.345";
    my $NV_y = "12.345" + 0;

But as noted above, something like Test::Deep will consider these to be
the same thing.

=back
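Both cases above can be observed without Sereal, using only core Perl. The sketch below is illustrative and not part of the module: `scalar_flags` is a hypothetical helper that reads a scalar's internal flags through the core `B` module. It shows that values which `eq` and `==` treat as equal still carry different flags. It does not show how Sereal itself decides on an encoding.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of Sereal): report selected internal
# flags of the scalar that $ref points to.
sub scalar_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return {
        utf8    => utf8::is_utf8($$ref)    ? 1 : 0,
        integer => ($flags & B::SVf_IOK()) ? 1 : 0,
        float   => ($flags & B::SVf_NOK()) ? 1 : 0,
    };
}

my $plain    = "en";
my $upgraded = "en";
utf8::upgrade($upgraded);

my $IV_x = "12345";
my $IV_y = "12345" + 0;
my $NV_x = "12.345";
my $NV_y = "12.345" + 0;

# Capture the flags first: using a string in numeric context (as the ==
# comparisons below do) can itself set numeric flags on that scalar.
my %flags = (
    plain    => scalar_flags(\$plain),
    upgraded => scalar_flags(\$upgraded),
    IV_x     => scalar_flags(\$IV_x),
    IV_y     => scalar_flags(\$IV_y),
    NV_x     => scalar_flags(\$NV_x),
    NV_y     => scalar_flags(\$NV_y),
);

printf "%-8s utf8=%d integer=%d float=%d\n",
    $_, @{ $flags{$_} }{qw(utf8 integer float)}
    for sort keys %flags;

# At the Perl level the members of each pair still compare as equal.
print "plain eq upgraded: ", ($plain eq $upgraded ? "yes" : "no"), "\n";
print "IV_x == IV_y: ",      ($IV_x == $IV_y      ? "yes" : "no"), "\n";
print "NV_x == NV_y: ",      ($NV_x == $NV_y      ? "yes" : "no"), "\n";
```

Capturing the flags before the comparisons matters here, because the numeric comparison is exactly the kind of "used in numeric context" event the item above refers to.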

We might add certain aggressive flags to the canonical mode in the
future to deal with this. For the cases noted above, some combination
of turning the UTF-8 flag on for all strings, or stripping it from
strings that have it but are ASCII-only, would "work". Similarly, we
could scan strings to see if they match C<looks_like_number()> and if
so numify them.

This would produce output that either would be a lot bigger (having to
encode all numbers as strings), or would be more expensive to generate
(having to scan strings for numeric or non-ASCII content), and for
some cases like the UTF-8 flag munging wouldn't be suitable for
general use outside of canonicalization.

=back
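Until such flags exist, a caller who needs these values to encode identically could normalize them before handing them to the encoder. The following is a minimal sketch under that assumption: `normalize_scalar` is a hypothetical helper, not a Sereal option. It applies two of the ideas mentioned above, numifying strings that pass `looks_like_number()` and stripping the UTF-8 flag from ASCII-only strings. It is deliberately lossy, which is one reason this kind of munging is not suitable for general use.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper, not part of Sereal: return a normalized copy of a
# plain (non-reference) scalar so that values which compare equal in Perl
# also share the same internal representation.
sub normalize_scalar {
    my ($value) = @_;
    return $value if !defined $value || ref $value;

    # Numify anything that looks like a number. This is lossy: "007"
    # becomes 7 and "1e3" becomes 1000.
    return $value + 0 if looks_like_number($value);

    # Strip the UTF-8 flag from strings that contain only ASCII characters.
    if (utf8::is_utf8($value) && $value !~ /[^\x00-\x7F]/) {
        utf8::downgrade($value);
    }
    return $value;
}

my $language_code = "en";
utf8::upgrade($language_code);

my $normalized_code   = normalize_scalar($language_code);
my $normalized_number = normalize_scalar("12345");

print "UTF-8 flag after normalizing: ",
    (utf8::is_utf8($normalized_code) ? "on" : "off"), "\n";
print "numified value: $normalized_number\n";
```

The helper works on a copy, so the caller's original scalar keeps its flags. A real implementation would also need to walk nested arrays and hashes, which this sketch leaves out.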


Often, people don't actually care about "canonical" in the strict sense
