~ubuntu-branches/ubuntu/wily/libsereal-encoder-perl/wily : revision 20

use Carp qw/croak/;
use XSLoader;

-our $VERSION = '3.001_012'; # Don't forget to update the TestCompat set for testing against installed decoders!
+our $VERSION = '3.003'; # Don't forget to update the TestCompat set for testing against installed decoders!
our $XS_VERSION = $VERSION; $VERSION= eval $VERSION;

# not for public consumption, just for testing.

[...]

=head1 SYNOPSIS

    use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);

    my $encoder = Sereal::Encoder->new({...options...});
    my $out = $encoder->encode($structure);

    # alternatively the functional interface:
    $out = sereal_encode_with_object($encoder, $structure);

    # much slower functional interface with no persistent objects:
    $out = encode_sereal($structure, {... options ...});

[...]

it if you serialize a single very large data structure just once to free
the memory.

-See L</NON-CANONICAL> for why you might want to use this, and for the
-various caveats involved.
+See L</CANONICAL REPRESENTATION> for why you might want to use this, and
+for the various caveats involved.

=head3 no_shared_hashkeys

[...]

    package
        File;

    use Moo;

    has 'path' => (is => 'ro');
    has 'fh' => (is => 'rw');

    # open file handle if necessary and return it
    sub get_fh {
        my $self = shift;
[...]
        }
        return $fh;
    }

    sub FREEZE {
        my ($self, $serializer) = @_;
        # Could switch on $serializer here: JSON, CBOR, Sereal, ...
[...]
        # to recreate.
        return $self->path;
    }

    sub THAW {
        my ($class, $serializer, $data) = @_;
        # Turn back into object.

[...]

thread. This might change in a future release to become a full clone
of the encoder object.

-=head1 NON-CANONICAL
+=head1 CANONICAL REPRESENTATION

You might want to compare two data structures by comparing their serialized
byte strings. For that to work reliably the serialization must take extra
steps to ensure that identical data structures are encoded into identical
serialized byte strings (a so-called "canonical representation").

-Currently the Sereal encoder I<does not> provide a mode that will reliably
-generate a canonical representation of a data structure. The reasons are many
-and sometimes subtle.
-
-Sereal does support some use-cases however. In this section we attempt to outline
-the issues well enough for you to decide if it is suitable for your needs.
+Unfortunately in Perl there is no such thing as a "canonical representation".
+Most people are interested in "structural equivalence" but even that is less
+well defined than most people think. For instance in the following example:
+
+    my $array1= [ 0, 0 ];
+    my $array2= do {
+        my $zero= 0;
+        sub{ \@_ }->($zero,$zero);
+    };
+
+the question of whether C<$array1> is structurally equivalent to C<$array2>
+is a subjective one. Sereal for instance would B<NOT> consider them
+equivalent but C<Test::Deep> would. There are many examples of this in
+Perl. Simply stringifying a number technically changes the scalar. Storable
+would notice this, but Sereal generally would not.
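
The structural difference between the two arrays in the example above can be observed from plain Perl. The following is a minimal sketch using only core Perl. The names `$distinct`, `$aliased` and `$second` are illustrative and do not appear in the Sereal documentation.

```perl
use strict;
use warnings;

my $array1 = [ 0, 0 ];
my $array2 = do {
    my $zero = 0;
    sub { \@_ }->($zero, $zero);
};

# The anonymous array constructor copies its values, so the two
# slots of $array1 are independent scalars with different addresses.
my $distinct = \$array1->[0] != \$array1->[1];

# @_ aliases the arguments of the call, so both slots of $array2
# refer to the very same scalar (the one that was named $zero).
my $aliased = \$array2->[0] == \$array2->[1];

# Writing through one slot of $array2 is therefore visible in the other.
$array2->[0] = 7;
my $second = $array2->[1];

print "array1 slots distinct: ", ($distinct ? "yes" : "no"), "\n";
print "array2 slots aliased:  ", ($aliased  ? "yes" : "no"), "\n";
print "array2->[1] after writing array2->[0]: $second\n";
```

A comparison that only looks at values treats the two arrays as equal, while one that tracks scalar identity does not.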

+Despite this, as of 3.002 the Sereal encoder supports a "canonical" option
+which will make a "best effort" attempt at producing a canonical
+representation of a data structure. This mode is actually a combination of
+several other modes which may also be enabled independently, and as and when
+we add new options to the encoder that would assist in this regard then
+the C<canonical> option will also enable them. These options may come with a
+performance penalty so care should be taken to read the Changes file and
+test the performance implications when upgrading a system that uses this
+option.
+
+It is important to note that using canonical representation to determine
+if two data structures are different is subject to false-positives. If
+two Sereal encodings are identical you can generally assume that the
+two data structures are functionally equivalent from the point of view of
+normal Perl code (XS code might disagree). However if two Sereal
+encodings differ the data structures may actually be functionally
+equivalent. In practice it seems that the false-positive rate is low,
+but your mileage may vary.
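
The asymmetry described above (equal bytes suggest equivalence, differing bytes prove nothing) can be sketched as follows. This is only an illustration under stated assumptions: it requires the CPAN module Sereal::Encoder at version 3.002 or later and does nothing otherwise. The variables `$same` and `$differ` are hypothetical names, and no particular output is claimed for the number-versus-string comparison.

```perl
use strict;
use warnings;

my ($same, $differ);

# Sereal::Encoder is not a core module. Assumption: skip quietly unless
# a version that knows the "canonical" option (3.002 or later) is installed.
if (eval { require Sereal::Encoder; Sereal::Encoder->VERSION('3.002'); 1 }) {
    my $encoder = Sereal::Encoder->new({ canonical => 1 });

    # Same keys and values, inserted in opposite order.
    my (%h1, %h2);
    $h1{$_} = 1 for qw(a b c d e);
    $h2{$_} = 1 for reverse qw(a b c d e);

    # Identical byte strings suggest the two hashes are equivalent.
    $same = $encoder->encode(\%h1) eq $encoder->encode(\%h2);

    # A number and its string form behave alike in most Perl code,
    # yet the encoder may emit different bytes for them: the kind of
    # "difference" that does not imply a functional difference.
    my ($num, $str) = (1, "1");
    $differ = $encoder->encode($num) ne $encoder->encode($str);

    print "hashes encode identically: ", ($same ? "yes" : "no"), "\n";
    print "1 and \"1\" encode differently: ", ($differ ? "yes" : "no"), "\n";
}
```

For a cache key, a spurious difference costs only a cache miss, which is why a best-effort mode can be acceptable there.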

+Some of the issues with producing a true canonical representation are
+outlined below:

=over 4

=item Sereal doesn't order the hash keys by default.

-This can be enabled via C<sort_keys>, see above.
+This can be enabled via the C<sort_keys> option, which is itself enabled by
+the C<canonical> option.

=item Sereal output is sensitive to refcounts

[...]

=item There are multiple valid Sereal documents that you can produce for the same Perl data structure.

-Just L<sorting hash keys|/sort_keys> is not enough. A trivial example is PAD bytes which
+Just L<sorting hash keys|/sort_keys> is not enough. Some of the reasons
+are outlined below. These issues are especially relevant when considering
+language interoperability.
+
+=over 4
+
+=item PAD bytes
+
+A trivial example is PAD bytes which
mean nothing and are skipped. They mostly exist for encoder optimizations to
prevent certain nasty backtracking situations from becoming O(n) at the cost of
one byte of output. An explicit canonical mode would have to outlaw them (or
[...]
operations to go from O(1) to a full memcpy of everything after the point of
where we backtracked to. Nasty.

+=item COPY tag
+
Another example is COPY. The COPY tag indicates that the next element is an
identical copy of a previous element (which is itself forbidden from including
COPY's other than for class names). COPY is purely internal. The Perl/XS
[...]
other strings (theoretically), but doesn't for time-efficiency reasons. We'd
have to outlaw the use of this (significant) optimization of canonicalization.

+=item REF representation
+
Sereal represents a reference to an array as a sequence of
tags which, in its simplest form, reads I<REF, ARRAY $array_length TAG1 TAG2 ...>.
The separation of "REF" and "ARRAY" is necessary to properly implement all of
[...]
for common cases. This, however, does mean that most arrays up to 15 elements
could be represented in two different, yet perfectly valid forms. ARRAYREF would
have to be outlawed for a properly canonical form. The exact same logic
-applies to HASH vs. HASHREF.
+applies to HASH vs. HASHREF. This behavior can be overridden by the
+C<canonical_refs> option, which disables use of HASHREF and ARRAYREF.

+=item Numeric representation

Similar to how Sereal can represent arrays and hashes in a full and a compact
form. For small integers (between -16 and +15 inclusive), Sereal emits only
[...]
supports different floating point precisions and will generally choose the most
compact that can represent your floating point number correctly.

-These issues are especially relevant when considering language interoperability.
+=back

=back

[...]
required for real I<identity> checking. They just require a best-effort sort of
thing for caching. But it's a slippery slope!

-In a nutshell, the C<sort_keys> option may be sufficient for an application
+In a nutshell, the C<canonical> option may be sufficient for an application
which is simply serializing a cache key, and thus there's little harm in an
occasional false-negative, but think carefully before applying Sereal in other
use-cases.