    use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);

    my $encoder = Sereal::Encoder->new({...options...});
    my $out = $encoder->encode($structure);

    # alternatively the functional interface:
    $out = sereal_encode_with_object($encoder, $structure);

    # much slower functional interface with no persistent objects:
    $out = encode_sereal($structure, {... options ...});

it if you serialize a single very large data structure just once to free

See L</NON-CANONICAL> for why you might want to use this, and for the
various caveats involved.

See L</CANONICAL REPRESENTATION> for why you might want to use this, and
for the various caveats involved.

=head3 no_shared_hashkeys

thread. This might change in a future release to become a full clone
of the encoder object.

=head1 CANONICAL REPRESENTATION

You might want to compare two data structures by comparing their serialized
byte strings. For that to work reliably the serialization must take extra
steps to ensure that identical data structures are encoded into identical
serialized byte strings (a so-called "canonical representation").

Currently the Sereal encoder I<does not> provide a mode that will reliably
generate a canonical representation of a data structure. The reasons are many
and sometimes subtle.

Sereal does support some use-cases however. In this section we attempt to outline
the issues well enough for you to decide if it is suitable for your needs.

Unfortunately in Perl there is no such thing as a "canonical representation".
Most people are interested in "structural equivalence" but even that is less
well defined than most people think. For instance in the following example:

    my $array1= [ 0, 0 ];
    my $array2= do {
        my $zero= 0;
        sub{ \@_ }->($zero,$zero);
    };

the question of whether C<$array1> is structurally equivalent to C<$array2>
is a subjective one. Sereal for instance would B<NOT> consider them
equivalent but C<Test::Deep> would. There are many examples of this in
Perl. Simply stringifying a number technically changes the scalar. Storable
would notice this, but Sereal generally would not.

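The difference between the two arrays can be made visible by comparing the addresses of their elements. The following is a minimal sketch using only core Perl; the helper name C<elements_aliased> is hypothetical and is not part of Sereal.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sereal): returns 1 if the first two
# elements of the array are the very same scalar, 0 otherwise.
sub elements_aliased {
    my ($aref) = @_;
    return \$aref->[0] == \$aref->[1] ? 1 : 0;
}

my $array1 = [ 0, 0 ];            # two independent scalars
my $array2 = do {
    my $zero = 0;
    sub { \@_ }->($zero, $zero);  # both slots alias $zero
};

printf "array1 aliased: %d\n", elements_aliased($array1);
printf "array2 aliased: %d\n", elements_aliased($array2);
```

Both arrays hold the values C<0, 0>, so a value-based comparison treats them as equal, while a serializer that tracks scalar identity can tell them apart.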
Despite this, as of 3.002 the Sereal encoder supports a "canonical" option
which will make a "best effort" attempt at producing a canonical
representation of a data structure. This mode is actually a combination of
several other modes which may also be enabled independently, and as and when
we add new options to the encoder that would assist in this regard then
the C<canonical> option will also enable them. These options may come with a
performance penalty so care should be taken to read the Changes file and
test the performance implications when upgrading a system that uses this

It is important to note that using canonical representation to determine
if two data structures are different is subject to false positives. If
two Sereal encodings are identical you can generally assume that the
two data structures are functionally equivalent from the point of view of
normal Perl code (XS code might disagree). However if two Sereal
encodings differ the data structures may actually be functionally
equivalent. In practice it seems that the false-positive rate is low,
but your mileage may vary.

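As an illustration of the comparison described above, here is a minimal sketch that encodes two hashes built in different insertion orders with the C<canonical> option and compares the resulting byte strings. It assumes Sereal::Encoder 3.002 or later is installed; the helper name C<same_encoding> is hypothetical.

```perl
use strict;
use warnings;
use Sereal::Encoder;

# Assumes Sereal::Encoder >= 3.002, where the "canonical" option exists.
my $encoder = Sereal::Encoder->new({ canonical => 1 });

# Hypothetical helper: returns 1 if both structures encode to the same
# byte string, 0 otherwise.
sub same_encoding {
    my ($x, $y) = @_;
    return $encoder->encode($x) eq $encoder->encode($y) ? 1 : 0;
}

# Same keys and values, inserted in a different order.
my %first;  $first{$_}  = 1 for qw(a b c);
my %second; $second{$_} = 1 for qw(c b a);

print same_encoding(\%first, \%second)
    ? "encodings match\n"
    : "encodings differ (the structures may still be equivalent)\n";
```

Matching output is a reasonable hint that the structures are equivalent. Differing output on its own proves nothing, for the reasons given above.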
Some of the issues with producing a true canonical representation are

=item Sereal doesn't order the hash keys by default.

This can be enabled via C<sort_keys>, see above.

This can be enabled via the C<sort_keys> option, which is itself enabled by

=item Sereal output is sensitive to refcounts

=item There are multiple valid Sereal documents that you can produce for the same Perl data structure.

Just L<sorting hash keys|/sort_keys> is not enough. Some of the reasons
are outlined below. These issues are especially relevant when considering
language interoperability.

A trivial example is PAD bytes which
mean nothing and are skipped. They mostly exist for encoder optimizations to
prevent certain nasty backtracking situations from becoming O(n) at the cost of
one byte of output. An explicit canonical mode would have to outlaw them (or

other strings (theoretically), but doesn't for time-efficiency reasons. We'd
have to outlaw the use of this (significant) optimization for canonicalization.

=item REF representation

Sereal represents a reference to an array as a sequence of
tags which, in its simplest form, reads I<REF, ARRAY $array_length TAG1 TAG2 ...>.
The separation of "REF" and "ARRAY" is necessary to properly implement all of

for common cases. This, however, does mean that most arrays up to 15 elements
could be represented in two different, yet perfectly valid forms. ARRAYREF would
have to be outlawed for a properly canonical form. The exact same logic
applies to HASH vs. HASHREF. This behavior can be overridden by the
C<canonical_refs> option, which disables use of HASHREF and ARRAYREF.

=item Numeric representation

Similar to how Sereal can represent arrays and hashes in a full and a compact
form. For small integers (between -16 and +15 inclusive), Sereal emits only

required for real I<identity> checking. They just require a best-effort sort of
thing for caching. But it's a slippery slope!

In a nutshell, the C<sort_keys> option may be sufficient for an application

In a nutshell, the C<canonical> option may be sufficient for an application
which is simply serializing a cache key, and thus there's little harm in an
occasional false-negative, but think carefully before applying Sereal in other