<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">
<holder>Ericsson AB, All Rights Reserved</holder>
The contents of this file are subject to the Erlang Public License,
Version 1.1, (the "License"); you may not use this file except in
compliance with the License. You should have received a copy of the
Erlang Public License along with this software. If not, it can be
retrieved online at http://www.erlang.org/.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
the License for the specific language governing rights and limitations
The Initial Developer of the Original Code is Ericsson AB.
<title>External Term Format</title>
<prepared>Kenneth</prepared>
<date>2007-09-21</date>
<title>Introduction</title>
The external term format is mainly used in the distribution
Since Erlang has a fixed number of types, there is no need for a
programmer to define a specification for the external format used
within some application.
All Erlang terms have an external representation, and the interpretation
of the different terms is application specific.
In Erlang the BIF <seealso marker="kernel:erlang#term_to_binary/1">term_to_binary/1,2</seealso> is used to convert a
term into the external format.
To convert binary data encoding a term the BIF
<seealso marker="kernel:erlang#binary_to_term/1">
The distribution does this implicitly when sending messages across
The overall layout of the external term format is:
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">N</cell>
<cell align="center"><c>131</c></cell>
<cell align="center"><c>Tag</c></cell>
<cell align="center"><c>Data</c></cell>
<tcaption></tcaption></table>
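As a rough illustration of the layout above, the following Python sketch splits an encoded term into its tag and the remaining data after checking the leading byte 131. The function name split_term is hypothetical and is not part of any Erlang library; the sample bytes are the encoding of the small integer 5 (tag 97, SMALL_INTEGER_EXT).

```python
VERSION_MAGIC = 131  # first byte of every encoded term, per the table above


def split_term(encoded: bytes):
    """Return (tag, data) after checking the leading version byte."""
    if len(encoded) == 0 or encoded[0] != VERSION_MAGIC:
        raise ValueError("not an external term: missing leading byte 131")
    if len(encoded) == 1:
        raise ValueError("truncated term: no tag byte")
    return encoded[1], encoded[2:]


tag, data = split_term(bytes([131, 97, 5]))
print(tag, data)  # tag 97 is SMALL_INTEGER_EXT
```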
A compressed term looks like this:
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">N</cell>
<cell align="center">131</cell>
<cell align="center">80</cell>
<cell align="center">UncompressedSize</cell>
<cell align="center">Zlib-compressedData</cell>
<tcaption></tcaption></table>
Uncompressed Size (unsigned 32 bit integer in big-endian byte order)
is the size of the data before it was compressed.
The compressed data has the following format when it has been
<cell align="center">1</cell>
<cell align="center">Uncompressed Size</cell>
<cell align="center">Tag</cell>
<cell align="center">Data</cell>
<tcaption></tcaption></table>
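The compressed layout can be sketched in Python with the standard zlib and struct modules. The function names compress_term and expand_term are hypothetical. The sketch assumes that the size field counts the Tag byte together with Data (the bytes that are actually passed to zlib); treat that as an assumption of this example rather than a statement of the specification.

```python
import struct
import zlib


def compress_term(encoded: bytes) -> bytes:
    """Wrap a plain term (131, Tag, Data) in the compressed layout."""
    if encoded[:1] != bytes([131]):
        raise ValueError("not an external term")
    body = encoded[1:]  # Tag + Data, without the leading 131
    # Assumption: the size field is the length of Tag + Data.
    size = struct.pack(">I", len(body))
    return bytes([131, 80]) + size + zlib.compress(body)


def expand_term(compressed: bytes) -> bytes:
    """Return the plain (131, Tag, Data) form of a compressed term."""
    if compressed[:2] != bytes([131, 80]):
        raise ValueError("not a compressed external term")
    # Bytes 2..5 hold the uncompressed size; this sketch does not need it
    # because zlib.decompress sizes its own output buffer.
    return bytes([131]) + zlib.decompress(compressed[6:])
```

A round trip through both functions should give back the original bytes.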
<marker id="SMALL_INTEGER_EXT"/>
<title>SMALL_INTEGER_EXT</title>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">97</cell>
<cell align="center">Int</cell>
<tcaption></tcaption></table>
Unsigned 8 bit integer.
<marker id="INTEGER_EXT"/>
<title>INTEGER_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">98</cell>
<cell align="center">Int</cell>
<tcaption></tcaption></table>
Signed 32 bit integer in big-endian format (i.e. MSB first).
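The two integer encodings can be illustrated with a small Python sketch. The function name encode_integer is hypothetical; it produces the tag and payload only, without the leading byte 131, and simply picks the shortest encoding that fits.

```python
import struct


def encode_integer(value: int) -> bytes:
    """Encode an int as SMALL_INTEGER_EXT (97) or INTEGER_EXT (98)."""
    if value in range(256):
        # Unsigned 8 bit integer
        return bytes([97, value])
    if value in range(-2**31, 2**31):
        # Signed 32 bit integer, big-endian (MSB first)
        return bytes([98]) + struct.pack(">i", value)
    raise ValueError("value does not fit in 32 bits; a bignum tag is needed")
```

For example, encode_integer(5) yields the two bytes 97, 5, while encode_integer(-1) yields 98 followed by four 0xFF bytes.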
<marker id="FLOAT_EXT"/>
<title>FLOAT_EXT</title>
<cell align="center">1</cell>
<cell align="center">31</cell>
<cell align="center">99</cell>
<cell align="center">Float String</cell>
<tcaption></tcaption></table>
A float is stored in string format. The format used in sprintf to
format the float is "%.20e"
(more bytes are allocated than necessary).
To unpack the float, use sscanf with format "%lf".
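A minimal Python sketch of this string encoding follows. Python's % operator and float() stand in for sprintf and sscanf here, and the function names are hypothetical. Padding the unused bytes of the 31 byte field with NUL bytes is an assumption of this sketch; the text above only says that more bytes are allocated than necessary.

```python
def encode_float_ext(value: float) -> bytes:
    """Encode a float as FLOAT_EXT: tag 99 plus a 31 byte string field."""
    text = ("%.20e" % value).encode("ascii")
    # Assumption: the rest of the 31 byte field is filled with NUL bytes.
    return bytes([99]) + text.ljust(31, b"\x00")


def decode_float_ext(term: bytes) -> float:
    """Decode a FLOAT_EXT term (tag byte included)."""
    if term[0] != 99:
        raise ValueError("not a FLOAT_EXT term")
    text = term[1:32].split(b"\x00", 1)[0]  # drop the padding
    return float(text.decode("ascii"))
```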
This term is used in minor version 0 of the external format;
it has been superseded by
<seealso marker="#NEW_FLOAT_EXT">
<marker id="ATOM_EXT"/>
<title>ATOM_EXT</title>
<cell align="center">1</cell>
<cell align="center">2</cell>
<cell align="center">Len</cell>
<cell align="center"><c>100</c></cell>
<cell align="center"><c>Len</c></cell>
<cell align="center"><c>AtomName</c></cell>
<tcaption></tcaption></table>
An atom is stored with a 2 byte unsigned length in big-endian order,
followed by <c>Len</c> 8 bit characters that form the <c>AtomName</c>.
Note: The maximum allowed value for <c>Len</c> is 255.
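The atom layout can be sketched as follows. The function names are hypothetical, and mapping the 8 bit characters to Latin-1 is an assumption of this example.

```python
import struct


def encode_atom(name: str) -> bytes:
    """Encode an atom name as ATOM_EXT (tag 100)."""
    raw = name.encode("latin-1")  # assumption: one byte per character
    if len(raw) > 255:
        raise ValueError("atom name longer than 255 characters")
    return bytes([100]) + struct.pack(">H", len(raw)) + raw


def decode_atom(term: bytes) -> str:
    """Decode an ATOM_EXT term (tag byte included)."""
    if term[0] != 100:
        raise ValueError("not an ATOM_EXT term")
    (length,) = struct.unpack(">H", term[1:3])
    return term[3:3 + length].decode("latin-1")
```

With these definitions, encode_atom("ok") gives the bytes 100, 0, 2 followed by the two characters of the name.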
<marker id="REFERENCE_EXT"/>
<title>REFERENCE_EXT</title>
<cell align="center">1</cell>
<cell align="center">N</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center"><c>101</c></cell>
<cell align="center"><c>Node</c></cell>
<cell align="center"><c>ID</c></cell>
<cell align="center"><c>Creation</c></cell>
<tcaption></tcaption></table>
Encodes a reference object (an object generated with <c>make_ref/0</c>).
The <c>Node</c> term is an encoded atom, i.e.
<seealso marker="#ATOM_EXT">ATOM_EXT</seealso>,
<seealso marker="#NEW_CACHE">NEW_CACHE</seealso> or
<seealso marker="#CACHED_ATOM">CACHED_ATOM</seealso>.
The <c>ID</c> field contains a big-endian unsigned integer,
but <em>should be regarded as uninterpreted data</em>
since this field is node specific.
<c>Creation</c> is a byte containing a node serial number that
makes it possible to distinguish old (crashed) nodes from a new one.
In <c>ID</c>, only 18 bits are significant; the rest should be 0.
In <c>Creation</c>, only 2 bits are significant; the rest should be 0.
See <seealso marker="#NEW_REFERENCE_EXT">NEW_REFERENCE_EXT</seealso>.
<marker id="PORT_EXT"/>
<title>PORT_EXT</title>
<cell align="center">1</cell>
<cell align="center">N</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center"><c>102</c></cell>
<cell align="center"><c>Node</c></cell>
<cell align="center"><c>ID</c></cell>
<cell align="center"><c>Creation</c></cell>
<tcaption></tcaption></table>
Encodes a port object (obtained from <c>open_port/2</c>).
The <c>ID</c> is a node specific identifier for a local port.
Port operations are not allowed across node boundaries.
The <c>Creation</c> works just like in
<seealso marker="#REFERENCE_EXT">REFERENCE_EXT</seealso>.
<marker id="PID_EXT"/>
<title>PID_EXT</title>
<cell align="center">1</cell>
<cell align="center">N</cell>
<cell align="center">4</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center"><c>103</c></cell>
<cell align="center"><c>Node</c></cell>
<cell align="center"><c>ID</c></cell>
<cell align="center"><c>Serial</c></cell>
<cell align="center"><c>Creation</c></cell>
<tcaption></tcaption></table>
Encode a process identifier object (obtained from <c>spawn/3</c> or
The <c>ID</c> and <c>Creation</c> fields work just like in
<seealso marker="#REFERENCE_EXT">REFERENCE_EXT</seealso>, while
the <c>Serial</c> field is used to improve safety.
In <c>ID</c>, only 15 bits are significant; the rest should be 0.
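To show how the fields line up, here is a hedged Python sketch that unpacks a PID_EXT term whose <c>Node</c> is encoded as ATOM_EXT. The function name decode_pid is hypothetical. Reading <c>Serial</c> as a big-endian unsigned integer, like <c>ID</c>, is an assumption of this example, and the cached atom encodings of <c>Node</c> are not handled.

```python
import struct


def decode_pid(term: bytes):
    """Return (node, id, serial, creation) from a PID_EXT term."""
    if term[0] != 103:
        raise ValueError("not a PID_EXT term")
    if term[1] != 100:
        raise ValueError("this sketch only handles an ATOM_EXT node")
    (length,) = struct.unpack(">H", term[2:4])
    end_of_node = 4 + length
    node = term[4:end_of_node].decode("latin-1")
    # ID (4 bytes), Serial (4 bytes), Creation (1 byte)
    pid_id, serial, creation = struct.unpack(
        ">IIB", term[end_of_node:end_of_node + 9])
    return node, pid_id, serial, creation
```

The returned ID and Serial values should still be treated as uninterpreted data, as noted for REFERENCE_EXT.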
<marker id="SMALL_TUPLE_EXT"/>
<title>SMALL_TUPLE_EXT</title>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">N</cell>
<cell align="center">104</cell>
<cell align="center">Arity</cell>
<cell align="center">Elements</cell>
<tcaption></tcaption></table>
<c>SMALL_TUPLE_EXT</c> encodes a tuple. The <c>Arity</c>
field is an unsigned byte that determines how many elements
follow in the <c>Elements</c> section.
<marker id="LARGE_TUPLE_EXT"/>
<title>LARGE_TUPLE_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">N</cell>
<cell align="center">105</cell>
<cell align="center">Arity</cell>
<cell align="center">Elements</cell>
<tcaption></tcaption></table>
Same as
<seealso marker="#SMALL_TUPLE_EXT">SMALL_TUPLE_EXT</seealso>
with the exception that <c>Arity</c> is an
unsigned 4 byte integer in big-endian format.
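The choice between the two tuple encodings can be sketched in Python as below. The function name encode_tuple is hypothetical, and the elements are assumed to be already encoded terms (each starting with its own tag byte).

```python
import struct


def encode_tuple(encoded_elements) -> bytes:
    """Encode a tuple from a list of already encoded element terms."""
    arity = len(encoded_elements)
    if arity in range(256):
        # SMALL_TUPLE_EXT: arity fits in one unsigned byte
        header = bytes([104, arity])
    else:
        # LARGE_TUPLE_EXT: arity is a 4 byte big-endian unsigned integer
        header = bytes([105]) + struct.pack(">I", arity)
    return header + b"".join(encoded_elements)
```

For instance, the tuple {1, 2} becomes the bytes 104, 2, 97, 1, 97, 2 when each element is encoded as SMALL_INTEGER_EXT.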
<marker id="NIL_EXT"/>
<title>NIL_EXT</title>
<cell align="center">1</cell>
<cell align="center">106</cell>
<tcaption></tcaption></table>
The representation for an empty list, i.e. the Erlang syntax <c>[]</c>.
<marker id="STRING_EXT"/>
<title>STRING_EXT</title>
<cell align="center">1</cell>
<cell align="center">2</cell>
<cell align="center">Len</cell>
<cell align="center">107</cell>
<cell align="center">Length</cell>
<cell align="center">Characters</cell>
<tcaption></tcaption></table>
String does NOT have a corresponding Erlang representation,
but is an optimization for sending lists of bytes (integers in
the range 0-255) more efficiently over the distribution.
Since the <c>Length</c> field is an unsigned 2 byte integer
(big endian), implementations must make sure that lists longer than
65535 elements are encoded as
<seealso marker="#LIST_EXT">LIST_EXT</seealso>.
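The length rule above can be sketched in Python. The function name encode_byte_list is hypothetical; encoding the empty list as NIL_EXT and encoding each element of a long list as SMALL_INTEGER_EXT are choices made for this example.

```python
import struct


def encode_byte_list(values) -> bytes:
    """Encode a list of integers in the range 0-255."""
    count = len(values)
    if count == 0:
        return bytes([106])  # NIL_EXT
    if count > 65535:
        # Too long for the 2 byte Length field: fall back to LIST_EXT,
        # with each element as SMALL_INTEGER_EXT and NIL_EXT as the tail.
        elements = b"".join(bytes([97, v]) for v in values)
        return bytes([108]) + struct.pack(">I", count) + elements + bytes([106])
    # STRING_EXT: 2 byte big-endian length followed by the raw bytes
    return bytes([107]) + struct.pack(">H", count) + bytes(values)
```

Both bytes(values) and bytes([97, v]) raise ValueError for elements outside 0-255, so other lists are rejected rather than silently mis-encoded.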
<marker id="LIST_EXT"/>
<title>LIST_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center"> </cell>
<cell align="center"> </cell>
<cell align="center">108</cell>
<cell align="center">Length</cell>
<cell align="center">Elements</cell>
<cell align="center">Tail</cell>
<tcaption></tcaption></table>
<c>Length</c> is the number of elements that follow in the
<c>Elements</c> section. <c>Tail</c> is the final tail of the list; it is
<seealso marker="#NIL_EXT">NIL_EXT</seealso>
for a proper list, but may be of any type if the list is
improper (for instance <c>[a|b]</c>).
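A short Python sketch of LIST_EXT, including the improper list case, might look as follows. The function name encode_list is hypothetical, and both the elements and the tail are assumed to be already encoded terms.

```python
import struct

NIL = bytes([106])  # NIL_EXT, the tail of a proper list


def encode_list(encoded_elements, encoded_tail: bytes = NIL) -> bytes:
    """Encode a list from already encoded elements and an encoded tail."""
    header = bytes([108]) + struct.pack(">I", len(encoded_elements))
    return header + b"".join(encoded_elements) + encoded_tail


# The improper list [a|b]: one element (atom a) and the tail (atom b),
# both written as ATOM_EXT.
atom_a = bytes([100, 0, 1]) + b"a"
atom_b = bytes([100, 0, 1]) + b"b"
improper = encode_list([atom_a], atom_b)
```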
<marker id="BINARY_EXT"/>
<title>BINARY_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">Len</cell>
<cell align="center">109</cell>
<cell align="center">Len</cell>
<cell align="center">Data</cell>
<tcaption></tcaption></table>
Binaries are generated with bit syntax expressions or with
<seealso marker="kernel:erlang#list_to_binary/1">list_to_binary/1</seealso>,
<seealso marker="kernel:erlang#term_to_binary/1">term_to_binary/1</seealso>,
or as input from binary ports.
The <c>Len</c> length field is an unsigned 4 byte integer (big endian).
<marker id="SMALL_BIG_EXT"/>
<title>SMALL_BIG_EXT</title>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">n</cell>
<cell align="center">110</cell>
<cell align="center">n</cell>
<cell align="center">Sign</cell>
<cell align="center">d(0) ... d(n-1)</cell>
<tcaption></tcaption></table>
Bignums are stored in sign-magnitude form with a <c>Sign</c> byte
that is 0 if the bignum is positive and 1 if it is negative. The
digits are stored with the LSB byte stored first. To
calculate the integer the following formula can be used:<br/>
(d(0)*B^0 + d(1)*B^1 + d(2)*B^2 + ... + d(n-1)*B^(n-1)), where B = 256
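The formula can be checked with a small Python sketch. Each digit is one byte, so the base B is 256. The function names are hypothetical, and only the SMALL_BIG_EXT form with a 1 byte digit count is shown.

```python
def decode_small_big(term: bytes) -> int:
    """Decode a SMALL_BIG_EXT term (tag byte included)."""
    if term[0] != 110:
        raise ValueError("not a SMALL_BIG_EXT term")
    n, sign = term[1], term[2]
    digits = term[3:3 + n]
    # d(0)*B^0 + d(1)*B^1 + ... + d(n-1)*B^(n-1) with B = 256
    magnitude = sum(d * 256**i for i, d in enumerate(digits))
    return -magnitude if sign == 1 else magnitude


def encode_small_big(value: int) -> bytes:
    """Encode an int as SMALL_BIG_EXT, least significant byte first."""
    magnitude = abs(value)
    n = max(1, (magnitude.bit_length() + 7) // 8)
    if n > 255:
        raise ValueError("too many digits; LARGE_BIG_EXT is needed")
    sign = 0 if value == magnitude else 1
    return bytes([110, n, sign]) + magnitude.to_bytes(n, "little")
```

As a worked case, -256 has magnitude 0*256^0 + 1*256^1, so it is encoded as the bytes 110, 2, 1, 0, 1.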
<marker id="LARGE_BIG_EXT"/>
<title>LARGE_BIG_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center">n</cell>
<cell align="center">111</cell>
<cell align="center">n</cell>
<cell align="center">Sign</cell>
<cell align="center">d(0) ... d(n-1)</cell>
<tcaption></tcaption></table>
Same as <seealso marker="#SMALL_BIG_EXT">SMALL_BIG_EXT</seealso>
with the difference that the length field
is an unsigned 4 byte integer.
<marker id="NEW_CACHE"/>
<title>NEW_CACHE</title>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">2</cell>
<cell align="center">Len</cell>
<cell align="center">78</cell>
<cell align="center">index</cell>
<cell align="center">Len</cell>
<cell align="center">Atom name</cell>
<tcaption></tcaption></table>
NEW_CACHE works just like
<seealso marker="#ATOM_EXT">ATOM_EXT</seealso>,
but it must also cache
the atom in the atom cache in the location given by index.
The atom cache is currently only used between real Erlang nodes
(not between Erlang nodes and C or Java nodes).
<marker id="CACHED_ATOM"/>
<title>CACHED_ATOM</title>
<cell align="center">1</cell>
<cell align="center">1</cell>
<cell align="center">67</cell>
<cell align="center">index</cell>
<tcaption></tcaption></table>
When the atom cache is in use, index is the slot number in which
the atom MUST be located.
<marker id="NEW_REFERENCE_EXT"/>
<title>NEW_REFERENCE_EXT</title>
<cell align="center">1</cell>
<cell align="center">2</cell>
<cell align="center">N</cell>
<cell align="center">1</cell>
<cell align="center">N'</cell>
<cell align="center">114</cell>
<cell align="center">Len</cell>
<cell align="center">Node</cell>
<cell align="center">Creation</cell>
<cell align="center">ID ...</cell>
<tcaption></tcaption></table>
Node and Creation are as in
<seealso marker="#REFERENCE_EXT">REFERENCE_EXT</seealso>.
<c>ID</c> contains a sequence of big-endian unsigned integers
(4 bytes each, so <c>N'</c> is a multiple of 4),
but should be regarded as uninterpreted data.
<c>N'</c> = 4 * <c>Len</c>.
In the first word (four bytes) of <c>ID</c>, only 18 bits are
significant; the rest should be 0.
In <c>Creation</c>, only 2 bits are significant;
the rest should be 0.
NEW_REFERENCE_EXT was introduced with distribution version 4.
In version 4, <c>N'</c> should be at most 12.
See <seealso marker="#REFERENCE_EXT">REFERENCE_EXT</seealso>.
<marker id="FUN_EXT"/>
<title>FUN_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">N1</cell>
<cell align="center">N2</cell>
<cell align="center">N3</cell>
<cell align="center">N4</cell>
<cell align="center">N5</cell>
<cell align="center">117</cell>
<cell align="center">NumFree</cell>
<cell align="center">Pid</cell>
<cell align="center">Module</cell>
<cell align="center">Index</cell>
<cell align="center">Uniq</cell>
<cell align="center">Free vars ...</cell>
<tcaption></tcaption></table>
<tag><c>Pid</c></tag>
is a process identifier as in
<seealso marker="#PID_EXT">PID_EXT</seealso>.
It represents the process in which the fun was created.
<tag><c>Module</c></tag>
is encoded as an atom, using
<seealso marker="#ATOM_EXT">ATOM_EXT</seealso>,
<seealso marker="#NEW_CACHE">NEW_CACHE</seealso>
or <seealso marker="#CACHED_ATOM">CACHED_ATOM</seealso>.
This is the module that the fun is implemented in.
<tag><c>Index</c></tag>
is an integer encoded using
<seealso marker="#SMALL_INTEGER_EXT">SMALL_INTEGER_EXT</seealso>
or <seealso marker="#INTEGER_EXT">INTEGER_EXT</seealso>.
It is typically a small index into the module's fun table.
<tag><c>Uniq</c></tag>
is an integer encoded using
<seealso marker="#SMALL_INTEGER_EXT">SMALL_INTEGER_EXT</seealso> or
<seealso marker="#INTEGER_EXT">INTEGER_EXT</seealso>.
<c>Uniq</c> is the hash value of the parse tree for the fun.
<tag><c>Free vars</c></tag>
is <c>NumFree</c> number of terms, each one encoded according to its type.
<marker id="NEW_FUN_EXT"/>
<title>NEW_FUN_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center">16</cell>
<cell align="center">4</cell>
<cell align="center">4</cell>
<cell align="center">N1</cell>
<cell align="center">N2</cell>
<cell align="center">N3</cell>
<cell align="center">N4</cell>
<cell align="center">N5</cell>
<cell align="center">112</cell>
<cell align="center">Size</cell>
<cell align="center">Arity</cell>
<cell align="center">Uniq</cell>
<cell align="center">Index</cell>
<cell align="center">NumFree</cell>
<cell align="center">Module</cell>
<cell align="center">OldIndex</cell>
<cell align="center">OldUniq</cell>
<cell align="center">Pid</cell>
<cell align="center">Free Vars</cell>
<tcaption></tcaption></table>
This is the new encoding of internal funs: <c>fun F/A</c> and
<c>fun(Arg1,..) -> ... end</c>.
<tag><c>Size</c></tag>
is the total number of bytes, including the <c>Size</c> field.
<tag><c>Arity</c></tag>
is the arity of the function implementing the fun.
<tag><c>Uniq</c></tag>
is the 16 byte MD5 of the significant parts of the Beam file.
<tag><c>Index</c></tag>
is an index number. Each fun within a module has a unique
index. <c>Index</c> is stored in big-endian byte order.
<tag><c>NumFree</c></tag>
is the number of free variables.
<tag><c>Module</c></tag>
is encoded as an atom, using
<seealso marker="#ATOM_EXT">ATOM_EXT</seealso>,
<seealso marker="#NEW_CACHE">NEW_CACHE</seealso> or
<seealso marker="#CACHED_ATOM">CACHED_ATOM</seealso>.
This is the module that the fun is implemented in.
<tag><c>OldIndex</c></tag>
is an integer encoded using
<seealso marker="#SMALL_INTEGER_EXT">SMALL_INTEGER_EXT</seealso>
or <seealso marker="#INTEGER_EXT">INTEGER_EXT</seealso>.
It is typically a small index into the module's fun table.
<tag><c>OldUniq</c></tag>
is an integer encoded using
<seealso marker="#SMALL_INTEGER_EXT">SMALL_INTEGER_EXT</seealso> or
<seealso marker="#INTEGER_EXT">INTEGER_EXT</seealso>.
<c>Uniq</c> is the hash value of the parse tree for the fun.
<tag><c>Pid</c></tag>
is a process identifier as in
<seealso marker="#PID_EXT">PID_EXT</seealso>.
It represents the process in which the fun was created.
<tag><c>Free vars</c></tag>
is <c>NumFree</c> number of terms, each one encoded according to its type.
<marker id="EXPORT_EXT"/>
<title>EXPORT_EXT</title>
<cell align="center">1</cell>
<cell align="center">N1</cell>
<cell align="center">N2</cell>
<cell align="center">N3</cell>
<cell align="center">113</cell>
<cell align="center">Module</cell>
<cell align="center">Function</cell>
<cell align="center">Arity</cell>
<tcaption></tcaption></table>
This term is the encoding for external funs: <c>fun M:F/A</c>.
<c>Module</c> and <c>Function</c> are atoms
(encoded using <seealso marker="#ATOM_EXT">ATOM_EXT</seealso>,
<seealso marker="#NEW_CACHE">NEW_CACHE</seealso> or
<seealso marker="#CACHED_ATOM">CACHED_ATOM</seealso>).
<c>Arity</c> is an integer encoded using
<seealso marker="#SMALL_INTEGER_EXT">SMALL_INTEGER_EXT</seealso>.
<marker id="BIT_BINARY_EXT"/>
<title>BIT_BINARY_EXT</title>
<cell align="center">1</cell>
<cell align="center">4</cell>
<cell align="center">1</cell>
<cell align="center">Len</cell>
<cell align="center">77</cell>
<cell align="center">Len</cell>
<cell align="center">Bits</cell>
<cell align="center">Data</cell>
<tcaption></tcaption></table>
This term represents a bitstring whose length in bits is not a
multiple of 8 (created using the bit syntax in R12B and later).
The <c>Len</c> field is an unsigned 4 byte integer (big endian).
The <c>Bits</c> field is the number of bits that are used
in the last byte in the data field,
counting from the most significant bit towards the least significant.
<marker id="NEW_FLOAT_EXT"/>
<title>NEW_FLOAT_EXT</title>
<cell align="center">1</cell>
<cell align="center">8</cell>
<cell align="center">70</cell>
<cell align="center">IEEE float</cell>
<tcaption></tcaption></table>
A float is stored as 8 bytes in big-endian IEEE format.
This term is used in minor version 1 of the external format.
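A minimal Python sketch of NEW_FLOAT_EXT, using the standard struct module for the 8 byte big-endian IEEE representation. The function names are hypothetical.

```python
import struct


def encode_new_float(value: float) -> bytes:
    """Encode a float as NEW_FLOAT_EXT: tag 70 plus 8 big-endian bytes."""
    return bytes([70]) + struct.pack(">d", value)


def decode_new_float(term: bytes) -> float:
    """Decode a NEW_FLOAT_EXT term (tag byte included)."""
    if term[0] != 70:
        raise ValueError("not a NEW_FLOAT_EXT term")
    (value,) = struct.unpack(">d", term[1:9])
    return value
```

For example, 1.0 is encoded as the tag 70 followed by the bytes 3F F0 00 00 00 00 00 00.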