This is currently the sum total of my documentation on the AMF format, and
until I find some binary AMF data to test with, it'll be hard to know
if I've got this initial reverse engineering project working

Anyone who gets the urge is welcome to split this library off from
Gnash and make it a standalone development resource. There don't
appear to be any GPL'd AMF tools out there that I could find non

===========================================================

All of this info came from OSFlash:
http://osflash.org/amf

AMF has core data types that are used at every step of
serializing data. These should not be confused with the AMF
ActionScript data types. The core data types include:

An AMF Byte is the simplest data type to read and write. It is simply

An AMF Int is made up of 2 consecutive bytes. It represents a 16-bit
number. The first byte in the file/stream is the most significant byte
(MSB) and the second byte in the file/stream is the least significant
byte (LSB).

An AMF MediumInt is made up of 3 consecutive bytes. It represents a
24-bit number. The first byte in the file/stream is the MSB and the
third byte in the file/stream is the LSB. MediumInts appear to be
used exclusively by FlashCom.
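
A minimal Python sketch of the byte order described above. The helper
name read_uint is made up for this note, and the sample bytes are
arbitrary; the only thing taken from the text is that the first byte
in the stream is the most significant one.

```python
def read_uint(data: bytes, offset: int, width: int) -> int:
    """Read an unsigned integer of `width` bytes, most significant byte first."""
    chunk = data[offset:offset + width]
    if len(chunk) != width:
        raise ValueError("not enough bytes in stream")
    return int.from_bytes(chunk, byteorder="big", signed=False)


stream = bytes([0x01, 0x04, 0x00, 0x00, 0x0F])
amf_int = read_uint(stream, 0, 2)      # Int: bytes 01 04
amf_medium = read_uint(stream, 2, 3)   # MediumInt: bytes 00 00 0F
print(amf_int, amf_medium)
```

The same helper works for any width, so a 4 byte value is just
read_uint(data, offset, 4).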

The AMF Long is made up of 4 consecutive bytes. It represents a 32-bit
number. Like the Int and MediumInt, it is unsigned and the LSB is on

The AMF Double is made up of 8 consecutive bytes. It represents a
signed floating point number. The double is big-endian encoded. In
PHP a double can be read in the following way (this should also work
for any language that has a pack function):
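
In Python the counterpart of a pack/unpack function is the standard
struct module. The sketch below assumes the 8 bytes arrive most
significant byte first, which is what the sample value for 7 in the
SharedObject dump later in this document (40 1C 00 00 00 00 00 00)
requires. The function name is only illustrative.

```python
import struct


def read_double(data: bytes, offset: int = 0) -> float:
    """Unpack 8 bytes as an IEEE 754 double, most significant byte first."""
    (value,) = struct.unpack_from(">d", data, offset)
    return value


seven = bytes([0x40, 0x1C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
print(read_double(seven))             # big-endian reading
print(struct.unpack("<d", seven)[0])  # little-endian reading of the same bytes
```

Only the big-endian reading gives back the value that was stored.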

The AMF UTF8 represents a string shorter than 2^16 bytes. It is
composed of an Int (2 bytes) representing string length followed by
the UTF8-encoded string.

The AMF LongUTF8 represents a string potentially longer than 2^16
bytes. It is composed of a Long (4 bytes) representing string
length followed by the UTF8-encoded string.
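
As a rough Python sketch of the two string layouts (read_utf8 and its
long_form flag are names invented here, not taken from any AMF
library):

```python
import struct


def read_utf8(data: bytes, offset: int, long_form: bool = False):
    """Return (text, next_offset) for a UTF8 or LongUTF8 value.

    The length prefix is 2 bytes (UTF8) or 4 bytes (LongUTF8), big-endian.
    """
    fmt, size = (">I", 4) if long_form else (">H", 2)
    (length,) = struct.unpack_from(fmt, data, offset)
    start = offset + size
    end = start + length
    if end > len(data):
        raise ValueError("string length runs past end of stream")
    return data[start:end].decode("utf-8"), end


short = b"\x00\x05" + b"ralle"
long_ = b"\x00\x00\x00\x05" + b"hallo"
print(read_utf8(short, 0))
print(read_utf8(long_, 0, long_form=True))
```

Returning the next offset along with the text lets the caller keep
walking the stream without recomputing lengths.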

A Remoting request from the client consists of a short preamble,
headers, and bodies. The preamble contains basic information about the
nature of the request. Headers can be used to request debugging
information, send authentication info, tag transactions, etc. Bodies
contain actual Remoting requests and responses. A single Remoting
envelope can contain several requests; Remoting supports batching out

Client headers and bodies need not be responded to in a one-to-one
manner. That is, a body or header may not require a response. Debug
information is requested by a header but sent back as a body
object. The response index is therefore essential for the Flash player
to understand the response.

The first byte of the AMF file/stream is believed to be a version
indicator. So far the only valid value for this field that has been
found is 0x00. If it is anything other than 0x00 (zero), your
system should consider the AMF file/stream to be
"malformed". This can happen in the IDE if AMF calls are put
on the stack but never executed and the user exits the movie from the
IDE; the two top bytes will be random and the number of headers will

The second byte of the AMF file/stream appears to be 0x00 if the
client is the Flash Player and 0x01 if the client is the FlashCom

The third and fourth bytes form an integer value that specifies the

Each header consists of the following:

* UTF string (including length bytes) - name
* Boolean - specifies if understanding the header is "required"
* Long - Length in bytes of header
* Variable - Actual data (including a type code)

AMF headers may be user-created. However, certain headers have a
specific meaning that a gateway should respond to. See predefined headers for

Between the headers and the start of the bodies is an Int specifying
the number of bodies. Each body consists of the following:

* UTF String - Target
* UTF String - Response
* Long - Body length in bytes
* Variable - Actual data (including a type code)

The target may be one of the following:

* An http or https URL. In that case the gateway should respond by
  sending a SOAP request to that URL with the specified data. In that
  case the data will be an array and the first key (data[0]) contains
  the parameters to be sent.
* A string with at least one period (.). The value to the right of
  the right-most period is the method to be called. The value to the
  left of that is the service to be invoked including package name. In
  that case data will be an array of arguments to be sent to the

The response is a string that gives the body an id so it can be
tracked by the player.
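
Putting the preamble, header and body layouts together, an envelope
can be walked roughly as follows. This is a sketch in Python under a
few assumptions of mine: the third and fourth bytes are taken to be
the header count, the Boolean is taken to be a single byte, and the
Long is taken to be the length of the data that follows it. The data
itself is left undecoded. All function names are made up for this
note.

```python
import struct


def _utf(data, pos):
    """Read a 2 byte length plus string; return (text, next_pos)."""
    (n,) = struct.unpack_from(">H", data, pos)
    return data[pos + 2:pos + 2 + n].decode("utf-8"), pos + 2 + n


def _pack_utf(text):
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def parse_envelope(data: bytes) -> dict:
    """Split an envelope into preamble, headers and bodies (data kept raw)."""
    version, client, header_count = struct.unpack_from(">BBH", data, 0)
    pos = 4
    headers = []
    for _ in range(header_count):
        name, pos = _utf(data, pos)
        required, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        headers.append((name, bool(required), data[pos:pos + length]))
        pos += length
    (body_count,) = struct.unpack_from(">H", data, pos)
    pos += 2
    bodies = []
    for _ in range(body_count):
        target, pos = _utf(data, pos)
        response, pos = _utf(data, pos)
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        bodies.append((target, response, data[pos:pos + length]))
        pos += length
    return {"version": version, "client": client,
            "headers": headers, "bodies": bodies}


payload = b"\x02\x00\x02hi"                      # made-up body data
request = (bytes([0x00, 0x00])                   # version, client
           + struct.pack(">H", 0)                # no headers
           + struct.pack(">H", 1)                # one body
           + _pack_utf("my.service.echo") + _pack_utf("/1")
           + struct.pack(">I", len(payload)) + payload)
envelope = parse_envelope(request)
target, response, data = envelope["bodies"][0]
service, method = target.rsplit(".", 1)          # split at the right-most period
print(service, method, response, data)
```

The target "my.service.echo" and the response id "/1" are invented
test values; rsplit with a limit of 1 gives the service/method split
described in the second bullet above.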

The response to a request has the exact same structure as a request. A
request requiring a body response should be answered in the following

* Target: set to the Response index plus one of "/onStatus",
  "/onResult", or "/onDebugEvents". "/onStatus" is reserved for
  runtime errors. "/onResult" is for successful calls. "/onDebugEvents"
  is for debug information, see debug information. Thus if the client
  requested something with response index "/1", and the call was
  successful, "/1/onResult" should be sent back.
* Response: should be set to the string "null".
* Data: set to the returned data.
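
A small Python sketch of how a gateway might build the Target of a
reply, assuming a response index such as "/1". The function and
constant names are hypothetical.

```python
RESULT, STATUS, DEBUG = "/onResult", "/onStatus", "/onDebugEvents"


def response_target(response_index: str, outcome: str = RESULT) -> str:
    """Build the Target string for a reply to the body tagged `response_index`."""
    if outcome not in (RESULT, STATUS, DEBUG):
        raise ValueError("unknown outcome suffix: " + outcome)
    return response_index + outcome


print(response_target("/1"))           # successful call
print(response_target("/1", STATUS))   # runtime error
```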

==========================================================
http://www.vanrijkom.org/archives/2005/06/amf_format.html

The following elements are defined within AMF. These are all based on
their ActionScript equivalents.

TypedObject(Class) 0x10;

For terminating sequences, a byte with value 0x09 is used.

Number: 0x00 B7 B6 ... B0

Numbers in AMF are 64 bit "Big Endian". Windows works with little
endian, so conversion is required.

Boolean: 0x01 B0 (BOOL)

BOOL is 0 for FALSE and 1 for TRUE

String: 0x02 L0 L1 SMBSTRING

L0+L1 is the Big Endian length of the string. String is in multibyte
format, prefixed with a 2 byte Big Endian length specifier.
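
The three simple elements above can be decoded with a few lines of
Python. This is only a sketch; decode_simple is a name made up here,
and any type other than Number, Boolean and String is rejected.

```python
import struct

NUMBER, BOOLEAN, STRING = 0x00, 0x01, 0x02


def decode_simple(data: bytes, pos: int = 0):
    """Decode one Number, Boolean or String element; return (value, next_pos)."""
    marker = data[pos]
    pos += 1
    if marker == NUMBER:
        (value,) = struct.unpack_from(">d", data, pos)   # 64 bit big endian
        return value, pos + 8
    if marker == BOOLEAN:
        return data[pos] != 0, pos + 1
    if marker == STRING:
        (length,) = struct.unpack_from(">H", data, pos)  # 2 byte length prefix
        pos += 2
        return data[pos:pos + length].decode("utf-8"), pos + length
    raise ValueError("type 0x%02x is not handled in this sketch" % marker)


print(decode_simple(bytes.fromhex("00 3f f0 00 00 00 00 00 00")))
print(decode_simple(b"\x01\x01"))
print(decode_simple(b"\x02\x00\x07" + b"connect"))
```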

Object: 0x03 [SMBSTRING AMFELEMENT ] 0x09

An object contains zero or more AMF elements that are prefixed with a
multibyte string that indicates the AMF element's identifier within the

An undefined element consists solely of one byte with the value 0x06.

Reference (TODO): 0x07 ?

A reference refers to an array or object that was stored somewhere
before. It's probably a mechanism that prevents

Associative Array: 0x08 L3 L2 L1 L0 [SMBSTRING AMFELEMENT ] 0x09

L0..L3 form a 32 bit number indicating the number of elements present
in the array. The length of the array is followed by (length) AMF
elements that are prefixed with a multibyte string (with a 2 byte
length prefix) that indicates the AMF element's identifier within the

Array: 0x0A L3 L2 L1 L0 [ AMFELEMENT ]

L0..L3 form a 32 bit Big Endian number indicating the number of
elements present in the array. The size of the array is followed by
(length) AMF elements. Note that this collection is NOT terminated using 0x09.
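
The difference between the two array forms is easier to see in code.
The Python sketch below handles only Number and String members, and
it assumes that an Associative Array is closed by an empty name
followed by 0x09, which is how the "0x00 0x00 0x09" end marker in the
RTMP capture later in this document reads. The function name is
invented.

```python
import struct

END = 0x09


def decode_element(data, pos=0):
    """Decode a Number, String, Array or Associative Array element."""
    marker, pos = data[pos], pos + 1
    if marker == 0x00:
        return struct.unpack_from(">d", data, pos)[0], pos + 8
    if marker == 0x02:
        (n,) = struct.unpack_from(">H", data, pos)
        return data[pos + 2:pos + 2 + n].decode("utf-8"), pos + 2 + n
    if marker == 0x0A:                       # Array: count, then bare elements
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = decode_element(data, pos)
            items.append(item)
        return items, pos                    # no 0x09 terminator here
    if marker == 0x08:                       # Associative Array: named elements
        pos += 4                             # element count, not needed below
        pairs = {}
        while True:
            (n,) = struct.unpack_from(">H", data, pos)
            name = data[pos + 2:pos + 2 + n].decode("utf-8")
            pos += 2 + n
            if n == 0 and data[pos] == END:  # assumed end marker
                return pairs, pos + 1
            pairs[name], pos = decode_element(data, pos)
    raise ValueError("type 0x%02x not handled in this sketch" % marker)


one, two = struct.pack(">d", 1.0), struct.pack(">d", 2.0)
strict = b"\x0a" + struct.pack(">I", 2) + b"\x00" + one + b"\x00" + two
assoc = (b"\x08" + struct.pack(">I", 2)
         + b"\x00\x01" + b"0" + b"\x00" + one
         + b"\x00\x01" + b"1" + b"\x00" + two
         + b"\x00\x00\x09")
print(decode_element(strict))
print(decode_element(assoc))
```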

Date: 0x0B T7 T6 .. T0 Z1 Z0

T7 to T0 form a 64 bit Big Endian floating point number that specifies
the number of milliseconds that have passed since 1/1/1970 0:00 to the
specified time. This format is "UTC 1970". Z1 and Z0 form a 16 bit
Big Endian number indicating the specified time's timezone.

To do - meaning unknown.

The multi-byte string is prefixed with a 32 bit Big Endian number,
indicating the length of the multibyte string that follows.

Class: 0x10 SMBSTRING [ SMBSTRING AMFELEMENT ] 0x09

A class element is similar to an object element, but has a class
name identifier string prefixed to the array of member elements.

Currently AMF is used with SharedObjectsLocal (.sol) files,
Local Connection and Flash Remoting.

// S3 .. S0 forms a 32 bit Big Endian number indicating the size of
// the file. The small multibyte string reflects the name of the
// object shared in the file. The array that follows consists of
// name (SMBSTRING) / value pairs.

A 16 bit Big Endian number follows the tag identifier 0x07.

------------------------------------
http://sourceforge.net/docman/display_doc.php?docid=27130&group_id=131628

Flashcoders Wiki - SharedObjectFile

The format of the shared object files

This script creates the described .sol file

so = SharedObject.getLocal("test");
so.data.myInt = 7;
so.data.myFloat = Math.PI;
so.data.myString = 'ralle';
so.data.myIntArray = [1,2,3];
so.data.myStringArray = ['eins','zwei'];
so.data.myObject1 = {p1: 5, p2: 6};
so.data.myObject2 = {p3: 'hallo', p4: 8};
so.data.myDate = new Date();
so.data.myXML = new XML("<start><p>test</p><p>test2</p></start>");
so.data.myBool = true;

len 00 00 01 04 //length of file starting at filetype
filetype TCSO ... 00 04 00 00 00 00

00 05 myInt 00 40 1C 00 00 00 00 00 00 00
len name typ floatval end

00 07 myFloat 00 40 09 21 FB 54 44 2D 18 00
len name typ floatval end

//myString = "ralle";
00 08 myString 02 00 05 r a l l e 00
len name typ lenstr str end

//myIntArray = [1,2,3];
00 0A myIntArray 08 00 00 00 03
len name typ countidx

00 01 '0' 00 3F F0 00 00 00 00 00 00 !! no end here
00 01 '1' 00 40 00 00 00 00 00 00 00
00 01 '2' 00 40 08 00 00 00 00 00 00
lenidx idx typ floatval

//myStringArray = ["eins","zwei"];
00 0D myStringArray 08 00 00 00 02

00 01 '0' 02 00 04 e i n s
00 01 '1' 02 00 04 z w e i
lenidx idx typ strlen string

//myObject1 = {p1: 5, p2: 6};

00 02 p2 00 40 18 00 00 00 00 00 00 !!no end here
00 02 p1 00 40 14 00 00 00 00 00 00
lenidx idx typ floatval

//myObject2 = {p3: "hallo", p4: 8};

00 02 p4 00 40 20 00 00 00 00 00 00 00
lenidx idx typ floatval end
00 02 p3 02 00 05 hallo
lenidx idx typ strlen string

It seems that the previous sample isn't exact; there is no end byte on the first property:

00 02 p4 00 40 20 00 00 00 00 00 00 00
lenidx idx typ floatval end
00 02 p3 02 00 05 hallo
lenidx idx typ strlen string

//myDate = new Date();

42 6D DA E6 18 52 C0 00 FF 88 00
floatval-getTime timezone (mins -ve of GMT)
FF88 = GMT +2 (-2 * 60)
FDE4 = GMT +9 (-9 * 60)
01E0 = GMT -8 (8 * 60)

//myXML = new XML("<start><p>test</p><p>test2</p></start>")

00 00 00 26 <start><p>test</p><p>test2</p></start> 00
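
To check my reading of the records above, here is a rough Python
sketch that decodes the myInt and myString entries and the payload of
the myDate entry. It assumes the "len name typ value" layout shown in
the labels; the trailing end byte is left for the caller to skip. The
function name read_property is made up.

```python
import struct
from datetime import datetime, timedelta, timezone


def read_property(data: bytes, pos: int = 0):
    """Decode one `len name typ value` record holding a Number or String."""
    (name_len,) = struct.unpack_from(">H", data, pos)
    pos += 2
    name = data[pos:pos + name_len].decode("utf-8")
    pos += name_len
    marker, pos = data[pos], pos + 1
    if marker == 0x00:                                   # Number
        (value,) = struct.unpack_from(">d", data, pos)
        pos += 8
    elif marker == 0x02:                                 # String
        (str_len,) = struct.unpack_from(">H", data, pos)
        pos += 2
        value = data[pos:pos + str_len].decode("utf-8")
        pos += str_len
    else:
        raise ValueError("type 0x%02x not handled in this sketch" % marker)
    return name, value, pos


my_int = b"\x00\x05" + b"myInt" + bytes.fromhex("00 40 1C 00 00 00 00 00 00 00")
my_string = b"\x00\x08" + b"myString" + b"\x02\x00\x05" + b"ralle" + b"\x00"
print(read_property(my_int)[:2])
print(read_property(my_string)[:2])

# Date payload: an 8 byte double from getTime() plus a signed 16 bit offset.
millis, tz_minutes = struct.unpack(
    ">dh", bytes.fromhex("42 6D DA E6 18 52 C0 00 FF 88"))
when = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
print(when.isoformat(), tz_minutes)
```

Reading the timezone as a signed 16 bit value is what turns FF88 into
-120 minutes, matching the "GMT +2 (-2 * 60)" note above.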

Here are all the data type ids I can find:

0A : raw Array (amf only)
0D : object String, Number, Boolean, TextFormat
10 : object CustomClass

Where CustomClass is a class registered with Object.registerClass

Object.registerClass("classID",MyClass);
myObj = new MyClass();

00 05 m y o b j 10 00 07 c l a s s I D //len name typ class identifier
00 01 x 00 3F F0 00 00 00 00 00 00 //props

and here is a function to convert a binary float string to a float

function binaryFloatStringToFloat(bfs){
	var c0 = ord(bfs.charAt(0));
	var c1 = ord(bfs.charAt(1));

	var sign = (c0 & (1 << 7)) ? -1 : 1; //negative if highest bit is set

	c0 &= 0x7f; //delete sign
	var exp = (((c0 << 8) + c1) >> 4) - 1023; //11 bit exponent, bias 1023

	var e = 4; //bit position within the 52 bit mantissa
	var sum = (c1 & 0x0f) / Math.pow(2, e); //delete upper four bits

	var byteIdx = 2;
	do {
		e += 8; //each further byte adds 8 mantissa bits
		var byte = ord(bfs.charAt(byteIdx));
		sum += byte / Math.pow(2, e);
	} while (++byteIdx < bfs.length);

	//trace(sum + " " + exp + " " + sign);
	return (1 + sum) * Math.pow(2, exp) * sign;
}

s = String.fromCharCode(0x40,0x08,0,0,0,0,0,0); //3
//s = String.fromCharCode(0x40,0x1c,0,0,0,0,0,0); //7
//s = String.fromCharCode(0x40,0x0A,0x66,0x66,0x66,0x66,0x66,0x66); //3.3
trace(binaryFloatStringToFloat(s));
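
For comparison, here is the same hand conversion sketched in Python
and checked against the struct module. It is my own port, with the
mantissa loop written the way I read the ActionScript above, and like
that function it ignores zero, denormals, infinities and NaN.

```python
import struct


def binary_float_to_float(raw: bytes) -> float:
    """Rebuild a double by hand from 8 bytes, most significant byte first."""
    c0, c1 = raw[0], raw[1]
    sign = -1 if c0 & 0x80 else 1                  # highest bit is the sign
    exp = ((((c0 & 0x7F) << 8) + c1) >> 4) - 1023  # 11 bit exponent, bias 1023
    e = 4
    total = (c1 & 0x0F) / 2 ** e                   # top four mantissa bits
    for byte in raw[2:]:
        e += 8                                     # next 8 mantissa bits
        total += byte / 2 ** e
    return (1 + total) * 2.0 ** exp * sign


for sample in ("40 08 00 00 00 00 00 00",
               "40 1C 00 00 00 00 00 00",
               "40 0A 66 66 66 66 66 66"):
    raw = bytes.fromhex(sample)
    print(binary_float_to_float(raw), struct.unpack(">d", raw)[0])
```

In practice struct.unpack(">d", ...) alone is enough; the hand-rolled
version is only useful for seeing where the sign, exponent and
mantissa bits sit.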

This is the first RTMP message:

$10 = "\003\000\000\017\000\000\311\024\000\000\000\000\002\000\aconnect\000?\360\000\000\000\000\000\000\003\000\003app\002\000#software/gnash/tests/1153948634.flv\000\bflashVer\002\000\fLNX 6,0,82,0\000\006swfUrl\002\000\035file:///file|%2Ftmp%2Fout.swf\303\000\005tcUrl\002\0004rtmp://localhost/software/gnash/tests/1153948634.flv\000\000\t\002\000\005userx"...

*0x3,* 0x0, 0x0, 0xf, 0x0, 0x0, *0xc9,*0x14, 0x00, 0x00, 0x00, 0x0, 0x2, 0x0, 0x7,
0x63, 0x6f, 0x6e, 0x6e, 0x65, 0x63, 0x74, 0x00, 0x3f, 0xf0, 0x00, 0x0, 0x0, 0x0,
0x0, 0x0, 0x3, 0x0, 0x3, 0x61, 0x70, 0x70, 0x02, 0x00, 0x23, 0x73, 0x6f, 0x66, 0x74,
0x77, 0x61, 0x72, 0x65, 0x2f, 0x67, 0x6e, 0x61, 0x73, 0x68, 0x2f, 0x74, 0x65,
0x73, 0x74, 0x73, 0x2f, 0x31, 0x31, 0x35, 0x33, 0x39, 0x34, 0x38, 0x36, 0x33,
0x34, 0x2e, 0x66, 0x6c, 0x76, 0x0, 0x8, 0x66, 0x6c, 0x61, 0x73, 0x68, 0x56, 0x65,
0x72, 0x2, 0x0, 0xc, 0x4c, 0x4e, 0x58, 0x20, 0x36, 0x2c, 0x30, 0x2c, 0x38, 0x32,
0x2c, 0x30, 0x0, 0x6, 0x73, 0x77, 0x66, 0x55, 0x72, 0x6c, 0x02, 0x00, 0x1d, 0x66,
0x69, 0x6c, 0x65, 0x3a, 0x2f, 0x2f, 0x2f, 0x66, 0x69, 0x6c, 0x65, 0x7c, 0x25,
0x32, 0x46, 0x74, 0x6d, 0x70, 0x25, 0x32, 0x46, 0x6f, 0x75, 0x74, 0x2e, 0x73,
0x77, 0x66,*0xc3*,0x0, 0x5, 0x74, 0x63, 0x55, 0x72, 0x6c, 0x02, 0x00, 0x34, 0x72,
0x74, 0x6d, 0x70, 0x3a, 0x2f, 0x2f, 0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x68, 0x6f,
0x73, 0x74, 0x2f, 0x73, 0x6f, 0x66, 0x74, 0x77, 0x61, 0x72, 0x65, 0x2f, 0x67,
0x6e, 0x61, 0x73, 0x68, 0x2f, 0x74, 0x65, 0x73, 0x74, 0x73, 0x2f, 0x31, 0x31,
0x35, 0x33, 0x39, 0x34, 0x38, 0x36, 0x33, 0x34, 0x2e, 0x66, 0x6c, 0x76, 0x0, 0x0,
0x9, 0x2, 0x0, 0x5, 0x75, 0x73, 0x65, 0x72, 0x78...}

"\003\000\000\017\000\000
\311 \024\000\000\000\000\002\000
?\360 \000\000\000\000\000\000

0x09 is also the terminating byte of an object definition
All numbers are 64 bit big endian in AMF

0x02 <--- string type

0x00 <---- number type (64 bit big endian)
0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00

0x03 <--- object type

0x02 <--- string type
0x00 0x23 software/gnash/tests/1153948634.flv

0x02 <--- string type
0x00 0x0c LNX 6,0,82,0

0x02 <--- string type
0x00 0x1d file:///file|%2Ftmp%2Fout.swf

0xc3 <---- header byte

0x02 <--- string type
0x00 0x34 rtmp://localhost/software/gnash/tests/1153948634.flv

0x00 0x00 0x09 <--- end of object definition
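
The start of that capture can be decoded mechanically. The Python
sketch below copies the first bytes from the hex dump and assumes the
AMF data begins after the first 12 bytes, since that is where the
first string type byte shows up. read_string is a helper made up for
this note.

```python
import struct

# Start of the captured message, copied from the hex dump above.
message = (bytes([0x03, 0x00, 0x00, 0x0F, 0x00, 0x00, 0xC9, 0x14,
                  0x00, 0x00, 0x00, 0x00])            # bytes before the AMF data
           + b"\x02\x00\x07" + b"connect"              # string
           + b"\x00\x3f\xf0\x00\x00\x00\x00\x00\x00"   # number
           + b"\x03"                                   # object
           + b"\x00\x03" + b"app"                      # property name
           + b"\x02\x00\x23" + b"software/gnash/tests/1153948634.flv")


def read_string(data, pos):
    """Read a 2 byte big endian length plus text; return (text, next_pos)."""
    (n,) = struct.unpack_from(">H", data, pos)
    return data[pos + 2:pos + 2 + n].decode("ascii"), pos + 2 + n


pos = 12
assert message[pos] == 0x02                 # string type
command, pos = read_string(message, pos + 1)
assert message[pos] == 0x00                 # number type
(number,) = struct.unpack_from(">d", message, pos + 1)
pos += 9
assert message[pos] == 0x03                 # object type
prop_name, pos = read_string(message, pos + 1)
assert message[pos] == 0x02                 # string type
prop_value, pos = read_string(message, pos + 1)
print(command, number, prop_name, prop_value)
```

The sketch stops before the 0xc3 header byte, which has to be dealt
with before the rest of the object can be read the same way.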

Good examples for SharedObj
http://www.flash-communications.net/technotes/sharedObjectEditor/editor.html

*** Please *** read this in its entirety before making any changes to
the code in this directory!
===========================

All of the information in this document has been figured out through
reverse engineering, so it's entirely possible there are some minor
misconceptions. These explanations are based on many
months of staring at hex dumps of files, memory segments, and network
traffic. Many thanks to the volunteers that use the proprietary player
for allowing their data to be captured, and their disk drives

AMF is the lowest level representation of an SWF data type. Up until
swf version 8, the format is referred to as AMF0, and is widely used in
swf files for the SharedObject and LocalConnection ActionScript
classes, as well as for remoting and RTMP based streaming.

As of swf version 9, a new version called AMF3 was created, which
only works with the new ActionScript 3 classes. The main reason for
this is performance. For example, AMF0 has only a single data type for
numbers, which is a double (8 bytes). AMF3 introduced a new integer
data type, which also supports a simplified packing scheme where if
the first bit is set, then only 3 bytes, instead of the usual 4 for an

Currently the AMF implementation in Gnash only supports AMF0. Although
there are bits of AMF3 implemented for the various constants used for
handling AMF objects, none of the code is currently using it until we
actually see some AMF3 based swf files out in the wild. Most of the
time AMF0 is used by swf version 9 anyway, and uses a special data
type to switch to AMF3 for that particular piece of data.

Another big difference is that AMF3 has more optimizations, supporting
a simple caching scheme where after an object has been sent, it can be
referred to by an index number. Where AMF0 had multiple types for the
various commonly used ActionScript classes, AMF3 instead has a single
object type, which can be defined to be an existing or custom
class. Other optimizations include replacing the AMF0 Boolean data
type, which was 3 bytes long, by a single byte which is either true or

AMF has two main types of usage. One is a simple encoding
of basic data types, like string and number. The other is
for the properties of ActionScript class objects. A basic AMF object
has a 3 byte header field. The first byte is the type, followed by 2
bytes to hold the length. A property uses the same format to hold
the data, but is preceded by the name of the property, which is
preceded by two bytes that contain the length of the name.
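
As an illustration of the two layouts, here is a Python sketch for a
string value. The function names are invented and this is not how the
Gnash code is organized; it only shows the byte order of the fields.

```python
import struct

STRING = 0x02


def encode_string(text: str) -> bytes:
    """Type byte, 2 byte big endian length, then the data."""
    raw = text.encode("utf-8")
    return struct.pack(">BH", STRING, len(raw)) + raw


def encode_property(name: str, text: str) -> bytes:
    """The same element, preceded by a 2 byte name length and the name."""
    raw_name = name.encode("utf-8")
    return struct.pack(">H", len(raw_name)) + raw_name + encode_string(text)


print(encode_string("ralle").hex())
print(encode_property("myString", "ralle").hex())
```

The second result has the same bytes as the myString record in the
SharedObject dump earlier in this document, minus its end byte.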

As an AMF object doesn't do much but hold data, a header is used to
signify the number of objects, file size, etc... The SharedObject and
LocalConnection ActionScript classes have different headers, as
SharedObjects store AMF objects in a disk based file, while
LocalConnection stores AMF objects in a shared memory segment.

The basic lowdown on the classes is as follows. The Buffer class is
used to hold all data. We specifically don't use a std::vector because
this class is so heavily used. We want to avoid memory fragmentation,
which often happens when using classes from libstdc++. This class also
has special methods for handling the data types we use, so this class
would need to exist anyway. In its simplest form, this is merely an
array of unsigned bytes with a length.

As a raw buffer is pretty useless for higher level processing, the
Element class is used to represent an AMF object. After a buffer is
read, its data is extracted into a series of Elements. An Element
still uses the Buffer class to hold the data, but often this is a much
smaller buffer than the one used to read data. Most Elements are
simply a numeric value or string, but Elements can also hold a higher
level ActionScript class, including all of its properties. The main
internal difference is that the properties of an ActionScript class
have a name, which is the name of the property, in addition to the
data. Only an Element of the data type *OBJECT* can have properties.
Properties of an object are stored as an array of more Elements, each
one representing a single AMF data type. Note that when allocated, the
memory the Buffer points to is *not* memset() to 0. All Buffers are
the exact size they need to be. Setting the memory to zero is nice for
debugging in GDB, as you get nice clean hex dumps that way, but
imagine the performance hit if every single time a Buffer is
allocated, each byte must be set to zero. If you want to set a
Buffer's data to zero for debugging, use the Buffer::clear() method,
and don't forget to remove it later so it doesn't become a subtle

The AMF class is used to encode and decode data between Buffers and
Elements. When encoding, all the methods are static, as no data needs
to be retained between usages of the data. Note that all the
AMF::encode*() methods allocate a Buffer, which then later needs to be
freed after usage. Once again, smart pointers, while useful, are
avoided because of the memory fragmentation issue for heavily used
code. While this sort of defeats the purpose of both C++ and object
oriented programming, that's life when working with high-performance,
data-driven code. All decoding is handled by the non-static
AMF::extract*() methods. These are not static as they must retain the
current amount of data that has been parsed so subsequent decoding
starts in the right place.
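
The split between stateless encoding and stateful decoding can be
shown with a toy class. This is a Python illustration of the design
only; AmfSketch and its methods are made up and do not mirror the
signatures of the real C++ AMF class.

```python
import struct


class AmfSketch:
    """Illustrative stand-in, not the Gnash AMF class."""

    def __init__(self):
        self.total_read = 0      # bytes consumed so far by extract() calls

    @staticmethod
    def encode_number(value: float) -> bytes:
        # Stateless: every call returns a new buffer and remembers nothing.
        return b"\x00" + struct.pack(">d", value)

    @staticmethod
    def encode_string(text: str) -> bytes:
        raw = text.encode("utf-8")
        return b"\x02" + struct.pack(">H", len(raw)) + raw

    def extract(self, data: bytes):
        """Decode the next Number or String, resuming where the last call stopped."""
        pos = self.total_read
        marker = data[pos]
        if marker == 0x00:
            (value,) = struct.unpack_from(">d", data, pos + 1)
            self.total_read = pos + 9
        elif marker == 0x02:
            (n,) = struct.unpack_from(">H", data, pos + 1)
            value = data[pos + 3:pos + 3 + n].decode("utf-8")
            self.total_read = pos + 3 + n
        else:
            raise ValueError("type 0x%02x not handled in this sketch" % marker)
        return value


wire = AmfSketch.encode_string("connect") + AmfSketch.encode_number(1.0)
decoder = AmfSketch()
first = decoder.extract(wire)
second = decoder.extract(wire)
print(first, second, decoder.total_read)
```

The encoders can be called without an instance, while the decoder
instance carries the running offset, which is why two extract() calls
on the same buffer return two different values.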

The only difference between the two higher level classes, SOL and
LcShm, is where the data is stored (disk or memory), and the
appropriate headers for the data.

LocalConnection, on unix based systems, uses the older SYSV style
shared memory segments. These are always the same size, 64528
bytes. There are two sections in the shared memory segment. One I call
"Listeners", not to be confused with ActionScript object Listeners,
although the concept is similar. LocalConnection is used as a
bi-directional way to transfer AMF objects between swf movies, instead
of using a network connection. When a swf movie attaches to the
LocalConnection shared memory segment, it registers itself by writing
its name into the Listener section.

This registration step turns out to be optional, as it is possible to
send and receive data by polling for changes. This is of course a huge
security problem, as it allows any client to secretly monitor, or inject
data into, the communication between multiple swf files in an untraceable
way. Some web sites, YouTube in particular, exploit this feature by
never registering themselves as a Listener, so beware.

Note to developers: please be very careful making any changes to this
code without seriously understanding how the code works. Byte
manipulation is very easy to screw up; minor changes can often cause
major problems. Anyone making changes here should run the libamf.all
test cases to make sure they haven't introduced breakage.

As a further note, valgrind gets confused with type casting sometimes,
displaying errors where there are none. As all data is stored as
unsigned bytes, extracting numeric values like the length often causes
valgrind to assume there are errors with word alignment. Eliminating
valgrind errors is a good thing though, so sometimes we have to jump
through hoops to keep it quiet. Often this requires playing silly
games with local variables and multiple type casts. This makes the
code a bit convoluted at times, but that's life if you want solid code.

This code does much allocation and deleting of memory blocks, so after
any changes make sure there are no memory leaks. This can be done with
valgrind (eliminating the stupid valgrind errors makes it obvious).
Optionally, the Memory class in libbase/gmemory.h contains support
for a valgrind like API that is under programmer control. Look at the
test cases in libamf.all for usage examples. Memory::analyze() will do
the same thing as valgrind to check to make sure all allocated memory
is properly deleted when the program exits. To use the Memory class,
you have to configure with --with-statistics=all or
--with-statistics=mem.