Informal standard                                             M. Nilsson
Document: id3v2-00.txt                                   26th March 1998

Status of this document

This document is an informal standard and is released so that
implementors can have a set standard before the formal standard is
set. The formal standard will use another version number if it is not
identical to what is described in this document. The contents of this
document may change for clarification, but never to add or alter
functionality.

Distribution of this document is unlimited.

The recent gain in popularity of MPEG layer III audio files on the
internet has forced a standardised way of storing information about an
audio file within the file itself, to determine its origin and
contents.

Today the most accepted way to do this is with the so-called ID3 tag,
which is simple but very limited and in some cases very unsuitable.
The ID3 tag has very limited space in every field and a very limited
number of fields, is not expandable or upgradeable, and is placed at
the end of the file, which is unsuitable for streaming audio. This
draft is an attempt to answer these issues with a new version of the
ID3 tag.

2. Conventions in this document
  3.2. ID3v2 frames overview
4. Declared ID3v2 frames
  4.1. Unique file identifier
  4.2. Text information frames
    4.2.1. Text information frames - details
    4.2.2. User defined text information frame
    4.3.1. URL link frames - details
    4.3.2. User defined URL link frame
  4.4. Involved people list
  4.5. Music CD Identifier
  4.6. Event timing codes
  4.7. MPEG location lookup table
  4.8. Synced tempo codes
  4.9. Unsynchronised lyrics/text transcription
  4.10. Synchronised lyrics/text
  4.12. Relative volume adjustment
  4.15. Attached picture
  4.16. General encapsulated object
  4.19. Recommended buffer size
  4.20. Encrypted meta frame
  4.21. Audio encryption
  4.22. Linked information
5. The 'unsynchronisation scheme'
A. Appendix A - ID3-Tag Specification V1.1
  A.2. ID3v1 Implementation
  A.4. Track addition - ID3v1.1

2. Conventions in this document

In the examples, text within "" is a text string exactly as it appears
in a file. Numbers preceded with $ are hexadecimal and numbers
preceded with % are binary. $xx is used to indicate a byte with
unknown content. %x is used to indicate a bit with unknown content.
The most significant bit (MSB) of a byte is called 'bit 7' and the
least significant bit (LSB) is called 'bit 0'.

A tag is the whole tag described in this document. A frame is a block
of information in the tag. The tag consists of a header, frames and
optional padding. A field is a piece of information; one value, a
string etc. A numeric string is a string that consists of the
characters 0-9 only.

The two biggest design goals were to be able to implement ID3v2
without disturbing old software too much and that ID3v2 should be
expandable.

The first criterion is met by the simple fact that the MPEG [MPEG]
decoding software uses a syncsignal, embedded in the audiostream, to
'lock on to' the audio. Since the ID3v2 tag doesn't contain a valid
syncsignal, no software will attempt to play the tag. If, for any
reason, coincidence makes a syncsignal appear within the tag it will
be taken care of by the 'unsynchronisation scheme' described in
section 5.

The second criterion has made a more noticeable impact on the design
of the ID3v2 tag. It is constructed as a container for several
information blocks, called frames, whose format need not be known to
the software that encounters them. At the start of every frame there
is an identifier that explains the frame's format and content, and a
size descriptor that allows software to skip unknown frames.

If a total revision of the ID3v2 tag should be needed, there is a
version number and a size descriptor in the ID3v2 header.

The ID3 tag described in this document is mainly targeted at files
encoded with MPEG-2 layer I, MPEG-2 layer II, MPEG-2 layer III and
MPEG-2.5, but may work with other types of encoded audio.

The bit order in ID3v2 is most significant bit first (MSB). The byte
order in multibyte numbers is most significant byte first (e.g.
$12345678 would be encoded $12 34 56 78).

It is permitted to include padding after the final frame (at the end
of the ID3 tag), making the size of all the frames together smaller
than the size given in the head of the tag. A possible purpose of this
padding is to allow for adding a few additional frames or enlarging
existing frames within the tag without having to rewrite the entire
file. The value of the padding bytes must be $00.

The ID3v2 tag header, which should be the first information in the
file, is 10 bytes as follows:

  ID3/file identifier      "ID3"
  ID3 version              $02 00
  ID3 flags                %xx000000
  ID3 size             4 * %0xxxxxxx

The first three bytes of the tag are always "ID3" to indicate that
this is an ID3 tag, directly followed by the two version bytes. The
first byte of ID3 version is its major version, while the second byte
is its revision number. All revisions are backwards compatible while
major versions are not. If software that supports ID3v2 and below
encounters version three or higher it should simply ignore the whole
tag. Version and revision will never be $FF.

The first bit (bit 7) in the 'ID3 flags' indicates whether or not
unsynchronisation is used (see section 5 for details); a set bit
indicates usage.

The second bit (bit 6) indicates whether or not compression is used; a
set bit indicates usage. Since no compression scheme has been decided
yet, the ID3 decoder (for now) should just ignore the entire tag if
the compression bit is set.

The ID3 tag size is encoded with four bytes where the first bit (bit
7) is set to zero in every byte, making a total of 28 bits. The zeroed
bits are ignored, so a 257 bytes long tag is represented as $00 00 02
01.

The ID3 tag size is the size of the complete tag after
unsynchronisation, including padding, excluding the header (total tag
size - 10). The reason to use 28 bits (representing up to 256MB) for
size description is that we don't want to run out of space here.
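
The size encoding described above can be sketched in Python as follows.
This is only an illustration: the function names `encode_tag_size` and
`decode_tag_size` are hypothetical and not part of this document.

```python
def encode_tag_size(size: int) -> bytes:
    """Pack a tag size into four bytes, seven payload bits per byte."""
    if not 0 <= size < (1 << 28):
        raise ValueError("tag size must fit in 28 bits")
    # Most significant group first; bit 7 of every byte stays zero.
    return bytes((size >> shift) & 0x7F for shift in (21, 14, 7, 0))


def decode_tag_size(raw: bytes) -> int:
    """Unpack four size bytes; bit 7 of every byte must be zero."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError("not a valid ID3v2 size field")
    size = 0
    for b in raw:
        size = (size << 7) | b
    return size


# The 257-byte example from the text.
print(encode_tag_size(257).hex(" "))  # prints "00 00 02 01"
```

Because only seven bits per byte carry data, the largest value that
fits is 2^28 - 1, which matches the "up to 256MB" figure given above.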

An ID3v2 tag can be detected with the following pattern:

  $49 44 33 yy yy xx zz zz zz zz

Where yy is less than $FF, xx is the 'flags' byte and zz is less than
$80.
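
A minimal Python sketch of this detection follows. The names
`TagHeader` and `parse_tag_header` are hypothetical. The sketch rejects
any candidate whose size bytes have bit 7 set, following the description
of the size field, and it only reports the two flags defined in this
section.

```python
from typing import NamedTuple, Optional


class TagHeader(NamedTuple):
    major: int
    revision: int
    unsynchronised: bool
    compressed: bool
    size: int  # number of bytes that follow the 10-byte header


def parse_tag_header(data: bytes) -> Optional[TagHeader]:
    """Return the parsed header, or None if `data` does not start a tag."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    if major == 0xFF or revision == 0xFF:
        return None
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        return None
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    return TagHeader(major, revision,
                     bool(flags & 0x80),   # bit 7: unsynchronisation
                     bool(flags & 0x40),   # bit 6: compression
                     size)
```

A caller would typically check `major` and the `compressed` flag next,
since the text asks decoders to ignore tags they cannot handle.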

3.2. ID3v2 frames overview

The headers of the frames are similar in their construction. They
consist of one three character identifier (capital A-Z and 0-9) and
one three byte size field, making a total of six bytes. The header is
excluded from the size. Identifiers beginning with "X", "Y" and "Z"
are for experimental use and free for everyone to use. Keep in mind
that someone else might have used the same identifier as you. All
other identifiers are either used or reserved for future use.

The three character frame identifier is followed by a three byte size
descriptor, making a total header size of six bytes in every frame.
The size is calculated as the frame size excluding the frame
identifier and size descriptor (frame size - 6).
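
The frame header layout lets software walk through a tag without
understanding every frame. The Python sketch below shows one way to do
this. The function name `iter_frames` is hypothetical, and the sketch
assumes that `body` holds the tag contents after the 10-byte tag header
with any unsynchronisation already reversed. It treats a $00 byte where
an identifier should start as the beginning of padding, since
identifiers only use A-Z and 0-9.

```python
from typing import Iterator, Tuple


def iter_frames(body: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, frame data) pairs from a tag body."""
    pos = 0
    while pos + 6 <= len(body):
        if body[pos] == 0x00:
            break  # padding reached, no more frames
        frame_id = body[pos:pos + 3].decode("latin-1")
        # Three byte size, most significant byte first, header excluded.
        size = int.from_bytes(body[pos + 3:pos + 6], "big")
        start = pos + 6
        end = start + size
        if end > len(body):
            raise ValueError("frame %r runs past the end of the tag" % frame_id)
        yield frame_id, body[start:end]
        pos = end
```

Unknown identifiers need no special handling here: the size field
alone is enough to step over them.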

There is no fixed order of the frames' appearance in the tag, although
it is desired that the frames are arranged in order of significance
concerning the recognition of the file. An example of such order:

A tag must contain at least one frame. A frame must be at least 1 byte
big, excluding the 6-byte header.

If nothing else is said, a string is represented as ISO-8859-1
[ISO-8859-1] characters in the range $20 - $FF. All Unicode strings
[UNICODE] use 16-bit Unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). All
numeric strings are always encoded as ISO-8859-1. Terminated strings
are terminated with $00 if encoded with ISO-8859-1 and $00 00 if
encoded as Unicode. If nothing else is said, the newline character is
forbidden. In ISO-8859-1 a new line is represented, when allowed, with
$0A only. Frames that allow different types of text encoding have a
text encoding description byte directly after the frame size. If
ISO-8859-1 is used this byte should be $00, if Unicode is used it
should be $01.
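
The string rules above can be sketched in Python as follows. The
function names are hypothetical. Two points are assumptions rather than
statements from this document: Python's UTF-16 codecs stand in for
UCS-2, and Unicode text without a byte order mark is read most
significant byte first, by analogy with the byte order used for numbers.

```python
from typing import Tuple


def split_terminated(data: bytes, encoding: int) -> Tuple[bytes, bytes]:
    """Split at the first terminator; return (string bytes, remainder)."""
    if encoding == 0x00:
        head, _, rest = data.partition(b"\x00")
        return head, rest
    # Unicode: look for $00 00 on a 16-bit boundary only.
    for i in range(0, len(data) - 1, 2):
        if data[i:i + 2] == b"\x00\x00":
            return data[:i], data[i + 2:]
    return data, b""


def decode_text(raw: bytes, encoding: int) -> str:
    """Decode string bytes according to the text encoding byte."""
    if encoding == 0x00:
        return raw.decode("latin-1")
    if encoding == 0x01:
        if raw[:2] == b"\xff\xfe":          # little-endian byte order mark
            return raw[2:].decode("utf-16-le")
        if raw[:2] == b"\xfe\xff":          # big-endian byte order mark
            return raw[2:].decode("utf-16-be")
        return raw.decode("utf-16-be")      # assumption: no mark, MSB first
    raise ValueError("unknown text encoding byte")
```

Searching for the Unicode terminator on 16-bit boundaries avoids
mistaking the zero high byte of one character followed by the zero
byte of the next for a terminator.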

The three byte language field is used to describe the language of the
frame's content, according to ISO-639-2 [ISO-639-2].

All URLs [URL] may be relative, e.g. "picture.png", "../doc.txt".

If a frame is longer than it should be, e.g. having more fields than
specified in this document, that indicates that additions to the
frame have been made in a later version of the ID3 standard. This
is reflected by the revision number in the header of the tag.

4. Declared ID3v2 frames

The following frames are declared in this draft.

  4.19   BUF  Recommended buffer size

  4.17   CNT  Play counter
  4.21   CRA  Audio encryption
  4.20   CRM  Encrypted meta frame

  4.6    ETC  Event timing codes
  4.13   EQU  Equalization

  4.16   GEO  General encapsulated object

  4.4    IPL  Involved people list

  4.22   LNK  Linked information

  4.5    MCI  Music CD Identifier
  4.7    MLL  MPEG location lookup table

  4.15   PIC  Attached picture
  4.18   POP  Popularimeter

  4.12   RVA  Relative volume adjustment

  4.10   SLT  Synchronised lyrics/text
  4.8    STC  Synced tempo codes

  4.2.1  TAL  Album/Movie/Show title
  4.2.1  TBP  BPM (Beats Per Minute)
  4.2.1  TCO  Content type
  4.2.1  TCR  Copyright message
  4.2.1  TDY  Playlist delay
  4.2.1  TKE  Initial key
  4.2.1  TLA  Language(s)
  4.2.1  TOA  Original artist(s)/performer(s)
  4.2.1  TOF  Original filename
  4.2.1  TOL  Original Lyricist(s)/text writer(s)
  4.2.1  TOR  Original release year
  4.2.1  TOT  Original album/Movie/Show title
  4.2.1  TP1  Lead artist(s)/Lead performer(s)/Soloist(s)/Performing group
  4.2.1  TP2  Band/Orchestra/Accompaniment
  4.2.1  TP3  Conductor/Performer refinement
  4.2.1  TP4  Interpreted, remixed, or otherwise modified by
  4.2.1  TPA  Part of a set
  4.2.1  TRC  ISRC (International Standard Recording Code)
  4.2.1  TRD  Recording dates
  4.2.1  TRK  Track number/Position in set
  4.2.1  TSS  Software/hardware and settings used for encoding
  4.2.1  TT1  Content group description
  4.2.1  TT2  Title/Songname/Content description
  4.2.1  TT3  Subtitle/Description refinement
  4.2.1  TXT  Lyricist/text writer
  4.2.2  TXX  User defined text information frame

  4.1    UFI  Unique file identifier
  4.9    ULT  Unsynchronised lyrics/text transcription

  4.3.1  WAF  Official audio file webpage
  4.3.1  WAR  Official artist/performer webpage
  4.3.1  WAS  Official audio source webpage
  4.3.1  WCM  Commercial information
  4.3.1  WCP  Copyright/Legal information
  4.3.1  WPB  Publishers official webpage
  4.3.2  WXX  User defined URL link frame

4.1. Unique file identifier

This frame's purpose is to be able to identify the audio file in a
database that may contain more information relevant to the content.
Since standardisation of such a database is beyond this document, all
"UFI" frames begin with a null-terminated string with a URL [URL]
containing an email address, or a link to a location where an email
address can be found, that belongs to the organisation responsible for
this specific database implementation. Questions regarding the
database should be sent to the indicated email address. The URL should
not be used for the actual database queries. If a $00 is found
directly after the 'Frame size' the whole frame should be ignored, and
preferably be removed. The 'Owner identifier' is then followed by the
actual identifier, which may be up to 64 bytes. There may be more than
one "UFI" frame in a tag, but only one with the same 'Owner
identifier'.

  Unique file identifier  "UFI"
  Owner identifier        <textstring> $00
  Identifier              <up to 64 bytes binary data>
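
As a sketch of how the body of a "UFI" frame might be read, the Python
function below splits the owner identifier from the binary identifier.
The name `parse_ufi` is hypothetical, and `data` is assumed to be the
frame contents after the six byte frame header.

```python
from typing import Optional, Tuple


def parse_ufi(data: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (owner identifier, identifier), or None if the frame
    should be ignored because the owner identifier is empty."""
    owner, sep, identifier = data.partition(b"\x00")
    if not sep:
        raise ValueError("owner identifier is not terminated")
    if not owner:
        return None  # $00 directly after the frame size
    if len(identifier) > 64:
        raise ValueError("identifier is longer than 64 bytes")
    return owner.decode("latin-1"), identifier
```

Only the first $00 is treated as a terminator, so the binary
identifier itself may contain zero bytes.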

4.2. Text information frames

The text information frames are the most important frames, containing
information like artist, album and more. There may only be one text
information frame of its kind in a tag. If the textstring is followed
by a termination ($00 (00)) all the following information should be
ignored and not be displayed. All the text information frames have the
following format:

  Text information identifier  "T00" - "TZZ", excluding "TXX",
                               described in 4.2.2.
  Information                  <textstring>

4.2.1. Text information frames - details

The 'Content group description' frame is used if the sound belongs to
a larger category of sounds/music. For example, classical music is
often sorted in different musical sections (e.g. "Piano Concerto",
"Weather - Hurricane").

The 'Title/Songname/Content description' frame is the actual name of
the piece (e.g. "Adagio", "Hurricane Donna").

The 'Subtitle/Description refinement' frame is used for information
directly related to the content's title (e.g. "Op. 16" or "Performed

The 'Lead artist(s)/Lead performer(s)/Soloist(s)/Performing group' is
used for the main artist(s). They are separated with the "/"
character.

The 'Band/Orchestra/Accompaniment' frame is used for additional
information about the performers in the recording.

The 'Conductor' frame is used for the name of the conductor.

The 'Interpreted, remixed, or otherwise modified by' frame contains
more information about the people behind a remix and similar
interpretations of another existing piece.

The 'Composer(s)' frame is intended for the name of the composer(s).
They are separated with the "/" character.

The 'Lyricist(s)/text writer(s)' frame is intended for the writer(s)
of the text or lyrics in the recording. They are separated with the
"/" character.

The 'Language(s)' frame should contain the languages of the text or
lyrics in the audio file. The language is represented with three
characters according to ISO-639-2. If more than one language is used
in the text their language codes should follow according to their

The content type, which previously (in ID3v1.1, see appendix A) was
stored as a one byte numeric value only, is now a numeric string. You
may use one or several of the types as ID3v1.1 did or, since the
category list would be impossible to maintain with accurate and up to
date categories, define your own.
References to the ID3v1 genres can be made by entering, as the first
byte, "(" followed by a number from the genres list (section A.3.)
and ending with a ")" character. This is optionally followed by a
refinement, e.g. "(21)" or "(4)Eurodisco". Several references can be
made in the same frame, e.g. "(51)(39)". If the refinement should
begin with a "(" character it should be replaced with "((", e.g. "((I
can figure out any genre)" or "(55)((I think...)". The following new
content types are defined in ID3v2 and are implemented in the same way
as the numeric content types, e.g. "(RX)".
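
The reference and refinement syntax above can be read with a short
loop. The Python sketch below is one possible interpretation, with the
hypothetical name `parse_content_type`. It returns the references as
strings, without looking them up in the genre list, together with the
refinement text after undoing the "((" escape.

```python
from typing import List, Tuple


def parse_content_type(value: str) -> Tuple[List[str], str]:
    """Split a content type string into references and refinement."""
    refs: List[str] = []
    pos = 0
    # A single "(" opens a reference; "((" starts an escaped refinement.
    while value.startswith("(", pos) and not value.startswith("((", pos):
        end = value.find(")", pos)
        if end == -1:
            break  # unterminated reference, keep the rest as text
        refs.append(value[pos + 1:end])
        pos = end + 1
    refinement = value[pos:]
    if refinement.startswith("(("):
        refinement = refinement[1:]  # "((" stands for a literal "("
    return refs, refinement
```

Applied to the examples in the text, "(4)Eurodisco" yields the
reference "4" with the refinement "Eurodisco", and "(55)((I think...)"
yields the reference "55" with the refinement "(I think...)".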

The 'Album/Movie/Show title' frame is intended for the title of the
recording (or source of sound) which the audio in the file is taken
from.

The 'Part of a set' frame is a numeric string that describes which
part of a set the audio came from. This frame is used if the source
described in the "TAL" frame is divided into several mediums, e.g. a
double CD. The value may be extended with a "/" character and a
numeric string containing the total number of parts in the set. E.g.

The 'Track number/Position in set' frame is a numeric string
containing the order number of the audio-file on its original
recording. This may be extended with a "/" character and a numeric
string containing the total number of tracks/elements on the original
recording. E.g. "4/9".
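
Both the 'Part of a set' and 'Track number/Position in set' values
share the same "number" or "number/total" shape. A minimal Python
sketch for reading them, using the hypothetical name `parse_position`:

```python
from typing import Optional, Tuple


def parse_position(value: str) -> Tuple[int, Optional[int]]:
    """Return (position, total); total is None when it is not given."""
    number, _, total = value.partition("/")
    return int(number), int(total) if total else None
```

With this sketch, "4/9" gives position 4 of 9, while "4" alone gives
position 4 with no total.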

The 'ISRC' frame should contain the International Standard Recording
Code (ISRC).

The 'Year' frame is a numeric string with a year of the recording.
This frame is always four characters long (until the year 10000).

The 'Date' frame is a numeric string in the DDMM format containing
the date for the recording. This field is always four characters
long.

The 'Time' frame is a numeric string in the HHMM format containing
the time for the recording. This field is always four characters
long.

The 'Recording dates' frame is intended to be used as a complement to
the "TYE", "TDA" and "TIM" frames. E.g. "4th-7th June, 12th June" in
combination with the "TYE" frame.

The 'Media type' frame describes from which media the sound
originated. This may be a textstring or a reference to the predefined
media types found in the list below. References are made within "("
and ")" and are optionally followed by a text refinement, e.g. "(MC)
with four channels". If a text refinement should begin with a "("
character it should be replaced with "((" in the same way as in the
"TCO" frame. Predefined refinements are appended after the media type,
e.g. "(CD/S)" or "(VID/PAL/VHS)".

  DIG    Other digital media
    /A    Analog transfer from media

  ANA    Other analog media
    /8CA  8-track tape cassette

    /A    Analog transfer from media

    /A    Analog transfer from media

    /A    Analog transfer from media

    /A    Analog transfer from media
    /1    standard, 48 kHz/16 bits, linear
    /2    mode 2, 32 kHz/16 bits, linear
    /3    mode 3, 32 kHz/12 bits, nonlinear, low speed
    /4    mode 4, 32 kHz/12 bits, 4 channels
    /5    mode 5, 44.1 kHz/16 bits, linear
    /6    mode 6, 44.1 kHz/16 bits, 'wide track' play

    /A    Analog transfer from media

    /A    Analog transfer from media

  MC     MC (normal cassette)
    /4    4.75 cm/s (normal speed for a two sided cassette)
    /I    Type I cassette (ferric/normal)
    /II   Type II cassette (chrome)
    /III  Type III cassette (ferric chrome)
    /IV   Type IV cassette (metal)

    /I    Type I cassette (ferric/normal)
    /II   Type II cassette (chrome)
    /III  Type III cassette (ferric chrome)
    /IV   Type IV cassette (metal)

The 'File type' frame indicates which type of audio this tag defines.
The following type and refinements are defined:

    /AAC  Advanced audio compression

but other types may be used, not for these types though. This is used
in a similar way to the predefined types in the "TMT" frame, but
without parentheses. If this frame is not present audio type is

BPM is short for beats per minute, and is easily computed by
dividing the number of beats in a musical piece by its length. To
get a more accurate result, do the BPM calculation on the main part
only. For the best result, measure the time between each beat,
calculate an individual BPM for each beat and use the median value as
the result. BPM is an integer and is represented as a numeric string.
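
The median method described above can be sketched in Python. The
function name `estimate_bpm` and the choice of milliseconds for the
beat positions are assumptions made for this illustration.

```python
from statistics import median
from typing import Sequence


def estimate_bpm(beat_times_ms: Sequence[float]) -> int:
    """Estimate BPM from beat positions given in milliseconds."""
    if len(beat_times_ms) < 2:
        raise ValueError("at least two beats are needed")
    # Time between each beat and the next one.
    intervals = [b - a for a, b in zip(beat_times_ms, beat_times_ms[1:])]
    if any(dt <= 0 for dt in intervals):
        raise ValueError("beat times must be strictly increasing")
    # Individual BPM for each beat, then the median of those values.
    per_beat_bpm = [60000.0 / dt for dt in intervals]
    return round(median(per_beat_bpm))
```

Using the median rather than the mean keeps a single mistimed beat
from shifting the result. The value stored in the frame would then be
`str(estimate_bpm(...))`.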

The 'Copyright message' frame, which must begin with a year and a
space character (making five characters), is intended for the
copyright holder of the original sound, not the audio file itself. The
absence of this frame means only that the copyright information is
unavailable or has been removed, and must not be interpreted to mean
that the sound is public domain. Every time this field is displayed
the field must be preceded with "Copyright " (C) " ", where (C) is one
character showing a C in a circle.

The 'Publisher' frame simply contains the name of the label or
publisher.

The 'Encoded by' frame contains the name of the person or
organisation that encoded the audio file. This field may contain a
copyright message, if the audio file also is copyrighted by the
encoder.

The 'Software/hardware and settings used for encoding' frame
includes the used audio encoder and its settings when the file was
encoded. Hardware refers to hardware encoders, not the computer on
which a program was run.

The 'Original filename' frame contains the preferred filename for the
file, since some media don't allow the desired length of the
filename. The filename is case sensitive and includes its suffix.

The 'Length' frame contains the length of the audiofile in
milliseconds, represented as a numeric string.

The 'Size' frame contains the size of the audiofile in bytes
excluding the tag, represented as a numeric string.

The 'Playlist delay' defines the number of milliseconds of silence
between every song in a playlist. The player should use the "ETC"
frame, if present, to skip initial silence and silence at the end of
the audio to match the 'Playlist delay' time. The time is represented
as a numeric string.

The 'Initial key' frame contains the musical key in which the sound
starts. It is represented as a string with a maximum length of three
characters. The ground keys are represented with "A","B","C","D","E",
"F" and "G" and halfkeys represented with "b" and "#". Minor is
represented as "m". Example "Cbm". Off key is represented with an "o"

The 'Original album/Movie/Show title' frame is intended for the title
of the original recording (or source of sound), if for example the
music in the file should be a cover of a previously released song.

The 'Original artist(s)/performer(s)' frame is intended for the
performer(s) of the original recording, if for example the music in
the file should be a cover of a previously released song. The
performers are separated with the "/" character.

The 'Original Lyricist(s)/text writer(s)' frame is intended for the
text writer(s) of the original recording, if for example the music in
the file should be a cover of a previously released song. The text
writers are separated with the "/" character.

The 'Original release year' frame is intended for the year when the
original recording was released, if for example the music in the file
should be a cover of a previously released song. The field is
formatted as in the "TYE" frame.

4.2.2. User defined text information frame

This frame is intended for one-string text information concerning the
audiofile in a similar way to the other "T"xx frames. The frame body
consists of a description of the string, represented as a terminated
string, followed by the actual string. There may be more than one
"TXX" frame in each tag, but only one with the same description.

  User defined...   "TXX"
  Description       <textstring> $00 (00)

With these frames dynamic data such as webpages with touring
information, price information or plain ordinary news can be added to
the tag. There may only be one URL [URL] link frame of its kind in a
tag, except when stated otherwise in the frame description. If the
textstring is followed by a termination ($00 (00)) all the following
information should be ignored and not be displayed. All URL link
frames have the following format:

  URL link frame   "W00" - "WZZ", excluding "WXX"
                   (described in 4.3.2.)

4.3.1. URL link frames - details

The 'Official audio file webpage' frame is a URL pointing at a file

The 'Official artist/performer webpage' frame is a URL pointing at
the artist's official webpage. There may be more than one "WAR" frame
in a tag if the audio contains more than one performer.

The 'Official audio source webpage' frame is a URL pointing at the
official webpage for the source of the audio file, e.g. a movie.

The 'Commercial information' frame is a URL pointing at a webpage
with information such as where the album can be bought. There may be
more than one "WCM" frame in a tag.

The 'Copyright/Legal information' frame is a URL pointing at a
webpage where the terms of use and ownership of the file are
described.

The 'Publishers official webpage' frame is a URL pointing at the
official webpage for the publisher.

4.3.2. User defined URL link frame

This frame is intended for URL [URL] links concerning the audiofile in
a similar way to the other "W"xx frames. The frame body consists of a
description of the string, represented as a terminated string,
followed by the actual URL. The URL is always encoded with ISO-8859-1
[ISO-8859-1]. There may be more than one "WXX" frame in each tag, but
only one with the same description.

  User defined...   "WXX"
  Description       <textstring> $00 (00)

4.4. Involved people list

Since there might be a lot of people contributing to an audio file in
various ways, such as musicians and technicians, the 'Text
information frames' are often insufficient to list everyone involved
in a project. The 'Involved people list' is a frame containing the
names of those involved, and how they were involved. The body simply
contains a terminated string with the involvement directly followed by
a terminated string with the involvee followed by a new involvement
and so on. There may only be one "IPL" frame in each tag.

  Involved people list   "IPL"
  People list strings    <textstrings>

4.5. Music CD Identifier

This frame is intended for music that comes from a CD, so that the CD
can be identified in databases such as the CDDB [CDDB]. The frame
consists of a binary dump of the Table Of Contents, TOC, from the CD,
which is a header of 4 bytes and then 8 bytes/track on the CD making a
maximum of 804 bytes. This frame requires a present and valid "TRK"
frame. There may only be one "MCI" frame in each tag.

  Music CD identifier   "MCI"

4.6. Event timing codes

This frame allows synchronisation with key events in a song or sound.

  Event timing codes   "ETC"
  Time stamp format    $xx

Where time stamp format is:

  $01  Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
  $02  Absolute time, 32 bit sized, using milliseconds as unit

Absolute time means that every stamp contains the time from the
beginning of the file.

Followed by a list of key events in the following format:

  Time stamp           $xx (xx ...)

The 'Time stamp' is set to zero if directly at the beginning of the
sound or after the previous event. All events should be sorted in
chronological order. The type of event is as follows:

  $00      padding (has no meaning)
  $01      end of initial silence
  $0D      unwanted noise (Snap, Crackle & Pop)

  $0E-$DF  reserved for future use

  $E0-$EF  not predefined sync 0-F

  $F0-$FC  reserved for future use

  $FD      audio end (start of silence)

  $FF      one more byte of events follows (all the following bytes with
           the value $FF have the same function)

The 'Not predefined sync's ($E0-EF) are for user events. You might
want to synchronise your music to something, like setting off an
explosion on-stage, turning on your screensaver etc.

There may only be one "ETC" frame in each tag.

4.7. MPEG location lookup table

To increase performance and accuracy of jumps within an MPEG [MPEG]
audio file, frames with timecodes in different locations in the file
might be useful. The ID3 frame includes references that the software
can use to calculate positions in the file. After the frame header is
a descriptor of how much the 'frame counter' should increase for every
reference. If this value is two then the first reference points out
the second frame, the 2nd reference the 4th frame, the 3rd reference
the 6th frame etc. In a similar way the 'bytes between reference' and
'milliseconds between reference' point out bytes and milliseconds

Each reference consists of two parts; a certain number of bits, as
defined in 'bits for bytes deviation', that describes the difference
between what is said in 'bytes between reference' and the reality and
a certain number of bits, as defined in 'bits for milliseconds
deviation', that describes the difference between what is said in
'milliseconds between reference' and the reality. The number of bits
in every reference, i.e. 'bits for bytes deviation'+'bits for
milliseconds deviation', must be a multiple of four. There may only be
one "MLL" frame in each tag.

  Location lookup table           "MLL"
  ID3 frame size                  $xx xx xx
  MPEG frames between reference   $xx xx
  Bytes between reference         $xx xx xx
  Milliseconds between reference  $xx xx xx
  Bits for bytes deviation        $xx
  Bits for milliseconds dev.      $xx

Then for every reference the following data is included;

  Deviation in bytes         %xxx....
  Deviation in milliseconds  %xxx....
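
Because the references are packed with bit widths chosen per frame,
reading them takes a little bit arithmetic. The Python sketch below is
an illustration only. The name `parse_mll` is hypothetical, `data` is
assumed to be the frame contents after the six byte frame header, the
deviations are treated as unsigned values, and any leftover bits at the
end are ignored. None of these three points is stated in this document.

```python
from typing import Dict, List, Tuple


def parse_mll(data: bytes) -> Dict[str, object]:
    """Read the fixed fields and unpack the list of references."""
    if len(data) < 10:
        raise ValueError("MLL frame is too short")
    bits_bytes, bits_ms = data[8], data[9]
    ref_bits = bits_bytes + bits_ms
    if ref_bits == 0 or ref_bits % 4:
        raise ValueError("bits per reference must be a multiple of four")
    payload = data[10:]
    total_bits = len(payload) * 8
    packed = int.from_bytes(payload, "big")
    references: List[Tuple[int, int]] = []
    for i in range(total_bits // ref_bits):
        # Take the i-th group of ref_bits bits, counted from the MSB.
        shift = total_bits - (i + 1) * ref_bits
        chunk = (packed >> shift) & ((1 << ref_bits) - 1)
        dev_bytes = chunk >> bits_ms              # first part
        dev_ms = chunk & ((1 << bits_ms) - 1)     # second part
        references.append((dev_bytes, dev_ms))
    return {
        "frames_between": int.from_bytes(data[0:2], "big"),
        "bytes_between": int.from_bytes(data[2:5], "big"),
        "ms_between": int.from_bytes(data[5:8], "big"),
        "references": references,
    }
```

Converting the whole payload to one integer keeps the extraction
simple for references that straddle byte boundaries, such as 12-bit
ones.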

4.8. Synced tempo codes

For a more accurate description of the tempo of a musical piece this
frame might be used. After the header follows one byte describing
which time stamp format should be used. Then follows one or more tempo
codes. Each tempo code consists of one tempo part and one time part.
The tempo is in BPM described with one or two bytes. If the first byte
has the value $FF, one more byte follows, which is added to the first
giving a range from 2 - 510 BPM, since $00 and $01 are reserved. $00
is used to describe a beat-free time period, which is not the same as
a music-free time period. $01 is used to indicate one single
beat-stroke followed by a beat-free period.

The tempo descriptor is followed by a time stamp. Every time the tempo
in the music changes, a tempo descriptor may indicate this for the
player. All tempo descriptors should be sorted in chronological order.
The first beat-stroke in a time-period is at the same time as the beat
description occurs. There may only be one "STC" frame in each tag.

  Synced tempo codes   "STC"
  Time stamp format    $xx
  Tempo data           <binary data>

Where time stamp format is:

  $01  Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
  $02  Absolute time, 32 bit sized, using milliseconds as unit

Absolute time means that every stamp contains the time from the
beginning of the file.
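
The one-or-two byte tempo encoding can be decoded as sketched below in
Python. The name `parse_tempo_codes` is hypothetical. The sketch
assumes that `data` is the 'Tempo data' field and that each time stamp
occupies four bytes, most significant byte first, in line with the
32 bit formats listed above.

```python
from typing import List, Tuple


def parse_tempo_codes(data: bytes) -> List[Tuple[int, int]]:
    """Return a list of (tempo, time stamp) pairs."""
    codes: List[Tuple[int, int]] = []
    pos = 0
    while pos < len(data):
        tempo = data[pos]
        pos += 1
        if tempo == 0xFF:
            # One more byte follows and is added to the first.
            if pos >= len(data):
                raise ValueError("truncated tempo descriptor")
            tempo += data[pos]
            pos += 1
        if pos + 4 > len(data):
            raise ValueError("truncated time stamp")
        stamp = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        codes.append((tempo, stamp))
    return codes
```

A tempo of 0 or 1 in the result carries the special meanings described
above (beat-free period, single beat-stroke) rather than a BPM value.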

4.9. Unsynchronised lyrics/text transcription

This frame contains the lyrics of the song or a text transcription of
other vocal activities. The head includes an encoding descriptor and
a content descriptor. The body consists of the actual text. The
'Content descriptor' is a terminated string. If no descriptor is
entered, 'Content descriptor' is $00 (00) only. Newline characters
are allowed in the text. Maximum length for the descriptor is 64
bytes. There may be more than one lyrics/text frame in each tag, but
only one with the same language and content descriptor.

  Unsynced lyrics/text   "ULT"
  Content descriptor     <textstring> $00 (00)
  Lyrics/text            <textstring>

4.10. Synchronised lyrics/text

This is another way of incorporating the words, said or sung lyrics,
in the audio file as text, this time, however, in sync with the audio.
It might also be used to describe events, e.g. occurring on a stage
or on the screen, in sync with the audio. The header includes a
content descriptor, represented as a terminated textstring. If no
descriptor is entered, 'Content descriptor' is $00 (00) only.

  Synced lyrics/text   "SLT"
  Time stamp format    $xx
  Content descriptor   <textstring> $00 (00)

  Encoding:   $00  ISO-8859-1 [ISO-8859-1] character set is used => $00
                   is sync identifier.
              $01  Unicode [UNICODE] character set is used => $00 00 is
                   sync identifier.

  Content type:   $00 is other
                  $02 is text transcription
                  $03 is movement/part name (e.g. "Adagio")
                  $04 is events (e.g. "Don Quijote enters the stage")
                  $05 is chord (e.g. "Bb F Fsus")

Time stamp format is:

  $01  Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
  $02  Absolute time, 32 bit sized, using milliseconds as unit

Absolute time means that every stamp contains the time from the
beginning of the file.

The text that follows the frame header differs from that of the
unsynchronised lyrics/text transcription in one major way. Each
syllable (or whatever size of text is considered to be convenient by
the encoder) is a null terminated string followed by a time stamp
denoting where in the sound file it belongs. Each sync thus has the
following structure:

  Terminated text to be synced (typically a syllable)
  Sync identifier (terminator to above string)   $00 (00)
  Time stamp                                     $xx (xx ...)

The 'time stamp' is set to zero or the whole sync is omitted if
located directly at the beginning of the sound. All time stamps should
be sorted in chronological order. The sync can be considered as a
validator of the subsequent string.
958
Newline characters are allowed in all "SLT" frames and should be used
959
after every entry (name, event etc.) in a frame with the content type
962
A few considerations regarding whitespace characters: Whitespace
963
separating words should mark the beginning of a new word, thus
964
occurring in front of the first syllable of a new word. This is also
965
valid for new line characters. A syllable followed by a comma should
966
not be broken apart with a sync (both the syllable and the comma
967
should be before the sync).

An example: The "ULT" passage

"Strangers in the night" $0A "Exchanging glances"

would be "SLT" encoded as:

"Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx
" night" $00 xx xx $0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx
xx " glan" $00 xx xx "ces" $00 xx xx

There may be more than one "SLT" frame in each tag, but only one with
the same language and content descriptor.
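
The sync structure described in this section can be illustrated with
a short Python sketch. It is not part of the standard. It assumes
ISO-8859-1 text (so the sync identifier is a single $00), 32 bit time
stamps stored most significant byte first, and the hypothetical helper
names encode_syncs and decode_syncs.

```python
import struct


def encode_syncs(syncs):
    """Pack (text, time stamp) pairs as text + $00 + 32 bit time stamp."""
    out = bytearray()
    for text, stamp in syncs:
        # Assumption: ISO-8859-1 text and big-endian 32 bit stamps.
        out += text.encode("latin-1") + b"\x00" + struct.pack(">I", stamp)
    return bytes(out)


def decode_syncs(data):
    """Inverse of encode_syncs; returns a list of (text, time stamp)."""
    syncs, pos = [], 0
    while pos < len(data):
        end = data.index(b"\x00", pos)            # sync identifier
        text = data[pos:end].decode("latin-1")
        (stamp,) = struct.unpack_from(">I", data, end + 1)
        syncs.append((text, stamp))
        pos = end + 5                             # $00 plus four stamp bytes
    return syncs
```

Whitespace stays in front of the first syllable of each new word, as
recommended above, so the text can be rebuilt by joining the decoded
strings in order.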

This frame replaces the old 30-character comment field in ID3v1. It
consists of a frame head followed by encoding, language and content
descriptors and is ended with the actual comment as a text string.
Newline characters are allowed in the comment text string. There may
be more than one comment frame in each tag, but only one with the same
language and content descriptor.

Short content description   <textstring> $00 (00)
The actual text             <textstring>

4.12. Relative volume adjustment

This is a more subjective function than the previous ones. It allows
the user to say how much he wants to increase/decrease the volume on
each channel while the file is played. The purpose is to be able to
align all files to a reference volume, so that you don't have to
change the volume constantly. This frame may also be used to balance
adjust the audio. If the volume peak levels are known then this could
be described with the 'Peak volume right' and 'Peak volume left'
fields. If the peak volume is not known these fields could be left
zeroed or completely omitted. There may only be one "RVA" frame in
each tag.

Relative volume adjustment      "RVA"
Frame size                      $xx xx xx
Increment/decrement             %000000xx
Bits used for volume descr.     $xx
Relative volume change, right   $xx xx (xx ...)
Relative volume change, left    $xx xx (xx ...)
Peak volume right               $xx xx (xx ...)
Peak volume left                $xx xx (xx ...)

In the increment/decrement field bit 0 is used to indicate the right
channel and bit 1 is used to indicate the left channel. 1 is
increment and 0 is decrement.

The 'bits used for volume description' field is normally $10 (16 bits)
for MPEG 2 layer I, II and III [MPEG] and MPEG 2.5. This value may not
be $00. The volume is always represented with whole bytes, padded in
the beginning (highest bits) when 'bits used for volume description'
is not a multiple of eight.
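
As a rough illustration of the "RVA" layout, the following Python
sketch decodes the bytes that follow 'Frame size'. The function name
parse_rva and the returned dictionary are hypothetical, and the sketch
assumes that multi-byte values are stored most significant byte first.

```python
def parse_rva(body):
    """Decode the bytes that follow 'Frame size' in an "RVA" frame."""
    flags, bits = body[0], body[1]
    if bits == 0:
        raise ValueError("'bits used for volume description' may not be $00")
    width = (bits + 7) // 8                  # values occupy whole bytes
    payload = body[2:]
    if len(payload) not in (2 * width, 4 * width):
        raise ValueError("expected two changes, optionally two peaks")
    # Assumption: big-endian values; padding bits are the highest bits.
    values = [int.from_bytes(payload[i:i + width], "big")
              for i in range(0, len(payload), width)]
    right = values[0] if flags & 0x01 else -values[0]   # bit 0: right
    left = values[1] if flags & 0x02 else -values[1]    # bit 1: left
    peaks = tuple(values[2:]) or None        # peak right, peak left
    return {"right": right, "left": left, "peaks": peaks}
```

A decrement is returned as a negative number, so a player can add the
result directly to its own volume value.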

This is another subjective, alignment frame. It allows the user to
predefine an equalisation curve within the audio file. There may only
be one "EQU" frame in each tag.

Frame size   $xx xx xx

The 'adjustment bits' field defines the number of bits used for
representation of the adjustment. This is normally $10 (16 bits) for
MPEG 2 layer I, II and III [MPEG] and MPEG 2.5. This value may not be
$00.

This is followed by 2 bytes + ('adjustment bits' rounded up to the
nearest byte) for every equalisation band in the following format,
giving a frequency range of 0 - 32767Hz:

Increment/decrement   %x (MSB of the Frequency)
Frequency             (lower 15 bits)
Adjustment            $xx (xx ...)

The increment/decrement bit is 1 for increment and 0 for decrement.
The equalisation bands should be ordered increasingly with reference
to frequency. Not all frequencies have to be declared. Adjustments
with the value $00 should be omitted. A frequency should only be
described once in the frame.
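
The band format above could be decoded along the following lines.
This Python sketch is only an illustration: parse_equ_bands is a
hypothetical name, the input is assumed to be the band data that
follows the 'adjustment bits' field, and values are assumed to be
stored most significant byte first.

```python
def parse_equ_bands(data, adjustment_bits=16):
    """Decode equalisation bands into (frequency in Hz, adjustment)."""
    if adjustment_bits == 0:
        raise ValueError("'adjustment bits' may not be $00")
    width = (adjustment_bits + 7) // 8       # rounded up to whole bytes
    step = 2 + width                         # bytes per band
    if len(data) % step:
        raise ValueError("band data is truncated")
    bands = []
    for pos in range(0, len(data), step):
        head = int.from_bytes(data[pos:pos + 2], "big")
        increment = head >> 15               # MSB of the frequency word
        frequency = head & 0x7FFF            # lower 15 bits
        adjustment = int.from_bytes(data[pos + 2:pos + step], "big")
        bands.append((frequency, adjustment if increment else -adjustment))
    return bands
```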

Yet another subjective one. You may here adjust echoes of different
kinds. Reverb left/right is the delay between every bounce in ms.
Reverb bounces left/right is the number of bounces that should be
made. $FF equals an infinite number of bounces. Feedback is the amount
of volume that should be returned to the next echo bounce. $00 is 0%,
$FF is 100%. If this value were $7F, there would be 50% volume
reduction on the first bounce, another 50% on the second and so on.
Left to left means the sound from the left bounce is to be played in
the left speaker, while left to right means the sound from the left
bounce is to be played in the right speaker.

'Premix left to right' is the amount of left sound to be mixed in the
right before any reverb is applied, where $00 is 0% and $FF is 100%.
'Premix right to left' does the same thing, but right to left. Setting
both premix to $FF would result in a mono output (if the reverb is
applied symmetrically). There may only be one "REV" frame in each tag.

Reverb settings                   "REV"
Frame size                        $00 00 0C
Reverb left (ms)                  $xx xx
Reverb right (ms)                 $xx xx
Reverb bounces, left              $xx
Reverb bounces, right             $xx
Reverb feedback, left to left     $xx
Reverb feedback, left to right    $xx
Reverb feedback, right to right   $xx
Reverb feedback, right to left    $xx
Premix left to right              $xx
Premix right to left              $xx
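
Since the "REV" body has a fixed size of 12 bytes, it maps directly
onto a fixed record. The Python sketch below is one possible reading,
not a reference implementation: the names Reverb, parse_rev and
feedback_gain are hypothetical, the two delay values are assumed to be
stored most significant byte first, and feedback_gain assumes that the
feedback value scales the volume linearly on every bounce.

```python
import struct
from collections import namedtuple

Reverb = namedtuple(
    "Reverb",
    "left_ms right_ms bounces_left bounces_right "
    "feedback_ll feedback_lr feedback_rr feedback_rl "
    "premix_lr premix_rl",
)


def parse_rev(body):
    """Decode the 12 bytes that follow 'Frame size' in a "REV" frame."""
    if len(body) != 12:
        raise ValueError("a REV frame body is $00 00 0C bytes long")
    # Two 16 bit delays followed by eight single-byte settings.
    return Reverb(*struct.unpack(">HHBBBBBBBB", body))


def feedback_gain(value, bounce):
    """Fraction of the original volume left after `bounce` echoes."""
    return (value / 255) ** bounce
```

With a feedback value of $7F, feedback_gain gives roughly 0.5 for the
first bounce and roughly 0.25 for the second.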

4.15. Attached picture

This frame contains a picture directly related to the audio file.
Image format is preferably "PNG" [PNG] or "JPG" [JFIF]. Description
is a short description of the picture, represented as a terminated
textstring. The description has a maximum length of 64 characters,
but may be empty. There may be several pictures attached to one file,
each in their individual "PIC" frame, but only one with the same
content descriptor. There may only be one picture with the picture
type declared as picture type $01 and $02 respectively. There is a
possibility to put only a link to the image file by using the 'image
format' "-->" and having a complete URL [URL] instead of picture data.
Linked files should however be used restrictively since there is the
risk of separation of files.

Attached picture   "PIC"
Frame size         $xx xx xx
Image format       $xx xx xx
Description        <textstring> $00 (00)
Picture data       <binary data>

Picture type:  $00  Other
               $01  32x32 pixels 'file icon' (PNG only)
               $06  Media (e.g. label side of CD)
               $07  Lead artist/lead performer/soloist
               $08  Artist/performer
               $0C  Lyricist/text writer
               $0D  Recording Location
               $0E  During recording
               $0F  During performance
               $10  Movie/video screen capture
               $11  A bright coloured fish
               $13  Band/artist logotype
               $14  Publisher/Studio logotype

4.16. General encapsulated object

In this frame any type of file can be encapsulated. After the header,
'Frame size' and 'Encoding' follow 'MIME type' [MIME] and 'Filename'
for the encapsulated object, both represented as terminated strings
encoded with ISO 8859-1 [ISO-8859-1]. The filename is case sensitive.
Then follows a content description as a terminated string, encoded as
'Encoding'. The last thing in the frame is the actual object. The
first two strings may be omitted, leaving only their terminations.
MIME type is always an ISO-8859-1 text string. There may be more than
one "GEO" frame in each tag, but only one with the same content
descriptor.

General encapsulated object   "GEO"
Frame size                    $xx xx xx
MIME type                     <textstring> $00
Filename                      <textstring> $00 (00)
Content description           <textstring> $00 (00)
Encapsulated object           <binary data>

This is simply a counter of the number of times a file has been
played. The value is increased by one every time the file begins to
play. There may only be one "CNT" frame in each tag. When the counter
reaches all ones, one byte is inserted in front of the counter, thus
making the counter eight bits bigger. The counter must be at least
32 bits long to begin with.

Frame size   $xx xx xx
Counter      $xx xx xx xx (xx ...)
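
The growing counter can be sketched in Python as follows. The name
increment_counter is hypothetical, the counter is assumed to be stored
most significant byte first, and the sketch takes one possible reading
of the rule above: the extra byte is inserted at the moment an
all-ones counter has to be increased.

```python
def increment_counter(counter):
    """Return the counter bytes after one more play."""
    if len(counter) < 4:
        raise ValueError("the counter must be at least 32 bits long")
    if all(byte == 0xFF for byte in counter):
        counter = b"\x00" + counter          # grow by eight bits first
    value = int.from_bytes(counter, "big") + 1
    return value.to_bytes(len(counter), "big")
```

A decoder that receives the grown counter only needs to read as many
bytes as 'Frame size' announces.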

The purpose of this frame is to specify how good an audio file is.
Many interesting applications could be found for this frame, such as a
playlist that features better audio files more often than others, or
it could be used to profile a person's taste and find other 'good'
files by comparing people's profiles. The frame is very simple. It
contains the email address of the user, one rating byte and a four
byte play counter, intended to be increased by one every time the
file is played. The email is a terminated string. The rating is 1-255
where 1 is worst and 255 is best. 0 is unknown. If no personal counter
is wanted it may be omitted. When the counter reaches all ones, one
byte is inserted in front of the counter, thus making the counter
eight bits bigger, in the same way as the play counter ("CNT").
There may be more than one "POP" frame in each tag, but only one with
the same email address.

Frame size      $xx xx xx
Email to user   <textstring> $00
Rating          $xx
Counter         $xx xx xx xx (xx ...)
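
A reader of the "POP" frame has to cope with the optional counter.
The following Python sketch shows one way to do so. The name parse_pop
is hypothetical, the email is assumed to be ISO-8859-1 text, and the
counter is assumed to be stored most significant byte first.

```python
def parse_pop(body):
    """Split the bytes after 'Frame size' into email, rating, counter."""
    end = body.index(b"\x00")                # terminator of the email
    email = body[:end].decode("latin-1")
    rating = body[end + 1]                   # 1 is worst, 255 is best
    counter_bytes = body[end + 2:]           # may be omitted entirely
    counter = int.from_bytes(counter_bytes, "big") if counter_bytes else None
    # A rating of 0 means unknown and is reported as None.
    return email, rating or None, counter
```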

4.19. Recommended buffer size

Sometimes the server from which an audio file is streamed is aware of
transmission or coding problems resulting in interruptions in the
audio stream. In these cases, the size of the buffer can be
recommended by the server using this frame. If the 'embedded info
flag' is true (1) then this indicates that an ID3 tag with the
maximum size described in 'Buffer size' may occur in the audiostream.
In such case the tag should reside between two MPEG [MPEG] frames, if
the audio is MPEG encoded. If the position of the next tag is known,
'offset to next tag' may be used. The offset is calculated from the
end of tag in which this frame resides to the first byte of the header
in the next. This field may be omitted. Embedded tags are currently
not recommended since this could cause unpredictable behaviour in
present software/hardware. The 'Buffer size' should be kept to a
minimum. There may only be one "BUF" frame in each tag.

Recommended buffer size   "BUF"
Frame size                $xx xx xx
Buffer size               $xx xx xx
Embedded info flag        %0000000x
Offset to next tag        $xx xx xx xx
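
The "BUF" layout, including the optional offset, could be read as in
the Python sketch below. The name parse_buf is hypothetical and the
numbers are assumed to be stored most significant byte first.

```python
def parse_buf(body):
    """Decode the bytes that follow 'Frame size' in a "BUF" frame."""
    if len(body) not in (4, 8):
        raise ValueError("expected 4 bytes, or 8 with 'offset to next tag'")
    buffer_size = int.from_bytes(body[:3], "big")
    embedded_info = bool(body[3] & 0x01)     # lowest bit of the flag byte
    # 'Offset to next tag' may be omitted; report that case as None.
    offset = int.from_bytes(body[4:], "big") if len(body) == 8 else None
    return buffer_size, embedded_info, offset
```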

4.20. Encrypted meta frame

This frame contains one or more encrypted frames. This enables
protection of copyrighted information such as pictures and text, that
people might want to pay extra for. Since standardisation of such an
encryption scheme is beyond this document, all "CRM" frames begin with
a terminated string with a URL [URL] containing an email address, or a
link to a location where an email address can be found, that belongs
to the organisation responsible for this specific encrypted meta
frame.

Questions regarding the encrypted frame should be sent to the
indicated email address. If a $00 is found directly after the 'Frame
size', the whole frame should be ignored, and preferably be removed.
The 'Owner identifier' is then followed by a short content description
and explanation as to why it's encrypted. After the
'content/explanation' description, the actual encrypted block follows.

When an ID3v2 decoder encounters a "CRM" frame, it should send the
datablock to the 'plugin' with the corresponding 'owner identifier'
and expect to receive either a datablock with one or several ID3v2
frames after each other or an error. There may be more than one "CRM"
frame in a tag, but only one with the same 'owner identifier'.

Encrypted meta frame   "CRM"
Frame size             $xx xx xx
Owner identifier       <textstring> $00 (00)
Content/explanation    <textstring> $00 (00)
Encrypted datablock    <binary data>

4.21. Audio encryption

This frame indicates if the actual audio stream is encrypted, and by
whom. Since standardisation of such an encryption scheme is beyond
this document, all "CRA" frames begin with a terminated string with a
URL containing an email address, or a link to a location where an
email address can be found, that belongs to the organisation
responsible for this specific encrypted audio file. Questions
regarding the encrypted audio should be sent to the email address
specified. If a $00 is found directly after the 'Frame size' and the
audiofile indeed is encrypted, the whole file may be considered
useless.

After the 'Owner identifier', a pointer to an unencrypted part of the
audio can be specified. The 'Preview start' and 'Preview length' are
described in frames. If no part is unencrypted, these fields should be
left zeroed. After the 'preview length' field follows optionally a
datablock required for decryption of the audio. There may be more than
one "CRA" frame in a tag, but only one with the same 'Owner
identifier'.

Audio encryption   "CRA"
Frame size         $xx xx xx
Owner identifier   <textstring> $00 (00)
Preview start      $xx xx
Preview length     $xx xx
Encryption info    <binary data>

4.22. Linked information

To keep space waste as low as possible this frame may be used to link
information from another ID3v2 tag that might reside in another audio
file or alone in a binary file. It is recommended that this method is
only used when the files are stored on a CD-ROM or in other
circumstances when the risk of file separation is low. The frame
contains a frame identifier, which is the frame that should be linked
into this tag, a URL [URL] field, which references the file where the
frame is given, and additional ID data, if needed. Data should be
retrieved from the first tag found in the file to which this link
points. There may be more than one "LNK" frame in a tag, but only one
with the same contents. A linked frame is to be considered as part of
the tag and has the same restrictions as if it were a physical part of
the tag (i.e. only one "REV" frame allowed, whether it's linked or
not).

Linked information   "LNK"
Frame size           $xx xx xx
Frame identifier     $xx xx xx
URL                  <textstring> $00 (00)
Additional ID data   <textstring(s)>

Frames that may be linked and need no additional data are "IPL",
"MCI", "ETC", "LLT", "STC", "RVA", "EQU", "REV", "BUF", the text
information frames and the URL link frames.

The "TXX", "PIC", "GEO", "CRM" and "CRA" frames may be linked with the
content descriptor as additional ID data.

The "COM", "SLT" and "ULT" frames may be linked with three bytes of
language descriptor directly followed by a content descriptor as
additional ID data.

5. The 'unsynchronisation scheme'

The only purpose of the 'unsynchronisation scheme' is to make the
ID3v2 tag as compatible as possible with existing software. There is
no use in 'unsynchronising' tags if the file is only to be processed
by new software. Unsynchronisation may only be made with MPEG 2 layer
I, II and III and MPEG 2.5 files.

Whenever a false synchronisation is found within the tag, one zeroed
byte is inserted after the first false synchronisation byte. The
format of a correct sync that should be altered by ID3 encoders is as
follows:

%11111111 111xxxxx

And should be replaced with:

%11111111 00000000 111xxxxx

This has the side effect that all $FF 00 combinations have to be
altered, so they won't be affected by the decoding process. Therefore
all the $FF 00 combinations have to be replaced with the $FF 00 00
combination during the unsynchronisation.

To indicate usage of the unsynchronisation, the first bit in 'ID3
flags' should be set. This bit should only be set if the tag
contained a, now corrected, false synchronisation. The bit should
only be clear if the tag does not contain any false synchronisations.

Do bear in mind, that if a compression scheme is used by the encoder,
the unsynchronisation scheme should be applied *afterwards*. When
decoding a compressed, 'unsynchronised' file, the 'unsynchronisation
scheme' should be reversed first, decompression afterwards.
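
The two replacement rules of this section can be sketched in Python
as shown below. The function names unsynchronise and resynchronise are
hypothetical and the sketch works on the tag data only. It does not
deal with a tag whose last byte is $FF, which this section does not
discuss.

```python
def unsynchronise(data):
    """Insert $00 after every $FF that precedes %111xxxxx or $00."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            if following >= 0xE0 or following == 0x00:
                out.append(0x00)             # break the false sync
    return bytes(out)


def resynchronise(data):
    """Reverse unsynchronise by dropping the $00 that follows each $FF."""
    return data.replace(b"\xff\x00", b"\xff")
```

Because every $FF 00 in the source becomes $FF 00 00, the decoder can
safely remove one $00 after each $FF without inspecting the bytes
that follow.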

Copyright (C) Martin Nilsson 1998. All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that a reference to this document is included on all
such copies and derivative works. However, this document itself may
not be modified in any way and reissued as the original document.

The limited permissions granted above are perpetual and will not be

This document and the information contained herein is provided on an
"AS IS" basis and THE AUTHORS DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

[CDDB] Compact Disc Data Base

<url:http://www.cddb.com>

[ISO-639-2] ISO/FDIS 639-2.
Codes for the representation of names of languages, Part 2: Alpha-3
code. Technical committee / subcommittee: TC 37 / SC 2

[ISO-8859-1] ISO/IEC DIS 8859-1.
8-bit single-byte coded graphic character sets, Part 1: Latin
alphabet No. 1. Technical committee / subcommittee: JTC 1 / SC 2

[ISRC] ISO 3901:1986
International Standard Recording Code (ISRC).
Technical committee / subcommittee: TC 46 / SC 9

[JFIF] JPEG File Interchange Format, version 1.02

<url:http://www.w3.org/Graphics/JPEG/jfif.txt>

[MIME] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996.

<url:ftp://ftp.isi.edu/in-notes/rfc2045.txt>

[MPEG] ISO/IEC 11172-3:1993.
Coding of moving pictures and associated audio for digital storage
media at up to about 1,5 Mbit/s, Part 3: Audio.
Technical committee / subcommittee: JTC 1 / SC 29

ISO/IEC 13818-3:1995
Generic coding of moving pictures and associated audio information,
Technical committee / subcommittee: JTC 1 / SC 29

Generic coding of moving pictures and associated audio information,
Part 3: Audio (Revision of ISO/IEC 13818-3:1995)

[PNG] Portable Network Graphics, version 1.0

<url:http://www.w3.org/TR/REC-png-multi.html>

[UNICODE] ISO/IEC 10646-1:1993.
Universal Multiple-Octet Coded Character Set (UCS), Part 1:
Architecture and Basic Multilingual Plane. Technical committee
/ subcommittee: JTC 1 / SC 2

<url:http://www.unicode.org>

[URL] T. Berners-Lee, L. Masinter & M. McCahill, "Uniform Resource
Locators (URL).", RFC 1738, December 1994.

<url:ftp://ftp.isi.edu/in-notes/rfc1738.txt>

A. Appendix A - ID3-Tag Specification V1.1

ID3-Tag Specification V1.1 (12 dec 1997) by Michael Mutschler
<amiga2@info2.rus.uni-stuttgart.de>, edited for space and clarity

The ID3-Tag is an information field for MPEG Layer 3 audio files.
Since a standalone MP3 doesn't provide a method of storing
information other than that directly needed for playback, the
ID3-tag was invented by Eric Kemp in 1996.

A revision from ID3v1 to ID3v1.1 was made by Michael Mutschler to
support track number information; it is described in A.4.

A.2. ID3v1 Implementation

The information is stored in the last 128 bytes of an MP3. The tag
has the following fields, and the offsets given here are from
0-127.

Field      Length    Offsets

The string fields contain ASCII data, coded in the ISO-Latin 1
codepage. Strings which are smaller than the field length are padded
with zero-bytes.

Tag: The tag is valid if this field contains the string "TAG". This
has to be uppercase!

Songname: This field contains the title of the MP3 (string as
above).

Artist: This field contains the artist of the MP3 (string as above).

Album: This field contains the album where the MP3 comes from
(string as above).

Year: This field contains the year when this song was originally
released (string as above).

Comment: This field contains a comment for the MP3 (string as
above). Revision to this field has been made in ID3v1.1. See
A.4.

Genre: This byte contains the offset of a genre in a predefined
list; the byte is treated as an unsigned byte. The offset starts
from 0. See A.3.

The following genres are defined in ID3v1

47.Instrumental Rock
51.Techno-Industrial

The following genres are Winamp extensions

A.4. Track addition - ID3v1.1

In ID3v1.1, Michael Mutschler revised the specification of the
comment field in order to implement the track number. The new format
of the comment field is a 28 character string followed by a mandatory
null ($00) character and the original album tracknumber stored as an
unsigned byte-size integer. In such cases where the 29th byte is not
the null character or when the 30th is a null character, the
tracknumber is to be considered undefined.
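
The rule for telling an ID3v1.1 comment field from a plain ID3v1 one
can be sketched in Python as follows. The name split_comment is
hypothetical. The input is assumed to be the 30 bytes of the comment
field, already cut out of the tag.

```python
def split_comment(field):
    """Return (comment, track number or None) for a 30 byte field."""
    if len(field) != 30:
        raise ValueError("the comment field is 30 bytes long")
    # ID3v1.1: the 29th byte is $00 and the 30th byte is non-zero.
    if field[28] == 0 and field[29] != 0:
        return field[:28].rstrip(b"\x00").decode("latin-1"), field[29]
    # Otherwise the track number is undefined.
    return field.rstrip(b"\x00").decode("latin-1"), None
```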

Email: nilsson@id3.org

Johan Sundström   Email: johan@id3.org