<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>mp3Frame</title><revhistory><revision><revnumber>10</revnumber><date>2012-10-08 22:15:39</date><authorinitials>localhost</authorinitials><revremark>converted to 1.6 markup</revremark></revision><revision><revnumber>9</revnumber><date>2006-10-30 02:28:34</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>8</revnumber><date>2006-10-30 02:22:39</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>7</revnumber><date>2006-10-30 02:15:02</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>6</revnumber><date>2006-10-30 02:07:35</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>5</revnumber><date>2006-10-30 02:07:27</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>4</revnumber><date>2006-10-30 02:07:19</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>3</revnumber><date>2006-10-30 02:07:06</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>2</revnumber><date>2006-10-30 02:06:45</date><authorinitials>DanONeill</authorinitials></revision><revision><revnumber>1</revnumber><date>2006-10-30 02:04:33</date><authorinitials>DanONeill</authorinitials></revision></revhistory></articleinfo><section><title>How is MP3 built?</title><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para>Most people with a little knowledge of MP3 files know that the sound is divided into smaller parts and compressed using a psychoacoustic model. These smaller pieces of audio are then put into units called 'frames', each of which is a small data block with a header. 
I'll focus on that header in this text.</para><para>The header is 4 bytes (32 bits) long and begins with the sync field. According to the MPEG standard the sync consists of 12 set bits in a row; some later add-on extensions (such as the unofficial MPEG-2.5) use 11 set bits followed by one cleared bit instead. The sync is directly followed by the ID bit, which indicates whether the file is MPEG-1 or MPEG-2: 0 = MPEG-2 and 1 = MPEG-1. </para></entry><entry colsep="1" rowsep="1"><para> <inlinemediaobject><imageobject><imagedata fileref="http://id3.org/mp3Frame?action=AttachFile&amp;do=get&amp;target=mp3frame_blocks.gif"/></imageobject><textobject><phrase>mp3frame_blocks.gif</phrase></textobject></inlinemediaobject> </para></entry></row></tbody></tgroup></informaltable><para>The layer is given by the two layer bits, which are (somewhat oddly) defined as follows: </para><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Layer value </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Layer </emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 </para></entry><entry colsep="1" rowsep="1"><para> Not defined </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>0 1 </para></entry><entry colsep="1" rowsep="1"><para> Layer III </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>1 0 </para></entry><entry colsep="1" rowsep="1"><para> Layer II </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>1 1 </para></entry><entry colsep="1" rowsep="1"><para> Layer I </para></entry></row></tbody></tgroup></informaltable><para>With this information and the value of the bitrate field, the bitrate of the audio (in kbit/s) can be determined from the following table. 
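</para><para>The lookup described by the table below can be sketched in Python. This is a minimal sketch, not a reference implementation: the names decode_bitrate, LAYERS and BITRATES_KBPS are hypothetical, the header is assumed to be available as its 4 raw big-endian bytes, only the 12-bit MPEG-1/MPEG-2 sync is accepted (MPEG-2.5 is not handled), and the bitrate values are taken from the ISO/IEC 11172-3 and 13818-3 tables.</para><programlisting><![CDATA[
```python
from typing import Dict, List, Optional, Tuple

# Illustrative sketch, not a library API: the names below are hypothetical.
# Layer bits -> layer number; 0b00 is not defined.
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}

# Bitrates in kbit/s, keyed by (MPEG version, layer) and indexed by the
# 4-bit bitrate field. Index 0 ("free format") and 15 (not allowed) map to None.
BITRATES_KBPS: Dict[Tuple[int, int], List[Optional[int]]] = {
    (1, 1): [None, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, None],
    (1, 2): [None, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, None],
    (1, 3): [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    (2, 1): [None, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256, None],
    (2, 2): [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
    (2, 3): [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}


def decode_bitrate(header: bytes) -> Tuple[int, int, Optional[int]]:
    """Return (MPEG version, layer, bitrate in kbit/s) for a 4-byte header."""
    if len(header) != 4:
        raise ValueError("an MPEG audio frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")  # header bytes are big-endian

    if word >> 20 != 0xFFF:  # 12 set sync bits (MPEG-2.5 is not handled)
        raise ValueError("frame sync not found")

    version = 1 if (word >> 19) & 0x1 else 2  # ID bit: 1 = MPEG-1, 0 = MPEG-2
    layer_bits = (word >> 17) & 0x3
    if layer_bits not in LAYERS:
        raise ValueError("layer bits 00 are not defined")
    layer = LAYERS[layer_bits]

    bitrate_index = (word >> 12) & 0xF
    return version, layer, BITRATES_KBPS[(version, layer)][bitrate_index]


print(decode_bitrate(bytes([0xFF, 0xFB, 0x90, 0x64])))  # (1, 3, 128)
```
]]></programlisting><para>For the frequently seen header bytes FF FB 90 64 this returns (1, 3, 128), i.e. MPEG-1 Layer III at 128 kbit/s. Free-format and invalid bitrate values come back as None, leaving it to the caller to decide how to treat them.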
</para><informaltable><tgroup cols="7"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><colspec colname="col_3"/><colspec colname="col_4"/><colspec colname="col_5"/><colspec colname="col_6"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Bitrate value </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-1, layer I </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-1, layer II </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-1, layer III </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-2, layer I </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-2, layer II </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> MPEG-2, layer III </emphasis> </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 0 0 </para></entry><entry align="center" colsep="1" nameend="col_6" namest="col_1" rowsep="1"><para> free format </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 0 1 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry><entry colsep="1" rowsep="1"><para> 8 </para></entry><entry colsep="1" rowsep="1"><para> 8 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 1 0 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry><entry colsep="1" rowsep="1"><para> 48 </para></entry><entry colsep="1" rowsep="1"><para> 40 </para></entry><entry colsep="1" rowsep="1"><para> 48 </para></entry><entry colsep="1" rowsep="1"><para> 16 </para></entry><entry colsep="1" rowsep="1"><para> 16 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 1 1 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry><entry colsep="1" rowsep="1"><para> 56 </para></entry><entry colsep="1" rowsep="1"><para> 48 </para></entry><entry colsep="1" rowsep="1"><para> 56 </para></entry><entry colsep="1" rowsep="1"><para> 24 </para></entry><entry colsep="1" rowsep="1"><para> 24 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 0 0 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry><entry colsep="1" rowsep="1"><para> 56 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry><entry colsep="1" rowsep="1"><para> 32 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 0 1 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry><entry colsep="1" rowsep="1"><para> 80 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry><entry colsep="1" rowsep="1"><para> 80 </para></entry><entry colsep="1" rowsep="1"><para> 40 </para></entry><entry colsep="1" rowsep="1"><para> 40 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 1 0 </para></entry><entry colsep="1" rowsep="1"><para> 192 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry><entry colsep="1" rowsep="1"><para> 80 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry><entry colsep="1" rowsep="1"><para> 48 </para></entry><entry colsep="1" rowsep="1"><para> 48 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 1 1 </para></entry><entry colsep="1" rowsep="1"><para> 224 </para></entry><entry colsep="1" rowsep="1"><para> 112 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry><entry colsep="1" rowsep="1"><para> 112 </para></entry><entry colsep="1" rowsep="1"><para> 56 </para></entry><entry colsep="1" rowsep="1"><para> 56 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 0 0 </para></entry><entry colsep="1" rowsep="1"><para> 256 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry><entry colsep="1" rowsep="1"><para> 112 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry><entry colsep="1" rowsep="1"><para> 64 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 0 1 </para></entry><entry colsep="1" rowsep="1"><para> 288 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry><entry colsep="1" rowsep="1"><para> 144 </para></entry><entry colsep="1" rowsep="1"><para> 80 </para></entry><entry colsep="1" rowsep="1"><para> 80 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 1 0 </para></entry><entry colsep="1" rowsep="1"><para> 320 </para></entry><entry colsep="1" rowsep="1"><para> 192 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry><entry colsep="1" rowsep="1"><para> 96 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 1 1 </para></entry><entry colsep="1" rowsep="1"><para> 352 </para></entry><entry colsep="1" rowsep="1"><para> 224 </para></entry><entry colsep="1" rowsep="1"><para> 192 </para></entry><entry colsep="1" rowsep="1"><para> 176 </para></entry><entry colsep="1" rowsep="1"><para> 112 </para></entry><entry colsep="1" rowsep="1"><para> 112 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 0 0 </para></entry><entry colsep="1" rowsep="1"><para> 384 </para></entry><entry colsep="1" rowsep="1"><para> 256 </para></entry><entry colsep="1" rowsep="1"><para> 224 </para></entry><entry colsep="1" rowsep="1"><para> 192 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry><entry colsep="1" rowsep="1"><para> 128 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 0 1 </para></entry><entry colsep="1" rowsep="1"><para> 416 </para></entry><entry colsep="1" rowsep="1"><para> 320 </para></entry><entry colsep="1" rowsep="1"><para> 256 </para></entry><entry colsep="1" rowsep="1"><para> 224 </para></entry><entry colsep="1" rowsep="1"><para> 144 </para></entry><entry colsep="1" rowsep="1"><para> 144 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 1 0 </para></entry><entry colsep="1" rowsep="1"><para> 448 </para></entry><entry colsep="1" rowsep="1"><para> 384 </para></entry><entry colsep="1" rowsep="1"><para> 320 </para></entry><entry colsep="1" rowsep="1"><para> 256 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry><entry colsep="1" rowsep="1"><para> 160 </para></entry></row>
<row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 1 1 
</para></entry><entry align="center" colsep="1" nameend="col_6" namest="col_1" rowsep="1"><para> bad (not allowed) </para></entry></row></tbody></tgroup></informaltable><para>The sample rate is given by the frequency field. Its values depend on which MPEG standard is used, according to the following table. </para><informaltable><tgroup cols="3"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Frequency value</emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong">MPEG-1</emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong">MPEG-2</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>0 0 </para></entry><entry colsep="1" rowsep="1"><para>44100 Hz </para></entry><entry colsep="1" rowsep="1"><para>22050 Hz </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>0 1 </para></entry><entry colsep="1" rowsep="1"><para>48000 Hz </para></entry><entry colsep="1" rowsep="1"><para>24000 Hz </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>1 0 </para></entry><entry colsep="1" rowsep="1"><para>32000 Hz </para></entry><entry colsep="1" rowsep="1"><para>16000 Hz </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>1 1 </para></entry><entry align="center" colsep="1" nameend="col_2" namest="col_1" rowsep="1"><para>reserved </para></entry></row></tbody></tgroup></informaltable><para>Three bits are not needed in the decoding process at all: the copyright bit, the original/home bit and the private bit. The copyright bit has the same meaning as the copyright bit on CDs and DAT tapes, i.e. if it is set, copying the contents is not permitted. The original/home bit indicates, if set, that the frame is located on its original media. The private bit has no meaning defined by the standard and is left free for application-specific use. </para><para>If the protection bit is cleared (0), the frame header is followed by a 16-bit CRC checksum, inserted before the audio data. If the padding bit is set, the frame is padded with one extra slot (one byte for Layer II and Layer III, four bytes for Layer I). Knowing this, the size of the complete frame in bytes, header included, can be calculated with the following formula, where BitRate is in bit/s and SampleRate in Hz: </para><screen><![CDATA[                 FrameSize = 144 * BitRate / SampleRate
                   when the padding bit is cleared and
                 FrameSize = (144 * BitRate / SampleRate) + 1
                   when the padding bit is set.]]></screen><para>FrameSize is of course an integer; the division is truncated. For example, if BitRate = 128000, SampleRate = 44100 and the padding bit is cleared, then FrameSize = 144 * 128000 / 44100 = 417.96..., which is truncated to 417 bytes. This formula holds for MPEG-1 Layer II and Layer III (and MPEG-2 Layer II). Layer I frames are built from 4-byte slots, giving FrameSize = (12 * BitRate / SampleRate + Padding) * 4 with Padding being 0 or 1, while MPEG-2 Layer III uses 72 instead of 144. </para><para>The mode field tells which sort of stereo/mono encoding has been used. The mode extension field is only used in joint stereo mode, where it indicates which joint-stereo coding is applied; its exact meaning differs between layers (in Layer III it switches intensity stereo and M/S stereo on or off). </para><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Mode value </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Mode </emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 </para></entry><entry colsep="1" rowsep="1"><para> Stereo </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 </para></entry><entry colsep="1" rowsep="1"><para> Joint stereo </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 </para></entry><entry colsep="1" rowsep="1"><para> Dual channel </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 </para></entry><entry colsep="1" rowsep="1"><para> Mono </para></entry></row></tbody></tgroup></informaltable><para>The last field is the emphasis field. It tells the decoder to 're-equalize' the sound after a Dolby-like noise suppression. It is rarely used, and that is unlikely to change. 
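</para><para>With the emphasis field, every field of the header has been described. A minimal sketch in Python of unpacking the fields that follow the bitrate index, and of applying the frame-size formula, might look as follows. The function and table names are hypothetical; the bit positions follow the header layout pictured at the top of this page, and the Layer I slot formula and the MPEG-2 Layer III factor of 72 are assumptions taken from the MPEG standard rather than from the 144 formula alone.</para><programlisting><![CDATA[
```python
from typing import Dict, List, Optional

# Illustrative sketch, not a library API: all names here are hypothetical.
# Sample rates in Hz indexed by the 2-bit frequency field (index 3 is reserved).
SAMPLE_RATES_HZ: Dict[int, List[Optional[int]]] = {
    1: [44100, 48000, 32000, None],  # MPEG-1 (ID bit = 1)
    2: [22050, 24000, 16000, None],  # MPEG-2 (ID bit = 0)
}

# Channel modes indexed by the 2-bit mode field.
MODES = ["Stereo", "Joint stereo", "Dual channel", "Mono"]


def decode_low_fields(header: bytes) -> Dict[str, object]:
    """Unpack the fields that follow the bitrate index in a 4-byte header."""
    word = int.from_bytes(header, "big")
    version = 1 if (word >> 19) & 0x1 else 2  # ID bit: 1 = MPEG-1, 0 = MPEG-2
    return {
        "sample_rate": SAMPLE_RATES_HZ[version][(word >> 10) & 0x3],
        "padding": (word >> 9) & 0x1,
        "private": (word >> 8) & 0x1,
        "mode": MODES[(word >> 6) & 0x3],
        "mode_extension": (word >> 4) & 0x3,
        "copyright": (word >> 3) & 0x1,
        "original": (word >> 2) & 0x1,
        "emphasis": word & 0x3,  # raw 2-bit emphasis value
    }


def frame_size(version: int, layer: int, bitrate_bps: int,
               sample_rate: int, padding: int) -> int:
    """Frame length in bytes, header included; divisions are truncated."""
    if layer == 1:
        # Layer I frames consist of 4-byte slots; padding adds one slot.
        return (12 * bitrate_bps // sample_rate + padding) * 4
    # 144 for MPEG-1 Layer II/III and MPEG-2 Layer II, 72 for MPEG-2 Layer III.
    factor = 72 if (version == 2 and layer == 3) else 144
    return factor * bitrate_bps // sample_rate + padding


fields = decode_low_fields(bytes([0xFF, 0xFB, 0x90, 0x64]))
print(fields["sample_rate"], fields["mode"])  # 44100 Joint stereo
print(frame_size(1, 3, 128000, fields["sample_rate"], fields["padding"]))  # 417
```
]]></programlisting><para>For the header FF FB 90 64 this reports 44100 Hz and joint stereo, and a frame size of 417 bytes, which matches the worked example for 128 kbit/s at 44100 Hz.</para><para>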
The emphasis values are defined as follows: </para><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Emphasis value </emphasis> </para></entry><entry colsep="1" rowsep="1"><para><emphasis role="strong"> Emphasis method </emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 0 </para></entry><entry colsep="1" rowsep="1"><para> none </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 0 1 </para></entry><entry colsep="1" rowsep="1"><para> 50/15 µs </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 0 </para></entry><entry colsep="1" rowsep="1"><para> reserved </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 1 1 </para></entry><entry colsep="1" rowsep="1"><para> CCITT J.17 </para></entry></row></tbody></tgroup></informaltable></section></article>