#pragma section-numbers on
Informal standard<<BR>>
Document: id3v2-accessibility-1.0-draft3<<BR>>
C. Newell<<BR>>
22 August 2006
= ID3v2 Accessibility Addendum =
== Status of this document ==
Third draft - this document is a proposed addendum to the [[id3v2.3.0|ID3v2.3]] and [[id3v2.4.0-frames|ID3v2.4]] standards. Distribution of this document is unlimited.
== Abstract ==
This document describes extensions which make ID3v2 metadata accessible to the visually impaired. The approach may also be useful for audio players which have limited display capabilities. A new frame type is proposed that carries an audio clip which can provide a verbal expression of the textual information carried by another ID3v2 frame.
= Conventions in this document =
Text within "" is a text string exactly as it appears in a tag. Numbers preceded with $ are hexadecimal and numbers preceded with % are binary. $xx is used to indicate a byte with unknown content. %x is used to indicate a bit with unknown content. The most significant bit (MSB) of a byte is called 'bit 7' and the least significant bit (LSB) is called 'bit 0'.
A tag is the whole tag described in the ID3v2 main structure document [2]. A frame is a block of information in the tag. The tag consists of a header, frames and optional padding. A field is a piece of information; one value, a string, etc. A numeric string is a string that consists of the characters "0123456789" only.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [[#rfc2119|RFC 2119]].
= Introduction =
The ID3v2 standards provide a way to deliver metadata that is predominantly human-readable, textual data. However, in this form the information is not easily accessible to the visually impaired.
The purpose of this Addendum is to allow content providers or third-party tools to provide an audio description (i.e. a spoken narrative) that is equivalent to the textual information carried by an ID3v2 frame. A new "audio-text" frame is defined which carries an audio clip and a matching equivalent text string. These text strings can be compared against the strings carried by other ID3v2 frames to identify when a matching audio description is available.
The audio clips can be played whenever the equivalent textual information is displayed or highlighted, providing a greatly improved user interface for the visually impaired. However, the feature may also be popular with other users and useful for media players with limited display capabilities.
= Proposed audio-text frame =
The purpose of this frame is to carry a short audio clip which represents the information carried by another ID3v2 frame that is present in the same tag.
To avoid these audio clips being confused with the main audio content of the file, the ID3v2 unsynchronisation scheme must be used if the audio clip uses an MPEG audio format. If the unsynchronisation scheme is not appropriate for the audio format then the scrambling scheme defined in section 5 must be applied to the audio clip data.
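As a non-normative illustration, the following minimal sketch (in Python; the function name is illustrative only) shows the unsynchronisation step applied to the audio clip data: a zero byte is inserted after every $FF that is followed by %111xxxxx (a false synchronisation) or by $00, as defined by the ID3v2 main structure documents.
{{{
def unsynchronise(data: bytes) -> bytes:
    """Insert $00 after every $FF that is followed by %111xxxxx or $00."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data) and (data[i + 1] >= 0xE0 or data[i + 1] == 0x00):
            out.append(0x00)  # break up the false synchronisation
    return bytes(out)
}}}
Because extra bytes may be inserted, the frame (or tag) size must be recalculated after unsynchronisation. The layout of the audio-text frame is as follows: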
{{{
<Header for 'Audio text', ID: "ATXT">
Text encoding     $xx
MIME type         <text string> $00
Flags             %0000000a
Equivalent text   <text string according to encoding> $00 (00)
Audio data        <binary data>
}}}
The Frame ID for the audio-text frame shall be set to "ATXT" using ISO-8859-1 character encoding.
The MIME type shall be represented as a terminated string encoded using ISO-8859-1 character encoding. Where the MIME type corresponds to MPEG 1/2 Layer I, II or III, MPEG 2.5 or AAC audio, the ID3v2 unsynchronisation scheme should be applied, either to the audio-text frame or to the tag which contains it. For other MIME types the scrambling scheme defined in section 5 should be applied to the audio data.
Flag a - Scrambling flag:: This flag shall be set if the scrambling method defined in section 5 has been applied to the audio data, or not set if no scrambling has been applied.

The Equivalent text field carries a null-terminated string encoded according to the Text encoding byte as defined by the ID3v2 specifications [1], [2]. This text must be semantically equivalent to the spoken narrative in the audio clip and should match the text and encoding used by another ID3v2 frame in the tag.

The Audio data field carries an audio clip which provides the audio description. The encoding of the audio data shall match the MIME type field and the data shall be scrambled if the scrambling flag is set.

More than one audio-text frame may be present in a tag, but each must carry a unique string in the Equivalent text field.
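As a further non-normative illustration, the sketch below (Python; the function name and ENCODINGS table are purely illustrative, and encodings $02 and $03 apply to ID3v2.4 only) assembles the body of an audio-text frame from the fields described above. The audio data is assumed to have already been unsynchronised or scrambled as required for its MIME type.
{{{
ENCODINGS = {0: 'iso-8859-1', 1: 'utf-16', 2: 'utf-16-be', 3: 'utf-8'}

def build_atxt_body(text_encoding: int, mime_type: str,
                    equivalent_text: str, audio_data: bytes,
                    scrambled: bool = False) -> bytes:
    """Assemble the body of an ATXT frame (the ID3v2 frame header is not included)."""
    terminator = b'\x00\x00' if text_encoding in (1, 2) else b'\x00'  # UTF-16 uses a double null
    body = bytearray()
    body.append(text_encoding)                                    # Text encoding    $xx
    body += mime_type.encode('iso-8859-1') + b'\x00'              # MIME type, $00 terminated
    body.append(0x01 if scrambled else 0x00)                      # Flags            %0000000a
    body += equivalent_text.encode(ENCODINGS[text_encoding]) + terminator  # Equivalent text
    body += audio_data                                            # Audio data
    return bytes(body)
}}}
A player can then compare the Equivalent text string against the strings carried by other frames in the tag to decide which audio clip to play.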
= Scrambling scheme for non-MPEG audio formats =
This scrambling scheme is provided for non-MPEG audio formats where the unsynchronisation scheme defined by the ID3v2 specifications is unsuitable. Each bit of the audio data is scrambled by taking the exclusive-OR (XOR) between it and the equivalent bit of a pseudo-random byte sequence. The first byte of this pseudo-random byte sequence is always %11111110 and is used to scramble the first byte of the audio data. The next byte of the sequence is derived from the current byte of the sequence using the algorithm in Table 1 and is used to scramble the next byte of audio data. This process is repeated until all bytes in the audio clip have been scrambled.
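As an illustrative sketch only (Python; the function names are not part of this specification), the scrambling step can be expressed as follows. The derivation of each successive sequence byte is defined by Table 1 below and is represented here as a caller-supplied next_byte function.
{{{
def scramble(audio_data: bytes, next_byte) -> bytes:
    """XOR each audio byte with the matching byte of the pseudo-random sequence."""
    out = bytearray()
    seq = 0xFE  # first byte of the sequence is always %11111110
    for byte in audio_data:
        out.append(byte ^ seq)
        seq = next_byte(seq)  # derive the next sequence byte as specified in Table 1
    return bytes(out)
}}}
Because XOR with the same byte sequence is its own inverse, applying the same routine a second time descrambles the audio data.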
||||'''Table 1: Scrambling sequence algorithm'''||
||