The ID3v2 standards provide a way to deliver metadata, most of which is human-readable text. In this form, however, the information is not easily accessible to the visually impaired.

The purpose of this Addendum is to allow content providers or third-party tools to supply an audio description (i.e., a spoken narrative) that is equivalent to the textual information carried by an ID3v2 frame. The Addendum defines a new "audio-text" frame that carries an audio clip together with its equivalent text string. This text string can be compared against the strings carried by other ID3v2 frames to identify when a matching audio description is available.
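
The matching step described above can be illustrated with a minimal sketch. It assumes the tag has already been parsed into plain Python values and that strings are compared for exact equality; the `AudioText` class, the `find_audio_descriptions` function, and the sample frame contents are hypothetical names chosen for illustration and are not defined by this Addendum.

```python
from dataclasses import dataclass


@dataclass
class AudioText:
    """Hypothetical in-memory view of one parsed "audio-text" frame."""
    equivalent_text: str  # text string the audio clip is equivalent to
    audio: bytes          # the audio clip itself (format not modelled here)


def find_audio_descriptions(text_frames, audio_text_frames):
    """Map each text frame ID to the audio-text frame with the same string.

    text_frames: dict of frame ID -> text carried by that frame.
    audio_text_frames: iterable of AudioText objects from the same tag.
    Assumption: a match means exact string equality.
    """
    # Index the audio-text frames by their equivalent text.
    # If two frames carry the same text, the first one wins.
    by_text = {}
    for frame in audio_text_frames:
        by_text.setdefault(frame.equivalent_text, frame)

    # Keep only the text frames for which a matching clip exists.
    return {
        frame_id: by_text[text]
        for frame_id, text in text_frames.items()
        if text in by_text
    }


if __name__ == "__main__":
    # Illustrative data: a title frame and an artist frame, with a
    # spoken description available for the title only.
    text_frames = {"TIT2": "Blue Train", "TPE1": "John Coltrane"}
    clips = [AudioText("Blue Train", b"\x00\x01")]
    matches = find_audio_descriptions(text_frames, clips)
    print(sorted(matches))
```

Indexing the clips by their equivalent text keeps the lookup to one dictionary access per text frame; a real player might need a different comparison rule (for example, to account for text encoding), which this sketch does not attempt.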

The audio clips can be played whenever the equivalent textual information is displayed or highlighted, making the user interface considerably more accessible to the visually impaired. The feature may also appeal to other users and be useful for media players with limited display capabilities.