Assessing Audio Quality Using the MATT Test

The MATT test was developed by Art Noxon, PE (Acoustical), the founder and president of ASC and inventor of the TubeTrap, to identify audio quality issues so that you can get the best possible sound from the components in your listening environment.

Article featured in AudioXpress by Richard Honeycutt

We humans have been paying attention to optimizing sound in rooms for a long time. The ancient Greeks and Romans took steps to optimize speech intelligibility when building theaters. But the earliest era in which specific attention was paid to good room design for music performance was probably the Baroque (J. S. Bach and his contemporaries).

Modern acoustical engineering started with the work of Wallace C. Sabine on Harvard University’s Fogg Lecture Hall. Sabine quickly recognized that the wretched lack of speech intelligibility resulted from excessive reverberation, so he developed methods for measuring and controlling reverberation. Thus, reverberation measurement and control became the first technological tool in the modern acoustician’s toolbox. Reverberation affects speech intelligibility and musical articulation, meaning that halls designed for sustained Gregorian chant muddled the sound of highly articulated Baroque or Classical music. Sabine later applied his “reverberation tool” as the acoustic designer for Boston’s Symphony Hall, where he also recognized the need for sound to be evenly distributed throughout the seating area.

Early:Late Energy Ratio

In large rooms such as concert halls and theaters, the only significant frequency-dependent acoustical effect is the variation of acoustical absorption with frequency. There is no reason for anything like a “frequency response” measurement in these rooms. But in the latter part of the 20th century, other acoustical parameters were identified that affect music and speech articulation as much as, or more than, reverberation does. These fall into the category of “early:late” energy ratios.

Figure 1 shows the impulse response (IR) of a somewhat unusual listening room. This IR shows the decay of acoustical energy over time. An example of an early:late ratio is “Clarity index” or “C80.” C80 is 10 times the base-10 logarithm of the ratio of the energy arriving in the first 80 ms after the onset of an impulsive sound to the energy arriving after 80 ms. Clarity is specified in decibels, and the acceptable range is about +1 to -4 dB, although the best concert halls have C80 between about -1 and -4 dB.


Figure 1: The impulse response of a listening room shows the room’s effect on the sound in the time domain.

A related early:late energy ratio is “Definition” or “D50,” which is the ratio of the energy in the first 50 ms to the total energy. Definition is specified as a percentage and correlates more closely with speech intelligibility than with music articulation; C80 is more applicable in evaluating a room for music.
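To make the two definitions concrete, here is a minimal sketch of how C80 and D50 might be computed from a sampled impulse response. It is an illustration under stated assumptions, not a measurement procedure from the article: the function names (early_late_metrics, synthetic_ir), the -20 dB onset threshold, and the synthetic test signal (exponentially decaying noise standing in for a real room) are all hypothetical choices made for this example.

```python
import numpy as np


def early_late_metrics(ir, fs, onset_threshold_db=-20.0):
    """Return (C80 in dB, D50 in percent) for impulse response `ir` sampled at `fs` Hz."""
    energy = np.asarray(ir, dtype=float) ** 2

    # Assumption: the onset is the first sample whose energy is within
    # `onset_threshold_db` of the peak. Real measurements may need a more
    # careful onset detector.
    threshold = energy.max() * 10.0 ** (onset_threshold_db / 10.0)
    onset = int(np.argmax(energy >= threshold))  # index of first True
    energy = energy[onset:]

    n50 = int(round(0.050 * fs))  # samples in the first 50 ms
    n80 = int(round(0.080 * fs))  # samples in the first 80 ms

    # C80: 10 * log10(energy in first 80 ms / energy after 80 ms)
    c80 = 10.0 * np.log10(energy[:n80].sum() / energy[n80:].sum())
    # D50: energy in first 50 ms as a percentage of total energy
    d50 = 100.0 * energy[:n50].sum() / energy.sum()
    return c80, d50


def synthetic_ir(rt60, fs=48000, seconds=3.0, seed=0):
    """Hypothetical test signal: noise whose level falls 60 dB in `rt60` seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(seconds * fs)) / fs
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude drops by 1000x (60 dB) at t = rt60
    return rng.standard_normal(t.size) * decay


if __name__ == "__main__":
    for rt60 in (0.5, 1.5):
        c80, d50 = early_late_metrics(synthetic_ir(rt60), fs=48000)
        print(f"RT60 = {rt60} s: C80 = {c80:.1f} dB, D50 = {d50:.0f}%")
```

With this synthetic decay, the shorter reverberation time should yield the higher C80 and D50, since more of the energy arrives early. An impulse response shorter than 80 ms has no late energy, so the sketch is only meaningful for longer recordings.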

Speech Intelligibility

After the introduction of early:late energy ratios to the evaluation of auditorium acoustics, quantitative evaluation of speech intelligibility became an area of great interest. The first speech intelligibility metric in common use was called “percentage Articulation Loss of Consonants” or “%Alcons” and was introduced by V. M. A. Peutz in 1971. The basis of determining intelligibility was the ability of listeners to correctly distinguish consonants pronounced by a talker in the room. A standardized list of test syllables was used. Peutz developed a method of predicting %Alcons from the reverberation time (RT) and background noise level in the room.

Tammo Houtgast, Herman Steeneken, et al., introduced a method of specifying speech intelligibility using the Modulation Transfer Function (MTF), which quantified the effect upon speech articulation produced by any defect in the electroacoustical and/or acoustical path between the talker and the listener.

The metric is called the Speech Transmission Index (STI). The underlying theory is that speech can be considered to originate as a continuous wave created by the vocal folds, and then modulated by the throat, lips, and tongue. The “intelligence” or information is carried by the modulation, and the depth of the modulation correlates with speech intelligibility. Reverberation, background noise, and other acoustical or electroacoustical effects can decrease the modulation depth of the acoustic wave reaching the listener’s ears, hurting speech intelligibility.

Figure 2: Notice the spaces between sound bursts in this anechoic recording of speech.

Figure 3: Notice how the reverberation in this 1.5-second room fills in the spaces between sound bursts, decreasing modulation depth and STI.

Note that neither of these specifications of speech intelligibility addresses the naturalness of speech: Both exclusively target the articulation of speech. The sound of a letter “t,” when stretched by reverberation, masked by noise, or electronically distorted, is difficult to distinguish from the sound of a “p” or a “d.” Figure 2 and Figure 3 show an anechoically recorded speech signal and that same signal played in a room having a 1.5-second RT, respectively. Notice how the reverberation closes up the spaces between sound bursts (i.e., decreases the depth of modulation). Speech in the reverberant room would have a lower STI, indicating poorer speech intelligibility.
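The loss of modulation depth can be estimated numerically. The sketch below uses the commonly cited closed-form modulation reduction factor for an ideal exponential decay plus steady background noise, m(F) = 1 / sqrt(1 + (2πF·RT/13.8)²) × 1 / (1 + 10^(-SNR/10)), which is not given in this article and is brought in here as an assumption. The conversion to an index is deliberately simplified: it treats the room as a single band with the same RT and signal-to-noise ratio everywhere, and it omits the octave-band weighting, masking, and threshold corrections of the standardized STI procedure. The function names are hypothetical, and the result should be read as a rough indicator rather than a true STI.

```python
import numpy as np

# The 14 one-third-octave modulation frequencies (Hz) used in STI work.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def modulation_transfer(rt60, snr_db=np.inf, f_mod=MOD_FREQS):
    """Modulation reduction factor m(F), assuming ideal exponential decay and steady noise."""
    f_mod = np.asarray(f_mod, dtype=float)
    reverb = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))  # equals 1.0 when there is no noise
    return reverb * noise


def simplified_sti(rt60, snr_db=np.inf):
    """Single-band transmission index (a simplification, not the full standardized STI)."""
    m = np.clip(modulation_transfer(rt60, snr_db), 1e-6, 1.0 - 1e-6)
    apparent_snr = 10.0 * np.log10(m / (1.0 - m))   # apparent signal-to-noise ratio, dB
    apparent_snr = np.clip(apparent_snr, -15.0, 15.0)
    return float(np.mean((apparent_snr + 15.0) / 30.0))  # map -15..+15 dB onto 0..1


if __name__ == "__main__":
    for rt60 in (0.0, 0.5, 1.5):
        print(f"RT = {rt60} s: simplified index = {simplified_sti(rt60):.2f}")
```

Under these assumptions the anechoic case (RT of zero, no noise) keeps full modulation depth, while the 1.5-second room loses the most depth at the higher modulation frequencies, which is the filling-in of the gaps between sound bursts seen in Figure 3.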
