Arthur M. Noxon
Acoustic Sciences Corporation
Eugene, Oregon USA
Presented at the 85th AES Convention
Los Angeles, CA, November 1988
The Modulation Transfer Function (MTF) is used in room acoustics as a descriptor of the effectiveness of transmission along the signal path between the speaker and the listener. A major application has been speech intelligibility. The basis for MTF analysis is the signal to noise ratio. Noise can be any sound masking effect: steady state noise, the transient noise of reverberation, or apparent noise due to adjacent octave sound levels.
Narrow band MTF is used in the present work. This is in contrast to the octave band methods common to traditional speech intelligibility. Here, pure tone modulation is used to develop spectral response detail. A rapidly gated, slow sine sweep is the test signal for the articulation response curve. This technique allows blurred transmission bands to be specifically identified. These narrow ranges of poor articulation are both audible to the listener and visible in hard copy data. Changes to the room acoustic are also easily documented. The responsiveness of this test to room acoustics, together with the fine grain spectral information in the articulation response curve, suggests that this system be used as a diagnostic tool. Although originally developed to demonstrate small room acoustics in the lower registers, it has found use in the full range of room sizes, from the amphitheater and auditorium right through to the recording studio vocal booth.
I. Articulation Response Curve (ARC)
The Modulation Transfer Function (MTF) is used in room acoustics as the descriptor of effective signal transmission between speaker and listener. A popular application of the MTF is speech intelligibility. Here we look at an application of MTF developed for precision playback environments such as the hi-end, HiFi listening room and the recording studio. The standard Speech Transmission Index (STI) approach falls short on numerous points in these smaller spaces, which have high musical articulation requirements.
The spectrum segment useful for STI prediction or measurement starts at the 125 Hz band, and each octave band is weighted for significance in speech recognition. Music extends two octaves below the range used for STI work; half the keyboard lies below Middle C (roughly 262 Hz). The weighting of these and other octaves in a calculation is not yet established. The musical spectrum and the relative significance of each octave band may well not be the same as for speech. The Music Transmission Index (MTI) may be convertible to STI, but the converse may not be possible. This would be due to the relative lack of full bandwidth information in the STI. Clearly, research remains to be done in this area.
The STI joins the group of single index acoustic descriptors, such as NRC, dBA, IIC, RT60, et al. Architectural specifications can be satisfied with a single index indicator. Acoustical engineers and consultants engaged in diagnosis and remedy have always required spectral detail, and the subject of intelligibility is no different.
Measured STI only needs the signal to noise ratio to be detected. Tracking octave band decay rates is one method used and monitoring modulated octave band noise levels is another. Both use selected octave bandwidths and yield a single intelligibility rating. The approach contributes little to the diagnosis of room acoustics. The present technique provides narrow band spectral articulation information. This facilitates diagnostic efforts and evaluation of STC.
The predictive side of MTF analysis requires the ability to accurately estimate the signal to noise ratio. The noise level is due to the reflections in the room and due to its reverberation. Predictive methods that use room reverberation decay rates have the prerequisite imposed that the room sound field is instantaneously diffuse and has an exponential decay rate.
A non-linear method of predicting noise levels is to use ray tracing of the first 30 reflections. This method better correlates with measured STI. Complex room geometry limits this method. Neither linear acoustics nor ray tracing can be used for predicting in small rooms dominated by room resonant mode decays.
The musical line is characterized as a rapid staccato of complex tone bursts. Music then is a set of musical lines, overlaid and intertwining one another. The basic element of this woven fabric of music is the tone burst. The acoustic descriptor that relates to musical articulation may well be the tone burst, indeed a rapid staccato of bursts. Such a signal has been used for harmonic distortion analysis of the room acoustic transmission path. Here we only desire measurement of the signal envelope and the faithfulness of its modulated transmission. Wave form reproduction, although important, is not the issue addressed.
A synthesis of these constraints and requirements is embodied in the present approach to MTF. The Articulation Response Curve (ARC) is a relatively simple, direct physical measurement. Equally important is the subjective aspect. The auditor in a precision listening setting can play the test signal over headphones and hear the rapid, clean staccato of tone bursts whose frequency is slowly varied. The auditor expects the room acoustic to play this signal accurately. By removing the headphones and listening to the same signal in the playback room, defects in the transmission path become quite audible. In a small room, articulation varies dramatically with frequency. Typically, there are tenth-octave bands of totally garbled transmission adjoining similar sized bands of quite intelligible transmission. The Articulation Response Curve is a fine-grained quantification of the “fast tracking” ability of a listening room.
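The test signal described above, a rapidly gated, slowly swept sine, can be sketched in a few lines. This is only an illustration of the idea: the function name, the sweep limits, the gate rate, the 50% duty cycle and the sample rate are all assumptions chosen for the example and are not taken from the paper.

```python
import math

def gated_sweep(f_start=20.0, f_stop=800.0, duration=1.0,
                gate_rate=8.0, sample_rate=8000):
    """Return samples of a logarithmic sine sweep gated on and off.

    Every default value here is a hypothetical placeholder, not a
    parameter of the measurement system described in the paper.
    """
    n_samples = int(duration * sample_rate)
    ratio = f_stop / f_start
    # Phase of a log sweep: the time integral of 2*pi*f(t),
    # with f(t) = f_start * ratio**(t / duration).
    k = 2.0 * math.pi * f_start * duration / math.log(ratio)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        gate_on = (t * gate_rate) % 1.0 < 0.5   # assumed 50% duty cycle
        phase = k * (ratio ** (t / duration) - 1.0)
        samples.append(math.sin(phase) if gate_on else 0.0)
    return samples

signal = gated_sweep()
```

Played over headphones, such a signal is a clean staccato of bursts; the room's contribution shows up as sound filling the silent half of each gate cycle.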
II. Comparison With Tradition
A. Definition of Standard Terms
1. Signal Intensity (I)
Standard MTF format assumes the sound intensity envelope is a modulated cosine with a DC offset.
The mean signal intensity (Io) is modulated by the modulation amplitude (mIo).
2. Modulation Index (m)
The modulation index is defined as the ratio of the intensity of the modulation to the mean intensity of the signal, modulation plus noise.
It is also expressed in terms of the signal level Is = mIo and the noise level IN = Io – mIo.
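The two definitions above can be checked numerically. The sketch below assumes the usual MTF envelope form I(t) = Io(1 + m cos 2πFt), where F is the modulation frequency; the symbol F and the function names are assumptions, since the paper's own equations are not reproduced in this text.

```python
import math

def intensity_envelope(t, i_mean, m, mod_freq):
    """Assumed envelope: I(t) = Io * (1 + m*cos(2*pi*F*t))."""
    return i_mean * (1.0 + m * math.cos(2.0 * math.pi * mod_freq * t))

def modulation_index(i_signal, i_noise):
    """m = Is / (Is + IN), which follows from Is = m*Io and IN = Io - m*Io."""
    return i_signal / (i_signal + i_noise)

i_mean, m = 1.0, 0.6            # illustrative values only
i_signal = m * i_mean           # Is = m*Io
i_noise = i_mean - m * i_mean   # IN = Io - m*Io
recovered = modulation_index(i_signal, i_noise)   # equals m again
```

Note that Is + IN = Io, so the modulation index is simply the signal's share of the mean intensity.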
3. Modulation Transfer (MT)
This is the attenuation in dB of the modulated signal. It is a function of the modulation index.
MT = 20 log m
4. Signal to Noise Ratio (SNR)
The signal to noise ratio is the level difference between the signal and the noise (LS/N).
It can also be expressed in terms of the modulation index.
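With Is = mIo and IN = Io – mIo, the intensity ratio Is/IN reduces to m/(1 – m), so both MT and SNR can be computed from the modulation index alone. The following is a minimal sketch under that reading; the function names are hypothetical.

```python
import math

def modulation_transfer_db(m):
    """MT = 20 log m, as stated in the text (valid for 0 < m <= 1)."""
    return 20.0 * math.log10(m)

def snr_db(m):
    """SNR = 10 log(Is/IN) = 10 log(m / (1 - m)), valid for 0 < m < 1."""
    return 10.0 * math.log10(m / (1.0 - m))

# At m = 0.5 signal and noise intensities are equal, so the SNR is 0 dB.
snr_half = snr_db(0.5)
mt_half = modulation_transfer_db(0.5)   # about -6 dB
```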
5. Transmission Index (TI)
The transmission index is the SNR measured in dB and expressed in percent. To do this the SNR is offset to a practical zero % level and then proportioned to the range of effective SNR. These are subjectively determined constants that relate the perceived threshold of modulation to the maximum value of modulation.
The offset is 12 dB and the range is 30 dB.
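One way to read the offset and range is TI = 100 × (SNR + 12) / 30 percent. The sketch below follows that reading using the constants quoted in the text; clipping the result to the 0 to 100 percent interval is an added assumption, as is the function name.

```python
def transmission_index(snr_db, offset_db=12.0, range_db=30.0):
    """Map an SNR in dB onto a 0-100 % scale.

    offset_db and range_db default to the 12 dB and 30 dB quoted in the
    text. Clipping to [0, 100] is an assumption of this sketch.
    """
    ti = 100.0 * (snr_db + offset_db) / range_db
    return max(0.0, min(100.0, ti))

ti_mid = transmission_index(3.0)   # 15 dB above the zero point, half the range
```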
6. Speech Transmission Index (STI)
This is compiled as the sum of the weighted TI for each of the 7 octave bands and expressed in percent.
The weighting factors (WK) normalize to 1.
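The weighted sum can be sketched as below. The equal weights and the seven TI values are placeholders made up for illustration; they are not the published STI octave weights or measured data.

```python
def speech_transmission_index(ti_percent, weights):
    """STI = sum of W_k * TI_k, with the weights normalized to sum to 1."""
    if len(ti_percent) != len(weights):
        raise ValueError("need one weight per octave band")
    total = sum(weights)
    return sum(w / total * ti for w, ti in zip(weights, ti_percent))

# Hypothetical TI values (percent) for the 7 octave bands.
ti_bands = [40.0, 50.0, 60.0, 70.0, 60.0, 50.0, 40.0]
equal_weights = [1.0] * 7        # placeholder, not the real W_k
sti = speech_transmission_index(ti_bands, equal_weights)
```

Normalizing inside the function means any set of relative weights can be passed without first scaling them to sum to 1.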
7. Octave Masking Effect (mO)
This occurs when the lower octave is louder than the measured one. 0.3% of the lower octave intensity is considered noise acting on the test signal.
The impact of simultaneous independent masking effects is carried by multiplying their independent modulation indices together.
m = m1 × m2
B. MTF In Presently Measured Terms
1. Signal Modulation Level (La)
Separated signal and noise levels are not directly measured in the present test method. The articulation response curve is the timewise evolution of the received sound levels. This easily allows measurement of La, the fluctuation in dB of the test signal.
2. Modulation Index m(La)
The modulation level (La) can be expressed in terms of modulation index by rewriting its definition.
Upon rearrangement, the modulation index is resolved solely in terms of measured level fluctuation (La).
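The rearrangement can be reconstructed under one assumption: that La is the peak-to-trough level swing of the modulated cosine envelope, so that La = 10 log[(1 + m)/(1 – m)]. Solving for m gives m = (10^(La/10) – 1)/(10^(La/10) + 1). The paper's own equations are not reproduced in this text, so the sketch below should be read as a plausible reconstruction rather than the author's exact expression.

```python
import math

def fluctuation_from_m(m):
    """La = 10 log[(1 + m)/(1 - m)]: assumed peak-to-trough swing in dB."""
    return 10.0 * math.log10((1.0 + m) / (1.0 - m))

def m_from_fluctuation(la_db):
    """Inverse relation: m = (x - 1)/(x + 1), with x = 10**(La/10)."""
    x = 10.0 ** (la_db / 10.0)
    return (x - 1.0) / (x + 1.0)

m_10db = m_from_fluctuation(10.0)   # a 10 dB swing gives m = 9/11
```

A fluctuation of 0 dB corresponds to m = 0 (no modulation survives), and m approaches 1 as the swing grows without bound.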
3. Modulation Transfer (MT)
The reduction in modulation can be related to the modulation level at the receiver La.
4. Signal to Noise Ratio (SNR)
The signal to noise ratio is developed by using the new expression of the modulation index.
5. Transmission Index (TI)
The transmission index is unchanged, except that its SNR term is now the modified expression above.
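Chaining the steps gives a direct path from a measured fluctuation to a transmission index. If La is taken as the peak-to-trough swing of the cosine envelope (an assumption of this sketch), then m/(1 – m) simplifies to (10^(La/10) – 1)/2 and the SNR follows in one step. The function names and the clipping to 0 to 100 percent are likewise assumptions.

```python
import math

def snr_from_fluctuation(la_db):
    """SNR = 10 log[(10**(La/10) - 1) / 2], under the assumed La definition."""
    if la_db <= 0.0:
        raise ValueError("La must be positive for a finite SNR")
    return 10.0 * math.log10((10.0 ** (la_db / 10.0) - 1.0) / 2.0)

def ti_from_fluctuation(la_db, offset_db=12.0, range_db=30.0):
    """TI in percent, using the 12 dB offset and 30 dB range from the text."""
    ti = 100.0 * (snr_from_fluctuation(la_db) + offset_db) / range_db
    return max(0.0, min(100.0, ti))   # clipping is an assumption

snr_10db = snr_from_fluctuation(10.0)   # about 6.5 dB
ti_10db = ti_from_fluctuation(10.0)     # about 62 %
```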
6. Mean Transmission Index (TI)
The concept is to sum the various TI values, much as is done with the STI. Data collected here is not from octave bands but from small bandwidths of tones having similar modulation levels.
The STI octave band weighting factor (WK) here is undefined. It will be carried in the form of (Wi) to suggest that a listener based preference fit option still remains open.
The octave bandwidth weighting factor in STI appears here as a “log frequency” term in the averaged TI.
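One plausible form of this average weights each narrow band by its width on a log frequency axis and by the still-undefined preference factor Wi. The formula, the function name and the sample bands below are assumptions for illustration; the paper's own expression is not reproduced in this text.

```python
import math

def mean_transmission_index(bands):
    """Log-frequency weighted mean TI.

    bands: list of (f_low_hz, f_high_hz, ti_percent, w_i) tuples.
    Each band counts in proportion to w_i * log(f_high / f_low).
    """
    numerator = 0.0
    denominator = 0.0
    for f_low, f_high, ti, w in bands:
        log_width = math.log(f_high / f_low)   # bandwidth in log frequency
        numerator += w * log_width * ti
        denominator += w * log_width
    return numerator / denominator

# Hypothetical data: two one-octave bands, Wi left at 1.
example = [(40.0, 80.0, 0.0, 1.0), (80.0, 160.0, 100.0, 1.0)]
mean_ti = mean_transmission_index(example)   # equal log widths, so 50 %
```

Setting every Wi to 1 leaves a pure log-frequency average, which keeps the listener based preference fit open as the text suggests.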
7. Octave Masking Index (mo)
This effect is left out of the current presentation. However, it should be thoroughly investigated and ultimately included. It is clearly operative in small room acoustics. It is easy to find bandwidths with low level articulation and low mean sound level that are just up-frequency from a loud and strongly fluctuating signal.
The mean intensity follows from the mean sound level (L).
Assume, for example, equal octave fraction bandwidths for a low frequency 75 dB level followed by a weaker 65 dB level.
This single level shift is of small consequence but cumulative effects can occur due to a very rough response curve loaded with room resonances. Only 4 such 10 dB shifts would produce a 90% masking index.
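The arithmetic behind this example can be sketched as follows. It assumes that 0.3% of the lower band's intensity acts as noise on the test band, so that mo = I / (I + 0.003 × I_lower), and that successive masking indices multiply as in m = m1 x m2. The function name is hypothetical.

```python
def octave_masking_index(level_lower_db, level_test_db, leak=0.003):
    """mo = I_test / (I_test + leak * I_lower), from levels in dB.

    leak = 0.003 is the 0.3 % figure quoted in the text.
    """
    intensity_ratio = 10.0 ** ((level_lower_db - level_test_db) / 10.0)
    return 1.0 / (1.0 + leak * intensity_ratio)

single_shift = octave_masking_index(75.0, 65.0)   # about 0.97
four_shifts = single_shift ** 4                   # about 0.89
```

A single 10 dB drop costs only about 3% of the modulation, while four in succession bring the combined index to roughly 0.89, in line with the 90% figure quoted above.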