The best masking sound is one that sounds just like the sound it is supposed to mask, except that it is a time- and phase-scrambled version of the original. The worst masking sound is one that sounds nothing like the sound to be masked. How loud would a hiss have to be to mask the staccato tonal presence of a rapidly plucked bass guitar? Probably 40 dB louder than the guitar. If the guitar is being played at 50 dBA and a steam pipe hiss kicks on at about 90 dBA, then maybe, just maybe, most of the guitar sound would be drowned out. But if the reverberant sound of the guitar itself were used, along with a wild set of time-delayed attack transients mixed back in, we could mask the guitar with a masking sound no louder than the guitar itself. Post masking is the psychoacoustic process in which a direct sound is quickly followed by a masking type of sound. In this case the head end ringing is post masking the direct signal.
Let’s look at the statistical version of music. There are 8 separate sound bursts per second, which means that, in a perfect world, each burst lasts 1/16 second and is followed by 1/16 second of silence. The head end ringing energy from the leading edge of an attack transient begins arriving at the listener’s position 1/12 second after the leading edge is heard, which is slightly after the 1/16 second tone burst turns off. That scrambled version of the leading edge then proceeds to fill in the silent 1/16 second that follows the burst, and it is still arriving at just about the time the next attack transient begins to arrive.
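The timing in this idealized burst sequence can be checked with a few lines of arithmetic. The following is a minimal sketch, not a measurement: the function and variable names are made up for illustration, and the only inputs are the figures quoted above (8 bursts per second, half of each period sounding, ringing arriving 1/12 second after the leading edge).

```python
from fractions import Fraction

# Figures taken from the text; the names are illustrative assumptions.
BURSTS_PER_SECOND = 8
RINGING_DELAY = Fraction(1, 12)  # seconds after the leading edge


def burst_timing(bursts_per_second=BURSTS_PER_SECOND, ringing_delay=RINGING_DELAY):
    """Return the key time intervals (in seconds) of the idealized sequence."""
    period = Fraction(1, bursts_per_second)  # start of one burst to the next
    burst_on = period / 2                    # the burst sounds for half the period
    silence = period - burst_on              # and is silent for the other half
    return {
        "period": period,
        "burst_on": burst_on,
        "silence": silence,
        # How far into the silent gap the ringing starts to arrive.
        "ringing_after_burst_end": ringing_delay - burst_on,
        # How much of the gap is left for the ringing to fill before the next attack.
        "ringing_before_next_attack": period - ringing_delay,
    }


def to_ms(seconds):
    """Convert a Fraction of a second to floating-point milliseconds."""
    return float(seconds * 1000)


timing = burst_timing()
for name, value in timing.items():
    print(f"{name}: {value} s ({to_ms(value):.1f} ms)")
```

With these figures the ringing starts 1/48 second (about 21 ms) into the silent gap and has the remaining 1/24 second (about 42 ms) to fill before the next attack transient begins.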
There is another aspect of attack transients we need to take a look at. It’s about listening to music and understanding what we are hearing. Each sound of music can be outlined by the ADSR pattern: attack, decay, sustain and release. When people in general listen to music, they listen to the sequence of sustains. But when audiophiles and recording engineers listen, people heavily invested in the sound of what they are hearing, they learn, naturally or through training, to focus on and hear the attack transient. The truth of the sound is in the attack transient, not in the sustain.
This is borne out by psychoacoustic testing. The fundamental and upper partials of an attack transient define the tonal coloration of an instrument. Yet some instruments can have exactly the same set of overtones and sound the same during the sustain, but still sound different when the attack transient is included. Tests have been done in which the upper partials of an instrument are electronically time delayed, changing the phase of the upper partials relative to the fundamental. When the changes are made during the sustain, there is only slight recognition of them. However, if the changes are made before each note is struck, so that they include the attack transient, the relative shifts in upper partial waveform timing are readily noticed. It was only when the phase alignment of the upper partials was included in the attack transient that synthesized musical notes began to sound real.
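To make "changing the relative phase of the upper partials" concrete, here is a minimal sketch that builds one period of a steady tone twice: once with its partials phase aligned and once with the upper partials shifted. The amplitudes, phase offsets and function names are arbitrary assumptions chosen for illustration and do not come from the tests described above. The sketch only shows that the strength of each partial stays the same while the waveform shape changes. It says nothing about how audible the change is.

```python
import math


def synthesize(amps, phases, n_samples=256):
    """One period of a tone made of harmonic partials (index 0 is the fundamental)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * n / n_samples + p)
            for k, (a, p) in enumerate(zip(amps, phases)))
        for n in range(n_samples)
    ]


def harmonic_magnitude(wave, harmonic):
    """Amplitude of one harmonic, found by projecting onto a cosine and a sine."""
    n_samples = len(wave)
    re = sum(x * math.cos(2 * math.pi * harmonic * n / n_samples)
             for n, x in enumerate(wave))
    im = sum(x * math.sin(2 * math.pi * harmonic * n / n_samples)
             for n, x in enumerate(wave))
    return 2 * math.hypot(re, im) / n_samples


# Hypothetical partial amplitudes: fundamental plus two upper partials.
amps = [1.0, 0.5, 0.33]
aligned = synthesize(amps, [0.0, 0.0, 0.0])
shifted = synthesize(amps, [0.0, math.pi / 2, math.pi])  # upper partials delayed

for h in (1, 2, 3):
    print(f"harmonic {h}: aligned {harmonic_magnitude(aligned, h):.3f}, "
          f"shifted {harmonic_magnitude(shifted, h):.3f}")

peak_aligned = max(abs(x) for x in aligned)
peak_shifted = max(abs(x) for x in shifted)
print(f"peak level: aligned {peak_aligned:.3f}, shifted {peak_shifted:.3f}")
```

The per-harmonic amplitudes come out identical for both versions, while the peak level of the waveform differs, which is the kind of change a phase shift of the upper partials makes to the shape of the signal.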
If we are listening to a rapid sequence of tone bursts, we want to hear the fine structure of the leading edge of the attack transient. This is where the accuracy or irregularity of the upper partial harmonic structure of the musical tone is best perceived. The musicality-revealing fine structure of the attack transient can be masked, obscured by excessive early reverberation: head end ringing. Electronically, we might be revealing the top 15 dB of the attack transient as it rises up out of the background din of ongoing sounds that are part of the music. However, due to head end ringing, we might be able to sonically reveal only the top 4 dB of the attack transient.
The real program material is delivering 15 dB of dynamic range, but with head end ringing uncontrolled, the dynamic range is reduced to only 4 dB. Music suffers from a lack of dynamics because of the masking effect of head end ringing. The music sounds as if it were compressed with a limiter. Secondly, we have lost the ability to hear the lower 11 dB of the attack transient. We’ve lost it because lingering sound from head end ringing has backfilled the short period of electronic silence that actually is in the original music track. Not being able to hear the low level detail in the attack transient limits our ability to perceive upper partial musical detail.
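The decibel bookkeeping in this example is simple enough to spell out. The sketch below is a hedged illustration using only the 15 dB and 4 dB figures from the text. The function name is hypothetical, and reading the lost range as a plain rise in the masking floor is a simplifying assumption, not a measured result.

```python
def masked_dynamic_range(electronic_db, audible_db):
    """Return (lost_db, floor_power_ratio) for an attack transient.

    electronic_db: height of the attack above the background din in the signal.
    audible_db:    height of the attack still audible above the ringing.
    """
    if audible_db > electronic_db:
        raise ValueError("audible range cannot exceed the electronic range")
    lost_db = electronic_db - audible_db
    # Assumption: the lost range equals the rise of the masking floor,
    # expressed here as a power ratio relative to the original din.
    floor_power_ratio = 10 ** (lost_db / 10)
    return lost_db, floor_power_ratio


lost_db, floor_power_ratio = masked_dynamic_range(15.0, 4.0)
print(f"lost dynamic range: {lost_db:.0f} dB")
print(f"masking floor power relative to the original din: {floor_power_ratio:.1f}x")
```

Under that assumption, losing 11 dB of the attack transient corresponds to a masking floor carrying roughly 12.6 times the power of the background din that the attack originally rose out of.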