"

4 Head-Related Transfer Functions

Though the combination of localisation cues is quite complicated, we can fortunately measure and represent them as a single filter called the Head-Related Transfer Function (HRTF). An HRTF characterises how our ears receive sound from a particular point in space. It is important to note that the HRTF is in the frequency domain; its time-domain equivalent (its inverse Fourier Transform) is the Head-Related Impulse Response (HRIR) [1][5][6].
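As a small illustration of that relationship, the sketch below (in Python with NumPy, using made-up 256-sample data in place of a real measurement) moves between the two representations with the Fourier Transform and its inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 256-sample HRIR (random data in place of a real measurement).
hrir = rng.standard_normal(256) * np.hanning(256)

# The HRTF is the Fourier Transform of the HRIR ...
hrtf = np.fft.rfft(hrir)

# ... and the HRIR is recovered with the inverse transform.
hrir_back = np.fft.irfft(hrtf, n=len(hrir))
assert np.allclose(hrir, hrir_back)
```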

The HRTF is usually measured by placing microphones at the entrance of each ear canal and recording an impulse played by a loudspeaker at a particular location. The recording is normally done in an anechoic environment, though a very large room may also be sufficient, as long as the room reflections do not interfere with the direct sound and can be removed from the measurement [1]. The HRIR is extracted from this recording, and the HRTF is found by taking the Fourier Transform of the HRIR. We do this because filtering a signal with an HRTF is much simpler: if we take the signal's Fourier Transform, we only have to multiply it with the HRTF, whereas if we were to use the HRIR, we would have to convolve the signal with it, which is much more laborious. For this reason, we try to work in the frequency domain as much as possible. Additionally, we must account for the effect of the loudspeaker's and microphone's characteristics to get an accurate HRTF. This can be represented as follows:

Y(ω) = X(ω)S(ω)M(ω)H(ω)

where Y is the recorded signal, X is the test signal, S is the transfer function of the loudspeaker and amplifier, M is the transfer function of the microphone and pre-amplifier, and H is the HRTF. We just need to isolate the HRTF like so:

H(ω) = Y(ω) / (X(ω)S(ω)M(ω))
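A rough sketch of that workflow, assuming NumPy and stand-in signals in place of the actual test sweep and calibration measurements, might carry out the division above like this (the small eps term is only a guard against dividing by near-zero bins; in practice a regularised inverse is used so that spectral nulls do not blow up the estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft = 2048

# Stand-in time-domain signals; in practice these would be the known test
# signal (x), calibration impulse responses of the loudspeaker/amplifier (s)
# and microphone/pre-amplifier (m), and the recording made at the ear (y).
x = rng.standard_normal(1024)
s = rng.standard_normal(64) * np.hanning(64)
m = rng.standard_normal(32) * np.hanning(32)
h_true = rng.standard_normal(128) * np.hanning(128)   # "true" HRIR (unknown in practice)
y = np.convolve(np.convolve(np.convolve(x, s), m), h_true)

# Move everything to the frequency domain ...
Y = np.fft.rfft(y, n=n_fft)
X = np.fft.rfft(x, n=n_fft)
S = np.fft.rfft(s, n=n_fft)
M = np.fft.rfft(m, n=n_fft)

# ... and isolate the HRTF: H(ω) = Y(ω) / (X(ω)S(ω)M(ω)).
eps = 1e-12
H = Y / (X * S * M + eps)

# Back to the time domain if the HRIR is needed for convolution later.
hrir = np.fft.irfft(H)[:len(h_true)]
```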

Listening to a signal filtered by the individual HRTFs, we experience the sound as if we were hearing that signal in the position and space where the HRTF was measured. Because most HRTFs are measured in an anechoic environment, audio effects such as reverberation are added to the sound to give a more accurate representation of the chosen virtual space. Still, unless the listener's own HRTFs were measured, the signal will not sound exactly as the listener would hear it in reality. This can lead to incorrect localisation, issues with externalisation, and confusion between sounds positioned in front of or behind the listener. The research is still ongoing, but studies looking to personalise HRTFs [8][9][10] have had promising results.

Synthesis

As is common in VR, there are often sounds you would like to feature that do not occur in the real world and thus cannot be recorded binaurally. In this case, you have to synthesise these sounds while also making them binaural. This can be done by multiplying the Fourier Transform of the desired signal with the correct HRTF pair (or convolving the signal with the HRIR pair) [6]. The result is a static sound that will seem to originate from the point in space where that particular HRTF was measured.
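A minimal sketch of that idea is shown below, with stand-in HRIRs in place of a measured pair (real ones would typically be loaded from an HRTF database) and a plain sine tone as the synthesised source:

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
t = np.arange(fs) / fs
source = 0.3 * np.sin(2 * np.pi * 440 * t)       # mono synthesised sound: 1 s, 440 Hz

# Hypothetical measured HRIR pair for one direction (stand-in random data).
rng = np.random.default_rng(1)
hrir_left = rng.standard_normal(256) * np.hanning(256)
hrir_right = rng.standard_normal(256) * np.hanning(256)

# Convolving the mono signal with each ear's HRIR gives the binaural pair;
# the sound will appear fixed at the direction the HRIRs were measured at.
left = fftconvolve(source, hrir_left)
right = fftconvolve(source, hrir_right)
binaural = np.stack([left, right], axis=-1)       # samples x 2 channels, ready for headphones
```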

This still places a limitation on both the sound and the listener. Since only one HRTF filter was used, the sound can only appear to be located in that one spot. It does not take into account the movement of the listener and assumes they are static. Because headphones follow the listener's head movement, if the listener turns their head to the side, the sound appears to follow in the same arc, seeming to originate from the same position relative to the listener's head.

To account for head movement or the listener's change in location, a head tracker must be used. For a dynamic listener, the head tracker will frequently check and update the position of the listener [1][6]. The HRTF filters must then be switched according to the listener's position so that the rendering responds in real time.
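One way this update loop might look, under the simplifying assumptions of a hypothetical 15° grid of measured HRIR pairs and only head yaw being tracked, is sketched below; each new tracker reading selects the measured pair nearest to the source direction relative to the head:

```python
import numpy as np

# Hypothetical set of measured HRIR pairs, one per azimuth on a 15° grid.
rng = np.random.default_rng(2)
measured_azimuths = np.arange(0, 360, 15)
hrir_pairs = {az: rng.standard_normal((2, 256)) for az in measured_azimuths}

def select_hrir(source_azimuth, head_yaw):
    """Pick the measured HRIR pair closest to the source direction
    relative to the listener's current head orientation."""
    relative = (source_azimuth - head_yaw) % 360
    nearest = min(measured_azimuths,
                  key=lambda az: abs((az - relative + 180) % 360 - 180))
    return hrir_pairs[nearest]

# Each time the head tracker reports a new yaw, the filters are swapped so
# the source stays fixed in the virtual room rather than following the head.
current_pair = select_hrir(source_azimuth=90, head_yaw=30)
```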

Similarly, the HRTFs must continually be updated if the sound source is dynamic. Furthermore, HRTFs can be interpolated between the discrete points of previously measured HRTFs so as to give the sound a smooth trajectory. Interpolation can also be used to approximate a new source location for which we do not have a previously measured HRTF; a simple sketch of such interpolation follows below. For a dynamic listener and a dynamic sound source together, both processes must work in tandem, updating the HRTFs based on the location of the listener and the trajectory of the moving sound.
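The sketch below shows one of the simplest possible schemes, a linear blend of two neighbouring time-domain HRIRs (stand-in data here); it is only illustrative, since naive time-domain blending smears the interaural delay, and practical systems often align delays or interpolate magnitude and phase separately:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical HRIRs measured at 30° and 45° azimuth (stand-in random data).
hrir_30 = rng.standard_normal(256) * np.hanning(256)
hrir_45 = rng.standard_normal(256) * np.hanning(256)

def interpolate_hrir(h_a, h_b, az_a, az_b, az_target):
    """Linearly blend two neighbouring HRIRs to approximate an
    unmeasured direction lying between them."""
    w = (az_target - az_a) / (az_b - az_a)
    return (1.0 - w) * h_a + w * h_b

# Approximate the response for 37°, a direction that was never measured.
hrir_37 = interpolate_hrir(hrir_30, hrir_45, 30.0, 45.0, 37.0)
```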
