The next question is: how do we create binaural audio? Now that we know how localisation cues work and how they can be represented, we can surmise that those cues must somehow be present in the sound that is delivered to us. There are two ways to do this: recording binaural audio and synthesising it.
Recording
To record binaural audio, we need to preserve the localisation cues that human hearing relies on. Since these are mainly caused by our physical form, the easiest way to achieve this is to place a human, or a roughly human-shaped object, in the room and record the sound at the ear canals: the ears are blocked and microphones are placed right at the canal entrances. Because something human-shaped interacts with the sound waves and creates the needed cues, the microphones capture the sound much as we would hear it. The technique using human subjects is called the Blocked Meatus method [1].
Human subjects are often replaced with dummy heads or mannequins for convenience. This is often called the Dummy method [6].
The downsides of this approach are that:
- The recording is static, meaning that the perspective cannot move, and
- If the audio is recorded with a dummy, or with a person who is not the listener, it corresponds to the hearing of whoever was recorded rather than exactly to the hearing of the listener.
The first issue can be remedied by the Motion-Tracked Binaural (MTB) method [7]. This technique uses many microphones distributed around the horizontal plane of a head-shaped surface, capturing audio from many potential perspectives. On playback, the listener’s head is tracked and, for a given head orientation, the sound recorded by the pair of microphones that corresponds to that orientation is played back.
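The microphone selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method as specified in [7]: the function name `mtb_select_pair` is hypothetical, the microphones are assumed to be evenly spaced around the circle with microphone 0 facing forward and indices increasing towards the listener's left, and yaw is assumed to be positive when the head turns left. It simply picks the microphone nearest to each ear; a practical system may instead blend neighbouring microphones to avoid audible jumps.

```python
def mtb_select_pair(yaw_deg: float, n_mics: int) -> tuple[int, int]:
    """Return (left_index, right_index) of the microphones nearest the ears.

    Assumed layout (not taken from the source): n_mics microphones evenly
    spaced on a horizontal circle, microphone 0 at 0 degrees (straight ahead),
    indices increasing counter-clockwise, i.e. towards the listener's left.
    yaw_deg is the tracked head rotation, positive when turning left.
    """
    spacing = 360.0 / n_mics

    # With the head facing yaw_deg, the ears sit 90 degrees to either side.
    left_angle = (yaw_deg + 90.0) % 360.0
    right_angle = (yaw_deg - 90.0) % 360.0

    # Nearest-microphone selection; the final modulo wraps an angle that
    # rounds up to a full turn back to microphone 0.
    left = round(left_angle / spacing) % n_mics
    right = round(right_angle / spacing) % n_mics
    return left, right
```

For example, with 8 microphones and the head facing forward, `mtb_select_pair(0.0, 8)` selects microphones 2 and 6, which sit at 90 and 270 degrees. As the tracked yaw changes, the selected pair moves around the ring, which is what lets the perspective follow the listener's head.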
The second issue is much harder to solve with recorded sounds. In theory, the solution would be to record the sounds on each individual listener using the Blocked Meatus method, but this is both expensive and highly impractical. Another option is to personalise the Head-Related Transfer Functions used for binaural synthesis.