Ohmic Audio

🔰 BEGINNER LEVEL: How Humans Perceive 3D Sound

Spatial audio is the technology that allows us to place a sound "anywhere" in the room, even if there isn't a speaker exactly in that spot. To do this, we have to trick the human brain using the same clues it uses in the real world.

1. The "Big Three" Clues

Your brain is a supercomputer that constantly analyzes sound to find out where it's coming from. It uses three main clues:

  • Timing (ITD): a sound on your left reaches your left ear a fraction of a millisecond before your right ear.
  • Loudness (ILD): your head casts an "acoustic shadow," so the ear facing the sound hears it slightly louder.
  • Tone (pinna filtering): the folds of your outer ear subtly change a sound's tone depending on whether it comes from above, below, in front, or behind.

2. The "Acoustic Mirage"

In a spatial audio system, we use many speakers to recreate these three clues simultaneously. By carefully adjusting the timing and volume of each speaker, we can create an "acoustic mirage"—making you believe a helicopter is flying over your head, even though the speakers are only in the car doors and dashboard.

3. Why Stereo Isn't Enough

Standard stereo only lets you move sound from Left to Right. Spatial audio (like Dolby Atmos) adds "Height" and "Depth," turning the flat line of stereo into a full 3D bubble. This is especially important in cars, where the seats are fixed and listeners are always "off-center."

4. The "Sweet Spot" Challenge

In a living room, you can sit right in the middle of the speakers. In a car, you are always too close to one speaker and too far from others. Spatial audio technology fixes this by using a computer to delay the sound from the closest speakers, making it feel like you are sitting in the perfect center of the music.

Key Takeaways for Beginners:

  • Goal: To create a 360-degree "sound bubble."
  • Method: Tricking the brain with tiny timing and volume changes.
  • Benefit: A much more realistic and immersive experience.

🔧 INSTALLER LEVEL: Deploying Immersive Systems

For an installer, spatial audio is all about Placement, Alignment, and Reflection Management. A 1-inch error in speaker placement can ruin the 3D effect.

1. The Geometry of the Sweet Spot

In a spatial system, the "Sweet Spot" is the point where all sound waves arrive at the correct time. In a car, we usually tune for the driver's head position. This requires using a laser measurer to find the exact distance from the headrest to every single speaker (Tweeters, Mids, Heights, and Surrounds).

2. Height Channel Integration

True 3D audio requires height channels. Installers have three main options:

Method                 | Implementation                            | Acoustic Result
Discrete Overhead      | Speakers in the headliner.                | The best 3D effect; requires major interior work.
Reflection (Up-firing) | Speakers on the dash angled at the glass. | Easier to install; depends heavily on windshield angle.
Virtual (HRTF)         | DSP processing on door speakers.          | No extra speakers; only works for one seat.

3. Managing Windshield Reflections

The windshield is an "acoustic mirror." If you place a center channel or height channel near the glass, the sound will bounce off the glass and hit the listener slightly after the direct sound. This causes Comb Filtering (a hollow, "tinny" sound). Installers must use a DSP with high-resolution EQ to notch out these reflection frequencies.
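As an illustration of the numbers involved, this sketch computes where the cancellation notches land, assuming a speed of sound of 343 m/s and a hypothetical 0.1 m extra path length for the reflected sound:

```python
C = 343.0  # speed of sound in m/s (assumed)

def notch_frequencies(extra_path_m, f_max=20000.0):
    """Comb-filter notches occur where the reflection arrives half a cycle
    late: f = (2k + 1) / (2 * dt), with dt the reflection's extra travel time."""
    dt = extra_path_m / C
    freqs = []
    k = 0
    while (f := (2 * k + 1) / (2 * dt)) <= f_max:
        freqs.append(f)
        k += 1
    return freqs

# A 0.1 m longer reflected path puts the first notch near 1.7 kHz,
# with further notches at odd multiples of that frequency.
print(notch_frequencies(0.1)[:3])
```

Note that the notches are evenly spaced in frequency, which is why a high-resolution EQ is needed to treat them individually.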

Installer Insight: When setting up a spatial system, always start by muting everything except the Center channel and the Height channels. If the "voice" doesn't sound like it's coming from the middle of the windshield at eye level, your timing is wrong. Adjust the delay in 0.02ms increments until the image "snaps" into place.

4. Calibration Workflow

  1. Physical Alignment: Level and aim all speakers toward the listener.
  2. Time Alignment: Use an impulse response (IR) measurement to sync all speakers to T=0.
  3. Level Matching: Ensure all channels hit the same SPL (usually 75dB) at the listener's ear.
  4. Spatial Verification: Use a "Circling Pink Noise" track to ensure the sound moves smoothly around the cabin.
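Step 2 of the workflow can be sketched numerically: delay every speaker so its sound arrives together with the farthest one (distances are hypothetical example values; 343 m/s assumed for the speed of sound):

```python
C = 343.0  # speed of sound in m/s (assumed)

def alignment_delays_ms(distances_m):
    """Added delay per speaker so all arrivals coincide with the farthest one."""
    farthest = max(distances_m.values())
    return {name: (farthest - dist) / C * 1000.0
            for name, dist in distances_m.items()}

# Example distances from the driver's headrest, in meters (hypothetical).
delays = alignment_delays_ms({
    "left_tweeter": 0.90,
    "right_tweeter": 1.40,
    "left_mid": 1.05,
    "right_mid": 1.50,
})
```

The farthest speaker gets zero added delay; the closest one gets the most, which is what "making it feel like you are sitting in the center" amounts to in practice.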

5. Phase Coherence Checklist

  1. Polarity: Confirm every driver moves outward on a positive pulse (use a polarity tester).
  2. Crossover Summation: Sweep through each crossover region; a dip at the crossover frequency means the two drivers are cancelling.
  3. Left/Right Symmetry: Play correlated pink noise; the image should sit dead-center, not smeared or pulled to one side.

⚙️ ENGINEER LEVEL: Mathematical Rendering and Field Synthesis

Spatial rendering is the mathematical process of mapping a sound object at coordinate P(x, y, z) to an array of N speakers with fixed positions.

1. Vector Base Amplitude Panning (VBAP)

VBAP is used for 3D speaker layouts. It treats the speakers as vectors on a unit sphere. For any desired sound direction p, we select the three closest speakers (forming a triangle) and calculate their gains g.

Source vector: p = [p_x, p_y, p_z]^T

Speaker matrix: L = [l_1, l_2, l_3]

Gain solution: g = L^-1 p

To maintain constant power, the gains are normalized: g_i,norm = g_i / sqrt(Σ_k g_k^2). If the source moves outside the triangle, the algorithm must "cross-fade" to the next triplet of speakers.
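A minimal numerical sketch of the gain solution, using NumPy; the speaker directions below are hypothetical unit vectors:

```python
import numpy as np

def vbap_gains(p, l1, l2, l3):
    """Solve g = L^-1 p for the active speaker triplet, then normalize
    the gains for constant power."""
    L = np.column_stack([l1, l2, l3])   # speaker matrix
    g = np.linalg.solve(L, p)           # raw gains
    return g / np.linalg.norm(g)        # constant-power normalization

# Source straight ahead; speakers at front-left, front-right, and overhead.
p  = np.array([1.0, 0.0, 0.0])
l1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # front-left
l2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)  # front-right
l3 = np.array([0.0, 0.0, 1.0])                # overhead
g = vbap_gains(p, l1, l2, l3)
```

For this symmetric case the front pair shares the signal equally and the overhead speaker stays silent, which matches intuition for a source in the horizontal plane.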

2. Higher-Order Ambisonics (HOA)

Ambisonics is "scene-based" audio. Instead of speakers, we describe the sound field as a sum of Spherical Harmonics (Y_nm). The pressure at any point (θ, φ) is:

p(t, θ, φ) = Σ_{n=0}^{N} Σ_{m=-n}^{n} b_nm(t) Y_nm(θ, φ)

The first nine Spherical Harmonic functions (up to 2nd order) are:

Order (n) | Degree (m) | Name | Function Y_nm(θ, φ)
0         | 0          | W    | 1 / sqrt(4π)
1         | -1         | Y    | sqrt(3/4π) · sin θ sin φ
1         | 0          | Z    | sqrt(3/4π) · cos θ
1         | 1          | X    | sqrt(3/4π) · sin θ cos φ
2         | -2         | V    | sqrt(15/16π) · sin² θ sin 2φ
2         | -1         | T    | sqrt(15/4π) · sin θ cos θ sin φ
2         | 0          | R    | sqrt(5/16π) · (3 cos² θ - 1)
2         | 1          | S    | sqrt(15/4π) · sin θ cos θ cos φ
2         | 2          | U    | sqrt(15/16π) · sin² θ cos 2φ
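As a worked example, the first-order functions (W, X, Y, Z) from the table can be used directly to encode a mono source direction into B-Format gains. This sketch assumes θ is measured from the zenith and φ is the azimuth, matching the table above:

```python
import math

def encode_foa(theta, phi):
    """First-order Ambisonic encoding gains (W, X, Y, Z) from the table."""
    k = math.sqrt(3.0 / (4.0 * math.pi))
    w = 1.0 / math.sqrt(4.0 * math.pi)
    x = k * math.sin(theta) * math.cos(phi)
    y = k * math.sin(theta) * math.sin(phi)
    z = k * math.cos(theta)
    return w, x, y, z

# Source on the horizon, straight ahead: all directional energy lands in X.
w, x, y, z = encode_foa(math.pi / 2, 0.0)
```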

3. Binaural Synthesis and HRTFs

For headphone or near-field rendering, we convolve the source signal S(t) with the Head-Related Impulse Response (HRIR) for the target angle:

Left Ear: L(t) = S(t) * h_L(θ, φ, t)

Right Ear: R(t) = S(t) * h_R(θ, φ, t)

In the frequency domain: L(f) = S(f) · H_L(f, θ, φ). The Head-Related Transfer Function (HRTF) includes the ITD, ILD, and pinna filtering effects.
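A sketch of the convolution with toy HRIRs: here a delayed, attenuated impulse stands in for the far ear to mimic the ITD and ILD of a source on the left, whereas a real system would load measured HRIRs from a database:

```python
import numpy as np

fs = 48000
itd_samples = 30                # ~0.6 ms ITD: right ear hears the sound later
h_left = np.zeros(64)
h_left[0] = 1.0                 # near ear: early and loud
h_right = np.zeros(64)
h_right[itd_samples] = 0.5      # far ear: late and quiet (toy ILD)

source = np.random.default_rng(0).standard_normal(1024)
left = np.convolve(source, h_left)    # L(t) = S(t) * h_L
right = np.convolve(source, h_right)  # R(t) = S(t) * h_R
```

Even these toy filters produce a strong leftward image on headphones, because timing and level are the dominant lateralization cues; pinna filtering is what real HRIRs add on top.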

4. Ambisonic Decoding Strategies

To reproduce HOA signals on a speaker array, we use a decoder matrix D. The speaker signals s are given by:

s = D * b

Several decoding methods exist:

  • Sampling (projection): use the SH values at the speaker directions directly as D; simple, but accurate only for very regular layouts.
  • Mode-matching: take D as the pseudo-inverse of the re-encoding matrix, so the speaker feeds reconstruct b exactly on well-conditioned layouts.
  • AllRAD: decode to a dense virtual uniform layout, then map the virtual feeds to the real speakers with VBAP; robust for irregular car layouts.
  • Psychoacoustic weighting: apply Max-rE or In-Phase gains per order to reduce artifacts outside the sweet spot.
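One common strategy, mode-matching, can be sketched for a horizontal-only first-order system on a hypothetical square speaker layout: build the re-encoding matrix from the SH values at the speaker directions and take its pseudo-inverse (using NumPy):

```python
import numpy as np

def foa_row(azimuth_rad):
    """Horizontal-only first-order components (W, X, Y) for one direction."""
    return [1.0, np.cos(azimuth_rad), np.sin(azimuth_rad)]

speakers_az = np.deg2rad([45.0, 135.0, 225.0, 315.0])   # square layout
Y = np.array([foa_row(a) for a in speakers_az])          # (speakers, components)
D = np.linalg.pinv(Y.T)                                  # decoder: s = D @ b

b = np.array(foa_row(np.deg2rad(45.0)))  # a source encoded at 45 degrees
s = D @ b                                # the four speaker feeds
```

As expected, the speaker sitting exactly at the source direction receives the largest feed, and the opposite speaker receives a small negative one.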

5. Wave Field Synthesis (WFS)

WFS is based on the Huygens-Fresnel Principle: it uses a very dense line of speakers to physically reconstruct the wavefront of a sound. The driving signal for a speaker at x_s to recreate a point source at x_0 is:

q(t) = (1 / 2π) · (z_s / |x_s - x_0|^(3/2)) · δ'(t - |x_s - x_0| / c)

Advanced: Field Modeling and Room Impulse Response (RIR)

The car cabin is not an open field. It is a highly reflective box. To render spatial audio accurately, we must account for the Room Impulse Response (RIR).

1. The Mirror Image Method

To simulate reflections off the glass and doors, we create "Virtual Sources" outside the car. For a source S at distance d from a flat glass surface, a virtual source S' is placed at distance d on the other side of the glass. The resulting sound at the listener is the sum of the direct path and all reflected paths:

p(t) = Σ_i (A_i / r_i) · s(t - r_i / c)

where A_i is the attenuation of the i-th path and r_i its length.
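This sketch builds a two-tap impulse response for a direct path plus one windshield reflection, with hypothetical geometry and reflectivity (343 m/s assumed for the speed of sound):

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)
FS = 48000  # sample rate

def add_path(ir, distance_m, attenuation=1.0):
    """Add one propagation path: gain A/r at delay r/c (rounded to a sample)."""
    delay = int(round(distance_m / C * FS))
    ir[delay] += attenuation / distance_m
    return ir

ir = np.zeros(512)
ir = add_path(ir, 1.0)                    # direct path
ir = add_path(ir, 1.6, attenuation=0.8)   # longer, weaker mirrored path
```

Convolving a dry signal with this impulse response reproduces the sum over direct and reflected paths described above.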

2. Binaural Room Impulse Response (BRIR)

A BRIR is an HRTF that also includes the room's reflections. When you listen to a BRIR-rendered sound on headphones, it feels like you are sitting in a specific car cabin.

3. The Schroeder Frequency

In small cabins, there is a transition frequency (usually around 200Hz-400Hz) below which the sound behaves as Standing Waves (Modes) and above which it behaves like Rays.

f_S = 2000 · sqrt(T_60 / V)

where T_60 is the reverberation time in seconds and V is the cabin volume in cubic meters.
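Plugging in cabin-sized example numbers (a 50 ms decay time and roughly 3 m³ of air volume, both hypothetical):

```python
import math

def schroeder_hz(t60_s, volume_m3):
    """Schroeder transition frequency: f_S = 2000 * sqrt(T60 / V)."""
    return 2000.0 * math.sqrt(t60_s / volume_m3)

f_s = schroeder_hz(0.05, 3.0)  # ~258 Hz for this example cabin
```

Below this frequency the cabin must be treated modally (EQ and placement); above it, statistical and ray-based reasoning applies.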

4. Spherical Harmonic Transform (SHT)

The process of converting a multi-microphone recording into Ambisonic coefficients is the SHT. It involves integrating the pressure p(θ, φ) over the surface of a sphere:

b_nm = ∫∫ p(θ, φ) Y_nm*(θ, φ) sin θ dθ dφ

In practice, we use discrete summation over a finite number of microphone capsules.
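A minimal check of the discrete sum for the 0th-order coefficient: with equal quadrature weights of 4π/N over N capsules (a simplification that assumes a near-uniform capsule layout), a constant pressure field integrates to b_00 = sqrt(4π):

```python
import math

N = 32                                # number of capsules (hypothetical array)
Y00 = 1.0 / math.sqrt(4.0 * math.pi)  # 0th-order spherical harmonic
weight = 4.0 * math.pi / N            # equal quadrature weights
pressure = [1.0] * N                  # constant field sampled at the capsules

b00 = sum(weight * p * Y00 for p in pressure)  # discrete SHT for (n, m) = (0, 0)
```

Real arrays (e.g. rigid-sphere microphones) need direction-dependent weights and radial filters, but the structure of the sum is the same.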

5. Near-Field Compensation (NFC)

When sound sources are close to the head (as in a car), the wavefront is spherical, not planar. Ambisonics must be corrected for this Near-Field Effect, which causes a bass boost. We apply a series of high-pass filters to the Ambisonic channels:

H_n(s) = (Σ_k a_k s^k) / (Σ_k b_k s^k)

where the coefficients a_k and b_k depend on the order n and the distance to the source.

Advanced: Distance Rendering and Acoustic Environment

True spatial immersion requires more than just direction; it requires "Distance Cues."

1. The Inverse Square Law

Sound pressure level (SPL) drops by 6 dB for every doubling of distance. However, in a small car cabin, the Critical Distance (the range beyond which reflected energy overtakes the direct sound) is very short (~0.5 m).
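The 6 dB-per-doubling rule in code, relative to a 1 m reference distance:

```python
import math

def distance_gain_db(r_m, r_ref=1.0):
    """Inverse-square attenuation: -20 * log10(r / r_ref) dB re r_ref."""
    return -20.0 * math.log10(r_m / r_ref)
```

Doubling the distance from 1 m to 2 m yields a drop of about 6.02 dB; a renderer applies this gain (plus the cues below) to simulate distance.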

2. Air Absorption (Low-Pass Filtering)

Air acts as a low-pass filter over long distances. To make a sound seem 10 meters away in a 2-meter car cabin, the renderer must apply a specific high-shelf dip based on the ISO 9613-1 standard.

3. Doppler Shift

For moving objects, the renderer must continuously update the delay line. For a source receding from a stationary listener, the perceived frequency f′ is:

f′ = f · (v / (v + v_source))

where v is the speed of sound and v_source is the source's speed away from the listener.
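Worked with example numbers (343 m/s assumed for the speed of sound; a source receding at 20 m/s):

```python
C = 343.0  # speed of sound in m/s (assumed)

def doppler_hz(f_hz, v_source_ms):
    """Perceived frequency for a source receding from a stationary listener."""
    return f_hz * C / (C + v_source_ms)

shifted = doppler_hz(1000.0, 20.0)  # a 1 kHz tone drops to about 945 Hz
```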

Perceptual Evaluation: Measuring "Realism"

How do we know if the math is working? We use standardized listening tests and objective metrics.

Individualized HRTF and Head Tracking

Standard HRTFs work well, but they aren't perfect. For the most immersive experience, the system must be personalized.

1. Anthropometric Scaling

We can estimate a person's HRTF by measuring their head width, ear size, and torso height. These Anthropometric Parameters are used to scale a generic HRTF database (like CIPIC).

2. Head-Tracking Latency

In a car with "Spatial Alerts," if the driver moves their head, the sound must stay "locked" to the hazard (e.g., a blind-spot warning). This requires head-tracking with a Motion-to-Photon Latency of less than 20ms to prevent nausea.

3. Asynchronous Time Warping (ATW)

To hide audio processing delay, we use ATW to "rotate" the finished audio buffer based on the very latest head-position data just before it is sent to the speakers.

Engineering Challenge: Latency and Lip-Sync

Spatial rendering is computationally heavy. Engineers must:

  • Budget the DSP load so the full rendering chain completes within every audio block, with no dropouts.
  • Keep end-to-end output latency low enough that audio stays in lip-sync with any video content.
  • Meet the head-tracking latency budget (under 20ms) so head-locked spatial alerts do not lag.

Technical Specifications: Spatial Formats

Format      | Type          | Max Channels  | Math Foundation
Dolby Atmos | Object-Based  | 128 Objects   | VBAP / Metadata
MPEG-H      | Hybrid        | Variable      | HOA + Objects
B-Format    | Scene-Based   | 4 (1st Order) | Spherical Harmonics
Auro-3D     | Channel-Based | 13.1          | Level/Time Matrix
Binaural    | Ear-Based     | 2             | HRTF Convolution

Glossary: Spatial Audio Engineering

Azimuth
The horizontal angle of a sound source.
Elevation
The vertical angle of a sound source.
Decorrelation
The process of reducing the statistical correlation between two signals while preserving their timbre, used to make sound images wider and more diffuse.
Spherical Harmonics
Mathematical functions used to describe the distribution of sound on a sphere.
ITD
Inter-aural Time Difference.
ILD
Inter-aural Level Difference.
Object-Based Audio
Audio stored as individual files plus metadata (x, y, z).
Near-Field Effect
The bass boost that occurs when a sound source is within 1 meter of the head.
HRIR
Head-Related Impulse Response (the time-domain version of HRTF).
B-Format
The standard 4-channel format for 1st-order Ambisonics.