Compression Applied to Soundfields

Targets of Testing

To determine the value of compression by weighing the reduction in file size against how evenly quality is retained across the whole soundfield. We also compare the typical file-size reduction with the quality of the resulting decompressed audio files, and explore whether VBR compression types are safe to use on channel-dependent soundfield material.

Codecs Tested

  • OPUS via libopus
  • VORBIS via libvorbis
  • MP3 via LAME
  • AAC via FFMPEG Native AAC

Codecs To Test

  • AAC via FDK AAC
  • AAC via Apple

Test Material

  • Test 1: Wideband noise material at 90-degree offsets, combined into a soundfield with a narrowband saw wave at 2 kHz in the front direction and a sine tone panned 90 degrees to the left.

Methodology

For this test, we used ffmpeg to convert the original WAV files to each targeted codec format and then decompress them back to WAV. The goal was to compare the decompressed WAV files with the originals, build procedures for comparing their quality, and determine which codec format comes closest to the original file. The chosen codec settings were as follows:

Codec     Bitrate/Quality   Bitrate Mode
AAC       16                Constant
AAC       48                Constant
AAC       320               Constant
AAC       0.1               Variable
AAC       2                 Variable
AAC       10                Variable
VORBIS    0                 Variable
VORBIS    10                Variable
VORBIS    0                 Variable
OPUS      16                Constant
OPUS      32                Constant
OPUS      96                Constant
OPUS      8                 Variable
OPUS      32                Variable
MP3       320               Constant
MP3       30                Variable
MP3       260               Variable
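
The encode and decode round trip described above could be scripted along the following lines. This is a minimal sketch, not the commands actually used in the test, which are not recorded here. The encoder names (`libopus`, `libvorbis`, `libmp3lame`, `aac`) are real ffmpeg encoders, but the function names, file names, and the choice of `-b:a`, `-q:a` and `-vbr` flags are assumptions. The MP3 variable values in the table (30, 260) do not correspond to LAME's 0 to 9 `-q:a` scale, so the sketch deliberately leaves them unmapped.

```python
# Encoder names are real ffmpeg encoders; the flag choices are assumptions,
# since the exact commands used in the test are not recorded in this document.
ENCODERS = {"OPUS": "libopus", "VORBIS": "libvorbis", "MP3": "libmp3lame", "AAC": "aac"}


def encode_command(src, dst, codec, value, mode):
    """Build an ffmpeg argument list for one row of the codec table."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:a", ENCODERS[codec]]
    if codec == "OPUS":
        # libopus takes a target bitrate in both modes; -vbr switches the mode.
        cmd += ["-b:a", f"{value}k", "-vbr", "on" if mode == "Variable" else "off"]
    elif mode == "Constant":
        cmd += ["-b:a", f"{value}k"]
    elif codec == "MP3":
        # The table's MP3 variable values are not mapped to a flag (assumption gap).
        raise ValueError("MP3 variable settings from the table are not mapped here")
    else:
        # libvorbis and the native aac encoder accept a quality scale via -q:a.
        cmd += ["-q:a", str(value)]
    return cmd + [dst]


def decode_command(src, dst_wav):
    """Build the ffmpeg argument list that decodes a compressed file back to WAV."""
    return ["ffmpeg", "-y", "-i", src, dst_wav]


if __name__ == "__main__":
    # Print the commands instead of running them; file names are hypothetical.
    print(" ".join(encode_command("source.wav", "vorbis_q10.ogg", "VORBIS", 10, "Variable")))
    print(" ".join(decode_command("vorbis_q10.ogg", "vorbis_q10_decoded.wav")))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Building the lists separately from running them keeps the settings easy to inspect and log alongside the results.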

First Test Case

For each codec and quality setting, the Magnitude Squared Coherence Estimate between the source PCM audio and the codec's decompressed PCM was averaged over the total duration to obtain the Correlation Percent.

The Time Difference was calculated by comparing the number of samples in the source PCM audio file to the number of samples in the decompressed PCM of each compression and quality.
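
The sample-count comparison can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and expressing the difference as a percentage of the source length is inferred from the percent values in the results rather than stated explicitly. The standard-library `wave` module is used to count frames; depending on the Python version it may not open every multichannel WAV variant, in which case another reader would be needed.

```python
import wave


def frame_count(path):
    """Return the number of sample frames (samples per channel) in a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes()


def time_difference_percent(source_frames, decoded_frames):
    """Length change of the decoded file, as a percentage of the source length."""
    return abs(decoded_frames - source_frames) / source_frames * 100.0


if __name__ == "__main__":
    # Hypothetical frame counts: a decoded file that gained 150 frames.
    print(time_difference_percent(48000, 48150))
```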

The table below shows the accumulated results:

Results of File Size

Codec         File Size after Compression   Correlation Percent*   Time Difference**   Multichannel Support
Source WAV    7.1MB                         100%                   -                   Yes
VORBIS Q0     476KB                         45.85%                 0.0054%             Yes
VORBIS Q3     696KB                         66.16%                 0.0054%             Yes
VORBIS Q7     1.1MB                         88.28%                 0.0054%             Yes
VORBIS Q10    2.2MB                         98.81%                 0.0054%             Yes
OPUS C16      168KB                         22.71%                 0.065%              Yes
OPUS C32      328KB                         37.78%                 0.065%              Yes
OPUS C96      969KB                         81.41%                 0.065%              Yes
OPUS V8       63KB                          16.86%                 0.065%              Yes
OPUS V16      166KB                         23.09%                 0.065%              Yes
OPUS V32      478KB                         51.66%                 0.065%              Yes
AAC C16       189KB                         21.21%                 0.31%               Yes
AAC C48       511KB                         46.01%                 0.31%               Yes
AAC C320      2.4MB                         98.89%                 0.31%               Yes
AAC V0        437KB                         51.78%                 0.31%               Yes
AAC V0.1      45KB                          12.44%***              0.31%               Yes
AAC V2        1.3MB                         91.00%                 0.31%               Yes
AAC V10       2.1MB                         99.67%                 0.31%               Yes
MP3 C320      3.2MB                         91.73%                 -                   No
MP3 V30       323KB                         29.61%                 -                   No
MP3 V260      2.68MB                        91.64%                 -                   No

* Shows only the difference of the decoded signal, without compensating for time-domain changes introduced by the codec (not conclusive; see the Limitations section for more information)

** Calculated after decoding back to PCM and comparing number of samples

*** Possibly caused by an issue in the compression settings of the encode command; not definitive
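
To weigh file reduction against retained quality, the sizes in the table can be converted into a reduction percentage and read next to the Correlation Percent. The sketch below is only an illustration: the function name is hypothetical, the three rows are copied from the table above, and 1MB is assumed to equal 1000KB.

```python
def size_reduction_percent(source_size, compressed_size):
    """Percentage by which the compressed file is smaller than the source."""
    return (1.0 - compressed_size / source_size) * 100.0


SOURCE_KB = 7100  # 7.1MB source WAV, assuming 1MB = 1000KB

# (file size in KB, Correlation Percent) copied from the results table.
ROWS = {
    "VORBIS Q10": (2200, 98.81),
    "OPUS C96": (969, 81.41),
    "AAC C320": (2400, 98.89),
}

if __name__ == "__main__":
    for name, (size_kb, correlation) in ROWS.items():
        reduction = size_reduction_percent(SOURCE_KB, size_kb)
        print(f"{name}: {reduction:.1f}% smaller, {correlation:.2f}% correlation")
```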

Observations for the First Test Case

  • AAC: This codec consistently changes the time domain, which may make it less safe or ideal for multichannel soundfield audio. The native AAC encoder in ffmpeg also did not seem to be the best AAC implementation available; other encoders remain to be tested.
  • Vorbis: This was the most consistent codec at decompressing to results closest to the source, even when the material was noise or wideband content.
  • Opus: This codec did a fair job of maintaining consistency on harmonic material (it might suit music-related material better) but did not perform as well on wideband sound material (it may be less ideal for some content, including most sound design content).

It is important to mention that many of the codecs introduce extra samples at the beginning and end of the rendered files. This makes our test method faulty, because it depends on the files being time aligned: a high-quality rendering might be scored low only because it is not in phase with the original. For instance, although VORBIS appears to be the closest to the original uncompressed files, this does not necessarily indicate higher actual quality. It is therefore important to develop more comprehensive methods that analyze the quality of the codecs more objectively.

Second Test Case

To check the actual quality of the decompressed files regardless of their time differences, for the second round of our test we time aligned all the stems, flipped their phase, and listened to all of them again. We were looking for the highest cancellation, which indicates the greatest similarity to the original. The top five perceived quality rankings were as follows:

As the next step, we calculated the magnitude squared coherence estimate to quantitatively measure the similarity of the decompressed files to the original WAV files. The estimate is computed between the two signals over their full duration and averaged into a single value:

Correlation Percent = Mean(Magnitude Squared Coherence Estimate)
Magnitude Squared Coherence Estimate = absolute(cross power spectral density of source and compressed decoded)^2 / (power spectral density of source * power spectral density of compressed decoded)
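
The two formulas above can be sketched with SciPy's `scipy.signal.coherence`, which estimates the magnitude squared coherence using Welch's method. This is a minimal sketch, not the exact analysis script used for the results: the function name, the 48 kHz sample rate, and the segment length are assumptions, and each call handles a single channel as a 1-D array. Note that Welch's method averages over time segments, so the final mean here is taken over frequency bins.

```python
import numpy as np
from scipy.signal import coherence


def correlation_percent(source, decoded, fs=48000, nperseg=1024):
    """Mean magnitude squared coherence between two 1-D signals, as a percentage."""
    # Trim both signals to a common length so the segments line up (assumption).
    n = min(len(source), len(decoded))
    _, cxy = coherence(source[:n], decoded[:n], fs=fs, nperseg=nperseg)
    return float(np.mean(cxy) * 100.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(48000)
    unrelated = rng.standard_normal(48000)
    print(correlation_percent(noise, noise))      # identical signals
    print(correlation_percent(noise, unrelated))  # independent noise
```

Because the estimate is computed on fixed segment positions, any uncompensated offset between the two files lowers the score, which is consistent with the limitation described for the first test case.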

Spatial Decode Listening Results:

The spatially decoded results are available here. Both the source multichannel audio and the decompressed codec multichannel audio are decoded as Mach1Spatial 8 channel (YPR) with the same settings. The codec test material is then subtracted from the source so that we can listen to the remainder, which is the "difference" between the signals caused by the compression process.
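
The align, invert and subtract step can also be expressed numerically. The following is a minimal sketch under assumptions, not the procedure used to produce the audio files: the function names are hypothetical, the lag is estimated from the peak of an FFT-based cross-correlation, and each call handles one channel as a 1-D NumPy array. A lower residual level means more complete cancellation.

```python
import numpy as np


def estimate_lag(source, decoded):
    """Samples by which `decoded` lags `source`, from the cross-correlation peak."""
    # Zero-pad to a power of two large enough to avoid circular overlap.
    nfft = 1 << (len(source) + len(decoded) - 2).bit_length()
    spectrum = np.fft.rfft(decoded, nfft) * np.conj(np.fft.rfft(source, nfft))
    xcorr = np.fft.irfft(spectrum, nfft)
    peak = int(np.argmax(xcorr))
    # Indices at or beyond len(decoded) wrap around to negative lags.
    return peak - nfft if peak >= len(decoded) else peak


def null_residual_db(source, decoded):
    """Energy of (source - aligned decoded) relative to the source, in dB."""
    lag = estimate_lag(source, decoded)
    if lag >= 0:
        decoded = decoded[lag:]   # drop samples the codec added at the start
    else:
        source = source[-lag:]    # decoded starts early; skip ahead in source
    n = min(len(source), len(decoded))
    residual = source[:n] - decoded[:n]
    ratio = np.sum(residual ** 2) / np.sum(source[:n] ** 2)
    return 10.0 * np.log10(max(ratio, 1e-12))  # floor avoids log of zero


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.standard_normal(4096)
    delayed = np.concatenate([np.zeros(37), src])  # pure delay, no coding loss
    print(estimate_lag(src, delayed), null_residual_db(src, delayed))
```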

Source Spatially Decoded:

SOURCE WAV [PCM]:

Decoded Difference:

Note: This test makes the difference audible. The less that can be heard, the more successfully the codec, at that quality setting, recreated the source after decompression.

AAC C16 Phase Cancelled:

AAC C48 Phase Cancelled:

AAC C320 Phase Cancelled:

AAC V0 Phase Cancelled:

AAC V2 Phase Cancelled:

AAC V10 Phase Cancelled:

MP3 C320 Phase Cancelled:

MP3 V30 Phase Cancelled:

MP3 V260 Phase Cancelled:

OPUS C16 Phase Cancelled:

OPUS C32 Phase Cancelled:

OPUS C96 Phase Cancelled:

OPUS V8 Phase Cancelled:

OPUS V16 Phase Cancelled:

OPUS V32 Phase Cancelled:

VORBIS Q0 Phase Cancelled:

VORBIS Q3 Phase Cancelled:

VORBIS Q7 Phase Cancelled:

VORBIS Q10 Phase Cancelled:

Decoded Codecs:

Note: These files present the audio after compression and decompression, for a subjective comparison of how it differs from the source.

AAC C16

AAC C48

AAC C320

AAC V0

AAC V2

AAC V10

MP3 C320

MP3 V30

MP3 V260

OPUS C16

OPUS C32

OPUS C96

OPUS V8

OPUS V16

OPUS V32

VORBIS Q0

VORBIS Q3

VORBIS Q7

VORBIS Q10

Limitations and Future Works

Although these tests analyze different codecs and how they affect the files, they do not necessarily show how the codecs affect the spatial image of the multichannel files. One might argue that any change in sample count, amplitude, or frequency content will inevitably affect the spatial image, but it is crucial to develop methods that analyze this in more detail.

More importantly, our test methods are ultimately based on our subjective judgment of the decompression quality. We therefore need to establish more solid testing methodologies in the future to increase the validity of our results. Further testing should be conducted on spherical harmonic spatial audio formats to examine soundfield changes in correlated multichannel audio formats. Testing on VBAP/VVBP/SPS formats would likely not yield results different from stereo field testing, as the channels are already uncorrelated. Just as a stereo soundfield is altered when the compression quality is too low to preserve the unique differences between channels, all spatial audio will be subject to the same effect.