Compression Applied to Soundfields
Targets of Testing
To determine the value of compression in terms of file size reduction compared to how well quality is retained evenly across the whole soundfield. This includes a discussion of typical file size reduction compared to the quality of the resulting decompressed audio files. We also explore whether VBR compression types are safe to use on channel dependent soundfield material.
Codecs Tested
- OPUS via libopus
- VORBIS via libvorbis
- MP3 via LAME
- AAC via FFMPEG Native AAC
Codecs To Test
- AAC via FDKAAC
- AAC via Apple
Test Material
- Test 1: Wideband noise material at a 90 degree offset, combined into a soundfield with a narrowband saw wave at 2 kHz in the front direction and a sine tone panned 90 degrees to the left.
Methodology
For this test, we used ffmpeg to convert the original wave files to each targeted codec format and then decompress them back to wave files. The goal was to compare the decompressed waves with the originals, build procedures for comparing their quality, and analyze which codec format comes closest to the original file. The chosen codec formats were as follows:
Codec | Bitrate/Quality | Bitrate Mode |
---|---|---|
AAC | 16 | Constant |
AAC | 48 | Constant |
AAC | 320 | Constant |
AAC | 0.1 | Variable |
AAC | 2 | Variable |
AAC | 10 | Variable |
VORBIS | 0 | Variable |
VORBIS | 10 | Variable |
VORBIS | 0 | Variable |
OPUS | 16 | Constant |
OPUS | 32 | Constant |
OPUS | 96 | Constant |
OPUS | 8 | Variable |
OPUS | 32 | Variable |
MP3 | 320 | Constant |
MP3 | 30 | Variable |
MP3 | 260 | Variable |
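As a minimal sketch of the encode and decode round trip described above, the snippet below builds the ffmpeg command lines for one codec setting. The option mapping is an assumption, since the exact commands used in the test are not recorded here: `-q:a` for quality based modes, `-b:a` for bitrate based modes, `-vbr off`/`on` for libopus, and 16-bit PCM for the decoded wave. The function names are hypothetical, and MP3 variable modes are left out because their settings are not specified.

```python
import subprocess

# Hypothetical mapping from (codec, bitrate mode) to ffmpeg audio options.
# These flags exist in ffmpeg, but whether they match the commands used
# for the tests in this document is an assumption.
ENCODERS = {
    ("VORBIS", "variable"): lambda v: ["-c:a", "libvorbis", "-q:a", str(v)],
    ("OPUS", "constant"): lambda v: ["-c:a", "libopus", "-b:a", f"{v}k", "-vbr", "off"],
    ("OPUS", "variable"): lambda v: ["-c:a", "libopus", "-b:a", f"{v}k", "-vbr", "on"],
    ("AAC", "constant"): lambda v: ["-c:a", "aac", "-b:a", f"{v}k"],
    ("AAC", "variable"): lambda v: ["-c:a", "aac", "-q:a", str(v)],
    ("MP3", "constant"): lambda v: ["-c:a", "libmp3lame", "-b:a", f"{v}k"],
}


def encode_cmd(src, dst, codec, mode, value):
    """Build the ffmpeg command that compresses a WAV file."""
    return ["ffmpeg", "-y", "-i", src, *ENCODERS[(codec, mode)](value), dst]


def decode_cmd(src, dst):
    """Build the ffmpeg command that decodes back to PCM WAV.

    16-bit little-endian PCM is an assumption about the source format.
    """
    return ["ffmpeg", "-y", "-i", src, "-c:a", "pcm_s16le", dst]


def round_trip(src_wav, compressed, decoded_wav, codec, mode, value):
    """Run encode then decode; raises if ffmpeg exits with an error."""
    for cmd in (encode_cmd(src_wav, compressed, codec, mode, value),
                decode_cmd(compressed, decoded_wav)):
        subprocess.run(cmd, check=True)
```

For example, `round_trip("source.wav", "vorbis_q3.ogg", "vorbis_q3.wav", "VORBIS", "variable", 3)` would run both steps, provided ffmpeg is on the PATH and was built with libvorbis.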
First Test Case
For each codec and quality setting, the Magnitude Squared Coherence Estimate between the source PCM audio and the codec's decompressed PCM was averaged over the total time to obtain the Correlation Percent.
The Time Difference was calculated by comparing the number of samples in the source PCM audio file to the number of samples in the decompressed PCM for each codec and quality setting.
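The sample count comparison can be sketched with Python's standard `wave` module. This is an illustration under assumptions: the function names are hypothetical, and expressing the difference as an absolute percentage of the source length is our reading of the Time Difference column, not a documented formula. Older Python versions may also reject multichannel files written with the extensible WAV header.

```python
import wave


def count_frames(path):
    """Number of sample frames (one sample per channel) in a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes()


def time_difference_percent(source_frames, decoded_frames):
    """Length change of the decoded file relative to the source, in percent.

    Assumption: the difference is taken as an absolute value and
    normalized by the source length.
    """
    return 100.0 * abs(decoded_frames - source_frames) / source_frames
```

A call such as `time_difference_percent(count_frames("source.wav"), count_frames("decoded.wav"))` returns the padding or trimming introduced by a codec as a percentage of the source length (the file names here are placeholders).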
The accumulated results are shown below:
Results of File Size
Codec | Filesize after Compression | Correlation Percent* | Time Difference** | Multichannel Support |
---|---|---|---|---|
Source WAV | 7.1MB | 100% | – | Yes |
VORBIS Q0 | 476KB | 45.85% | 0.0054% | Yes |
VORBIS Q3 | 696KB | 66.16% | 0.0054% | Yes |
VORBIS Q7 | 1.1MB | 88.28% | 0.0054% | Yes |
VORBIS Q10 | 2.2MB | 98.81% | 0.0054% | Yes |
OPUS C16 | 168KB | 22.71% | 0.065% | Yes |
OPUS C32 | 328KB | 37.78% | 0.065% | Yes |
OPUS C96 | 969KB | 81.41% | 0.065% | Yes |
OPUS V8 | 63KB | 16.86% | 0.065% | Yes |
OPUS V16 | 166KB | 23.09% | 0.065% | Yes |
OPUS V32 | 478KB | 51.66% | 0.065% | Yes |
AAC C16 | 189KB | 21.21% | 0.31% | Yes |
AAC C48 | 511KB | 46.01% | 0.31% | Yes |
AAC C320 | 2.4MB | 98.89% | 0.31% | Yes |
AAC V0 | 437KB | 51.78% | 0.31% | Yes |
AAC V0.1 | 45KB | 12.44%*** | 0.31% | Yes |
AAC V2 | 1.3MB | 91.00% | 0.31% | Yes |
AAC V10 | 2.1MB | 99.67% | 0.31% | Yes |
MP3 C320 | 3.2MB | 91.73% | – | No |
MP3 V30 | 323KB | 29.61% | – | No |
MP3 V260 | 2.68MB | 91.64% | – | No |
* Shows only the decoded signal difference, without compensating for time domain changes introduced by the codec (not conclusive; see the Limitations and Future Works section for more information)
** Calculated after decoding back to PCM and comparing number of samples
*** Possibly caused by an issue with the compression settings in the encode command; not definitive
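To relate file reduction to retained quality, the compressed size can be expressed as a percentage of the 7.1MB source and read next to the Correlation Percent. The snippet below is a small sketch using a few rows copied from the table. The helper names are hypothetical, and treating 1 MB as 1024 KB is an assumption about how the sizes were reported.

```python
SOURCE_SIZE = "7.1MB"

# (file size, correlation percent) copied from the results table
RESULTS = {
    "VORBIS Q10": ("2.2MB", 98.81),
    "OPUS C96": ("969KB", 81.41),
    "AAC C320": ("2.4MB", 98.89),
    "MP3 C320": ("3.2MB", 91.73),
}


def to_kb(size):
    """Parse a string such as '476KB' or '1.1MB' into kilobytes.

    Assumption: 1 MB = 1024 KB.
    """
    number, unit = float(size[:-2]), size[-2:]
    return number * 1024 if unit == "MB" else number


def size_ratio_percent(size, source=SOURCE_SIZE):
    """Compressed size as a percentage of the source size."""
    return 100.0 * to_kb(size) / to_kb(source)


for name, (size, correlation) in RESULTS.items():
    print(f"{name}: {size_ratio_percent(size):.1f}% of source size, "
          f"{correlation:.2f}% correlation")
```

Because the Correlation Percent is not compensated for codec delay, such a comparison inherits the same caveat as the table itself.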
Observations for the First Test Case
- AAC: This codec consistently changes the time domain, which may make it less safe or ideal for soundfield (multichannel) audio. FFmpeg's native AAC encoder also does not seem to be the best AAC implementation tested so far.
- Vorbis: This was the most consistent codec in terms of decompressing to results closest to the source, even when the material was noise or wideband related.
- Opus: This codec did a fair job of maintaining consistency on harmonic material (it might be better suited to music related material) but did not perform as well on wideband sound material (it may be less ideal for some content and for most sound design content).
It is important to note that many of the codecs add extra samples at the beginning and end of the rendered files. This makes our test method faulty, as it depends entirely on time alignment: a high quality rendering might score low only because it is not in phase with the original. For instance, although VORBIS scores closest to the original uncompressed files, this does not necessarily indicate higher actual quality. We therefore need more comprehensive methods to analyze the quality of the codecs objectively.
Second Test Case
To check the actual quality of the decompressed files regardless of their time differences, for the second round of our test we time aligned all the stems, flipped their phase, and listened to all of them again. We were looking for the highest cancellation as an indication of similarity to the original. The top five perceived quality rankings were as follows:
As the next step, we calculated the magnitude squared coherence estimate to quantitatively measure the similarity of the decompressed files to the original wav files. It is calculated by taking the average over the time domain of the two signals:
Correlation Percent = Mean(Magnitude Squared Coherence Estimate)
Magnitude Squared Coherence Estimate = absolute(cross power spectral density of source and compressed decoded)^2 / (power spectral density of source * power spectral density of compressed decoded)
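The two formulas above can be sketched in plain Python for a single channel. This is an illustration, not the analysis code used for the results: the Hann window, the 64 sample segments with 50% overlap, and the function names are all assumptions, and the naive DFT is only practical for short signals. In practice a library routine such as `scipy.signal.coherence` or MATLAB's `mscohere` would be the usual choice.

```python
import cmath
import math


def _dft(segment):
    """Naive one-sided DFT; adequate for short illustrative segments."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def msc(x, y, nperseg=64):
    """Magnitude squared coherence per frequency bin.

    Spectral densities are accumulated over Hann windowed segments
    with 50% overlap (a Welch-style estimate).
    """
    hop = nperseg // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / nperseg)
              for i in range(nperseg)]
    nbins = nperseg // 2 + 1
    pxx = [0.0] * nbins  # power spectral density of x
    pyy = [0.0] * nbins  # power spectral density of y
    pxy = [0j] * nbins   # cross power spectral density of x and y
    for start in range(0, min(len(x), len(y)) - nperseg + 1, hop):
        fx = _dft([x[start + i] * window[i] for i in range(nperseg)])
        fy = _dft([y[start + i] * window[i] for i in range(nperseg)])
        for k in range(nbins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k].conjugate() * fy[k]
    return [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
        for k in range(nbins)
    ]


def correlation_percent(x, y, nperseg=64):
    """Mean of the coherence estimate over all bins, as a percentage."""
    coherence = msc(x, y, nperseg)
    return 100.0 * sum(coherence) / len(coherence)
```

Comparing a signal with itself gives a value of essentially 100%, while two unrelated noise signals score far lower. The segment averaging is what makes the estimate informative: with a single segment the coherence is 1 in every bin regardless of the input.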
Spatial Decode Listening Results:
The spatially decoded results are available here. Both the source multichannel audio and the decompressed codec multichannel audio are decoded as Mach1Spatial 8 channel (YPR) at the same settings. The codec test material is then subtracted from the source so that we can listen to the remainder, which is the “difference” between the signals due to the compression process.
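The time align, invert, and sum procedure can be sketched for one channel as follows. This is a simplified illustration with hypothetical function names: it searches only whole sample lags within an assumed `max_lag` range using a brute force cross-correlation, and it reports the leftover energy relative to the source in dB, which is a measure we introduce here rather than one used in the test.

```python
import math


def best_lag(source, decoded, max_lag):
    """Whole-sample lag of `decoded` that maximizes cross-correlation."""
    def score(lag):
        return sum(
            source[i] * decoded[i + lag]
            for i in range(len(source))
            if 0 <= i + lag < len(decoded)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def null_residual_db(source, decoded, max_lag=64):
    """Align `decoded` to `source`, subtract, and return (lag, residual dB).

    The residual is the energy of (source - aligned decoded) relative to
    the energy of the source over the overlapping region. Lower values
    mean better cancellation; -inf means a perfect null.
    """
    lag = best_lag(source, decoded, max_lag)
    pairs = [
        (source[i], decoded[i + lag])
        for i in range(len(source))
        if 0 <= i + lag < len(decoded)
    ]
    residual = sum((s - d) ** 2 for s, d in pairs)
    reference = sum(s * s for s, _ in pairs)
    if reference == 0:
        raise ValueError("source is silent or does not overlap the decoded signal")
    if residual == 0:
        return lag, float("-inf")
    return lag, 10.0 * math.log10(residual / reference)
```

For multichannel material the same lag would normally be applied to every channel of a file so that the inter-channel relationships are left untouched.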
Source Spatially Decoded:
SOURCE WAV [PCM]:
Decoded Difference:
Note: This test makes the difference audible. The less that can be heard, the more successfully the codec at this quality setting recreated the source after decompression.
AAC C16 Phase Cancelled:
AAC C48 Phase Cancelled:
AAC C320 Phase Cancelled:
AAC V0 Phase Cancelled:
AAC V2 Phase Cancelled:
AAC V10 Phase Cancelled:
MP3 C320 Phase Cancelled:
MP3 V30 Phase Cancelled:
MP3 V260 Phase Cancelled:
OPUS C16 Phase Cancelled:
OPUS C32 Phase Cancelled:
OPUS C96 Phase Cancelled:
OPUS V8 Phase Cancelled:
OPUS V16 Phase Cancelled:
OPUS V32 Phase Cancelled:
VORBIS Q0 Phase Cancelled:
VORBIS Q3 Phase Cancelled:
VORBIS Q7 Phase Cancelled:
VORBIS Q10 Phase Cancelled:
Decoded Codecs:
Note: These are the files after compression and decompression, for a subjective comparison of how each differs from the source.
AAC C16
AAC C48
AAC C320
AAC V0
AAC V2
AAC V10
MP3 C320
MP3 V30
MP3 V260
OPUS C16
OPUS C32
OPUS C96
OPUS V8
OPUS V16
OPUS V32
VORBIS Q0
VORBIS Q3
VORBIS Q7
VORBIS Q10
Limitations and Future Works
Although these tests analyze different codecs and how they affect the files, they do not necessarily study how the codecs affect the spatial image of the multichannel files. One might argue that any change in sample count, amplitude, or frequency content will inevitably affect the spatial image, but it is crucial to develop methods to analyze this in further detail.
More importantly, our test methods are ultimately based on our subjective judgment of the decompression quality. We therefore need to establish more solid testing methodologies in the future to increase the validity of our results. Further testing on spherical harmonic spatial audio formats should be conducted to study soundfield changes in correlated multichannel audio formats. Testing on VBAP/VVBP/SPS formats would likely not yield different results from stereo field testing, as the channels are already uncorrelated. Just as stereo soundfields are altered when the compression quality is too low to preserve the unique differences between channels, all spatial audio will be subject to the same effect.