Remember this scene from The Matrix? “There is no spoon”.
Well, recently I’m hearing people talk more and more about “resolution” in digital audio, and I’m here to tell you –
There is no resolution.
It’s a red herring – an idea-virus left over from the earliest days of digital audio, perpetuated by gear manufacturers to try and sell us more kit we don’t need. Here’s why.
It all starts with the myth:
“Digital can never sound as good as analogue”
This statement simply isn’t true, but it doesn’t stop people repeating it like some kind of mantra. The reasons they give usually hang on the fact that digital audio samples the audio – “freezing” it at regular moments in time – and the claim that it can therefore never sound as smooth and continuous as the original analogue signal.
You can see it for yourself, they say. Zoom in far enough on a digital waveform and eventually you can see the blocky, grainy, digital “stair-steps” – so it stands to reason that you can hear them, if your hearing and equipment are good enough, right?
There are no stair-steps.
The same flawed reasoning is used to explain why we need ever-higher bit-depths and sample rates – since digital audio contains these audible “pixels”, the smaller they are, the better it will sound, supposedly.
And it’s easy to see why people believe it.
Is digital audio inherently flawed?
Hold your hand up in front of your face, close one eye and look at the world through your fingers.
You’re “sampling” what you can see.
In between your fingers you can see the world, and where your fingers block out the light, you can’t see the view.
This is sampling. Breaking the world up into chunks and using the chunks to record what we see – or in audio, breaking the waveform into samples and recording those to store what we can hear.
Afterwards, we re-assemble the chunks to play back the audio. But surely this method is fatally flawed? Everything between the samples, everything blocked out by your fingers, is lost forever. “Quantising” the image into slices like this introduces a fundamental error.
The thinner your fingers, the finer and more accurate the picture of the world, sure – and the more frequent the samples (i.e. the higher the sample rate) the more accurate the audio reproduction, the higher the resolution, right?
But digital audio can never be perfect, because the sampling steps are always there?
This analogy misses out a fundamental part of the digital audio equation.
A hand-waving analogy
Look through your fingers again, at your sliced-up, sampled view of the world.
Now start to wave your fingers up and down slightly – move them faster and faster.
Everything starts to look a bit flickery, a bit less crisp and clear – but suddenly you can see all the details of the view.
Because your fingers are moving, you can see between them for some of the time, and this enables our eyes to build a complete view of the image behind. Not by guessing the missing pieces, as we have to when the hand is still, but actually by seeing all the information, over time.
We have added noise to the system – the hand-waving – but the noise reveals what would otherwise be hidden behind our fingers, in between the sampling steps – it actually removes the quantisation error.
(Strictly speaking, it de-correlates the error from the input signal, turning it into noise instead of truncation distortion – but the main point is that it sounds good.)
In digital audio, this noise is called dither.
And it solves the problem of quantisation distortion completely.
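If you’d rather see this in numbers than in analogies, here’s a minimal Python sketch (my own illustration – the sample rate, bit depth and test tone are arbitrary choices, not anything prescribed): it quantises a very quiet sine wave to 8 bits, once plain and once with TPDF dither added first, then measures the error at the harmonics where distortion would show up.

```python
import numpy as np

fs = 48000                                   # sample rate (an arbitrary choice)
t = np.arange(fs) / fs                       # one second of audio
audio = 0.01 * np.sin(2 * np.pi * 1000 * t)  # a very quiet 1 kHz sine

step = 2.0 / 2 ** 8                          # quantisation step size at 8 bits

# Plain quantisation: the error is locked to the signal,
# so it shows up as harmonic distortion.
plain = np.round(audio / step) * step

# TPDF dither: add triangular random noise (two uniform dice summed,
# spanning +/- one quantisation step) *before* rounding.
rng = np.random.default_rng(0)
tpdf = (rng.uniform(-0.5, 0.5, fs) + rng.uniform(-0.5, 0.5, fs)) * step
dithered = np.round((audio + tpdf) / step) * step

# With one second of audio, FFT bin k sits at exactly k Hz, so we can
# read off the error energy at the odd harmonics of our 1 kHz tone.
for name, out in (("plain", plain), ("dithered", dithered)):
    spectrum = np.abs(np.fft.rfft(out - audio))
    harmonics = spectrum[[3000, 5000, 7000]].sum()
    print(f"{name:9s} error at 3/5/7 kHz: {harmonics:8.2f}")
```

Run it and the plain version piles its error up at exact multiples of the input tone – distortion locked to the signal – while the dithered version spreads the same error out as a flat, signal-independent hiss.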
Sure, we’ve had to add a little noise, but it’s nowhere near as crude as the analogy of looking through your waving fingers – that would be more like 8-bit audio. That amount of noise would be huge in comparison to what we’re trying to hear, but at 16 bits or more the noise is very quiet.
At 16 bits the dither noise is actually quieter than the original natural noise that was in the audio in the first place. At that point, the extra noise is virtually irrelevant. But crucially, there are never any stair-steps – we can always hear everything that was there to begin with, just with more or less noise.
Even in the crude, noisy, 8-bit “through the fingers” version we can still see the entire, smooth, original image – it’s just very noisy. The same applies to audio – take a listen for yourself:
In a properly implemented digital system, increasing the bit-depth doesn’t improve “resolution”, it just reduces noise.
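And “reduces noise” is easy to put rough numbers on – each extra bit is worth about 6 dB. A back-of-envelope sketch (the exact figures depend on the dither used, so treat these as ballpark values):

```python
# Each bit in a properly dithered system is worth roughly 6 dB of
# dynamic range (the exact figure depends on the dither used).
for bits in (8, 16, 24):
    print(f"{bits:2d}-bit: roughly {6.02 * bits:.0f} dB from full scale to the noise floor")
```

So the crude “through the fingers” 8-bit version has around 48 dB to play with, while 16-bit already gives around 96 dB – which is why its dither sits below the natural noise of almost any real recording.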
Green-tinted spectacles?
Look through your fingers again. Now imagine you’re wearing a bright green sun-shade.
You’re in the Matrix. Everything you can see is tinted green. Even though our finger-wobbling dither prevents there from being any “gaps” in the samples we can see, all the red and blue has been filtered out – we’ve put high- and low-pass filters on our vision, just like we do in digital audio.
Because (say the digital sceptics) we’re only recording a limited frequency range, right? Audio doesn’t stop at 20 kHz, so why do we stop sampling it there? Higher sample rates will give us a more accurate representation of the original signal, so it must sound better, right?
Swap the green Matrix sun-shade for a pair of photochromic sunglasses – the kind that go dark in bright sun and clear in the shade.
Suddenly we have our colours back – removing the high and low-pass filters was a good thing, right ? The photochromic lenses adjust the “recording levels” of the light coming in to a comfortable level for our eyes, and we can see everything again.
Well, not quite.
Those filters weren’t actually removed at all – they’re just working at different frequencies.
Decent sunglasses always filter the incoming light – not just to reduce it to a comfortable brightness for us, but also to protect our eyes from harmful UV radiation which could otherwise damage our vision.
Even when they’re completely clear, the photochromic lenses are still filtering out all that very high-frequency light. We don’t notice though, because we can’t ever see it – with or without the high-frequency filtering sunglasses.
Digital audio does the same thing. The anti-aliasing filter (when recording) and the converter’s reconstruction filter (when playing back) remove all the unnecessary high-frequency information from the signal – everything above the limit set by the sample rate – while leaving everything below it intact for us to hear.
We don’t need what we can’t hear.
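If you’d like to see the audio equivalent of those sunglasses at work, here’s a hedged sketch using numpy and scipy (the rates, frequencies and filter length are my own arbitrary choices): we treat 192 kHz as a stand-in for the “analogue” world, mix an audible 1 kHz tone with an inaudible 30 kHz one, and apply an anti-aliasing filter before sampling down to 48 kHz.

```python
import numpy as np
from scipy import signal

fs_analogue = 192_000   # a stand-in for the "analogue" world
fs_target = 48_000      # the sample rate we actually want

t = np.arange(fs_analogue) / fs_analogue
audio = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 30_000 * t)

# Anti-aliasing filter: pass everything below ~21 kHz, reject everything
# above the target Nyquist frequency (24 kHz) -- our "sunglasses".
taps = signal.firwin(511, cutoff=21_000, fs=fs_analogue)
filtered = signal.lfilter(taps, 1.0, audio)

# Now sample down to 48 kHz by keeping every 4th sample.
sampled = filtered[:: fs_analogue // fs_target]

# One second of audio, so FFT bin k sits at k Hz.
spectrum = np.abs(np.fft.rfft(sampled)) / len(sampled)
print(f"1 kHz tone level:                    {spectrum[1_000]:.4f}")
print(f"30 kHz tone (would alias to 18 kHz): {spectrum[18_000]:.6f}")
```

Without that filter, the 30 kHz tone wouldn’t just stay inaudible – it would fold back (“alias”) to a very audible 18 kHz. The filter is what stops the stuff we can’t hear from ever becoming stuff we can.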
A video camera that recorded ultraviolet light would certainly reproduce a more “accurate” version of the original view – but we still wouldn’t be able to see the extra information. All that extra UV light would do is give us a sun-tan while we watched it!
In the same way, recording additional high-frequency content may give a closer representation of the original audio signal, but we still won’t be able to hear it. Remember, the random dither noise removes any imaginary restrictions of “resolution” in the signal – so all the sample rate does is extend the high-frequency response.
In a properly implemented digital system, increasing the sampling frequency above the limits of our hearing doesn’t improve “resolution”, it just increases bandwidth.
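Concretely, the bandwidth on offer is always just half the sample rate – the Nyquist frequency:

```python
# The highest frequency a sample rate can represent is half that rate --
# the Nyquist frequency. Human hearing tops out around 20 kHz.
for fs in (44_100, 48_000, 96_000, 192_000):
    print(f"{fs:7,d} Hz sampling -> bandwidth up to {fs // 2:7,d} Hz")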
In fact, high sample rates can even make things sound worse in some cases.
Digital audio is not flawed
Many of the analogies in this post are borrowed wholesale from the legendary DSP engineer Paul Frindle, whose “Oxford” plugins are widely regarded as some of the best-sounding in the business. And part of the secret is that he’s always understood that dither is a key requirement for great-sounding digital audio.
He summarised all this very clearly in a conversation I had with him on Facebook a while ago:
“The thing is that there is actually no difference between digital and analogue signals – all have a dynamic range set by the ratio between the max level and noise. The difference is that analogue comes with its own noise (caused by the reality of signal in the physical world) whereas any digital representation in math requires us to re-insert the physical random component the math does not provide us. [i.e. dither – Ian]
It is a theoretical requirement of the system, it doesn’t mask the distortion – it removes it… ANY digital data representation of a signal in the real world has artificial certainty (which reality doesn’t) and it has to be removed for the signal to be harmonically accurate – i.e. like a signal in the real world… It’s a deep subject that shows our math is an artificial human approximation of reality – but the approximation has too much certainty. Fascinating implications to that concept…”
The “stair-steps” you see in your DAW when you zoom in on a digital waveform only exist inside the computer. (The spoon only exists inside the Matrix!) When digital audio is played back in the Real World, the reconstruction filter doesn’t reproduce those stair-steps – and the audio becomes truly analogue again.
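You can verify that with a few lines of maths. Here’s a minimal sketch (my own illustration) of the ideal Whittaker–Shannon interpolation that a real reconstruction filter approximates: take deliberately coarse samples of a sine wave and rebuild the continuous waveform between them.

```python
import numpy as np

fs = 8                                   # a deliberately coarse sample rate
n = np.arange(-256, 256)                 # sample indices
samples = np.sin(2 * np.pi * n / fs)     # a 1 Hz sine, far below Nyquist (4 Hz)

# Ideal (Whittaker-Shannon) reconstruction: a sinc function centred on
# every sample -- this is what a real reconstruction filter approximates.
t = np.linspace(-2, 2, 1001)             # a fine time grid, in seconds
recon = np.sinc(fs * t[:, None] - n) @ samples

error = np.max(np.abs(recon - np.sin(2 * np.pi * t)))
print(f"max deviation from the original smooth sine: {error:.1e}")
# Small, and it shrinks further as the window of samples grows --
# there are no stair-steps anywhere in the reconstructed waveform.
```

The values “in between” the samples were never lost at all – for a band-limited signal, the smooth curve through the samples is unique, so the samples are all we ever needed.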
So if the recording, processing and playback systems are working correctly, you will hear a perfect representation of the original analogue audio – up to the frequency specified by the sample rate, and with a noise-floor determined by the bit-depth.
Therefore as Paul says, the concept of “resolution” is irrelevant in a correctly-engineered digital audio system.
There is no spoon.
Let’s stop worrying about the numbers and get back to work on the music.
Soon after I wrote this post, Chris Montgomery from Xiph.org – author of this excellent blog post – made a video clearly demonstrating exactly the topics I’m discussing here. If you want to see the proof for yourself, I strongly recommend you watch it: