

There are no “stair steps” in digital audio ! What The Matrix can teach us about “resolution”

by Ian Shepherd


 
 
Remember this sequence from the Matrix ? “There is no spoon”.

Well, recently I’m hearing people talk more and more about “resolution” in digital audio, and I’m here to tell you -

There is no resolution.

It’s a red herring – an idea-virus left over from the earliest days of digital audio, perpetuated by gear manufacturers to try and sell us more kit we don’t need. Here’s why.

It all starts with the myth:

“Digital can never sound as good as analogue”

This statement simply isn’t true, but it doesn’t stop people repeating it like some kind of mantra. The reasons they give usually hang on the fact that digital audio samples the signal – “freezing” it at regular moments in time – and the claim that it can therefore never sound as smooth and continuous as the original analogue signal.

You can see it for yourself, they say. Zoom in far enough on a digital waveform and eventually you can see the blocky, grainy, digital “stair-steps” – so it stands to reason that you can hear them, if your hearing and equipment are good enough, right ?

Wrong.

There are no stair-steps.

The same flawed reasoning is used to explain why we need ever-higher bit-depths and sample rates – since digital audio contains these audible “pixels”, the smaller they are, the better it will sound, supposedly.

Wrong again.

And it’s easy to see why.
 

Is digital audio inherently flawed ?

Hold your hand up in front of your face, close one eye and look at the world through your fingers.

You’re “sampling” what you can see.

In between your fingers you can see the world, and where your fingers block out the light, you can’t see the view.

This is sampling. Breaking the world up into chunks and using the chunks to record what we see – or in audio, breaking the waveform into samples and recording those to store what we can hear.

Afterwards, we re-assemble the chunks to play back the audio. But surely this method is fatally flawed ? Everything between the samples, everything blocked out by your fingers, is lost forever. “Quantising” the image into slices like this introduces a fundamental error.

The thinner your fingers, the finer and more accurate the picture of the world, sure – and the more frequent the samples (ie. the higher the sample rate) the more accurate the audio reproduction, the higher the resolution, right ?

But digital audio can never be perfect, because the sampling steps are always there ?

No.

This analogy misses out a fundamental part of the digital audio equation.
 

A hand-waving analogy

Look through your fingers again, at your sliced-up, sampled view of the world.

Now start to wave your fingers up and down slightly – move them faster and faster.

Everything starts to look a bit flickery, a bit less crisp and clear – but suddenly you can see all the details of the view.

Because your fingers are moving, you can see between them for some of the time, and this enables our eyes to build a complete view of the image behind. Not by guessing the missing pieces, as we have to when the hand is still, but actually by seeing all the information, over time.

We have added noise to the system – the hand-waving – but the noise reveals what would otherwise be hidden behind our fingers, in between the sampling steps – it actually removes the quantisation error.

(Actually strictly speaking it de-correlates the error from the input signal, changing it into noise instead of truncation distortion – but the main point is it sounds good.)

In digital audio, this noise is called dither.

And it solves the problem of quantisation distortion completely.

Sure, we’ve had to add a little noise, but it’s nowhere near as crude as the analogy of looking through your waving fingers – that would be more like 8-bit audio. That amount of noise would be huge in comparison to what we’re trying to hear, but at 16 bits or more the noise is very quiet.

At 16 bits the dither noise is actually quieter than the original natural noise that was in the audio in the first place. At that point, the extra noise is virtually irrelevant. But crucially, there are never any stair-steps – we can always hear everything that was there to begin with, just with more or less noise.

Even in the crude, noisy, 8-bit “through the fingers” version we can still see the entire, smooth, original image – it’s just very noisy. The same applies to audio – take a listen for yourself:

[Embedded audio examples]
In a properly implemented digital system, increasing the bit-depth doesn’t improve “resolution”, it just reduces noise.
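If you want to see this in numbers rather than analogies, here’s a minimal sketch in Python (NumPy only – the function names, signal level and bit depth are my own choices for illustration, not taken from any particular tool). It quantises a very quiet 1 kHz tone to 8 bits, with and without TPDF dither:

```python
import numpy as np

fs = 44100                                    # sample rate in Hz
t = np.arange(fs) / fs                        # one second of samples
tone = 0.01 * np.sin(2 * np.pi * 1000 * t)    # a very quiet 1 kHz sine (about -40 dBFS)

def quantise(x, bits, dither=False):
    """Quantise x to the given bit depth, optionally adding TPDF dither first."""
    step = 2.0 / (2 ** bits)                  # one quantisation step (full scale = +/-1)
    if dither:
        # TPDF dither: the sum of two uniform random values, one step peak each
        x = x + (np.random.uniform(-0.5, 0.5, x.shape) +
                 np.random.uniform(-0.5, 0.5, x.shape)) * step
    return np.round(x / step) * step

def spectrum_db(x):
    """Hann-windowed magnitude spectrum in dB (1 Hz per bin for a 1-second signal)."""
    win = np.hanning(len(x))
    mag = 2 * np.abs(np.fft.rfft(x * win)) / np.sum(win)
    return 20 * np.log10(mag + 1e-12)

plain    = spectrum_db(quantise(tone, bits=8))                 # truncation only
dithered = spectrum_db(quantise(tone, bits=8, dither=True))    # truncation + dither

# Without dither, harmonics of the 1 kHz tone poke out of the spectrum (distortion);
# with dither they vanish, leaving the tone intact above a flat noise floor.
for freq in (3000, 5000, 7000):
    print(f"{freq} Hz: undithered {plain[freq]:6.1f} dB, dithered {dithered[freq]:6.1f} dB")
```

You should find the undithered version has clear distortion products at 3, 5 and 7 kHz, while the dithered version has nothing there but noise – no steps, no “resolution”, just a noise floor.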
 

Green-tinted spectacles ?

Look through your fingers again. Now imagine you’re wearing a bright green sun-shade.

You’re in the Matrix. Everything you can see is tinted green. Even though our finger-wobbling dither prevents there from being any “gaps” in the samples we can see, all the red and blue has been filtered out – we’ve put a high and low-pass filter on our vision, just like we do in digital audio.

Because (say the digital sceptics) we’re only recording a limited frequency range, right ? Audio doesn’t stop at 20 kHz, so why do we stop sampling it there ? Higher sample rates will give us a more accurate representation of the original signal, so it must sound better, right ?

Wrong.

Swap the green Matrix sun-shade for a pair of photochromic sunglasses – the kind that go dark in bright sun and clear in the shade.

Suddenly we have our colours back – removing the high and low-pass filters was a good thing, right ? The photochromic lenses adjust the “recording levels” of the light coming in to a comfortable level for our eyes, and we can see everything again.

Well, not quite.

Those filters weren’t actually removed at all – they’re just working at different frequencies.

All decent sunglasses always filter the incoming light, not just to reduce it to a comfortable brightness for us, but also to protect our eyes from harmful UV radiation which could otherwise damage our vision.

Even when they’re completely clear, the photochromic lenses are still filtering out all that very high-frequency light. We don’t notice though, because we can’t ever see it – with or without the high-frequency filtering sunglasses.

Digital audio does the same thing. The anti-aliasing filter (when recording) and the converter’s reconstruction filter (when playing back) remove all the unnecessary high-frequency information from the signal, allowing us to hear as much as we like of the original signal, depending on the sample rate.
 

We don’t need what we can’t hear.

A video camera that recorded ultraviolet light would certainly reproduce a more “accurate” version of the original view – but we still wouldn’t be able to see the extra information. All that extra UV light would do is give us a sun-tan while we watched it !

In the same way, recording additional high-frequency content may give a closer representation of the original audio signal, but we still won’t be able to hear it. Remember, the random dither noise removes any imaginary restrictions of “resolution” in the signal – so all a higher sample rate does is extend the high-frequency response.

In a properly implemented digital system, increasing the sampling frequency above the limits of our hearing doesn’t improve “resolution”, it just increases bandwidth.

In fact, high sample rates may even make things sound worse, in some cases.
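If you’d rather test that than take my word for it, here’s a rough sketch along the same lines (Python again, using SciPy’s resample_poly – the example and figures are mine). It round-trips a 96 kHz signal through 44.1 kHz and back: a 10 kHz tone, comfortably within our hearing, survives essentially untouched, while a 30 kHz tone is simply filtered out.

```python
import numpy as np
from scipy.signal import resample_poly

fs_hi = 96000                                   # the "hi-res" rate
t = np.arange(fs_hi) / fs_hi                    # one second at 96 kHz

audible    = np.sin(2 * np.pi * 10000 * t)      # 10 kHz - well within our hearing
ultrasonic = np.sin(2 * np.pi * 30000 * t)      # 30 kHz - above anything we can hear

def through_44k1(x):
    """Round-trip a 96 kHz signal through 44.1 kHz and back (96000:44100 = 320:147)."""
    return resample_poly(resample_poly(x, 147, 320), 320, 147)

def rms_db(x):
    """RMS level in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

mid = slice(fs_hi // 4, 3 * fs_hi // 4)         # ignore the resampler's edge effects

# The audible tone comes back almost exactly as it went in; the ultrasonic one is
# removed by the anti-alias / reconstruction filtering - bandwidth, not "resolution".
print("10 kHz round-trip error:", round(rms_db((audible - through_44k1(audible))[mid]), 1), "dB")
print("30 kHz level after trip:", round(rms_db(through_44k1(ultrasonic)[mid]), 1), "dB")
```

The only thing the higher sample rate “adds” is the ultrasonic content – which, like the UV light, we can’t perceive anyway.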
 

Digital audio is not flawed

Many of the analogies in this post are borrowed wholesale from the legendary DSP engineer Paul Frindle, whose “Oxford” plugins are widely regarded as some of the best-sounding in the business. And part of the secret is that he’s always understood that dither is a key requirement for great-sounding digital audio.

He summarised all this very clearly in a conversation I had with him on Facebook a while ago:

“The thing is that there is actually no difference between digital and analogue signals – all have a dynamic range set by the ratio between the max level and noise. The difference is that analogue comes with its own noise (caused by the reality of signal in the physical world) whereas any digital representation in math requires us to re-insert the physical random component the math does not provide us. [ie. dither - Ian]

It is a theoretical requirement of the system, it doesn’t mask the distortion – it removes it… ANY digital data representation of a signal in the real world has artificial certainty (which reality doesn’t) and it has to be removed for the signal to be harmonically accurate – i.e. like a signal in the real world… It’s a deep subject that shows our math is an artificial human approximation of reality – but the approximation has too much certainty. Fascinating implications to that concept…”

The “stair-steps” you see in your DAW when you zoom up on a digital waveform only exist inside the computer. (The spoon only exists inside the Matrix !) When digital audio is played back in the Real World, the reconstruction filter doesn’t reproduce those stair-steps – and the audio becomes truly analogue again.

So if the recording, processing and playback systems are working correctly, you will hear a perfect representation of the original analogue audio – up to the frequency specified by the sample rate, and with a noise-floor determined by the bit-depth.
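To put rough numbers on those two limits – my own back-of-envelope figures, using the usual textbook approximations rather than anything quoted from Paul – the bandwidth is simply half the sample rate, and the signal-to-noise ratio of a well-dithered system works out to roughly 6 dB per bit:

```python
def bandwidth_khz(sample_rate_hz):
    # Nyquist: the highest frequency the format can carry is half the sample rate
    return sample_rate_hz / 2 / 1000

def approx_snr_db(bits):
    # Textbook figure for an ideal quantiser (~6.02 dB per bit); proper dither
    # trades a few of those dB for complete freedom from distortion
    return 6.02 * bits + 1.76

print(f"44.1 kHz / 16-bit: {bandwidth_khz(44100):.2f} kHz bandwidth, ~{approx_snr_db(16):.0f} dB SNR")
print(f"96 kHz / 24-bit:   {bandwidth_khz(96000):.2f} kHz bandwidth, ~{approx_snr_db(24):.0f} dB SNR")
```

(The exact SNR figure shifts a little depending on the dither used, but the scale of it doesn’t change.)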

Therefore as Paul says, the concept of “resolution” is irrelevant in a correctly-engineered digital audio system.

There is no spoon.

Let’s stop worrying about the numbers and get back to work on the music.
 

Update:

Soon after I wrote this post, Chris Montgomery from Xiph.org, author of this excellent blog post, made a video clearly demonstrating exactly the topics I’m discussing here – if you want to see the proof for yourself, I strongly recommend you watch it:

[Embedded video: Monty’s digital audio demonstration from Xiph.org]

And for more posts like this one, subscribe to the newsletter.
 
 


26 Responses

  1. Pedro Blanco says:

    great post;)

  2. Kahlbert says:

    Amen to that. :-)

  3. DonB says:

    Excellent. Last line says it all.

    Dither is a difficult concept to grasp, I guess because it’s a mathematical way of dealing with noise, which is something we normally think of as unwanted. But as you explain, noise is part of the real world.

    The other dimension of digital audio, sample rate, is where there are other misconceptions. (I believe you posted on that recently, as well.) Thinking that finer slices are needed to reconstruct the original audio more and more accurately does not take into account the fact that audio is a combination (addition and subtraction) of sine waves of various frequencies. Those waves can be reconstructed precisely by good converters. No compromise. All that within the parameters of the sampling frequencies, of course. (sample rate = 2*frequency). With oversampling and anti-aliasing filters, there is no need for converters to sample above audible frequencies.

  4. Paula says:

    I love it. Fantastic explanation of a very foreign concept.

  5. scarr says:

    Excellent article and points. The dither video gave me a much better understanding/appreciation, and I basically majored in DSP.

    Too few people understand the mathematical realities of digital audio, and sometimes of human ears/perception, and buy into all sorts of ideas that are contradicted by the evidence. Alchemy still survives in the modern audio world!

  6. Andy M says:

    Great post – really clear and useful. So does that mean that I’m being silly when I work at 96kHz? Should I just stick to 44.1? It would be great if that were so as it would save a lot of bandwidth and I could use more plugin processing. I read an interview with Joe Chiccarelli in SOS where he talked about this. Quote: “…in 96K there’s more air, more clarity, more depth, more separation. For me, it’s much easier to mix songs in 96K because of this, especially with sessions that are 100 tracks or more. When you have that many tracks at 44.1 or 48, the mix tends to get all mushy and unclear. At 96K, it’s possible to retain a lot more separation.” Is he fooling himself? Am I?

    He goes on to draw an analogy to recording on tape at 15ips and 30ips, and also mentions that when his studio moved from Pro Tools 9 to Pro Tools 10, he noticed a dramatic difference. Could that simply be due to Pro Tools 10 using 32-bit instead of 24?

    I understand that using higher bit rates gives you a lower noise floor, and I understand also that we can’t hear the added frequencies that are gained at higher sample rates. But is it really true that there’s no audible difference between 96K sampling and 44.1K? Help – I’m confused.

  7. Ian Shepherd says:

    Hi Andy,

    I wouldn’t say you’re being silly – but personally I think a large part of that bandwidth is wasted. I’d be surprised if anyone can hear the difference between 48 and 96.

    There are lots of engineers who swear by 96kHz and higher – I can’t say for certain they’re wrong, because I haven’t heard the material they used to reach those conclusions. But I do know how astonishingly difficult it is to create a fair test – and how easy it is to deceive ourselves. For example, take a look at this post:

    http://productionadvice.co.uk/can-you-trust-your-ears/

    I’m currently debating with a DSP engineer about the value of 32-bit float (PT10) over 24-bit fixed (PT9) and my personal opinion is that if he uses a lot of plugins, then yes it could well make a difference.

    To answer your final question clearly – no, I think there IS an audible difference on many systems between 44.1 and 96 kHz – more to do with the practical design considerations than the theory. I think there is unlikely to be a real-world difference between 48 and 96, though.

    You will probably enjoy this video:

    http://www.wired.com/gadgetlab/2013/02/sound-smart-watch-this-excellent-primer-on-digital-audio/

    - in fact I’m going to add it to the end of the post.

    Also, check this out:

    http://people.xiph.org/~xiphmont/demo/neil-young.html

    Cheers,

    Ian

  8. Andy M says:

    Thanks, Ian. These posts and the links you provide have really helped me understand all this in a new way.

    I think a lot of ‘magical thinking’ goes on around audio, so it’s good to get a firm grasp of the science and understand why certain things just are so. Your account of the double-blind trial is fascinating. It reminded me of a book I think everyone involved in making recorded music should read – ‘Perfecting Sound Forever’ by Greg Milner. His account of his own experience of blind testing towards the end of the book is salutary – he finds his familiar moorings shifting out from under him in a way that sounds quite disconcerting. Can you trust your ears? Well, you can certainly train them. After that, I think it really is a matter of trying not to let yourself be fooled. And that is a lot harder than we often think.

  9. Andy M says:

    Oh dear. Just tried the intermodulation test files at http://people.xiph.org/~xiphmont/demo/neil-young.html.

    I put them into Cubase running at 96K/24-bit. Output is via a Fireface 800 running through a Mackie Big Knob into Genelec 8040 monitors.

    Result: epic fail. Every one of the test files produces highly audible artifacts. The worst is the 30 & 33 kHz file, which produces a piercing whistling tone. But they all sound – to use a technical term – horrible.

    *Smacks head repeatedly on desk – thought of starting current project again too awful to contemplate – consoles self with thought that if the music sounds all right, it probably is all right.*

  10. Ian Shepherd says:

    Andy,

    Bear in mind you need to test where the problem occurs – could be the converter, the Mackie or the Genelecs… or any combination !

    Ian

  11. dum looking says:

    hand waving analogies? how bout this one: watch a basketball game, at the game (not on TV). Try to watch how all players are moving and positioning themselves. Follow the ball. Now do the same thing waving your hand in front of your eyes. Can you really say that you won’t miss more of what’s going on than you would have without that hand waving? My point is of course that while things in the world are not moving, or are moving slowly, your analogy works in favor of your main argument. Otherwise it works against it.

  12. Ian Shepherd says:

    My analogy still stands.

    As I said, the “fingers” analogy is similar to 8-bit digital audio – ie. very noisy. Even at the game you’ll see all the action, it will just be hard work because of the hand-waving noise. If your fingers were very narrow and vibrated very quickly, you’d have no problem seeing everything.

    But also, it’s an analogy, not a proof :-)

  13. Colin Young says:

    Apologies, I left this in the ‘comments’ as well, I think I should have gone here first? Facebook noob…

    Good article but it doesn’t tell the whole story, in my opinion.

    I am a keen devotee of Hi-Res and enjoy it immensely, yet I struggle to convey any of this to people who will insist that CD is perfectly fine and ‘all you need’.

    This article, and many hundreds like it, suppose that you need only set your sample rate and bit depth to cover the bases on two elements: amplitude and frequency response. It’s almost written in stone as the ‘Nyquist theory’. That is the end of the argument for most people.

    In order to understand what I’m going to set out below, you need to take a fundamental leap of faith and begin thinking the previously unthinkable: that the Nyquist theory isn’t everything. Yikes. But if you don’t accept what I say, you can go back to the theory as you prefer, I won’t mind. Honest :-)

    There is a third element to audio: Timing.
    It is what allows us to hear WHEN something happens, not how loud it is or what frequency it is. In this way what we think of as ‘resolution’ can now be described as a three-point plot with a new and different name ‘focus’. Amplitude + frequency response + timing = focus.
    It’s this ‘focus’ which vinyl and Hi-Res fans are trying to describe, and in most cases failing, when they talk about their favourite subject, because they’re not even aware of it themselves, such is the power of the prevailing opinion otherwise. The power of Nyquist I shall say.

    Consider this aspect: If two separate but otherwise identical sonic events occur within the time of less than the sample rate, the sample rate cannot convey the timing. (Strictly speaking it would convey some of it but with blurred boundaries).
    This wouldn’t be important but for the fact that our ears/brain CAN distinguish the timing of events down to timing levels well under the rate of most digital audio (44.1kHz). It’s what helps us to locate things in the real world in 3 dimensional space.
    It is this ‘invisible’ element of audio that people tend to overlook and fail to measure.

    Crucial to the argument of analogue vs digital: It comes built in to analogue audio but must be deliberately allowed for in digital.

    It gets a little more tricky to explain here… It is very difficult to measure such a phenomenon in the real world since the very existence of ‘focus’ and thus its ‘quantity’ is governed by the ear/brains ability to distinguish timing. A very hard thing to measure but it has been done. There is a very impressive article on the subject here: http://phys.org/news/2013-02-human-fourier-uncertainty-principle.html.
    Our ability to do this is incredibly sensitive and has been measured in some individuals down to a timing variation of as small as 1/120,000th of a second. Although that is very unusual, the average ability is easily less than the timing of two samples on a CD.
    It is nothing less than an entirely fundamental component of what we hear and it is being ignored routinely by engineers who should really know better.

    Hey, best of all, it doesn’t diminish with our age and the associated diminishing ability to hear frequencies… in fact it improves with age and experience :-) Until you go truly deaf that is :-(

    There is another complicating but vital ingredient to our ability do these things: It is not a ‘given’. It is not a natural ‘ability’ but must be learned. When you pop out of the womb, you cannot ‘hear’ anything. You must learn what it all means. Same as seeing. In this way you must learn what your capabilities are with regard to what the limits of your ‘focus’ can be.
    More often than not, this is a naturally occurring process that requires no ‘thinking’ but it can be greatly helped by actively making yourself do it. Not always, for some it’s impossible but for others it’s obvious. That’s about as snobby as I’ll get on the issue, this isn’t a question of snobbery, it’s a perfectly normal human activity that anyone can try.

    The effects and the existence of ‘focus’ are always all around us but we don’t ‘think’ about usually so in order to grasp the nettle you need to stop listening and merely experience, then dissect the experience without doing any damage to the reality, a tricky task! Yet once you’ve done it, you can ‘get back there’ much easier next time.

    I have a favourite example of ‘focus’ in action: if you find the ‘sweet spot’ between two speakers that have a narrow stereo image field (in other words that require accurate positioning of the listener), then you deliberately muddy the sweet spot by moving your head a foot or so along the plane of the two speakers, then do the math, you’ll find that the difference in the time it took for each of the peaks from both speakers to reach your ears falls well outside the resolution of 44/16. It falls somewhere about half-way between any 2 samples! This example is crude of course and isn’t entirely descriptive of the problem but goes to show how sensitive our ears are to the phenomenon of timing. You are of course changing the amplitude, lowering one speaker’s as you move away, increasing the amplitude as you get nearer to the other, but the ‘smear’ that you hear, the lack of ‘focus’ is caused by the timing issue NOT the amplitude. At which point I need to mention that if you don’t hear a ‘smear’ then your ears/brain are not trained enough :P

    So what about the numbers? Well, it’s a tough one and I don’t have accurate answers for you (further research is necessary I think!) but if you take a very basic approach to it along the lines of setting your sample rate and bit depth to allow for the Nyquist theory + a good dose of extra samples to cover timing accuracy for ‘focus’ then you will be approaching something like twice the rate of CD, minimum. So: 96/24, I would argue, should be about right to cover most instances where timing is crucial. In a sine wave, it is not. In a recording of a riveted ride cymbal, it might be the difference that allows you to differentiate between two of the rivets. An impossible mental task, granted, but one that your ear/brain is perfectly capable of on a subliminal level and the effects of which, amplified by a whole musical ‘picture’ can lead to a whole new way of enjoying pre-recorded music.

    I should add that dither has no real effect on this issue being as it is, concerned with effects which exist no matter what the sample rate.

    I hope that helps… but I know it’s an argument that usually fails when confronted with the apparent ‘finality’ of the Nyquist theory. So, get over it!

  14. Ian Shepherd says:

    Hi Colin,

    Well – you said it was OK to disagree, so:

    Watch the video at the end of the post at 20’52″ – it debunks the idea that digital systems can’t resolve events that fall between two samples – they absolutely CAN.

    As for the article you linked to, it seems fascinating but it’s all about our perception of sound, not the way we store it (digital or analogue). I don’t really see how it connects with the “timing” you’re discussing – in fact I’m pretty sure it doesn’t.

    The fact that we can hear better than simple models of hearing suggests doesn’t surprise me (although it’s very interesting) but it has nothing to do with the basic physics that allow us to describe a recorded audio signal.

    In other words, our ears are capable of incredible feats when listening to audio of all kinds, and we don’t yet understand how they do it. But this has nothing to do with the way we attempt to record it. Our understanding of THAT is very complete, even though the results still fall far short of reality, in either analogue or digital…

    Another question – how were the experiments you mention conducted ? Do you have any more links you can share ?

  15. Jon Hargreaves says:

    Ian,
    As I think you know, I come to this knowing very little… I now know a lot more and found all of this fascinating, it has certainly helped me straighten out a few misconceptions I had re teaching limited parts of this at A level Physics standard.
    However, I especially found Colin’s comments about “timing” interesting. What I don’t get is how you can say “our ears are capable of incredible feats when listening to audio of all kinds, and we don’t yet understand how they do it. But this has nothing to do with the way we attempt to record it. Our understanding of THAT is very complete” and yet you follow this statement up with: “the results still fall far short of reality” – they certainly do, so why so dismissive that this maybe one of a number of other points to consider to improve recording outcomes?
    Just out of interest can you point me to any articles that explain why there is such a discrepancy and why it would appear we’ll never get there?

  16. Ian Shepherd says:

    Hi Jon,

    Great question !

    Firstly, high sample rates don’t give an improvement in “timing” over analogue – this is just another misconception, as Monty’s video shows.

    But more importantly, the major limitation on our ability to record sound is the microphone, and the major limitation on reproducing it is the speaker. Our ears “are capable of incredible feats”, but our recording techniques aren’t. A binaural head is pretty good, but still not as good as being in the room with live musicians. NOTHING comes close to that, imo.

    The best we have is two (or maybe 6) channels of electronic signal going down some wires. It’s irrelevant whether we store that digitally, on tape, on vinyl or wax cylinder – there’s no magic that can make it sound better than the original signal, which is why we fall so far short.

    We’re very good at storing that signal, and understand it really well – but the original signal just doesn’t come close to what our ears can do, just as even HD video doesn’t come close to what our eyes see in the real world.

    Of course that hasn’t stopped some outstanding recordings being made over the years, (otherwise why would I bother !) and actually I’m not sure I agree we’ll never get there – who knows what new recording techniques are available over the horizon.

    But for now, the whole analogue/digital thing is a complete red herring.

    Does that make sense ?

    Ian

  17. Colin Young says:

    Hi all,

    First of all let me apologize for the length of this… I’m trying to get a simple concept across and failing to do so concisely. Ah well.

    Jon, Ian’s not dismissing me. What he’s saying is correct and irrefutably so. I still feel there is another element to it all that isn’t being explored and it’s that which concerns me. I’m not arguing with anything that is described above (that’s if I understand it!). Monty’s video is an impressive and very clear example of how to get the basics of digital signal processing across. It couldn’t be better made. I mean that sincerely. Better shirt maybe… but it tells the story as it’s widely accepted to be true.
    So why is it still not enough for me? I admit… I’m struggling with a way to put my ideas about timing across in as clear a fashion, though it’s clear ‘in my head’. I’m seeing a problem and, though the fruit is dangling just out of reach, it may still be an illusion caused by my own inability to grasp the concepts. I’m trying, honest.
    I’ve gone over the video several times and paid very close attention to the part which Ian mentions, where Monty shows that the timing of any one event can be very accurately reproduced in time, between samples.
    My problem is that I don’t see how this adequately explains what happens when these events pile up and become congested ‘in time’. I see this as the moment where a 44.1/16 digital signal becomes unable to resolve different events in TIME as well as our ears and brain can do.
    This brings me to mention that I want to stress the properties of sound, as opposed to the ability of a system to represent it, for now. Sound is composed of the elements we all know and love; frequency and amplitude. Frequency is broadly speaking a function of the relationship of events, in time. Amplitude. err, speaks for itself. But I keep seeing a big shiny THIRD element in all this and it’s the time at which all these things occur. To get back to frequencies, briefly, you can use the overall frequency to describe a note (and its harmonics) but how do you describe the shape of the decay of a peak? The shape of the peak? All these things combine to form ‘texture’. They can’t just be meaningless subjective terms, they must have a real world technical explanation so where does it lie? I say it lies in the relationships of all the elements, including time.

    The explanation of dither (excellent) above is as good a place as any to place the inconvenient truth as I see it. If dither is just random noise, it does still have a value at any given location along the sample chain. I contend that our ears are, if never going to be able to hear the events caused by dither (no frame of reference), at least capable of resolving real-world sounds, in time as well as in frequency and amplitude, to the degree necessary to create a problem with 44/16.

    But if the timing of an event can be anywhere between two samples, where’s the problem you ask? Shouldn’t the system be able to represent any event or combination of events, by overlapping the ability of the signals, between any two samples? Well, I just don’t see that argument holds water. Putting it terribly bluntly if you have a 2hz 2 bit digital signal then your ability to distinguish timing of events is going to be pretty irrelevant due to quantization errors and/or the necessary dither which would be a torrent of meaningless noise and not much else. You ‘cure’ this by throwing more numbers into the mix until the problem dissipates enough so as not to be noticeable. The thinking goes that if you throw enough numbers in to satisfy the Nyquist theorem, then you’ve covered the bases of all possible sounds, however I contend that, if you allow for the time domain, this process has no ceiling and that throwing even more numbers into the mix will continue to further resolve the actual time of sounds all the way to infinity (but not beyond). Not resolving ‘what’ but ‘when’.

    As more than adequately explained by Monty et al., dither is a way of masking quantization errors. It doesn’t deal with timing errors, actually it can’t even be used to describe timing errors. There is no conscious OR subconscious way to detect the ‘problem’ if it’s not there, right?
    The way I see it, whereas dither masks quantization errors, fewer samples have the effect of masking timing resolution errors. It doesn’t sound ‘wrong’ so much as it doesn’t sound all that can be right. You following my logic? What I’m seeing is that this is a truism that follows an infinite curve all the way up to an infinite number of samples and bit depths. Can we detect any of it? Obviously if we can’t then Monty’s right, Ian’s right and all that extra info is superfluous. But what if we can detect it?
    If there is a point at which the numbers really do become superfluous, what is that limit? I don’t know. My hunch is that 96/24 is adequate for most hearing systems (humans) and that 192/24 really is getting towards overkill but hunches are no proof. I do feel quite strongly that 44.1/16 bit is not enough to get as close as you need to be approaching full satisfaction. Feelings, again, are no proof.

    An example I like to play to my friends is the Hi-Res edition of Getz / Gilberto with ‘Girl From Ipanema’. I’ve had a CD of this for years and it sounds, you know, nice. But when I loaded the Hi-Res version the first thing I noticed, hard, was the incredibly life-like holography to the presentation. I could map out a mental picture of the studio (or something that manifested very much like it). It was really quite scary by comparison and it bears close comparison. To make sure I wasn’t just fooled by this being a different mastering, I downsampled the HR with dither, and listened to the same material on a CD instead. Holography gone. Try as I could, I couldn’t get that ‘space’ back. Put the HR back on and poink! There it was. Listener bias? I really don’t know if I can accept that. But it’s possible.
    Remember this was an analogue recording made nearly 50 years ago!!

    Ah yes, what of analogue systems’ lower dynamic range and frequency response? Well, a FR of 1 is required by nature to live in the same time dimension as a FR of 192k. A well presented analogue system should resolve timings of events effortlessly so FR is not where to look in analogue. The ear’s sensitivity to amplitude changes is much more important and has a closer link to timing, being as it concerned with ‘change’ rather than ‘state’ so actual ‘bit-depth’ or the analogue equivalent isn’t it either. It’s not about how much something can change, it’s about the change itself. Plenty of reading and personal experience has shown me that identifying a sound is chiefly done by analysing frequencies (what is it, what does it sound like?). Locating a sound is something else entirely and is a detection that is based on both amplitude and direction including spatial cues that include the time domain. Add memory to the mix and I say that this gives us the ability to listen to music in several different ways, or modes. I say that the sensitivity of our hearing systems to the time domain gives us an ability to get something of real value from a Hi-Res system, BUT that it doesn’t come in the standard recognized package. It’s something else, something other..

    I don’t have a great deal of extra reading for you although I found this article about spatial awareness that takes you through the concepts of sound as an environment:
    http://soundenvironments.files.wordpress.com/2011/10/experiencing-_aural_architecture.pdf

    and there’s the research done on how the human ear can outperform the limits set by the Fourier uncertainty principle with regards to resolving timing of events.
    http://phys.org/news/2013-02-human-fourier-uncertainty-principle.html

    For bibliophiles there’s also Oliver Sacks’ excellent ‘Musicophilia – Tales of Music And The Brain’ which is required reading in my household.

    These sources all have a common thread which is that our brains can alter what our ears are capable of, in addition to the natural taken-for-granted abilities we are supposedly born with. I think that’s also a very important element that needs further consideration because it directly affects the argument with regard to our ability to make use of the extra information in Hi-Res. For me, that’s where the argument lives. Can we or can we not use it?

    I’m one of those guys that can walk into a room and tell within seconds if a stereo speaker system is out of phase. It’s like hearing a single piano note, to me. Other people, and in my own experience sometimes even professional sound engineers (!) do not necessarily have this ability, though they may have others I do not. Where did I get it? I have no idea. Was I born with it? I very much doubt that. Is it down to experience? Hm.

    Can we be trained to do these things? I believe so. Does it matter? If you accept that there are powers in our hearing system that are beyond what we have come to see as the norm, and IF that includes the ability to resolve sounds in ways that a 44.1khz /16 bit system cannot represent, then of course it matters.
    Those are big ‘ifs’, I know… and IF I’m just simply wrong, then that’s it. I’m not trying to bash the door down however I hope I’ve got across the concept of what’s bothering me, if nothing else.

    Colin

  18. Ian Shepherd says:

    Colin,

    No offense but this is basic physics. EVERYTHING you’re wondering about is resolved in the electrical waveform. “Timing”, phase, everything. That may seem unlikely to you, but it’s true. The “timing” of events that you’re wondering about seems to be to do with transients – and these are entirely determined by the waveform.

    There are legitimate questions about what we can and can’t hear, and how we may do it – but a digital system simply can reproduce anything a microphone can generate, to any accuracy we like – limited only by frequency response and noise floor.

    QED…

    Ian

  19. Colin Young says:

    Ian,

    No offence taken (or ever intended). I’m a bad student, but for what I’ve learned recently I need to thank you (and Monty et al.).
    I now get it. The ‘basic physics’. I’m quite convinced that a digital system can resolve an event that ‘fell between two samples’.
    “…to any accuracy we like – limited only by frequency response and noise floor.”
    Lord help me, this new knowledge isn’t making my life any easier.

    OK, frequency – check… but it’s that noise floor part that still irks me, and isn’t it in that tiny area of accuracy where improvements ought to be available at higher resolutions? When events do pile up, does not noise become more of an issue than it might first appear? What is harmless in a simple sine wave might become more destructive in a more complex setting? An analogy with digital photography is inescapable here: namely that noise is often used as a way to obfuscate more glaring errors elsewhere in the make-up of the image. Same with sound? I give you dither. The technical reasons for dither are obvious. What may be less obvious is the overall effect of it?

    So you can’t hear the noise below, say -60dB on replay anyway. So a Dynamic Range of anything over that in a digital system combined with dither should be all you need, right? Again I can’t find a logical way to disagree with that statement but I can’t hear the benefits.

    It just won’t let me go (or is it that I won’t let it go?). I still prefer Hi-Res. I often find differences between 44/16 and 96/24.
    I admit I might just be driving myself crazy with listener bias or some other as yet unknown mechanism but the damn stuff won’t stop sounding better at HR. To me.

    Could it be, not the noise itself but the effects of the processing (dither) that you’re hearing in comparison? After all, what is left after your technical arguments have been proved? What IS the biggest difference between a well made 44/16 and a well made 96/24? I’m going to suggest that it’s the dither itself. In 44/16 the percentage of deviation from signal that is required to dither is measurably bigger than at 96/24. You’ll no doubt say I can’t hear this. Is that true?

    As you know I do restorations from vinyl recordings. I chose very early on to work with 96/24 despite the fact that virtually no signal content exists above 20K for pretty much all vinyl, because I noticed early on, things disappearing at the lower rate of 44/16. This still happens.
    I just finished a restoration on a Robert Fripp 7 inch… if you’re familiar with his material known as ‘Frippertronics’ which consists mainly of highly fuzzed out and sharply distorted guitar tracks, overlayed and tape-looped? The harmonics and interplay between some layers of guitar, where the notes are similar yet not exactly the same, are very pleasing to the ear at 96/24. Downsample to 44/16 (on equipment designed for the task) and listening back again on the very same equipment and… it’s gone. In fact, bizarrely, the effect also seems to be that you can hear the individual notes ‘better’ at 44/16 because the harmonic interplay is no longer there to listen to. WHAT is happening?? What changed?

    I’d like to be able to rule the equipment out, if for no other reason than I’ve tried just about all the variations of equipment out there, and the same effects (dis)appear time and again.

    I now wish I was simply making this stuff up because the search for the reason still eludes me. Is it still eluding others? Are we even asking the right questions? Your technical arguments are completely unassailable. So why do I still sense a positive difference at HR? Am I hearing stuff others don’t?? I’m certainly not trying to big up myself, but I can’t shake the feeling that there is stuff going on that I am worrying about when others are not.

    I may be driving myself crazy, but I don’t wish to drive anyone else crazy! I’d love your thoughts on this Ian,…

    Colin

  20. David Deckert says:

    Ian, I just found this topic of yours from last year.

    I appreciate what you’ve done for music lovers everywhere in highlighting some faction’s misguided fascination with killing dynamic range.

    But your article here is sadly irrelevant due to your own disclaimer, which is use of the word “if.” If not for that, I’d agree 100%!

    If only things worked as the theory tells us, this-and-that wouldn’t be needed. I empathize–no, I sympathize. The high end audio market is telling us we need to buy expensive new copies that hog our hard drives. BUMMER. Trust me–as a consumer I share that lament.

    But here’s why I buy that stuff: there’s a bunch of slop in the real world and sometimes, cash masks a few errors in otherwise idealized specs.

    Professionals work in a 24-bit space because in the real world, the noise floor of 16-bit files (while looking pretty darn good on paper, right?) causes problems. Amateurs like me buy various 24-bit files because they sound better, likely due to our DACS not always delivering (here’s that word again …) all the resolution you think it does.

    Seriously now, what is resolution if not absence of noise? Are we arguing semantics? That said, I like your dither example with the fingers.

    So … the stair steps. That was reasonably convincing actually, and the first time I’d read it that way. I won’t spend too much time arguing for higher sample rates here except to mention that “hearing” isn’t literally all about the frequency response your audiologist reports you’re capable of detecting. Our bodies have senses not always measured as experienced and yes that includes sound.

    Again … wiggle room for the “slop” we don’t yet know how to define. Bottom line: If it can or sometimes does sound better, PLEASE let that be sufficient reason, OK? More to the point, be an advocate for things you don’t necessarily understand or agree with, “just in case”. A cushion. The slop. Whatever label lets you sleep at night.

    Example: Years ago music was routinely engineered and sold in a technically flawed but nevertheless hifi format–the LP–most listeners could never fully appreciate, and until decades later most still haven’t. The proof is to take a decades-old LP and place it on a modern, decent system and the gem is revealed. Overbuild just a touch. History and experience tells us it’s good for the future. Staring at specs and theory got us “Perfect Sound Forever” and a realization a few years later that was A Mistake.

    As for Monty. Sorry, but he’s no hero in this. He speaks neither for professionals nor for enthusiasts. He thinks his Ogg Vorbis not only sounds better than other lossy files, but that it’s worth missing the forest for the trees by positing lossless files unnecessary – especially the kind you work with as a pro and that we listeners can sometimes appreciate. It’s all there in the link you published.

  21. Nick Murphy says:

    Colin,

    Interesting discussion. You’re probably struggling because the Nyquist Theorem isn’t strictly logical. Monty’s video really is a great demonstration of everything involved. Many audiophiles I know seem to rack their brains trying to come up with theories on what we “aren’t measuring.” There is so much mass delusion in the audiophile industry, it’s quite shocking to find out that properly-done 16/44.1 really can be “perfect,” just like the theorem says. It was quite a shock for me.

    Have you done any blind tests? If not, I strongly suggest you do so. It may relieve quite a bit of your frustration. Take one of your high-res files and convert it to 16/44.1 with good software such as Adobe Audition, Foobar2000, etc. Then convert it BACK to the original sample rate so you can be sure your software/DAC will play them back the same. (This is key because you want to make sure that the *only* difference you are hearing is the effect of reduction in bit depth and sample rate.) I would recommend using the Foobar2000 ABX plugin for simplicity. Take as much time as you want.

    If you can correctly identify the original hi-res file 95% of the time, that’s quite impressive. I suspect that if the conversion is done properly, you will not be able to. Google the “Meyer/Moran” test which was very well-done. The people in that test could no longer hear the “night and day” differences once they didn’t know in advance which resolution they were listening to. Regardless of *if* hi-res formats may make any audible difference, tests like those demonstrate that the effects of expectation bias are infinitely stronger. ;-)

    The great thing with the scientific method is that we don’t need to go looking for the explanation until the experience is verified and replicated by others. Meaning, until controlled testing shows that there is an audible difference with hi-res audio (it doesn’t yet), it’s a fool’s errand to waste time looking for why hi-res *might* sound better, while at the same time ignoring the much more likely explanations for anecdotal experiences. I know it was quite a relief for me to realize I did not need to pay “by the bit” for higher resolution recordings and store those large files!

    Nick

  22. Graham says:

    Unfortunately, just found this “http://www.sony.co.nz/microsite/hiresaudio/” about Sony’s new hi res audio converters, showing the claimed coarse stair step waveform that CD gives you, on an official Sony site! For shame! I have personally seen 20kHz sine waves on my late 70′s analogue oscilloscope that look like perfectly reconstructed sine waves that I was generating on my PC through the motherboard’s built-in audio chip at 16/44.1. An easy experiment to do yourself if proof is needed.

  23. Nick Murphy says:

    Graham,

    That’s terrible! Sony definitely knows better. They wouldn’t have been able to develop *any* digital technology without understanding how digital signals work. Unfortunately they seem to confirm that the only way to sell “hi-res” audio equipment is to pull out the old tired stair step graphs.

    Of course, they neglect to show a graph of the reconstructed signal, which would be identical whether it’s 16/44 or 32/384!

    Agree with you regarding the oscilloscope; that’s why I think the xiph.org video is so cool!

  24. Ryan says:

    A big part of the reason that audio at 48 or 96 vs. 44.1 sounds different/better has to do with the way filters work. This issue is almost always neglected in these discussions, but is essential to understanding why someone would want to listen to music in 48 rather than 44–it actually has nothing or very little to do with those frequencies above the threshold of normal human hearing that are captured in 48. Ian makes some great videos–I’d love to see him explain this issue.

  25. Ryan says:

    As an addition to the above comment, I can always point out 48 vs. 44 in blind testing (100% of the time), with a preference for 48. Again, because of the way the filter handles the signal, which changes the sound. I can never point out 16-bit vs. 24-bit in blind testing. If you look closely at the waveforms of 44, 48 and 96, they are distinguishable from each other. The waveforms of 16-bit and 24-bit are identical except for in extremely quiet passages, where you can see a very slight difference due, presumably, to the dither.

  26. Ian Shepherd says:

    Ryan, I agree about 48 versus 44. I’m not convinced about 96 versus 48 though, personally…
