You’ve probably heard by now that Spotify recently announced it will soon be possible for anyone to upload directly to their streaming service, without going through an aggregator like TuneCore or CD Baby.
What you may not have heard yet is that along with that, they’ve also published recommendations for the best format and specifications for your music when you do.
These include some interesting details (like the fact that they support 24-bit files) and confirm several things we already knew – that they’re using ReplayGain for loudness normalization, and that the default playback reference level is approximately -14 LUFS, for example.
There’s one suggestion that may raise a few eyebrows though, and that’s the recommendation that files should peak no higher than -2 dBTP (True Peak) – thanks to Christopher Carvalo for the heads-up.
[Update – Since I posted this last week, Spotify have updated their FAQ to clarify that the -2 dBTP recommendation only applies to material mastered louder than -14 LUFS. If your material measures -14 LUFS or lower, the True Peak recommendation is -1 dBTP]
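For context, loudness normalization works by applying a simple playback gain: the reference level minus the track's measured loudness. Here's a minimal sketch of that arithmetic – the function name and example values are mine, not Spotify's:

```python
def normalization_gain_db(track_lufs, reference_lufs=-14.0):
    """Playback gain (in dB) a loudness-normalizing player would apply:
    the reference level minus the track's measured integrated loudness."""
    return reference_lufs - track_lufs

# A loud master measuring -9 LUFS is simply turned down 5 dB at playback:
print(normalization_gain_db(-9.0))   # -5.0
```

The point to notice is that the louder you master, the more gain reduction gets applied – the extra loudness is simply undone.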
So why is this important ?
Mainly because it’s a much more conservative number than many people would expect. I’ve been mastering with peak levels no higher than -1 dBTP for years now, and recommending people do the same, but I still see people saying that True Peaks aren’t an issue “in the real world”. And Spotify’s guideline is even more conservative than mine.
The reason for the recommendation is simple – Spotify doesn’t stream lossless audio. They encode using Ogg/Vorbis and AAC data-compression methods to reduce bandwidth – like more sophisticated versions of mp3 encoding. These encoded streams sound pretty good, but reduce the file size by as much as ten times, to reduce the amount of data needed to get the audio from Spotify’s servers to our mobile phones and other playback devices.
There’s no such thing as a free lunch, though – to achieve this reduction in data-rate, the audio has to be heavily processed when it’s encoded.
What happens during encoding
VERY roughly speaking, the audio is split up into many different frequency bands. The encoder analyses these and prioritises the ones that contribute most to the way we perceive the sound, and throws away the ones we’re least likely to hear.
When the audio is decoded for playback later, the signal is rebuilt, and usually sounds remarkably similar to the original, despite all the discarded data. However, even though it sounds pretty close to the original, the audio waveform has typically changed dramatically – and one of the most noticeable differences is that the peak level will have increased.
And this is where the problem arises. If the audio was already peaking near 0 dBFS, the reconstructed waveform will almost certainly contain peaks that are above zero. That means the encoded file can cause clipping distortion that wasn’t present in the original, when it’s reduced to a fixed bit-depth for playback.
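To see how a waveform can peak higher than its own samples, consider the classic worst case: a tone at a quarter of the sample rate, phase-shifted so that every sample lands between the waveform’s actual peaks. This plain-Python sketch (illustrative values, not from Spotify) shows samples reading exactly 0 dBFS while the continuous waveform a DAC must reconstruct peaks about 3 dB higher:

```python
import math

fs = 48000
freq = fs / 4                 # tone at a quarter of the sample rate
phase = math.pi / 4           # 45-degree offset: peaks fall between samples

# Every sample lands at +/- sin(pi/4), about 0.707 of the waveform's peak
samples = [math.sin(2 * math.pi * freq * n / fs + phase) for n in range(16)]

# "Normalize" so the samples themselves read exactly 0 dBFS...
gain = 1.0 / max(abs(s) for s in samples)
samples = [s * gain for s in samples]
sample_peak_db = 20 * math.log10(max(abs(s) for s in samples))  # 0.0 dBFS

# ...but the continuous sine those samples describe now peaks at `gain`,
# i.e. sqrt(2), which is about +3.01 dBTP
true_peak_db = 20 * math.log10(gain)
```

Any process that nudges the waveform – like lossy encoding and decoding – can turn those hidden between-sample peaks into real sample values above full scale.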
In fact, it’s sometimes even worse than that. Encoded files store the data with “scale factor” information built in (a kind of coarse floating point), but many players convert the decoded audio to fixed-point immediately after decoding. Overs that survive decoding in floating point are harmless if the signal is turned down before playback – but when a player clamps them to fixed-point straight away, the clipping is “baked in” to the decoded audio, regardless of normalization or the final playback level.
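The “baked in” case can be illustrated with the naive float-to-16-bit conversion many players perform straight after decoding – the sample values here are hypothetical decoder output, purely for illustration:

```python
def to_int16(sample):
    """Naive fixed-point conversion: anything beyond full scale is clamped."""
    value = int(round(sample * 32767))
    return max(-32768, min(32767, value))

# Hypothetical decoder output containing reconstruction "overs" above 1.0
decoded = [0.5, 0.98, 1.12, -1.05]
pcm = [to_int16(s) for s in decoded]
# The overs clamp to full scale (32767 / -32768). That flattened waveform
# is now permanent: turning the volume down later can't restore what was
# lost, because the clipping happened before any playback gain is applied.
```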
(If you’re asking why the encoder doesn’t detect when this might happen and reduce the level automatically – great question ! And actually some do. But the answer for Spotify is almost certainly that users would complain. The simplest way to test an encoded file is to compare it directly to the original, and if the result is quieter than the super-loud result people have worked so hard to achieve, many users would be unhappy, even if the encode is cleaner as a result.)
What does all this have to do with True Peaks ?
There’s no way to know for sure if encoding will cause clipping, or how much – it depends heavily on the codec, the material and the data-rate, to begin with. Lower data rates require heavier processing, which causes bigger changes in peak level and potentially more encoder clipping.
The True Peak level gives a useful warning, though. It was introduced as part of the EBU R128 loudness specification, and gives a reasonable indication of when encoder clipping is likely to occur. Really loud modern masters can easily register True Peak levels of +1 or +2 dBTP, and often as much as +3 or +4 !
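A True Peak meter estimates those between-sample values by oversampling the signal before measuring it. Here’s a rough sketch of the idea in plain Python, using 4x oversampling with a Hann-windowed sinc interpolator – illustrative only, not the exact filter the spec defines:

```python
import math

def _interp_kernel(x, half):
    """Hann-windowed sinc, zero outside +/- `half` samples."""
    if abs(x) >= half:
        return 0.0
    if x == 0:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    hann = 0.5 * (1.0 + math.cos(math.pi * x / half))
    return sinc * hann

def estimate_true_peak_db(samples, oversample=4, half=24):
    """Estimate True Peak (dBTP) by evaluating the reconstructed
    waveform at `oversample` points between each pair of samples."""
    peak = 0.0
    for n in range(len(samples)):
        for k in range(oversample):
            t = n + k / oversample
            acc = sum(samples[m] * _interp_kernel(t - m, half)
                      for m in range(max(0, n - half),
                                     min(len(samples), n + half + 1)))
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")
```

Fed the quarter-sample-rate tone from the earlier example (sample peak reading 0 dBFS), this estimator reports roughly +3 dBTP – the between-sample overs a plain sample-peak meter completely misses.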
Files like that are virtually guaranteed to clip during encoding if they’re processed as-is, so it’s sensible to reduce their level before you supply them, to get the best quality encodes.
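As a starting point – and only a starting point, for the reasons below – the minimum trim is just the gap between the measured True Peak and the ceiling you’re aiming for. Function name and values are mine:

```python
def peak_trim_db(measured_dbtp, ceiling_dbtp=-1.0):
    """Gain reduction (dB) needed to bring True Peaks under a ceiling.
    Never boosts: a file already under the ceiling is left alone."""
    return min(0.0, ceiling_dbtp - measured_dbtp)

# A loud master measuring +2.5 dBTP needs at least 3.5 dB of reduction
# to meet a -1 dBTP ceiling:
print(peak_trim_db(2.5))   # -3.5
```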
The question is, how much should they be reduced ?
It’s impossible to say exactly without trying it. The harder the audio hits the limiter, and the lower the data rate, the bigger the changes in peak level during encoding and decoding, and the greater the likelihood of problems as a result – so there’s no one-size-fits-all solution.
Personally I don’t make super-loud masters, and have found that my suggestion of -1 dBTP typically produces very clean encodes, but we have to assume that Spotify’s recommendation is based on analysis of the files they encode. I’ve double checked some of my own recent masters, and found that using my own loudness guidelines I’m getting clean encodes, so I won’t be changing how I work because of this recommendation.
[Update – As I mentioned above, Spotify have updated their FAQ to confirm this – the -2 dBTP recommendation only applies to material mastered louder than -14 LUFS]
But certainly if you’re making mixes or masters that are hitting close to 0 dBFS, you should be thinking of starting to measure True Peaks and reduce the levels to avoid them, at the very least.
But the music is MEANT to be loud, why should we turn it down ?!
Well firstly because the encodes could sound better if you do. But also because it’s going to be turned down eventually, anyway ! Spotify uses loudness normalization by default, just like YouTube, TIDAL and Pandora. This means they measure the loudness of all the material they stream, and turn the loudest stuff down. This is done to stop users being “blasted” by unexpected changes in level, which is a major source of complaints. And even if users turn normalization off, they’re unlikely to run the software with the volume at maximum !
So even if you’re in love with the super-dense sound of your music, reducing the overall level when you submit it won’t have any practical consequences for the final playback level – it can only sound better because of a cleaner encode.
What about -14 LUFS ?
I’ve had a few people asking about the fact that Spotify’s normalization reference level is approximately -14 LUFS, and whether this -2 dB True Peak recommendation overrides or replaces it.
The answer is No – these are two separate issues. The -14 LUFS figure simply gives us an idea of how loud Spotify will try and play songs in shuffle mode – it’s never been a “target” or a recommendation. This is a common source of confusion, and I wrote about it in more detail here.
The -2 dBTP recommendation is to try and ensure better encoding quality for material that was mastered very loud originally – peak levels aren’t a good way to judge loudness. So to get the best results you should keep both numbers in mind.
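Putting the two numbers together, Spotify’s updated guidance (as described in the update above) boils down to a single rule – the function name here is mine, not Spotify’s:

```python
def spotify_peak_ceiling_dbtp(integrated_lufs):
    """Recommended True Peak ceiling per Spotify's updated FAQ:
    -2 dBTP for masters louder than -14 LUFS, otherwise -1 dBTP."""
    return -2.0 if integrated_lufs > -14.0 else -1.0

# A hot master at -8 LUFS should stay under -2 dBTP;
# a more dynamic one at -16 LUFS only needs to stay under -1 dBTP.
```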
I’ve said it before, and I’ll say it again – the loudness normalization reference levels aren’t meant to be targets. Instead, master your music so it sounds great to you, and preview it using the free Loudness Penalty site to see how it will sound when normalized.
But you should also be aware that very high peak levels can cause sub-standard encodes when the files are converted for streaming. And if you’re like me, you’ll want to do everything you can to get the best possible results – including keeping an eye on the True Peaks.
Update – and a warning
I’m seeing a lot of different reactions to the information in this post. They vary from “yes I’ve been saying this for ages”, through annoyance that there’s yet another number to think about, all the way to “ah I don’t care, I’ll just turn the limiter output down a little”.
Be very careful about this last option.
The harder you push the loudness into a limiter, the higher the True Peak level will go. And the higher the True Peak levels are, the greater the risk of encoder clipping. So you’re fighting a losing battle. Remember True Peak doesn’t necessarily predict how much clipping will take place, so if you try to upload at the same loudness and just reduce the True Peaks, you could end up with just as many issues with the encode.
WaveLab, Ozone, Sonnox and others offer “codec preview” features which allow you to assess the results of encoding – if you’re chasing extreme loudness then you need to use methods like these to check the results you’re getting.
And as always personally I think the best answer is a perfect balance between the different factors – between loudness and dynamics, and now between loudness and True Peak values.
If you want to know the method I use myself to find the perfect loudness when I’m mastering, and why it works – click here.
[Edit – the original version of this post stated that some encoders can “bake in” clipping, which was misleading. A correctly-implemented encoder won’t do this, and I’ve updated the post to reflect that. However not all encoders are guaranteed to be well-written (!) and many decoders end up reducing the decoded file to fixed bit-depth anyway which does cause this problem. So avoiding high peak levels before encoding is definitely a good idea !]