Reading that back, I realise that the first comment doesn't actually mean much unless you understand a little about MP3.
MP3 is the common name for MPEG audio layer 3. Strictly, the MPEG audio it refers to comes from the MPEG 1 and 2 specifications, together with the later 'MPEG 2.5' additions for lower sampling frequencies. MPEG audio defines three 'layers' of compression. In general these aren't layers of processing so much as levels of understanding of the audio data itself - the processing involved at each layer can be considerably different.
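To make the version and layer distinction concrete, here is a minimal sketch (my own illustration, not code from AMPlayer) that reads them from the 4-byte header at the start of every MPEG audio frame. The bit positions follow the published MPEG 1/2 header layout. The 'MPEG 2.5' extension shortens the sync word from 12 to 11 bits so that the version field becomes two bits wide. The function and table names are made up for the example.

```python
# Sketch: decode MPEG version, layer and sampling rate from a 4-byte frame header.
VERSIONS = {0b00: "MPEG 2.5", 0b10: "MPEG 2", 0b11: "MPEG 1"}  # 0b01 is reserved
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}                          # 0b00 is reserved

# Sampling rates in Hz, indexed by the 2-bit sampling-rate field (0b11 is reserved).
SAMPLE_RATES = {
    "MPEG 1": (44100, 48000, 32000),
    "MPEG 2": (22050, 24000, 16000),
    "MPEG 2.5": (11025, 12000, 8000),
}


def parse_header(header: bytes) -> dict:
    """Return version, layer and sample rate from the first 4 bytes of a frame."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if (word >> 21) & 0x7FF != 0x7FF:        # 11 sync bits, all ones
        raise ValueError("no frame sync")
    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    rate_index = (word >> 10) & 0b11
    if version_bits not in VERSIONS or layer_bits not in LAYERS or rate_index == 0b11:
        raise ValueError("reserved field value")
    version = VERSIONS[version_bits]
    return {
        "version": version,
        "layer": LAYERS[layer_bits],
        "sample_rate": SAMPLE_RATES[version][rate_index],
    }


if __name__ == "__main__":
    # FF FB 90 00: a common MPEG 1 Layer 3 header (128 kbit/s, 44.1 kHz, no CRC).
    print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

Swapping the second byte to 0xF3 or 0xE3 gives the same layer at MPEG 2 (22.05 kHz) or MPEG 2.5 (11.025 kHz), which is where the lower sampling frequencies come in.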
Originally Layer 1 was intended for most purposes. It has a low processing overhead with a moderate amount of compression, resulting in reasonably sized files. Layer 2 was meant for more demanding audio situations, where processing could be performed more quickly and the compression had to be better to provide improved quality. Layer 3 was more processing-intensive again, for situations where compression for a given quality was paramount - although in practice the result was invariably that the same output size was kept but at higher quality, rather than keeping the quality level and reducing the size.
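The 'same size, higher quality' point follows from how frames are sized: at a given bitrate and sample rate, every layer produces roughly the same number of bytes per second, so the layer changes what those bytes buy rather than how many there are. A sketch using the standard MPEG 1 frame-length formulas (restricted to MPEG 1 as an assumption of this example; MPEG 2/2.5 Layer 3 frames carry 576 samples, so the 144 constant halves to 72). The function names are hypothetical.

```python
# Sketch (MPEG 1 only): frame length follows from layer, bitrate and sample rate.

def frame_bytes(layer: int, bitrate: int, sample_rate: int, padding: bool = False) -> int:
    """Length in bytes of one MPEG 1 audio frame; bitrate is in bits per second."""
    if layer == 1:
        # Layer 1: 384 samples per frame, counted in 4-byte slots (384 / 8 / 4 = 12).
        return (12 * bitrate // sample_rate + int(padding)) * 4
    if layer in (2, 3):
        # Layers 2 and 3: 1152 samples per frame, 1152 / 8 bits per byte = 144.
        return 144 * bitrate // sample_rate + int(padding)
    raise ValueError("layer must be 1, 2 or 3")


def compression_ratio(bitrate: int, sample_rate: int = 44100,
                      channels: int = 2, bits_per_sample: int = 16) -> float:
    """How many times smaller the coded stream is than the PCM it came from."""
    pcm_bitrate = sample_rate * channels * bits_per_sample  # CD audio: 1,411,200 bit/s
    return pcm_bitrate / bitrate


if __name__ == "__main__":
    print(frame_bytes(3, 128_000, 44_100))           # 417 (418 with the padding bit set)
    print(f"{compression_ratio(128_000):.1f}")       # 11.0
```

So a 128kbit/s stream is about an eleventh the size of CD audio whichever layer produced it; the padding bit exists to keep the long-run average at exactly the nominal bitrate.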
And so we come to what each layer of MPEG audio includes. Layer 1 is the simplest. It understands that some frequencies are heard more readily than others by the human auditory system (ears).
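One textbook way to express 'some frequencies are heard more than others' is the absolute threshold of hearing: the quietest level at which a pure tone can be heard, which varies strongly with frequency. The sketch below uses Terhardt's widely quoted curve fit. This is an assumption chosen for illustration; the MPEG standard does not mandate this formula, and real encoders use their own models.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's curve fit).

    Below this level a pure tone at freq_hz is, on average, inaudible.
    """
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible(freq_hz: float, level_db_spl: float) -> bool:
    """True if a tone at this level rises above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for hz in (100, 1000, 3300, 15000):
        print(f"{hz:>6} Hz: {threshold_in_quiet_db(hz):6.1f} dB SPL")
```

The thresholds come out at roughly 23, 3, -5 and 51 dB SPL respectively. The ear is most sensitive around 3 to 4 kHz and far less so at the extremes, so an encoder can afford coarser coding (more noise) at very low and very high frequencies.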
Layer 2 is more complicated, taking into account the temporal effects present within the sounds. This means that the sound you hear isn't necessarily the one being produced at that instant. A high-energy sound (a loud drum, for example) will hide the sound of a voice - that's obvious from your own experience, I hope. What may not be obvious is that the drum actually hides other sounds after it has stopped. I don't know the reasons for it - I assume the sound is still being processed and continues to hide other sounds even after it has stopped. You can find various papers on this sort of thing - that's why it's a clever format; clever people worked on it.
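A toy model of the 'drum keeps hiding the voice after it stops' effect (often called post-masking or forward masking). It assumes the masked threshold starts a fixed 10 dB below the masker and fades exponentially, in dB, back to the threshold in quiet. Both numbers and the exponential shape are invented for illustration. Real masking curves are measured, depend on frequency, and differ between listeners.

```python
import math

# Illustrative parameters only; NOT taken from the MPEG standard or any real encoder.
MASK_OFFSET_DB = 10.0   # masked threshold starts this far below the masker level
DECAY_TIME_MS = 50.0    # e-folding time of the assumed exponential decay


def post_mask_threshold_db(masker_db: float, ms_after_end: float,
                           quiet_threshold_db: float = 0.0) -> float:
    """Level a quiet sound must exceed to be heard, ms_after_end after a loud one stops."""
    if ms_after_end < 0:
        raise ValueError("this model only covers the time after the masker stops")
    peak = masker_db - MASK_OFFSET_DB
    excess = max(peak - quiet_threshold_db, 0.0)
    # Fade from the masked level back towards the threshold in quiet.
    return quiet_threshold_db + excess * math.exp(-ms_after_end / DECAY_TIME_MS)


def is_masked(probe_db: float, masker_db: float, ms_after_end: float) -> bool:
    """True if the probe sound is hidden by the recently stopped masker."""
    return probe_db <= post_mask_threshold_db(masker_db, ms_after_end)


if __name__ == "__main__":
    # A 90 dB drum, then a 60 dB voice shortly afterwards.
    for ms in (0, 10, 50):
        print(ms, "ms:", "masked" if is_masked(60.0, 90.0, ms) else "heard")
```

With these made-up numbers the voice is hidden 10 ms after the drum stops but heard again by 50 ms. An encoder exploiting this can spend fewer bits on anything that falls under the decaying threshold.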
Layer 3 adds more to this, allowing for temporal effects which reach back into the past. It seems that the drum not only hides some sounds that follow it but also some that occur in the moments just before it starts. I guess the delay between hearing a sound and processing it in the brain gets short-cut by the louder sound, making the brain just give up on the earlier bits and start on the big loud sound that's happening. All this happens over fractions of a second, so you really don't notice as you're listening, but the fact that it happens means the compressor can take advantage of it.
Layer 3 also compresses the data itself much better, reducing the size of the output by 'Huffman' coding the results of its work.
In Layer 3, the amount of data stored in a frame (a short slice of time) can also vary. When compressing, the encoder can look ahead, decide that something complicated is coming up, and store less data about the 'current' sound in its frame. That leaves space in this frame for data belonging to a more complex /future/ frame. This is known as the 'bit reservoir', and it means the encoder's knowledge of how complex the upcoming sound is can be taken into account, packing more data in where it's needed. It is also why you can't just jump to any point in a Layer 3 file and expect it to play: any given frame may need data stored in a previous frame.
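A simplified sketch of the bit reservoir, under stated assumptions. The stream is constant-bitrate and every frame gets the same byte budget for its main data (headers and side information are ignored). The reservoir is capped at 511 bytes, the furthest back the 9-bit main_data_begin pointer in an MPEG 1 Layer 3 frame can reach. The function name and the per-frame 'demand' figures are hypothetical.

```python
from typing import List

MAX_RESERVOIR = 511  # bytes; the MPEG 1 Layer 3 main_data_begin field is 9 bits wide


def allocate(frame_budget: int, demands: List[int]) -> List[int]:
    """Bytes of main data each frame actually gets, given a constant per-frame budget.

    A frame needing less than its budget leaves the spare bytes in the reservoir,
    where a later, more complex frame can use them on top of its own budget.
    """
    reservoir = 0
    granted = []
    for need in demands:
        available = frame_budget + reservoir
        use = min(need, available)
        granted.append(use)
        # Unused space is banked, but only as far back as the pointer can reach.
        reservoir = min(available - use, MAX_RESERVOIR)
    return granted


if __name__ == "__main__":
    # Two simple frames followed by a complex one, at 417 bytes per frame.
    print(allocate(417, [300, 300, 600]))
```

Here the third frame gets all 600 bytes even though its own budget is only 417, because the first two frames banked 234 spare bytes. The flip side is the seeking problem: some of frame 3's data physically lives inside frames 1 and 2, so a decoder that starts at frame 3 can't decode it.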
All this processing, based on the response of the human ear, obviously relies on a particular generalised model of how sound is actually heard. But of course, general is not the same as specific. *You* may hear things differently from the generalised model. This is why people will say that MP3 sounds awful. They're making sweeping, and strictly inaccurate, statements. What they mean is that to them it sounds awful. It really does depend on your own hearing. Similarly, it depends on what's being played - what sounds right for one piece might not sound right for another. Take nobody's word for it until you've actually listened to MP3s yourself, and bear in mind that different encoders will produce different results because they use different models.
Which brings us back to the lack of a psychoacoustic model in shine. Shine is intentionally designed as an implementation that pays no heed to any of the perceptual effects I've explained above (or the ones I haven't). It applies a very simple model of its own which doesn't relate to the particular characteristics of human perception.
Whereas an encoder built on a good psychoacoustic model can be perceived as very high quality even at low bitrates, shine doesn't have that advantage. At low bitrates it can't exploit knowledge of what the listener will actually hear, because it has no such knowledge. It wades blindly on, producing something akin to noise (IMO). You need something approaching 160kbit/s to get anywhere near a decent reproduction out of shine, and even then it can't compete with an encoder that uses a good psychoacoustic model at the same bitrate.
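To illustrate what going without a psychoacoustic model costs, here is a sketch contrasting two ways to share a fixed number of quantiser bits among frequency bands. The 'by mask' version is a greedy loop in the spirit of the iterative bit allocation described for Layer 1/2 encoders. It keeps giving a bit to whichever band's quantisation noise is most audible, using the rule of thumb that each bit buys about 6 dB of signal-to-noise ratio. The uniform version stands in for an encoder that knows nothing about masking. This is not shine's actual code, and the per-band signal-to-mask ratios (SMR) in the example are made-up numbers.

```python
from typing import List

DB_PER_BIT = 6.02  # each extra quantiser bit buys roughly 6 dB of signal-to-noise ratio


def allocate_uniform(n_bands: int, total_bits: int) -> List[int]:
    """No psychoacoustic model: spread the bits evenly, ignoring what is audible."""
    base, extra = divmod(total_bits, n_bands)
    return [base + (1 if i < extra else 0) for i in range(n_bands)]


def allocate_by_mask(smr_db: List[float], total_bits: int) -> List[int]:
    """Greedy allocation: give each bit to the band whose noise is most audible."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio = SMR - SNR; at or below 0 dB the noise is masked.
        nmr = [smr - b * DB_PER_BIT for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all quantisation noise is already inaudible
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    smr = [30.0, 5.0, -10.0, 12.0]  # hypothetical per-band SMRs in dB
    print("uniform: ", allocate_uniform(len(smr), 8))
    print("by mask: ", allocate_by_mask(smr, 8))
```

With these numbers the masking-aware loop gives [5, 1, 0, 2]. It spends nothing on the band that is inaudible anyway. The uniform split gives every band 2 bits, which leaves the loudest band's noise well above its mask. At low bitrates that wasted spending is exactly the difference the model makes.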
So why use a very poor encoder to get a reasonable sound out when you could use a decent encoder to get a good sound out? Speed! Shine doesn't do much (comparatively), so it can be fast. Fast, though, is a comparative term - as I remember it, shine manages something like real-time encoding at 128kbit/s on a SA200. LAME (another encoder, widely regarded as 'good') manages (I'm informed) around 0.1x speed on a SA233 - it takes 10 times as long to compress a file as the sound it contains lasts (1 minute of sound takes 10 minutes to compress). Obviously using shine makes some sense in that respect, but if you actually want to listen to music regularly, you don't want to be listening to something rather poor; you want a more decent version.
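The speed figures are just 'multiples of real time', so encoding time is the length of the audio divided by the speed factor. A trivial sketch with a hypothetical helper name. The factors plugged in are the ones recalled in this post, not new measurements.

```python
def encode_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock seconds to encode audio_seconds of sound.

    speed_factor is in multiples of real time: 1.0 = real time,
    0.1 = ten times slower than real time, 10.0 = ten times faster.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


if __name__ == "__main__":
    # Speed factors as recalled in this post (not measured here).
    for name, factor in (("shine, SA200", 1.0), ("LAME, SA233", 0.1), ("10x machine", 10.0)):
        print(f"{name:>13}: 1 minute of audio takes {encode_seconds(60, factor):.0f} s")
```

That prints 60 s, 600 s and 6 s respectively, which is the hundredfold gap between a 0.1x and a 10x encoder.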
This is one reason why I consider the pitiful 64M size of some of the flash MP3 players to be quite pointless - you can't fit enough in them at a decent quality. Small they may be, but small does not quality make. (That said, I've got one and it's cute, but only in a 'play with it' way.)
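A back-of-envelope check of the 64M complaint. It assumes '64M' means 64 MiB of usable space and ignores filesystem overhead and tags. At a constant bitrate, playing time is simply capacity divided by bitrate. The helper name is hypothetical.

```python
def minutes_of_audio(capacity_mib: float, bitrate_kbps: float) -> float:
    """Minutes of constant-bitrate audio that fit in capacity_mib (1 MiB = 2**20 bytes)."""
    capacity_bits = capacity_mib * 2**20 * 8
    seconds = capacity_bits / (bitrate_kbps * 1000)  # kbit/s here means 1000 bit/s
    return seconds / 60


if __name__ == "__main__":
    for kbps in (128, 160, 192):
        print(f"64 MiB at {kbps} kbit/s: {minutes_of_audio(64, kbps):.0f} minutes")
```

That comes to roughly 70 minutes at 128kbit/s and about 56 minutes at the 160kbit/s that shine needs to sound decent.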
My own personal opinion is that encoding MP3s on RISC OS is unproductive. My laptop encodes MP3s at 10x speed (1 minute compressed in about 6 seconds), and it's nothing special. I'll just leave RISC OS to do the playback. It's good at that. However, that's just my view... the point was to explain the terminology in a way that readers might, hopefully, understand, rather than to leave it hanging like an 'I know more than you' comment.
Obligatory disclaimer: This information is based on my own research into what MPEG audio consists of, work on AMPlayer and a lot of MP3s. Oh, and my memory. If you don't believe it, it may be wrong. Look it up for yourself; the information's all there on the 'net. Well, all except the actual specifications. But you don't want to read them anyhow. Oh, and understand that I've simplified things quite a bit, partly due to memory, and partly 'cos you really don't want to know about some of the inner bits that are ... complicated.