Lossy Audio Formats

Although the question "which lossy audio compression format is best?" is asked quite regularly, the answer has stayed more or less the same for some time now (read: a year or so). Currently, only a few popular, high-quality lossy audio compression formats are worth mentioning. The question "which is best?" can be interpreted in different ways, however: quality, hardware support, availability, portability, cost and legal status must all be weighed to analyze the candidates and settle on a format. Indeed, not everyone will choose the same format. Needs and desires vary from person to person, and what is an advantage to one person may be unnecessary to another. Keep this in mind while the "format wars" go on in the various forums.

MP3 (MPEG 1 Layer 3)

MP3 is currently one of the lowest-quality lossy audio compression formats. However, as one of the first such formats and still a very popular one, MP3 looks secure for the near future, although the arrival of Ogg Vorbis and Advanced Audio Coding (such as Psytel AAC) on the mass market will surely end its dominance eventually.

Contrary to what many people believe, this is not MPEG-3 audio. It is actually MPEG-1 Audio Layer 3. It was developed in the late 1980s by the Fraunhofer Institute in conjunction with the University of Erlangen. Today, the patent rights belong to Thomson and Fraunhofer IIS, and licenses are granted by Thomson. Many people think MP3 is "free", but a license is required to sell products that encode or decode MP3, as well as to broadcast commercial MP3 content. The standard was developed before content protection and online distribution of pirated music became an issue, and thus contains no DRM at all. This, along with the relatively high speed of encoding and decoding, has made the format popular with end users, but not with record labels. None of the major for-pay online services use MP3 as their file format. You can read a lot more about MP3 at the Fraunhofer IIS page.

Technical Information

MP3 is standardized as ISO-MPEG Audio Layer 3 (ISO/IEC 11172-3 and ISO/IEC 13818-3). It became the de facto standard for lossy audio encoding thanks to its high compression ratio (about 1/12 of the original size while retaining considerable quality), the wide availability of decoders, and the low CPU requirements for playback (a 486DX2-66 is enough for real-time decoding). It supports multichannel files (although there is no implementation yet) and sampling frequencies of 16 to 24 kHz (MPEG-2 Layer 3) and 32 to 48 kHz (MPEG-1 Layer 3). Formal and informal listening tests have shown that MP3 in the 192-256 kbps range produces results indistinguishable from the original material in most cases.
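The "1/12 of the original size" figure follows from simple arithmetic: uncompressed PCM bitrate is sample rate × bits per sample × channels. A minimal sketch, assuming CD audio (44.1 kHz, 16-bit, stereo, about 1411.2 kbit/s) as the source; the function names are illustrative only. At 128 kbps the ratio works out to about 11:1, and at 112 kbps about 12.6:1, so the 1/12 figure sits in that range.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(encoded_kbps, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """How many times smaller an encoded stream is than the PCM it came from.

    The defaults describe CD audio (44.1 kHz, 16-bit, stereo).
    """
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / encoded_kbps


for kbps in (112, 128, 192, 256):
    print(f"{kbps} kbps -> about {compression_ratio(kbps):.1f}:1")
```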

MP3 uses the following for compression:
  • Huffman coding
  • Quantization
  • M/S (mid/side) matrixing
  • Intensity stereo
  • Channel coupling
  • Modified discrete cosine transform (MDCT)
  • Polyphase filter bank (a non-standardized extension of MP3 called MP3Pro uses SBR encoding to provide better quality at low bitrates)
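M/S matrixing, listed above, exploits the fact that the left and right channels of most music are strongly correlated: instead of coding L and R, the encoder codes their sum (mid) and difference (side). When the channels are similar, the side signal is close to zero and costs few bits. A minimal sketch, assuming the common textbook scaling of 1/2; the MP3 standard itself scales by 1/√2, which changes only the gain, not the idea.

```python
def ms_encode(left, right):
    """Mid/side matrixing: mid = (L + R) / 2, side = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse matrixing: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Nearly mono material: the side channel is mostly zero.
left = [0.5, 0.25, -0.5]
right = [0.5, 0.25, -0.375]
mid, side = ms_encode(left, right)
print(mid, side)
```

The transform is lossless on its own; the saving comes later, when the quantizer spends fewer bits on the small side signal.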
Pros of MP3:
  • Transparent quality with LAME's --alt-preset extreme in most cases
  • ISO standard
  • Part of MPEG specs
  • Fast decoding
  • Anyone can create their own implementation (specs and demo sources available)
  • Nearly all portable players support it
  • Relaxed licensing terms
  • Lower complexity than AAC or Vorbis
Cons of MP3:
  • Problem cases that trip up all transform codecs
  • Slow encoding (using Lame VBR)
  • Sometimes, maximum bitrate (320 kbps) isn't enough
  • No multichannel implementations

MP3 Pro

Thomson acquired this new format in 2001 from its Swedish partner company Coding Technologies, which developed it while researching a hearing device for the deaf. Music compressed with MP3 at 128 kbps sounds great. But at lower bit rates the sound starts to lack high-frequency components, and at 64 kbps and below the music may begin to sound dull. The reason is that MP3 at these bit rates runs out of bits to compress the music at full audio bandwidth and with significant detail. In this situation the developers of MP3 had to decide whether their codec should produce music with distortion (so-called "coding artefacts") or with limited bandwidth. They opted for limited bandwidth. As a result, you hear low-bit-rate MP3 as band-limited music with just a few distortions.

What is MP3PRO Technology?

To improve the sound quality of MP3 at lower bit rates, Coding Technologies has developed an enhancement technology that restores the high-frequency components. The technology is called "Spectral Band Replication" (SBR), and it is a very efficient method of generating the high-frequency components of an audio signal. Combining MP3 with SBR produces an audio signal with high bandwidth at low bit rates. The resulting format, MP3PRO, is composed of two parts: the MP3 part for the low frequencies and the SBR or "PRO" part for the high frequencies. Since the "PRO" part requires only a few kbps, the format could be designed to remain compatible with the original MP3 format. This allows existing MP3 players to play MP3PRO files; they simply ignore the PRO part. The only requirement is that they also support sampling rates of 16, 22.05 and 24 kHz along with 32, 44.1 and 48 kHz. While all MP3-standard-compliant software players fulfil this requirement, not all portable and CD/DVD players do.
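The core idea of SBR can be shown with a toy model: the encoder sends only a coarse energy envelope for the high band, and the decoder regenerates that band by copying the decoded low-band spectrum upward and rescaling each band to the transmitted energy. This is a minimal sketch on lists of spectral magnitudes, not Coding Technologies' algorithm; real SBR works on QMF subbands with time/frequency grids, noise-floor and tonality controls. The sampling-rate requirement above suggests the MP3 core runs at half the output rate (for example 22.05 kHz for 44.1 kHz output), which is why legacy players need the lower MPEG-2 rates. Function names are hypothetical.

```python
import math


def sbr_envelope(high_band, n_bands):
    """Encoder side (toy): summarize the original high band as one energy per band.

    This short list is the only high-frequency information transmitted.
    """
    width = len(high_band) // n_bands
    return [sum(m * m for m in high_band[b * width:(b + 1) * width]) for b in range(n_bands)]


def sbr_reconstruct(low_band, envelope):
    """Decoder side (toy): rebuild the high band from the low band plus the envelope.

    Each envelope band is filled with a copy of the matching slice of the
    low band, scaled so its energy equals the transmitted target.
    """
    if len(low_band) % len(envelope):
        raise ValueError("low band must split evenly into envelope bands")
    width = len(low_band) // len(envelope)
    high = []
    for b, target in enumerate(envelope):
        patch = low_band[b * width:(b + 1) * width]
        energy = sum(m * m for m in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high.extend(m * gain for m in patch)
    return list(low_band) + high


low = [1.0, 2.0, 3.0, 4.0]          # decoded by the core codec
original_high = [0.5, 0.5, 0.25, 0.25]
env = sbr_envelope(original_high, 2)  # a few numbers instead of the whole band
print(sbr_reconstruct(low, env))
```

The regenerated band has the right energy in each band but not the original fine structure, which is why SBR works well for perceptually "noisy" high frequencies.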

Supported Bitrates

MP3PRO technology can support more bit rates than just 64 kbps. The following bit rates are supported by MP3PRO:

  • Mono: 18, 20, 24, 32, 40, 48, 56 kbps
  • LC-stereo: 18, 20, 24, 32, 40, 48, 56 kbps
  • Stereo: 32, 40, 48, 56, 64, 80, 96 kbps
The chief advantage

MP3Pro is compatible with MP3: a basic MP3 player will play an MP3Pro file, just without the improvement in sound quality. But MP3Pro has no DRM capabilities either, so it shares MP3's inability to gain acceptance from music studios.

MPC

Musepack / MPEGPlus (MPC, MP+) is currently the highest-quality format at any bitrate above ~160 kbps. It uses 100% variable bitrate (VBR) encoding and was designed by Andree Buschmann and later optimized by Frank Klemm to achieve "perceptual transparency" at a very low file size while maintaining unsurpassed quality. Because of this design goal, encoding at low bitrates (below ~160 kbps) is NOT recommended, although it is supported with the "thumb" profile. Musepack supports ID3v1, ID3v1.1, APE v1 and APE v2 tags but is NOT compatible with ID3v2 tags. For legal reasons, it appears unlikely that Musepack will get hardware support (think DVD players or portable MP3 players). Musepack is considered shareware, and with stream version 8 (SV8) the encoder may have a price; decoding, however, will always be free. Musepack is unlikely to gain any commercial acceptance because it lacks CBR for streaming; its users have been, and will probably continue to be, limited to the audiophile community.

Technical information

Musepack is a lossy audio compression scheme created by Andree Buschmann. It is strongly based on MPEG Audio Layer 2 (MP2) algorithms. It supports only two channels (stereo), and with stream version 7 (SV7) it is currently limited to a sampling rate of 44.1 kHz, although stream version 8 (SV8) will be able to encode 32/48 kHz streams as well as multichannel audio. Informal listening tests have demonstrated that Musepack is the best publicly available lossy audio encoder at bitrates above 160 kbps. Because it is a subband coder, and subband coders are inherently less efficient than transform coders, it is worse than AAC and Ogg Vorbis at bitrates below 160 kbps.

Musepack uses the following for compression:
  • MP2 compression technologies
  • Subband-based selectable channel coupling
  • Huffman coding
  • Differential Huffman coding
  • Vastly improved psychoacoustic model
  • Non-linear spreading function
  • ANS (adaptive noise shaping)
  • CVD (clear voice detection)
  • Temporal masking with variable time constants
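Two items in the list above work together: differential coding turns slowly varying values (such as per-band scale factors) into small differences that repeat often, and Huffman coding then gives the most frequent values the shortest codes. A minimal sketch, assuming a hypothetical sequence of scale factors; it builds a Huffman code per input for illustration, whereas real codecs typically use fixed, predefined tables, and it ignores the cost of sending the table.

```python
import heapq
from collections import Counter
from itertools import count


def deltas(values):
    """Differential coding: first value as-is, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def undeltas(coded):
    """Invert differential coding with a running sum."""
    values = [coded[0]]
    for d in coded[1:]:
        values.append(values[-1] + d)
    return values


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {s: "0" for s in freq}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encoded_bits(values):
    """Total payload bits when `values` are Huffman-coded with their own table."""
    code = huffman_code(values)
    return sum(len(code[v]) for v in values)


scale_factors = [40, 41, 41, 42, 42, 42, 43, 43, 44, 44, 45, 45]
print(encoded_bits(scale_factors), encoded_bits(deltas(scale_factors)))
```

For this sample the direct values need 31 bits and the differences only 18, because the differences collapse to mostly 0s and 1s.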
Pros of Musepack:
  • Transparent quality at extreme preset
  • Very fast encoding/decoding
  • Very low CPU usage during playback
  • 100% VBR quality
  • Open-source decoder
Cons of Musepack:
  • Restricted to audiophile community
  • Bad quality at low bitrates
  • Very limited (2 channels, 44.1 kHz, 16 bits only)
  • Undefined licensing situation, due to patents on MP2 and PNS algorithms
  • Lack of portable player support
  • Lack of CBR mode (not suitable for streaming over slow connections)
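Why a VBR-only format struggles over slow links can be seen with a simplified model: the link delivers a constant number of bits per frame duration, and frame i can only play once frames 0 through i have fully arrived. A loud passage that needs many bits forces a longer start-up buffer, even if the average bitrate matches the link. A minimal sketch, assuming hypothetical frame sizes and ignoring packet overhead and network jitter.

```python
def min_startup_delay(frame_bits, link_bits_per_frame):
    """Smallest start-up delay, in frame durations, such that frame i (played at
    time delay + i) has fully arrived over a constant-rate link."""
    delay = 0.0
    bits_needed = 0
    for i, bits in enumerate(frame_bits):
        bits_needed += bits  # frames 0..i must all be downloaded before frame i plays
        delay = max(delay, bits_needed / link_bits_per_frame - i)
    return delay


link = 1000                                       # hypothetical link: 1000 bits per frame duration
cbr = [1000] * 8                                  # CBR stream at exactly the link rate
vbr = [500, 500, 2500, 2500, 500, 500, 500, 500]  # same average, but with a loud passage
print(min_startup_delay(cbr, link), min_startup_delay(vbr, link))  # 1.0 3.0
```

With CBR the buffer only has to hold one frame; with VBR it must absorb the worst burst, and on a slow connection that burst can be long.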

AAC

Short for Advanced Audio Coding, AAC has been part of the MPEG-2 specification ever since the Moving Picture Experts Group declared it a standard in April 1997. It was developed by the Fraunhofer Institute in conjunction with companies such as AT&T, Sony and Dolby.

Although AAC is not the highest-quality format at bitrates above ~160 kbps, it seems to be the best overall format when all the factors of audio compression are considered. AAC is second only to Musepack in quality at bitrates above ~160 kbps, and it offers the highest quality at bitrates from ~96 kbps to ~160 kbps. AAC is considered a "multi-purpose" format, since it handles virtually anything you throw at it: streaming, constant bitrates, commercial appeal, audiophile acceptance, etc. AAC already has hardware support (Philips Electronics holds a patent on a feature of the encoding algorithm), which makes AAC the highest-quality lossy audio compression format with hardware support available.

Introduction

Think of Advanced Audio Coding (AAC) as the natural successor to MP3. AAC derives from MP3 but avoids many of MP3's inherent shortcomings. As with MP3, there are many AAC encoders (FhG, FAAC, Psytel, etc.), but only Psytel and FAAC are available to end users, because the Fraunhofer (FhG) AAC encoder is a sought-after commercial product that end users can almost never obtain.

Psytel AACenc (read: AAC encoder) is the brainchild of Ivan Dimkovic, its primary (and, as of now, only) developer. Psytel AAC is the highest-quality AAC encoder available to end users.

Free Advanced Audio Coding (FAAC) is just what it sounds like: a free implementation of the AAC format, meaning the implementation will never be proprietary. Menno Bakker is the primary developer of FAAC and of FAAD, its decoder, which is the recommended AAC decoder.

Technical information

MPEG-2/MPEG-4 AAC, also known as MPEG-2 NBC (non-backward-compatible), is considered the current state of the art in audio coding. It allows up to 48 full-bandwidth audio channels (at up to 96 kHz) in one stream, plus 15 low-frequency enhancement channels (LFE, limited to 120 Hz) and up to 15 data streams, and it can carry multiple languages.

MPEG formal listening tests demonstrated that, at 96 kbps, AAC can provide audio quality slightly superior to MP3 at 128 kbps and MP2 at 192 kbps.

AAC uses the following for compression:
  • Huffman coding
  • Quantization and scaling
  • M/S (mid/side) matrixing
  • Intensity stereo
  • Channel coupling
  • Backward adaptive prediction
  • Temporal noise shaping (TNS)
  • Modified discrete cosine transform (MDCT)
  • Gain control and hybrid filter bank (polyphase quadrature filter (IPQF) + IMDCT)
  • Long term predictor (LTP, MPEG4 AAC only)
  • Perceptual noise substitution (PNS, MPEG4 AAC only)
Future developments should include spectral band replication (SBR) and other tools to improve the overall quality.
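Both AAC and MP3 (inside its hybrid filter bank) rely on the MDCT. Each block of 2N samples yields only N coefficients, and the time-domain aliasing this introduces cancels when the inverse transforms of neighbouring, 50%-overlapping blocks are added together (time-domain aliasing cancellation, TDAC). A minimal sketch of that property: a direct O(N²) MDCT/IMDCT pair with a sine window applied at analysis and synthesis, where the IMDCT is scaled by 2/N so that the sine window's power-complementary (Princen-Bradley) property gives exact reconstruction. Real encoders use fast FFT-based algorithms, switch between long and short blocks, and quantize the coefficients; none of that is shown.

```python
import math


def mdct(block):
    """Forward MDCT: 2N (already windowed) samples -> N coefficients."""
    two_n = len(block)
    n = two_n // 2
    n0 = 0.5 + n / 2
    return [sum(block[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5)) for i in range(two_n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N aliased samples (scaled by 2/N)."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [2 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + n0) * (k + 0.5)) for k in range(n))
            for i in range(2 * n)]


def sine_window(two_n):
    """Sine window: symmetric, and w[i]^2 + w[i + N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]


def mdct_roundtrip(signal, n):
    """Analyze and resynthesize with hop size n; aliasing cancels in the overlap-add."""
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * (2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * w[i] for i in range(2 * n)]
        rebuilt = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += rebuilt[i] * w[i]
    return out[n:n + len(signal)]


signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(40)]
rebuilt = mdct_roundtrip(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # tiny rounding error only
```

The zero padding at both ends makes sure every real sample is covered by two overlapping blocks; lossy compression happens only when the coefficients between `mdct` and `imdct` are quantized.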
Pros of AAC:
  • Transparent quality at "archive", very near transparent at "extreme"
  • An ISO standard
  • Usable in low-delay streaming
  • Part of MPEG4 specs
  • Anyone can create their own implementation (specs and demo source available)
  • Many sampling rates (8-96 kHz), up to 256 kbit/s per channel, up to 48 channels
Cons of AAC:
  • Inherent problems as a transform codec
  • Slow encoding
  • Very tight licensing terms
  • Increased complexity
Quality profiles for music encoding (AACenc)
  • Tape
  • Radio
  • Internet
  • Streaming
  • Normal
  • Extreme
  • Archive
  • Ultra
The quality ranges from tape (lowest VBR quality) to ultra (highest VBR quality). Ultra is considered overkill for most audio tracks, i.e. it shouldn't be used except for extremely difficult music signals. Example: AACenc extreme if "audio file.wav"

Technically, the AAC format can support up to 48 full-frequency sound channels, so 5.1 or 7.1 sound is entirely possible. It also supports sample rates up to 96 kHz, twice the 48 kHz maximum of MP3. Recently, MPEG-4 AAC added a couple of technologies to the spec that improve quality at extremely low bit rates (think cell phones). At higher bit rates, though, it is essentially the same as MPEG-2 AAC. This is the format used for songs downloaded from the popular iTunes Music Store, but AAC has no real DRM of its own, so Apple wraps iTunes songs in its own DRM, called FairPlay.

Ogg Vorbis

Ogg Vorbis is similar to the MP3 and AAC compression formats, but with one important difference: it is completely free, unpatented and open source. There are actually two terms here. Ogg is the file container, which should one day hold both audio and video, while Vorbis is the audio compression scheme designed to be carried inside it. The .ogg container can embed other formats too, such as FLAC or Speex. This matters because Vorbis is optimized for music and general-purpose audio, not low-bitrate speech compression, and it has no lossless option. Vorbis supports 6-channel (5.1) audio, and is fairly well supported by software but almost absent from the hardware player market. Its royalty-free nature has made it popular with some game developers, though. It is used in several PC games, including Unreal Tournament 2003, Serious Sam: The Second Encounter, and Harry Potter and the Chamber of Secrets.
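Because Ogg is only a container, a player has to look inside the first packet of a stream to learn which codec it carries. A minimal sketch, assuming the page layout from RFC 3533 (a fixed 27-byte header followed by a segment table) and the well-known signatures of the Vorbis, FLAC and Speex identification packets. It is simplified: it does not verify the page CRC, and it treats the body of the first page as the first packet, which is the usual layout for these codecs' opening page but not true of Ogg pages in general.

```python
import struct

# Fixed 27-byte Ogg page header (RFC 3533), little-endian: capture pattern,
# version, header-type flags, granule position, stream serial number,
# page sequence number, CRC, number of segments.
OGG_PAGE_HEADER = struct.Struct("<4sBBqIIIB")


def parse_ogg_page(data, offset=0):
    """Parse one Ogg page starting at `offset` (the CRC is not checked)."""
    (capture, version, flags, granule, serial,
     sequence, _crc, n_segments) = OGG_PAGE_HEADER.unpack_from(data, offset)
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    table_start = offset + OGG_PAGE_HEADER.size
    lacing = data[table_start:table_start + n_segments]  # one length byte per segment
    body_start = table_start + n_segments
    body_end = body_start + sum(lacing)
    return {
        "beginning_of_stream": bool(flags & 0x02),
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "body": data[body_start:body_end],
        "next_offset": body_end,
    }


def identify_codec(first_packet):
    """Guess the codec from a stream's first (identification) packet."""
    if first_packet.startswith(b"\x01vorbis"):
        return "Vorbis"
    if first_packet.startswith(b"\x7fFLAC"):
        return "FLAC"
    if first_packet.startswith(b"Speex   "):
        return "Speex"
    return "unknown"
```

Usage would be along the lines of `identify_codec(parse_ogg_page(open(path, "rb").read())["body"])`, where `path` is any .ogg file; walking further pages means following `next_offset`.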

Ogg Vorbis is currently the third-best format at bitrates above ~160 kbps and the second-best at ~96 to ~160 kbps, but it is the highest-quality format at bitrates below ~96 kbps. Ogg Vorbis is 100% free and open source (it will never have a cover charge), and it appears that development will never cease. Although still in beta, Ogg Vorbis is already among the leaders in lossy audio compression, and improvements are arriving steadily. Ogg Vorbis has its largest user base in the open-source community and among Linux users, many of whom would not use any other format because of the legal issues and proprietary nature of the alternatives. It is believed that Ogg Vorbis will have hardware support in the near future.

You can convert any audio format to Ogg Vorbis. However, converting from one lossy format, such as MP3, to another, such as Vorbis, is generally a bad idea. Both MP3 and Vorbis encoders achieve high compression ratios by throwing away parts of the audio waveform that you probably won't hear. However, the MP3 and Vorbis codecs are very different, so each throws away different parts of the audio, although there is certainly some overlap. Converting an MP3 to Vorbis involves decoding the MP3 file back to an uncompressed format, such as WAV, and recompressing it with the Ogg Vorbis encoder. The decoded MP3 will be missing the parts of the original audio that the MP3 encoder chose to discard, and the Ogg Vorbis encoder will then discard other audio components when it compresses the data. At best, the result will be an Ogg file that sounds the same as your original MP3; most likely, it will sound worse. In no case will you get a file that sounds better than the original MP3.
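The argument above can be checked with a deliberately crude toy model in which a "codec" simply deletes whole frequency components, each codec by its own rule. This is an assumption for illustration only; real encoders quantize rather than delete, and their choices depend on psychoacoustic models. In this model the second codec can only remove components that survived the first, so the transcoded file's error against the original can never be smaller than the first codec's error. The codec rules and function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping the real part (inputs here are real signals)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def toy_codec(signal, discard):
    """Hypothetical lossy 'codec': decode(encode(signal)) modeled as deleting every
    bin k (0 <= k <= n/2) for which discard(k, magnitude) is true. Mirror bins are
    deleted too, so the output stays real."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n // 2 + 1):
        if discard(k, abs(spectrum[k])):
            spectrum[k] = 0
            spectrum[(n - k) % n] = 0
    return idft(spectrum)


def error_energy(reference, decoded):
    return sum((a - b) ** 2 for a, b in zip(reference, decoded))


def drop_quiet(k, magnitude):    # codec "A": discards weak components
    return magnitude < 1.0


def drop_highs(k, magnitude):    # codec "B": band-limits instead
    return k > 8


n = 32
original = [math.sin(2 * math.pi * 3 * t / n) + 0.3 * math.sin(2 * math.pi * 11 * t / n)
            + 0.05 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
first_generation = toy_codec(original, drop_quiet)
transcoded = toy_codec(first_generation, drop_highs)
print(error_energy(original, first_generation), error_energy(original, transcoded))
```

Codec A removes only the faint component, codec B then also removes the component A had kept, so the transcoded error is the sum of both losses.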

Since many music players can play both MP3 and Ogg files, there is no reason that you should have to switch all of your files to one format or the other. If you like Ogg Vorbis, then we would encourage you to use it when you encode from original, lossless audio sources (like CDs). When encoding from originals, you will find that you can make Ogg files that are smaller or of better quality (or both) than your MP3s.

WMA

Microsoft's Windows Media Audio format has undergone many major changes in the past few years, with drastic improvements in quality, efficiency, and features. Today's WMA9 technology includes four separate codecs:

Windows Media Audio 9: Microsoft claims a 20% improvement in quality per bit rate over WMA8, but the big addition here is support for VBR encoding. Fortunately, you can decode WMA9 files with devices and software made to decode previous generations of WMA.

Windows Media Audio 9 Professional: The Pro edition is similar to WMA9, but supports up to 24-bit/96 kHz audio and channel configurations up to 5.1 and even 7.1. One of its cool features is that the decoder automatically adapts the audio material to whatever hardware you have. So if you try to play a 5.1, 24-bit/96 kHz WMA9 Pro file on sound hardware that can only do stereo 16-bit/48 kHz, it will fold the sound down to that spec.
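"Folding down" combines a channel reduction with a bit-depth reduction (a real decoder would also resample 96 kHz to 48 kHz, which is omitted here). A minimal sketch, assuming the commonly used textbook 5.1-to-stereo coefficients (centre and surrounds mixed in at -3 dB, LFE dropped) and simple rounding from 24 to 16 bits; Microsoft's actual fold-down coefficients are not given here and may differ, and real decoders usually add dither.

```python
import math

# -3 dB (1/sqrt(2)) gain commonly used for centre and surround channels in a
# textbook 5.1 -> stereo downmix (an assumption, not WMA-specific).
MINUS_3DB = 1 / math.sqrt(2)


def downmix_51_to_stereo(frame):
    """Fold one 5.1 frame (L, R, C, LFE, Ls, Rs) down to stereo (Lo, Ro).

    The LFE channel is dropped, as many simple downmixes do.
    """
    left, right, centre, _lfe, left_surround, right_surround = frame
    lo = left + MINUS_3DB * centre + MINUS_3DB * left_surround
    ro = right + MINUS_3DB * centre + MINUS_3DB * right_surround
    # Normalize so full-scale input on every contributing channel cannot clip.
    norm = 1 + 2 * MINUS_3DB
    return lo / norm, ro / norm


def to_16_bit(sample_24):
    """Requantize a signed 24-bit integer sample to 16 bits (round, then clamp)."""
    return max(-32768, min(32767, (sample_24 + 128) >> 8))


print(downmix_51_to_stereo((1.0, 0.0, 0.5, 0.0, 0.0, 0.0)))
print(to_16_bit(8_388_607), to_16_bit(-8_388_608))
```

Normalizing by the worst-case gain trades a little loudness for never clipping; many real downmixers instead apply the -3 dB gains without normalization and rely on a limiter.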

Windows Media Audio 9 Lossless: This is a VBR-only codec that produces perfect, mathematically lossless copies of an original audio file, including 24-bit/96 kHz and 5.1 audio. The compression ratio isn't nearly as high as with lossy compression, ranging from around 2:1 to 4:1 depending on the complexity of the source material. This codec is designed for professionals and audiophiles who want to archive perfect copies of their music.
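To put the 2:1 to 4:1 range in perspective, the raw PCM bitrate is sample rate $f_s$ × bit depth $b$ × channel count $c$. A worked example for CD audio and for 24-bit/96 kHz 5.1 audio (counting all six channels at the full rate):

```latex
\begin{aligned}
R_{\text{PCM}} &= f_s \cdot b \cdot c \\
\text{CD stereo:}\quad R_{\text{PCM}} &= 44\,100 \cdot 16 \cdot 2 = 1\,411\,200\ \text{bit/s} \\
&\Rightarrow 705.6\ \text{kbit/s at } 2{:}1,\quad 352.8\ \text{kbit/s at } 4{:}1 \\
\text{24-bit/96 kHz 5.1:}\quad R_{\text{PCM}} &= 96\,000 \cdot 24 \cdot 6 = 13\,824\,000\ \text{bit/s} \\
&\Rightarrow 6.912\ \text{Mbit/s at } 2{:}1,\quad 3.456\ \text{Mbit/s at } 4{:}1
\end{aligned}
```

Even at the better end, lossless CD audio stays well above the 128-320 kbps used by the lossy formats discussed here.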

Windows Media Audio 9 Voice: This codec is optimized for extremely low bit rates, such as those you would stream over a dial-up Internet connection or to a cell phone, or use for real-time online voice chat.

Note that just because a portable music device or piece of software can play back "Windows Media Audio", that does not mean it can play the Professional, Lossless or Voice formats; those require their own decoders. The WMA format has robust DRM and, coupled with Microsoft's influence and a reasonable royalty rate, has become quite popular among online music services. Napster, MusicMatch, BuyMusic.com and Wal-Mart's online music store all use WMA.

RealAudio

The RealAudio format from RealNetworks started many years ago as a way to deliver streaming audio over the net, primarily to dial-up Internet users. Times have changed, of course, and the latest technology is much more robust. At bit rates below 128 kbps, RealAudio 10 uses its own proprietary compression technology; at higher bit rates, it uses MPEG-4 AAC. It is also backward compatible with RealAudio 8 players. The new Real 10 platform also incorporates RealAudio Lossless for true lossless compression and RealAudio Multichannel for up to 5.1 audio, though these formats require RealPlayer 10 or later for playback.

ATRAC

Sony's ATRAC (Adaptive Transform Acoustic Coding) format is most widely used in its MiniDisc and solid-state Walkman digital music players. The original ATRAC format is arguably not quite as good as a well-encoded MP3 file, though it has gone through several revisions. Newer Sony hardware, like most Clie PDAs, uses the superior (and incompatible) ATRAC3 format. The latest format, ATRAC3plus, promises twice the efficiency of ATRAC3, but is only now seeing widespread support with the release of the Hi-MD MiniDisc players. ATRAC works by splitting the sound signal into separate frequency bands and compressing each band separately.
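Splitting into bands pays off because each band can then be coded with its own precision. A minimal sketch, using a two-band Haar filter bank because it is the simplest split with perfect reconstruction; ATRAC's actual filter bank (QMF stages followed by a transform) is more elaborate, and the step sizes below are hypothetical. The low band is quantized finely and the high band coarsely, and the reconstruction error stays within a bound set by the two step sizes.

```python
import math


def haar_split(x):
    """Split an even-length signal into a low band (scaled pairwise sums) and a
    high band (scaled pairwise differences), each at half the sample rate."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def haar_merge(low, high):
    """Exact inverse of haar_split."""
    s = 1 / math.sqrt(2)
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * s)
        out.append((lo - hi) * s)
    return out


def quantize(band, step):
    """Uniform quantizer: round each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in band]


def toy_band_codec(x, low_step=0.01, high_step=0.1):
    """Code each band separately: fine steps for the low band, coarse for the high."""
    low, high = haar_split(x)
    return haar_merge(quantize(low, low_step), quantize(high, high_step))


signal = [0.1 * i for i in range(8)]
print(toy_band_codec(signal))
```

Each output sample's error is at most (low_step/2 + high_step/2)/√2, so the coarse high band dominates the error; a real codec chooses the steps per band from a psychoacoustic model.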

What is the best lossy audio format?

There is not one best format. All formats have both strong and weak points.

Advantages of MP3:
  • The best portability, and the best software and hardware support
  • Encoder and decoder source code available
  • Good quality at high bitrates, including high-bitrate VBR
  • Future hardware support is pretty much certain
Disadvantages of MP3:
  • Quality is not good enough for demanding listeners
  • Many MP3s available for download are far from optimal quality, because poor-quality encoders exist
  • Many encoders of varying quality (some are very poor)

Advantages of Ogg Vorbis:
  • Free; encoder and decoder are open source
  • Patent-free technology; streaming is free of charge
  • Better quality than MP3 at all bitrates
  • With recent encoder versions, the quality gap between Ogg Vorbis and MPC is closing
  • Bitrate peeling (adaptively scalable bitrate without re-encoding)
  • Most likely format to achieve the largest user base after MP3
  • Downloadable files are high quality (no MP3-like quality differences)
Disadvantages of Ogg Vorbis:
  • No dedicated portable player or hardware support yet
  • Still not final; development of encoder release candidates takes a long time

Advantages of AAC:
  • A high-quality AAC encoder produces much better quality than MP3
  • MPEG ISO-compatible streams (Psytel AAC encoder)
  • Hardware support is emerging; the future looks good
  • Streaming of MPEG-2 and MPEG-4 AAC is free of charge
  • Very flexible (multichannel, 8-96 kHz, various bit depths)
  • Can achieve good quality at lower bitrates
  • Still in development (improvements to come)
Disadvantages of AAC:
  • Heavily patented
  • Licence royalties
  • Encoding is CPU-intensive

Advantages of Musepack/MPC:
  • The best quality at mid/high bitrates
  • Very fast encoding and decoding
  • Decoder is free and source code is available
  • Current encoder (mppenc 1.0) will remain free
  • Will be playable on future programmable portable PDAs
  • Downloadable files are high quality (no MP3-like quality differences)
Disadvantages of Musepack/MPC:
  • Not very good for encoding at 128 kbps and below
  • No dedicated portable player or hardware support
  • Very limited: 2 channels, 44.1 kHz, 16-bit (will change with SV8)
  • No direct filters for movie audio, only for music encoding

Advantages of WMA, RealAudio, VQF and MP3PRO:
  • WMA, RealAudio and MP3pro are usable for very low bitrate music only, and high quality should not be expected
Disadvantages of WMA, RealAudio, VQF and MP3PRO:
  • We don't consider these formats good enough, quality-wise, to compete with the above formats, except that WMA and MP3pro can compete at 64 kbps and lower bitrates
  • Microsoft's claim that WMA8 at 64 kbps is CD quality is very far from the truth; no format can achieve even near-transparent results below 128 kbps