Media encoding (course notes by G.v. B. based on Chapters 2 and 3 from Lu's book)

Representation of audio

The physical reality of audio is air pressure changing over time (see Figure 2.1). The amplitude is the difference between minimum and maximum pressure. This is an analogue phenomenon; that is, the pressure is a continuous function of time.
A Fourier transform can be used to decompose a waveform (a function of time) into an equivalent superposition of sine waves with different frequencies.
The human ear is sensitive only to frequencies between approximately 20 and 20 000 Hertz (that is, cycles per second).
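
The decomposition into sine waves can be illustrated with a small discrete Fourier transform. The following is only a sketch: the sample rate, the signal length and the two test frequencies (5 Hz and 12 Hz) are assumptions chosen so that each frequency falls exactly on one output bin; they do not come from Lu's book.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(acc) / n)
    return magnitudes

# Assumed test signal: one second sampled at 64 Hz, so bin k corresponds to k Hz.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]

magnitudes = dft_magnitudes(signal)
# The two strongest bins are the two sine waves that make up the signal.
strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
print(sorted(strongest))
```

With this scaling, a sine wave of amplitude A shows up as a bin of magnitude A/2, because the other half of its energy sits in the mirrored negative-frequency bin.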

Digital representation of audio

To represent any analogue phenomenon digitally, one performs the following steps (see Figures 2.2 and 2.3):
Sampling rate: must be at least twice the highest frequency to be represented (see Figure 2.4)
Quantization error
Advantage of non-linear quantization
Examples of audio standards (see Table 2.1)
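
The points on sampling rate, quantization error and non-linear quantization can be sketched as follows. This is a minimal illustration, not the method of any particular standard: the function names are hypothetical, samples are assumed to lie in [-1, 1], and the companding curve is the mu-law formula with mu = 255.

```python
import math

MU = 255  # companding parameter; 255 is the value commonly used with 8-bit mu-law

def min_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate that can still represent the given frequency."""
    return 2 * highest_frequency_hz

def quantize_uniform(x, bits):
    """Linear quantization of x in [-1, 1]; returns the reconstructed value."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step  # centre of the chosen interval

def mu_compress(x):
    """Stretch small amplitudes and squeeze large ones before quantizing."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_mu_law(x, bits):
    """Non-linear quantization: compress, quantize linearly, expand."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))

# Quiet signal values only (|x| <= 0.05), where linear quantization is relatively coarse.
quiet = [-0.05 + i * 0.0001 for i in range(1001)]
max_err_uniform = max(abs(x - quantize_uniform(x, 8)) for x in quiet)
max_err_mu_law = max(abs(x - quantize_mu_law(x, 8)) for x in quiet)

print(min_sampling_rate(20_000))          # rate needed for the top of the audible range
print(max_err_uniform, max_err_mu_law)    # worst-case error for quiet samples
```

The linear quantizer has the same worst-case error (half a step) everywhere, whereas the companded quantizer spends more of its levels on small amplitudes, so its error on quiet samples is smaller at the same number of bits.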

Representation of video

Video is an analogue phenomenon described by several functions of three arguments (whereas audio is a single function of one argument, namely time):
Our eyes distinguish only three primary colour components. We can therefore limit our consideration to three light intensities (for more details, see Lu, Section 2.5)
To obtain a digital representation, we have to perform the same steps as for audio:
Limits of human perception: contrast sensitivity is 1% (this concerns the quantization error)
Aspect ratio (see Figure 2.7)
Colour TV standards: NTSC, PAL, SECAM (see Table 2.2); HDTV

Summary: digital data rate (uncompressed): see Table 2.5
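
The uncompressed data rate follows directly from the sampling parameters. The sketch below shows the arithmetic only; the audio parameters are those of CD audio, while the video frame size, colour depth and frame rate are assumed for illustration and are not taken from Table 2.5.

```python
def audio_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

# CD audio: 44.1 kHz, 16 bits per sample, stereo.
cd_rate = audio_bit_rate(44_100, 16, 2)

# Assumed video format: 720 x 480 pixels, 24 bits per pixel, 30 frames per second.
video_rate = video_bit_rate(720, 480, 24, 30)

print(cd_rate / 1000, "kbit/s")       # 1411.2 kbit/s
print(video_rate / 1_000_000, "Mbit/s")
```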

Compression principles

Redundancy in the data:
Lossless versus lossy compression
Constant versus variable bit rate encodings
Complexity of encodings: symmetric (encoding and decoding are equally efficient) or asymmetric, that is, efficient only for receiving (decoding) [e.g. MPEG 1 and 2]; hardware or software implementations.
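
As a small illustration of lossless compression that exploits redundancy, the sketch below uses run-length encoding. Run-length encoding is not named in these notes; it is chosen here only because it is one of the simplest lossless schemes, and the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(data):
    """Replace each run of equal symbols by a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Rebuild the original string exactly from the (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)

encoded = rle_encode("aaabccdddd")
print(encoded)               # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
print(rle_decode(encoded))   # aaabccdddd
```

Decoding restores the input exactly, which is what distinguishes a lossless from a lossy scheme. The number of pairs depends on the data, so the output size varies with the input even when the input size is fixed.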

Audio compression

Non-linear quantization (mu-law or A-law)
Predictive coding: differential PCM (DPCM)
Standards: see Table 3.2
MP3 (see Figure 3.5): takes the limits of human hearing into account (details that the ear cannot perceive need not be transmitted)
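
The differential PCM item above can be sketched as follows. This is a minimal version under stated assumptions: the predictor is simply the previously reconstructed sample, the quantization step is fixed, and the sample values are made up for illustration. Real codecs use more elaborate predictors and adaptive step sizes.

```python
def dpcm_encode(samples, step):
    """Encode each sample as the quantized difference from the predicted value."""
    codes = []
    prediction = 0
    for sample in samples:
        code = round((sample - prediction) / step)
        codes.append(code)
        # Track what the decoder will reconstruct, so errors do not accumulate.
        prediction += code * step
    return codes

def dpcm_decode(codes, step):
    """Rebuild the samples by summing the dequantized differences."""
    samples = []
    value = 0
    for code in codes:
        value += code * step
        samples.append(value)
    return samples

samples = [100, 104, 109, 111, 110, 106]   # made-up sample values
codes = dpcm_encode(samples, step=2)
decoded = dpcm_decode(codes, step=2)
print(codes)
print(decoded)
```

After the first sample, the codes are small numbers because neighbouring samples are similar; small numbers need fewer bits than the full sample values. The reconstruction error stays within half a quantization step per sample.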

Video compression

See Figure 2.9
Predictive coding (see above)
Motion prediction
Transform encoding
Standards: JPEG (for still images), Motion JPEG (no inter-frame compression), H.261 and H.263, MPEG-1, MPEG-2, MPEG-4 (see Table 3.8 for different MPEG-2 profiles)
MPEG's I, P and B frames (see Figure 3.10)
Block structure of frames (see Figure 3.11)
Scalable encodings (several levels of quality): base level, enhancement levels
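
Motion prediction can be illustrated by exhaustive block matching: for a block in the current frame, search the previous frame for the best-matching block and transmit only the displacement (the motion vector). The sketch below makes several assumptions: frames are small greyscale arrays, the match criterion is the sum of absolute differences, and the search window is tiny. It is not the procedure of any specific standard.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def find_motion_vector(prev_frame, cur_frame, top, left, size, search):
    """Return (cost, dy, dx) of the best match for the current block in prev_frame."""
    target = get_block(cur_frame, top, left, size)
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(prev_frame, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def make_frame(top, left):
    """Hypothetical 8 x 8 test frame: black background with one 2 x 2 patch."""
    frame = [[0] * 8 for _ in range(8)]
    patch = [[10, 20], [30, 40]]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = patch[r][c]
    return frame

prev_frame = make_frame(2, 2)
cur_frame = make_frame(3, 4)   # the patch moved 1 row down and 2 columns right
cost, dy, dx = find_motion_vector(prev_frame, cur_frame, top=3, left=4, size=2, search=2)
print(cost, dy, dx)
```

The vector points from the block in the current frame back to its source in the previous frame. A cost of zero means the block can be predicted perfectly, so only the vector needs to be sent; otherwise the remaining difference is encoded as well.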

Created: Sept. 16, 2003; last updated: Sept. 21, 2004