Media encoding (course notes by G.v. B. based on Chapters
2 and 3 from Lu's book)
Representation of audio
The physical reality of audio is the air pressure changing over time (see
Figure 2.1). The amplitude is the difference between minimum and maximum
pressure. This is an analogue phenomenon, that is, the pressure is a continuous
function of time.
Fourier transformation can be used to decompose a waveform (a function over
time) into an equivalent superposition of sine waves with different frequencies.
The human ear is only sensitive to the frequencies between approximately
20 and 20 000 Hertz (that is, cycles per second).
Digital representation of audio
To represent any analogue phenomenon digitally, one performs the following
steps (see Figures 2.2 and 2.3):
- sampling
- quantization
- coding the digital representation obtained in the above two steps
Sampling rate: must be at least two times the highest frequency to be
represented (the Nyquist criterion; see Figure 2.4)
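The sampling step above can be sketched as follows; the 8 kHz rate and the
1 kHz test tone are illustrative values, not taken from the notes:

```python
import math

def sample(signal, duration_s, rate_hz):
    """Sample a continuous-time signal (a function of t, in seconds)
    at rate_hz samples per second."""
    count = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(count)]

# Illustrative 1 kHz sine tone with amplitude 1.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)

# By the Nyquist criterion, a 1 kHz tone needs a sampling rate above
# 2 kHz; the telephone-standard 8 kHz is comfortably sufficient.
samples = sample(tone, duration_s=0.01, rate_hz=8000)
print(len(samples))  # 80 samples for 10 ms at 8 kHz
```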
Quantization error
Advantage of non-linear quantization
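A sketch of uniform quantization and of mu-law companding (mu = 255 is the
value used in ITU-T G.711 telephony; the 8-bit depth and the test amplitude
are illustrative choices). For small amplitudes the companded quantizer has a
much smaller error, which is the advantage of non-linear quantization:

```python
import math

def quantize_uniform(x, bits):
    """Uniformly quantize x in [-1, 1] to 2**bits levels and return the
    reconstructed value; the quantization error is x minus the result."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def mu_law_compress(x, mu=255):
    """mu-law companding: expand small amplitudes before uniform
    quantization, giving them finer effective steps."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# A quiet sample: the non-linear (companded) quantizer has a much
# smaller error than the uniform one at the same bit depth.
x = 0.01
uniform_err = abs(x - quantize_uniform(x, 8))
companded_err = abs(x - mu_law_expand(quantize_uniform(mu_law_compress(x), 8)))
print(uniform_err, companded_err)
```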
Examples of audio standards (see Table 2.1)
Representation of video
Video is an analogue phenomenon described by functions of three arguments
(while audio is a single function of one argument, namely time):
- the light we see in a particular direction of our view is a superposition
of light energy for all light wavelengths between approximately 0.4 to 0.8
micrometers.
- The light intensity depends on the direction of our view (x and y coordinates)
and the time.
Our eyes have only three types of colour receptors. Therefore we can limit our
consideration to three light intensities (for more details, see Lu, Section 2.5)
To obtain a digital representation, we have to perform the same steps as
for audio:
- sampling in the three dimensions (for examples of errors introduced,
see Figure 2.10)
- time sampling: frame rate
- vertical sampling: raster scanning (see Figure 2.5)
- horizontal sampling: horizontal resolution
- quantization: for each of the three colours (non-linear quantization:
Gamma; see Figure 2.15)
- coding
Limits of human perception: contrast sensitivity is about 1% (this concerns
the quantization error)
Aspect ratio (see Figure 2.7)
Colour TV standards: NTSC, PAL, SECAM (see Table 2.2); HDTV
Summary: digital data rate (uncompressed): see Table 2.5
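The uncompressed rates come from multiplying samples by bits. The CD-audio
figures below are standard; the video figures (720 x 576 pixels, 25 frames/s,
16 bits/pixel for 4:2:2 chroma subsampling) are illustrative PAL-resolution
values rather than numbers copied from Table 2.5:

```python
def audio_bit_rate(sample_rate, bits_per_sample, channels):
    """Uncompressed digital audio rate in bits per second."""
    return sample_rate * bits_per_sample * channels

def video_bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed digital video rate in bits per second."""
    return width * height * fps * bits_per_pixel

# CD audio: 44.1 kHz, 16 bits, stereo -> 1 411 200 bit/s (about 1.4 Mbit/s).
print(audio_bit_rate(44100, 16, 2))

# PAL-resolution 4:2:2 video (illustrative): about 166 Mbit/s uncompressed,
# which is why compression is essential.
print(video_bit_rate(720, 576, 25, 16))
```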
Compression principles
Redundancy in the data:
- Entropy encoding (consider the different probabilities of having each
of the possible values), e.g. Huffman coding
- predictive coding
- for audio (see Table 2.1): differential PCM (e.g. Delta modulation);
separate encoding for different frequency sub-bands (requiring different
precision)
- for video: spatial redundancy along the same line and vertically from line
to line; temporal redundancy from frame to frame
- vector quantization: based on a fixed codebook of typical sample vectors;
for each block of samples, only the index of the best-matching codebook entry
is transmitted
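Entropy encoding from the list above can be sketched with a small Huffman
coder; the symbol frequencies are invented for illustration:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code: shorter codewords for more probable symbols.
    freqs maps symbol -> weight; returns symbol -> bit string."""
    # Each heap entry: (total weight, tie-break index, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Invented frequencies: 'a' is most common, so it gets the shortest code.
codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codes)
```

The resulting code is prefix-free, so the concatenated bit stream can be
decoded unambiguously.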
Lossless versus lossy compression
- the allowable loss depends on the desired reproduction quality, which in
turn depends on the accuracy of human listening/viewing perception
- Examples of lossless encodings: Huffman, Fax.
Constant versus variable bit rate encodings
Complexity of encodings: symmetric (encoding and decoding of similar cost) or
efficient only for receiving (decoding) [e.g. MPEG 1 and 2]; hardware or
software implementations.
Audio compression
non-linear quantization (mu-law or A-law)
predictive coding: differential PCM
standards: see Table 3.2
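The differential PCM mentioned above can be sketched in its lossless integer
form: transmit the difference from the previous sample rather than the sample
itself. Successive audio samples are highly correlated, so the differences are
small and can be coded with fewer bits (the sample values below are invented):

```python
def dpcm_encode(samples):
    """Differential PCM: send the difference from the previous sample
    (the predictor here is simply the previous value; lossless for
    integer samples)."""
    diffs, prev = [], 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Reconstruct the samples by accumulating the differences."""
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 101]
diffs = dpcm_encode(samples)
print(diffs)  # [100, 2, 3, -1, -3]: after the first value, small numbers
assert dpcm_decode(diffs) == samples
```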
MP3 (see Figure 3.5): it takes into account the limits of perception of the
human ear (it is not necessary to transmit details that cannot be perceived
by the ear)
Video compression
See Figure 2.9
Predictive coding (see above)
Motion prediction
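Motion prediction can be sketched as exhaustive block matching: for a block of
the current frame, search a window of the previous frame for the best-matching
block (minimum sum of absolute differences) and transmit only the motion
vector plus the small residual. The frame sizes and pixel values below are toy
examples:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

def extract(frame, x, y, n):
    """The n x n block of `frame` with top-left corner (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def best_motion_vector(prev, block, x, y, n, search=1):
    """Exhaustive block matching: try every offset (dx, dy) within the
    search window and keep the one minimizing SAD against `block`,
    which sits at (x, y) in the current frame."""
    h, w = len(prev), len(prev[0])
    best_vec, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            if 0 <= px <= w - n and 0 <= py <= h - n:
                cost = sad(extract(prev, px, py, n), block)
                if cost < best_cost:
                    best_vec, best_cost = (dx, dy), cost
    return best_vec

# Toy 4x4 frames: the bright 2x2 block moves one pixel right between frames.
prev = [[0, 0, 0, 0],
        [0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0]]
cur  = [[0, 0, 0, 0],
        [0, 0, 9, 9],
        [0, 0, 9, 9],
        [0, 0, 0, 0]]
block = extract(cur, 2, 1, 2)                    # block at (2, 1) in cur
print(best_motion_vector(prev, block, 2, 1, 2))  # (-1, 0): it came from the left
```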
Transform encoding
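Transform encoding maps a block of pixels to frequency coefficients; JPEG and
MPEG use an 8 x 8 DCT. A naive (O(n^4)) DCT-II sketch on a 4 x 4 block shows
how the energy of a smooth block concentrates in the low-frequency (DC)
coefficient, so that after quantization most high-frequency coefficients
become zero:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of an n x n block (JPEG/MPEG use n = 8)."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat 4x4 block: all the energy lands in the DC coefficient [0][0],
# and every AC coefficient is (numerically) zero.
flat = [[10] * 4 for _ in range(4)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # 40.0
print(round(coeffs[1][2], 6))  # 0.0
```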
Standards: JPEG (for still images), Motion JPEG (no inter-frame compression),
H.261 and H.263, MPEG 1, 2, 4 (see Table 3.8 for different MPEG-2 profiles)
MPEG's I, P and B frames (see Figure 3.10)
Block structure of frames (see Figure 3.11)
Scalable encodings (several levels of quality): base level, enhancement levels
Created: Sept. 16, 2003; last updated:
Sept. 21, 2004