
Efficient Wavelet-Based Video Compression

Adami, Nicola; Leonardi, Riccardo; Signoroni, Alberto
2006-01-01

Abstract

It is well known that exploiting temporal redundancy in video coding improves compression efficiency. Recent research results have shown that adopting a spatio-temporal multiresolution representation for video coding can provide a flexible basis for Scalable Video Coding (SVC). In particular, wavelet-based video coding frameworks provide many attractive features. Scalability refers to the possibility, at any time and in any system configuration, of directly accessing the right amount of coded information (i.e. avoiding over-transmission, data format conversion or transcoding) in order to optimally access, communicate and use the desired video content with respect to the available transmission throughput and the features of the receiving device. Academic and industrial communities are increasingly convinced that a combination of different scalability attributes (here, and quite commonly, referred to as full scalability) can be achieved without sacrificing coding performance. Full scalability in terms of reconstruction quality (e.g. PSNR) and of spatial and temporal resolution is usually required to adapt optimally and dynamically to the display size of terminals, to their frame-rate reproduction capabilities and/or power-saving (temporary or structural) needs, as well as to the throughput available on communication networks, channels and distribution nodes. This may turn out to be a natural evolution of the current JPEG 2000 standard, which has already been or may be adopted for handling digital image sequences in a variety of contexts (D-Cinema, E-Cinema, HDTV, secure and efficient content distribution on heterogeneous networks and devices, …). By adding temporal prediction, improved coding efficiency for video with scalability functionalities is likely to be achieved, thus greatly broadening the features of the standard. JPEG 2000 compatibility can be ensured for both intra-frame and residual information in a very natural way. This compatibility could be extended to the motion-field coding part, especially in view of possible innovative motion prediction and compensation scenarios based, e.g., on dense (non-block-based) motion vector fields, which are ideally better suited for use in conjunction with non-block-based transforms. A scalable video codec typically consists of three modules: encoder, extractor and decoder; fractions of the bit-stream can thus be discarded without the need to even partially decode the compressed bit-stream. Figure 1 shows a typical SVC system, referring to the coding of a video signal at an original CIF resolution (352×288) and a frame rate of 30 fps. In the example, the highest operating point and decoded quality corresponds to a bit rate of 2 Mbps at the original spatial and temporal resolutions. For scaled decoding in terms of spatial, temporal and/or quality resolution, the decoder operates only on a portion of the original coded bit-stream, according to the indication of the desired working point. This stream portion is extracted from the originally coded stream by a block called the "extractor", which in Figure 1 is shown between the coder and the decoder. Depending on the application field, it can be realised as an independent block or be an integral part of the coder or decoder.
The extractor receives the information describing the desired working point (in the example of Figure 1, a lower spatial resolution, QCIF (176×144), a lower frame rate (15 fps), and a lower bit-rate/quality (150 kbps)) and extracts a decodable bit-stream matching, or almost matching, the specifications of the indicated working point. One of the main differences between an SVC system and a transcoding system is the very low complexity of the extraction block, which does not require coding/decoding operations and typically consists of simple "cut-and-paste" operations on the coded bit-stream. Given the peculiar temporal characteristics of video signals, it is appropriate to use motion-compensated temporal filtering (MCTF) with an adaptive selection of wavelet filters. In the spatial domain, an adaptive 2D wavelet transform can be applied. A brief survey of typical approaches is reported in the next section (for more details refer to [1]).
Figure 1. SVC rationale.
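To make the extractor's role more concrete, here is a minimal Python sketch of bit-stream extraction as a pure packet-selection ("cut-and-paste") operation. All names, the packet layout and the level numbering are hypothetical illustrations, not taken from the paper or from any specific codec: packets are tagged with a spatial level, a temporal level and a quality layer, and extraction simply keeps those at or below the requested working point, without any decoding or re-encoding.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Packet:
    # Hypothetical packet header: which spatio-temporal subband and
    # quality layer this chunk of coded data belongs to.
    spatial_level: int    # e.g. 0 = QCIF, 1 = CIF
    temporal_level: int   # e.g. 0 = 15 fps, 1 = 30 fps
    quality_layer: int    # 0 = base layer, higher = refinement layers
    payload: bytes

def extract(bitstream: List[Packet],
            max_spatial: int, max_temporal: int, max_quality: int) -> List[Packet]:
    """Keep only the packets needed for the requested working point.

    No entropy decoding or re-encoding is performed: extraction is a
    pure selection ("cut and paste") over already-coded packets.
    """
    return [p for p in bitstream
            if p.spatial_level <= max_spatial
            and p.temporal_level <= max_temporal
            and p.quality_layer <= max_quality]

# Example: from a CIF @ 30 fps / 2 Mbps stream, extract a QCIF @ 15 fps
# sub-stream, keeping only the quality layers matching the target bit rate.
# qcif_15fps = extract(full_stream, max_spatial=0, max_temporal=0, max_quality=1)
```

This is what keeps the extraction block far simpler than a transcoder: the decision logic operates only on packet headers, never on the coded samples themselves.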
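As a companion illustration of the MCTF step mentioned above, the following sketch shows one level of motion-compensated Haar lifting on a pair of frames, producing a temporal low-pass and a temporal high-pass subband. The warp function is a hypothetical placeholder for motion compensation (integer-pel shifts over a dense motion field); practical MCTF schemes adaptively select longer wavelet filters (e.g. 5/3) and more accurate motion models.

```python
import numpy as np

def warp(frame: np.ndarray, motion_field: np.ndarray) -> np.ndarray:
    """Placeholder motion compensation: shift each pixel by an integer
    motion vector (dy, dx) from a dense motion field of shape (H, W, 2).
    Real MCTF uses sub-pel interpolation and richer motion models."""
    h, w = frame.shape
    out = np.empty_like(frame)
    for y in range(h):
        for x in range(w):
            dy, dx = motion_field[y, x]
            sy = int(np.clip(y + dy, 0, h - 1))
            sx = int(np.clip(x + dx, 0, w - 1))
            out[y, x] = frame[sy, sx]
    return out

def mctf_haar_level(frame_a: np.ndarray, frame_b: np.ndarray,
                    motion_field: np.ndarray):
    """One level of motion-compensated Haar lifting on a frame pair:
    the high-pass subband is the motion-compensated prediction residual,
    the low-pass subband is the motion-aligned temporal average."""
    prediction = warp(frame_a, motion_field)           # predict B from A
    high = frame_b.astype(float) - prediction          # temporal detail (predict step)
    low = frame_a + 0.5 * warp(high, -motion_field)    # temporal average (update step)
    return low, high

# The decoder inverts the two lifting steps to recover the frame pair:
#   frame_a = low - 0.5 * warp(high, -motion_field)
#   frame_b = high + warp(frame_a, motion_field)
```

Applying this decomposition recursively to the low-pass frames yields the dyadic temporal hierarchy that supports frame-rate (temporal) scalability, e.g. 30 fps down to 15 fps in the example of Figure 1.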
Files in this item:
n3954-SMALL.pdf (open access)
Description: ISO/IEC JTC1/SC29/WG1 N3954, 39th JPEG meeting, Jul. 2006, Assisi (PG), Italy
Type: Full Text
License: Creative Commons
Size: 993.26 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/10889