Proposed Status Report on Wavelet Video Coding Exploration

LEONARDI, Riccardo; SIGNORONI, Alberto
2006-01-01

Abstract

Current 3-D wavelet video coding schemes with Motion Compensated Temporal Filtering (MCTF) fall into two main categories. The first performs MCTF directly on the input video sequence, in the full-resolution spatial domain and before the spatial transform; it is often referred to as spatial-domain MCTF. The second performs MCTF in the wavelet subband domain generated by the spatial transform and is often referred to as in-band MCTF. Figure 1(a) shows a general framework that supports both schemes. First, a pre-spatial decomposition can be applied to the input video sequence. A multi-level MCTF then decomposes the video frames into temporal highpass and temporal lowpass subbands. After temporal decomposition, a post-spatial decomposition is applied to each temporal subband to decompose the frames further in space. The framework thus splits the overall spatial decomposition of each temporal subband into two parts: pre-spatial and post-spatial decomposition operations. The pre-spatial decomposition is void in some schemes and non-empty in others. Figure 1(b) shows the t+2D scheme, where the pre-spatial decomposition is empty. Figure 1(c) shows the 2D+t+2D scheme, where the pre-spatial decomposition is usually a multi-level dyadic wavelet transform. Depending on the result of the pre-spatial decomposition, the temporal decomposition performs MCTF either in the spatial domain or in the subband domain.

Figure 1: Framework for 3-D wavelet video coding. (a) The general coding framework; (b) the t+2D scheme (pre-spatial decomposition is void); (c) the 2D+t+2D scheme (pre-spatial decomposition exists).
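The three stages of the general framework can be illustrated with a minimal sketch. It rests on strong simplifying assumptions that are not part of the original report: frames are 1-D lists, all filters are one-level Haar, and motion is assumed to be zero, so the temporal step is a plain Haar filter rather than true MCTF. The names `haar_split`, `temporal_haar` and `encode_pair` are hypothetical.

```python
# Illustrative sketch only: 1-D "frames", Haar filters and zero motion
# (no motion compensation) are assumptions, not the schemes in the report.

def haar_split(signal):
    """One-level Haar analysis: return (lowpass, highpass) half-length lists."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i + 1] - signal[i]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def temporal_haar(frame_a, frame_b):
    """Temporal Haar step on a frame pair.

    Real MCTF would align the two frames along motion trajectories
    before filtering; here the motion is assumed to be zero.
    """
    low = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    high = [(b - a) / 2 for a, b in zip(frame_a, frame_b)]
    return low, high


def encode_pair(frame_a, frame_b, pre_spatial):
    """Run pre-spatial -> temporal -> post-spatial on one frame pair.

    pre_spatial=False: void pre-spatial stage (t+2D ordering).
    pre_spatial=True: temporal filtering inside each spatial subband (in-band).
    """
    if pre_spatial:
        bands_a = haar_split(frame_a)  # (spatial low, spatial high)
        bands_b = haar_split(frame_b)
    else:
        bands_a, bands_b = (frame_a,), (frame_b,)

    result = []
    for band_a, band_b in zip(bands_a, bands_b):
        t_low, t_high = temporal_haar(band_a, band_b)
        # Post-spatial decomposition of each temporal subband.
        result.append({"t_low": haar_split(t_low), "t_high": haar_split(t_high)})
    return result
```

Calling `encode_pair(a, b, pre_spatial=False)` returns one entry (the full-resolution band), while `pre_spatial=True` returns one entry per spatial subband. For two identical frames the temporal highpass coefficients are all zero in both orderings, which is the ideal outcome that motion compensation tries to approach for moving content.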
A first classification of SVC schemes, according to the order in which the spatial and temporal wavelet transforms are performed, was introduced in the first Scalable Video Models [1], [2] on the basis of the responses to the Call for Proposals at the Munich meeting. The so-called t+2D schemes (one example is [3]) first perform MCTF, producing temporal subband frames, and then apply the spatial DWT to each of these frames. Alternatively, in a 2D+t scheme (one example is [4]), a spatial DWT is first applied to each video frame and MCTF is then performed on the spatial subbands. A third approach, named 2D+t+2D, uses a first-stage DWT to produce reference video sequences at various resolutions; t+2D transforms are then performed on each resolution level of the resulting spatial pyramid. Each scheme has shown its pros and cons [5,6] in terms of coding performance. From a theoretical point of view, the critical aspects of the above SVC schemes mainly reside: i) in the coherence and trustworthiness of the motion estimation at various scales (especially for t+2D schemes); ii) in the difficulty of compensating for the shift-variant nature of the wavelet transform (especially for 2D+t schemes); iii) in the performance of inter-scale prediction (ISP) mechanisms (especially for 2D+t+2D schemes). An analysis of the differences between the schemes is also reported in the sequel.
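Point ii), the shift-variant nature of the critically sampled wavelet transform, can be seen in a small numeric sketch. The example assumes a one-level 1-D Haar highpass filter and a synthetic step edge; both are illustrative choices, and the helper names are hypothetical.

```python
# Illustrative sketch: a critically sampled Haar transform is not shift-invariant.

def haar_highpass(signal):
    """One-level Haar highpass coefficients (decimated by two)."""
    return [(signal[i + 1] - signal[i]) / 2 for i in range(0, len(signal), 2)]


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


edge = [0, 0, 0, 0, 1, 1, 1, 1]
shifted = [0] + edge[:-1]  # the same edge moved right by one sample

print(energy(haar_highpass(edge)))     # 0.0
print(energy(haar_highpass(shifted)))  # 0.25
```

A one-sample displacement moves the edge from a pair boundary into the middle of a pair, so the highpass subband changes from all zeros to a non-zero coefficient. A translated object therefore does not simply produce translated subband coefficients, which is why motion compensation applied directly to decimated subbands is difficult.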
Files in this item:
File: m12970-SMALL.pdf (open access)
Description: ISO/IEC JTC1/SC29/WG11 MPEG2006/m12970, 75th meeting, Jan. 2006, Bangkok, Thailand
Type: Full text
License: Creative Commons
Size: 1.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/10886