
MPEG-4 BIFS white paper

하늘을닮은호수M 2007. 2. 1. 16:07

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC 1/SC 29/WG 11 N7608

October 2005, Nice

Title: MPEG-4 BIFS white paper

Source: Systems

Status: Proposal

Editors: Alexandre Cotarmanac'h (France Telecom), Renaud Cazoulat (France Telecom), Yuval Fisher (Envivio)

Introduction

BIFS (Binary Format for Scenes) [1] is MPEG-4's scene description language, designed for representing, delivering and rendering interactive and streamable rich-media services (including audio, video, and 2D & 3D graphics).

Background

The BIFS specification is designed for the efficient representation of dynamic and interactive presentations comprising 2D & 3D [2] graphics, images, text and audiovisual material. The representation of such a presentation includes the description of the spatial and temporal organization of the different scene components, as well as user interaction and animations.

The main features of MPEG-4 BIFS are the following:

Seamless embedding of audio/video content: MPEG-4 BIFS allows different audio/video objects to be integrated and controlled seamlessly in a scene.

Rich set of 2D/3D graphical constructs: MPEG-4 BIFS provides a rich set of graphical constructs which enable 2D and 3D graphics. BIFS also provides tools that enable easy authoring of complex Face and Body Animation, tools for 3D mesh encoding, and representation of 2D and 3D natural and synthetic sound models.

Local and Remote Interactivity: BIFS defines elements that can interact with the client-side scene as well as with remote servers. Interactive elements accept text input, mouse events, and events from other input devices, any of which can trigger a variety of behaviors.

Local and Remote Animations: Scene properties such as object positions, colors, and even shapes can be animated either by predefined scene descriptions or by streams sent from a server.

Reuse of Content: MPEG-4 scenes can contain references to streamed sub-scenes. Content can therefore be reused easily, which is a powerful way to create a very rich user experience from relatively simple building blocks.

Scripted Behavior: MPEG-4 scenes can have two types of scripted behavior. A Java API can control and manipulate the scene graph, and built-in ECMAScript (JavaScript) support can be used to create complex behaviors, animations, and interactivity.

Streamable scene description: The spatial and temporal graphic layout is carried in a BIFS-Command stream. Such a stream operates on the scene graph through commands which replace, delete and insert elements in the scene graph.

Accurate synchronization: Audio/visual content can be tightly synchronized with other A/V content and with client-side and server-driven scene animation, thanks to the underlying MPEG-4 Systems layer.

Compression: The scene description is binarized and compressed in an efficient way.
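The streamable scene description listed above can be illustrated with a small sketch. It is not the BIFS-Command syntax or its binary encoding: the `Node` class, the tuple-based command format and the `apply_command` helper are assumptions made for illustration only. It shows a stream of insert, replace and delete commands being applied, in order, to an in-memory scene graph.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified scene-graph node (hypothetical, not the BIFS node model)."""
    node_id: str
    node_type: str
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find(root, node_id):
    """Depth-first search for the node with the given identifier."""
    if root.node_id == node_id:
        return root
    for child in root.children:
        hit = find(child, node_id)
        if hit is not None:
            return hit
    return None


def delete(root, node_id):
    """Detach the node with the given identifier; return True if it was found."""
    for child in list(root.children):
        if child.node_id == node_id:
            root.children.remove(child)
            return True
        if delete(child, node_id):
            return True
    return False


def apply_command(root, command):
    """Apply one command tuple (illustrative format) to the scene graph in place."""
    kind = command[0]
    if kind == "insert":        # ("insert", parent_id, new_node)
        _, parent_id, node = command
        find(root, parent_id).children.append(node)
    elif kind == "replace":     # ("replace", node_id, field_name, value)
        _, node_id, name, value = command
        find(root, node_id).fields[name] = value
    elif kind == "delete":      # ("delete", node_id)
        delete(root, command[1])
    else:
        raise ValueError(f"unknown command: {kind}")


# A scene that starts empty and is built up and modified by a command stream.
scene = Node("root", "OrderedGroup")
stream = [
    ("insert", "root", Node("title", "Text", {"string": "Hello"})),
    ("insert", "root", Node("logo", "Bitmap")),
    ("replace", "title", "string", "Welcome"),
    ("delete", "logo"),
]
for command in stream:
    apply_command(scene, command)
```

After the four commands, the scene holds a single `title` node whose text has been replaced. Because the scene is only ever changed through such commands, the same mechanism serves both the initial scene and later updates.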

BIFS Design

MPEG-4 Systems follows an object-oriented, stream-based design. All presentations are described in a scene graph, which is a hierarchical representation of audio, video and graphical objects, each represented by a (BIFS) node abstracting the interfaces to those objects. This allows an object's properties to be manipulated independently of the object's media. For example, a scene can define mechanisms to scale and animate the position of an image object, while the actual image data (a property of the object) is defined dynamically at connection time.

The benefits of such a design are many. First, it makes authoring easier: the scene-graph structure allows a high-level description of the presentation and makes it independent of how the media (video, images, audio, etc.) are coded. Second, it allows the set of elements to be extended or, conversely, subsetted for applications in a particular market. Consequently, a terminal may understand only a subset of BIFS nodes. Such subsets of available nodes are called profiles; for instance, it is possible to use 2D-only profiles. Third, it gives a well-defined framework for building interaction between elements and handling user input. User input is itself abstracted as BIFS nodes, e.g. TouchSensor and InputSensor, which can be connected to other nodes to express complex behaviors such as "when the user clicks on this button, stop the video".
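The button example can be sketched as a small event-routing model. This is only an approximation of the idea of connecting a sensor node to another node: the `SceneNode` class, its `route` method, and the `when` and `value_map` parameters are hypothetical and are not part of BIFS, where conditional behavior would be expressed with the standard's own nodes or scripts.

```python
class SceneNode:
    """Hypothetical node whose field changes are forwarded along routes."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        # source field name -> list of (target, target field, predicate, mapper)
        self.routes = {}

    def route(self, source_field, target, target_field,
              value_map=lambda value: value, when=lambda value: True):
        """Connect one of this node's fields to a field of another node."""
        entry = (target, target_field, when, value_map)
        self.routes.setdefault(source_field, []).append(entry)

    def set(self, field_name, value):
        """Set a field and propagate the new value along matching routes."""
        self.fields[field_name] = value
        for target, target_field, when, value_map in self.routes.get(field_name, []):
            if when(value):
                target.set(target_field, value_map(value))


# A sensor attached to a button, and a video node that is currently playing.
button_sensor = SceneNode("button_sensor", isActive=False)
video = SceneNode("video", playing=True)

# "When the user clicks on this button, stop the video."
button_sensor.route(
    "isActive", video, "playing",
    value_map=lambda pressed: False,   # the video always receives "stop"
    when=lambda pressed: pressed,      # fire only on press, not on release
)

button_sensor.set("isActive", True)    # user presses the button
stopped_after_press = video.fields["playing"]
button_sensor.set("isActive", False)   # user releases the button
stopped_after_release = video.fields["playing"]
```

The point of the sketch is that the author describes the connection once, declaratively, and the terminal propagates events at run time; the video node never needs to know which input device triggered the change.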

In MPEG-4, every object is tightly coupled with a stream: the binding is made by means of the Object Descriptor Framework, which links an object to an actual stream. This design seems obvious for video objects, which rely on a compressed video stream. It has been pushed a bit further: the scene description and the object descriptors are themselves carried in streams. In other words, the presentation itself is a stream which updates the scene graph and relies on a dynamic set of descriptors, which in turn reference the actual media streams.
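A minimal sketch of this indirection follows, under stated assumptions: the `ObjectDescriptor` fields, the `DescriptorTable` class and the `stream://example/...` locations are invented for illustration and do not reflect the actual descriptor syntax. The scene node refers to its media only through an identifier, so the descriptor set can change over time without touching the scene.

```python
from dataclasses import dataclass


@dataclass
class ObjectDescriptor:
    """Simplified, hypothetical descriptor: object ID -> stream location."""
    od_id: int
    stream_type: str
    stream_url: str


class DescriptorTable:
    """The dynamic set of descriptors a terminal currently knows about."""

    def __init__(self):
        self._table = {}

    def update(self, descriptor):
        """Add a descriptor, or replace the one with the same ID."""
        self._table[descriptor.od_id] = descriptor

    def remove(self, od_id):
        self._table.pop(od_id, None)

    def resolve(self, od_id):
        """Return the descriptor for an ID, or None if it is not (yet) known."""
        return self._table.get(od_id)


# The scene node knows how the image is placed, but not what the image is.
image_node = {"type": "image", "od_id": 10, "scale": 1.0}

table = DescriptorTable()
table.update(ObjectDescriptor(10, "visual", "stream://example/logo-v1"))
first_source = table.resolve(image_node["od_id"]).stream_url

# The scene can animate the object regardless of which stream backs it ...
image_node["scale"] = 2.0

# ... and a later descriptor update can rebind the same object to new media.
table.update(ObjectDescriptor(10, "visual", "stream://example/logo-v2"))
second_source = table.resolve(image_node["od_id"]).stream_url
```

Keeping the scene and the descriptors as two separate, updatable structures is what lets a presentation swap media at connection time while the layout and behavior stay unchanged.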

These design principles can be summarized in the following figure, which gives a visualization of a scene.

Target applications

MPEG-4 BIFS is particularly well suited to certain applications and is currently used in several market sectors. Below we list a subset of BIFS features and the types of products that these features serve well. Table 1 lists applications and their characteristics. The figures below give an overview of content made possible through BIFS.

Application                      Size/Bandwidth        Profile
Tiny 2D animation (MMS)          1 KB                  Simple2D
Subtitles                        ~1 kb/s               Core2D
Karaoke                          ~3 kb/s               Main2D
Interactive Multimedia Portals   100 KB / ~20 kb/s     Advanced2D
3D cartoon                       150 KB / ~20 kb/s     3D profile
2D games                         100-500 KB            Advanced2D

Table 1. Application types, their size and the corresponding MPEG-4 Systems profile.
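To give a feel for the figures in Table 1, the following sketch works through the arithmetic for one row. It rests on several assumptions not stated in the table: that an entry such as "100 KB / ~20 kb/s" means an initial scene of 100 KB followed by an update stream of about 20 kb/s, that 1 KB is 1000 bytes and 1 kb is 1000 bits, and that the access link runs at 64 kb/s (an illustrative value only).

```python
def initial_load_seconds(size_kb, link_kbps):
    """Time to fetch an initial scene of size_kb kilobytes over a link_kbps link."""
    return size_kb * 8 / link_kbps          # kilobytes -> kilobits, then / (kb/s)


def session_total_kb(size_kb, stream_kbps, duration_s):
    """Total kilobytes received: initial scene plus the update stream."""
    return size_kb + stream_kbps * duration_s / 8


# "Interactive Multimedia Portals" row: 100 KB initial scene, ~20 kb/s stream.
load_time = initial_load_seconds(100, 64)    # assumed 64 kb/s link
one_minute = session_total_kb(100, 20, 60)   # one minute of use

print(load_time)    # 12.5 (seconds)
print(one_minute)   # 250.0 (kilobytes)
```

Under these assumptions, a one-minute portal session costs about 250 KB in total.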

Feature: Integration and Synchronization of Multiple Streams

Corporate presentations and educational broadcasts benefit from the ability to integrate multiple synchronized streams in an interactive presentation that includes zooming, picture-in-picture, broadcast (in the live case), and chapter marks for random access and trick play (in the on-demand case).
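As a rough sketch of what integrating synchronized streams means for a player, the code below merges units from several streams into one presentation timeline. It assumes each unit carries a timestamp on a shared clock; the tuple layout, the stream names and the `presentation_order` helper are illustrative and are not taken from the MPEG-4 Systems specification.

```python
import heapq


def presentation_order(*streams):
    """Merge per-stream lists of (timestamp_ms, stream, payload) by timestamp.

    Each input list must already be sorted by timestamp.
    """
    return list(heapq.merge(*streams, key=lambda unit: unit[0]))


video = [(0, "video", "frame 0"), (40, "video", "frame 1"), (80, "video", "frame 2")]
slides = [(0, "slides", "slide 1"), (60, "slides", "slide 2")]
chapters = [(0, "chapters", "Intro"), (80, "chapters", "Results")]

timeline = presentation_order(video, slides, chapters)
timestamps = [unit[0] for unit in timeline]


def seek_index(timeline, chapter_title):
    """Random access: index of the first unit at or after a chapter mark."""
    for index, (timestamp, stream, payload) in enumerate(timeline):
        if stream == "chapters" and payload == chapter_title:
            # Step back to the first unit sharing the chapter's timestamp.
            while index > 0 and timeline[index - 1][0] == timestamp:
                index -= 1
            return index
    return None


results_start = seek_index(timeline, "Results")
```

Because all streams share one timeline, a chapter mark is just another timestamped unit, and seeking to it positions the video and slides together.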

Feature: Server Push of Scene Updates

In client-side middleware, such as the kind used for IPTV deployments, the ability to modify a client-side scene from the server is useful for sending metadata (Electronic Program Guide, Video on Demand library, etc.) and for updating services (e.g., adding personalized pages).

Figure 3. Origami Mobile Portal (France Telecom) with EPG, Online Weather, Video and Games.

Feature: Rich Scene Description

MPEG-4 BIFS’s ability to represent complex scenes can be used for applications ranging from e-commerce to entertainment.

Figure 4. Envivio TV Lounge with interactive shopping cart and video.

Figure 6. 2D Cartoon (ENST).

References

[1] ISO/IEC 14496-11, Coding of audio-visual objects, Part 11: Scene description and application engine (BIFS, XMT, MPEG-J)

[2] ISO/IEC 14496-20, Coding of audio-visual objects, Part 20: Lightweight Application Scene Representation (LASeR)
