A Protocol Review for IPTV and WebTV Multimedia Delivery Systems



Introduction
Multimedia traffic over IP Networks includes services such as Internet Video, Voice over IP (VoIP), IPTV, WebTV and Video Calling [1]. Such data transmission requires specialised protocols to accomplish the user quality requirements which for some services includes hard real-time delivery.
The growth of IP Networks for media delivery is increasing day by day. Cisco White Papers [1] and [2] provide interesting data that charts the increase in IP Network use and the increase in Internet traffic dedicated to video. Fig. 1 depicts the 2010 broadband traffic by application subcategory [1] and Fig. 2 shows the global consumer video traffic per month by category for 2010-2015 [2]. According to [2], projected data growth over IP Networks between 2010 and 2015 will be 55%, comprising a 32% increase in fixed Internet, 24% in Managed IP and 92% in mobile Internet. Internet video traffic was 40% of total traffic in 2010, and Cisco forecasts that this will reach 62% by 2015. Even geographical areas traditionally behind in Internet video, such as the Middle East and Africa, will grow by 105%. Interesting data published in [1] details how Internet video tripled in 2010 and will grow 17-fold by 2015.
In 2010, Internet video watched on a TV set accounted for 7% of Internet video traffic; this is projected to increase to 16% by 2015 [2]. The same source predicts that Video-on-Demand (VoD) will triple by 2015. In 2011, VoD High Definition (HD) video will surpass Standard Definition (SD), and it is projected that 77% of VoD traffic will be HD by 2015.
The growth of IP Network traffic and the increase in traffic dedicated to multimedia applications (both hard real-time and soft real-time) have led to major developments in protocols and technologies used for multimedia delivery. Multimedia transmission has characteristics that differ from other data traffic such as email: high network load, real-time delivery and other quality performance criteria.
The delivery of multimedia over IP Networks is typically analysed by two parameters, Quality of Service (QoS) and Quality of Experience (QoE). The former is based on objective network analysis using metrics such as delay, jitter and packet loss, whereas the latter is based on the user's subjective opinion of the final media play-out.
Real-time delivery is the main characteristic which distinguishes media delivery from other traffic. Many Internet applications require lossless delivery of information, such as email, where users want to receive the exact content sent by the sender. Multimedia delivery, on the other hand, is more tolerant of loss but requires that data is received within a time range; otherwise it is not valid. For example, for VoIP and video conferencing, Mouth-to-Ear (M2E) delays typically should not exceed a round-trip time of 300 ms. Often, relative arrival delays, rather than absolute delays, are more important: when watching a video over the Internet, if the sound related to the image we are watching arrives much later, it does not make sense to play it and it should be ignored. On the other hand, it is better to receive an image of lower quality than one out of sync with the audio.
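As a minimal sketch of this relative-delay decision (the function name and the 80 ms lip-sync threshold are illustrative assumptions, not values taken from the cited sources), a receiver might decide whether to play or drop an audio sample as follows:

```python
def should_play(audio_pts_ms: int, video_pts_ms: int,
                sync_threshold_ms: int = 80) -> bool:
    """Play an audio sample only if its presentation timestamp is close
    enough to the video frame currently being displayed; otherwise drop it.

    The 80 ms threshold is an illustrative value, not a standard."""
    return abs(audio_pts_ms - video_pts_ms) <= sync_threshold_ms

print(should_play(1000, 1010))   # in sync -> play
print(should_play(1000, 1400))   # 400 ms out of sync -> drop
```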
There are two main factors that determine which protocols are used, the media platform and the Transport Protocol used underneath the Application Protocol.
Within the TCP/IP protocol stack, all layers have a big role to play. Application Layer (AL) protocols are highly dependent on the Network and Transport Layer (NL/TL) protocols used. In this paper we examine protocols mostly at the Application and Transport layers and how they are related. Media traffic can use IP unicast or multicast at the Network Layer and the User Datagram Protocol (UDP) or Transmission Control Protocol (TCP) at the Transport Layer.
There are two main IP Network delivery platforms, IPTV and WebTV. Due to their different characteristics, each one uses different application layer protocols. While IPTV uses RTP combined with RTSP, WebTV is mainly HTTP oriented. HTTP is not itself a media delivery protocol, but the main protocols used in WebTV are based on HTTP.
The remainder of this paper is structured as follows. Section II depicts the media delivery platform, Section III describes the different media delivery systems such as downloading, progressive downloading, streaming and adaptive downloading. Section IV presents the IP, Transport and Application protocols related to media delivery. Section V briefly describes the testbed developed, and finally Section VI concludes the paper.

Media Platform
The media distribution over IP Networks can be performed via two different systems, WebTV and IPTV. Although both use IP, they have different characteristics that affect the media delivery protocol used. The main differences are established by Maisonneave in [3] and are depicted in Table I. An important fact from the user's perspective is that their expectations are based on cost. Users paying a monthly fee, e.g. for IPTV, expect high QoS and QoE and expect to receive the services they pay for. Users of free WebTV will accept poorer-quality media delivery.
The other aspect is the protection of the media delivery, which guarantees media providers that the media will not be illegally copied or distributed. Without network protection, IPTV companies would never be able to buy the rights to distribute the media content which is the core of their business.
Both systems use IP delivery but the IP protocol used in each case is different. IPTV uses IP Multicast for video delivery, only employing IP unicast for VoD whereas WebTV mainly employs IP unicast protocols.

Media Delivery
There are several techniques to deliver a media file from the source to the receiver and play it, ranging from downloading and progressive downloading to adaptive streaming and streaming. Downloading and streaming are the two extremes, whereas progressive downloading and adaptive streaming are hybrid techniques which keep the advantages of downloading and streaming while striving to avoid their drawbacks.
The transport protocol used in the media delivery is also a key factor. The most common transport protocols are TCP [4] and UDP [5]. TCP provides retransmission of lost packets to guarantee integrity; however, it does not assure on-time delivery. UDP provides a low-overhead protocol to provide quicker, albeit best-effort, delivery with no lost-packet retransmission. Table II provides the main differences between both transport protocols, revealing the main functionality of UDP to reduce packet delay and traffic overhead to benefit real-time delivery. UDP is typically used by RTP/RTCP, described in Section IV.A, for streaming media delivery. This contrasts with HTTP based protocols, such as HLS and MS-SSTR explained later, which use TCP for downloading, progressive downloading and adaptive streaming methods. Although RTMP uses TCP at the Transport Layer, it does not employ HTTP for the media delivery.
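The contrast between UDP's fire-and-forget delivery and TCP's connection-oriented behaviour can be demonstrated with a small loopback sketch (assuming port 50321 is unused on the local machine):

```python
import socket

# UDP is connectionless: a datagram can be handed to the network even if
# nothing is listening -- there is no handshake and no retransmission.
u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
u.sendto(b"media frame", ("127.0.0.1", 50321))  # fire-and-forget
u.close()

# TCP is connection-oriented: connecting to the same unused port fails,
# because TCP requires a handshake before any data can flow.
t = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
t.settimeout(2)
try:
    t.connect(("127.0.0.1", 50321))
    tcp_connected = True
except (ConnectionRefusedError, socket.timeout):
    tcp_connected = False
finally:
    t.close()

print(tcp_connected)
```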
IPTV systems use streaming to deliver TV multicast services, although they can provide unicast VoD services via HTTP, whereas WebTV usually employs unicast progressive and adaptive HTTP streaming.

A. Downloading
The downloading method requires the download of the complete media file to the receiver prior to its display. The advantage is that once the file is downloaded at the client side, the play-out is not interrupted. The biggest drawback is the waiting time until the media file is completely downloaded and the video play-out can begin. The most popular protocol to download files is HTTP, which runs over TCP. As such, downloading is mainly used for small videos with low quality, where latency is minimised [5].

B. Progressive Downloading
HTTP Progressive Downloading is an optimised HTTP technique to stream media over TCP, developed to reduce the latency of initial play-out while preserving all the advantages of HTTP over TCP. Media play-out begins before the media file is completely downloaded, minimising the user's waiting time before being able to start playing the video.
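One way to realise progressive downloading is with HTTP Range requests for successive byte windows of the file. The sketch below only constructs the request bytes (the host and path are hypothetical, and real players may instead simply read a single response body incrementally):

```python
def segment_request(host: str, path: str, start: int, size: int) -> bytes:
    """Build an HTTP/1.1 GET asking only for bytes [start, start+size-1],
    so play-out can begin long before the whole file has arrived."""
    end = start + size - 1
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Range: bytes={start}-{end}\r\n"
            f"Connection: keep-alive\r\n\r\n").encode("ascii")

# Request the first 64 KiB of a (hypothetical) video file:
req = segment_request("media.example.com", "/video.mp4", 0, 65536)
print(req.decode().splitlines()[2])   # the Range header
```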
Multiple web media providers such as YouTube use progressive downloading [6].

C. Streaming
The principal objective of streaming is real-time media delivery, where the media must arrive within a time threshold. The underlying protocol used is UDP because it provides low overhead and minimises packet delivery delay. On the other hand, this is also its main drawback, as UDP is often blocked by firewalls and penalised by Internet providers.
Streaming benefits include minimising the buffer size at the client, low latency, no need to store the media file at the client side, efficient use of bandwidth, and providing the means to analyse users' media selection behaviour, thus making it the most appropriate delivery method for live video [6] [7].

D. Adaptive Streaming
HTTP adaptive streaming uses progressive downloading to deliver the same multimedia content adapted to each individual client. Once the media session has been initialized, on client request or based on network conditions, media can be adapted, for example, to different bit-rate or quality [7].

IP/Transport/Application Protocols
The Application protocol used for real-time media delivery depends on a range of factors, including the media platform, IPTV or WebTV, IP Protocols, IP unicast or multicast, and the Transport protocol, TCP or UDP. In IPTV the main protocol is RTP over UDP using IP multicast whereas WebTV utilizes HTTP based protocols using IP unicast.
IP multicast is widely used for IPTV service providers to deliver channels to large numbers of clients with IP unicast used for their VoD services. WebTV, although it can utilize IP multicast, is mainly delivered over IP unicast.
HTTP is used by many companies to develop their own HTTP based Application protocol. Apple is currently finalising the Internet-Draft HLS and Microsoft has created its own MS-SSTR. All of these protocols use HTTP progressive download techniques, although they are clearly differentiated. Adobe has developed RTMP for its Flash platform, which is also based on TCP but not on HTTP.
Typically the relationship between protocols is dictated by their functions. RTP provides functionality similar to HTTP, in that both actually deliver the media data, whereas RTSP is comparable to RTMP, HLS and MS-SSTR because they organise the complete media session and manage information about the network conditions, the media, the server and the client.

A. RTP/RTCP/RTSP
The benefits of media transport using RTP have been widely studied; [8], [9] and [10] focused on MPEG-2 media streams whereas in [11] and [12] the benefits of the use of RTP on MPEG-4 media streams are explained.
RTSP and RTP/RTCP are mainly used in IPTV systems. RTSP in conjunction with RTP, and its companion RTCP, provide a protocol suite for real-time data delivery such as video and audio. RTP [14] transports the media data, audio and video, while RTCP monitors the media delivery and RTSP controls the delivery of the real-time data.
The RTP header provides three fields important for real-time media delivery: timestamp, sequence number and payload type (PT). The first has a different meaning depending on the payload type; the second helps to reorder the packets at the client side, since RTP over UDP does not provide ordered transport delivery. Finally, the payload type is also important as it indicates the media type conveyed within the RTP packet. In Fig. 3 the RTP packet header is depicted.
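These fields occupy the first 12 bytes of every RTP packet and can be unpacked directly; the sketch below follows the fixed header layout of RFC 3550 (the example packet is hand-crafted for illustration):

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the 12-byte fixed RTP header (RFC 3550)."""
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # should be 2
        "marker": (b1 >> 7) & 0x1,
        "payload_type": b1 & 0x7F,   # PT: media type of the payload
        "sequence": seq,             # used to reorder packets at the client
        "timestamp": ts,             # meaning depends on the payload type
        "ssrc": ssrc,                # identifies the media source
    }

# Hand-crafted example: version 2, PT 14 (MPEG audio), seq 7, ts 16000
pkt = struct.pack("!BBHII", 0x80, 14, 7, 16000, 0xDEADBEEF) + b"payload"
print(parse_rtp_header(pkt))
```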
Once the RTP transport of real-time media streams is initiated, RTCP packets are sent between media session senders and receivers to monitor the media delivery. The media sender sends RTCP Sender Report (RTCP SR) packets and receivers send RTCP Receiver Report (RTCP RR) packets. These two report types differ. Fig. 4a shows the RTCP SR packet header with important fields such as the NTP timestamp and RTP timestamp, whereas Fig. 4b shows important RTCP RR fields such as the cumulative number of packets lost, last SR and delay since last SR [13].
Fields such as inter-arrival jitter, delay since last SR and cumulative number of packets lost provide information to monitor the media transmission within a media session. The inter-arrival jitter provides the inter-arrival time variance of the RTP data packets, delay since last SR indicates the time elapsed between receiving the last SR packet and sending the current receiver report, and finally the cumulative number of packets lost conveys the quantity of packets lost from a source since the beginning of the media session [13].
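The inter-arrival jitter is maintained incrementally at the receiver; a sketch of the RFC 3550 estimator, with illustrative timestamp values, is:

```python
def update_jitter(jitter: float, send_ts: int, prev_send_ts: int,
                  recv_ts: int, prev_recv_ts: int) -> float:
    """One step of the RFC 3550 inter-arrival jitter estimator:
    J = J + (|D| - J) / 16, where D is the difference in relative
    transit time between two packets (all in RTP timestamp units)."""
    d = (recv_ts - prev_recv_ts) - (send_ts - prev_send_ts)
    return jitter + (abs(d) - jitter) / 16

# Packets sent every 160 ticks; the second one arrives 5 ticks late.
j = update_jitter(0.0, 160, 0, 165, 0)      # |D| = 5
j = update_jitter(j, 320, 160, 325, 165)    # |D| = 0, estimate decays
print(j)
```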
RTSP maintains the state of the media session. Furthermore, with its states and methods, RTSP provides the tools for remote control functions over IP Networks. The state is needed to relate a media stream to an RTSP request. States for both client and server are Init, Ready, Playing, and Recording. Not all methods provoke a change of state: options, announce, describe, get-parameter, and set-parameter only provide information about the media session, whereas setup, play, pause, teardown, redirect, and record invoke a change of state. Table III depicts the changes of state and a description of the client and server states [14]. Fig. 5 depicts the RTSP communication process prior to the RTP media data transport, which includes RTCP packet interchange between client and server, and the media session finalisation. It is important to note that while RTP is transmitted using UDP, RTSP packets are usually sent using TCP to mitigate packet loss.
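The state/method behaviour above can be sketched as a simple transition table (a simplification of RFC 2326: methods applied in states where they have no effect are modelled as leaving the state unchanged):

```python
# State-changing RTSP methods and the transitions they invoke.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Ready", "RECORD"): "Recording",
    ("Playing", "PAUSE"): "Ready",
    ("Recording", "PAUSE"): "Ready",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "TEARDOWN"): "Init",
    ("Recording", "TEARDOWN"): "Init",
}
# Methods that only carry information and never change the session state.
NON_STATE_METHODS = {"OPTIONS", "ANNOUNCE", "DESCRIBE",
                     "GET_PARAMETER", "SET_PARAMETER"}

def next_state(state: str, method: str) -> str:
    if method in NON_STATE_METHODS:
        return state
    return TRANSITIONS.get((state, method), state)

s = "Init"
for m in ("DESCRIBE", "SETUP", "PLAY", "PAUSE", "TEARDOWN"):
    s = next_state(s, m)
    print(m, "->", s)
```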
The process sends an RTSP Describe method to the server to request information about the media session. The server responds by sending the relevant information via the Session Description Protocol (SDP) [15], which is outside the scope of this paper. The client receives the media information and sends to the server an RTSP Setup command, to which the server responds with an RTSP OK message. After the initiation of the media session is finalised, the client sends an RTSP Play and the server finally responds and begins the media transmission via RTP packets. During RTP transmission, RTCP SR and RR packets are sent by the sender and receiver respectively. When the client wishes to finish the media session, it sends an RTSP Teardown command to the server, which accepts with an RTSP OK response.

B. RTMP
Adobe Flash Platform technology utilizes RTMP to stream audio, video and data over TCP. This method is designed to stream Flash encapsulated media between server and client. It is used on the Internet to transmit both live Internet Radio and WebTV.
As an introduction to RTMP a few concepts such as message stream, chunk and chunk stream need to be clearly specified.
The message stream is defined in [16] as 'A logical channel of communication that allows the flows of messages'. Messages are fragmented and interleaved to send over the network. A message stream can be an audio, video or data stream.
A chunk is a message fragment that facilitates timestamp-ordered delivery, from server to client, of the complete message. Also in [16], a chunk stream is defined as 'A logical channel of communication that allows flow of chunks in a particular direction'. Chunks are parts of the audio and video streams, called audio/video message streams, delivered to the client and reconstructed into messages. Small messages can be sent within a single chunk, while messages bigger than the maximum allowed chunk size must be partitioned into multiple chunks.
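Message-to-chunk partitioning can be sketched as follows (128 bytes is the default maximum chunk size in the RTMP specification; the function name is our own):

```python
def to_chunks(message: bytes, max_chunk_size: int = 128) -> list:
    """Split an RTMP message payload into chunks no larger than the
    negotiated maximum chunk size (128 bytes by default in the spec)."""
    return [message[i:i + max_chunk_size]
            for i in range(0, len(message), max_chunk_size)]

msg = bytes(300)                  # a 300-byte message payload
chunks = to_chunks(msg)
print([len(c) for c in chunks])   # two full chunks plus a 44-byte tail
```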

The RTMP specification consists of the RTMP Chunk Stream Protocol and the RTMP Message Protocol. The former details the message format to convey message chunks and the initial handshake to establish the multimedia connection/session, whereas the latter describes the RTMP message formats, RTMP control and command messages.
Two main concepts are part of the RTMP, NetConnection and NetStream. The former creates a client-server connection whereas the latter represents the communication channel used to deliver audio, video and data streams. Commands are applied to both in a media session. NetConnection commands are connect, call, close, and createStream while NetStream commands are play, play2, deleteStream, closeStream, receiveAudio, receiveVideo, publish, seek, and pause.
RTMP first of all establishes a NetConnection via a handshake, initiating an exchange of packets between client (packets C0, C1 and C2) and server (S0, S1, S2). Secondly, the server provides the client with information about the media session such as window acknowledgement size, bandwidth, set chunk size and buffer length. The final step is to play the NetStream indicated in the transaction.
The message header has four fields: Message Type (1 byte), Length (3 bytes), Timestamp (4 bytes) and Message Stream ID (3 bytes). This message header is used to transmit control and command messages as well as to initialise a stream message. The chunk header is more complex, being composed of three fields: Basic Header, Chunk Message Header and Extended Timestamp. The complexity comes from three types of Basic Header plus four types of Chunk Message Header.
In Fig. 6 the communication workflow to access live Internet Radio media is depicted. First the RTMP handshake between client and server is performed via transmission of C0, C1, C2, S0, S1, and S2. Then information about the NetConnection and NetStream is interchanged, and the media data transmission is performed via RTMP chunks. Finally, the client sends a pause command to the server, which acknowledges the message.

C. HTTP
HTTP was initially conceived and designed to deliver static documents and consequently rate control was never provided and the underlying use of TCP, with retransmission of lost packets, causes variation in media delivery [17].
HTTP streaming has numerous advantages over RTP streaming, mainly due to the protocol used at the transport layer, TCP. The main differences from [6], [7] and [17] are depicted in Table IV [22]. HTTP mainly uses TCP as a transport protocol and is widely used in WebTV. It is a stateless application layer protocol, based on a connection between client and server to provide a communication channel. The basic communication unit is a message; the client sends an HTTP request message to the server and the server answers with an HTTP response message. HTTP also provides a list of methods applied to the message sent. It is important to note the possibility of transferring a message divided into chunks when needed [18].
One main difference between HTTP streaming and other HTTP transactions is that the server does not terminate the exchange after a single response, due to the client's constant requests for media segments. HTTP streaming uses a persistent connection, Connection: Keep-Alive, to serve all of the client's media segment requests.
HTTP based progressive download has been further developed into the latest HTTP adaptive streaming. There are multiple options, among which are HLS developed by Apple and MS-SSTR by Microsoft. The differences between the HTTP based protocols HLS and MS-SSTR are depicted in Table V.
1) HTTP Progressive Download: HTTP progressive download is a hybrid technique between HTTP download and RTP streaming. It downloads the media file in very small parts instead of downloading the entire media file before the client's play-out. To accomplish this, the server stores the media file in small segments, usually from 2 sec to 20 sec, together with an index file containing segment transmission, segment information and media resources. The process begins by sending the requested media resource index file to the client. Then streaming via HTTP begins with HTTP requests for the segments and their download [7]. An example of HTTP Progressive Download is HLS, provided by Apple and described in Section IV-D.
2) HTTP Adaptive Streaming: Adaptive streaming utilizes HTTP progressive download to accomplish streaming features with added functions such as adapting the media delivery to the client's end device or to Network conditions. Adaptive streaming exploits one of the HTTP characteristics which is to deliver a video/audio file fragmented into small segments or chunks. The media file is stored at the server in multiple small segments or in a unique file virtually fragmented into segments [19].
Segments or chunks are usually 2-4 seconds long and usually begin with an I-frame as well as following GOP boundaries to provide chunk independence from previous or posterior chunks to further facilitate the separated chunk decoding at client-side [19].
A server can store the different versions of the same media file and deliver to a client one version or the other. The same media file can be stored in different quality levels or different bit-rates. The segment stored system allows clients to change the version delivered during streaming, reacting to a client's petitions, the client's end device or to varying network conditions [7].
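A client's rendition choice can be sketched as a simple rate-selection rule (the bit-rate ladder and the 0.8 safety factor are illustrative assumptions; deployed players use more elaborate heuristics):

```python
def pick_bitrate(throughput_kbps: float, renditions_kbps: list,
                 safety: float = 0.8) -> int:
    """Choose the highest stored rendition whose bit-rate fits within a
    safety fraction of the measured throughput; fall back to the lowest."""
    budget = throughput_kbps * safety
    fitting = [r for r in sorted(renditions_kbps) if r <= budget]
    return fitting[-1] if fitting else min(renditions_kbps)

ladder = [400, 800, 1500, 3000]       # hypothetical bit-rate ladder (kbps)
print(pick_bitrate(2500, ladder))     # budget 2000 kbps -> mid rendition
print(pick_bitrate(300, ladder))      # nothing fits -> lowest rendition
```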
Research in adaptive streaming includes the delivery of real-time content such as sports [20]. The latest research in media delivery on mobile platforms uses adaptive streaming to adapt the media delivery to a device's battery consumption.

D. HLS
HLS is an IETF open standard Internet-Draft developed by Apple. Only the latest versions of QuickTime and iPhone OS 3.0 support the protocol, although any HTTP server can use it to deliver the media data [21]. It follows all the characteristics of HTTP Progressive Download, i.e., media delivery via HTTP of small fragments, chunks, and initial media play-out before the complete media download. Finally, the HTTP server stores the media source in different bit-rates and one or the other is chosen depending on client or network conditions. The HLS server-side structure is depicted in Fig. 7 and the HLS communication process between server and client is described in Fig. 8.
The media container is restricted to MPEG-2 Transport (MP2T) streams. Although in theory there are no media codec restrictions, currently only AAC and MP3 for audio and H.264 for video are accepted. Files are stored at the server side in MP2T files. A single media stream is stored in multiple MP2T files, one for each fragment, called media segment files [22]. The index file, also located at the server, stores the information to locate the MP2T media segment files required to play the entire media stream [21].
The index file follows the M3U8 specification, which is a playlist specification extended from the MP3 playlist format (M3U) [21]. Every media source is encoded at different bit-rates and has its own index file, as shown in Fig. 7. Index files contain metadata with information about the sequence in which to play the fragments related to a single media source, the location of the MP2T files containing the segments and the alternative media files available in case the requested bit-rate is not available [21].
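A variant index file of this kind can be parsed by pairing each #EXT-X-STREAM-INF tag with the URI on the following line; the playlist below is a minimal hypothetical example, not one taken from the HLS draft:

```python
SAMPLE_M3U8 = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2560000
high/index.m3u8
"""

def parse_variants(playlist: str) -> list:
    """Pair each #EXT-X-STREAM-INF tag with the URI on the next line."""
    variants, bandwidth = [], None
    for line in playlist.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            for attr in line.split(":", 1)[1].split(","):
                key, _, value = attr.partition("=")
                if key == "BANDWIDTH":
                    bandwidth = int(value)
        elif line and not line.startswith("#") and bandwidth is not None:
            variants.append((bandwidth, line))  # (bit-rate, variant URI)
            bandwidth = None
    return variants

print(parse_variants(SAMPLE_M3U8))
```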
The client requests the media source index file from the server, and the server sends it to the client, as shown in Fig. 8. The client, in possession of the index file, is able to request the fragments in the right order from their server locations for the media stream play-out [21].

E. MS-SSTR
MS-SSTR is based on HTTP streaming and designed by Microsoft. It uses adaptive streaming to adapt the media delivery based on local bandwidth and the client's CPU. Its main characteristics are its exclusive use of the MP4 container (ISO/IEC 14496-14) [23] and the use of two index files. First of all, the media file uses the MP4 format and is structured by splitting the media into smaller fragments within the file; secondly, it uses two index files, the server and client manifests [7].
Both manifest index files are stored at server. The server index manifest file provides the server information about the media tracks, bit rates and files, whereas the client index manifest lists the tracks available to the client [7]. In Fig. 9 the ISO file format structure used by MS-SSTR is depicted.
The communication model is based on manifest and fragment requests. Assuming an HTTP connection between client and server, the client's first step is to request the client index manifest, which is delivered by the server via the manifest response. After the positive reception of the manifest file, the client begins the fragment requests to the server, which sends back the fragment responses. This process, fragment request and response, is repeated multiple times until the media delivery is complete [24]. In Fig. 10 the MS-SSTR communication process between server and client is described.
The manifest index file is used by the client to request the media fragments based on the timing and bit-rate information. The server, considering network conditions, relates the fragment's request to the corresponding MP4 file where the fragment is extracted and sent as an independent file [7].
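Fragment requests in Smooth Streaming follow a URL template built from manifest values; the sketch below assumes that convention (the server name is hypothetical, and start times are in the 100-nanosecond units used by the client manifest):

```python
def fragment_url(base: str, bitrate_bps: int, track: str,
                 start_time_hns: int) -> str:
    """Build a Smooth Streaming fragment request URL from the bit-rate
    and start time advertised in the client manifest."""
    return (f"{base}/QualityLevels({bitrate_bps})"
            f"/Fragments({track}={start_time_hns})")

base = "http://media.example.com/video.ism"   # hypothetical server
url = fragment_url(base, 1500000, "video", 20000000)
print(url)
```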
One of the main advantages of MS-SSTR is the use of the MP4 file format. The use of this media container facilitates smooth streaming. The media file is stored in an MP4 format which stores all fragments containing the media chunks to deliver in each fragment request.

Testbed Development
This protocol review provides a foundation for our experimental research work. We are developing a media synchronisation testbed to evaluate different synchronisation scenarios using media delivered by different methods. A related concept is Hybrid Broadband Broadcast TV (HbbTV), which provides users with IPTV, WebTV and Broadcast TV on a single device. One scenario being evaluated is to synchronise a TV channel delivered via IPTV and a Radio channel delivered via the Internet. The TV channel is delivered using RTP and MP2T as indicated in [25]. Regarding the Internet Radio channel, the delivery method can vary. The first step has been to stream MP3 audio to the client via RTP and synchronise this audio with the related video stream. In future work, other delivery methods such as HTTP or RTMP will be deployed.
The integration of video and radio streams using RTP provides us with the simplest media delivery method. So far two approaches have been coded, audio substitution and audio addition. Audio substitution replaces the audio from the TV channel, whereas audio addition creates a new audio channel with the audio from the Internet Radio, allowing the user to switch between the original audio and the new audio.
We anticipate that such services will have most potential in the live sporting domain. The media files used in our testbed relate to the same sporting event, the 2011 Champions League Final, FC Barcelona vs. Manchester United. The video is from the SkySportsHD channel with English audio, in MP2T format. The radio media file, in MP3 format, is the radio transmission of the same event from the Catalan national radio station Catalunya Radio.

Conclusion
Media traffic over IP Networks is constantly increasing and various protocols are used, each of them suited to particular scenarios. The choice of protocol depends on a range of factors, including the media platform, IPTV or WebTV, the Transport protocol used, TCP or UDP, the IP protocol, multicast or unicast, and the service providers' technology.
In this paper we have reviewed the overall suite of media protocols, outlining how they work both individually and in tandem with others to deliver media streams. IPTV systems typically use the RTP and RTSP protocols over UDP, and the latest DVB-IPTV standards confirm the future use of these application protocols. Meanwhile, WebTV preferences are moving towards HTTP over TCP for media delivery. The main concern is the multiple HTTP server providers, each of which uses its own technology. The principal differences between the protocols are listed in Table VI. A further complication arises as multiple vendors have developed their own protocols: RTMP by Adobe, HLS by Apple, and MS-SSTR by Microsoft are the main ones. We also briefly outlined our research testbed, which focuses on multimedia synchronisation challenges and opportunities.