Would you believe me if I said that, with just a few lines of JavaScript and HTML5, you could turn the Photo Booth app (available on Mac OS X) into a cool web-based application, overlay real-time audio and video onto your favorite WebGL-based 3D game canvas, or build a plugin-less version of WebEx?

In this blog, I will take you on a journey into the latest disruptive Web standard, WebRTC. My goal is to provide readers with some background and then dive a bit deeper into what WebRTC has to offer from both the standards and the application developer perspectives.

Before I jump in, let me introduce Cisco’s WebRTC crew:
Cullen Jennings, Ethan Hugg, Enda Mannion, Suhas Nandakumar (that’s me :)).

Background

The Web is evolving at a pace faster than ever before. The last few years have seen tremendous innovation in Web technologies, applications, infrastructure and services. The advent of HTML5 has redefined the way Web applications work by bringing the capabilities and richness of native applications to the Web platform.

HTML5 technologies such as Web Workers, browser-native media, Web Sockets and the like are redefining the roles and capabilities of the browser and the Web, and creating experiences that rival native applications.

Building along similar lines, the WebRTC/RTCWeb standards are being introduced into the HTML5 standards basket. They are concerned with bringing rich, real-time, interactive communications natively to browsers.

Real-time communications applications such as softphones and conferencing applications are not new to the Web. Applications such as WebEx, JabberWeb and Skype already provide ways for people to communicate and collaborate on the Web today.
These applications do come with limitations, however:

  1.  A plugin must be installed to get things working.
  2.  Plugins bring compatibility challenges on the host platform.
  3.  Plugins bring security issues, since the application no longer runs in the sandboxed environment of the browser.
  4.  Plugins bring maintenance issues. Newer versions of the browser or of the standards might break existing installations.
  5.  Plugin-based applications lack rich, flexible access to native browser resources due to privilege restrictions. This in turn limits the innovation possible with these applications.
  6.  Proprietary solutions bring interoperability issues.

WebRTC – What is it?

RTCWeb (owned by the IETF) and WebRTC (owned by the W3C) are evolving standards efforts to bring “Rich Interactive Secure Peer to Peer Communications” to the Web in a “Plugin-less Fashion”. Together, these standards bodies are responsible for defining the following aspects of real-time communications as an inherent part of the Web infrastructure:

  1.  APIs and access rules for end-user devices such as microphones, cameras etc.
  2.  End-to-end security architecture and protocol.
  3.  NAT traversal techniques for peer connectivity.
  4.  Signaling mechanisms for setting up, updating and tearing down the sessions.
  5.  Support for different media types.
  6.  Media transport requirements.
  7.  Quality of Service, congestion control and reliability requirements for the session over the Best-Effort Internet.
  8.  Identity architecture and mechanisms for peer identification.
  9.  Codecs for audio and video compression.
  10.  Last but not least, HTML and JavaScript APIs for application developers.

With such a detailed charter, WebRTC/RTCWeb has the potential to change the way people communicate on the Web. Given the tremendous growth in browser usage and the always-available nature of the Web, the combination of “Browser and the Web” could revolutionize real-time communications on the one hand and pose challenges to today’s legacy solutions on the other.

The picture below captures the various outcomes from the IETF and W3C standards bodies:

  1. The GetUserMedia API specification defines the requirements for a Web application to access the end user’s media sources, such as the camera and microphone.
  2. The PeerConnection API specifies SDP-based session description APIs and the state machine for session setup, update and tear-down between the peers.
  3. The Data Channel API will enable peer-to-peer exchange of arbitrary data, with low latency and high throughput.
  4. Under the hood, the browser is responsible for:
  • Ensuring end-to-end security for media and data sessions via DTLS.
  • Performing NAT traversal procedures for connection setup based on Interactive Connectivity Establishment (ICE).
  • Establishing media transport based on RTP and UDP.
  • Setting up data-channel transport based on SCTP and UDP.
  • Enabling feedback reports for the session based on RTCP.
  • Encoding and decoding audio and video streams.

These requirements may evolve over time until all aspects of the standards are frozen.
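
As a rough illustration of how the three APIs listed above fit together, here is a minimal JavaScript sketch. It assumes the API names as they were later standardized (navigator.mediaDevices.getUserMedia, RTCPeerConnection, createDataChannel), which differ from the prefixed versions found in early builds. The startCall function, its env parameter and the sendToPeer callback are hypothetical names introduced only for this example; how the offer actually reaches the other peer is left to the application.

```javascript
// Sketch only. `env` is a hypothetical wrapper so the function can be tried
// with stand-in objects; in a browser you would pass
// { mediaDevices: navigator.mediaDevices, RTCPeerConnection: RTCPeerConnection }.
async function startCall(env, sendToPeer) {
  // 1. GetUserMedia: ask the user for camera and microphone access.
  const stream = await env.mediaDevices.getUserMedia({ audio: true, video: true });

  // 2. PeerConnection: the browser takes care of ICE, DTLS and RTP under the hood.
  const pc = new env.RTCPeerConnection();
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  // 3. Data Channel: arbitrary peer-to-peer data alongside the media.
  //    Created before the offer so that it is part of the session description.
  const chat = pc.createDataChannel("chat");

  // Create the SDP offer and pass it to the application's own signaling path.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: offer.type, sdp: offer.sdp });

  return { pc, chat, stream };
}
```

The remote side would answer in a similar way, and both sides would also exchange ICE candidates over the same signaling path; those steps are omitted here to keep the sketch short.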

Use-Cases and Architecture Preview


Shifting gears, let me introduce a few sample use cases that are quite easily achievable with WebRTC.

1.  Seamless Conferencing:

This use case represents a Web conferencing scenario built with lightweight HTML5 components and WebRTC APIs, with no plug-in installation. Such an application can allow plug-and-play of components such as chat, file transfer and screen sharing with a few lines of JavaScript code.

 

 

2.  Personal Shopper/Instant Customer Care:

This use case captures a consumer-to-business scenario in which a Web application provider such as Amazon might offer a “Click to Call” service to its customers with a few WebRTC APIs. Such a service would turn a mundane search into a rich two-way audio and video interaction with a customer care representative, which could lead to higher transaction conversion ratios.

 

 

 

3.  Multimedia-based Rich 3D Games:

This scenario brings audio, video and data streams into gaming environments using WebRTC, HTML5 and WebGL APIs. Such a combination opens up innovative ways of combining real-time media with a WebGL canvas.

 

 

Architecturally, a WebRTC-based system falls into one of the following broad categories:

                      
[Figure: Browser <-> Browser (left) and Browser <-> VoIP End Point (right)]

The web server can be any application server that, at a minimum, provides the required identity and authentication procedures for the end users.

In either case, secure media flows directly between the peers. In the VoIP scenario, an intermediate gateway is required to handle signaling and any translations needed because of capability limitations of the VoIP endpoint, such as “unable to perform ICE checks” or “no support for secure RTP”. A detailed analysis of the architectural solutions and the potential differences between these systems is beyond the scope of this blog.
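
To make the gateway’s role a little more concrete, the sketch below works out which translations a gateway would have to perform for a given VoIP endpoint, based on the two limitations quoted above. The capability flags (supportsIce, supportsSrtp) and the function name are hypothetical, invented purely for illustration; a real gateway would learn such capabilities from signaling rather than from a hand-written object.

```javascript
// Hypothetical capability flags; not taken from any real gateway product.
function requiredTranslations(endpoint) {
  const translations = [];
  if (!endpoint.supportsIce) {
    // The gateway answers ICE connectivity checks on the endpoint's behalf.
    translations.push("terminate ICE at the gateway");
  }
  if (!endpoint.supportsSrtp) {
    // The gateway bridges secure RTP from the browser to plain RTP.
    translations.push("bridge secure RTP to plain RTP");
  }
  return translations;
}

// A legacy phone lacking both capabilities needs both translations,
// while an endpoint supporting both needs none.
console.log(requiredTranslations({ supportsIce: false, supportsSrtp: false }));
console.log(requiredTranslations({ supportsIce: true, supportsSrtp: true }));
```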

Cisco’s Involvement with the RTCWeb

Cisco has been actively participating in both standards development and implementation.
On the standards side, Cisco has taken leadership roles in helping shape the requirements from both the IETF and W3C perspectives.

At the IETF, a working group (WG) called RTCWeb is responsible for driving the “on-the-wire” standards. Cullen Jennings from Cisco co-chairs this WG. He is also a co-author of the W3C specification that defines the browser API requirements. Beyond these roles, Cisco has contributed to several areas of the standards, such as QoS, codecs, API development and signaling.

On the implementation front, Cisco open-sourced the VoIP code base from its soft-phones, with the following components:

  1. RFC 3261 compliant SIP stack.
  2. RFC 4566 compliant SDP engine.
  3. Call control application logic for the soft-phone application.

The open source project can be found on GitHub under the name Ikran.
The Cisco team is working with Mozilla on a joint implementation of the WebRTC standards in Firefox. For this purpose, components (2) and (3) from Ikran are being reused to implement the session control and session description aspects of the PeerConnection object.

WebRTC in Action – Getting Hands Dirty

It’s time to get our hands dirty and try a few demos. WebRTC for desktop is now in the Firefox Nightly and Firefox Aurora releases. The difference is that Nightly builds have the latest fixes, while Aurora, being a pre-beta build, is slightly older but more stable.
For the purposes of this blog, let us use the Firefox Aurora build; the setup instructions below apply to Firefox Nightly as well.
The demo page lets you try out the following:
– GetUserMedia based audio capture, video capture and picture snapshot.
– PeerConnection based 2-way audio/video call.
– DataChannel based session.
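
The picture snapshot part of such a demo can be approximated with a canvas, in the spirit of the Photo Booth example from the introduction. This is a sketch under assumptions, not the demo page’s actual code: takeSnapshot is a hypothetical helper, and it expects a video element that is already playing a GetUserMedia stream.

```javascript
// Hypothetical helper: copies the current video frame onto a canvas
// and returns the frame as a PNG data URL.
function takeSnapshot(video, canvas) {
  // Match the canvas size to the video's intrinsic frame size.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  // Draw the frame currently shown by the video element.
  const ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  return canvas.toDataURL("image/png");
}
```

In a page, video would be the element showing the camera stream and canvas any canvas element, for example one created with document.createElement("canvas"). The returned data URL can then be assigned to the src of an img element.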

Let’s get started.
Step 1: Get Firefox Aurora

Step 2: Configure Aurora
Currently the WebRTC code is behind preference settings. To enable it, browse to “about:config” and do the following:
2a. Set media.navigator.enabled to true to enable calls to GetUserMedia only.
2b. Set media.navigator.permission.disabled to true to automatically grant permission to access the camera/microphone.
2c. Set media.peerconnection.enabled to true to enable PeerConnection functionality.
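
If you would rather not click through about:config, the same three preferences can be set from a user.js file placed in the Firefox profile folder, which Firefox reads at startup. The preference names are the ones from steps 2a to 2c; treat this as a convenience sketch for a throwaway test profile, since the second line skips the camera/microphone permission prompt.

```javascript
// user.js in the Firefox profile folder (use a test profile only)
user_pref("media.navigator.enabled", true);
user_pref("media.navigator.permission.disabled", true);
user_pref("media.peerconnection.enabled", true);
```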

Step 3: Run the demos
On your Aurora build, browse to the WebRTC Demo Page and try out the demos listed above.

Interested in Learning More?

1. Cullen Jennings has provided a detailed explanation about everything here.
2. Justin Uberti from Google explains the WebRTC implementation in Chrome here.
3. IETF Standards Page
4. W3C Standards Page
5. Mozilla Wiki and Blog Pages

I am more than happy to discuss this further with anyone who wants to hear the gory details.

Thanks for reading and enjoy the WebRTC revolution.



Authors

Suhas Nandakumar

Senior Software Engineer

Collaboration Technologies Group