Wednesday, March 27, 2013

WebRTC, SDP and Fata Morgana

There have been many discussions lately about the format of the session description to be used for real-time communications between web browsers. Apparently the IETF has decided (or at least it is ongoing work) to reuse SDP, the same format used by other protocols, including SIP.

As usual after a decision, the most vocal are those against it, which is somewhat natural: they are taking their last chance to change things to something they like more.

For what it is worth, I have been working with SIP continuously for more than a decade in the Kamailio project, but mostly on the signaling part, for routing between peers. My real interaction with SDP was just to update the media IP and port to help with NAT traversal, plus a set of functions to remove codecs or media streams from offers. The session description document (which is transferred as the payload of SIP messages) is none of a SIP signaling server's business. Add to that the fact that ICE support is mandatory in webrtc, so there is no more need for server-side NAT traversal processing, and there is barely any interaction left from the SIP server's point of view.
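To give an idea of how small that interaction is, here is a minimal sketch in Python of rewriting the media IP and port in an SDP body. It is only an illustration, not the code used in Kamailio: the function name `rewrite_media_address`, the sample SDP and the addresses are all made up for this example, and it assumes IPv4 and a single media stream.

```python
# Hypothetical sample SDP body (addresses and ports are invented).
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 20518 0 IN IP4 192.168.1.10",
    "s=-",
    "c=IN IP4 192.168.1.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
]) + "\r\n"


def rewrite_media_address(sdp, new_ip, new_port):
    """Replace the connection IP and the media port in an SDP body.

    Assumptions of this sketch: IPv4 only, one media stream, CRLF
    line endings. Every m= line gets the same port, which is only
    correct when there is a single stream.
    """
    out = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            # connection line: c=<nettype> <addrtype> <address>
            line = "c=IN IP4 " + new_ip
        elif line.startswith("m="):
            # media line: m=<media> <port> <proto> <fmt> ...
            fields = line.split(" ")
            fields[1] = str(new_port)
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out)


rewritten = rewrite_media_address(SAMPLE_SDP, "203.0.113.5", 30000)
print(rewritten)
```

Only the `c=` and `m=` lines are touched; the rest of the document, including the origin line, passes through unchanged.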

Probably SDP is not the ideal format, but it is deployed worldwide and therefore implemented at large scale. It is claimed that it has grown into a very complex structure. That may be true, as the initial specs did not take NAT into consideration at all, and the IETF has been striving for years to fix that afterwards, patching formats and protocols along the way.

However, I was not and I am not defending SDP in any way, particularly as it is not something I rely on. What I am against is deciding to go for something not yet known. I haven't seen any proper alternative proposal from those who dislike SDP, just a few ideas and sketches, more like JSONified or XMLized SDP. I don't think that just adding tags or curly braces here and there will change the situation dramatically. If complexity was the concern with using SDP, the complexity level stays the same.
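A rough sketch in Python of what I mean by "JSONified SDP". This is not any of the actual proposals, just a naive line-by-line conversion invented for illustration (the function name `jsonify_sdp` and the sample SDP are hypothetical): every SDP line becomes one JSON object, so the number of things to understand and handle stays exactly the same.

```python
import json

# Hypothetical sample SDP body (addresses and ports are invented).
SDP_TEXT = """v=0
o=- 20518 0 IN IP4 203.0.113.1
s=-
c=IN IP4 203.0.113.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


def jsonify_sdp(sdp):
    """Wrap each '<type>=<value>' SDP line into a JSON-friendly dict.

    Deliberately naive: it only changes the syntax around each line
    and does not interpret the values at all.
    """
    entries = []
    for line in sdp.splitlines():
        if not line:
            continue
        # split on the first '=' only, values may contain '=' too
        key, _, value = line.partition("=")
        entries.append({"type": key, "value": value})
    return entries


entries = jsonify_sdp(SDP_TEXT)
print(json.dumps(entries, indent=2))
```

The output has one entry per original SDP line. The curly braces are new, but the semantics of each `value` (ports, payload types, attributes) still have to be specified and parsed as before.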

Chasing a Fata Morgana proved to be wrong in many past IETF activities. Let's remember how they tried to specify from scratch the presence and instant messaging extensions for SIP: a long and laborious process, trying to cover extreme corner cases, with results we know very well, not easy at all to put into practice.

From an interaction perspective, there are two major use cases for webrtc:
  • communications only within a service (aka, walled garden environment)
  • communications between different services (aka, peering/interconnect networks)
I am concerned only with the second. For the first case, the service provider should not even bother waiting for guidance from the IETF. The role of the IETF is to define interoperable protocols that can connect different implementations from different providers, not to do housekeeping for each cottage business.

Here is why I consider it wrong to start now on specifying a brand new session description format and wait for the result:
  • deployments of webrtc services will be delayed indefinitely, while there are parts of the market that want it now. I doubt anyone can set an accurate timeline for writing the specs and give a clear estimate of the time to market. Even so, it will take at least 2-3 years.
  • many good, practically oriented people will leave at some point: there will be a lot of irrelevant discussions about xml vs. json vs. clear text vs. binary vs. whatever base format, then each attribute will go through a few rounds about its name, upper or lower case, font color, etc., and the same again for the values. If the practical people are gone, chances are high that the result is yet another ideal theoretical specification that is hard to implement
  • many will want the front seat, so there will be many proposals in the first round, then terms to debate and finalists to decide on for the world cup tournament to get the *one*, adding more and more delays to the entire process, with no shortage of harsh fights on stage
  • no matter how 'perfect' the new proposal sounds at first sight, definitely expect many voices against it. It will result again in friction and opposing groups, practically back to the current situation
The IETF has to make decisions based on current facts and shall not leave parts required for interoperability unspecified. Selecting something now does not stop the work on new ideas; if anyone comes up with a better alternative over time, it will be adopted by the market anyhow.

The web environment offers huge opportunities for innovation, with the major components being open source (browsers, servers, development languages and tools) and a non-regulated structure, so nothing has to be started from scratch or wait for bureaucratic procedures.

Don't waste time looking only at the bad things about SDP. They are known, but there are also many good things about reusing a deployed technology, which those who dislike it avoid in their presentations. Get your hands dirty in C/C++ code or another programming language and extend the browsers with your ideal session description format, show the world the results and how it makes everything easier and better for business and user experience, then the market will simply follow!

Let everyone work in peace and focus on their needs: those who want to deploy now by reusing existing technologies as much as possible, as well as those willing to come up with something different and eventually better, for the benefit of everyone.

PS. For interconnect, it is not relevant what API the browser provides to the upper programming layer (JavaScript). For interoperability, it is important that the format exchanged between different services follows a standard specification. A browser can simplify the presentation of the session description format sent to or received from the network.

PPS. The term webrtc here does not refer strictly to the particular specification of the same name, but more to the broader concept of real-time communications over web technologies.

