Tuesday, August 10, 2010

Introducing JAIN SIP HA... or how you can replicate and fail over your JAIN SIP application with no changes to your code.

History : (you can skip to the next section if you're not very fond of history :-))

In Mobicents, we use the great open source NIST SIP Stack (which is incidentally also the Reference Implementation of the JAIN SIP specification) in both Mobicents Sip Servlets and the Mobicents JAIN SLEE Resource Adaptor. When we started to work on High Availability, we began by replicating the full dialog so that we could fail over SIP calls when one node in the cluster died. First, this was not very efficient, since a lot of the data structures could be recreated upon failover instead of being replicated along with the dialog. Second, for a re-INVITE the dialog was not found, because it was injected back into the stack later by the application layer (in this case Mobicents), rather than the stack itself checking during message processing whether the dialog id was present locally or in the replicated cache. We first chose this solution because we didn't want to introduce any external dependencies into the NIST SIP Stack for people not interested in HA, and because we wanted to minimize the code changed in the NIST SIP Stack. Since we hit the issues mentioned above, we decided it was time to create our own extension to the NIST SIP Stack, allowing HA and failover in a non-intrusive way (i.e. without any code change in your application; everything is done transparently under the covers through configuration), so that it benefits not only the Mobicents community but the whole NIST SIP community. OK, enough history. Let's get our hands dirty and see how it works, shall we ?

How does it work ?

It currently supports only established dialog replication and failover, so replication occurs only when the dialog goes to the CONFIRMED state. There is also a mode where the changes are replicated every time the dialog application data is updated. The application data is data set by the application on the dialog to link the call state of this dialog to other state in the system, SIP or otherwise; a B2BUA could use it, for example, to link the dialogs of the ingress and egress sides. Other modes could be added, such as early dialog replication and transaction replication, but we didn't see the value in them so far. They will be implemented only for a strong use case (one could be handling 911 calls, where one doesn't want emergency call setup to fail at all), because they would add a lot of overhead and significantly decrease performance.

  1. UAS mode
     If a node fails after the dialog goes to CONFIRMED and a subsequent request hits another node, the Mobicents NIST SIP HA Stack first checks whether it has the dialog locally. If not, it checks the cache, gets the dialog data from the cache, recreates the dialog, adds it to the local stack and handles the subsequent request without any problem.
  2. UAC mode
     UAC mode actually needs a bit more work and some modifications at the application code level. So I lied, but it was to get you interested ;-)
     Suppose a node fails after the dialog goes to CONFIRMED, and an external event makes the failover node create a subsequent request on the dialog that was originally present on the node that crashed. You actually need to know the Dialog Id to get the dialog from the local stack through
    ((ClusteredSipStack)sipProvider.getSipStack()).getDialog(dialogId);
     The Mobicents NIST SIP HA Stack first checks whether it has the dialog locally. If not, it checks the cache, gets the dialog data from the cache, recreates the dialog, adds it to the local stack and returns it. You can then create the subsequent request and proceed normally. Please note that you can cache the dialog ids in the same JBoss Cache instance used by the Mobicents NIST SIP HA Stack. To access it just use :
    ((MobicentsSipCache)((ClusteredSipStack)sipProvider.getSipStack()).getSipCache()).getMobicentsCache().getJBossCache()
Please note that our NIST SIP HA layer is based on JBoss technologies (JBoss Cache, JGroups and the Mobicents Cluster framework) but has an abstraction layer so that it can be extended to use other technologies such as Infinispan, Terracotta or whatever fits your architecture.
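To make the "local stack first, then the cache" lookup a bit more concrete, here is a minimal sketch in plain Java. It is not the actual Mobicents code: DialogSnapshot and FailoverNode are hypothetical names invented for this illustration, and a simple shared map stands in for the replicated JBoss Cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the replicated dialog data (not the real stack's type).
final class DialogSnapshot {
    final String dialogId;
    final String state;

    DialogSnapshot(String dialogId, String state) {
        this.dialogId = dialogId;
        this.state = state;
    }
}

// Hypothetical node: a local dialog table backed by a cache shared across the cluster.
final class FailoverNode {
    private final Map<String, DialogSnapshot> localDialogs = new ConcurrentHashMap<>();
    private final Map<String, DialogSnapshot> replicatedCache;

    FailoverNode(Map<String, DialogSnapshot> replicatedCache) {
        this.replicatedCache = replicatedCache;
    }

    // Called when a dialog reaches CONFIRMED: keep it locally and replicate it.
    void confirm(String dialogId) {
        DialogSnapshot snapshot = new DialogSnapshot(dialogId, "CONFIRMED");
        localDialogs.put(dialogId, snapshot);
        replicatedCache.put(dialogId, snapshot);
    }

    // Local table first, then the replicated cache; a cache hit is recreated locally.
    DialogSnapshot getDialog(String dialogId) {
        DialogSnapshot local = localDialogs.get(dialogId);
        if (local != null) {
            return local;
        }
        DialogSnapshot replicated = replicatedCache.get(dialogId);
        if (replicated == null) {
            return null; // unknown dialog on every node
        }
        DialogSnapshot recreated = new DialogSnapshot(replicated.dialogId, replicated.state);
        localDialogs.put(dialogId, recreated);
        return recreated;
    }

    boolean hasLocally(String dialogId) {
        return localDialogs.containsKey(dialogId);
    }
}
```

Two FailoverNode instances built on the same map behave like the two nodes described above: a dialog confirmed on the first one can be fetched from the second one, which then keeps its own local copy.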

No code change in my application ? You gotta be kiddin' me ?

That's right, as seen above everything is handled automagically by the Mobicents NIST SIP HA layer. The only things needed are a bit more configuration and a few libraries added to your application classpath. This presupposes that the configuration of the stack is not hard coded in the application but externalized in a properties file or some other way.

But I lied a bit: there is one code change needed. You have to tell your application to use the Mobicents NIST SIP HA Stack (which is just an extension of the regular NIST SIP Stack) :

sipFactory.setPathName("org.mobicents.ha");

But this can be avoided if the path name is looked up from the configuration properties passed to the application as said above.
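As a rough illustration of that lookup, the sketch below reads the path name from a properties source and falls back to gov.nist, the path name of the plain NIST SIP Stack. The class StackPathName and its method names are made up for this example, and it assumes the path name is stored under the javax.sip.stack.PATH_NAME key.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: resolves the stack path name from externalized configuration.
final class StackPathName {
    // Path name of the regular NIST SIP Stack, used when nothing is configured.
    static final String DEFAULT_PATH_NAME = "gov.nist";

    // Loads properties from any reader (a file reader in a real application).
    static Properties load(Reader source) {
        Properties config = new Properties();
        try {
            config.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    // Returns the configured path name, or the default when the key is absent.
    static String resolve(Properties config) {
        return config.getProperty("javax.sip.stack.PATH_NAME", DEFAULT_PATH_NAME);
    }

    public static void main(String[] args) {
        Properties config = load(new StringReader("javax.sip.stack.PATH_NAME=org.mobicents.ha\n"));
        // In the application, this value is what would be handed to sipFactory.setPathName(...)
        System.out.println(resolve(config));
    }
}
```

With this kind of lookup, switching between the regular stack and the HA stack becomes a one-line change in the properties file rather than a change in the code.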

So if you want to use it, you basically need to set up 2 configuration properties :

#STACK PATH NAME
javax.sip.stack.PATH_NAME=org.mobicents.ha
# whether the cache should be standalone or looked up from JBoss AS when the JAIN SIP stack is running in a JBoss container
org.mobicents.ha.javax.sip.cache.MobicentsSipCache.standalone=true

The rest of the properties are optional :
#org.mobicents.ha.javax.sip.cache.MobicentsSipCache.cacheName=standard-session-cache
# path to the configuration file of jboss cache, defaults to META-INF/cache-configuration.xml
#org.mobicents.ha.javax.sip.JBOSS_CACHE_CONFIG_PATH=META-INF/cache-configuration.xml
# Replication strategy one of ConfirmedDialog or ConfirmedDialogNoApplicationData, defaults to the latter
#org.mobicents.ha.javax.sip.REPLICATION_STRATEGY=ConfirmedDialogNoApplicationData
# the class name of the class responsible for replicating the dialog etc., this allows you to plug in your own replication implementation (such as one based on Terracotta), defaults to JBoss Cache 3
#org.mobicents.ha.javax.sip.CACHE_CLASS_NAME=org.mobicents.ha.javax.sip.cache.ManagedMobicentsSipCache
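To illustrate how an optional property with a default can be handled, here is a small sketch that resolves the replication strategy from a Properties object. The ReplicationStrategy enum is hypothetical and only mirrors the two values and the default documented above; it is not how the stack itself parses the property.

```java
import java.util.Properties;

// Hypothetical enum mirroring the two documented strategy names.
enum ReplicationStrategy {
    ConfirmedDialog,
    ConfirmedDialogNoApplicationData;

    static final String KEY = "org.mobicents.ha.javax.sip.REPLICATION_STRATEGY";

    // Returns the configured strategy, or the documented default when the property is unset.
    static ReplicationStrategy from(Properties config) {
        String value = config.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return ConfirmedDialogNoApplicationData;
        }
        // valueOf throws IllegalArgumentException for a name that is not one of the two above
        return valueOf(value.trim());
    }
}
```

Failing fast on an unknown value is a design choice of this sketch: a typo in the properties file then shows up at startup instead of silently changing what gets replicated.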

The libraries needed to be added to your classpath are the following :


or the following dependencies if you use Maven

<dependency>
<groupId>org.mobicents.ha.javax.sip</groupId>
<artifactId>mobicents-jain-sip-ha-core</artifactId>
<version>0.11</version>
</dependency>

<dependency>
<groupId>org.mobicents.ha.javax.sip</groupId>
<artifactId>mobicents-jain-sip-jboss5</artifactId>
<version>0.11</version>
</dependency>

<dependency>
<groupId>org.mobicents.cluster</groupId>
<artifactId>cache</artifactId>
<version>1.6</version>
</dependency>


There are quite a few JBoss libraries that shouldn't really be needed; they are pulled in because mobicents-jain-sip-jboss5 has an option to integrate directly with the JBoss AS 5 Cache Manager.

You will also need to include the JBoss Cache configuration file cache-configuration.xml in a META-INF folder. And that's it !

Where can I find an example to check out and play with ?

We developed a sample application for you to try and have fun with, in the form of a Maven project that includes a JUnit test case showcasing HA and dialog failover.
You have 2 choices here : 

It's available directly on github at http://github.com/deruelle/nist-sip-ha-test
or as a packaged download here http://mobicents.googlecode.com/files/nist-sip-ha-test.zip

To run it, just do mvn test and that will be it.

This test aims to test Mobicents NIST SIP HA Dialog failover recovery in UAS mode.
 * A Shootist on port 5060 acts as a UAC and shoots at a stateless proxy on port 5050 (a scaled down version of a balancer)
 * The Balancer is a very simple stateless proxy that proxies the requests from the UAC to the first UAS node (Shootme) on port 5070, which replies with 180 Ringing and 200 OK
 * The dialog state is updated to CONFIRMED, which triggers the replication to JBoss Cache
 * On ACK, the first UAS node stops itself.
 * The UAC sends a BYE that the stateless proxy forwards to the second UAS node (shootme_recovery) on port 5080
 * On BYE, Shootme recovery gets the dialog from the cache and recreates it locally from the cached data
 * Shootme recovery handles the BYE and sends an OK to the BYE without any issue.
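The routing part of this scenario can be pictured with a tiny sketch. This is not the Balancer class from the sample project: FirstAliveBalancer and its methods are hypothetical, and the sketch only assumes that the proxy forwards to the first node that is still up, using the ports from the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical balancer: forwards to the first node that is still alive.
final class FirstAliveBalancer {
    // Node port -> alive flag, kept in registration order.
    private final Map<Integer, Boolean> nodes = new LinkedHashMap<>();

    void addNode(int port) {
        nodes.put(port, true);
    }

    // Marks a registered node as stopped; ports that were never registered are ignored.
    void markDead(int port) {
        nodes.replace(port, false);
    }

    // Returns the port of the first alive node, or -1 when no node is left.
    int route() {
        for (Map.Entry<Integer, Boolean> node : nodes.entrySet()) {
            if (node.getValue()) {
                return node.getKey();
            }
        }
        return -1;
    }
}
```

Registering 5070 and then 5080 sends the INVITE and ACK to 5070; once 5070 is marked dead, the BYE is routed to 5080, which is where the dialog has to be recovered from the cache.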

That's very cool, but if I want to go further, where can I find a cheap and powerful SIP Load Balancer to balance the SIP load of my cluster and ensure failover ?

That's easy, Mobicents provides such a SIP Load Balancer

You will need to add 2 new properties to your stack configuration so that the load balancer is pinged automatically by the NIST SIP HA Stack when a stack starts up or dies :

# implementation used to ping the Mobicents SIP Load Balancer
org.mobicents.ha.javax.sip.LoadBalancerHeartBeatingServiceClassName=org.mobicents.ha.javax.sip.LoadBalancerHeartBeatingServiceImpl
# the IP Address of the Mobicents SIP Load Balancer to send keepalives to
org.mobicents.ha.javax.sip.BALANCERS=127.0.0.1