High availability

Overview

MiaRec implements a redundant, high availability architecture.

The diagram below shows a network design for redundant recording in a BroadWorks environment. A similar design applies to the Cisco Built-in-Bridge recording interface and to the SIPREC recording interface for Metaswitch CFS, Metaswitch Perimeta SBC, Avaya SBC, and Oracle/AcmePacket SBC.

BroadWorks SIPREC redundant recorder

Supported features

  • Automatic failover to the next available server in a cluster
  • Load balancing of recording traffic between multiple servers
  • More than two master servers in a cluster
  • Geographical redundancy
  • Continuous data replication (immediately upon call completion) or scheduled replication (for example, at night during low-load hours)

How it works

A MiaRec cluster supports two or more servers. Any server in a cluster may receive recordings at any time. Upon call completion, audio files and call metadata are automatically uploaded/synchronized to the other servers in the cluster.

This document describes how redundancy is implemented for the BroadWorks SIPREC and Cisco SIP Trunk Built-in-Bridge recording methods. The recording interfaces of these two platforms are based on similar principles, with some variations.

Redundancy - new recordings

At the beginning of call recording, the phone system (BroadWorks / Cisco UCM) sends a SIP INVITE to the first available server in the cluster. If the primary server is down or its network is disconnected, it cannot respond to the SIP INVITE. In this case, standard SIP processing delivers the INVITE to the next recording server in the preference list.
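The fallback behavior described above can be sketched as a simple loop over the preference list. This is a minimal Python illustration, not BroadWorks or Cisco UCM code; `deliver_invite` and the `send_invite` callable are hypothetical names standing in for the phone system's SIP stack.

```python
from typing import Callable, Optional, Sequence


def deliver_invite(servers: Sequence[str],
                   send_invite: Callable[[str], bool]) -> Optional[str]:
    """Offer a recording session to each server in preference order.

    send_invite is a hypothetical stand-in for the SIP stack: it returns
    True if the server answered the INVITE and False on timeout or error.
    Returns the server that accepted the session, or None if none did.
    """
    for server in servers:
        if send_invite(server):
            return server
    return None


# Usage: the primary is unreachable, so the INVITE falls through to the next server.
down = {"miarec1.example.com"}
chosen = deliver_invite(
    ["miarec1.example.com", "miarec2.example.com", "miarec3.example.com"],
    lambda host: host not in down,
)
print(chosen)  # miarec2.example.com
```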

Redundancy - in-progress recordings

If a recording server fails, all of its active recordings are interrupted. If the failure was caused by a network issue, the affected recordings are completed automatically after a (configurable) timeout. If the failure was caused by a hardware or software issue with the recording process, the recordings remain in the ACTIVE state until an administrator manually marks them as completed. In both cases, the recording contains media from the beginning of the call up to the moment of failure (unless the disk system itself has failed).
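A minimal sketch of timeout-based completion, assuming the timeout is measured from the last media packet received for a call. `ActiveRecording`, `recordings_to_complete`, and the 60-second default are hypothetical; the actual timeout criterion and value are configured in MiaRec and may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActiveRecording:
    call_id: str
    last_media_at: float  # time (in seconds) the last media packet arrived


def recordings_to_complete(active: List[ActiveRecording], now: float,
                           timeout_s: float = 60.0) -> List[str]:
    """Return IDs of recordings with no media for longer than timeout_s.

    The 60-second default is a hypothetical value for illustration only.
    """
    return [r.call_id for r in active if now - r.last_media_at > timeout_s]


active = [ActiveRecording("call-a", 1000.0), ActiveRecording("call-b", 1090.0)]
print(recordings_to_complete(active, now=1100.0))  # ['call-a']
```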

MiaRec also supports an advanced architecture that provides fault tolerance for in-progress calls. This architecture involves a dedicated recording server configured in passive recording mode. It has currently been tested only with the Cisco BiB protocol, but it may also work with the SIPREC protocol on other phone platforms. The Cisco BiB network traffic sent to the primary recording server should be mirrored to the redundant server, which works in passive recording mode and records a copy of every call captured by the primary server.

If the primary server fails in the middle of a call, the redundant server continues recording that call until the call disconnects. This relies on the design of the Cisco Built-in-Bridge mechanism: once media forking is activated, the Cisco IP phone keeps sending RTP packets to the primary recorder even if the recorder is no longer reachable, and it does not stop even when it receives an ICMP “port unreachable” error message. The redundant server keeps capturing these RTP packets until the call completes. This makes it possible to achieve 100% redundancy for call recording.

Redundancy - completed recordings

After a recording is complete, MiaRec adds it to a queue for automatic replication to the other server(s) in the cluster. Replication can start immediately upon call completion or be scheduled for a specific time of day (for example, at night).

Geographical redundancy

MiaRec servers in a cluster may reside in different datacenters for geographical redundancy. There is no latency requirement between servers; the only requirement is that the bandwidth between datacenters is sufficient for data replication.

Data replication may be configured as continuous (immediately upon call completion) or scheduled for a specific time (for example, at night during low-load hours).
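The difference between continuous and scheduled replication comes down to when a completed recording becomes eligible for upload. The sketch below assumes a single daily start time for the scheduled case; `next_replication_start` is a hypothetical helper, not a MiaRec API.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def next_replication_start(completed_at: datetime,
                           window_start: Optional[time] = None) -> datetime:
    """Return when a newly completed recording becomes eligible for replication.

    window_start=None models continuous replication (start immediately);
    otherwise replication waits for the next occurrence of window_start.
    """
    if window_start is None:
        return completed_at
    candidate = datetime.combine(completed_at.date(), window_start)
    if candidate < completed_at:
        candidate += timedelta(days=1)  # today's window has already passed
    return candidate


done = datetime(2024, 5, 1, 14, 30)
print(next_replication_start(done))              # 2024-05-01 14:30:00
print(next_replication_start(done, time(2, 0)))  # 2024-05-02 02:00:00
```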

The network link between datacenters does not need to be available 100% of the time. If the target replication server is unreachable, the replication process is retried when the network connection is restored.

The source replication server uses a queue for data replication. A call recording is removed from the queue only after it has been replicated successfully. The queue overhead is insignificant (about a hundred bytes per call recording).
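The retry behavior described above can be modeled as a queue from which a recording is removed only after its upload succeeds. This is a simplified sketch; `ReplicationQueue` and the `upload` callable (returning True on success) are hypothetical and do not reflect MiaRec's internal implementation.

```python
from collections import deque
from typing import Callable, Deque, List


class ReplicationQueue:
    """Hypothetical queue: recordings leave it only after a successful upload."""

    def __init__(self, upload: Callable[[str], bool]):
        self._pending: Deque[str] = deque()
        self._upload = upload  # returns True if the target server accepted it

    def enqueue(self, recording_id: str) -> None:
        self._pending.append(recording_id)

    def process(self) -> List[str]:
        """Try each pending recording once; keep failures for a later pass."""
        replicated = []
        for _ in range(len(self._pending)):
            rec = self._pending.popleft()
            if self._upload(rec):
                replicated.append(rec)
            else:
                self._pending.append(rec)  # retry when the link is back
        return replicated

    def __len__(self) -> int:
        return len(self._pending)


link_up = False  # simulate an outage of the link between datacenters
queue = ReplicationQueue(upload=lambda rec: link_up)
queue.enqueue("call-1")
queue.enqueue("call-2")
print(queue.process(), len(queue))  # [] 2
link_up = True  # link restored
print(queue.process(), len(queue))  # ['call-1', 'call-2'] 0
```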

High availability for BroadWorks SIPREC recording

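High availability and automatic failover for the SIPREC interface are based on two technologies:

  • DNS SRV records for automatic failover and/or load balancing
  • MiaRec automatic replication between multiple servers in a cluster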

How it works

The BroadWorks platform supports DNS SRV records for the SIPREC interface. This makes it possible to build the following configurations:

  • Multiple recording servers with SIPREC traffic split between them (load balancing)
  • Multiple recording servers with automatic failover from a primary server to a secondary one
  • A combination of the two variants above

MiaRec supports automatic call replication between two or more recording servers. Audio files and call metadata are automatically uploaded to the replication target server(s) upon call completion or by schedule (for example, at night).

BroadWorks SIPREC redundant recorder

Example of DNS SRV records:

; _service._proto.name.  TTL    class   SRV   priority  weight    port   target.
_sip._tcp.example.com.   86400  IN      SRV   10        40        5060   miarec1.example.com.
_sip._tcp.example.com.   86400  IN      SRV   10        30        5060   miarec2.example.com.
_sip._tcp.example.com.   86400  IN      SRV   10        30        5060   miarec3.example.com.
_sip._tcp.example.com.   86400  IN      SRV   20        0         5060   miarec4.example.com.

The first three records share a priority of 10, so the weight field’s value will be used by BroadWorks to determine which recording server to contact. The sum of all three values is 100, so “miarec1” will be used 40% of the time. The remaining two hosts “miarec2” and “miarec3” will be used for 30% of requests each. If “miarec1” is unavailable, these two remaining servers will share the load equally, since they will each be selected 50% of the time.

If all three servers with a priority of 10 are unavailable, the record with the next-lowest priority value, “miarec4”, is chosen. This might be a machine in another physical location, presumably not vulnerable to anything that would cause the first three servers to become unavailable.
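To show how a client applies these priority and weight values, the following sketch follows the target selection procedure in RFC 2782. It is an illustration only; BroadWorks' own resolver logic is not documented here, and `SrvRecord` and `pick_target` are hypothetical names.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def pick_target(records: List[SrvRecord],
                unavailable: FrozenSet[str] = frozenset(),
                rng: Optional[random.Random] = None) -> Optional[SrvRecord]:
    """Choose one reachable record following the RFC 2782 selection rules."""
    if rng is None:
        rng = random.Random()
    reachable = [r for r in records if r.target not in unavailable]
    if not reachable:
        return None
    # Only records with the lowest priority value among reachable ones are eligible.
    best = min(r.priority for r in reachable)
    group = [r for r in reachable if r.priority == best]
    # RFC 2782 places zero-weight records first (sort() is stable otherwise).
    group.sort(key=lambda r: r.weight != 0)
    # Draw 0..sum inclusive; take the first record whose running sum reaches it.
    threshold = rng.randint(0, sum(r.weight for r in group))
    running = 0
    for r in group:
        running += r.weight
        if running >= threshold:
            return r
    return group[-1]  # not reached: the final running sum equals the total


RECORDS = [
    SrvRecord(10, 40, 5060, "miarec1.example.com"),
    SrvRecord(10, 30, 5060, "miarec2.example.com"),
    SrvRecord(10, 30, 5060, "miarec3.example.com"),
    SrvRecord(20, 0, 5060, "miarec4.example.com"),
]

rng = random.Random(1)
counts = Counter(pick_target(RECORDS, rng=rng).target for _ in range(100_000))
print({host: n / 100_000 for host, n in sorted(counts.items())})
```

Because RFC 2782 draws the random number from 0 to the weight sum inclusive, the observed shares come out close to, but not exactly, 40/30/30 (about 40.6/29.7/29.7 with these weights).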

Limitations:

  • The load balancing provided by SRV records is inherently limited, since the information is essentially static. The current load of the servers is not taken into account.
  • In case of failover from one server to another, the currently active recordings on the failed server are interrupted. A new recording server will handle only new SIPREC requests.

See also: MiaRec automatic replication

High availability for Cisco Built-in-bridge recording

High availability and automatic failover for Cisco active recording interface is based on the following technologies:

  • MiaRec automatic replication between multiple servers in a cluster
  • Multiple SIP Trunks or DNS SRV for automatic failover and/or load balancing
  • SIP OPTIONS Ping feature in Cisco UCM for fast detection of server unavailability

How it works

Cisco Built-in-Bridge redundant recorder

The recording server in Cisco UCM is configured as a SIP Trunk. Cisco UCM supports configuration of multiple SIP Trunks with automatic failover between them.

Additionally, starting with version 8.5(1), Cisco UCM supports the SIP OPTIONS Ping feature. Cisco UCM periodically sends a SIP OPTIONS (ping) message to each recording server to detect its availability. If the recording server does not respond, or responds with “408 Request Timeout” or “503 Service Unavailable”, Cisco UCM marks it as unavailable. Any other response marks the recording server as available. Cisco UCM sends new INVITEs only to recording servers marked as available.
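The availability rule above can be expressed as a small classification function. This sketch models only the rule as described in this section, not Cisco UCM's implementation; `is_available` and `eligible_for_invite` are hypothetical names, and a missing response is represented as `None`.

```python
from typing import Dict, List, Optional

# 408 Request Timeout and 503 Service Unavailable mark a server as unavailable.
UNAVAILABLE_CODES = {408, 503}


def is_available(status_code: Optional[int]) -> bool:
    """Classify a recording server from its reply to a SIP OPTIONS ping.

    status_code is None when no response arrived at all.
    """
    if status_code is None:
        return False
    return status_code not in UNAVAILABLE_CODES


def eligible_for_invite(ping_results: Dict[str, Optional[int]]) -> List[str]:
    """Servers that may receive new INVITEs, in their configured order."""
    return [host for host, code in ping_results.items() if is_available(code)]


results = {
    "miarec1.example.com": None,  # no response
    "miarec2.example.com": 200,
    "miarec3.example.com": 503,
    "miarec4.example.com": 404,   # any other response counts as available
}
print(eligible_for_invite(results))  # ['miarec2.example.com', 'miarec4.example.com']
```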

MiaRec supports automatic call replication between two or more recording servers. Audio files and call metadata are automatically uploaded to the replication target server(s) upon call completion or by schedule (for example, at night).

Alternatively, instead of configuring multiple SIP Trunks in Cisco UCM, it is possible to configure a single SIP Trunk that points to DNS SRV records, with each recording server configured as an SRV record. This configuration supports automatic failover and load balancing across multiple recording servers.

Example of DNS SRV records:

; _service._proto.name.  TTL    class   SRV   priority  weight    port   target.
_sip._tcp.example.com.   86400  IN      SRV   10        40        5060   miarec1.example.com.
_sip._tcp.example.com.   86400  IN      SRV   10        30        5060   miarec2.example.com.
_sip._tcp.example.com.   86400  IN      SRV   10        30        5060   miarec3.example.com.
_sip._tcp.example.com.   86400  IN      SRV   20        0         5060   miarec4.example.com.

The first three records share a priority of 10, so the weight field’s value will be used by Cisco UCM to determine which recording server to contact. The sum of all three values is 100, so “miarec1” will be used 40% of the time. The remaining two hosts “miarec2” and “miarec3” will be used for 30% of requests each. If “miarec1” is unavailable, these two remaining servers will share the load equally, since they will each be selected 50% of the time.

If all three servers with a priority of 10 are unavailable, the record with the next-lowest priority value, “miarec4”, is chosen. This might be a machine in another physical location, presumably not vulnerable to anything that would cause the first three servers to become unavailable.
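The percentages above follow from dividing each weight by the total weight of the lowest-priority group that still has reachable servers. The sketch below computes these idealized shares with exact fractions; `expected_shares` is a hypothetical helper, and the even split among zero-weight records is an assumption that the resolver orders them randomly.

```python
from fractions import Fraction
from typing import AbstractSet, Dict, Iterable, Tuple

# (priority, weight, target) tuples taken from the SRV example above.
RECORDS = [
    (10, 40, "miarec1.example.com"),
    (10, 30, "miarec2.example.com"),
    (10, 30, "miarec3.example.com"),
    (20, 0, "miarec4.example.com"),
]


def expected_shares(records: Iterable[Tuple[int, int, str]],
                    down: AbstractSet[str] = frozenset()) -> Dict[str, Fraction]:
    """Idealized share of new sessions per server (weight / group total)."""
    live = [r for r in records if r[2] not in down]
    if not live:
        return {}
    best = min(p for p, _, _ in live)
    group = [(w, t) for p, w, t in live if p == best]
    total = sum(w for w, _ in group)
    if total == 0:  # only zero-weight records left: assume an even split
        return {t: Fraction(1, len(group)) for _, t in group}
    return {t: Fraction(w, total) for w, t in group}


print(expected_shares(RECORDS))                           # 2/5, 3/10, 3/10
print(expected_shares(RECORDS, {"miarec1.example.com"}))  # 1/2 each
```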

Limitations:

  • The load balancing provided by SRV records is inherently limited, since the information is essentially static. The current load of the servers is not taken into account.
  • In case of failover from one server to another, the currently active recordings on a failed server are interrupted. The new recording server will handle only new SIP requests.

See also: MiaRec automatic replication