Increased scalability of Erlang for improved telecommunications and internet server capability

Erlang is an open-source specialist programming language governing >90% of internet traffic and ~50% of landline phone calls globally. UofG's Professor Phil Trinder identified fundamental issues which constrained Erlang’s performance on multiple cores (processors) and hosts (machines), i.e. its scalability. He initiated, assembled and led an international academic/industrial consortium to address these issues. UofG research designed, developed and evaluated better language-level scalability for Erlang, creating the Scalable Distributed (SD) Erlang libraries. These allow organisations including Ericsson, WhatsApp and NHS Digital to deliver fault-tolerant, scalable Erlang programs for tens of thousands of applications in areas such as fintech, security, blockchain and Internet of Things.

Context and societal impact

Erlang is one of the world’s most widely used programming languages, designed to produce scalable, fault-tolerant, concurrent, distributed, non-stop, soft real-time applications. Around 2009, UofG’s Professor Trinder analysed the scaling issues associated with Erlang, which made it difficult to obtain good software performance on large multi-core computer architectures, e.g. servers. Professor Trinder initiated, established and led, as EU project coordinator, an international academic/industrial consortium (RELEASE, http://www.release-project.eu/) to overcome these issues. The RELEASE project focused on the increasingly critical scaling problems with Erlang, and hence addressed the widening gap between the state of the art in hardware and in software.

While leading the RELEASE consortium, UofG’s Dr Chechina and Professor Trinder designed, developed and evaluated better language-level scalability for Erlang, in the form of a new library: Scalable Distributed (SD) Erlang. The new technology is employed on servers and in conjunction with the Erlang Open Telecom Platform (OTP).

Following the development of SD Erlang, a UofG Knowledge Exchange (GKE) grant was awarded to Professor Trinder (2015–2017) to maintain the online presence of SD Erlang and RELEASE (e.g. the website), ensuring that it contained up-to-date information and could maximally benefit the programming community. This funding also allowed the maintenance and restructuring of the SD Erlang libraries. Discussions with industrial users of SD Erlang showed that, in the dynamic world of Erlang/OTP development with yearly major releases and quarterly minor releases, they preferred to use up-to-date versions of SD Erlang in production, rather than well-tested but outdated versions. Therefore, the UofG team automated the rebasing of SD Erlang for new releases of Erlang/OTP. The primary activities undertaken to achieve this were:

  1. Rebasing SD Erlang manually twice, onto Erlang/OTP versions 19.2 and 19.3. The automated rebasing of SD Erlang will be available from the major Erlang/OTP release 20.
  2. Restructuring SD Erlang, separating some specific SD Erlang functions from global.erl.

Professor Trinder’s leadership of the RELEASE consortium has led to numerous key achievements including the development of WombatOAM, a proprietary software tool providing a scalable infrastructure for the deployment of Erlang/SD Erlang to thousands of computers.

Ericsson confirms that RELEASE "made 10 important changes and architectural improvements… These changes mean that Erlang is more responsive and works better on NUMA [non-uniform memory access] machines. Such improvements benefit almost all Erlang users and the current download rate is approximately 50,000 per month".

The open-source Erlang language has users in major sectors including telecommunications, banking, online gaming and social networks. The broader economic impacts are diverse and, in some cases, hard to quantify. However, with over 40% of the world’s mobile traffic and over 50% of landline calls carried over Ericsson networks using Erlang, and 90% of all internet traffic going through routers and switches controlled by Erlang, the customer base served spans 180 countries across the globe. Maintaining these networks and ensuring they function optimally is critical to Ericsson’s economic success and that of the industries and people using Ericsson mobile networks.

As part of RELEASE, Trinder and Chechina designed, developed and evaluated better language-level scalability in the Scalable Distributed (SD) Erlang library, with first modifications released in December 2014. Overall, the project has produced, or contributed to, 8 new open-source software tools which enhance the scalability and function of apps and communication services around the world.

The Erlang improvements made in RELEASE are now part of the Erlang/OTP software platform that drives mobile communication and some of the world’s most famous large-scale apps. The improved Erlang/OTP has enabled:

  1. improved capacity to work at a larger scale – it is far easier to engineer software for hundreds and thousands of hosts in SD Erlang than in Erlang; and
  2. reduced software development times, reducing time to market in a plethora of extremely competitive industry sectors.

Erlang’s improved programming language technology is utilised by thousands of companies producing a wide diversity of applications. It has resulted in the production of successful apps in sectors as diverse as: communication (WhatsApp: 2 billion global Monthly Active Users [MAU]); gaming (Nintendo Switch); banking (Goldman Sachs, Mastercard) and personal finance (Klarna: 460,000 MAU in the UK alone); social networking (Grindr: 3.8 million Daily Active Users [DAU]); real-time advertising (AdRoll, Tapklik); bookmaking (Bet365); web services (Amazon Web Services); and national providers of information, data and IT systems (NHS Digital).

The NHS Spine II system connects clinicians and patients to essential national healthcare services across the UK, providing the Electronic Prescription Service, the Summary Care Record, the e-Referrals Service and others. It is used by 500,000 healthcare professionals daily and supports 28,000 IT healthcare systems across 21,000 organisations. It provides the central data storage and messaging systems for the NHS and uses two Erlang products: Riak and RabbitMQ. Riak is a leading internet-scale NoSQL database management system, while RabbitMQ provides a stable, well-used, mature, open-source message queue implementation written in Erlang.

“Spine is the backbone of the NHS Digital infrastructure. The transfer of Spine II to a system based in Erlang has been a success story. These products combine fault tolerant resilient behaviour with the scalability necessary to facilitate health and social care organisations and workers access patient records when they need, wherever they need, to improve patient care”.
NHS Digital

Riak and RabbitMQ were improved by the RELEASE project and have been central to enabling Spine II to scale to meet rising user demand. Spine II handles over 10 million transactions a day (with up to 600 transactions per second). October 2020 was its busiest month ever, with more than 1.2 billion messages processed. This has been achieved with 100% availability on key messaging services. The enhanced response times of the new Spine II save the NHS 800 working hours per day compared to the outsourced, Oracle-based Spine I. In the first 5 years of its operation Spine II has saved the service GBP150 million, as its open-source design negates the need for expensive licences for bespoke hardware and software.

Erlang is also employed on Amazon’s servers in its cloud business, Amazon Web Services (AWS), which allows users to build ‘sophisticated applications with increased flexibility, scalability and reliability’. Maximising their use of Erlang’s scalability, AWS have dominated the market with a 34% share, well ahead of Microsoft at 11% (2017).

Bet365, a UK-based bookmaker, credited the switch from Java to Erlang with their ability to deliver innovative, competitive technology to their digital customers while also rapidly expanding their customer base (from 2 million to 35 million customers across the globe) without increasing the risk of failure.

"The standard Erlang release has a certain limitation in the way it organizes communication between nodes… SD Erlang… solves these issues", and SD Erlang is ‘enabling Tapklik bidder to achieve better performance on the same resources and better scalability.’
Tapklik

UofG continues to maintain the SD Erlang libraries to ensure that worldwide users of those apps employing Erlang on their servers continue to benefit from SD Erlang’s enhanced scalability. UofG expertise is also supporting industrial users of SD Erlang, such as Tapklik; this continued support has helped the company to gain investment due to the size of its international user base, achieved as a result of the SD Erlang scalability enhancements.

UofG continues to collaborate with the Ericsson Erlang/OTP team to assist them in applying scaling techniques at the virtual machine level.