Scalability

Overview

We invested a lot of time in thinking about how to create a highly scalable system. We achieved it with a slim architecture and some interesting design concepts. The diagram below shows the basic architecture. It can be installed on any environment meeting our requirements but we have optimized it for hosting on Amazon web services. But as we don't want to bind ourselves to Amazon web services we have already experimented replacing Lucene Index with Google Big Table and run Halvr on the Google AppEngine environment.


Load Balancer

For every distributed system you need load balancing up front. Currently we achieve this with Apache web server and HAProxy module installed. But we are investigating the Amazon web services "Elastic Load Balancing" and "Auto Scaling". This allows us to remove the Apache layer and results in a much more flexible and scalable system.

Halvr Instances

Most of the Halvr development team have already created many distributed web applications. We know simply creating an application and distributing it across several server instances is not straightforward. So we implemented several concepts in the application layer to test solutions out. The most interesting are:
  • Sessionless application
  • Efficient and distributed read/write locking on Lucene index files
  • Application performance and distributed caching

Sessionless Application

A web application uses the user session to store user related data like the user object. This is pretty comfortable for an application developer but makes the infrastructure more complex. Usually this is solved using session stickiness (each user request is routed to the same server instance by the load balancer) or session replication. Both solutions have their disadvantages.

We analysed our application and identified that only a few attributes (2 or 3 values) are absolutely required. Those values can be encrypted and added to the session cookie (like the session id). This results in an application that is independent of the user session and makes load balancing and distribution much easier.

Lucene Read/Write Locking

We use Lucene index files to store the data. Therefore the different processes and instances have to be synchronized when writing to the index. We already have a good solution but expect to continuously improve alongside the Lucene developer community.

Application Performance and Distributed Caching

Caching is a good way to increase application performance. But caching can be problematic if you introduce too many caching layers, and finding and fixing problems can be very hard.

We took an approach to start our development with absolutely no caching. After the product was pretty stable in January 2009, we did extensive profiling of the application to identify areas where caching would increase performance significantly. We were surprised that only few areas where needed to increase the performance but these improvements were significant.

This resulted in a cache that has a simple structure and requirements. So we decided to implement the cache and distribution on our own.

Finally, Amazon EC2 together with the other Amazon web service tools provides a perfect infrastructure to host our products

Storage

As mentioned we use Lucene search index to store our data. We don't store everything in one big index file but use a separate index for each form. e.g. pages, blog posts and tasks all have their own small Lucene index. Therefore a typical Halvr instance has around 5 to 10 separate Lucene indexes.

Amazon Elastic Block Storage together with S3 provides us with a perfect infrastructure for the Lucene index files.
netociety: Home / Über uns / Kontakt / Facebook / Twitter
Legal: ©2010 Netociety GmbH
Powered by: Halvr