DJUGL: Inside Lanyrd’s architecture

by Álex 2013-03-12 20:30 talks djugl march lanyrd django solr celery postgresql redis mongodb mysql varnish haproxy s3

This talk was just amazing! Andrew Godwin (@andrewgodwin or http://areacode.org) gave us a talk about the internal architecture that they are using at lanyrd.com.

I would say 2 things:

After these lines you will find almost a copy & paste of Andrew’s slides, which you can find here: https://speakerdeck.com/andrewgodwin/inside-lanyrds-architecture, but with some additional notes added by me.

The Origin Story

They launched in Aug’10 and went down after half an hour because the load was too high.

In Sep’11 they got some investment that allowed them to start on the current architecture.

The ecosystem that they need to take care of:

And all this with just 6 technical guys! They know this:

What they run on

Almost all of the site is written in Django with some spices:

The main services that you can find there are these:

PostgreSQL

Redis

Solr

Varnish

HAProxy

S3

What they have eliminated

It’s a shame, but I keep reading it in several places: it seems that after the hype, a lot of companies are eliminating MongoDB from their backends.

MongoDB

Nevertheless they think that it’s a really useful tool for quick prototyping.

MySQL

The great move of 2012

They moved from EC2 to Softlayer, basically because it’s real hardware (if something fails, just change it), and from MySQL to PostgreSQL for the reasons that he explained before.

Why?

It seems that lanyrd has very predictable traffic: they can know months in advance what the expected load is.

How

Both moves required database downtime. A couple of tables were really big, and any change on those tables meant around 20-30 min of downtime.

  1. Replicate Solr and Redis across to new servers.
  2. Enter RO mode.
  3. Dump MySQL data.
  4. Convert MySQL dump into PostgreSQL dump.
  5. Load PostgreSQL dump.
  6. Re-point DNS, proxy requests from the old server.
  7. Exit RO mode.

After this whole process they can say that they spent an hour and a half in read-only mode, but without any downtime at all.

From their experience, the advantage of having a content site is that RO mode is completely viable. They logged everyone out of the site, and in the meantime Varnish was blocking all the POST requests & caching aggressively.
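To make the RO mode idea a bit more concrete, here is a minimal sketch of the same behaviour at the application level. In the talk it was Varnish that blocked the POST requests, so this is only an analogue and not what Lanyrd runs: `ReadOnlyMiddleware`, `demo_app` and `call` are hypothetical names, and the 503 response with `Retry-After` is my own assumption. It uses plain WSGI from the standard library, so it runs without Django.

```python
from wsgiref.util import setup_testing_defaults

# Methods that do not modify data and can keep working in read-only mode.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """Hypothetical WSGI middleware: reject write requests while RO mode is on."""

    def __init__(self, app, is_read_only):
        self.app = app
        self.is_read_only = is_read_only  # callable returning True/False

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if self.is_read_only() and method not in SAFE_METHODS:
            body = b"The site is in read-only mode, please try again later.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", "3600"),  # assumed value, in seconds
                ],
            )
            return [body]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    body = b"ok\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call(app, method):
    """Invoke a WSGI app once with the given method and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    b"".join(app(environ, start_response))
    return captured["status"]


if __name__ == "__main__":
    flag = {"ro": True}
    app = ReadOnlyMiddleware(demo_app, lambda: flag["ro"])
    print(call(app, "GET"))
    print(call(app, "POST"))
```

Doing it in the proxy, as they did, has the advantage that the application servers never see the write requests at all; the sketch only shows the rule being applied.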

Always be deploying

Just a quick note: if you have never used this feature, you should try something like gargoyle. It’s just amazing for deploying some functionality to just some of your users. I don’t know what they are using, but if it’s not this, it should be something similar.
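As a rough illustration of deploying something to just some of your users, this is a minimal sketch of a percentage rollout. It is not gargoyle’s API and I don’t know how Lanyrd does it: `is_enabled`, the flag name and the hashing scheme are all assumptions made for the example.

```python
import hashlib


def is_enabled(flag_name, user_id, percentage):
    """Return True if the flag is on for this user.

    The user is placed in a stable bucket from 0 to 99 by hashing the flag
    name together with the user id, so the same user always gets the same
    answer and raising the percentage only ever adds users.
    """
    key = f"{flag_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < percentage


if __name__ == "__main__":
    # Hypothetical flag: show a new page to roughly 10% of users.
    enabled = [uid for uid in range(1000) if is_enabled("new-page", uid, 10)]
    print(len(enabled), "of 1000 users see the new page")
```

Hashing the flag name in as well means that different flags pick different subsets of users, so it’s not always the same people who get the experiments.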

Legacy code & decisions

Awareness (everyone knows what is going on) & always deployable (master branch always shippable).

Small and nimble

Fix it while you can

