(cons 'ider 'this)

by Mark McGranaghan

Back of the envelope: how many servers does AWS use?

So I want to try to figure out the size of the server network backing AWS.

I’m going to come at this three ways: by revenue, by number of customers, and by comparison to another major site. By the way, if someone has found a more direct way (e.g. Amazon stating somewhere how many servers back AWS), please let me know.

Revenue: We don’t know exactly how much Amazon grosses from AWS, but it seems to be on the order of $10-$100 million a year; let’s call it $50 million. Let’s also say they have 50% gross margins against their infrastructure costs (conveniently, this guess is both inherently reasonable and guaranteed not to be off by more than a factor of 2). That leaves $25 million a year, or about $2 million a month, which would get you about 15,000 2GB servers and corresponding bandwidth from Slicehost at retail. Assuming Slicehost has similar gross margins, the same money buys about twice as many servers at cost, so we end up with about 30,000 mid-range servers backing AWS.
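A minimal Ruby sketch of the revenue arithmetic above. The helper name `servers_from_revenue` is hypothetical, and the $133/month figure is not a quoted Slicehost price: it is back-solved from the claim that about $2 million a month buys about 15,000 servers.

```ruby
# Revenue-based server estimate. All inputs are the guesses from the text;
# the retail price per server is a hypothetical, back-solved value.
def servers_from_revenue(revenue_per_year:, gross_margin:,
                         retail_price_per_month:, host_margin:)
  # What AWS spends on infrastructure each month, after its own margin
  monthly_budget = revenue_per_year * (1 - gross_margin) / 12.0
  # How many servers that budget rents at the host's retail price
  retail_servers = monthly_budget / retail_price_per_month
  # Strip out the host's margin to approximate servers bought at cost
  retail_servers / (1 - host_margin)
end

estimate = servers_from_revenue(
  revenue_per_year: 50_000_000,  # guessed midpoint of $10M-$100M
  gross_margin: 0.5,             # assumed AWS gross margin
  retail_price_per_month: 133.0, # hypothetical: ~$2M / 15,000 servers
  host_margin: 0.5               # assume the host's margin matches AWS's
)
puts "~#{estimate.round(-3)} servers" # prints "~31000 servers"
```

Dividing by `(1 - host_margin)` is what turns "servers rented at retail" into "servers owned at cost"; with a 50% margin it simply doubles the count.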

We can also check this against EC2 pricing: $50 million buys 500 million small instance-hours on EC2 (at $0.10 per instance-hour), which works out to about 60,000 instance-years. If a small instance is physically backed by the equivalent of a low-to-mid-range server, the price-to-server ratio for EC2 is comparable to that of other AWS services, and Amazon gets about half of its revenue from data transfer as opposed to computing and storage, then only about 30,000 instance-years’ worth of revenue pays for machines, and we again end up with about 30,000 physical backing servers.
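The same check written out as a short Ruby sketch. The $0.10 hourly price is the one implied by "$50 million buys 500 million instance-hours", and the one-instance-per-server mapping and 50% compute share are the post's own assumptions.

```ruby
HOURS_PER_YEAR = 24 * 365 # 8,760

revenue        = 50_000_000.0
price_per_hour = 0.10 # small-instance price implied by the text
compute_share  = 0.5  # assumption from the text: half of revenue is data transfer

instance_hours  = revenue / price_per_hour        # ~500 million
instance_years  = instance_hours / HOURS_PER_YEAR # ~57,000
backing_servers = instance_years * compute_share  # ~28,500, i.e. ~30,000

# Assumes one small instance is backed by roughly one low-to-mid-range server
puts "instance-years: #{instance_years.round}, servers: #{backing_servers.round}"
```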

Customers: Eric reported in the TC article linked above that an Amazon executive said they have 60,000 AWS customers. It’s difficult to put an upper limit on how many servers that corresponds to, since some customers could easily use tens or hundreds of machines each, but it does suggest that Amazon almost certainly has tens of thousands of servers backing AWS.

Comparison: Facebook has about half as many unique visitors as Amazon’s sites and runs on about 12,500 servers, which, if server count scales roughly with traffic, suggests that Amazon proper might need about 25,000 servers. AWS recently passed Amazon proper in bandwidth usage, suggesting that AWS’s infrastructure is within a small constant factor of Amazon’s, and probably slightly larger. This again suggests that AWS is backed by servers numbering in the low-to-mid tens of thousands.
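A rough Ruby sketch of the comparison, assuming (as the paragraph implicitly does) that server count grows linearly with unique visitors. The 1.0x-1.5x range for "slightly larger" is an illustrative assumption, not a figure from the post.

```ruby
facebook_servers = 12_500
visitor_ratio    = 2.0 # Amazon's sites get ~2x Facebook's unique visitors

# Linear scaling by traffic (an assumption, not a measured relationship)
amazon_proper = facebook_servers * visitor_ratio # 25,000

# "Within a small constant factor, probably slightly larger": hypothetical range
aws_low  = amazon_proper * 1.0
aws_high = amazon_proper * 1.5

puts "Amazon proper: ~#{amazon_proper.round}"
puts "AWS: ~#{aws_low.round}-#{aws_high.round}"
```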

Final triangulated guess: ~30,000 commodity servers back AWS.

Recently Read: The World We Have Lost by Peter Laslett

The World We Have Lost frames the study of history as the discovery of both what happened in our past and how those historical facts are related to each other. The book discusses in particular the social and economic structure of England from around 1600 to 1900. The historical record of this period is sparse, especially with respect to the everyday lives of the non-elite, but the author does his best to use existing scholarship and his own original research to understand the “world we have lost”.

The book’s dozen chapters are quite independent, and if you read at least the first, you can safely read as many or as few of the remaining ones as you like. I favored the chapters about the actual day-to-day conditions and realities of people’s lives, like “The Village Community” and “Did the Peasants Really Starve?”, over chapters making more abstract, philosophical arguments, like “Social Change and Revolution in the Traditional World”.

PersonRank on Twitter

A few posts ago I mentioned the idea of applying a PageRank-like algorithm to networks of people. One simple application of that idea would be creating a useful measure of popularity on Twitter.

The basic idea would be that a user’s popularity is the sum of the popularity contributions of all of his followers, where each follower’s contribution is that follower’s popularity divided by the number of people they follow. This measure is recursive but nonetheless tractable to calculate: as with PageRank, you can start everyone at the same score and repeatedly recompute all scores from the previous round until they converge.
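A minimal Ruby sketch of that recursive measure, computed by iteration. The method name `person_rank` and the sample users are hypothetical, and the damping factor of 0.85 is borrowed from PageRank rather than stated in the idea above; it gives every user a small baseline score and keeps the iteration converging on graphs with cycles.

```ruby
# follows maps each user to the list of users they follow.
def person_rank(follows, damping: 0.85, iterations: 50)
  users = (follows.keys + follows.values.flatten).uniq
  n = users.size.to_f
  rank = users.to_h { |u| [u, 1.0 / n] } # everyone starts equal

  iterations.times do
    # Every user gets the same small baseline from the damping term
    next_rank = users.to_h { |u| [u, (1 - damping) / n] }
    follows.each do |follower, followees|
      next if followees.empty?
      # A follower splits their popularity evenly among everyone they follow
      share = rank[follower] / followees.size
      followees.each { |f| next_rank[f] += damping * share }
    end
    rank = next_rank
  end
  rank
end

follows = {
  "alice" => ["bob"],
  "carol" => ["bob"],
  "dave"  => ["bob"],
  "bob"   => ["alice"]
}
ranks = person_rank(follows)
ranks.sort_by { |_, score| -score }.each { |user, score| puts format("%-6s %.3f", user, score) }
```

With this graph, bob ranks first, alice comes second because her single follower is the popular bob, and carol and dave tie at the baseline score since nobody follows them.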

This approach to measuring popularity would probably be more robust than the measures that are often used, such as the raw number of followers or the ratio of followers to followed, because a follow from an account that few people follow adds very little popularity.

The measure could also be extended to incorporate tweets, where an @username reference counts as a weighted popularity vote in the same way that a follow does.
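One way to sketch that extension in Ruby: merge follows and @mentions into weighted votes, then let each voter split their popularity in proportion to vote weight. The names `build_votes`, `weighted_person_rank`, and `MENTION_WEIGHT`, the 0.25 weight per mention, and the PageRank-style damping are all hypothetical choices for illustration.

```ruby
# Hypothetical weight of one @mention relative to one follow
MENTION_WEIGHT = 0.25

# votes[voter][target] = total weight of voter's follows and mentions of target
def build_votes(follows, mentions, mention_weight: MENTION_WEIGHT)
  votes = Hash.new { |h, k| h[k] = Hash.new(0.0) }
  follows.each { |user, followees| followees.each { |f| votes[user][f] += 1.0 } }
  mentions.each { |user, mentioned| mentioned.each { |m| votes[user][m] += mention_weight } }
  votes
end

# Iterative scoring where a voter splits popularity in proportion to vote weight
def weighted_person_rank(votes, damping: 0.85, iterations: 50)
  users = (votes.keys + votes.values.flat_map(&:keys)).uniq
  n = users.size.to_f
  rank = users.to_h { |u| [u, 1.0 / n] }

  iterations.times do
    next_rank = users.to_h { |u| [u, (1 - damping) / n] }
    votes.each do |voter, targets|
      total = targets.values.sum
      next if total.zero?
      targets.each { |target, weight| next_rank[target] += damping * rank[voter] * weight / total }
    end
    rank = next_rank
  end
  rank
end

follows  = { "alice" => ["bob"], "bob" => ["alice"], "carol" => ["bob"] }
mentions = { "alice" => ["carol", "carol"] } # alice @-mentioned carol twice
votes = build_votes(follows, mentions)
ranks = weighted_person_rank(votes)
```

Here carol rises above the baseline score purely through alice’s @mentions, even though nobody follows her.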

I know it’s lame to talk about this idea without implementing a prototype, but maybe I’ll get around to that in a bit.


## More Benchmark.nbmbm ##

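# Uses the Benchmark.nbmbm extension defined in the "Benchmark.nbmbm" post below
Benchmark.nbmbm(100_000) do |x|
  ar = [1,2,3]
  
  # Iterate with Array#each and a block
  x.report("each") do
    ar.each { |i| i * 1 }
  end
  
  # Iterate with a hand-written while loop; note that i is the index here,
  # so this version never reads the array's elements
  x.report("loop") do
    i = 0
    while i < ar.size
      i * 1
      i += 1
    end
  end
end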


# Abbreviated Results
           user     system      total        real
each   1.500000   0.680000   2.180000 (  2.179588)
loop   0.660000   0.160000   0.820000 (  0.820617)
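The "loop" case above multiplies the index i but never reads ar[i], so it does slightly less work than the "each" case. As a sketch, here is a self-contained variant that uses only the standard library's Benchmark.bmbm (no nbmbm extension) and reads each element in both cases; the constants N and AR are illustrative names. No timings are shown, since they depend on the machine and Ruby version.

```ruby
require 'benchmark'

N  = 100_000
AR = [1, 2, 3]

# bmbm runs a rehearsal pass, then the measured pass, and returns Benchmark::Tms results
results = Benchmark.bmbm do |x|
  x.report("each") do
    N.times { AR.each { |v| v * 1 } }
  end

  x.report("loop") do
    N.times do
      i = 0
      while i < AR.size
        AR[i] * 1 # fetch the element, as each does
        i += 1
      end
    end
  end
end
```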

## Benchmark.nbmbm ##

require 'benchmark'

# A pattern for benchmarking that I see a lot:
N = 100_000

Bucket = Struct.new(:a, :b, :c)

puts "N: #{N}\n\n"

Benchmark.bmbm do |x|
  x.report("Hash") do
    N.times do
      h = {}
      h[:a] = "1"; h[:b] = "2"; h[:c] = "3"
      h[:a]; h[:b]; h[:c]; h[:a]; h[:b]; h[:c]
    end
  end
  
  x.report("Struct") do
    N.times do
      b = Bucket.new
      b.a = "1"; b.b = "2"; b.c = "3"
      b.a; b.b; b.c; b.a; b.b; b.c
    end
  end
end

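# But it seems lame to keep writing N.times in each block:
module Benchmark
  # Stands in for bmbm's report object: collects (label, block) pairs
  # so that each block can later be wrapped in n.times
  class JobProxy
    attr_reader :benches

    def initialize
      @benches = []
    end

    def report(label, &block)
      @benches << [label, block]
    end
  end
  
  # Like bmbm, but runs each reported block n times
  def self.nbmbm(n)
    puts "N: #{n}\n\n"
    
    job_proxy = JobProxy.new
    yield(job_proxy)
    
    bmbm do |x|
      job_proxy.benches.each do |(label, block)|
        x.report(label) do
          n.times(&block)
        end
      end
    end
  end
end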

# With this we can write the original benchmark as:
Benchmark.nbmbm(100_000) do |x|
  x.report("Hash") do
    h = {}
    h[:a] = "1"; h[:b] = "2"; h[:c] = "3"
    h[:a]; h[:b]; h[:c]; h[:a]; h[:b]; h[:c]
  end
  
  x.report("Struct") do
    b = Bucket.new
    b.a = "1"; b.b = "2"; b.c = "3"
    b.a; b.b; b.c; b.a; b.b; b.c
  end
end