Great Architectures, Stacks & DevOps at Webscale

By Chris Ueland

Utilizing GPUs in The Cloud


I got to see Mike Houston from Nvidia speak at Velocity 2014 in Santa Clara and was inspired to put this article together. I’d always been aware of GPUs, but they were always a hassle to implement. After researching a bit, I found that GPU boxes are easily available from managed and cloud providers (listed below). Nvidia has a startup inside its giant 8,800-person company that is trying to bring better APIs and accessibility to GPUs. I went up to Nvidia’s headquarters to meet with Mike and Han Vanholder to get a peek at what they’re up to. I’d like to thank Joe Savage for putting this together with me.

–Chris / ScaleScale / MaxCDN

HPC, or High Performance Computing, is an area of research focused on building and using more powerful computers. More and more computing power is now available on demand in the cloud, and one interesting development in this area is the use of GPUs (Graphics Processing Units) in servers.

CPUs and GPUs have significantly different architectures. For starters, CPUs tend to have a few powerful cores, whereas GPUs have a large number of less powerful cores. This makes GPUs very good at highly parallel tasks but less suited to processing a single stream of data. For a real-world comparison, the NVIDIA GTX 670, a consumer-grade graphics card, boasts 1344 CUDA cores at around 980 MHz each, compared to the 4 cores at 3.4 GHz in an Intel Sandy Bridge i7-2600 processor.



This graphics card boasts 1344 CUDA cores at around 980 MHz each
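As a rough, back-of-the-envelope comparison of the figures quoted above, multiplying core count by clock speed shows how differently the two chips are built. This is only a sketch: the helper name `aggregate_core_ghz` is made up for illustration, and the number it produces ignores work done per clock, memory bandwidth and everything else that decides real performance.

```python
def aggregate_core_ghz(cores: int, clock_ghz: float) -> float:
    """Naive 'core-GHz' figure: core count times per-core clock.

    This is NOT a FLOPS or benchmark number; it only illustrates
    how much wider one design is than the other.
    """
    return cores * clock_ghz


# Figures quoted in the text.
gpu = aggregate_core_ghz(1344, 0.98)  # NVIDIA GTX 670: 1344 CUDA cores at ~980 MHz
cpu = aggregate_core_ghz(4, 3.4)      # Intel i7-2600: 4 cores at 3.4 GHz

print(f"GPU: {gpu:.1f} core-GHz, CPU: {cpu:.1f} core-GHz, ratio ~{gpu / cpu:.0f}x")
```

The ratio comes out at roughly 97 to 1, which overstates the practical gap: a single CPU core does far more per clock than a single CUDA core. What the number captures is the width of the design, not its speed.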

GPU architectures also tend to be geared towards parallel vector and floating-point operations, since these are common in computer graphics, while CPUs tend to be geared towards fast, frequent branching in application logic. As a result, certain tasks are faster on the CPU while others are faster on the GPU.
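A small, CPU-only Python sketch of the distinction: in the first function every output element depends only on its own input, so each iteration could in principle be handed to a separate GPU core. In the second, each step needs the previous result and branches on it, which is the kind of serial, branch-heavy work a fast CPU core tends to handle better. Both function names are hypothetical, and both run sequentially here.

```python
from typing import List


def scale_and_offset(xs: List[float], a: float, b: float) -> List[float]:
    # Data-parallel: element i depends only on xs[i], so all
    # iterations are independent and could run simultaneously.
    return [a * x + b for x in xs]


def running_balance(deltas: List[float], floor: float) -> List[float]:
    # Serial with branching: each step depends on the previous
    # balance and takes a data-dependent branch, so the loop cannot
    # be naively split across many cores.
    balance = 0.0
    out = []
    for d in deltas:
        balance += d
        if balance < floor:
            balance = floor  # branch taken only for some elements
        out.append(balance)
    return out


print(scale_and_offset([1.0, 2.0, 3.0], 2.0, 0.5))
print(running_balance([5.0, -10.0, 3.0], 0.0))
```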

Architecture Comparison of a CPU and a GPU

When it comes to web servers, though, the vast majority use only CPUs for all processing. In many cases this works fine, but as outlined above, some tasks are simply better suited to the GPU. If you need to scale certain types of operations, adding GPUs to your nodes can be a very effective way to do so.

More Powerful Processing

When Shazam, the music recognition service, started to attract a lot of attention, they needed to scale their service out fast. Unable to deploy new servers quickly enough, they began getting more processing power out of each node by adding GPUs. Other companies deploying GPUs in servers include Adobe, Flickr, IBM, Netflix, and Yandex.

Specifically, operations that may be more effective on the GPU include:

  • Image manipulation (resizing, filtering, etc.) and analysis
  • Speech recognition
  • Data compression and transformation
  • Frequent cryptographic operations
  • Generally, many kinds of very parallel algorithms (types of recommendation and ranking algorithms, etc.)
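Image manipulation is the easiest of these to picture as a GPU workload. In the minimal sketch below, grayscale conversion is written as a per-pixel "kernel" that reads one pixel and writes one value; because no pixel depends on any other, a GPU could assign one thread per pixel. The code is plain Python running on the CPU, the image is a hypothetical nested list of RGB tuples, and the luma weights are the common ITU-R BT.601 coefficients, chosen here as an assumption rather than anything the companies above are known to use.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def gray_kernel(image: List[List[Pixel]], y: int, x: int) -> int:
    # Work for a single "thread": read one pixel, return one value.
    r, g, b = image[y][x]
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    # On a GPU each (y, x) pair would map to its own thread; here we
    # simply loop, since every call to gray_kernel is independent.
    height, width = len(image), len(image[0])
    return [[gray_kernel(image, y, x) for x in range(width)] for y in range(height)]


img = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(to_grayscale(img))
```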

Services with GPU instances

Hosting providers such as Amazon’s AWS, Nimbix, Peer1 Hosting, Penguin Computing, RapidSwitch, and SoftLayer all currently offer services that involve GPUs, both in the cloud and in managed servers. Amazon’s on-demand EC2 G2 instances are one of the more entry-level options here, offering an NVIDIA GPU with 1,536 cores and 4GB of video memory, along with an on-board video encoder, for around $0.650 an hour (~$475/mo). These providers offer such a diverse range of services that they are very difficult to compare directly, so if you’re interested, it may be worth browsing their websites to see what best suits your needs.
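The monthly figure follows from the hourly rate. Assuming the instance runs around the clock and an average month of about 730 hours (8,760 hours in a year divided by 12), a minimal sketch of the arithmetic is:

```python
HOURS_PER_MONTH = 8760 / 12  # average month, assuming 24/7 usage: 730 hours


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the cost of running one on-demand instance for a month."""
    return hourly_rate_usd * hours


# $0.650/hour, the G2 rate quoted in the text
print(f"${monthly_cost(0.650):.2f} per month")
```

This gives about $474.50, which the text rounds to ~$475/mo. Cloud prices change over time, so treat the rate as the example quoted here rather than a current price.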

So, what’s the catch?

If GPUs are so good at all these operations, and are readily available in the cloud, why aren’t we seeing them used more often? The main difficulty with deploying GPUs in servers today is simply that they are complicated to use. For those unfamiliar with GPU programming, it is quite different from traditional programming. There is a lot to take in, and moving a section of your application over to the GPU will likely introduce a lot of complexity.
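To give a flavour of that difference: a CUDA kernel is written from the point of view of a single thread, which works out which element it owns from its block and thread indices, while the host code must choose how many blocks to launch. The Python below only simulates that model sequentially on the CPU. `launch` and `add_kernel` are hypothetical stand-ins, though the index formula (`blockIdx.x * blockDim.x + threadIdx.x`) and the round-up block count mirror common CUDA practice.

```python
from typing import Callable, List


def add_kernel(block_idx: int, block_dim: int, thread_idx: int,
               a: List[float], b: List[float], out: List[float]) -> None:
    # Each "thread" computes its global index, as a CUDA kernel would
    # with blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    # The last block is often only partly used, so threads past the
    # end of the data must do nothing.
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel: Callable[..., None], n: int, threads_per_block: int, *args) -> int:
    # Round up so that every element gets a thread.
    blocks = (n + threads_per_block - 1) // threads_per_block
    # A GPU would run all of these concurrently; we loop instead.
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)
    return blocks


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)
blocks = launch(add_kernel, len(out), 2, a, b, out)
print(blocks, out)
```

Even in this toy form, the bookkeeping that plain loop code never needs (index arithmetic, launch sizing, bounds checks) is visible, and real GPU code adds memory transfers between host and device on top.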

The Future

The hoped-for solution is more APIs and tools that make this process easier. Ideally, an API would simplify balancing tasks between the CPU and GPU, and between multiple GPUs, and make it easier to harness the GPU in a number of different situations. A group at NVIDIA that focuses on the use of GPUs in servers has recently been working on exactly this. You can watch a talk on this topic from one of its members below, or on YouTube.

Michael Houston’s Ignite talk

Harnessing Heterogeneous Nodes
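As an illustration of the kind of balancing such an API might hide, here is a minimal sketch that splits a batch of work items across devices in proportion to their relative throughput. The device names and throughput weights are made up for the example, and this is not NVIDIA’s API or scheduling method; a real scheduler would also account for data-transfer costs and measure throughput at runtime.

```python
from typing import Dict


def split_work(n_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Divide n_items among devices proportionally to throughput.

    Uses largest-remainder rounding so the shares always sum to n_items.
    """
    total = sum(throughput.values())
    exact = {dev: n_items * w / total for dev, w in throughput.items()}
    shares = {dev: int(x) for dev, x in exact.items()}
    leftover = n_items - sum(shares.values())
    # Hand the remaining items to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


# Hypothetical relative speeds: one CPU and two identical GPUs.
print(split_work(1000, {"cpu": 1.0, "gpu0": 8.0, "gpu1": 8.0}))
```

Largest-remainder rounding is used here so that no work item is lost or duplicated when the proportional shares are not whole numbers.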


Chris Ueland

Wanting to call out all the good stuff when it comes to scaling, Chris Ueland created this blog, ScaleScale.