tl;dr: If you only need single hosts, Rackspace Cloud is good value, but it lacks key features needed to build a serious infrastructure. This may be because RS don’t want to cannibalize their dedicated server business. The biggest attraction is stable, long-lived instances; the worst irritation is the lack of a secure internal network.
For the last year I’ve been running web hosting platforms on Rackspace Cloud. Some recent discussion on HN left me thinking about whether cloud hosting really suits most startups and it’s prodded me into writing a balanced assessment of what the stack has been like to work with.
This article applies to the traditional flavour of Rackspace Cloud which has seen little innovation in the last year. A long-awaited rewrite based upon OpenStack is in the works which may fill in some of the missing features.
Let’s get this part out of the way first: Rackspace’s “Cloud Servers” aren’t very cloudy. They’re basically VPSes – standalone instances on a large, public network that you rent by the hour. This is no surprise, since Rackspace’s present offering is based on the guts of Slicehost, a successful VPS provider they acquired a couple of years back.
Right now you get a choice of two locations: US and EU. The two implementations are completely separate right down to needing their own logins. It’s almost as if there are two completely separate, self-contained Rackspace Clouds. Choose your location wisely because moving entails significant work.
You get the basic cloud building blocks…
- Cloud Servers
- Cloud Files (which wants to be S3)
- Cloud Load Balancers
- + other ancillary services (e.g. email sending), often supplied through external companies
Conspicuously lacking at present are attachable volumes à la Amazon’s EBS. There’s some logic to this: since Rackspace’s Cloud Servers are non-ephemeral (i.e. Rackspace try not to destroy them), they’re a safe place to keep your data and you shouldn’t need to mount storage from an external system. The downside is that you’ll never get more performance than the disk in a single host can deliver.
All current RS Cloud instances seem to have four cores. Paying more just gets you more RAM and disk space, and presumably means you’re on a physical host with fewer other instances contending for its resources. Instance sizes range from 256 MB to 32 GB of RAM and the cost scales linearly. You don’t get anything like Amazon’s smorgasbord of options (e.g. you can’t say “give me a shitload of RAM but easy on the CPU”) but what’s there is satisfactory for most needs.
RS Cloud’s biggest draw is that instances are not ephemeral. On many cloud providers (certainly Amazon) your instance could vanish in a puff of logic at any moment. Amazon make no secret of this: if they need to mess with the hardware your VM is running on, it will sink without trace. They often warn you, but it’s not always practicable to do so.
A lot of people seem surprised by this so I’ll repeat it: host on EC2 and you must be ready for any of your instances to vanish. Putting your data and boot volumes on EBS can help but opens up a whole new can of performance issues. Big players deal with the ephemeral node problem by building an expectation of failure (cf. Netflix’s Chaos Monkey) into their applications, but legacy stacks will need significant reworking and small shops often can’t justify the investment of developer time required to make apps truly fault tolerant.
In keeping with its Slicehost history, Rackspace take a much more VPS-like approach. They attempt to preserve your instances: if hardware fails they migrate VMs off to another physical host and, in theory, all you notice is a reboot. This is way more appropriate to the way small shops host web apps – you can live without full redundancy and not lose sleep worrying whether your rarely-tested failover scripts actually work.
Because these instances are long-lived you get functionality to shrink and grow them. This is a great idea (scale hosts without rebuilding? fantastic) but the process can be painfully slow. You would not want to automate resizing as a response to high load. I’ll hazard a guess that behind the scenes most resizes involve migrating your instance across the network to a new physical host, and moving those gigabytes around can take upwards of half an hour.
Potted Host Images
You can use RS’s images or roll one of your own by snapshotting a running instance. This will need to be based on an existing instance of Rackspace’s; I don’t believe there’s a way to create one from scratch or import an existing machine from elsewhere as one can on Amazon. Images are stored in Rackspace’s Cloud Files, a system similar to Amazon S3.
Rackspace have templates for a variety of OSes (Ubuntu 12 was recently added) and you can usually bring up a new host in minutes with a couple of clicks in their web interface or some calls to the API. There’s no metadata associated with an image, since the network & auth options are one-size-fits-all.
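As a sketch of what those API calls look like, here’s the JSON body a create-server request takes in the first-generation Cloud Servers API (the name and the image/flavour IDs below are illustrative, not real – check them against the current image and flavour lists):

```python
import json

def create_server_payload(name, image_id, flavor_id):
    """Build the JSON body for a first-gen Cloud Servers create call
    (POST /servers). The IDs passed in are illustrative only."""
    return json.dumps({"server": {"name": name,
                                  "imageId": image_id,
                                  "flavorId": flavor_id}})

# Hypothetical host: image 112, flavour 3.
payload = create_server_payload("web-01", 112, 3)
print(payload)
```

POST that, authenticated, at the API endpoint and you get back the new server’s details including its root password – the same flow the client libraries wrap for you.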
A form of backup is supported: basically just automatic, regular snapshots of your hosts. They work well enough, but you’ll get better value if you use something like duplicity to make incremental backups and store them in the cloud.
Rackspace bring up your instance with two interfaces: public and “ServiceNet” for unbilled back-end communication. As you’d expect from a huge megaprovider there’s tons of bandwidth and connectivity is reliable.
ServiceNet is really a huge RFC1918 network (10.x.x.x) shared between all instances on Rackspace Cloud with no firewalling between clients. Unless you implement your own host-level firewalling with iptables the soft, chewy centre of your network will be wide open to any attacker with an instance of her own. “It’s okay” you say, “I have strong passwords on my database” – but really too many assumptions are made in the open source web stack for the internal network of a platform to ever be considered secure. The total absence of auth on memcached (see go-derper) and by default on MongoDB are good examples why.
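The host-level iptables workaround looks something like the following sketch: a script that generates a default-deny rule set for the ServiceNet interface, allowing only your own instances through. The interface name and peer addresses are made-up examples; you’d feed the real list in from wherever you track your hosts.

```python
# Sketch: build iptables rules that drop all ServiceNet traffic except
# from our own instances. Interface name and peer IPs are hypothetical.
PEERS = ["10.180.2.11", "10.180.7.42", "10.181.0.9"]

def servicenet_rules(peers, iface="eth1"):
    """Return a list of iptables commands: accept the listed peers on
    the back-end interface, then drop everything else on it."""
    rules = ["iptables -A INPUT -i %s -s %s -j ACCEPT" % (iface, ip)
             for ip in peers]
    rules.append("iptables -A INPUT -i %s -j DROP" % iface)
    return rules

if __name__ == "__main__":
    print("\n".join(servicenet_rules(PEERS)))
```

The catch, of course, is that every time a host joins or leaves the platform you have to regenerate and push this rule set to every other host – which is exactly the admin overhead complained about below.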
People have bugged Rackspace over this but their standard answer is that you should use iptables or set up a VPN between the hosts in your cloud infrastructure. Between one or two hosts this is a realistic proposition (though an unwelcome increase in admin overhead) but imagine the hell of sharing ACLs or VPN keys between a dozen regularly-changing hosts. Worse, most VPN options are point-to-point, so a full mesh of N machines needs N(N−1)/2 tunnels – 66 of them for a dozen hosts. IMHO a genuinely private back-end network is absolutely vital for a platform of any size.
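For reference, the tunnel count in a full point-to-point mesh grows quadratically – each of the n hosts links to every other, and each link is shared by two hosts:

```python
def mesh_links(n):
    """Number of point-to-point tunnels in a full mesh of n hosts:
    n choose 2, i.e. n * (n - 1) / 2."""
    return n * (n - 1) // 2

for hosts in (2, 3, 6, 12):
    print(hosts, "hosts ->", mesh_links(hosts), "tunnels")
```

Two hosts need one tunnel; a dozen already need 66, each with keys to distribute and a config entry on both ends.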
If you press the network security issue RS will recommend their Private Cloud offering; last time I checked this meant renting expensive, dedicated VMware servers with their own VLAN. You could build that yourself, and besides, VMware is expensive, high-overhead and aimed at enterprises rather than the dinky startups we all know and love.
Just like Amazon’s offering you get two ways to admin your cloud instances:
- Web interface. It feels shonky in places and is slow at peak times (yes, a cloud control panel that doesn’t seem to scale) but works okay. My chief bugbear is that only one account can own/administer your instances so all engineers on your team must share the same account and, amusingly, the same name.
- RESTful API. You can interface with RS’s cloud from your own scripts to request new hosts, resize things, instantiate new Cloud Load Balancers etc. Client libraries exist for several languages; we’ve been using the Python one but found it short on documentation and a pain to work with. Apache Libcloud might be better.
We tried really hard to make the APIs work (for our own auto-scaling scripts) but kept running into problems. The API endpoint would sometimes break, so we couldn’t rely on it, and our code rapidly disintegrated into a spaghetti-like mess of edge cases. I get the feeling few people are using the RS Cloud API in anger and the real power users are all on EC2.
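One pattern that kept our scripts alive against a flaky endpoint was wrapping every API call in a retry with exponential backoff. A minimal sketch – `flaky_create_server` below is a stand-in for whatever real API call you’re wrapping:

```python
import time

def with_retries(fn, attempts=5, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff.
    Returns fn()'s result, or re-raises the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical flaky API call: fails twice, then succeeds.
calls = {"n": 0}
def flaky_create_server():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("endpoint returned 500")
    return "server-123"

print(with_retries(flaky_create_server, base_delay=0.01))
```

It papers over transient failures, but it can’t save you from the endpoint being down for an extended stretch – hence the spaghetti of edge cases.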
Performance
No complaints here. It’s Xen-based, so there isn’t any memory contention. Likewise Rackspace’s network is damn fast and unlikely to ever give you problems.
    root@testbox:~# dd if=/dev/zero of=bigfile bs=1M count=10240 && iostat -m 5
    ...
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.03    0.00    7.38   21.36    0.05   71.18

    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    xvdap1         3734.12         0.24       151.25          1        771
Considering the price point of these hosts that’s great IO performance. Unless you’re really huge you shouldn’t fear running your database on an instance with local storage.
As for the processors…
    root@testbox:~# cat /proc/cpuinfo | grep model
    ...
    model name : Quad-Core AMD Opteron(tm) Processor 2374 HE
I expect different hardware generations in their platform come with different processors, but all seem fairly modern. Instances get plenty of cycles to themselves, so there’s none of the throwing-away-instances-until-one-benchmarks-well nonsense you hear of people doing on EC2.
Uptime & Support
Reliability of individual instances is okay but not stellar. You get what you pay for. I have never seen a machine irretrievably lost, but the odd patch of downtime is not unusual, particularly if Rackspace detect an underlying hardware failure and need to migrate your instance to another physical host. They’re pretty good at reporting this via email, but it’d be nice if the on-call guys got an SMS as well.
Rackspace are enterprise-focused and this means they take support seriously. You can open a ticket or call a real human and get a response 24×7. Most of their support engineers are pleasant, knowledgeable people and they really do want to solve your problem. A small minority seemed over-eager to close tickets though, and getting complex problems fixed can take a lot of persistence.
Support is hard for a business to scale, and Rackspace’s is mostly procedure-oriented (“given symptoms X and Y, try Z”); you’ll run into problems if you go off the beaten track. I get the feeling they’re trained to support workaday RedHat/Apache instances rather than the more esoteric stacks we use in startup-land.
In particular, getting extra IP addresses attached to a host (even when you have a great justification for it) was excruciating. I had to explain “it’s a node.js app that really needs to listen on port 80 on two separate IPs” about a zillion times before some kind soul believed me and agreed to rethink their procedures.
Rackspace’s Cloud pricing is a big draw. Normally one associates Rackspace with premium features and high prices [& heavy marketing] to match, but they seem pretty clued up that cloud is a commoditized game. There’s no free tier but otherwise pricing’s pretty generous.
Costs are in the same ballpark as Amazon EC2 and much less than a lot of other cloud providers. Bandwidth is cheap and plentiful. They take your credit card details and bill by the month.
While individual Cloud Servers are great value, the offering as a whole feels a bit Fisher-Price. It’s suited to single-host VPS hosting but lacks features you’d want for a more serious deployment. For me the killer flaw is the absence of a truly private back-end network. Amazon have had this since day one, and building a serious hosting platform without it leads to a lot of fiddly workarounds. You’ll never get a shared internal network like this through PCI compliance.
TBH there’s a me-too-ish air pervading the stack. There isn’t any new thinking about clouds. My cynical side wonders if the current version exists only so RS can say to investors and buzzword-hungry enterprise customers “yeah we do cloud” but channel any serious deployments into their higher-margin dedicated server business.
But enough of the negatives. The long-lived hosts bring great peace of mind and mean your developers don’t get bogged down writing a codebase that’s paranoid about the infrastructure it runs on. For early-stage startups this is appropriate: you have limited resources and need to spend them on features not plumbing.
Rackspace have a new OpenStack-derived incarnation of their cloud on the way. Once this is publicly available it’ll be worth another look.