Why we need new server hardware. The full story.
In 2018 I was offered a second-hand server by Servercow.de (now tinc.gmbh) - the hosting company that I still trust and that still hosts my main infrastructure for 650thz.de and some private stuff.
I took the opportunity and bought the Dell R620. Servercow.de took care of the server housing. The machine has two Intel Xeon E5-2640 v2 processors (16 physical cores, 32 threads in total) and 64 GB of DDR3-1600 RAM. I added 2x 1 TB SSDs and 2x 2 TB SSDs in a RAID-1 config, which was plenty of storage back then. I’d had several years of experience in the Linux server field, but this was my first physical machine of my own, and it was powerful enough to host all my privately run applications.
In 2017 I decided to host my own Mastodon instance - metalhead.club - on this machine, after first hearing of Mastodon in December 2016. I wanted to give something back to the open source community that had been providing me with such great software for years.
Mastodon (and my XMPP service trashserver.net) have been running just great since then. There was more than enough storage and CPU power - despite the fact that the CPU entered the market in Q3/2013. Power was sufficient and the server was affordable.
Why do we need an upgrade?
Now, in 2023, after almost six years of metalhead.club, several waves of new users have hit the Mastodon network. The biggest one happened at the end of October / beginning of November 2022. The server gained more than 1,000 new users in November, and registrations were temporarily closed after metalhead.club had reached 4,000 active users.
You could definitely feel the server reaching its limits. Connections were slow, Sidekiq federation queues were delayed, the server was overloaded. The situation improved once everything settled down and after I applied some optimizations to the server. But still - even with only 1,500 active users in January 2023 - you could feel the age of the hardware.
Usually the server is doing well. On working days, if not too many users are on metalhead.club at the same time, you won’t notice many delays when using the platform. With some exceptions, e.g.:
- “Explore” page: It takes several seconds until trending posts and links are displayed. Sometimes 20, sometimes 30 seconds.
- Opening popular posts with many comments: Sometimes it takes up to 30 seconds until all comments of a post are displayed. They are not loaded one after another, but all at once, which means: if a user does not watch the loading indicator, they might assume that there are no comments at all. You need to be patient to see the comment section of a popular post.
… and then there is this time window from 5 pm until about 10 pm (UTC+1) that has caught my attention multiple times. It does not always happen with the same intensity, but usually the server faces a high load during that time, and clicking through the metalhead.club web interface takes quite some patience.
Investigations showed that during those load-intensive operations, such as opening the Explore page, both CPUs are sometimes mostly “idling” - at least if there are not too many users active on metalhead.club. You might think: “aren’t there 32 threads ready to serve requests?” Yes, there are. But some operations, especially complex database queries or “Puma” web server requests, can only be handled by a single CPU thread per request. The server is only fully loaded if there are enough concurrent requests. More CPU cores do not improve performance for a single user. It’s not only the number of CPU cores that matters - for a single user request, single-core performance matters just as much.
… and this is the weak spot of my current setup. The E5-2640 v2 CPUs didn’t age very well, and while they offer acceptable multi-core performance for typical workloads, they simply can’t keep up with today’s demand for single-core performance.
A single E5-2640v2 core is way slower than one of my Dell XPS 13 9360 (2017) mobile processor cores (i7-7560U).
(… But in multi-core benchmarks, the E5-2640 v2 is more than twice as fast as the i7-7560U!)
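To make the latency-vs-throughput point concrete, here is a small toy sketch in plain Python (not Mastodon code - the CPU-bound loop merely stands in for a complex database query or a Puma request): adding workers raises throughput, but the latency of a single request stays pinned to one core’s speed.

```python
# Toy demonstration: one request is handled by one thread/process, so its
# latency is bound by single-core speed. More cores only help when many
# requests arrive concurrently.
import time
from concurrent.futures import ProcessPoolExecutor

def handle_request(_):
    # Stand-in for a CPU-bound operation (e.g. a heavy database query).
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

if __name__ == "__main__":
    # One request: its latency is the same no matter how many cores exist.
    start = time.perf_counter()
    handle_request(0)
    single = time.perf_counter() - start
    print(f"1 request:  {single:.2f}s")

    # Eight concurrent requests: total throughput scales with core count,
    # but each individual request still takes about `single` seconds.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=8) as pool:
        list(pool.map(handle_request, range(8)))
    print(f"8 requests: {time.perf_counter() - start:.2f}s total")
```

On a machine with eight or more cores, the second measurement is close to the first - which is exactly why old, slow cores hurt every single page load even when the server as a whole is “idling”.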
First idea: Let’s upgrade the server CPUs. While this would definitely cause some server downtime, it could be done. But my R620 is too old to support any modern, faster CPUs. We’d basically be stuck with compatible hardware of a similar age as the server itself. The upgrade options are very limited, and we can’t expect a decent performance boost.
Well, that leaves us with buying new hardware or switching to rented hardware …
Renting or buying hardware? What about a cloud VM?
Buying new server hardware is expensive. Why would you buy new hardware that gets outdated in a couple of years when you could rent hardware from a server hoster? Well, there are some considerations to be made.
Cloud VM:
- Flexible: Need more cores? Add them with a few clicks. More RAM? Adjust the plan and reboot the machine.
But:
- Usually you share hardware with other users: no guaranteed CPU power (there are exceptions!)
- Cloud VMs are hard to compare. There is no platform for comparing vendor X to vendor Y and choosing the VM that provides the best performance per dollar.
- High monthly fixed costs, at least for the more powerful machines.
- Expensive storage upgrades: you pay for the SSD storage space every month.
Rented hardware:
- Somebody else takes care of replacing parts (usually for free)
- Battle tested
But:
- Limited upgradability / customizable only for extra fee
- You don’t own it. You can’t move it to another hoster
Own hardware:
- You choose the hardware that best suits your workload
- Big one-time investment, but affordable monthly costs (not valid for small servers)
- Cheap storage upgrades: Need another 2 TB of SSD? Buy the hardware once, add it to the server, and use it “for free” from then on.
- Take control: You control the hardware. You can perform upgrades whenever you want. You decide your security measures. Your machine is pretty much isolated from the other customers.
But:
- Cost for failing parts / replacement parts
- More responsibility and initial work to set everything up
- Only makes sense for bigger servers due to cost effects
…
This is not a complete list, just some thoughts that came to mind.
As I’ve had a great experience with my own hardware, I would like to buy new hardware and not rely on some sort of cloud VM. As a bonus, this makes me learn new things about servers and network infrastructure. I want to take full control of, and responsibility for, my servers.
Nevertheless, I was curious whether it would make sense to rent some off-the-shelf hardware at Hetzner instead of buying my own. I did some rough calculations and estimates, and it turns out that after about 3-4 years, owned hardware becomes cheaper per month than rented hardware.
This is just a rough estimate, of course, and it heavily depends on the hardware. Generally speaking: the more powerful the hardware, the more expensive rented machines are compared to owned machines plus housing.
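The comparison boils down to a simple break-even calculation. The sketch below shows the method; the prices in it are made-up placeholders (the post doesn’t state the actual figures), chosen only so the result lands in the 3-4 year range mentioned above.

```python
# Break-even sketch: after how many months does owning hardware become
# cheaper than renting? All prices below are hypothetical placeholders.
def break_even_months(purchase_price, housing_per_month, rent_per_month):
    """Months after which the total cost of owning drops below renting."""
    if rent_per_month <= housing_per_month:
        return None  # renting never gets beaten
    return purchase_price / (rent_per_month - housing_per_month)

# Example: 4000 EUR server + 60 EUR/month housing vs. 160 EUR/month rental.
m = break_even_months(4000, 60, 160)
print(f"Break-even after {m:.0f} months (~{m / 12:.1f} years)")
# → Break-even after 40 months (~3.3 years)
```

Note how sensitive the result is to the housing fee: raise it from 60 to 110 EUR/month in this example and the break-even point doubles to 80 months - which is why the upcoming housing price increase matters so much for the decision.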
The final decision has not been made yet. At this point I’d prefer to buy my own, customized hardware that best suits the purpose. However, my calculation also depends on the housing costs - and those are about to increase. Yesterday my current hoster told me that he will need to raise the housing prices due to extreme power costs. I still don’t know how much I will be charged in a few weeks.
Until then I cannot decide which solution is the most cost-effective one. I’ll need to wait for the new prices, then do my calculations and estimations again and find a proper solution.
I’ll keep you updated on this via metalhead.club and this blog!
Btw: Thanks to all the users who participated in the poll that I announced on metalhead.club! It provided valuable feedback and tells me whether a new server could be crowdfunded or not. If you have not voted yet, I’d like to encourage you to do so, to get a more precise result. You can find the survey in the announcements section of your Mastodon app or the web interface.