HP Moonshot 1500 And The ARM-On-Server Madness

There is no apparent connection between these two, but Moonshot is actually part of the same trend, even though it uses Intel x86 (Atom). Many people advocate for ARM on servers, mainly because it is low-power, allowing a high CPU count per rack. But I guess no one would mind a low-power CPU from Intel either.

Unfortunately, not many people have a good overall picture of the cloud's challenges, not to mention software engineering. And this is pretty complicated.

Software And Hardware

Software may be CPU bound, memory bound, or I/O bound. CPU bound, for example, means that CPU power is the limiting factor: if we increase the CPU power, we expect the software performance to increase as well.
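To make the distinction concrete, here is a toy Python sketch; the function names and the `time.sleep` stand-in for real I/O are my own illustration, not anything tied to a particular platform:

```python
import time

def cpu_bound(n):
    # Limited by raw computation: a faster CPU finishes sooner.
    return sum(i * i for i in range(n))

def io_bound(delay_s):
    # Limited by waiting on an external resource (disk, network);
    # a faster CPU barely changes the elapsed time.
    time.sleep(delay_s)  # stand-in for a disk read or network call
    return "payload"

start = time.perf_counter()
cpu_bound(200_000)
print("CPU bound: spent", round(time.perf_counter() - start, 4), "s computing")

start = time.perf_counter()
io_bound(0.05)
print("I/O bound: spent", round(time.perf_counter() - start, 4), "s waiting")
```

Memory bound software looks like the first function, except the bottleneck is how fast data moves between RAM and the CPU rather than the arithmetic itself.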

With these in mind, one would assume that it is possible to get computing instances (for example, on Amazon EC2) that strictly follow these principles. And while this apparently happens – Amazon labels some instances as High-Memory, High-CPU, or High-I/O – the truth is that all resources scale, not only the one mentioned in the name. This is because, in real life, even CPU bound software most likely needs a lot of memory and a lot of I/O: the CPU must keep its data somewhere (memory) and must be fed from some source (I/O).



Using the CPU efficiently is one of the most important problems in software engineering. There is a lot of hype around thread concurrency, with people crying for more and more cores. CPUs in mobile phones got to eight cores and, apparently, that is still not good enough. When a lot of unrelated software is running on a system, the CPU cores might be used efficiently, as the kernel scheduler does the job.

Still, a lot of CPU bound software does not efficiently use all the CPU cores available on a system. For example, when working with software threads, some resources require controlled access, meaning that they can only be touched by one thread at a time. All the other threads wait to get access to that particular resource. This might not be a big problem when the resource is memory, but it is a huge issue when threads compete to write to disk.
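A minimal Python sketch of that waiting, where a lock and a short `sleep` stand in for a slow disk write (both are illustrative assumptions):

```python
import threading
import time

disk_lock = threading.Lock()  # models a resource that allows one writer
completed = []

def write_to_disk(thread_id):
    # Every thread must acquire the lock first; while one thread holds
    # it, the other three simply block and wait.
    with disk_lock:
        time.sleep(0.01)  # stand-in for a slow disk write
        completed.append(thread_id)

threads = [threading.Thread(target=write_to_disk, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four threads, but the writes ran strictly one at a time: adding more
# cores would not make this critical section any faster.
print(sorted(completed))  # → [0, 1, 2, 3]
```

No matter how many threads (or cores) you throw at this, the section inside the lock runs serially – exactly the situation described above.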

In the end, getting software that was designed 10-15 years ago to use all CPUs efficiently is not as simple as it seems.

And remember, all of the above happens in the best-case scenario, when all threads are running *on the same computing machine*.

The Big I/O Problem

With most applications, CPU power is not the limiting factor. Even if we do not use CPUs efficiently, CPU power keeps increasing nicely and at a constant pace. So we may have very powerful CPUs, but feeding them properly can be quite a challenge.

In the medium term, SSDs will practically replace mechanical disks. This will improve storage performance dramatically and leaves far more room for future improvements, as mechanical disks have been hitting physical limits for a long time.

Unfortunately, the networking future does not look as bright. The problem is that data center networking is a huge investment and takes a long time to replace. 10 Gbps has been available for a while, but it is still not widely deployed at the computing node level.

I know that promising technologies such as InfiniBand exist, but they are not that popular. There is a reason for this, but that is perhaps a topic for a future article.

Why Is I/O So Important?

Lots of smart engineers say that the solution to performance issues is distribution. By putting lots of cores to work, we can get better performance. However, if we do so, we must make sure that these cores can communicate properly. And they do that over the network.

Now, let’s talk history. Back in 2007, Sun delivered a massively threaded CPU (the UltraSPARC T2 – 64 threads). It was a single CPU, so all threads had access to the same memory. Still, Sun, which was a very smart engineering company, embedded two 10 Gbps Ethernet ports right in the CPU! They advertised it for specific workloads, most notably web servers, which require a lot of concurrency and relatively little computing power per thread.

As far as I know, the project was not very successful. The most plausible explanation is that the CPU was not Intel and the Sun market share was pretty low. Apart from this, the CPU specialization had something to do with it as well.

And then again… in 2007, Sun recognized the importance of networking and put 2 x 10 Gbps ports in a CPU that ran 64 threads against a single memory space. By contrast, HP’s 2013 approach with Moonshot features memory-independent computing nodes. This requires a far more capable interconnect.

HP obviously thought about it, so Moonshot has networking and a cluster fabric. Although nice on paper, this means that the computing nodes are not connected through Ethernet. While this approach is typical for supercomputers, it’s not common in the end-user space.

Management Is Vital

Without *standard* management tools, there is no cloud. People love centralized management. They like big pools of resources that can be easily allocated. Not because they are stupid, but because it’s easier. That’s why powerful servers sell – these boxes can easily be virtualized, split into tens or even hundreds of VMs, and some vendors are deeply focused on making that management easy. When you have a contiguous pool of 24 CPU cores and 256GB of memory, you can split it into VMs as you wish. When those 24 cores and 256GB of memory are spread over 8 different machines (3 cores and 32GB each), you cannot build a 12-core VM with 64GB of memory. Today, especially at the enterprise level, many applications are not capable of service distribution. This means that the only way to increase performance is by scaling up the compute power.
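The 24-core example can be sketched in a few lines of Python. This is a toy placement check (the `can_place` helper is hypothetical; real schedulers are far more involved), but it captures why the same totals behave so differently:

```python
def can_place(vm_cores, vm_mem_gb, hosts):
    # A VM must fit entirely on ONE host: resource pools do not
    # stretch across machine boundaries.
    return any(cores >= vm_cores and mem >= vm_mem_gb
               for cores, mem in hosts)

one_big_box = [(24, 256)]          # one host: 24 cores, 256 GB
eight_small_boxes = [(3, 32)] * 8  # same totals, split 8 ways

print(can_place(12, 64, one_big_box))        # → True
print(can_place(12, 64, eight_small_boxes))  # → False: fragmentation
```

Both pools hold 24 cores and 256GB in total, yet only the contiguous one can host the 12-core VM – which is exactly the management argument against many tiny nodes.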

Oh, and network management is even more annoying than CPU/memory management. Typically, we have fewer than 200 network ports in a rack. Can you imagine what it is like to have 1000? It does not matter that the switches are integrated; each machine is still a different device on the network. And SDN is far from being a reality.

Is This Elasticity? No!

With the new wave of computing that HP proposes, we are actually going back in time. Some computing nodes will not be used efficiently, while others might be very crowded. Automated provisioning systems would allocate more nodes to work around these limitations. The CPUs are indeed more energy efficient, but in the end it’s not the power per rack that matters, but the throughput. Real life is much different from benchmarks, and the total throughput is not the sum of node-level throughputs. Even if we get a slightly better throughput per rack, we still have a problem, because management costs money.

Cloud is about elasticity and flexibility. IaaS providers don’t know what their customers are going to run on their hardware; they can only provide different types of computing instances and their customers will pick the right one.

From this perspective, large contiguous chunks in resource pools are economically better, and HP Moonshot is a disaster.

The Perfect Analogy In Real Life

Distribution has been seen as *the future* only because reality forced it. I am sure that any engineer would prefer one computer that is 100 times more powerful over 100 separate computers. That’s why excessive distribution should be avoided, especially when the roads are crowded (networking). Distribution translates into overhead, so the natural way is to limit it.

Even humans are much more productive in large cities than in small villages. Additionally, the smaller the village, the higher the per-capita management costs (electricity, roads, supplies etc.).


As for ARM on servers, it’s simply fighting against Moore’s law plus the architectural differences. I am not sure how many developers are willing to port their code and then fix architecture-related bugs. Actually, we already have the answer: check Windows RT.

Personally, I think the future is about making server CPUs more powerful and more energy-efficient. Today’s virtualization technology is mature enough to split them nicely. The industry must solve the I/O issues pretty fast; big cities require large highways. 🙂

As for HP Moonshot 1500, I like it from the engineering perspective, but it’s not going to be successful for good reasons.
