The IT industry changes quickly, and that’s a fact. However, some concepts stick around for a while. Distributed computing is one of them: chances are you have heard of it. In fact, the term is so common that professionals often assume everyone knows what it means. That’s not always the case, but luckily the concept behind it is simple. In this article, we will answer the important question “What is distributed computing?”. Once we have a clear idea of that, we can look at where we can leverage it.
What is distributed computing?
Distributed computing is the approach of sharing a computing load among many devices. Computing is just a fancy term for the processing power of a digital device. To get more computing, you can buy a bigger and more powerful device. However, there are hard limits: any single device caps out at a certain level of power. A quick workaround is to put multiple CPUs inside a single device, which does give you more power, but even that comes with problems.
Having a single device with extreme power is of little use if it cannot share that power over the network. In fact, the problem now might be the network itself: the device could be too powerful for it. In simple terms, it would talk faster than the network could listen. Thus, we need a better way to increase our computing power.
Meet distributed computing. With it, you share the same load (the work you want a computer to do) across multiple devices. That is the opposite of what we used to do, running a task on a single PC. Following the distributed computing approach requires some extra care, since using a single device is far simpler. However, it comes with enormous advantages we simply couldn’t have otherwise.
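To make the idea concrete, here is a minimal Python sketch (an illustration, not a real framework) that splits one workload across several worker processes on a single machine. In a real distributed system each worker would be a separate device on the network, but the principle of dividing the load and combining the partial results is the same.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for the real work each device would do:
    # here we simply sum the numbers in the chunk.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))  # the full workload
    chunk_size = 250_000
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Each chunk is handled by a different worker process, just like each
    # piece of a distributed job is handled by a different device.
    with Pool(processes=4) as pool:
        partial_results = pool.map(process_chunk, chunks)

    total = sum(partial_results)  # combine the partial results
    print(total)
```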
The advantages of distributed computing
We won’t go too technical here. Instead, we will show you some of the high-level advantages.
- Scalability is the key driver for adopting distributed computing. You can increase your computing power as simply as adding a new device to your network. It will join the others and make its computing power available to the whole cluster. In turn, this also means down-scaling is easy if you ever need it.
- Redundancy reaches unprecedented levels with distributed computing. Since you have hundreds if not thousands of devices, a failure of a single device does not significantly reduce your total computing power.
- Fault tolerance is another key benefit, closely related to redundancy. If a device breaks while doing calculations, you don’t have to redo the whole job, only the pieces that device was working on (see the sketch after this list).
- Last but not least, distributed computing works well on commodity hardware.
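To illustrate the fault-tolerance point above, here is a toy Python sketch (not the API of any particular framework): the job is split into chunks, workers occasionally fail, and only the failed chunks are retried instead of the whole job.

```python
import random

def run_chunk(chunk):
    # Simulate a worker that occasionally fails (e.g. a device going down).
    if random.random() < 0.2:
        raise RuntimeError("worker failed")
    return sum(chunk)

def run_job(chunks, max_retries=3):
    results = {}
    pending = list(range(len(chunks)))  # chunk indices still to be computed
    for _ in range(max_retries):
        still_pending = []
        for i in pending:
            try:
                results[i] = run_chunk(chunks[i])  # redo only this chunk
            except RuntimeError:
                still_pending.append(i)            # reschedule just the failed chunk
        pending = still_pending
        if not pending:
            break
    return sum(results.values())

data = list(range(1000))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
print(run_job(chunks))
```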
Now, there is one thing to keep in mind: we cannot use distributed computing to run just any load. Distributed computing means parallel computing: doing many things at once. If your calculations each depend on the result of the previous one (sequential processing), distributed computing can’t help you. However, you may be able to rethink your algorithm to avoid sequential processing and make it fit.
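As a simple illustration of that distinction, the two Python functions below compare a load that parallelizes well with one that does not: summing a list can be split into chunks and computed anywhere, while a chain of calculations in which each step needs the previous result has to run in order.

```python
# Parallel-friendly: chunks of the list can be summed independently on
# different devices, and the partial sums combined afterwards.
def parallel_friendly(data):
    return sum(data)

# Sequential: each iteration depends on the previous result, so the work
# cannot simply be split across devices as it is written.
def sequential_only(data):
    value = 0
    for x in data:
        value = (value * 31 + x) % 1_000_003  # needs the previous value
    return value

print(parallel_friendly(range(10)), sequential_only(range(10)))
```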
Infrastructure changes
Distributed computing is a game-changer. Implementing it in the right way means shifting from old to new paradigms. This is particularly evident at the infrastructure level.
In the past, the trend was to have one big, shiny storage system (or a few of them) for the entire data center. In other words, you had specialized systems storing huge amounts of data for the entire DC. Applications and servers that wanted to work on that data requested it from the storage systems, which provided redundancy, high availability, and so on.
That’s a problem for distributed computing. With it, you have a central server asking a set of worker servers to run calculations on the data. Since the workers all need to see the data, they would query the storage system at the same time. It becomes a bottleneck that slows everything down. The new paradigm is simple: data and computing should sit on the same machine.
Data should be as close as possible to where it is used (e.g. on the same machine).
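Here is a hypothetical Python sketch of that data-locality idea: each worker processes only the shard of data stored on its own disk, instead of pulling everything from a central storage system. The shard layout and file names are made up for illustration.

```python
import os
import tempfile

def process_local_shard(shard_path):
    # The computation runs on the machine where the shard is stored.
    total = 0
    with open(shard_path) as f:
        for line in f:
            total += int(line)
    return total

# Simulate a worker's local shard with a temporary file
# (in a real cluster this might be /data/shard-0.txt on worker 0).
shard = os.path.join(tempfile.mkdtemp(), "shard-0.txt")
with open(shard, "w") as f:
    f.write("\n".join(str(n) for n in range(100)))

# Only the local file is read; no central storage system is queried.
# A coordinator would then gather the small partial results over the network.
print(process_local_shard(shard))
```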
It might be fun to know that this is exactly the paradigm we shifted away from with the advent of networking. But, as we know, things keep changing in our industry. The new trend, motivated by results, is having many small units of computing and data, each doing a small piece of the work.
Distributed computing technologies
You may want to implement your own in-house distributed computing solution. That might even be a good idea if you have special requirements. However, some well-established solutions exist, and they do a good job of implementing what we explained in this article. Two famous examples are Hadoop, from the Apache Software Foundation, and MongoDB. You can start working with both for free, and MongoDB also offers an enterprise edition with paid, enterprise-grade support.
MongoDB offers an easy way to spread a database across multiple devices. However, what you can ask the worker devices to do is relatively limited. Depending on your objectives, you may need to tweak it to your needs or adopt a custom solution.
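As a small, hedged illustration of that model, here is a sketch using the pymongo driver. An aggregation pipeline is the kind of work MongoDB can push down to the individual shards; the connection string, database, and collection names below are placeholders, not anything prescribed by this article.

```python
from pymongo import MongoClient

# Placeholder connection string: in a sharded cluster you would point this
# at a mongos router, which hides the individual shards from the client.
client = MongoClient("mongodb://localhost:27017")
collection = client["demo_db"]["events"]  # hypothetical database and collection

# An aggregation pipeline is the kind of work MongoDB can distribute:
# each shard computes partial counts on its own data, and the results are merged.
pipeline = [
    {"$match": {"status": "ok"}},
    {"$group": {"_id": "$country", "count": {"$sum": 1}}},
]
for doc in collection.aggregate(pipeline):
    print(doc)
```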
Wrapping it up
Distributed computing is here to stay. It allows us to work on unprecedented amounts of data at once, and that is why the industry loves it. In simple terms, distributed computing is coordinating many devices to work together and process that huge load of data. Having tons of tiny actors in the game means better redundancy and fault tolerance, and this “sharing” approach brings scalability as well. However, this is possible only if your load can be processed in parallel and does not require sequential processing.
What do you think about distributed computing? Do you see a bright future ahead because of it? Let me know your opinions in the comments.