Saturday 28 March 2015

Introduction

Cluster Computing


Cluster computing is the use of a cluster of computers to carry out a task in a distributed manner. Building a single computer with a high-end configuration is generally costly, and its hardware can only be scaled up so far, so a cluster is usually the most cost-effective way to get high performance.

Cluster computing links commodity computers together into one large, powerful system that can process large amounts of data. Simply linking multiple machines does not by itself provide that processing power, though. We also need a framework that distributes the task across the machines for processing and then collects and returns the result.


Many cluster computing frameworks are available. They fall into two types, based on the kind of parallelism they use:

1.  Task parallelism
2.  Data parallelism


1. Task parallelism

Task parallelism means slicing the program into multiple parts and distributing those parts across the systems in the cluster.
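To make this concrete, here is a minimal sketch of task parallelism. It is only an illustration: threads on a single machine stand in for the machines of a cluster, and the function names (`total`, `largest`, `run_task_parallel`) are made up for this example and do not come from any framework.

```python
from concurrent.futures import ThreadPoolExecutor


def total(numbers):
    # First part of the program: add everything up.
    return sum(numbers)


def largest(numbers):
    # Second part of the program: find the maximum.
    return max(numbers)


def run_task_parallel(numbers):
    # Each worker runs a DIFFERENT piece of code over the same input.
    # Assumption: threads stand in for separate cluster machines.
    with ThreadPoolExecutor(max_workers=2) as pool:
        total_future = pool.submit(total, numbers)
        largest_future = pool.submit(largest, numbers)
        return total_future.result(), largest_future.result()


if __name__ == "__main__":
    print(run_task_parallel([4, 8, 15, 16, 23, 42]))
```

The point to notice is that the two workers run different functions, which is what separates this from data parallelism.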

2. Data parallelism

In task parallelism, each machine runs different code. In data parallelism, every machine runs the same code, but the data is sliced into multiple parts and each machine processes its own part.
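A minimal sketch of data parallelism follows. Again, this is an illustration under assumptions, not how any particular framework is implemented: threads on one machine stand in for cluster machines, and `split`, `sum_of_squares`, and `run_data_parallel` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    # Slice the data into at most `parts` chunks of roughly equal size.
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    # The SAME code runs on every chunk.
    return sum(x * x for x in chunk)


def run_data_parallel(data, workers=3):
    chunks = split(data, workers)
    # Assumption: threads stand in for separate cluster machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_data_parallel(list(range(10))))
```

Here every worker runs `sum_of_squares`, and only the slice of data it receives differs. The final `sum` plays the role of the step where the framework gathers the partial results and provides the overall result.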

We are going to explore data parallelism with Apache Spark.