Saturday, 28 March 2015

Apache Spark components


Today we will discuss the Apache Spark components. It's really important to understand the components below:

1. Driver program
2. SparkContext
3. Executor
4. Task


What is a Driver program?

The driver program is the process that creates the SparkContext and runs the main function of the application.


What is SparkContext?

SparkContext is the connection to the cluster. It is used to create RDDs and a few other things which we can discuss later.
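Here is a minimal sketch of a driver program creating a SparkContext and an RDD. The app name, master URL and data are placeholder assumptions for illustration, not from any particular deployment:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // The process that runs this main() is the driver program
    val conf = new SparkConf()
      .setAppName("SimpleApp")        // placeholder application name
      .setMaster("local[2]")          // run locally with 2 cores; on a real cluster use its master URL
    val sc = new SparkContext(conf)   // the connection to the cluster

    // Create an RDD from a local collection and run a computation on it
    val numbers = sc.parallelize(1 to 100)
    println("Sum: " + numbers.sum())

    sc.stop()
  }
}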


What is an Executor?

An executor is a process that runs on each slave (worker) node and executes the tasks assigned to it.

What is a Task?

A task is the actual computation that runs inside an executor. Each task is a thread inside the executor.



When we run any application in Apache Spark, the application's computation is spread across multiple nodes (computers) to process the data, and finally the results come back to the driver. Each node runs a process called an "Executor", which is responsible for executing the tasks. Each task is a thread inside the executor, and it is the one that actually does the computation.

An executor can run multiple tasks at the same time if it has more than one core. Each task is assigned to one core to make use of parallel computing, as shown in the sketch below.
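For example, here is a rough sketch of how cores and partitions interact. The core count, memory size and partition count below are made-up values for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// Assumed values, for illustration only
val conf = new SparkConf()
  .setAppName("ParallelismDemo")
  .set("spark.executor.cores", "4")   // each executor can run up to 4 tasks in parallel
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

// An RDD with 8 partitions produces 8 tasks per stage; an executor
// with 4 cores works through them 4 at a time.
val data = sc.parallelize(1 to 1000, numSlices = 8)
println(data.map(_ * 2).count())

sc.stop()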