Saturday 28 March 2015

Apache Spark

Apache Spark is an open-source cluster computing framework, written entirely in Scala. Internally it uses the Akka framework for distributing tasks to multiple nodes, where a node is simply a machine with its own RAM, hard disk, and processor.
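To make this a little more concrete, here is a minimal sketch of how a Spark application attaches itself to a cluster. The application name and master URL below are placeholders I have picked for illustration, not anything specific to this post:

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[2]" runs Spark on two local threads for experimenting;
    // a master URL like "spark://host:7077" would point at a real cluster.
    val conf = new SparkConf()
      .setAppName("HelloSpark")   // placeholder application name
      .setMaster("local[2]")
    val sc = new SparkContext(conf)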

The basic building block of Apache Spark is the RDD (Resilient Distributed Dataset), and it is what provides fault tolerance: each RDD keeps metadata about the computation that produced it (its lineage), and Spark uses that information to re-compute failed tasks. I will talk about fault tolerance a little later when we start coding, because that will be the right context and time. :-)
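As a quick taste of what that lineage metadata looks like, here is a small sketch (assuming the `sc` created above) that builds an RDD through a couple of transformations and then prints the lineage Spark records for it:

    // Build a small RDD through two transformations.
    val numbers  = sc.parallelize(1 to 100)      // base RDD from a local collection
    val doubled  = numbers.map(_ * 2)            // derived RDD; lineage records the map
    val filtered = doubled.filter(_ % 3 == 0)    // another derived RDD

    // toDebugString prints the lineage graph Spark would use
    // to re-compute lost partitions after a failure.
    println(filtered.toDebugString)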

Download Apache Spark from https://spark.apache.org/downloads.html and follow the steps to install.
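Once it is installed, a quick way to check the setup (a sketch; paths and versions will differ on your machine) is to start spark-shell, which gives you a ready-made SparkContext named `sc`, and run a tiny job:

    // Inside spark-shell, where `sc` is pre-created for you:
    val data = sc.parallelize(1 to 10)
    println(data.reduce(_ + _))   // should print 55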