Abstract:

Hadoop is the most popular open-source framework built around the Hadoop Distributed File System (HDFS) and an implementation of MapReduce; these powerful tools are designed for deep analysis and fast processing of large data sets. Hadoop allows unstructured data to be distributed across thousands of machines forming a shared cluster, and executes Map/Reduce routines on the data in that cluster. Because of some flaws in MapReduce, it is better to replace MapReduce with Cloud Dataflow. Cloud Dataflow allows users to build pipelines, monitor their execution, and transform and analyse data, all in the cloud. In addition to Dataflow, we add BigQuery to fetch the data effectively.


Keywords: Hadoop, HDFS, MapReduce, Dataflow