First developed by Doug Cutting and Mike Cafarella in 2005 and licensed under the Apache License 2.0, Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. HDFS splits each large file into blocks (128 MB by default in recent releases) and distributes replicated copies of those blocks across the worker (DataNode) machines in the cluster. Cloudera's open-source platform is one of the most widely used distributions of Hadoop and its related projects.
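To make the storage layer concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. It assumes a running HDFS cluster and the hadoop-client library on the classpath; the NameNode address hdfs://namenode:9000 and the file path are illustrative placeholders, not values from this article.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/hello.txt");

            // Write a small file. For larger files, HDFS transparently
            // splits the data into blocks and replicates them across DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back.
            try (FSDataInputStream in = fs.open(file)) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```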
Big Data refers to collections of data that are huge in volume and growing exponentially over time, so large and complex that traditional data-management tools cannot store or process them efficiently. Frameworks such as Apache Hadoop, and distributions such as Cloudera's, are built for exactly this kind of workload.
To get started with Hadoop, you need to download and install it on your machines. You can use the vanilla Apache Hadoop release, or choose a distribution such as Cloudera, Hortonworks, or MapR. Note that the exact installation steps vary based on your operating system and the distribution you choose.
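Once Hadoop is installed and its libraries are on your classpath, a quick sanity check is to print the version information. This is a minimal sketch assuming the hadoop-common artifact is available to your build; the equivalent command-line check is `hadoop version`.

```java
import org.apache.hadoop.util.VersionInfo;

public class HadoopVersionCheck {
    public static void main(String[] args) {
        // Prints the Hadoop version this application is linked against.
        System.out.println("Hadoop " + VersionInfo.getVersion());
        System.out.println("Compiled by " + VersionInfo.getUser()
                + " from revision " + VersionInfo.getRevision());
    }
}
```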
Hadoop itself is a framework written in Java. It is designed to scale from a single server up to thousands of machines, each offering local computation and storage, and it supports processing large data sets in a distributed computing environment; the classic word-count job sketched below gives a first taste of this model. In the next article, we discuss Hadoop's components and architecture in detail.
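The sketch below closely follows the word-count example from the official Apache Hadoop MapReduce tutorial: the mapper runs where the data blocks live and emits (word, 1) pairs, and the reducer sums the counts per word. Input and output paths are taken from the command line, and the hadoop-client dependency is assumed.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: tokenizes each input line and emits (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word across all mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same jar runs unchanged whether the cluster is a single machine in local mode or thousands of nodes, which is exactly the scaling property described above.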