【Hadoop】Learning Notes

Posted by 西维蜀黍 on 2023-09-22, Last Modified on 2023-09-22

Hadoop

Hadoop allows the distributed processing of large data sets stored across clusters of computers.

The Hadoop framework consists of two main components:

  • Hadoop Distributed File System (HDFS)
    • HDFS is an open-source variant of the Google File System (GFS)
  • MapReduce programming framework
    • Hadoop MapReduce is the open-source variant of Google MapReduce (a minimal word-count sketch follows below)
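
To make the two components concrete, here is a minimal sketch of the canonical word-count job written against the Hadoop MapReduce Java API: the mapper emits (word, 1) pairs for text read from HDFS, and the reducer sums them per word. The input and output HDFS paths are placeholders passed as command-line arguments.

```java
// Canonical word-count sketch on the Hadoop MapReduce Java API.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```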

ZooKeeper

ZooKeeper is used to coordinate the cluster in the Hadoop framework.

Several Hadoop projects already use ZooKeeper to coordinate their clusters and provide highly available distributed services.
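
As a rough sketch of this coordination role, the snippet below registers a process as an ephemeral znode so that the children of a shared path always reflect the workers that are currently alive. The ensemble address localhost:2181 and the /workers namespace are placeholder assumptions for illustration.

```java
// Membership sketch with the ZooKeeper Java client: each worker registers an
// ephemeral znode that disappears automatically when its session ends.
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class WorkerRegistration {
  public static void main(String[] args) throws Exception {
    CountDownLatch connected = new CountDownLatch(1);

    // Connect to the ZooKeeper ensemble (placeholder address).
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, (WatchedEvent event) -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();

    // Make sure the parent namespace exists (persistent znode).
    if (zk.exists("/workers", false) == null) {
      zk.create("/workers", new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // Ephemeral, sequential znode: removed automatically if this worker dies,
    // so the list under /workers always reflects live processes.
    String path = zk.create("/workers/worker-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    System.out.println("Registered as " + path);

    // Any peer can list the current members.
    List<String> members = zk.getChildren("/workers", false);
    System.out.println("Live workers: " + members);

    Thread.sleep(Long.MAX_VALUE); // keep the session (and the znode) alive
  }
}
```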

Apache Avro

Apache Avro is used for data serialization and Remote Procedure Calls (RPC) in the Hadoop ecosystem. It supports self-describing data: every record is described by a schema, and that schema is stored in the same file as the data it describes.
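
A minimal sketch of that self-describing property using Avro's Java API: DataFileWriter embeds the schema in the file header, so DataFileReader can decode the records without being handed a schema. The User schema and the users.avro path are made up for illustration.

```java
// Write and read an Avro container file; the reader recovers the schema
// from the file itself rather than from an external definition.
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSelfDescribing {
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    File file = new File("users.avro"); // placeholder path

    // Write: DataFileWriter stores the schema in the file header.
    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "Alice");
    user.put("age", 30);
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, file);
      writer.append(user);
    }

    // Read: no schema is supplied; it is recovered from the file itself.
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      System.out.println("Schema from file: " + reader.getSchema());
      for (GenericRecord record : reader) {
        System.out.println(record);
      }
    }
  }
}
```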

Other projects in the Hadoop ecosystem include:

  • Pig
    • Pig provides an engine for executing data flows in parallel on Hadoop. It uses MapReduce to execute all of its data processing.
  • Hive
    • Hive is a data warehouse infrastructure originally developed by Facebook. It is used for data summarization, querying, and analysis.
    • Queries are written in HiveQL, a SQL-like language.
  • HBase
    • HBase is a non-relational, distributed database built on Hadoop and inspired by Google BigTable. It is often used as a storage system for MapReduce job outputs.
    • It is most useful for storing very large, column-oriented tables that require random, real-time read/write access.
  • Sqoop
    • Sqoop is a tool for bulk data transfer between HDFS and structured data stores such as relational databases (RDBMS).
  • Spark
    • Spark is a framework for writing fast, distributed programs. It solves similar problems to Hadoop MapReduce, but with a fast in-memory approach and a clean, functional-style API (see the sketch after this list).
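
For contrast with the MapReduce word count shown earlier, the sketch below expresses the same computation in Spark's Java RDD API. The hdfs:///input and hdfs:///output paths are placeholder assumptions, and the job would typically be packaged and launched with spark-submit.

```java
// Word count with Spark's Java RDD API, as a counterpart to the MapReduce job above.
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("spark word count");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> lines = sc.textFile("hdfs:///input");              // placeholder HDFS path
      JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // split lines into words
          .mapToPair(word -> new Tuple2<>(word, 1))                      // emit (word, 1)
          .reduceByKey(Integer::sum);                                    // sum counts per word
      counts.saveAsTextFile("hdfs:///output");                           // placeholder HDFS path
    }
  }
}
```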
