Hadoop
Hadoop allows the distributed processing of large data sets stored across clusters of computers.
The Hadoop framework consists of two main components:
- Hadoop Distributed File System (HDFS)
- HDFS is an open source file system modeled on the Google File System (GFS)
- MapReduce programming framework
- Hadoop MapReduce is an open source implementation of the Google MapReduce programming model
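The MapReduce model named above can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Map every document, shuffle the pairs by word, then reduce each group.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two tiny "files".
counts = word_count(["big data on Hadoop", "Hadoop stores big files"])
```

In a real cluster the map and reduce functions run on many machines in parallel and the shuffle moves data between them over the network; here everything happens in one process so the data flow is easy to follow.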
ZooKeeper
ZooKeeper is the coordination service used to coordinate the cluster in the Hadoop framework.
Several Hadoop projects already use ZooKeeper to coordinate the cluster and provide highly available distributed services.
Apache Avro
Apache Avro is a data serialization system that is also used for Remote Procedure Calls (RPC) in the system. Its data files are self-describing: the data is described by a schema, and that schema is stored in the same file as the data it describes.
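The self-describing idea can be illustrated with the standard library alone. The sketch below is not the real Avro file format or the Avro library API: it simply writes a JSON schema as a header line followed by the records, so a reader needs nothing but the file to decode it. The `User` schema and the helper names `write_container` and `read_container` are hypothetical.

```python
import io
import json

# Hypothetical schema, written in the JSON style Avro uses for record schemas.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def write_container(stream, schema, records):
    """Write the schema as a header line, then one record per line."""
    stream.write(json.dumps(schema) + "\n")
    field_names = [field["name"] for field in schema["fields"]]
    for record in records:
        # Store only the values, in schema order; the header explains them.
        stream.write(json.dumps([record[name] for name in field_names]) + "\n")


def read_container(stream):
    """Recover the schema from the header and use it to decode the records."""
    schema = json.loads(stream.readline())
    field_names = [field["name"] for field in schema["fields"]]
    records = [dict(zip(field_names, json.loads(line))) for line in stream]
    return schema, records


buffer = io.StringIO()
write_container(buffer, USER_SCHEMA, [{"name": "Ada", "age": 36}])
buffer.seek(0)
schema, records = read_container(buffer)
```

Because the field names live in the header rather than in every record, the records stay compact while the file as a whole remains readable without any external schema definition.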
Related projects in Apache Hadoop
- Pig
- Pig provides an engine for executing data flows in parallel on Hadoop. It uses MapReduce to execute all of its data processing.
- Hive
- Hive is a data warehouse infrastructure originally developed at Facebook. It is used for data summarization, querying, and analysis.
- HiveQL, the query language of Hive, is a SQL-like language.
- HBase
- HBase is a non-relational, distributed Hadoop database inspired by Google Bigtable. It can be used as a storage system for the output of MapReduce jobs.
- It is most useful for storing very large, column-oriented tables that need random, real-time read/write access.
- Sqoop
- Sqoop is a system for bulk data transfer between HDFS and structured data stores such as relational databases (RDBMS).
- Spark
- Spark is a framework for writing fast, distributed programs. It solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach and a clean, functional-style API.