【Hadoop】Hive

Posted by 西维蜀黍 on 2023-09-22, Last Modified on 2023-09-22

Apache Hive

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.

Hive provides a SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.

Apache Hive supports the analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3 and Alluxio. It provides a SQL-like query language called HiveQL with schema on read, meaning the table schema is applied when the data is read rather than when it is written, and it transparently converts queries into MapReduce, Apache Tez, or Spark jobs.
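To make "schema on read" concrete, here is a minimal Python sketch of the idea. It is not Hive code: the raw text, the `SCHEMA` list, and the `read_with_schema` helper are all made up for illustration. The point is only that the file itself carries no types, and a value that does not fit the declared type is surfaced as a null at read time instead of the file being rejected at load time.

```python
import csv
import io

# Raw file contents as they might sit in HDFS: no schema is stored with the data.
RAW = "1,alice,34\n2,bob,not_a_number\n3,carol,29\n"

# Hypothetical table schema, supplied only at query time: (column name, converter).
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_with_schema(raw, schema):
    """Parse raw delimited text, applying the schema as each row is read."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        row = {}
        for (name, convert), value in zip(schema, fields):
            try:
                row[name] = convert(value)
            except ValueError:
                # The value does not fit the declared type: treat it as NULL.
                row[name] = None
        rows.append(row)
    return rows


rows = read_with_schema(RAW, SCHEMA)
# Roughly "SELECT name FROM people WHERE age > 30" over the parsed rows.
adults = [r["name"] for r in rows if r["age"] is not None and r["age"] > 30]
print(adults)
```

Changing `SCHEMA` and re-reading the same `RAW` text gives a different view of the data without rewriting the file, which is the trade-off schema on read makes: cheap loading, with type problems showing up only at query time.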

HiveQL

While based on SQL, HiveQL does not strictly follow the full SQL-92 standard. HiveQL offers extensions not in SQL, including multi-table inserts and CREATE TABLE AS SELECT. HiveQL originally lacked support for transactions and materialized views, and offered only limited subquery support. Support for INSERT, UPDATE, and DELETE with full ACID functionality was made available with release 0.14.
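The multi-table insert is the least familiar of these extensions, so here is a rough Python imitation of what it does: the source is scanned once, and each row is routed to every destination whose filter it matches. The table names, columns, and the `multi_insert` helper are hypothetical, and the HiveQL in the comment is only the general shape of such a statement.

```python
# HiveQL shape being imitated (table and column names are hypothetical):
#   FROM page_views
#   INSERT OVERWRITE TABLE views_us SELECT user_id, url WHERE country = 'US'
#   INSERT OVERWRITE TABLE views_de SELECT user_id, url WHERE country = 'DE';

page_views = [
    {"user_id": 1, "url": "/home", "country": "US"},
    {"user_id": 2, "url": "/cart", "country": "DE"},
    {"user_id": 3, "url": "/home", "country": "US"},
    {"user_id": 4, "url": "/help", "country": "FR"},
]


def multi_insert(source, branches):
    """Scan source once; add each row's projection to every matching branch."""
    outputs = {name: [] for name in branches}
    for row in source:
        for name, (predicate, project) in branches.items():
            if predicate(row):
                outputs[name].append(project(row))
    return outputs


def project(row):
    # The SELECT list shared by both branches.
    return (row["user_id"], row["url"])


tables = multi_insert(page_views, {
    "views_us": (lambda r: r["country"] == "US", project),
    "views_de": (lambda r: r["country"] == "DE", project),
})
print(tables)
```

The benefit over two separate INSERT ... SELECT statements is that the source data is read only once, which matters when the source is a very large table.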

Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce, Tez, or Spark jobs, which are submitted to Hadoop for execution.
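As an illustration of what "a directed acyclic graph of jobs" means in practice, the sketch below models a plan as a dependency map and derives a valid execution order with Python's standard `graphlib` module. The stage names are invented for a join followed by an aggregation; they are not what Hive actually emits. In Hive itself, the `EXPLAIN` statement shows the real stages and their dependencies for a query.

```python
from graphlib import TopologicalSorter

# Hypothetical stage plan: each key is a stage, and its value is the set of
# stages that must finish before it can start.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "group_by": {"join"},
    "write_result": {"group_by"},
}


def execution_order(dag):
    """Return the stages in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # which is why the plan has to be acyclic.
    return list(TopologicalSorter(dag).static_order())


order = execution_order(plan)
print(order)
```

The two scan stages have no dependency on each other, so a scheduler is free to run them in parallel. Only the relative order imposed by the edges is fixed.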
