Background
A relational database management system provides data that represents a two-dimensional table of columns and rows. For example, a database might have this table:
RowId |
EmpId |
Lastname |
Firstname |
Salary |
001 |
10 |
Smith |
Joe |
60000 |
002 |
12 |
Jones |
Mary |
80000 |
003 |
11 |
Johnson |
Cathy |
94000 |
004 |
22 |
Jones |
Bob |
55000 |
This simple table includes an employee identifier (EmpId), name fields (Lastname and Firstname) and a salary (Salary). This two-dimensional format is an abstraction. In an actual implementation, storage hardware requires the data to be serialized into one form or another.
The most expensive operations involving hard disks are seeks. In order to improve overall performance, related data should be stored in a fashion to minimize the number of seeks. This is known as locality of reference, and the basic concept appears in a number of different contexts. Hard disks are organized into a series of blocks of a fixed size, typically enough to store several rows of the table. By organizing the table’s data so rows fit within these blocks, and grouping related rows onto sequential blocks, the number of blocks that need to be read or sought is minimized in many cases, along with the number of seeks.
A survey by Pinnecke et al.[1] covers techniques for column-/row hybridization as of 2017.
...