[Linux] Memory Layout

Posted by 西维蜀黍 on 2021-11-02, Last Modified on 2022-02-19

Background

When a program runs, its processing takes place in two spaces on the system, called kernel space and user space. The two interact through well-defined interfaces as the program executes.

  • Kernel Space

User processes cannot access kernel space directly. They reach it only through system calls, which are requests to the kernel of a Unix-like operating system for services such as input/output (I/O) or process creation.

  • User Space

User space is the memory allocated to a user process, and the executing program can access it directly. This space is divided into several segments.
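As a minimal sketch of how a user process reaches kernel space, the following wraps the POSIX write() system call. The helper name say is hypothetical and is only used for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: asks the kernel to write msg to standard output.
 * The buffer lives in user space; the I/O itself is carried out in
 * kernel space on the process's behalf through the write() system call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t say(const char *msg)
{
    assert(msg != NULL);
    return write(STDOUT_FILENO, msg, strlen(msg));
}
```

Calling say("hello\n") hands six bytes to the kernel; the program never touches the terminal or file directly.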

Overview

High Addresses ---> .----------------------.
                    |      Kernel Space    |
                    |----------------------|
                    |                      |   Function arguments and local variables
                    |         STACK        |   are stored on the stack.
base pointer ->     | - - - - - - - - - - -|
                    |           |          |
                    |           v          |
                    :                      :
                    .                      .   The stack grows down into unused space
                    .         Empty        .   while the heap grows up. 
                    .                      .
                    .                      .   (other memory mappings also occur here, such
                    .                      .    as dynamic libraries and other memory
                    :                      :    allocations)
                    |           ^          |
                    |           |          |
 brk point ->       | - - - - - - - - - - -|   Dynamic memory is allocated on the heap
                    |          HEAP        |
                    |                      |
                    |----------------------|
                    |          BSS         |   Uninitialized data (BSS)
                    |----------------------|   
                    |          Data        |   Initialized data (DS)
                    |----------------------|
                    |          Text        |   Binary code
Low Addresses ----> '----------------------'
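The diagram can be explored with a rough sketch that prints one address from each region. The names print_layout, initialized_global and uninitialized_global are hypothetical, and the exact addresses and their ordering depend on the platform, the toolchain, and address space layout randomization.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* expected in the data segment */
int uninitialized_global;      /* expected in the BSS segment */

/* Hypothetical helper: prints one representative address per region.
 * Returns 0 on success, or -1 if the heap allocation fails. */
int print_layout(void)
{
    int local = 0;                              /* lives on the stack */
    int *heap_obj = malloc(sizeof *heap_obj);   /* lives on the heap */
    if (heap_obj == NULL)
        return -1;

    /* Objects without an initializer start out as zero. */
    assert(uninitialized_global == 0);

    /* Pointer-to-integer conversion is implementation-defined in C. */
    printf("text  0x%" PRIxPTR "\n", (uintptr_t)&print_layout);
    printf("data  0x%" PRIxPTR "\n", (uintptr_t)&initialized_global);
    printf("bss   0x%" PRIxPTR "\n", (uintptr_t)&uninitialized_global);
    printf("heap  0x%" PRIxPTR "\n", (uintptr_t)heap_obj);
    printf("stack 0x%" PRIxPTR "\n", (uintptr_t)&local);

    free(heap_obj);
    return 0;
}
```

On a typical Linux system the printed values tend to increase from text to stack, but this is an observation about common configurations rather than a guarantee.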

Stack

It is located at high addresses and grows downward toward lower addresses.

LIFO

The stack is a LIFO (last-in, first-out) data structure. In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations:

  • push, which adds an element to the collection, and
  • pop, which removes the most recently added element that was not yet removed.
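The two operations can be sketched with a small fixed-size array. This is an illustrative data structure, not the process stack itself, and the names IntStack, stack_push and stack_pop are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 16

/* Illustrative fixed-size stack of ints; not part of any library. */
typedef struct {
    int items[STACK_CAPACITY];
    size_t count;               /* number of elements currently stored */
} IntStack;

/* Adds value on top of the stack; returns false if the stack is full. */
bool stack_push(IntStack *s, int value)
{
    assert(s != NULL);
    if (s->count == STACK_CAPACITY)
        return false;
    s->items[s->count++] = value;
    return true;
}

/* Removes the most recently pushed value and stores it in *out;
 * returns false if the stack is empty. */
bool stack_pop(IntStack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->count == 0)
        return false;
    *out = s->items[--s->count];
    return true;
}
```

Pushing 1 and then 2 and popping twice yields 2 first and 1 second, which is the last-in, first-out order.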

Call a function with Stack Frames

Calling a function pushes a record of that call onto the top of the stack; when the function completes, its result is returned and the record is popped off the stack. The data pushed for a function call is called a stack frame, and it contains the following:

  • the arguments (parameter values) passed to the routine
  • the return address back to the routine’s caller
  • space for the local variables of the routine

In addition:

  • Each function call has its own stack frame.
  • A stack frame contains the function’s local variables, arguments, and returned values.
  • The SP (stack pointer) register tracks the top of the stack.
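To see that every call gets its own frame, the sketch below records the address of a local variable at each recursion depth. The names record_frames and frame_addr are hypothetical, and the direction in which the addresses move is a property of the platform, not of the C language.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 4

/* Address of the local variable observed at each call depth. */
uintptr_t frame_addr[MAX_DEPTH];

/* Hypothetical helper: each invocation has its own stack frame, so each
 * copy of `local` lives at a different address while the calls are nested.
 * Returns the sum of the depths visited (0 + 1 + 2 + 3 from depth 0). */
int record_frames(int depth)
{
    volatile int local = depth;   /* volatile keeps the variable in memory */
    assert(depth >= 0 && depth < MAX_DEPTH);
    frame_addr[depth] = (uintptr_t)&local;
    if (depth + 1 < MAX_DEPTH)
        return local + record_frames(depth + 1);
    return local;
}
```

After record_frames(0) returns, frame_addr holds four distinct addresses. On a typical x86-64 Linux build they decrease as the depth grows, which matches a stack that grows downward.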

Demo

The following C program illustrates how stack frames are allocated.

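/* Callees are defined first so that every call sees a prior declaration. */
int getNum1(void) {
  return 10;
}

int getNum2(void) {
  return 20;
}

int getResult(void) {
  int num1 = getNum1();   /* frame for getNum1 is pushed, then popped */
  int num2 = getNum2();   /* frame for getNum2 is pushed, then popped */
  return num1 + num2;
}

int main(void) {
  int result = getResult();   /* frame for getResult sits on top of main's */
  (void)result;               /* silence the unused-variable warning */
  return 0;
}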

When a function is called, its stack frame is pushed onto the top of the stack. When the function finishes and goes out of scope, its stack frame is popped from the top.

As described above, the stack can only hold data whose lifetime is limited to the enclosing function call. However, stack allocation is very fast, because it only requires adjusting the stack pointer register that tracks the top of the stack.

Heap

The Heap is the segment where dynamic memory allocation usually takes place. This area commonly begins at the end of the BSS segment and grows upwards to higher memory addresses.

  • The Heap area is shared by all shared libraries and dynamically loaded modules in a process.
  • It grows and shrinks in the opposite direction of the stack.

Garbage Collection

In C, it is the programmer’s responsibility to free memory allocated on the heap. Objects on the heap that are never freed cause memory leaks. In garbage-collected languages, the garbage collector reclaims unreachable heap memory automatically, which prevents most memory leaks.

Repeated allocation and release can leave unused gaps scattered across the heap. A state in which free blocks and in-use blocks are interleaved, so that the free space is split into small pieces, is called fragmentation. In this state, searching for free space costs more and the locality of reference of the data degrades, so performance is relatively low.

Management in Heap

  • The heap area is managed by memory management functions such as malloc(), calloc(), and free().
    • The malloc() and calloc() functions are used to allocate the memory in the heap.
    • The free() function is used to deallocate the memory from the heap.
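#include <stdio.h>
#include <stdlib.h>  /* malloc(), free() */
int main(void)
{
    char *pStr = malloc(sizeof(char) * 4); // allocated on the heap
    if (pStr == NULL)
        return 1;    // allocation failed
    free(pStr);      // release the block to avoid a memory leak
    return 0;
}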
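A slightly fuller sketch of the allocate, resize, and free cycle is shown below. The helpers make_squares and grow_array are hypothetical names; only calloc(), realloc() and free() come from the C standard library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap array holding the squares of
 * 0..n-1, or NULL on failure. The caller owns the result and must
 * eventually pass it to free(). */
int *make_squares(size_t n)
{
    assert(n > 0);
    int *a = calloc(n, sizeof *a);   /* zero-initialized allocation */
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    return a;
}

/* Hypothetical helper: resizes the array to new_n elements.
 * On failure realloc() returns NULL and leaves the old block valid,
 * so the caller should keep the old pointer until success is known. */
int *grow_array(int *a, size_t new_n)
{
    assert(new_n > 0);
    return realloc(a, new_n * sizeof *a);
}
```

A caller would typically write int *b = grow_array(a, 8); and continue with b only if it is not NULL, then call free(b) when the data is no longer needed.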

Stack vs Heap

The stack is faster because its free memory is always contiguous. Unlike the heap, there is no need to maintain a list of free blocks; a single pointer to the current top of the stack is enough. Each byte of the stack tends to be reused very frequently, so it tends to stay in the processor’s cache, which makes access very fast. Therefore, I recommend using the stack unless you actually need the heap.
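The practical difference is mostly about lifetime, which the following sketch tries to show. The helpers length_via_stack and copy_via_heap are hypothetical, and the 64-byte buffer size is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stack: buf is released automatically when the function returns, so
 * only a computed value (here the length) may leave the function.
 * Input longer than the buffer is truncated. */
size_t length_via_stack(const char *s)
{
    char buf[64];
    assert(s != NULL);
    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* Heap: the copy outlives the call. Returns NULL on allocation
 * failure; otherwise the caller must free() the result. */
char *copy_via_heap(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

Returning buf itself from length_via_stack would hand the caller a pointer to a frame that no longer exists, which is why data that must survive the call goes on the heap.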

BSS (Block Started by Symbol) / Uninitialized data segment

  • It contains all uninitialized global and static variables.
  • All variables in this segment are initialized to zero (0), and pointers are initialized to the null pointer.
  • The program loader allocates memory for the BSS section when it loads the program.
#include <stdio.h>
int data1; // Uninitialized global variable stored in BSS
int main(void)
{
    static int data2;  // Uninitialized static variable stored in BSS
    return 0;
}
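The zero-initialization rule can be checked with a short sketch. The object names and the helper check_bss_zeroed are hypothetical; the C standard guarantees the zero values, while placement in .bss is what typical toolchains do.

```c
#include <assert.h>
#include <stddef.h>

int bss_counter;          /* no initializer: starts at 0 */
char *bss_pointer;        /* no initializer: starts as a null pointer */
int bss_table[1024];      /* no initializer: every element starts at 0 */

/* Hypothetical helper: asserts that the objects above, plus a static
 * local, still hold their initial zero values. Call it before anything
 * writes to them. */
void check_bss_zeroed(void)
{
    static int static_local;   /* uninitialized static local */
    assert(bss_counter == 0);
    assert(bss_pointer == NULL);
    assert(static_local == 0);
    for (size_t i = 0; i < sizeof bss_table / sizeof bss_table[0]; i++)
        assert(bss_table[i] == 0);
}
```

Because only the size of these objects needs to be recorded, a large array such as bss_table usually adds very little to the size of the executable file.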

DS (Initialized data segment)

  • It contains the explicitly initialized global and static variables.
  • The size of this segment is determined by the size of the values in the program’s source code and does not change at run time.
  • This segment can be further classified into an initialized read-only area and an initialized read-write area.
  • Variables in the read-write area can have their values changed at run time.
#include <stdio.h>
int data1 = 10 ; //Initialized global variable stored in DS
int main(void)
{
    static int data2 = 3;  //Initialized static variable stored in DS
    return 0;
}
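The read-write and read-only areas can be illustrated as follows. The names hits, banner, add_hits and next_id are hypothetical, and placing const objects in a read-only section is typical toolchain behavior rather than something the C standard requires.

```c
#include <assert.h>

int hits = 100;                          /* initialized, read-write data */
const char banner[] = "memory layout";   /* initialized, usually read-only */

/* Hypothetical helper: changes the read-write global at run time and
 * returns its new value. */
int add_hits(int n)
{
    assert(n >= 0);
    hits += n;
    return hits;
}

/* Hypothetical helper: an initialized static local keeps its value
 * between calls, because it lives in the data segment, not on the stack. */
int next_id(void)
{
    static int id = 1000;
    return id++;
}
```

The first two calls to next_id() return 1000 and 1001, while an attempt to assign to banner[0] is rejected by the compiler because the object is const.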

Text

  • The text segment contains the machine code of the compiled program.
  • The text segment is read-only, which prevents a program from being accidentally modified.
  • It is sharable, so only a single copy needs to be in memory for frequently executed programs such as text editors.

Demo

Let's look at a few examples to understand the memory layout of a C program.

#include <stdio.h> 
  
int main(void) 
{ 
    return 0; 
}
$ gcc memory-layout.c -o memory-layout
$ size memory-layout
   text    data     bss     dec     hex filename
   1418     544       8    1970     7b2 memory-layout
  • Now add uninitialized global and static variables and check the size.
#include <stdio.h> 
 
int data1; //Stored in uninitialized area
 
int main(void) 
{ 
    static int data2; //Stored in uninitialized area
   
    return 0; 
}
$ gcc memory-layout.c -o memory-layout
$ size memory-layout
   text    data     bss     dec     hex filename
   1418     544      16    1978     7ba memory-layout

The size of .bss increases by the size of the added uninitialized global and static variables.

Now add an initialized static variable and check the size.

#include <stdio.h> 
    
int main(void) 
{ 
    static int data = 10; // Stored in initialized area
    return 0; 
}
$ gcc memory-layout.c -o memory-layout
$ size memory-layout
   text    data     bss     dec     hex filename
   1418     548       4    1970     7b2 memory-layout

You can see that the size of the data segment has increased.

Reference