Heap

Heap is very platform and implementation specific. So windows and linux have different heap implementation. We are looking at glibc (linux) heap implementation.

Arena

every thread has a separate arena
Arena is basically a separate heap per thread.
there is a parent arena that is basically the first arena of the program.
there are a total of 8 * number of cores arenas.

malloc chunk

chunks are the basic unit of memory in heap.

When the heap manager allocates memory from the heap, it typically allocates a chunk of memory that is larger than the requested size. This is because the heap manager also needs to store alignment padding and metadata with the allocated chunk, such as information about the size and status of the chunk. The metadata is stored alongside the chunk of memory, and so the overall size of the allocated block is larger than the size requested by the program.

alignment of chunk = 2 * sizeof(size_t)

so for 32 bit systems, allocation will be 8 byte aligned. for 64 bit systems, allocation will be 16 bit aligned.

The heap manager will mark the chunk allocated and returns a pointer to the aligned “user data” region inside the chunk.

Chunk Allocation strategies

previously freed chunk of memory and chunk is big enough that same chunk gets allocated
space available at top of heap, allocate space from top of heap
otherwise, ask kernel to add new memory, then ask heap to allocate from top of heap
if all fails, malloc returns NULL

Allocating from freed chunks

Heap manager keeps track of freed chunks in a series of different linked list called “bins”.
when an allocation request is made, heap managers searches these bins for a chunk that is big enough to service the request.
removes the chunk from the bin and mark it as allocated.
returns pointer to user data.

there are different types of bins:

fast bins
unsorted bins
small bins
large bins
per-thread tcache.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk, if unallocated (P clear)  |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk, in bytes                     |A|M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             User data starts here...                          .
	    .                                                               .
	    .             (malloc_usable_size() bytes)                      .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             (size of chunk, but used for application data)    |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of next chunk, in bytes                |A|0|1|
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Allocating from top of heap

if no free chunks available that can service request, heap manager allocates space from top of heap.
heap manager first looks at the free space at the top of heap otherwise known as top chunk.

Asking kernel from more memory at the top of the heap

Once the free space at the top of the heap is used up, the heap manager will have to ask the kernel to add more memory to the end of the heap.
on the initial heap, the heap manager asks the kernel to allocate memory at the end of heap by calling the sbrk syscall.
sbrk internally uses brk syscall.
Eventually, expanding the heap with sbrk will fail. Heap will grow so large that it will collide with libraries, or a thread’s stack region. Once the heap reaches this point, the heap manager will resort to attaching new non-contiguous memory to the initial program heap using calls to mmap.

Chunk Metadata

definition of chunk:

struct malloc_chunk {

  INTERNAL_SIZE_T   mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T   mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

During malloc, before returning the pointer to user data, chunk size is written in mchunk_size field. mchunk_size contains chunk size and 3 flags

the 3 flags are:

A flag if bit is 1, chunk is not from main arena else chunk comes from main arena
M flag if bit is 1, chunk was allocated with mmap call and is not a part of heap and was allocated off-heap. When freed the whole chunk is returned back to OS via munmap syscall rather than attempting to recycle it.
P flag if bit is 1, previous chunk is in use, and should not used to safely join with previous chunk to create much larger chunk

How does free works

Check that allocation is aligned on an 8-byte boundary, since malloc ensures all allocations are aligned.
Check that the chunk’s size field isn’t impossible - either because it is too small, too large, not aligned size or would overlap the with end of process’ address space.
check that chunk lies within boundaries of arena.
check that chunk is not already marked as free by checking the corresponding “P” bit that lies in the metadata at the start of next chunk.

Free chunk metadata

Free chunks also have metadata.

They store chunk size, A and P field. They do not store M field because mmaped chunks get unmmaped right away.

Bins

a list structure (doubly or singly linked list) of non-allocated chunks.
differentiated based on the size of chunks
Bins keep the track of freed chunks.
for performance reasons heap manager keeps track by maintaining a series of lists called “bins”, which are designed to maximize speed of allocation and frees.

Small bins

There are 62 of them.
Every chunk less than 512 bytes on 32 bits systems (and 1024 bytes in 64 bit systems) has a corressponding small bin.
Each small bin stores only one size of chunk, they are automatically ordered, so insertion and removal of entries on these lists is incredibly fast.
act as queue

Large Bins

For chunks over 512 bytes (or 1024 bytes on 64 bit), heap manager uses “large bins”.
instead storing the chunks of same size, they store chunks within a size range.
insertions onto the bins have to be manually sorted, and allocations from the list require traversing the list. This makes large bins slower.
Large bins are used less frequently because program tend to allocate small allocations.
act as queue

Unsorted bins

Small and large chunks when freed end up in this bin.
primary purpose of this bin is to act as a cache layer to speed up allocations and deallocations requests.

Fastbins

A total of 10 fast bins.
Contains chunk of same fixed size - 16, 24, 32, 40, 48, 56, 64, 72, 80 or 88 bytes.
Act as a stack.

tcache

tcache is a caching structure in ptmalloc to speed up repeated (small) allocations. Like other bins, tcache comes into play when freeing a chunk.
tcache is implemented by a single linked list.
each tcache entry is differentiated by 16 bytes. So first entry is of 16 bytes, second is of 32 bytes, third is of 48 bytes so on …

tcache structure looks like the following:

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  
*/

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

counts is an array of TCACHE_MAX_BINS elements. Each element store the count of i^th entries. entries is a linked list that acts as a stack. Each element stores the pointer to latest freed tcache_entry. By dividing the allocation size by 16 we can get the right bin to store the tcache entry.

pwn.college has given a great graphical explanation:

typedef struct tcache_entry
{
  struct tcache_entry *next;
  /* This field exists to detect double frees.  */
  uintptr_t key;
} tcache_entry;

next is a pointer to previously freed entry/chunk (tcache acts as stack). key is used to verify which thread the entry corresponds to.

/* Process-wide key to try and catch a double-free in the same thread.  */
static uintptr_t tcache_key;

When freeing

we select the right bin.
```
idx = (freed_allocation_size - 1) / 16
```
We check using the key field to see if entry hasn’t already been freed before(double-free):
We push the freed allocation to the front of the disk.
We record the tcache_perthread_struct(otherwise the key) associated with the freed allocation. This is because we need to record the association of chunk with the thread i.e with what thread does this chunk belong to.

When allocating

We do everything we did during free but in reverse order.
We select the bin.
We check the appropiate cache for available entries.
We reuse the allocation in front of the list if available.

These are just the basic terminoligies and knowledge required to understand heap attacks. Next article will talk about different types of heap attacks

References

https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/
https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/
https://jackfromeast.site/2023-01/understand-the-heap-a-beautiful-mess.html
https://infosecwriteups.com/the-toddlers-introduction-to-heap-exploitation-part-2-d1f325b74286 
https://ir0nstone.gitbook.io/notes/types/heap/bins
http://tukan.farm/2017/07/08/tcache/
https://pwn.college/cse494-s2023/dynamic-allocator-misuse
https://codebrowser.dev/glibc/glibc/malloc/malloc.c.html
https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
https://guyinatuxedo.github.io/25-heap/index.html 
https://github.com/limitedeternity/HeapLAB 
https://www.youtube.com/watch?v=A-Qf_Q_AeFw