Anatomy of an ELF Part 1: The Header

This post is the first in a series of articles about ELFs. Unfortunately I will not be dealing with small and funny creatures, but with Executable and Linkable Format files.

These files are the standard on Unix-like systems for executable binaries, shared libraries, object files and core dumps, to name the most important use cases.

Why ELFs are important

Both exploitation techniques and malware infections rely on the internal structure of ELF files and on the mechanisms that involve them. For example, an attacker might exploit a heap overflow to overwrite an entry of the GOT (Global Offset Table). In order to understand current attack techniques in depth and to anticipate future ones, it is necessary to know in detail how the files that we are trying to protect are structured and what processes they go through.

ELF files. A general overview

As stated before, ELF stands for Executable and Linkable Format. This format was initially designed for 32-bit architectures, but the differences with the 64-bit format mostly concern the size of pointers, so the extension to 64-bit architectures was straightforward and required no major modifications.

There are mainly three types of ELF files:

Relocatable files. These are often called ‘object files’. In general, relocatable files hold code and data suitable for linking. They can be linked with other object files in order to create an executable or a shared object.
Executable files. These files are often called ‘programs’. They are the most interesting for exploitation (overflows, format string bugs, etc.) and they represent the ‘entry point’ for a process.
Shared objects. Often called ‘shared libraries’, these files can be used in two ways: the linker can combine them with other relocatable or shared object files to create another object file, and the dynamic linker can combine them with an executable to create a process image.

There are other types of ELF files, such as core dumps or the kernel boot image, but these are not very interesting for our purposes.

Overview of the ELF structure

ELF structure is designed so that it offers two different views, one from the linking perspective and one from the execution perspective.

An overview can be observed in this figure, taken from the official documentation.

ELF structure from the linking and execution views.

As we can see, from the linking view the Program Header Table is optional and the file is divided into sections, which are referenced and described in the Section Header Table.

On the other hand, from the execution view the file is divided into segments, each of which can include multiple sections and is described in the Program Header Table. In this case it is the Section Header Table that is optional.

The only part that is guaranteed to be present in both cases is the ELF header.

ELF header

The ELF header is the core topic of this article. It is present in every ELF file and it contains meta-information about the file itself. Let’s see its structure in detail.

If we run

man 5 elf

we can see that the header is a struct like this:

#define EI_NIDENT 16

typedef struct {
    unsigned char e_ident[EI_NIDENT];
    uint16_t      e_type;
    uint16_t      e_machine;
    uint32_t      e_version;
    ElfN_Addr     e_entry;
    ElfN_Off      e_phoff;
    ElfN_Off      e_shoff;
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;
    uint16_t      e_phnum;
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} ElfN_Ehdr;

In this struct, N stands for either 32 or 64, depending on the architecture. Before looking at some practical examples, let’s see what each of these fields means.

e_ident[EI_NIDENT]

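This field is an array of 16 bytes (EI_NIDENT) that tells how to interpret the rest of the file. It is laid out like this:

4 bytes of magic number to identify ELF files (0x7f followed by the ASCII characters ELF)
1 byte for the class (32-bit or 64-bit)
1 byte to specify the data encoding (little endian or big endian)
1 byte for the ELF version
1 byte for the target OS/ABI
1 byte for the ABI version
The remaining 7 bytes are used as padding.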
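The layout above can be checked programmatically. The following is a minimal sketch, assuming a Linux system whose <elf.h> provides the EI_*, ELFMAG, ELFCLASS* and ELFDATA* constants; the function names elf_class_bits and elf_data_encoding are made up for this illustration.

```c
#include <elf.h>     /* EI_NIDENT, EI_CLASS, EI_DATA, ELFMAG, SELFMAG, ... */
#include <string.h>  /* memcmp */

/* Hypothetical helper: returns 32 or 64 if ident starts with the ELF
 * magic number and carries a known class byte, 0 otherwise. */
int elf_class_bits(const unsigned char ident[EI_NIDENT])
{
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;                      /* magic number does not match */
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;         /* ELFCLASSNONE or garbage */
    }
}

/* Hypothetical helper: describes the data encoding byte. */
const char *elf_data_encoding(const unsigned char ident[EI_NIDENT])
{
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "little endian";
    case ELFDATA2MSB: return "big endian";
    default:          return "unknown";
    }
}
```

Both helpers only look at the first 16 bytes of the file, so they can be called before anything is known about the rest of the header.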

e_type

This field specifies the object file type in 2 bytes. Possible values for this are:

0 for no file type (ET_NONE)
1 for Relocatable file (ET_REL)
2 for Executable file (ET_EXEC)
3 for Shared object (ET_DYN)
4 for Core file (ET_CORE)
from 0xff00 to 0xffff for processor-specific types (ET_LOPROC to ET_HIPROC)
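The values listed above can be turned into readable names with a small lookup. This is only a sketch, assuming the ET_* constants from the Linux <elf.h>; elf_type_name and the strings it returns are hypothetical and not what readelf prints.

```c
#include <elf.h>     /* ET_NONE, ET_REL, ET_EXEC, ET_DYN, ET_CORE, ET_LOPROC */
#include <stdint.h>

/* Hypothetical helper: maps an e_type value to a short description. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_NONE: return "NONE";
    case ET_REL:  return "REL (relocatable file)";
    case ET_EXEC: return "EXEC (executable file)";
    case ET_DYN:  return "DYN (shared object)";
    case ET_CORE: return "CORE (core file)";
    default:      break;
    }
    /* ET_HIPROC is 0xffff, the largest 16-bit value, so only the
     * lower bound of the processor-specific range needs a check. */
    if (e_type >= ET_LOPROC)
        return "processor-specific";
    return "unknown";
}
```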

e_machine

These 16 bits specify the architecture required by the file. There are quite a few possible values. Just to name a few, 3 stands for Intel 80386 (i386), 7 for Intel 80860 and 62 for AMD x86-64.
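As a small illustration of these values, the sketch below maps a handful of e_machine codes to names. It assumes the EM_* constants from the Linux <elf.h>; elf_machine_name is a hypothetical helper and covers only a tiny subset of the defined architectures.

```c
#include <elf.h>     /* EM_386, EM_860, EM_ARM, EM_X86_64 */
#include <stdint.h>

/* Hypothetical helper: names a few common e_machine values. */
const char *elf_machine_name(uint16_t e_machine)
{
    switch (e_machine) {
    case EM_386:    return "Intel 80386";   /* value 3  */
    case EM_860:    return "Intel 80860";   /* value 7  */
    case EM_ARM:    return "ARM";           /* value 40 */
    case EM_X86_64: return "AMD x86-64";    /* value 62 */
    default:        return "other";
    }
}
```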

e_version

This is not a very useful field. In general this is 1, which stands for ‘current’ version. 0 means invalid.

e_entry

This field mostly concerns executable files. In that case, the value of e_entry is the virtual address to which the system first transfers control when the process starts. If the file has no associated entry point (for example an object file), this is 0.

e_phoff

This value is the byte offset from the beginning of the file to the Program Header Table. If the file has no Program Header Table, it is 0.

e_shoff

Similarly to the previous field, this is the byte offset from the beginning of the file to the Section Header Table, or 0 if the file has none.

e_flags

This field carries processor specific flags.

e_ehsize

This is the size in bytes of the ELF header.

e_phentsize

This field contains the size in bytes of one entry of the Program Header Table.

e_phnum

This value is the number of entries in the Program Header Table.

e_shentsize

Similarly to e_phentsize, this is the size of one entry of the Section Header Table.

e_shnum

Similarly to e_phnum, this value is the number of entries in the Section Header Table.
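Taken together, an offset, an entry size and an entry count describe where a header table starts and how many bytes it spans (entry size times number of entries). A parser that does not trust its input might verify that the table lies inside the file before reading it. The following is a minimal sketch of that check; elf_table_fits is a hypothetical name, and the special encodings used for very large entry counts are ignored here.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a table of `num` entries of
 * `entsize` bytes each, starting at byte offset `off`, lies entirely
 * within a file of `file_size` bytes, and 0 otherwise. */
int elf_table_fits(uint64_t off, uint16_t entsize, uint16_t num,
                   uint64_t file_size)
{
    /* 16 bit x 16 bit cannot overflow a 64-bit integer. */
    uint64_t len = (uint64_t)entsize * num;

    if (off > file_size)
        return 0;
    /* Compare against the space left after `off` so that off + len
     * is never computed and cannot wrap around. */
    return len <= file_size - off;
}
```

For the Program Header Table the arguments would be e_phoff, e_phentsize and e_phnum; for the Section Header Table, e_shoff, e_shentsize and e_shnum.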

e_shstrndx

This field is a bit more complicated and harder to understand without knowing more about sections and symbols in ELFs. Its value is the index of the Section Header Table entry that describes the section name string table. This will be discussed further in a future article about symbols, but for now we can say that it tells us where to find the table that contains the section names.
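Since e_shstrndx is an index into the Section Header Table, the file offset of the corresponding entry follows from e_shoff and e_shentsize. The sketch below shows that arithmetic for the 64-bit header, assuming the Linux <elf.h>; shstrtab_header_offset is a hypothetical name, and the escape value SHN_XINDEX (used when the real index does not fit in 16 bits) is simply reported as "not available" rather than resolved.

```c
#include <elf.h>     /* Elf64_Ehdr, SHN_UNDEF, SHN_XINDEX */
#include <stdint.h>

/* Hypothetical helper: returns the file offset of the section header
 * entry that describes the section name string table, or 0 if the
 * header does not give a usable index. */
uint64_t shstrtab_header_offset(const Elf64_Ehdr *hdr)
{
    if (hdr->e_shoff == 0)                 /* no Section Header Table */
        return 0;
    if (hdr->e_shstrndx == SHN_UNDEF)      /* no section name string table */
        return 0;
    if (hdr->e_shstrndx == SHN_XINDEX)     /* real index stored elsewhere */
        return 0;
    /* start of the table + index * size of one entry */
    return hdr->e_shoff + (uint64_t)hdr->e_shstrndx * hdr->e_shentsize;
}
```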

ELF header in action

After having discussed the structure of the ELF header, let’s see in practice how to get the ELF header and what information we can get from it.

In order to do this, I will take the code used for the previous article and first compile it to produce an object file. Note that this file is not linked and is not ready for execution.

gcc -c code1.c

Now in order to inspect the header I run

readelf -h code1.o

The result is the following

Parsed header of object file

Readelf takes away most of the fun from us by already giving an interpretation of the field names and values, but we can still get some information just by looking at this header.

The OS/ABI is Unix
The file is a Relocatable File
The machine is based on an x86-64 architecture
The file is (obviously) not ready for execution: the entry point value is 0
There is no Program Header Table (and this is ok for an object file)
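For readers who want some of the fun back, the same fields can be read without readelf. The following is a minimal sketch, assuming a Linux system whose <elf.h> defines Elf64_Ehdr, a 64-bit ELF file, and a host with the same byte order as the file; read_elf64_header is a hypothetical helper, not part of any library.

```c
#include <elf.h>     /* Elf64_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64 */
#include <stdio.h>   /* FILE, fseek, fread */
#include <string.h>  /* memcmp */

/* Hypothetical helper: reads the ELF header found at the start of an
 * already opened stream into *hdr.
 * Returns 0 on success, -1 if the stream is too short, does not start
 * with the ELF magic number, or is not a 64-bit ELF file. */
int read_elf64_header(FILE *fp, Elf64_Ehdr *hdr)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    if (fread(hdr, sizeof *hdr, 1, fp) != 1)
        return -1;                          /* shorter than the header */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                          /* not an ELF file */
    if (hdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                          /* 32-bit files use Elf32_Ehdr */
    return 0;
}
```

Called on a stream obtained with fopen("code1.o", "rb"), it should leave hdr->e_type equal to ET_REL and hdr->e_entry equal to 0, in line with the observations above.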

Now let’s also link this file to create an executable.

gcc code1.o -o code1

Running readelf -h on code1, we get this result:

Parsed header of executable ELF

As we can see, many fields are the same (the architecture is the same, the operating system is the same and also the data encoding).

The main differences are:

This ELF is of type 2, meaning an Executable file
The entry point for this file is at 0x400450
Since the file is now ready for execution, it contains the mandatory Program Header Table
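These three differences can be expressed as a single check on the header. The sketch below assumes the Linux <elf.h> and the 64-bit header; elf64_looks_executable is a hypothetical name. It deliberately accepts only type ET_EXEC, as in the example above, so position-independent executables, which are reported as ET_DYN, are not covered.

```c
#include <elf.h>     /* Elf64_Ehdr, ET_EXEC */

/* Hypothetical helper: returns 1 if the header has the three
 * properties observed above for a linked executable, 0 otherwise. */
int elf64_looks_executable(const Elf64_Ehdr *hdr)
{
    /* A Program Header Table exists if it has an offset and entries. */
    int has_pht = hdr->e_phoff != 0 && hdr->e_phnum != 0;

    return hdr->e_type == ET_EXEC   /* type 2 */
        && hdr->e_entry != 0        /* an entry point is set */
        && has_pht;
}
```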

Useful resources

In order to read more about this topic I suggest the following readings:

Dennis Andriesse, Practical Binary Analysis. No Starch Press 2018.
Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification.
The elf(5) manual page (man 5 elf).
Ryan ‘Elfmaster’ O’Neill, Learning Linux Binary Analysis. Packt 2016.

Conclusion

With this small practical example we conclude our short tour of the ELF header. In the next post I will talk about the Program Header Table.

As always, for any correction, feedback or question feel free to drop a mail to security[at]coolbyte[dot]eu.