Having recently spent considerable time wrestling with one particular file format, Portable Executable (PE), I thought it wise to jot down what I learned and keep it as a reference. Understanding this format can tell us a lot about an executable, the OS and environment it targets, and what it intends to do (benign or malicious), all without actually executing it! Microsoft introduced the PE file format with Windows NT 3.1 (July '93) and it has been in use ever since. It is used by the majority of Windows executables and Dynamic Link Libraries (DLLs). Ordinary programs and malware alike rely on DLLs, so the libraries an executable pulls in say a lot about what it can do; malware commonly uses them to run its attacks and malicious code remotely. Throughout this article, apart from deconstructing the PE file format, I will also point out features and values that make an executable look suspicious or malicious.
Note: For those wondering what DLLs are (hey, I did that too!), they are simply shared libraries used by processes. Much of the Windows API (WinAPI) is itself exposed through DLLs. Multiple processes can link to a DLL at runtime without any need for re-linking or re-compiling. Wikipedia has a nice article on DLLs for further reading.
PE File Structure
PE files contain data in a linear stream, beginning with the MS-DOS header, followed by the real-mode program stub, the PE signature and file header, the PE optional header, the section headers and finally the sections themselves. Together they describe the code, the libraries it uses, how the file maps into memory and how much space it needs when it is executed. From these, we can get an idea about the intent of the executable.
However, before we jump into individual sections of the PE file format, I would like to highlight a few ‘gotchas’.
1. The eternal big-endian vs little-endian. Network packets use big-endian byte order, while the x86 architecture uses little-endian. All multi-byte values mentioned below are therefore stored in little-endian order. Just to jog your memory: the 32-bit value 0x12345678 is stored as the bytes 12 34 56 78 in big-endian and as 78 56 34 12 in little-endian.
2. The PE file structure differs between 32-bit and 64-bit architectures, albeit very slightly. The 64-bit variant is called PE32+. It contains all but one of the PE fields (BaseOfData is dropped) and has other minor modifications, such as widening fields like ImageBase from 32 bits to 64 bits.
3. The structures are named IMAGE_***, and the naming threw me off for a while, though that may not be the case for everyone! I was expecting them to be named PE_***.
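To make the first gotcha concrete, here is a minimal sketch of reading little-endian fields out of a raw byte buffer without depending on the byte order of the host machine. The helper names read_u16_le and read_u32_le are my own and are not part of any Windows header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value: the least significant byte comes first. */
static uint16_t read_u16_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value from four consecutive bytes. */
static uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Reading the bytes 4D 5A this way yields 0x5A4D, which is why that value turns up as the magic number in the next section.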
MS-DOS Header
The first section of the PE file format is a legacy section that has stuck around since the MS-DOS days. The header layout dates back to MS-DOS v2 and has been retained so that MS-DOS and early versions of Windows (< v3) can at least recognise the file and report that they cannot run it. That said, it contains 2 fields of importance even today: e_magic and e_lfanew. USHORT e_magic should contain 0x5A4D, the little-endian reading of the ASCII bytes 'M' (0x4D) and 'Z' (0x5A), the initials of Mark Zbikowski, one of the architects of MS-DOS. Now, isn’t that a neat way to sign your designs?! 😉 LONG e_lfanew contains the offset of the PE header, which is how a loader or parser locates it. Note that this is an offset within the file, not a physical address: the PE header starts at the beginning of the MS-DOS header (offset 0 of the file) plus e_lfanew.
typedef struct _IMAGE_DOS_HEADER { // DOS .EXE header
USHORT e_magic; // Magic number
USHORT e_cblp; // Bytes on last page of file
USHORT e_cp; // Pages in file
USHORT e_crlc; // Relocations
USHORT e_cparhdr; // Size of header in paragraphs
USHORT e_minalloc; // Minimum extra paragraphs needed
USHORT e_maxalloc; // Maximum extra paragraphs needed
USHORT e_ss; // Initial (relative) SS value
USHORT e_sp; // Initial SP value
USHORT e_csum; // Checksum
USHORT e_ip; // Initial IP value
USHORT e_cs; // Initial (relative) CS value
USHORT e_lfarlc; // File address of relocation table
USHORT e_ovno; // Overlay number
USHORT e_res[4]; // Reserved words
USHORT e_oemid; // OEM identifier (for e_oeminfo)
USHORT e_oeminfo; // OEM information; e_oemid specific
USHORT e_res2[10]; // Reserved words
LONG e_lfanew; // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
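The two fields that matter can be checked with very little code. The sketch below assumes the whole file (or at least its first bytes) has already been read into memory; find_pe_header is a hypothetical helper name. The offset 0x3C follows from the structure above: e_lfanew is preceded by 30 USHORT fields, i.e. 60 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_HEADER_SIZE 64u      /* size of IMAGE_DOS_HEADER in bytes */
#define E_LFANEW_OFFSET 0x3Cu    /* position of e_lfanew inside the header */
#define DOS_MAGIC       0x5A4Du  /* "MZ" read as a little-endian USHORT */

/* Returns 1 and stores e_lfanew in *pe_offset if buf starts with a plausible
   MS-DOS header, 0 otherwise. */
static int find_pe_header(const unsigned char *buf, size_t len, uint32_t *pe_offset)
{
    assert(buf != NULL && pe_offset != NULL);
    if (len < DOS_HEADER_SIZE)
        return 0;
    uint16_t magic = (uint16_t)(buf[0] | (buf[1] << 8));
    if (magic != DOS_MAGIC)
        return 0;
    const unsigned char *p = buf + E_LFANEW_OFFSET;
    uint32_t e_lfanew = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    /* The 4-byte PE signature must fit inside the buffer. */
    if (e_lfanew > len || len - e_lfanew < 4u)
        return 0;
    *pe_offset = e_lfanew;
    return 1;
}
```

A value of e_lfanew that points outside the file is rejected here, since a parser that trusts it blindly would read out of bounds.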
MS-DOS Real-Mode Stub Program
This is an actual program that MS-DOS runs when the executable is loaded. In a native MS-DOS executable, it is the application itself. From Windows v3 onwards, it is a “stub” program: code that does nothing more than print an error message when the OS cannot run the file. By default the linker uses Winstub.exe, which prints a message stating which OS the PE file requires.
PE file signature & header
The signature identifies the file as a PE image. It should be set to 0x00004550, which is the ASCII characters “PE” followed by two null bytes (“PE\0\0”). The file header that follows contains metadata about the file. Of special interest to a malware analyst are NumberOfSections, TimeDateStamp and SizeOfOptionalHeader. TimeDateStamp gives us an idea of whether we are looking at an old virus or a new attack, although the value can be forged. An exception is Delphi programs, which have a default compilation date of June 19, 1992.
typedef struct _IMAGE_FILE_HEADER {
USHORT Machine;
USHORT NumberOfSections; // Number of PE sections
ULONG TimeDateStamp; // Time of compilation, in seconds since Jan 1, 1970 UTC
ULONG PointerToSymbolTable; // Rarely used for offset to symbol table
ULONG NumberOfSymbols; // Number of entries in symbol table
USHORT SizeOfOptionalHeader; // Size of PE Optional Header; used to locate the section headers
USHORT Characteristics; // Flag bits for file attributes
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
#define IMAGE_SIZEOF_FILE_HEADER 20
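As a rough sketch of how the signature check and the three fields of interest fit together, the code below parses them from a buffer that starts at the PE signature (that is, at offset e_lfanew). FileHeaderInfo, parse_file_header and stamp_to_date are hypothetical names for illustration; the field offsets come from the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define PE_SIGNATURE     0x00004550u  /* "PE\0\0" */
#define FILE_HEADER_SIZE 20u          /* same value as IMAGE_SIZEOF_FILE_HEADER */

/* Hypothetical holder for the fields called out in the text. */
typedef struct {
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint16_t size_of_optional_header;
} FileHeaderInfo;

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* pe points at the PE signature; len is the number of bytes available there.
   Returns 1 and fills *out on success, 0 if the data is short or unsigned. */
static int parse_file_header(const unsigned char *pe, size_t len, FileHeaderInfo *out)
{
    assert(pe != NULL && out != NULL);
    if (len < 4u + FILE_HEADER_SIZE || rd32(pe) != PE_SIGNATURE)
        return 0;
    const unsigned char *fh = pe + 4;   /* IMAGE_FILE_HEADER follows the signature */
    out->number_of_sections      = rd16(fh + 2);
    out->time_date_stamp         = rd32(fh + 4);
    out->size_of_optional_header = rd16(fh + 16);
    return 1;
}

/* Convert a TimeDateStamp to a UTC calendar date. Returns 1 on success. */
static int stamp_to_date(uint32_t stamp, int *year, int *month, int *day)
{
    assert(year != NULL && month != NULL && day != NULL);
    time_t t = (time_t)stamp;
    const struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    *year  = utc->tm_year + 1900;
    *month = utc->tm_mon + 1;
    *day   = utc->tm_mday;
    return 1;
}
```

For instance, a stamp of 0x2A425E19 decodes to 1992-06-19, which matches the Delphi default date mentioned above.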
PE Optional Header
The name Optional Header is a misnomer, because the header is required in every PE executable. It holds a mine of information that the loader, and equally a malicious program, can use to work out where things land in memory. It includes the initial stack size, heap size, ImageBase address, OS version, image version, program entry point location and DLL characteristics. The total size of the optional header is 224 bytes for PE32 (240 bytes for PE32+).
typedef struct _IMAGE_OPTIONAL_HEADER {
//
// Standard fields.
//
USHORT Magic; // Type of header (0x10b for HDR32, 0x20b for HDR64)
UCHAR MajorLinkerVersion; // Major version of linker used for this executable
UCHAR MinorLinkerVersion; // Minor version of linker used for this executable
ULONG SizeOfCode; // Combined size of all code sections
ULONG SizeOfInitializedData; // Size of combined initialized data segments
ULONG SizeOfUninitializedData; // Usually 0, since linker optimizes to append uninitialized data sections at end of regular section
ULONG AddressOfEntryPoint; // Points to runtime lib code to run main/DllMain/WinMain
ULONG BaseOfCode; // Base address of 1st byte of code (from .text section)
ULONG BaseOfData; // Base address of 1st byte of data
//
// NT additional fields.
//
ULONG ImageBase; // Preferred load address of this file. Default is 0x400000 (EXEs), 0x10000000 (DLLs)
ULONG SectionAlignment; // Must be >= file alignment
ULONG FileAlignment; // For x86, usually 0x200 or 0x1000
USHORT MajorOperatingSystemVersion; // Target OS version for executable
USHORT MinorOperatingSystemVersion; // Target minor OS version
USHORT MajorImageVersion; // Usually set same as Linker versions
USHORT MinorImageVersion; // Usually set same as Linker versions
USHORT MajorSubsystemVersion; // For target subsystem if any specific env requirements
USHORT MinorSubsystemVersion; // Same as above
ULONG Reserved1;
ULONG SizeOfImage; // Amount of memory needed by image to execute
ULONG SizeOfHeaders; // (MS-DOS header + PE headers + section headers) rounded up to multiple of file alignment
ULONG CheckSum; // Checksum calculated by IMAGEHELP.DLL's CheckSumMappedFile API
USHORT Subsystem; // Expected subsystem by the PE
USHORT DllCharacteristics; // 16 bit flags for DLL operating modes & options
ULONG SizeOfStackReserve; // Stack space reserved per thread, default 1 MB
ULONG SizeOfStackCommit; // Part of stack reserve committed initially, default 4 KB
ULONG SizeOfHeapReserve; // Space reserved for the default process heap, default 1 MB
ULONG SizeOfHeapCommit; // Part of heap reserve committed initially, default 4 KB
ULONG LoaderFlags; // Deprecated
ULONG NumberOfRvaAndSizes; // Number of data directory entries
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;
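Two small calculations hang off this header: telling PE32 from PE32+ by the Magic field, and finding where the section headers begin. The sketch below shows both under the assumption that the relevant bytes are already in memory; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OPT_UNKNOWN, OPT_PE32, OPT_PE32_PLUS } OptionalHeaderKind;

/* opt points at the first byte of the optional header, i.e. its Magic field. */
static OptionalHeaderKind optional_header_kind(const unsigned char *opt, size_t len)
{
    assert(opt != NULL);
    if (len < 2)
        return OPT_UNKNOWN;
    uint16_t magic = (uint16_t)(opt[0] | (opt[1] << 8));
    switch (magic) {
    case 0x010B: return OPT_PE32;
    case 0x020B: return OPT_PE32_PLUS;
    default:     return OPT_UNKNOWN;
    }
}

/* File offset of the first section header:
   e_lfanew + 4-byte signature + 20-byte file header + SizeOfOptionalHeader.
   Computed in 64 bits so that hostile header values cannot overflow. */
static uint64_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return (uint64_t)e_lfanew + 4u + 20u + size_of_optional_header;
}
```

With e_lfanew = 0x80 and a 224-byte optional header, the section headers start at offset 376 (0x178). Using SizeOfOptionalHeader from the file header, rather than a hard-coded 224, keeps the calculation valid for PE32+ as well.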
PE section headers
Apart from the base addresses in the PE Optional Header, the sections of the PE file contain useful information too. Each section header is 40 bytes long, and the headers appear in no particular order, so sections are identified by name. The structure of a section header is shown below.
#define IMAGE_SIZEOF_SHORT_NAME 8
typedef struct _IMAGE_SECTION_HEADER {
UCHAR Name[IMAGE_SIZEOF_SHORT_NAME]; // ASCII name of section, not guaranteed to end in a null terminator
union {
ULONG PhysicalAddress; // Deprecated
ULONG VirtualSize; // Used size of section. Can be > SizeOfRawData
} Misc;
ULONG VirtualAddress; // Address where section begins in memory, relative to ImageBase
ULONG SizeOfRawData; // Size of data stored, in multiple of file alignment
ULONG PointerToRawData; // Offset to where data for section begins
ULONG PointerToRelocations; // Offset for relocations, set to 0 in .exe
ULONG PointerToLinenumbers; // Offset for line_number sections, if used
USHORT NumberOfRelocations; // Set to 0 in .exe
USHORT NumberOfLinenumbers; // Number of line-number entries, if used
ULONG Characteristics; // Flags of attributes set in linker's options
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
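A common use of the section headers is translating an address relative to ImageBase (an RVA) into a position in the file on disk. The following is a minimal sketch of that lookup, not a complete loader: SectionInfo is a hypothetical cut-down copy of the fields above, and the fallback for a zero VirtualSize is an assumption that some parsers make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of IMAGE_SECTION_HEADER holding only what the lookup needs. */
typedef struct {
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;
} SectionInfo;

/* Translate an RVA to a file offset. Returns 1 and sets *offset on success,
   0 if no section contains the RVA or the byte has no backing data on disk. */
static int rva_to_file_offset(const SectionInfo *sections, size_t count,
                              uint32_t rva, uint64_t *offset)
{
    assert(sections != NULL && offset != NULL);
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        /* Assumption: if VirtualSize is 0, fall back to the raw size. */
        uint32_t span = s->virtual_size ? s->virtual_size : s->size_of_raw_data;
        if (rva < s->virtual_address)
            continue;
        uint32_t delta = rva - s->virtual_address;
        if (delta >= span)
            continue;
        if (delta >= s->size_of_raw_data)
            return 0;   /* inside the section, but zero-filled at load time */
        *offset = (uint64_t)s->pointer_to_raw_data + delta;
        return 1;
    }
    return 0;
}
```

The zero-filled case is exactly the situation noted in the structure comment, where VirtualSize is larger than SizeOfRawData.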
Some useful sections of the PE file are:
- .text – Contains the executable code, with opcodes and supporting data.
- .rdata – Contains globally scoped, read-only data such as strings, constants and debug information. It also holds information about imported and exported functions if the .idata and .edata sections are missing.
- .data – Contains initialized global and static variables accessed throughout the program. In C, an initialized variable with static storage duration (a global, or a local declared static) is stored in the .data section.
- .idata – If present, contains information about imported functions.
- .edata – If present, contains information about exported functions.
- .bss – Contains the uninitialized data of the application.
- .pdata – Contains exception handling information. It is absent in 32-bit x86 executables.
- .rsrc – Contains resource information for a module, in the form of a resource tree.
- .reloc – Contains base relocation information, which the loader uses to fix up addresses when the image cannot be loaded at its preferred ImageBase.
- .debug – Usually at the end of the sections, it contains debug information.
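Because the 8-byte Name field is not guaranteed to be null-terminated, it is safest to copy it into a 9-byte buffer before comparing it with anything. The sketch below does that and checks the result against the conventional names listed above; both function names are hypothetical, and an unfamiliar name is only a hint worth a closer look, not proof of anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IMAGE_SIZEOF_SHORT_NAME 8

/* Copy the raw Name field into out and terminate it explicitly. */
static void section_name(const unsigned char *raw,
                         char out[IMAGE_SIZEOF_SHORT_NAME + 1])
{
    assert(raw != NULL && out != NULL);
    memcpy(out, raw, IMAGE_SIZEOF_SHORT_NAME);
    out[IMAGE_SIZEOF_SHORT_NAME] = '\0';
}

/* Returns 1 if name is one of the conventional section names listed above. */
static int is_conventional_section(const char *name)
{
    static const char *const known[] = {
        ".text", ".rdata", ".data", ".idata", ".edata",
        ".bss", ".pdata", ".rsrc", ".reloc", ".debug"
    };
    assert(name != NULL);
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(name, known[i]) == 0)
            return 1;
    }
    return 0;
}
```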
A simple HelloWorld program has just the .text and .data sections. As the lines of code increase, so do the number and size of the sections. Some commonly used DLLs are:
- Kernel32.dll – Core functionality (memory, files and hardware access)
- Advapi32.dll – Advanced functionality (access to the service manager and registry)
- User32.dll – Access to user interfaces (buttons, UI, I/O devices)
- Ntdll.dll – Interface to the Windows kernel. Malicious rootkits sometimes use Ntdll.dll to perform kernel activities
- WSock32.dll – Sockets API for low-level networking, such as TCP and UDP connections
- Wininet.dll – Implementations of application-level protocols such as HTTP and FTP
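Once the imported DLL names have been extracted from a sample (the extraction itself is not shown here), a first triage step can be as simple as comparing them against a watch list. The sketch below is purely illustrative: the watch list is my own pick from the DLLs above, the function names are hypothetical, and the comparison ignores case because Windows file names are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two strings while ignoring ASCII case. */
static int equals_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same point */
}

/* Returns 1 if name is on the illustrative watch list. */
static int is_dll_worth_a_look(const char *name)
{
    static const char *const watch_list[] = {
        "ntdll.dll", "wsock32.dll", "wininet.dll"
    };
    for (size_t i = 0; i < sizeof watch_list / sizeof watch_list[0]; i++) {
        if (equals_ignore_case(name, watch_list[i]))
            return 1;
    }
    return 0;
}
```

A match only says the capability is there; plenty of legitimate software imports the same libraries.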
On a concluding note, the PE headers contain a mine of information that can point to the intent and behavior of a program, and to the OS and environment it targets, without ever running it. Any imported DLL that grants kernel or root access should be treated as suspicious. E.g., about 93% of suspicious samples studied use WSock32.dll with malicious intent. Thus, analyzing a PE file gives you a huge edge when you have a suspicious executable sample but don’t really want to run it and turn your own host into a target system!
References :
- http://msdn.microsoft.com/en-us/magazine/cc301805.aspx
- http://www.csn.ul.ie/~caolan/publink/winresdump/winresdump/doc/pefile2.html
- Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software by Michael Sikorski, Andrew Honig