Having recently spent considerable time breaking my head over a particular file format, the Portable Executable (PE), I thought it would be wise to jot down my learnings and use them as a reference. Understanding this format can give us valuable information about an executable: its structure, the OS and environment it targets, and what it intends to do (benign or malicious), all without actually executing it! Microsoft introduced the PE file format with Windows NT 3.1 (July 1993), and it has been in use ever since. It is used by the majority of Windows executables and Dynamic Link Libraries (DLLs). DLLs are everywhere in legitimate software, but malware also commonly uses them to deliver and execute its malicious code. Throughout this article, apart from deconstructing the PE file format, I will also point out features and values that make an executable look suspicious or malicious.

Note: For those wondering what DLLs are (hey, I did too!), they are simply shared libraries used by processes. Much of the Windows API (WinAPI) is itself exposed through DLLs. Multiple processes can link to a DLL at runtime without any need for re-linking or re-compiling. Wikipedia has a nice article on DLLs for further reading.

PE File Structure

PE files contain data in a linear stream, beginning with the MS-DOS header, followed by the MS-DOS stub program, the PE signature and file header, the PE optional header, the section headers and finally the sections themselves. Together they describe the code, the libraries it uses, how it is mapped into memory and how much space it needs when it is executed. From these, we can get an idea of the executable's intent.

Figure: PE file format structure

However, before we jump into individual sections of the PE file format, I would like to highlight a few ‘gotchas’.

1. The eternal big-endian vs little-endian question. Network protocols pack bytes in big-endian order, while the x86 architecture uses little-endian order, so all multi-byte values discussed below are stored in little-endian format. Just to jog your memory, the diagram below depicts the big- and little-endian ways of storing information.

Figure: Big endian vs Little endian

2. The PE file structure differs very slightly between 32-bit and 64-bit architectures. The 64-bit format is called PE32+; it contains all but one of the PE fields (BaseOfData is dropped) and has other minor modifications, such as widening ImageBase and the stack and heap size fields from 32 to 64 bits.

3. The structures are named IMAGE_***, and the naming threw me off for a while (it may not trip up everyone!). I was expecting them to be named PE_***.

MS-DOS Header

The first part of the PE file format is a legacy section left over from the MS-DOS days. The MS-DOS header has been retained to keep PE files compatible with MS-DOS (< v2) and Windows (< v3). That said, two of its fields still matter today: e_magic and e_lfanew. USHORT e_magic must contain 0x5A4D, the ASCII representation of “MZ” (stored on disk as the bytes 'M', 'Z' because of little-endian order), the initials of Mark Zbikowski, one of the architects of MS-DOS. Now, isn’t that a neat way to sign your designs?! 😉 LONG e_lfanew holds the file offset of the PE header. Since the MS-DOS header starts at offset 0 of the file, the PE header is located at file base + e_lfanew.

typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    USHORT e_magic;         // Magic number
    USHORT e_cblp;          // Bytes on last page of file
    USHORT e_cp;            // Pages in file
    USHORT e_crlc;          // Relocations
    USHORT e_cparhdr;       // Size of header in paragraphs
    USHORT e_minalloc;      // Minimum extra paragraphs needed
    USHORT e_maxalloc;      // Maximum extra paragraphs needed
    USHORT e_ss;            // Initial (relative) SS value
    USHORT e_sp;            // Initial SP value
    USHORT e_csum;          // Checksum
    USHORT e_ip;            // Initial IP value
    USHORT e_cs;            // Initial (relative) CS value
    USHORT e_lfarlc;        // File address of relocation table
    USHORT e_ovno;          // Overlay number
    USHORT e_res[4];        // Reserved words
    USHORT e_oemid;         // OEM identifier (for e_oeminfo)
    USHORT e_oeminfo;       // OEM information; e_oemid specific
    USHORT e_res2[10];      // Reserved words
    LONG   e_lfanew;        // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

MS-DOS Real-Mode Stub Program

In older OS versions, this part contained the code where the application began executing. From Windows 3 onwards, it contains a “stub” program: code that does nothing more than print an error message when the program is run on an incompatible OS. By default it is Winstub.exe, which prints the OS version the PE requires.

PE file signature & header

For valid PE files, the signature is the four bytes 'P', 'E', 0, 0 (“PE\0\0”), which read as the little-endian DWORD 0x00004550. The file header that follows contains metadata about the file. Of special interest to a malware analyst are NumberOfSections, TimeDateStamp and SizeOfOptionalHeader. TimeDateStamp (seconds since January 1, 1970 UTC) gives us an idea of whether we are looking at an old virus or a new attack. Delphi programs are an exception: they carry a default compilation date of June 19, 1992.

typedef struct _IMAGE_FILE_HEADER {
    USHORT  Machine;              // Target CPU, e.g. 0x14c (x86) or 0x8664 (x64)
    USHORT  NumberOfSections;     // Number of PE sections
    ULONG   TimeDateStamp;        // Date of compilation, seconds since 1970-01-01 UTC
    ULONG   PointerToSymbolTable; // File offset of the COFF symbol table; usually 0 in images
    ULONG   NumberOfSymbols;      // Number of entries in symbol table
    USHORT  SizeOfOptionalHeader; // Size of the optional header; used to locate the section table
    USHORT  Characteristics;      // Flag bits for file attributes (EXE, DLL, 32-bit, ...)
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

#define IMAGE_SIZEOF_FILE_HEADER             20

PE Optional Header

The “optional” header is a misnomer, because it is required in every PE image. It contains a mine of information that can be used to work out where code and data will be placed in memory: initial stack and heap sizes, the ImageBase address, OS version, image version, program entry point location and DLL characteristics. For PE32, the optional header is 224 bytes in total (240 bytes for PE32+).

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16

typedef struct _IMAGE_DATA_DIRECTORY {
    ULONG   VirtualAddress;        // RVA of the table (imports, exports, resources, ...)
    ULONG   Size;                  // Size of the table in bytes
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

typedef struct _IMAGE_OPTIONAL_HEADER {   // PE32 layout
    //
    // Standard fields.
    //
    USHORT  Magic;                 // Type of header (0x10b for PE32/HDR32, 0x20b for PE32+/HDR64)
    UCHAR   MajorLinkerVersion;    // Major version of linker used for this executable
    UCHAR   MinorLinkerVersion;    // Minor version of linker used for this executable
    ULONG   SizeOfCode;            // Combined size of all code sections
    ULONG   SizeOfInitializedData; // Combined size of all initialized data sections
    ULONG   SizeOfUninitializedData; // Usually 0, since the linker appends uninitialized data to the end of a regular section
    ULONG   AddressOfEntryPoint;   // RVA of the entry point: runtime library code that calls main/WinMain/DllMain
    ULONG   BaseOfCode;            // RVA of the 1st byte of code (from .text section)
    ULONG   BaseOfData;            // RVA of the 1st byte of data (field absent in PE32+)
    //
    // NT additional fields.
    //
    ULONG   ImageBase;             // Preferred load address. Default is 0x400000 (EXEs), 0x10000000 (DLLs)
    ULONG   SectionAlignment;      // Alignment of sections in memory; must be >= FileAlignment
    ULONG   FileAlignment;         // Alignment of section data in the file; for x86 usually 0x200 or 0x1000
    USHORT  MajorOperatingSystemVersion; // Target OS version for executable
    USHORT  MinorOperatingSystemVersion; // Target minor OS version
    USHORT  MajorImageVersion;     // Usually set same as linker versions
    USHORT  MinorImageVersion;     // Usually set same as linker versions
    USHORT  MajorSubsystemVersion; // For target subsystem if any specific env requirements
    USHORT  MinorSubsystemVersion; // Same as above
    ULONG   Reserved1;             // Called Win32VersionValue in newer headers; must be 0
    ULONG   SizeOfImage;           // Amount of memory needed by image to execute
    ULONG   SizeOfHeaders;         // (MS-DOS header + PE headers + section headers) rounded up to a multiple of FileAlignment
    ULONG   CheckSum;              // Checksum calculated by IMAGEHLP.DLL's CheckSumMappedFile API
    USHORT  Subsystem;             // Subsystem expected by the PE
    USHORT  DllCharacteristics;    // 16 flag bits for DLL operating modes & options (e.g., ASLR, DEP)
    ULONG   SizeOfStackReserve;    // Stack space reserved per thread by default, default 1 MB
    ULONG   SizeOfStackCommit;     // Part of the stack reserve committed initially, default 4 KB
    ULONG   SizeOfHeapReserve;     // Space reserved for the default process heap, default 1 MB
    ULONG   SizeOfHeapCommit;      // Part of the heap reserve committed initially, default 4 KB
    ULONG   LoaderFlags;           // Deprecated
    ULONG   NumberOfRvaAndSizes;   // Number of data directory entries
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

PE section headers

Apart from the base addresses in the PE optional header, the sections of a PE file contain useful information too. Each section header is 40 bytes long, and sections are identified by their names rather than by a fixed position. The structure of a section header is shown below.

#define IMAGE_SIZEOF_SHORT_NAME              8

typedef struct _IMAGE_SECTION_HEADER {
    UCHAR   Name[IMAGE_SIZEOF_SHORT_NAME]; // ASCII name of section, not guaranteed to be NUL-terminated
    union {
        ULONG   PhysicalAddress;    // Deprecated
        ULONG   VirtualSize;        // Size of the section once loaded in memory. Can be > SizeOfRawData
    } Misc;
    ULONG   VirtualAddress;         // RVA where the section begins in memory (relative to ImageBase)
    ULONG   SizeOfRawData;          // Size of the section's data on disk, a multiple of FileAlignment
    ULONG   PointerToRawData;       // File offset where the section's data begins
    ULONG   PointerToRelocations;   // Offset for relocations, set to 0 in .exe
    ULONG   PointerToLinenumbers;   // Offset for line-number entries, if used
    USHORT  NumberOfRelocations;    // Set to 0 in .exe
    USHORT  NumberOfLinenumbers;    // Number of line-number entries
    ULONG   Characteristics;        // Flags of attributes set in linker's options
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Some useful sections of a PE file are:

    • .text – Contains the executable code, i.e., the instructions (opcodes) and supporting data.
    • .rdata – Contains globally scoped, read-only data such as strings, constants and debug information. It also contains information about imported and exported functions if the .idata and .edata sections are missing.
    • .data – Contains global and static variables accessed throughout the program. In C, initialized global and static variables are stored in the .data section (uninitialized ones usually go to .bss).
    • .idata – If present, contains information about imported functions.
    • .edata – If present, contains information about exported functions.
    • .bss – Contains the application's uninitialized data.
    • .pdata – Contains exception handling information. It is typically absent in 32-bit x86 executables.
    • .rsrc – Contains resource information for a module, in the form of a resource tree.
    • .reloc – Contains base relocation information, used when the image cannot be loaded at its preferred ImageBase.
    • .debug – Usually placed at the end of the sections; contains debug information.

A simple HelloWorld program may have little more than the .text and .data sections. As a program grows, so do the number and size of its sections. The import information in these sections also reveals which DLLs a program depends on. Some commonly used DLLs are:

    • Kernel32.dll – Core functionality (memory, files & hardware access)
    • Advapi32.dll – Advanced functionality (access to the service manager, registry)
    • User32.dll – Access to user interfaces (buttons, UI, I/O devices)
    • Ntdll.dll – Interface to the Windows kernel. Malicious rootkits sometimes use Ntdll.dll directly to perform kernel activities
    • WSock32.dll – Windows Sockets API for low-level networking (TCP/UDP connections)
    • Wininet.dll – Higher-level implementations of application protocols such as HTTP and FTP.

On a concluding note, the PE file format headers contain a mine of information about a program's intent and behavior, and about the OS and environment it expects, without ever running it. Any imported DLL that grants kernel or root (administrative) access should be treated as suspicious. E.g., about 93% of the suspicious samples studied used WSock32.dll with malicious intent. Thus, analyzing a PE file gives you a huge edge when you have a suspicious executable sample but don’t want to run it and make your own host a target system!
