An adaptive approach for Linux memory analysis based on kernel code reconstruction

Zhang, Shuhui; Meng, Xiangxu; Wang, Lianhai

doi:10.1186/s13635-016-0038-z

Research
Open access
Published: 27 June 2016

An adaptive approach for Linux memory analysis based on kernel code reconstruction

Shuhui Zhang^1,2,
Xiangxu Meng¹ &
Lianhai Wang²

EURASIP Journal on Information Security volume 2016, Article number: 14 (2016) Cite this article

8887 Accesses
8 Citations
6 Altmetric
Metrics details

Abstract

Memory forensics plays an important role in security and forensic investigations. Hence, numerous studies have investigated Windows memory forensics, and considerable progress has been made. In contrast, research on Linux memory forensics is relatively sparse, and the current knowledge does not meet the requirements of forensic investigators. Existing solutions are not especially sophisticated, and their complicated operation and limited treatment range are unsatisfactory. This paper describes an adaptive approach for Linux memory analysis that can automatically identify the kernel version and recovery symbol information from an image. In particular, given a memory image or a memory snapshot without any additional information, the proposed technique can automatically reconstruct the kernel code, identify the kernel version, recover symbol table files, and extract live system information. Experimental results indicate that our method runs satisfactorily across a wide range of operating system versions.

1 Introduction

The physical memory of a computer is highly useful but can be a challenging resource for the collection of digital evidence. Physical memory may first appear to be a large, amorphous, and unstructured collection of data. In fact, by examining a memory image, we can extract details of volatile data, such as running processes, logged-in users, current network connections, users’ sessions, drivers, and open files. Although criminals tend to avoid leaving any evidence in a computer’s persistent storage, it is extremely hard for them to completely remove their footprints from the memory. In some cases, physical memory is the only place where evidence can be found. In a computer operating system (OS) that boots and runs completely from CD-ROM, nearly all of the valuable information exists in the physical memory of the computer. Therefore, memory forensics is becoming increasingly important.

Before 2005, the physical memory of a computer was mainly captured to retrieve strings, e.g., passwords, credit card numbers, fragments of chat conversations, IP addresses, or email addresses. In 2005, the Digital Forensics Research Workshop (DFRWS) organized a memory analysis challenge [1]. Since then, the capture and analysis of the content of physical memory, known as memory forensics, has become an area of intense research and experimentation [2]. Numerous studies have analyzed Windows memory images. The MemParser tool enables an examiner to load a physical memory dump of certain Windows systems, reconstruct process information, and extract data relating to specific processes [3]. PTFinder is a proof-of-concept implementation with the ability to reveal hidden and terminated processes and threads [4]. A method based on the Kernel Processor Control Region (KPCR) structure in Windows was proposed to determine the OS version and realize the translation from virtual address to physical address [5]. Moreover, technical details related to memory analysis have been discussed, including address translation [6, 7], pool allocation [8], swap integration [6], carving out memory [9, 10], sensitive information extraction [11, 12], and malicious code detection [13]. In short, memory analysis has been used in the wider context of digital forensics, virtual machine introspection, and malware detection [14].

Compared with Windows memory forensics, memory analysis of Linux systems presents some practical challenges. Current Linux memory analysis technologies require precise knowledge of the OS edition and kernel symbol information, which is generated at compile time. Kernel symbol information varies with the different OS editions. Furthermore, the Linux kernel is highly configurable. During the kernel build process, users may specify a large number of different options through the kernel’s configuration system. These options affect the kernel symbol information, resulting in distinct key structures for the same OS version. To obtain kernel symbol information, one must configure an environment in which the OS version and configuration options are exactly the same as those in the target system. For incident response applications, obtaining precise and relevant information is currently a slow, manual process, which limits its usefulness in rapid triaging.

To overcome these problems, this paper describes techniques that allow for the automatic adaptation of memory analysis tools for a wide range of kernel versions. Using dynamic reconstruction of the kernel code, it is possible to identify the OS version, disassemble correlative functions, and acquire kernel symbol information. Some other memory analysis systems rely on information that may not be available, whereas the proposed system only needs the analyzed memory dump. The main contributions of this paper are as follows:

We present a multi-aspect approach to automatically identify the precise kernel version when provided with only a physical memory dump. The approach is universal, and does not rely on any prior knowledge for particular OSs.
We devise a set of novel techniques to obtain kernel symbols from the physical memory dump instead of obtaining symbol information from the target kernel’s “System.map” file. Each time a new kernel is compiled, various symbols are assigned different addresses. New kernel versions of Linux are released frequently, and it is inconvenient to find all System.map files.
As the symbols in the System.map file are important, and symbols exported from modules are critical for investigators, a method of parsing symbols exported from modules is presented. To recover and analyze loaded kernel module information, it is essential to understand the relevant data structures used by the target OS. As the inclusion or exclusion of a kernel configuration option can cause the insertion or removal of several members of key structures, analysis methods that rely on a stable key structure layout are inadvisable. A method to accurately and dynamically build representative data structures is also presented.

Based on the above techniques, we develop a new Linux memory analysis system named RAMAnalyzer that can identify the OS version and acquire symbol information automatically. Live system information can subsequently be retrieved. We examine the performance of RAMAnalyzer on various recent Linux kernels, and show that it is an adaptive solution for the Linux memory analysis problem.

The remainder of this paper is organized as follows. Section 2 introduces some background information. The proposed techniques based on dynamic reconstruction are described thoroughly in Sections 3 and 4. In Section 5, we evaluate the proposed forensics tool in terms of effectiveness and performance. The final section summarizes this study and states our conclusions and indicates some opportunities for future research in this area.

2 Background and related work

2.1 Problem statement

The main problem encountered by memory analysis tools when parsing the Linux kernel memory is the need for prior knowledge of the precise kernel version and symbol information. In an incident response and live analysis context, this prior knowledge may not always be obtainable. We assume the following scenarios:

The specific target kernel version is unknown.
The kernel version is known but neither the System.map file nor /proc/kallsyms information of the target system is available.

Under these scenarios, our system has three major goals: precision, efficiency, and generality.

Precision. The OS family and precise version are both required. For instance, given a Linux kernel, we need to know not only its major version (e.g., 2.6 or 3.10), but also its minor version because the symbol’s information and data structures of various Linux kernels are different.
Efficiency. It is necessary to automatically obtain information for the OS version and symbols within a short period of time.
Generality. The system should be adaptive and analyze the mainstream Linux kernel memory image, rather than support only certain versions of Linux.

2.2 Kernel symbols

In the Linux kernel 2.6. ×, kallsyms is used to extract all the non-stack symbols from a kernel and build a data blob. CONFIG_KALLSYMS should be configured as follows:

In the last stage of the kernel compile, the following command is executed:

nm -n vmlinux |scripts/kallsyms

Therefore, all the kernel symbols are generated and sorted according to their addresses. This list is used to create the “kallsyms.S” file, which includes several special symbols: kallsyms_addresses, kallsyms_num_syms, kallsyms_names, kallsyms_makers, kallsyms_token_table, and kallsyms_token_index. Among these symbols, kallsyms_addresses points to the addresses of all kernel symbols in order, kallsyms_num_syms points to the “num” value of kernel symbols, and kallsyms_names corresponds to the symbols’ name arrays. For convenience, kallsyms_markers, kallsyms_token_table, and kallsyms_token_index are used for the offset index and high-frequency string compression.

The acquisition of kernel symbols is essential for analyzing the information contained within a physical memory dump. For example, if system calls are needed during an investigation, their addresses are stored in a kernel structure called the system call table. The sys_call_table symbol stores an address for this table, and may be used to enumerate the addresses of system calls. There are several ways to obtain the symbols:

Copy /proc/kallsyms or System.map and analyze the file [15, 16]. Care should be taken when copying the System.map file because systems with multiple kernels have multiple System.map files. Unlike /boot/System.map, /proc/kallsyms is a “proc file” that is created when a kernel boots up. This is not actually a disk file and is always correct for the kernel that is currently running. Furthermore, /proc/kallsyms contains not only kernel symbols but also symbols exported from modules.
Additionally, the kernel build system puts the System.map inside the kernel’s executable and linkable format (ELF) executable. Symbols can be extracted using the following commands:

Even a simple recompile of the same kernel is sufficient to change the symbol addresses. In previous solutions for obtaining symbol tables, methods that select symbol table profiles according to the kernel versions are obviously inaccurate. Furthermore, there is a strong need to reliably determine the correct profile for unknown kernels, which are often encountered during incident response situations [17].

2.3 Linux memory analysis

In this section, we survey some related studies on Linux memory analysis. Assuming that the kernel data structures are known, a modular, extensible framework named FATKit can realize general virtual address space reconstruction and visualization [18]. The open source volatility framework has been adapted to work with Linux memory dumps, including Android, but it must be configured for the specific version of Linux being examined [15]. SecondLook is a commercial application with a GUI and command-line interface that can extract and display memory structures including processes, loaded kernel modules, and system call tables [19]. RAMPARSER was designed to reconstruct kernel data structures such as task_struct, mm_struct, File, Dentry, Qstr, inet_sock, Sock, and Socket [20]. Linux kernel versions from 2.6.9 to 2.6.27 were tested to verify its feasibility.

Each memory forensics solution has different features along with several limitations: first, the accurate OS version of a memory image must be known in advance, which means that Linux memory images without precise OS version information cannot be parsed correctly. Second, the analyzed system’s System.map file and kernel information are needed. These data may not be immediately available, or may have been modified by attackers to thwart forensic analysis. For example, the Kernel Debugger Block can be easily overwritten by malware [21]. Furthermore, some tools only work on specific versions and require substantial manual intervention.

3 Linux memory analysis framework based on kernel code reconstruction

In this paper, based on kernel code reconstruction, we propose a new Linux memory analysis framework that can automatically detect the kernel version and recover the symbol table file from the memory image. As shown in Fig. 1, there are five key components in our framework:

Kernel version identification (KVI): this component allows the OS version to be detected in two ways: linux_banner content identification and vmcoreinfo_data content identification.
kallsyms location symbol values recovery (KLSR): the symbol table file can be recovered from memory using kallsyms location symbols such as kallsyms_addresses, kallsyms_num_syms, kallsyms_names, kallsyms_token_table, and kallsyms_token_index. Based on kernel code reconstruction, this component provides a method for discovering the above symbol values.
Symbol table file recovery (STFR): using the kallsyms location symbol values obtained from KLSR, the symbol table file content and kernel symbol information can be recovered.
Live system information extraction (LSIE): several key kernel symbols are selected to extract live system information, such as process information and module information. Furthermore, the symbols exported from modules can be parsed.
Database information extension: the addresses of the symbols are identical for identical kernel versions and compile configurations. To improve system efficiency, a database records symbol information for known kernel versions. When the kallsyms location symbol values of new versions are recovered from KLSR, these values are saved in the database.

The detailed flow of our algorithm is described as follows:

Step I: given a physical memory image, identify the precise kernel version of the target system. Using the KVI module, the kernel version and the values of _stext and swapper_pg_dir can be obtained.
Step II: check the database for preexisting symbol information. If the processed kernel version exists, extract the symbol addresses from the database and go to step IV; otherwise, go to step III.
Step III: recover kallsyms location symbol values using the KLSR module. New acquisition data are recorded in the database.
Step IV: after the acquisition of kallsyms location symbol values, the symbol table file is recovered using the STFR module.
Step V: extract the _stext symbol from the symbol table file and compare its value with that obtained from step I. If these two values are equal, the kallsyms location symbol values are correct; go to step VI. Otherwise, go to step III, adjust the kallsyms_address candidate value and retrieve the kallsyms location symbol value again.
Step VI: using the kernel symbols in the symbol table file, live system information can be extracted. In particular, symbols exported by loaded kernel modules are analyzed.

From the above description, we can see that our solution has an adaptive ability to cope with different kernel versions. More details are introduced in the next section.

4 Research methodology

In this section, we describe the detailed processes of kernel version identification, kallsyms location symbol values recovery, symbol table file recovery, and live system information extraction.

4.1 Kernel version identification

There are two ways to obtain kernel version information from the memory image: vmcoreinfo_data content identification and linux_banner content identification.

4.1.1 linux_banner content identification

The start_kernel() function, which is called by the startup_32() function, initializes all of the data structures needed by the kernel, enables interrupts, and creates another kernel thread named process 1. Finally, linux_banner information is printed in the following format:

const char linux_banner [] =

“Linux version ” UTS_RELEASE “ (" LINUX_ COMPILE_BY “@”

LINUX_COMPILE_HOST “) (” LINUX_COMPILER “) ”UTS_VERSION “\n”;

By searching for the characteristic character “Linux version”, kernel version information can be obtained.

4.1.2 vmcoreinfo_data Content Identification

In the system initialization phase, the crash_ save_vmcoreinfo_init() function is triggered to initialize the content of vmcoreinfo_data, which includes general crash kernel information such as the kernel version, page size, and symbol information. vmcoreinfo_data starts with the character string “OSRELEASE=”; the character strings “SYMBOL(swapper_pg_dir)=” and “SYMBOL(_stext)=” are also included. By searching for these three strings, the address of vmcoreinfo_data can be located. The partial content of vmcoreinfo_data in the memory image is shown in Fig. 2.

The kernel version in linux_banner and vmcoreinfo_data content should be the same. The latter contains information about the _stext, swapper_pg_dir, vmlist, mem_map, and init_uts_ns symbols, which are also stored in the symbol table files. The values of these symbols are virtual addresses. Generally, if swapper_pg_dir has a length value of 8, the OS is 32 bit and the physical address of swapper_pg_dir is its virtual address minus 0 ×c0000000. If swapper_pg_dir has a length value of 16, the OS is 64 bit and its physical address is its virtual address minus 0 ×ffffffff80000000. swapper_pg_dir is the page global directory (pdg) for a process named “swapper” and can be used to translate between the virtual address and the physical address in the kernel address space. This symbol name differs between architectures in the symbol table file, being called swapper_pg_dir on both ×86 and PPC64, but it is named init_level4_pgt on ×86_64.

4.2 Kallsyms location symbol values recovery

The algorithm for the kallsyms location symbol values recovery has four main steps:

Step I: Kallsyms_address candidate value scanning. Because _stext is one of the kernel symbols, the values obtained from the procedure described in Section 4.1.2 can be used to locate the kallsyms_addresses. During the search procedure, the value of _stext may be found in multiple places. To enhance the efficiency of the algorithm, we impose some restrictions. For 32-bit systems, for example, the content before and after the found address are the addresses of kernel symbols, and so the values should be greater than 0xc0000000. Tracing back from the found address, we can obtain the value of the startup_32 symbol, which can be calculated from _stext&0 ×ffff0000. The first symbol in /proc/kallsyms is generally startup_32, and this is where the address of the startup_32 symbol resides. This provides a candidate physical address for kallsyms_addresses. However, in 64-bit systems, there are several symbols before the startup_64 symbol, and so the test times are higher than for 32-bit systems.

Step II: The kernel code disassembly. To obtain the other four symbol values, the kernel code in memory must be disassembled correctly. Unfortunately, because the instructions for different systems and architectures are of various lengths, starting from the wrong instruction location will disassemble a completely different instruction sequence. To address this challenge, we decompile the smallest amount of kernel code, instead of the whole block of required function calls.

Analyzing the source code of a Linux kernel, it is clear that operations related to the kernel symbols are mainly present in Linux/kernel/kallsyms.c. The symbols that will be re-linked against their real values during the second link stage are defined below:

extern const unsigned long kallsyms_addresses[] _weak;

extern const u8 kallsyms_names[] _weak;

extern const unsigned long kallsyms_num_syms;

extern const u8 kallsyms_token_table[] _weak;

extern const u16 kallsyms_token_index[] _weak;

The call relationship of the update_iter function is described in Fig. 3. In the update_iter function, the get_ksymbol_core function is called using kallsyms_addresses. Next to the instruction “iter- >value = kallsyms_addresses[iter- >pos],” the kallsyms_get_symbol_type function is called using kallsyms_token_table, kallsyms_token_index, and kallsyms_names[off + 1].

Once the principle of the update_iter function is fully understood, we can use the candidate value of kallsyms_addresses obtained from step I to obtain the values of the other four symbols.

An image from the 3.6.10-4.fc18.i686.PAE system is used to illustrate the method. The value of the kallsyms_addresses symbol is found at offset 0xc4 for the update_iter function’s binary code in the image. Therefore, we step back three bytes and disassemble the binary code. As mentioned above, our principle is to reduce the amount of disassembled code by as much as possible and improve the precision of our method. Approximately 0 ×2a bytes are chosen to be decompiled, and the results are as follows:

Through a combined analysis of the disassembled output and the update_iter function’s source code, the values of kallsyms_names, kallsyms_token_index, and kallsyms_token_table can be obtained. In 32-bit systems, the value of kallsyms_num_syms is the value of kallsyms_names minus 4.

There are some differences in 64-bit systems. Bits 0–31 of the symbol addresses are used, and the value of kallsyms_num_syms is the value of kallsyms_names minus 8. Although some changes take place in the binary code of the update_iter function for different Linux systems, the differences in the code segment used here are minimal.

4.3 Symbol table file recovery

After acquiring the kallsyms location symbols, the STFR method proceeds as follows:

The kallsyms_num_syms symbol points to the kernel symbol “num” in /proc/kallsyms. The kallsyms_addresses symbol points to the addresses of all kernel symbols in order. Each symbol address has a length of 4 in 32-bit systems and 8 in 64-bit systems. To obtain the corresponding name, the kallsyms_names, kallsyms_token_table, and kallsyms_token_index symbols are needed. The kallsyms_names symbol points to a list of length-prefixed byte arrays that encode indexes into the token table. According to the length, the bytes of each array are acquired and used to construct a substring. Finally, the substrings are joined together to form the type and name of the required symbol.

We again use an image from the 3.6.10-4.fc18.i686.PAE system to describe the above method.

The addresses of the required symbols are determined from the database:

c09e8c0c R kallsyms_addresses

c0a247b0 R kallsyms_num_syms

c0a247b4 R kallsyms_names

c0ad9210 R kallsyms_token_table

c0ad95a0 R kallsyms_token_index

First, translate the virtual address of the kallsyms_num_syms symbol to a physical address and obtain the num of kernel symbols in /proc/kallsyms. Likewise, convert the virtual address of the kallsyms_addresses address and read the addresses of all kernel symbols in order. The partial content of kallsyms_addresses is displayed in Fig. 4.

In the next step, the content pointed to by the kallsyms_names symbol is read and split into substrings according to its format. As shown in Fig. 5, the first byte of each substring is the length of the compression bytes. Each compression byte corresponds to several characters through conversion with the kallsyms_token_table and kallsyms_token_index symbols. Connecting all of the characters together, we obtain the type and name of the symbol. By dealing with the bytes marked in Fig. 5, the type for the first symbol is determined to be T, which means that the symbol is in the text (code) section and the name of the first symbol is startup_32. In combination with the result from the kallsyms_address symbol, the address of the startup_32 symbol is 0 ×c0400000.

To verify the correctness of the symbol values, the value of _stext obtained from the process described in this section is compared with that obtained in Section 4.1.2. If the two values are the same, the obtained symbols are available. If not, we recover the symbols using the algorithm described in Section 4.2.

4.4 Live system information extraction

After determining the kernel symbols, several key symbols are selected to obtain the live system information.

4.4.1 Gathering offsets of structure members

The structure layouts vary greatly depending upon the configuration parameters. For example, the layout of the module structure depends on the values of optional configuration parameters such as CONFIG_MODULE_SIG, CONFIG_SYSFS, and CONFIG_UNUSED_SYMBOLS. Thus, to properly analyze a Linux image, the offsets of important structure members must be identified. As shown in Fig. 6, the module structure plays a significant role in the extraction of module information.

Code fragments 1–3 show how equivalent statements can be compiled to form radically different instruction sequences. The C source code in code fragment 1 is from the module_get_kallsym() function within /kernel/module.c of the Linux kernel source base. This function was used to help find the offset of the num_symtab, symtab, and strtab members of the struct module.

In the above code fragments, the constant 0 ×114 within the indexed instructions is the offset for the num_symtab member. As the methods used by compilers can be very different, all possible instruction formats for various architectures must be clarified. Code fragment 4 is again from the module_get_kallsym() function, and fragments 5–8 illustrate the disassembly of the instruction that accesses the strtab and symtab members of the module for different architectures.

The module_get_kallsym function is exported as a symbol to /proc/kallsyms, and its address can be obtained from the process described in Section 4.3. In this way, the state, name, module_core, and source_list members of module structures can be analyzed based on the kdb_lsmod function defined in /kernel/debug/kdb/kdb_main.c.

4.4.2 Process information extraction

Every process is represented by a structure named task_struct, which is defined in the /usr/src/linxu-2.4/include/linux/sched.h file. The init_task symbol corresponds to the task_struct structure address of the swapper, where the PID is zero. The task_struct structures of all active processes are doubly linked to each other. By traversing the double-linked list, all of the running processes can be identified. Moreover, the task_struct structure includes some objects that correspond to information regarding the current state of a process, such as struct mm_struct *mm, struct fs_struct *fs, struct files_struct *files, and struct thread_struct thread. Using these objects, we can obtain information on the memory management, file, and thread of the processes.

4.4.3 Module information extraction

Similar to the process information, all module structures are doubly linked to each other. By acquiring a module using the module symbol, the other modules can be identified from this doubly linked list.

To link a module, the sys_init_module() service initializes the syms and gpl_syms fields of the module object so that they point to the in-memory tables of symbols exported by the module. Some special kernel symbol tables are used by the kernel to store the symbols that can be accessed by modules with their corresponding addresses. These are contained in three sections of the kernel code segment: the _kstrtab section includes the names of the symbols, the _ksymtab section includes the addresses of the symbols, and the _ksymtab_gpl section includes the addresses of the symbols that can be used by the modules released under a GPL-compatible license. Only the kernel symbols actually used by some existing modules are included in the table. Linked modules can also export their own symbols so that other modules can access them. Although these symbols are critical during an investigation, they have largely been neglected in previous research. For instance, the vm_list symbol exported by the kvm module can be used to analyze the virtual machine information running on the current physical machine.

To obtain the exported symbols from the memory image, some objects of the module structure can be used, such as const struct kernel_symbol *syms, const struct kernel_symbol *gpl_syms, Elf_Sym *symtab, and Char *strtab. Among these objects, the symtab object is particularly important because it assists in the recovery of the symbol and string tables for kallsyms.

As for other system information, the rt_hash_mask, rt_hash_table, and net_namespace_list symbols are used to obtain information about the network configuration and current network connections; the boot_cpu_data symbol is used to obtain CPU information; the log_buf symbol corresponds to system log and debug information; and the iomem_resource symbol reflects the available physical address space of the target computer.

5 Evaluation

Based on the techniques described above, we developed a Linux memory analysis system named RAMAnalyzer. In this section, we present our experimental results. An experiment to test the effectiveness of RAMAnalyzer with 26 Linux kernels (from 2.6.18 to 4.2.0) is described in Section 5.1, and the performance of the proposed tool is reported in Section 5.2. All of our experiments were performed on a host machine with an Intel Core i5-4210U CPU, 4-GB memory, and a 64-bit Windows 7 OS.

5.1 Effectiveness

The following memory images were chosen: DFRWS 2008 forensics challenge, volatility memory samples, and memory snapshots from virtual machines running on the VMware Workstation. The test flow and execution results of RAMAnalyzer are described below.

Taking a memory image from the 3.1.0-7.fc16.i686.PAE system as an example, the first step was to identify the kernel version by searching for linux_banner content and vmcoreinfo_data content. The database was then checked to identify any prior knowledge of this kernel version. If the database returned no results, the kallsyms location symbol values were restored by disassembling the dynamically loaded code of the update_iter function in the memory image. The initialization result, including kernel version information and kallsyms location symbol values, is displayed in Fig. 7. Using the kallsyms location symbol values, the kernel symbols were extracted. The partial result is shown in Fig. 8.

Several symbols were selected to extract live system information from the memory image, including process information, module information, network connection information, and system log information. The loaded kernel modules information extracted from a memory image of the 3.1.0-7.fc16.i686.PAE system are listed in Fig. 9, and the symbols exported from the lockd module are listed in Fig. 10.

5.2 Performance evaluation

To ensure that large changes in the kernel algorithms do not affect the validity of our approach, memory images from various kernel versions were used to test the performance of RAMAnalyzer. Some of the kernels used in the experiment are listed in Table 1.

Table 1 Sample of kernel versions used for testing RAMAnalyzer

Full size table

We measured the execution speed of RAMAnalyzer when only a memory image was provided. As illustrated in Fig. 1, the key steps of our approach are kernel version identification and symbol table file recovery, and these are our evaluation targets. From Figs. 11 and 12, we can see that 15–78 ms were required for kernel version identification, and 347-15,693 ms were needed for the recovery of kallsyms location symbol values and kernel symbols.

After obtaining the kernel symbols, the modules symbol was used to find the loaded kernel modules and exported symbols. The time required for this process is shown in Fig. 13.

The experimental results prove that RAMAnalyzer can deal with memory images from a wide range of kernel versions and demonstrate that its execution time is acceptable.

6 Conclusions

Based on kernel code reconstruction, this paper has proposed an adaptive approach for Linux memory analysis that can address a Linux memory image without information about the kernel version or System.map file. We implemented a prototype named RAMAnalyzer that is made up of five main components: kernel version identification, symbol table file recovery, kallsyms location symbol value recovery, live system information extraction, and database information extension. Our experimental results with a number of Linux memory images show that RAMAnalyzer can automatically identify the kernel version and recovery kernel symbols. Based on the kernel symbols, RAMAnalyzer then extracts live system information about the target system at the time of memory acquisition.

The primary advantages of RAMAnalyzer are:

The ability to deal with memory images without precise kernel version information and symbol information.
The ability to identify the precise kernel version and recovery kernel symbols automatically, which means it can deal with memory images from different kernel versions. Furthermore, kernel symbol information obtained in this way is more accurate because symbol information from identical kernel versions can vary under different configuration options.
As well as kernel symbol information, RAMAnalyzer can acquire the symbols exported from modules, which play an important role in the investigation procedure.
Based on the above techniques, RAMAnalyzer has adaptability to deal with mainstream Linux kernel memory images and has high execution efficiency.

From the advantages of RAMAnalyzer, we can see that our solution can provide a solution for the challenge described in Section 1 and meet the need of scenarios described in Section 2.1. With the advent of mobile cloud computing, the development of Linux is accelerating and its security is becoming increasingly crucial. The techniques proposed in this paper provide forensics researchers with a starting point to delve into Linux memory forensics, which plays an important role in security and forensics investigations. Furthermore, these techniques can be conveniently embedded into other forensics frameworks.

To enhance the performance of RAMAnalyzer, the following research will be undertaken: first, owing to the differences in the kallsyms configurations of Linux kernel versions, there are various initial /proc/kallsyms. To improve the processing speed, it is essential to scan the kalsyms_address candidate values. Second, to improve the adaptive capacity of cloud environments, RAMAnalyzer was verified to be effective using memory images from the KVM host machine. However, further experiments are required to verify its efficacy on memory images from Xen host machines.

References

DFRWS 2005 Forensics Challenge. http://www.dfrws.org/conferences/dfrws-usa-2005.
L Wang, L Xu, S Zhang, Network connections information extraction of 64-bit windows 7 memory images. Forensics in Telecommunications, Information, and Multimedia. 56:, 90–98 (2010).
Article Google Scholar
DFRWS 2005 Forensics Challenge: Memparser. http://sourceforge.net/projects/memparser.
A Schuster, Searching for processes and threads in microsoft windows memory dumps. Digit. Investig. 3:, 10–16 (2006).
Article Google Scholar
R Zhang, L Wang, S Zhang, Windows memory analysis based on kpcr. Fifth International Conference On Information Assurance and Security. 2:, 677–680 (2009).
Article Google Scholar
JD Kornblum, Using every part of the buffalo in windows memory analysis. Digit Investig. 4(1), 24–29 (2007).
Article Google Scholar
A Schuster, The impact of microsoft windows pool allocation strategies on memory forensics. Digit. Investig.5:, 58–64 (2008).
Article Google Scholar
S Zhang, L Wang, R Zhang, Q Guo, Exploratory study on memory analysis of windows 7 operating system. 3rd International Conference On Advanced Computer Theory and Engineering (ICACTE). 6:, 373–377 (2010).
Google Scholar
B Dolan-Gavitt, Forensic analysis of the windows registry in memory. Digit. Investig.5:, 26–32 (2008).
Article Google Scholar
R Van Baar, W Alink, A Van Ballegooij, Forensic memory analysis: Files mapped in memory. Digit. Investig.5:, 52–57 (2008).
Article Google Scholar
S Hejazi, C Talhi, M Debbabi, Extraction of forensically sensitive information from windows physical memory. Digit. Investig.6:, 121–131 (2009).
Article Google Scholar
Q Zhao, T Cao, Collecting sensitive information from windows physical memory. J. Comput.4(1), 3–10 (2009).
Article MathSciNet Google Scholar
M Sikorski, A Honig, Practical Malware Analysis: the Hands-on Guide to Dissecting Malicious Software (no starch press, Scn Francisco, 2012).
Google Scholar
A Ligh, MH Case, J Levy, A Walters, The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory (John Wiley & Sons, 2014).
Volatility: Linux Memory Forensics. https://code.google.com/archive/p/volatility/wikis/LinuxMemoryForensics.wiki.
Rekall Memory Forensic Framework. http://www.rekall-forensic.com/.
MI Cohen, Characterization of the windows kernel version variability for accurate memory analysis. Digit. Investig.12:, 38–49 (2015).
Article Google Scholar
A Petroni, NL Walters, T Fraser, WA Arbaugh, Fatkit: A framework for the extraction and analysis of digital forensic data from volatile system memory. Digit. Investig.3(4), 197–210 (2006).
Article Google Scholar
SecondLook Linux Memory Dump Samples. https://www.forcepoint.com/.
A Case, L Marziale, GG Richard, Dynamic recreation of kernel data structures for live forensics. Digit. Investig.7:, 32–40 (2010).
Article Google Scholar
T Haruyama, H Suzuki. >https://media.blackhat.com/bh-eu-12/Haruyama/bh-eu-12-Haruyama-Memory_Forensic-Slides.pdf.

Download references

Acknowledgements

This research is supported by the National Natural Science Foundation of China (61472225), Shandong Natural Science Foundation (ZR2014FM003), and Shandong Province Outstanding Young Scientists Research Award Fund Project(BS2014DX007).

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Shunhua Road, Jinan, China
Shuhui Zhang & Xiangxu Meng
Shandong Computer Science Center (National Supercomputer Center in Jinan), Shandong Provincial Key Laboratory of Computer Networks, Keyuan Road, 250014, Jinan, China
Shuhui Zhang & Lianhai Wang

Authors

Shuhui Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xiangxu Meng
View author publications
You can also search for this author in PubMed Google Scholar
Lianhai Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shuhui Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SH and XX carried out the main research of this work. LH performed the experiments. SH and LH conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Zhang, S., Meng, X. & Wang, L. An adaptive approach for Linux memory analysis based on kernel code reconstruction. EURASIP J. on Info. Security 2016, 14 (2016). https://doi.org/10.1186/s13635-016-0038-z

Download citation

Received: 06 January 2016
Accepted: 14 June 2016
Published: 27 June 2016
DOI: https://doi.org/10.1186/s13635-016-0038-z

An adaptive approach for Linux memory analysis based on kernel code reconstruction

Abstract

1 Introduction

2 Background and related work

2.1 Problem statement

2.2 Kernel symbols

2.3 Linux memory analysis

3 Linux memory analysis framework based on kernel code reconstruction

4 Research methodology

4.1 Kernel version identification

4.1.1 linux_banner content identification

4.1.2 vmcoreinfo_data Content Identification

4.2 Kallsyms location symbol values recovery

4.3 Symbol table file recovery

4.4 Live system information extraction

4.4.1 Gathering offsets of structure members

4.4.2 Process information extraction

4.4.3 Module information extraction

5 Evaluation

5.1 Effectiveness

5.2 Performance evaluation

6 Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Rights and permissions

About this article

Cite this article

Share this article

Keywords