Intrusion detection model based on selective packet sampling
© Bakhoum; licensee Springer. 2011
Received: 8 January 2011
Accepted: 19 September 2011
Published: 19 September 2011
Recent experimental work by Androulidakis and Papavassiliou (IET Commun 2(3):399, 2008; IEEE Netw 23(1):6, 2009) has shown that it is possible to maintain a high level of network security while selectively inspecting packets for the existence of intrusive activity, thereby resulting in a minimal amount of processing overhead. In this paper, a statistical approach for the modeling of network intrusions as Markov processes is introduced. The theoretical findings presented here confirm the earlier experimental results of Androulidakis and Papavassiliou. A common notion about network intrusion detection systems is that every packet arriving into a network must be inspected in order to prevent intrusions. This investigation, together with the earlier experimental results, disproves that notion. Additional experimental testing of a corporate local area network is reported.
Network intrusion detection systems (IDS) perform a vital role in protecting networks connected to the World Wide Web from malicious attacks. Traditionally, IDS software products such as SNORT, SecureNet, and Hogwash work by monitoring traffic at the network choke-point, where every incoming IP packet is analyzed for suspicious patterns that may indicate hostile activity. Because those software systems must match packets against thousands of known malicious patterns, they must work extremely fast. Under heavy traffic, however, the IDS is usually forced to drop packets so that it does not become the bottleneck of the network, at the risk of allowing an attack to go undetected. Because of this deficiency, host-based IDS solutions have been introduced [4, 5]. Host-based IDS products run on a server rather than at the network gateway. Unfortunately, host-based solutions can slow down the server considerably under heavy traffic conditions. Because of the inherent limitations of all software solutions, hardware solutions were finally introduced. The state-of-the-art hardware solution is a field programmable gate array (FPGA) that performs the same IDS function at substantially higher speeds [6, 7]. There are other serious problems to contend with, however, when hardware solutions are implemented.
The purpose of this paper is to introduce an analytic and statistical model for the process of network intrusion and to demonstrate that the common notion that the content of every IP packet must be inspected is flawed. Numerous research articles that address the problem of network intrusion modeling have appeared in the literature [8–19]. Kephart and White [8, 9] published the first analytical work on the modeling of the propagation of viruses and worms. More recently, Wang and Wang, guided by the analysis of Kephart and White, recognized that the problem of network intrusion can be modeled after the popular "birth and death" epidemiological model. Wang and Wang (WW), however, did not develop such a model analytically, as the problem is mathematically challenging. Very recently, an important experimental discovery was made by Androulidakis and Papavassiliou (AP) [11, 12], who demonstrated that the selective inspection of packets for the purpose of detecting network intrusion can be as effective as the full inspection of all packets. In this paper, it will be demonstrated that the seemingly unrelated discoveries of WW and AP do in fact stem from the same mathematical origin. More specifically, the WW hypothesis that the process of network intrusion can be modeled after the "birth and death" epidemiological model will be developed analytically for the first time. The results are surprising and essentially confirm the experimental findings of AP. The main conclusion is that it is possible to selectively inspect packets from only certain packet flows, thereby eliminating the speed bottleneck problem and the necessity to drop packets at high bit rates, while simultaneously maintaining a high degree of network security.
Actual testing by the author that involved a corporate local area network has confirmed the theoretical findings. Additional testing of an optimized SNORT software package, in combination with a traffic generator and an Agilent network analyzer, has further confirmed the theoretical findings. The implications of these theoretical and experimental results for the structure and design of future IDS are expected to be substantial.
The analysis that will now be developed is based on the observation that the birth and death model of network intrusion advocated by Wang and Wang belongs to the class of Markov processes [20, 21]. By applying Markov chain analysis to the process of network intrusion, a statistical formula that relates the probability of a network being compromised to the probability of occurrence of intrusion will be obtained. In the following sections, it will be demonstrated that it is possible to selectively inspect packets arriving into a network while maintaining a high degree of security, as long as such inspection is performed in accordance with the statistical formula.
Let:
b be the birth rate (or initiation rate) of new processes on the network at any given time;
d be the death rate (or termination rate) of processes;
P_i be the probability that the network is in state S_i;
P_H be the probability that any new process started on the network is a hostile process. (This probability is an independent variable that strongly depends on the circumstances. The numerical value of this probability will be calculated as described further in Section 3.)
which is exactly the same as Eq. (7). Hence, Eq. (6) is indeed valid for any value of i.
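As a generic illustration of how the state probabilities of a birth and death chain are obtained, the following Python sketch computes the stationary probabilities of a finite chain with constant rates. It is a textbook example under simplifying assumptions (a fixed number of states, rates b and d that do not depend on the state, no distinction between hostile and benign processes) and is not the derivation of Eqs. (6) and (7); the function name and parameters are hypothetical.

```python
def stationary_distribution(b, d, num_states):
    """Stationary probabilities of a finite birth-death chain with constant rates.

    Assumptions (illustrative only): states 0..num_states-1, birth rate b for
    every transition i -> i+1, death rate d for every transition i -> i-1.
    Detailed balance then gives P_i * b = P_(i+1) * d for neighbouring states.
    """
    if b <= 0 or d <= 0 or num_states < 1:
        raise ValueError("rates must be positive and num_states must be >= 1")
    ratio = b / d
    # Unnormalized weights P_i / P_0 = (b / d) ** i
    weights = [ratio ** i for i in range(num_states)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for i, p in enumerate(stationary_distribution(b=1.0, d=2.0, num_states=5)):
        print(f"P_{i} = {p:.4f}")
```

When the death rate exceeds the birth rate, the probability mass concentrates on the low-numbered states, which is the qualitative behavior that a "clean state" analysis relies on.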
The number n in Eq. (13) and in the figure is the number of processes initiated by users and does not include such things as system processes or background tasks, since the probability of occurrence of intrusion is associated with user processes only. The inverse exponential relationship of Eq. (13) is remarkable because, as is well known, it can be practically approximated by a linear function that drops to zero at a specific threshold. For the n = 100 curve, for example, it can be easily concluded that the inspection of all the incoming packets will be inevitable if the probability of occurrence of intrusion is larger than about 2%, as the probability of a clean state is practically zero at all points past that threshold. For P_H < 2%, the probability of a clean state is substantial, and, as will be demonstrated in the following section, it is possible to inspect packets selectively under such conditions without sacrificing security. This is clearly a better alternative to the strategy of "inspect all packets, or drop packets randomly" that is currently being implemented in IDS software solutions.
In view of the above, two questions must now be answered: first, how the probability P_H will be determined at any given time; and second, if P_H is below the threshold determined by Eq. (13), what kind of packet sampling strategy must be used to ensure that intrusion would still be detected if it occurs (see note a). In 2008, Androulidakis and Papavassiliou [11, 12] demonstrated experimentally for the first time that under certain conditions, the selective inspection of packets for the purpose of detecting network intrusion can be as effective as the full inspection of all packets. We shall now demonstrate that the Androulidakis-Papavassiliou criterion corresponds with the conclusions reached above.
As can be seen from the graph, the calculated value of P_H increased from less than 1% to more than 90% over the period of approximately 1 min during which the attack was simulated. This corresponds very well with the data shown in Figure 2. Clearly, the formula in Eq. (16) for calculating the instantaneous value of P_H correlates with the analysis of the previous section.
As suggested by Eq. (13) and Figure 2, full inspection of all the packets should be implemented by the IDS if the value of P_H is above a calculated threshold, and selective inspection is conceivably possible if P_H is below the threshold. Under heavy traffic conditions and low probability of occurrence of intrusion, such a solution is obviously very desirable. We shall now answer the important question of what kind of packet sampling strategy must be used to ensure that intrusion would still be detected if it occurs. A number of studies have differentiated between packet-based sampling and flow-based sampling [25–27]. In packet-based sampling, packets are selected from the global traffic using a pre-specified method. In flow-based sampling, packets are first classified into flows. A "flow" is defined as a set of packets that have in common the following packet header fields: source IP address, destination IP address, source port, destination port, and protocol. The published studies, particularly the studies by Barford et al. and Sridharan et al., have shown that small flows (flows that consist of 1-4 packets) are usually the source of most network attacks. Androulidakis and Papavassiliou have in fact advocated and demonstrated the success of the selective inspection of packets from small flows in their experimental investigation. According to that approach, flows that consist of 1-4 packets are fully inspected, and larger flows are inspected with a sampling frequency that is inversely proportional to their size (see ref. ). We now give a rigorous proof that such a technique for the selective inspection of packets guarantees that intrusion will be detected if it occurs:
Lemma: If the selective inspection of packets with a sampling probability that favors small flows is implemented, the probability of detecting a network intrusion is approximately equal to 1.
For sufficiently large N (e.g., N > 100) and sufficiently large P (e.g., P > 0.01, or 1%), the above summation is approximately equal to 1. If packets are selected predominantly from small flows, P is guaranteed to be substantially higher than 1% (a port scan probe, for instance, consists of only one packet). ■
To summarize the above conclusions, a modern, efficient IDS should selectively inspect packets such that small flows (flows that consist of 1-4 packets) are fully inspected, and larger flows are inspected with a frequency that is inversely proportional to their size. The probability of occurrence of intrusion P_H should be calculated in real time by using Eq. (16). For calculating P_H, only the packet headers need to be inspected (see the discussion in the previous section), and the probabilities of occurrence of the source/destination IP address, the source/destination port, and/or the protocol must be calculated and used in Eq. (16). If at any time P_H exceeds a suitable threshold that is calculated from Eq. (13), the IDS must switch immediately to the full inspection of the content of all the packet traffic and quarantine any packets that are found to be malicious.
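The sampling rule summarized above can be sketched in a few lines of Python. This is a minimal batch illustration, not the code used in the experiments: the Header record, the function names, and the proportionality constant scale are assumptions introduced here, and flow sizes are counted over a complete capture rather than updated as packets arrive.

```python
import random
from collections import Counter, namedtuple

# Hypothetical header record; the five fields are those that define a flow.
Header = namedtuple("Header", "src_ip dst_ip src_port dst_port protocol")

SMALL_FLOW_MAX = 4  # flows of 1-4 packets are fully inspected


def sampling_probability(flow_size, scale=SMALL_FLOW_MAX):
    """Probability of inspecting a packet that belongs to a flow of this size.

    Small flows are always inspected. Larger flows are inspected with
    probability scale / flow_size, i.e. inversely proportional to their size.
    The value of `scale` is an assumption, chosen so the rule is continuous
    at the small-flow boundary.
    """
    if flow_size < 1:
        raise ValueError("flow_size must be >= 1")
    if flow_size <= SMALL_FLOW_MAX:
        return 1.0
    return min(1.0, scale / flow_size)


def select_packets(headers, rng=None):
    """Return the headers selected for inspection from a captured batch."""
    rng = rng or random.Random()
    flow_sizes = Counter(headers)  # the 5-tuple itself serves as the flow key
    return [h for h in headers
            if rng.random() < sampling_probability(flow_sizes[h])]
```

For example, a capture that contains one 3-packet flow and one 400-packet flow yields all 3 packets of the small flow and, on average, about 4 packets of the large one. An online implementation would have to estimate flow sizes incrementally, which this sketch does not attempt.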
As shown, malicious traffic was generated from a Linux machine on which two different packet-generation programs were installed: IDSWakeup and D-ITG. These programs make use of the Linux kernel to generate packets at speeds of up to 1 Gb/s. The main purpose of IDSWakeup is to generate false intrusive attacks that mimic well-known ones (e.g., Denial of Service (DoS) attacks, port scan, and worm propagation), in order to determine how the IDS detects and responds to those attacks. D-ITG (which stands for Distributed Internet Traffic Generator), on the other hand, is a simple but very versatile packet generator that can generate packets of different sizes and different inter-departure times. The packet-generation machine is equipped with a 3 GHz Pentium 4 processor, 4 GB of RAM, and a 1 Gb/s network interface card. The malicious traffic generated was merged with regular Internet traffic through a Cisco router and directed to the corporate LAN, as shown.
A simple IDS software solution was developed for implementing the inspection strategy described above. The code was developed in Matlab and converted to C (for brevity, the details of the code will not be discussed here). Essentially, the code inspects the headers of the packets in small flows (flows that are 1-4 packets in length). The headers of packets in larger flows are inspected with a frequency that is inversely proportional to the size of the flow, as described in the previous section. After 100 packets are selected, the code computes P_H from Eq. (16) for three different attack scenarios: DoS, port scan, and worm propagation. If P_H is found to have exceeded a suitable threshold that is calculated from Eq. (13), the code immediately moves to full inspection mode, where the actual contents of the packets selected and all subsequent packets are inspected for the presence of well-known patterns [11, 28, 29]. Any packets that are found to be malicious are quarantined (see note c). Throughout each test conducted, the number of user processes n running on the LAN was purposely maintained at a constant value (according to the theory in Section 2, the higher the value of n, the lower the threshold that must be used).
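The decision step described above can be illustrated as follows. The notes indicate that Eq. (16) sets P_H equal to a normalized entropy, H(normalized), so this sketch assumes the Shannon entropy of a single header field over the 100 selected packets, divided by its maximum possible value. The function names, the choice of header field, and the example threshold of 0.02 (the approximate value read from the n = 100 curve) are hypothetical and should not be read as the code used in the tests.

```python
import math
from collections import Counter

WINDOW = 100  # number of selected packets per estimate, as in the text


def normalized_entropy(values):
    """Shannon entropy of the observed values divided by log2(len(values)).

    The result lies in [0, 1]: 0 when every value is identical and 1 when
    every value is distinct. Used here as an assumed stand-in for Eq. (16).
    """
    n = len(values)
    counts = Counter(values)
    if n < 2 or len(counts) == 1:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def choose_mode(field_values, threshold):
    """Return (mode, p_h) for one window of selected header-field values.

    mode is "full" when the estimate exceeds the threshold (inspect the
    contents of all packets) and "selective" otherwise.
    """
    p_h = normalized_entropy(field_values)
    return ("full" if p_h > threshold else "selective"), p_h


if __name__ == "__main__":
    # Port scan: every selected packet targets a different destination port.
    print(choose_mode(list(range(1, WINDOW + 1)), threshold=0.02))
    # Ordinary traffic concentrated on a single service port.
    print(choose_mode([443] * WINDOW, threshold=0.02))
```

Which header field carries the signal depends on the attack scenario. A port scan spreads the destination ports, whereas a DoS attack may concentrate the destination address, so a practical estimator would combine several fields.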
As Figure 10 shows, for a link speed of 10 Mb/s, the percentage of hostile packets that slipped through SNORT was essentially the same as the percentage shown in Figure 5. The percentage increases slightly at higher link speeds and reaches a maximum of about 0.2% (or 2 packets for every 1,000 malicious packets) at a link speed of 1 Gb/s. As the results clearly show, the effect of the link speed on this intrusion detection approach is essentially negligible.
The analysis of network intrusions as Markov chains disproves the common notion that it is necessary to fully inspect every packet entering a network in order to ensure security. The results shown here fully support the experimental results that were published recently by Androulidakis and Papavassiliou [11, 12]. The analysis, together with the testing data, demonstrates that it is sufficient to inspect only a small number of packets sampled predominantly from small flows, as long as the probability of occurrence of intrusion P_H, calculated in real time from Eq. (16), is below a critical threshold that is determined from Eq. (13). The implications of the research presented here for software IDS solutions such as SNORT are substantial, as the selective inspection of packets allows the IDS to handle high speed links without dropping any packets. Hence, for most of the time, it is possible to eliminate the speed bottleneck problem without compromising security.
a. Here, it is important to point out that a "process", as defined in the previous section, can be started with one or more packets. The procedure for calculating P_H, however, will be based on the direct inspection of packets.
b. It can be argued that the relationship between P_H and H(normalized) should be a proportionality relationship, not an exact equality as shown in Eq. (16). However, the objective of this work is to obtain a reasonable estimate for the likelihood of the occurrence of intrusion, not to seek idealized, precise mathematical relationships. In reality, due to the nature of the problem, the mathematical framework presented here is not meant to be highly precise, but it can be made sufficiently precise with the inclusion of experimental data.
c. It is to be pointed out that DoS attacks can be identified only from the packet headers.
d. SNORT is a registered trademark of Sourcefire, Inc.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.