Intrusion detection model based on selective packet sampling
© Bakhoum; licensee Springer. 2011
Received: 8 January 2011
Accepted: 19 September 2011
Published: 19 September 2011
Recent experimental work by Androulidakis and Papavassiliou (IET Commun 2(3):399, 2008; IEEE Netw 23(1):6, 2009) has shown that it is possible to maintain a high level of network security while selectively inspecting packets for the existence of intrusive activity, thereby resulting in a minimal amount of processing overhead. In this paper, a statistical approach for the modeling of network intrusions as Markov processes is introduced. The theoretical findings presented here confirm the earlier experimental results of Androulidakis and Papavassiliou. A common notion about network intrusion detection systems is that every packet arriving into a network must be inspected in order to prevent intrusions. This investigation, together with the earlier experimental results, disproves that notion. Additional experimental testing of a corporate local area network is reported.
Keywords: Network Intrusion; Intrusion Detection System; IP Packets; Markov Process; Birth and Death Model
1. Introduction
Network intrusion detection systems (IDS) play a vital role in protecting networks connected to the World Wide Web from malicious attacks. Traditionally, IDS software products such as SNORT [1], SecureNet [2], and Hogwash [3] work by monitoring traffic at the network choke-point, where every incoming IP packet is analyzed for suspicious patterns that may indicate hostile activity. Because those software systems must match packets against thousands of known malicious patterns, they must work extremely fast. Under heavy traffic, however, the IDS is usually forced to drop packets so that it does not itself become the bottleneck of the network, at the risk of allowing an attack to go undetected. Because of this deficiency, host-based IDS solutions have been introduced [4, 5]. Host-based IDS products run on a server rather than at the network gateway. Unfortunately, host-based solutions can slow down the server considerably under heavy traffic conditions. Because of the inherent limitations of all software solutions, hardware solutions were eventually introduced. The state-of-the-art hardware solution is a field programmable gate array (FPGA) that performs the same IDS function at substantially higher speeds [6, 7]. There are other serious problems to contend with, however, when hardware solutions are implemented.
The purpose of this paper is to introduce an analytic and statistical model for the process of network intrusion and to demonstrate that the common notion that the content of every IP packet must be inspected is flawed. Numerous research articles that address the problem of network intrusion modeling have appeared in the literature [8–19]. Kephart and White [8, 9] published the first analytical work on the modeling of the propagation of viruses and worms. More recently, Wang and Wang [10], guided by the analysis of Kephart and White, recognized that the problem of network intrusion can be modeled after the popular "birth and death" epidemiological model. Wang and Wang (WW), however, did not develop such a model analytically, as the problem is mathematically challenging. Very recently, an important experimental discovery was made by Androulidakis and Papavassiliou (AP) [11, 12], who demonstrated that the selective inspection of packets for the purpose of detecting network intrusion can be as effective as the full inspection of all packets. In this paper, it will be demonstrated that the seemingly unrelated discoveries of WW and AP do in fact stem from the same mathematical origin. More specifically, the WW hypothesis that the process of network intrusion can be modeled after the "birth and death" epidemiological model will be developed analytically for the first time. The results are surprising and essentially confirm the experimental findings of AP. The main conclusion is that it is possible to selectively inspect packets from only certain packet flows, thereby eliminating the speed bottleneck problem and the necessity to drop packets at high bit rates, while simultaneously maintaining a high degree of network security.
Testing by the author on a corporate local area network has confirmed the theoretical findings. Additional testing of an optimized SNORT software package, in combination with a traffic generator and an Agilent network analyzer, has further confirmed the theoretical findings. The implications of these theoretical and experimental results for the structure and design of future IDS will be quite substantial.
2. Statistical model of network intrusion
The analysis that will now be developed is based on the observation that the birth and death model of network intrusion that was advocated by Wang and Wang belongs to the class of Markov processes [20, 21]. By applying Markov chain analysis to the process of network intrusion, a statistical formula that relates the probability of a network being compromised to the probability of occurrence of intrusion will be obtained. In the following sections, it will be demonstrated that it is possible to selectively inspect packets arriving at a network while maintaining a high degree of security, as long as such inspection is performed in accordance with the statistical formula.
Let:
b be the birth rate (or initiation rate) of new processes on the network at any given time;
d be the death rate (or termination rate) of processes;
$P_i$ be the probability that the network is in state $S_i$;
$P_H$ be the probability that any new process started on the network is a hostile process. (This probability is an independent variable that strongly depends on the circumstances. Its numerical value is calculated as described in Section 3.)
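For orientation, the stationary distribution of a generic birth and death chain can be written out explicitly. The block below is the standard textbook result, not the derivation of this paper: it assumes constant aggregate rates b and d with b < d, an unbounded number of states, and that state $S_i$ means that i processes are active. Those assumptions are illustrative only.

```latex
% Detailed balance between neighbouring states S_i and S_{i+1}
b \, P_i = d \, P_{i+1}
\quad\Longrightarrow\quad
P_i = \left(\frac{b}{d}\right)^{i} P_0 , \qquad i = 0, 1, 2, \dots

% Normalisation of the probabilities (requires b < d)
\sum_{i=0}^{\infty} P_i = 1
\quad\Longrightarrow\quad
P_0 = 1 - \frac{b}{d} ,
\qquad
P_i = \left(1 - \frac{b}{d}\right) \left(\frac{b}{d}\right)^{i}
```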
The number n in Eq. (13) and in the figure is the number of processes initiated by users and does not include system processes or background tasks, since the probability of occurrence of intrusion is associated with user processes only. The inverse exponential relationship of Eq. (13) is remarkable because, as is well known, it can be approximated in practice by a linear function that drops to zero at a specific threshold. For the n = 100 curve, for example, the inspection of all incoming packets becomes unavoidable if the probability of occurrence of intrusion is larger than about 2%, as the probability of a clean state is practically zero at all points past that threshold. For $P_H$ < 2%, the probability of a clean state is substantial, and, as will be demonstrated in the following section, it is possible to inspect packets selectively under such conditions without sacrificing security. This is clearly a better alternative to the strategy of "inspect all packets, or drop packets randomly" that is currently implemented in IDS software solutions.
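The dependence of the clean-state probability on $P_H$ and n can be explored numerically. Eq. (13) is not reproduced in this text, so the sketch below uses a simplified stand-in: it assumes that each of the n user processes is hostile independently with probability $P_H$, which gives a clean-state probability of (1 - P_H)^n. The function names and the cut-off `min_clean` are hypothetical choices made for illustration.

```python
def clean_state_probability(p_hostile: float, n_processes: int) -> float:
    """Probability that none of n processes is hostile.

    Assumption (not Eq. (13) itself): processes are hostile independently,
    each with probability p_hostile.
    """
    if not 0.0 <= p_hostile <= 1.0:
        raise ValueError("p_hostile must lie in [0, 1]")
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    return (1.0 - p_hostile) ** n_processes


def inspection_threshold(n_processes: int, min_clean: float = 0.1) -> float:
    """Largest p_hostile for which the clean-state probability is still
    at least min_clean (a hypothetical cut-off) under the same assumption."""
    if not 0.0 < min_clean < 1.0:
        raise ValueError("min_clean must lie strictly between 0 and 1")
    return 1.0 - min_clean ** (1.0 / n_processes)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, inspection_threshold(n))
```

Under this stand-in the threshold falls as n grows, which matches the qualitative statement that a larger number of user processes calls for a lower threshold.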
3. Optimization of network intrusion detection systems
In view of the above, two questions must now be answered: first, how the probability $P_H$ is to be determined at any given time; and second, if $P_H$ is below the threshold determined by Eq. (13), what kind of packet sampling strategy must be used to ensure that intrusion is still detected if it occurs (see endnote a). In 2008, Androulidakis and Papavassiliou [11, 12] demonstrated experimentally for the first time that, under certain conditions, the selective inspection of packets for the purpose of detecting network intrusion can be as effective as the full inspection of all packets. We shall now demonstrate that the Androulidakis-Papavassiliou criterion is consistent with the conclusions reached above.
3.1. The connection between $P_H$ and the Androulidakis-Papavassiliou criterion
As can be seen from the graph, the calculated value of $P_H$ increased from less than 1% to more than 90% during the approximately 1 min for which the attack was simulated. This corresponds very well with the data shown in Figure 2. The formula in Eq. (16) for calculating the instantaneous value of $P_H$ is therefore consistent with the analysis of the previous section.
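Eq. (16) is not reproduced in this text, but the note accompanying it states that $P_H$ is taken equal to a normalized entropy, H(normalized), and Section 3.2 states that it is computed from the probabilities of occurrence of packet header fields. A minimal sketch of such a quantity follows. It assumes Shannon entropy over one header field in a window of sampled packets, normalized by the logarithm of the window size; both the choice of normalization and the function name are assumptions, and the sketch does not settle which field or which direction of change signals an attack.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def normalized_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy of the empirical distribution of `values`,
    divided by log2(window size) so that the result lies in [0, 1].

    0.0 means every sampled packet carried the same value;
    1.0 means every sampled packet carried a different value.
    """
    window = list(values)
    n = len(window)
    if n < 2:
        return 0.0  # entropy is not informative for fewer than two samples
    counts = Counter(window)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(n)


if __name__ == "__main__":
    # Hypothetical destination ports seen in a window of sampled headers.
    print(normalized_entropy([80, 80, 443, 443]))
```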
3.2. The principle of selective packet inspection
As suggested by Eq. (13) and Figure 2, full inspection of all packets should be implemented by the IDS if the value of $P_H$ is above a calculated threshold, and selective inspection is possible if $P_H$ is below the threshold. Under heavy traffic conditions and a low probability of occurrence of intrusion, such a solution is very desirable. We shall now answer the important question of what kind of packet sampling strategy must be used to ensure that intrusion is still detected if it occurs. A number of studies have differentiated between packet-based sampling and flow-based sampling [25–27]. In packet-based sampling, packets are selected from the global traffic using a pre-specified method. In flow-based sampling, packets are first classified into flows. A "flow" is defined as a set of packets that have in common the following packet header fields: source IP address, destination IP address, source port, destination port, and protocol. The published studies, particularly those by Barford et al. [28] and Sridharan et al. [29], have shown that small flows (flows that consist of 1-4 packets) are usually the source of most network attacks. Androulidakis and Papavassiliou have in fact advocated and demonstrated the success of the selective inspection of packets from small flows in their experimental investigation. According to that approach, flows that consist of 1-4 packets are fully inspected, and larger flows are inspected with a sampling frequency that is inversely proportional to their size (see the Androulidakis-Papavassiliou references [11, 12]). We now give a proof that such a technique for the selective inspection of packets guarantees that intrusion will be detected if it occurs:
Lemma: If the selective inspection of packets with a sampling probability that favors small flows is implemented, the probability of detecting a network intrusion is approximately equal to 1.
For sufficiently large N (e.g., N > 100) and sufficiently large P (e.g., P > 0.01, or 1%), the above summation is approximately equal to 1. If packets are selected predominantly from small flows, P is guaranteed to be substantially higher than 1% (a port scan probe, for instance, consists of only one packet). ■
To summarize the above conclusions, a modern, efficient IDS should selectively inspect packets such that small flows (flows that consist of 1-4 packets) are fully inspected, and larger flows are inspected with a frequency that is inversely proportional to their size. The probability of occurrence of intrusion $P_H$ should be calculated in real time by using Eq. (16). For calculating $P_H$, only the packet headers need to be inspected (see the discussion in the previous section): the probabilities of occurrence of the source/destination IP address, the source/destination port, and/or the protocol are calculated and used in Eq. (16). If at any time $P_H$ exceeds a suitable threshold that is calculated from Eq. (13), the IDS must switch immediately to the full inspection of the content of all the packet traffic and quarantine any packets that are found to be malicious.
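The flow-based sampling rule summarized above can be sketched in a few lines. The sketch assumes a batch of already-captured packets, so that the size of each flow is known before sampling; a live IDS would have to estimate flow sizes incrementally. The `Packet` type, the function names, and the proportionality constant (chosen so that the sampling probability is continuous at a flow size of 4) are hypothetical and are not taken from the code used in this work.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

SMALL_FLOW_MAX = 4  # flows of 1-4 packets are fully inspected


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    payload: bytes = b""


def flow_key(pkt: Packet) -> Tuple:
    """The five header fields that define a flow."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def sampling_probability(flow_size: int) -> float:
    """1.0 for small flows, otherwise inversely proportional to flow size.
    The constant SMALL_FLOW_MAX is an assumed proportionality factor."""
    if flow_size <= SMALL_FLOW_MAX:
        return 1.0
    return SMALL_FLOW_MAX / flow_size


def select_packets(packets: Iterable[Packet],
                   rng: Optional[random.Random] = None) -> List[Packet]:
    """Group packets into flows, then sample each flow at its own rate."""
    if rng is None:
        rng = random.Random()
    flows: Dict[Tuple, List[Packet]] = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    selected: List[Packet] = []
    for flow in flows.values():
        p = sampling_probability(len(flow))
        # rng.random() is in [0, 1), so p == 1.0 selects every packet.
        selected.extend(pkt for pkt in flow if rng.random() < p)
    return selected
```

With this choice of constant, a flow of 400 packets contributes about 4 inspected packets on average, the same number as a fully inspected flow of 4 packets.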
3.3. Testing of the proposed IDS approach
As shown, malicious traffic was generated from a Linux machine on which two different packet-generation programs were installed: IDSWakeup [31] and D-ITG [32]. These programs make use of the Linux kernel to generate packets at speeds of up to 1 Gb/s. The main purpose of IDSWakeup is to generate false attacks that mimic well-known ones (e.g., Denial of Service (DoS) attacks, port scan, and worm propagation), in order to determine how the IDS detects and responds to those attacks. D-ITG (which stands for Distributed Internet Traffic Generator), on the other hand, is a simple but very versatile packet generator that can generate packets of different sizes and different inter-departure times. The packet-generation machine is equipped with a 3 GHz Pentium 4 processor, 4 GB of RAM, and a 1 Gb/s network interface card. The malicious traffic generated was merged with regular Internet traffic through a Cisco router and directed to the corporate LAN, as shown. A simple IDS software solution was developed for implementing the inspection strategy described above. The code was developed in Matlab and converted to C (for brevity, the details of the code will not be discussed here). Essentially, the code inspects the headers of the packets in small flows (flows that are 1-4 packets in length). The headers of packets in larger flows are inspected with a frequency that is inversely proportional to the size of the flow, as described in the previous section. After 100 packets are selected, the code computes $P_H$ from Eq. (16) for three different attack scenarios: DoS, port scan, and worm propagation. If $P_H$ is found to have exceeded a suitable threshold that is calculated from Eq. (13), the code immediately moves to full inspection mode, where the actual contents of the selected packets and all subsequent packets are inspected for the presence of well-known patterns [11, 28, 29]. Any packets that are found to be malicious are quarantined (see endnote c). Throughout each test, the number of user processes n running on the LAN was purposely maintained at a constant value (according to the theory in Section 2, the higher the value of n, the lower the threshold that must be used).
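The switching logic described above (collect 100 sampled headers, compute $P_H$, compare it with the threshold, and move to full inspection when the threshold is exceeded) can be sketched as a small controller. This is an illustrative reconstruction, not the Matlab/C code used in the tests. The class name and the `estimate_p_hostile` callable are hypothetical; the callable stands in for Eq. (16), and the threshold stands in for the value obtained from Eq. (13). The text does not say how the code returns to selective mode, so the sketch simply stays in full inspection mode once it has been triggered.

```python
from typing import Any, Callable, List, Sequence


class InspectionController:
    """Switches from selective to full inspection when the estimated
    probability of intrusion exceeds a threshold."""

    def __init__(self,
                 estimate_p_hostile: Callable[[Sequence[Any]], float],
                 threshold: float,
                 window_size: int = 100) -> None:
        self.estimate_p_hostile = estimate_p_hostile  # stand-in for Eq. (16)
        self.threshold = threshold                    # stand-in for Eq. (13)
        self.window_size = window_size
        self.full_inspection = False
        self._window: List[Any] = []

    def observe(self, header: Any) -> bool:
        """Record one sampled packet header.

        Returns True if the IDS should now inspect packet contents in full.
        """
        if self.full_inspection:
            return True
        self._window.append(header)
        if len(self._window) >= self.window_size:
            p_hostile = self.estimate_p_hostile(self._window)
            self._window = []  # start a fresh window of sampled headers
            if p_hostile > self.threshold:
                self.full_inspection = True
        return self.full_inspection
```

A caller would feed each sampled header to `observe` and hand the packet contents to the pattern matcher whenever the return value is True.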
3.4. Testing of an optimized SNORT software package (see endnote d)
As Figure 10 shows, for a link speed of 10 Mb/s, the percentage of hostile packets that slipped through SNORT was essentially the same as the percentage shown in Figure 5. The percentage increases slightly at higher link speeds and reaches a maximum of about 0.2% (or 2 packets for every 1,000 malicious packets) at a link speed of 1 Gb/s. As the results show, the effect of the link speed on this intrusion detection approach is essentially negligible.
4. Conclusion
The analysis of network intrusions as Markov chains disproves the common notion that it is necessary to fully inspect every packet entering a network in order to ensure security. The results shown here fully support the experimental results that were published recently by Androulidakis and Papavassiliou [11, 12]. The analysis, together with the testing data, demonstrates that it is sufficient to inspect only a small number of packets sampled predominantly from small flows, as long as the probability of occurrence of intrusion $P_H$, calculated in real time from Eq. (16), is below a critical threshold that is determined from Eq. (13). The implications of the research presented here for software IDS solutions such as SNORT are substantial, as the selective inspection of packets allows the IDS to handle high-speed links without dropping any packets. Hence, for most of the time it is possible to eliminate the speed bottleneck problem without compromising security.
Endnotes
a. Here, it is important to point out that a "process", as defined in the previous section, can be started with one or more packets. The procedure for calculating $P_H$, however, will be based on the direct inspection of packets.
b. It can be argued that the relationship between $P_H$ and H(normalized) should be a proportionality relationship, not an exact equality as shown in Eq. (16). However, the objective of this work is to obtain a reasonable estimate for the likelihood of the occurrence of intrusion, not to seek idealized, precise mathematical relationships. In reality, due to the nature of the problem, the mathematical framework presented here is not meant to be highly precise, but it can be made sufficiently precise with the inclusion of experimental data.
c. It is to be pointed out that DoS attacks can be identified only from the packet headers.
d. SNORT is a registered trademark of Sourcefire, Inc.
References
1. Sourcefire, Inc.: Snort: The Open Source Network Intrusion Detection System. 2007. [http://www.snort.org]
2. Secutrain, Inc.: SecureNet Pro: Protection Against Internet Security Threats. 2007. [http://www.intrusion.com]
3. Hogwash Intrusion Detection System. 2007. [http://hogwash.sourceforge.net/]
4. Symantec, Inc.: Symantec Host IDS: Scalable Intrusion Detection and Prevention Solution for Critical Servers. 2007. [http://www.symantec.com]
5. Checkpoint Ltd.: IPS1: Robust and Accurate Intrusion Prevention. 2007. [http://www.checkpoint.com]
6. Weaver N, Paxson V, Gonzalez JM: The shunt: an FPGA-based accelerator for network intrusion prevention. Proc. 15th Ann. ACM Intl Symp. Field-Programmable Gate Arrays (FPGA 07) 2007, 292.
7. Hwang WJ, Roan HC, Shih YN, DanLo CT, Ou CM: FPGA-based ROM-free network intrusion detection using shift-or circuit. J Embedded Comput 2009, 3(2):99.
8. Kephart JO, White SR: Directed graph epidemiological models of computer viruses. Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy 1991, 343.
9. Kephart JO, White SR: Measuring and modeling computer virus prevalence. Proceedings of the 1993 IEEE Computer Society Symposium on Research in Security and Privacy 1993, 2.
10. Wang Y, Wang C: Modeling the effects of timing parameters on virus propagation. Proceedings of the 2003 ACM Workshop on Rapid Malcode 2003, 61.
11. Androulidakis G, Papavassiliou S: Improving network anomaly detection via selective flow-based sampling. IET Commun 2008, 2(3):399. doi:10.1049/iet-com:20070231
12. Androulidakis G, Chatzigiannakis V, Papavassiliou S: Network anomaly detection and classification via opportunistic sampling. IEEE Netw 2009, 23(1):6.
13. Vert G, Frincke DA, McConnell JC: A visual mathematical model for intrusion detection. Proceedings of the 21st NIST-NCSC National Information Systems Security Conference 1998, 1.
14. Zhang Z, Li J, Manikopoulos C, Jorgenson J, Ucles J: A hierarchical anomaly network intrusion detection system using neural network classification. Proceedings of the 2nd Annual IEEE Systems, Man, Cybernetics Information Assurance Workshop (IAW 2001) 2001, 6.
15. Kodialam M, Lakshman TV: Detecting network intrusions via sampling: a game theoretic approach. INFOCOM--22nd Annual Joint Conference of the IEEE Computer and Communications Societies 2003, 1880.
16. Song H, Lockwood JW: Multi-pattern signature matching for hardware network intrusion detection systems. GLOBECOM--IEEE Global Telecommunications Conference 2005, 5.
17. Subhadrabandhu D, Sarkar S, Anjum F: A framework for misuse detection in ad hoc networks--Part I. IEEE J Sel Areas Commun 2006, 24(2):274.
18. Subhadrabandhu D, Sarkar S, Anjum F: A framework for misuse detection in ad hoc networks--Part II. IEEE J Sel Areas Commun 2006, 24(2):290.
19. Jin S, Yeung DS, Wang X: Network intrusion detection in covariance feature space. Pattern Recogn 2007, 40(8):2185. doi:10.1016/j.patcog.2006.12.010
20. Sauer CH, Chandy KM: Computer Systems Performance Modeling. Prentice Hall, Englewood Cliffs, NJ; 1981.
21. Kobayashi H: Modeling and Analysis: an Introduction to System Performance Evaluation Methodology. Addison Wesley, Reading, MA; 1978.
22. Reza FM: An Introduction to Information Theory. Dover, New York, NY; 1994.
23. Cover TM, Thomas JA: Elements of Information Theory. Wiley, New York, NY; 1999.
24. Moore D, et al.: Inside the slammer worm. IEEE Sec Privacy 2003, 1(4):33. doi:10.1109/MSECP.2003.1219056
25. Hohn N, Veitch D: Inverting sampled traffic. IEEE/ACM Trans Netw 2006, 14(1):68.
26. Mai J, et al.: Impact of packet sampling on portscan detection. IEEE J Sel Areas Commun 2006, 24(12):2285.
27. Mai J, et al.: Is sampled data sufficient for anomaly detection. Internet Measurement Conf., Rio de Janeiro, Brazil 2006, 165.
28. Barford P, Plonka D: Characteristics of network traffic flow anomalies. Proceedings of the 1st ACM SIGCOMM Internet Measurement Wksp., San Francisco, CA 2001, 69.
29. Sridharan A, Ye T, Bhattacharyya S: Connectionless Port Scan Detection on the Backbone. IEEE IPCCC Malware Wksp., Phoenix, AZ 2006, 1.
30. Peebles PZ: Probability, Random Variables, and Random Signal Principles. McGraw Hill, New York, NY; 1993.
31. IDS Wakeup: A collection of tools for testing network intrusion detection systems. 2007. [http://www.hsc.fr/ressources/outils/idswakeup/index.html.en]
32. Botta A, Dainotti A, Pescape A: Multi-Protocol and Multi-Platform Traffic Generation and Measurement. IEEE INFOCOM, Anchorage, Alaska 2007, 12. [http://www.grid.unina.it/software/ITG/]
33. Lan K, Hussain A, Dutta D: Effect of malicious traffic on the network. Proceeding of Passive and Active Measurement Workshop (PAM) 2003, 1.
34. Koziol J: Intrusion Detection with SNORT. Pearson Education, Upper Saddle River, NJ; 2003.
35. Cox K, Gerg C: Managing Security with SNORT and IDS Tools. O'Reilly Media, Sebastopol, CA; 2004.
36. Lee W, Cabrera JB, Thomas A, Balwalli N, Saluja S, Zhang Y: Performance adaptation in real-time intrusion detection systems. Proceedings of the Fifth International Symposium on Recent Advances in Intrusion Detection (RAID 2002), Lecture Notes in Computer Science, Zurich, Switzerland 2002.
37. Schaelicke L, Slabach T, Moore B, Freeland C: Characterizing the performance of network intrusion detection sensors. Proceedings of the Sixth International Symposium on Recent Advances in Intrusion Detection (RAID 2003), Lecture Notes in Computer Science, Berlin-Heidelberg-New York 2003.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.