Software and Data

Prof. Barford's research is funded in part by the DHS IMPACT program. IMPACT is focused on providing diverse data sets for security and networking research. Check out the IMPACT Portal for data sets that may be of use to you.

Unless otherwise noted, all of the software and data described below can be found here.

Adscape: Understanding online Display Advertising

Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In order to understand more broadly the features, mechanisms and dynamics of display advertising (i.e., the Adscape), we have developed capabilities to gather online advertising data. Our approach takes the perspective of users who are the targets of display ads shown on web sites. We developed a scalable crawling capability that enables us to gather the details of display ads including creatives and landing pages. Our crawling strategy is focused on maximizing the number of unique ads harvested. Of critical importance to our work is the recognition that a user’s profile (i.e., browser profile and cookies) can have a significant impact on which ads are shown. The software repository includes both our crawler and the ad recognition modules. The code repository can be found here. Generation of user profiles requires following the methodology described in our WWW '14 paper.

P. Barford, I. Canadi, D. Krushevskaja, Q. Ma and S. Muthukrishnan. "Adscape: Harvesting and Analyzing Online Display Ads", In Proceedings of the World Wide Web Conference (WWW '14), April, 2014. (paper)

Internet Atlas

Internet Atlas is a visualization and analysis portal for diverse Internet measurement data. The starting point for Atlas is a geographically anchored representation of the physical Internet including (i) nodes (e.g., hosting facilities and data centers), (ii) conduits/links that connect these nodes, and (iii) relevant meta data (e.g., source provenance). This physical representation is built by using search to identify primary source data such as maps and other repositories of service provider network information. This data is then carefully entered into the database using a combination of manual and automated processes including consistency checks and methods for geocoding both node and link data. Data is added to the repository on an on-going basis. The repository currently contains over 10K PoP locations and over 13K links for over 390 networks around the world. Customized interfaces enable a variety of dynamic (e.g., BGP updates, Twitter feeds and weather updates) and static (e.g., highway, rail and census) data to be imported into Atlas and layered on top of the physical representation. The openly available web portal is based on the widely-used ArcGIS geographic information system, which enables visualization and diverse spatial analyses of the data.

R. Durairajan, S. Ghosh, X. Tang, P. Barford and B. Eriksson. "Internet Atlas: A Geographic Database of the Internet", In Proceedings of the 5th ACM HotPlanet Workshop, August, 2013. (paper)

B. Eriksson, R. Durairajan and P. Barford. "RiskRoute: A Framework for Mitigating Network Outage Threats", In Proceedings of ACM CoNEXT, December, 2013. (paper)

Path Audit and Energy Audit

In our quest to understand the detailed characteristics of network devices (e.g., routers and switches) deployed in networks, we have developed two tools that allow us to gather unique information from nodes. The first tool - PathAudit - interprets the interface DNS names gathered from traceroute probes to identify key information such as interface type, bandwidth, and manufacturer. The second tool - EnergyAudit - assumes that one has credentials to gather information from devices within a domain and focuses specifically on identifying the energy consumption of devices in a network. The code for both tools can be found here.
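
As a rough illustration of the kind of decoding PathAudit performs, the following Python sketch maps the leading label of a router interface DNS name to an interface type, a nominal bandwidth, and a vendor naming style. The prefix table, the function name decode_interface_name, and the example host names are assumptions made for illustration only; they are not PathAudit's actual rule set, which is described in the HotPlanet paper.

```python
import re

# Illustrative prefix table (an assumption, not PathAudit's actual rules).
# Each entry: prefix -> (interface type, nominal bandwidth, vendor naming style).
PREFIX_HINTS = {
    "ge": ("Gigabit Ethernet", "1 Gbps", "Juniper-style"),
    "xe": ("10-Gigabit Ethernet", "10 Gbps", "Juniper-style"),
    "so": ("SONET/SDH", None, "Juniper-style"),
    "gi": ("Gigabit Ethernet", "1 Gbps", "Cisco-style"),
    "te": ("10-Gigabit Ethernet", "10 Gbps", "Cisco-style"),
}

# Matches labels such as "xe-1-0-0" or "te0-1-0": letters, then slot/port numbers.
LABEL_RE = re.compile(r"^([a-z]+)-?\d+(?:-\d+)*$")


def decode_interface_name(dns_name):
    """Return a dict of hints decoded from the first DNS label, or None."""
    first_label = dns_name.lower().split(".")[0]
    match = LABEL_RE.match(first_label)
    if match is None:
        return None
    hint = PREFIX_HINTS.get(match.group(1))
    if hint is None:
        return None
    interface_type, bandwidth, vendor_hint = hint
    return {
        "interface_type": interface_type,
        "bandwidth": bandwidth,
        "vendor_hint": vendor_hint,
    }


if __name__ == "__main__":
    # Hypothetical names of the sort a traceroute might return.
    for name in ("xe-1-0-0.cr1.chi.example.net", "www.example.com"):
        print(name, decode_interface_name(name))
```

A table-driven design like this keeps the naming conventions separate from the parsing logic, so operator-specific conventions can be added without changing the code.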

Joe Chabarek and Paul Barford. "What's in a Name? Decoding Router Interface Names", In Proceedings of the 5th ACM HotPlanet Workshop, August, 2013. (paper)

Joe Chabarek and Paul Barford. "Energy Audit: Monitoring Power Consumption in Diverse Network Environments", In Proceedings of the 4th International Green Computing Conference (IGCC), June, 2013. (paper)

DNS Query Stream Analysis

The Domain Name System (DNS) is one of the most widely used services in the Internet. We investigated the question of how DNS traffic monitoring can provide an important and useful perspective on network traffic in an enterprise. We organized DNS query streams into three classes: canonical (i.e., RFC-intended behaviors), overloaded (e.g., black-list services), and unwanted (i.e., queries that will never succeed). To do this, we developed a context-aware clustering methodology that can be scaled to expose the desired level of detail of each traffic type, and to expose their time-varying characteristics. We implemented our method in a tool we call TreeTop, which can be used to analyze and visualize DNS traffic in real-time. The code for TreeTop can be found here.
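
To make the three query classes concrete, here is a minimal rule-based Python sketch. It is not TreeTop's context-aware clustering method. It only illustrates the class definitions, and the TLD subset, the blacklist zone dnsbl.example.org, and the function name classify_query are hypothetical.

```python
# Tiny illustrative subset of valid TLDs (an assumption; a real list is far longer).
KNOWN_TLDS = {"com", "net", "org", "edu"}

# Hypothetical blacklist zone standing in for a real DNS-based black-list service.
OVERLOADED_SUFFIXES = ("dnsbl.example.org",)


def classify_query(qname):
    """Label a query name as 'canonical', 'overloaded', or 'unwanted'."""
    name = qname.lower().rstrip(".")
    labels = name.split(".")
    # A name under an unknown TLD can never resolve, so the query is unwanted.
    if labels[-1] not in KNOWN_TLDS:
        return "unwanted"
    # Lookups under a blacklist zone use DNS for something other than name resolution.
    if any(name == s or name.endswith("." + s) for s in OVERLOADED_SUFFIXES):
        return "overloaded"
    return "canonical"


if __name__ == "__main__":
    for q in ("www.wisc.edu", "4.3.2.1.dnsbl.example.org.", "printer.localdomain"):
        print(q, "->", classify_query(q))
```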

Dave Plonka and Paul Barford. "Flexible Traffic and Host Profiling via DNS Rendezvous", In Proceedings of the Workshop on Securing and Trusting Internet Names (SATIN '11), April, 2011. (paper)

Dave Plonka, Paul Barford. Context-aware Clustering of DNS Query Traffic. In Proceedings of ACM Internet Measurement Conference, October, 2008. (paper)

The GENI Instrumentation and Measurement System (GIMS)

GIMS is a high-speed packet capture system designed to run on commodity hardware. GIMS was developed to provide researchers a highly-configurable system for monitoring networking experiments in dedicated environments like GENI (www.geni.net). In a larger sense, however, GIMS is a control framework for network instrumentation, and packet capture is simply the first functionality that this control framework currently provides. The system consists of various web-based GUI components, a coordinating backend suite of applications, a database, one or more hardware capture devices, and a capture daemon which handles the actual capture, processing, and storage of packets. More information on the system is available on the GIMS home page: http://gims.wail.wisc.edu. The GIMS software distribution is also available on the GIMS wiki, which can be found here.

Charles Thomas, Joel Sommers, Paul Barford, Dongchan Kim, Ananya Das, Roberto Segebre and Mark Crovella. "A Passive Measurement System for Network Testbeds", In Proceedings of Tridentcom. June, 2012. (paper)

The fs Network Simulator

fs is a network simulation platform focused on the generation of representative flow records in large network topologies. Traditional network simulators, such as NS2, have focused on packets as the basic abstraction. The fs simulator focuses on flowlets as the basic abstraction. This enables a dramatic enhancement in scalability versus packet-level simulators. The fs simulator has been designed to enable flexible specification of network topologies and traffic characteristics. It also enables a range of anomalous conditions (traffic surges and outages) to be specified. Our investigations show that simulations in fs are identical to simulations in NS2 down to a time granularity of about 1 second. This enables a host of research questions to be addressed using this new platform. Development of fs is ongoing and is aimed at further enhancing scalability and expanding the range of capabilities of the system to make it applicable to a broader set of research domains.

Joel Sommers, Rhys Bowden, Brian Eriksson, Paul Barford, Matt Roughan and Nick Duffield. "Efficient Network-wide Flow Record Generation", In Proceedings of IEEE INFOCOM '11, April, 2011. (paper)

The BasisDetect Anomaly Detection Tool

BasisDetect is a tool that enables network anomalies to be detected in a timely and accurate fashion. The design of BasisDetect is based on the notion that both normal and anomalous network traffic have intrinsic structure in their signal characteristics. BasisDetect enables profiles of normal and abnormal traffic to be composed using labeled examples of each. The tool can then be applied to live traffic (NetFlow records) to identify instances of anomalous traffic that are composites of the training set. This enables broad classes of anomalies to be identified with relatively modest training. Finally, BasisDetect can be applied to anomaly detection from a point source or from network-wide sources.

Brian Eriksson, Paul Barford, Rhys Bowden, Nick Duffield, Joel Sommers and Matt Roughan. "BasisDetect: A Model-based Network Event Detection Framework", In Proceedings of the ACM Internet Measurement Conference (IMC '10), November, 2010. (paper)

The PathPerf TCP Throughput Measurement Tool

PathPerf is a tool for predicting TCP throughput on end-to-end network paths. PathPerf uses a combination of lightweight active measurements and Support Vector Machines to predict throughput. Unlike earlier approaches which focus on bulk transfers, PathPerf supports TCP throughput prediction for transfers of arbitrary file sizes and is able to respond instantaneously to changes in path conditions. PathPerf differs from well-known tools such as M-Lab's NDT speed test, which measures current TCP throughput. PathPerf predicts future throughput and is designed to allow applications to adapt to changes in TCP throughput on short timescales, e.g., by facilitating the selection of the highest throughput path in overlay networks or by allowing a non-TCP application to compute the appropriate TCP-friendly rate.

Mariyam Mirza, Paul Barford, Xiaojin Zhu, Suman Banerjee and Mike Blodgett. "Fingerprinting 802.11 Rate Adaptation Algorithms", In Proceedings of IEEE INFOCOM '11, April, 2011. (paper)

Mirza, Mariyam; Sommers, Joel; Barford, Paul; Zhu, Jerry. A Machine Learning Approach to TCP Throughput Prediction. In Proceedings of ACM SIGMETRICS, June 2007. (paper)

The Badabing Packet Loss Measurement Tool

Badabing is a tool for highly accurate active measurement of end-to-end packet loss properties. Badabing generates a geometrically distributed packet probe stream that measures the frequency and duration of packet loss episodes on a given path. The measurements produced by Badabing are nearly an order of magnitude more accurate than prior methods such as simple Ping or Poisson-modulated probes. The tool has also been enhanced with a method for accurately measuring loss rate, a key characteristic of interest in SLA monitoring. The tool must be installed on both the sending and receiving host.
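
The idea of a geometrically distributed probe stream can be sketched in a few lines of Python. The sketch assumes time is divided into equal slots and that a probe is started in each slot independently with probability p, which makes the gaps between probes geometrically distributed. The function names and parameters are hypothetical, and the sketch omits Badabing's actual probe structure and loss-episode estimators, which are described in the papers listed in this section.

```python
import random


def geometric_probe_slots(num_slots, p, seed=None):
    """Return the slot indices in which a probe is sent.

    Each slot is chosen independently with probability p, so the number of
    slots between consecutive probes follows a geometric distribution.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    rng = random.Random(seed)
    return [slot for slot in range(num_slots) if rng.random() < p]


def probe_gaps(slots):
    """Return the gaps (in slots) between consecutive probes."""
    return [later - earlier for earlier, later in zip(slots, slots[1:])]


if __name__ == "__main__":
    slots = geometric_probe_slots(num_slots=10_000, p=0.1, seed=1)
    gaps = probe_gaps(slots)
    # The mean gap should be close to 1/p for a long enough stream.
    print(len(slots), sum(gaps) / len(gaps))
```

A lower p sends fewer probes and so perturbs the path less, at the cost of needing a longer measurement period to observe the same number of loss episodes.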

Sommers, Joel; Barford, Paul; Duffield, Nick; Ron, Amos. A Geometric Approach to Improving Active Packet Loss Measurement, In IEEE/ACM Transactions on Networking, June, 2008. (paper)

Sommers, Joel; Barford, Paul; Duffield, Nick; Ron, Amos. Improving Accuracy in End-to-end Packet Loss Measurement, In Proceedings of ACM SIGCOMM, August, 2005. (paper)

The Yaz Available Bandwidth Measurement Tool

Yaz is a tool for low impact, highly accurate active measurement of end-to-end available bandwidth. Yaz sends groups of packets at increasingly higher rates along a target path and measures the changes in packet spacing at the receiver. Similar to the Pathload tool, packet compression at the receiver is an indication that available bandwidth has been reached, but unlike Pathload, packet expansion also indicates available bandwidth has been reached. Yaz accomplishes these measurements with a minimal number of packets per group thereby limiting the impact on the path under test, and has been shown to provide more accurate measurements than prior tools under a wide variety of network conditions. The tool must be installed on both the sending and receiving host.
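
The core comparison described above, sender-side versus receiver-side packet spacing, can be illustrated with a short Python sketch. This is not the Yaz implementation: the function names, the use of mean spacing, and the 10% tolerance are assumptions chosen only to show how both compression and expansion of a packet group can be detected from timestamps.

```python
def mean_spacing(timestamps):
    """Return the mean gap between consecutive packet timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)


def spacing_changed(send_times, recv_times, tolerance=0.1):
    """Return True if receiver spacing differs from sender spacing.

    Both compression (smaller spacing) and expansion (larger spacing) count
    as a change. tolerance is a hypothetical relative threshold.
    """
    sent = mean_spacing(send_times)
    received = mean_spacing(recv_times)
    return abs(received - sent) > tolerance * sent


if __name__ == "__main__":
    send = [0.0, 1.0, 2.0, 3.0]        # packets sent 1 ms apart
    same = [10.0, 11.0, 12.0, 13.0]    # spacing preserved
    wider = [10.0, 11.5, 13.0, 14.5]   # spacing expanded to 1.5 ms
    print(spacing_changed(send, same), spacing_changed(send, wider))
```

In a probing loop, the sending rate would be raised group by group until spacing_changed first reports a difference. That rate then serves as the available bandwidth estimate.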

Sommers, Joel; Barford, Paul; Willinger, Walter. Laboratory-based Calibration of Available Bandwidth Estimation Tools. In Elsevier Microprocessors and Microsystems Journal, 31(4), 2007. (paper)

Sommers, Joel; Barford, Paul; Willinger, Walter. A Proposed Framework for Calibration of Available Bandwidth Estimation Tools, In Proceedings of IEEE Symposium on Computers and Communication (ISCC '06), June 2006. (paper)

Sommers, Joel; Barford, Paul; Willinger, Walter. A Proposed Framework for Calibration of Available Bandwidth Estimation Tools (extended version), UW Technical Report, September, 2005. (paper)

The Trident Malicious Workload Generation Tool

Trident is a payload-aware traffic generation tool. The purpose of Trident is to generate test traffic to assess the resiliency and performance of network systems such as firewalls and NIDS in emulation testbeds such as WAIL. Trident's architecture includes the ability to generate both malicious and benign test traffic. The malicious component of Trident enables known threats such as DDoS attacks and worms such as Welchia and Blaster to be flexibly generated, and new, as yet unseen attacks to be composed. The benign component of Trident enables payloads from the DARPA datasets or traces collected on one's own network to be composed flexibly for tests. The basic architecture of Trident makes it unsuitable for use as a tool for black hats in the wide area, and it will not be distributed with actual exploit payloads. Even so, the sensitive nature of this type of traffic generation means that we will only make the tool available for use on verifiable projects.

NOTE: Trident is not available via anonymous download. Send email if you want to use the tool.

Sommers, Joel; Yegneswaran, Vinod; Barford, Paul. Recent Advances in Network Intrusion Detection Systems Tuning, In Proceedings of the 40th IEEE Conference on Information Sciences and Systems (CISS '06), March, 2006. (paper)

Sommers, Joel; Yegneswaran, Vinod; Barford, Paul. Toward Comprehensive Traffic Generation for Online IDS Evaluation, UW Technical Report, August, 2005. (paper)

Sommers, Joel; Yegneswaran, Vinod; Barford, Paul. A Framework for Malicious Workload Generation, In Proceedings of ACM Internet Measurement Conference, October, 2004. (abstract, paper)

The SPLAT Viz Tool

Visualizations provide a natural means for organizing and mining data and identifying characteristics of interest in large complex data sets. SPLAT is a Scatter and Phase pLot Animation Tool. SPLAT provides a broad set of capabilities for exploratory and detailed analysis of large sets of measurements of Internet behavior or structure. Scatter plots and phase plots are two commonly-used graphical techniques for examining correlated behavior between different variables, and an important feature of SPLAT is that it can animate the scatter and phase plots over time to reveal dynamic characteristics of the data under study.

Sommers, Joel; Barford, Paul; Willinger, Walter. SPLAT: A Visualization Tool for Mining Internet Measurements, In Proceedings of the Passive and Active Measurement Conference (PAM '06), March, 2006. (paper)

Malicious IP Addresses Data Set

Malicious sources, as identified by their IP addresses, are systems that have been observed participating in unwanted activity such as attacks and intrusions in the Internet. The structure and characteristics of the IP addresses and subnets of these sources are important not only for devising means of thwarting unwanted activity but also for conducting simulation and analysis of unwanted activity. This dataset contains nearly 10M IP addresses from both Dshield and honeypot monitors collected over a 7-day period in 2004.

Barford, Paul; Nowak, Rob; Willett Rebecca; Yegneswaran, Vinod. Toward a Model for Sources of Internet Background Radiation, In Proceedings of the Passive and Active Measurement Conference (PAM '06), March, 2006. (paper)

The NetPath Network Delay Emulator

NetPath is a tool for scalable network path emulation. Emulating the conditions on individual links or a series of links in a network - including propagation and signaling delays, bit errors, duplication, reordering and loss - is essential in test bed environments such as WAIL. NetPath is a system based on the Click modular router that enables packets to be subjected to the aforementioned conditions as they traverse links in a test bed environment. The significant strengths of NetPath over tools such as Dummynet and NIST Net are its ability to scale to accommodate heavy loads, its ability to scale to high speeds, and its delay accuracy over a wide range of operating conditions. Included with NetPath is the VICT configuration tool, which enables NetPath to be easily configured in large testbeds.

Agarwal, Shilpi; Sommers, Joel; Barford, Paul. Scalable Network Path Emulation, In Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), September, 2005. (paper)

The Harpoon Network Traffic Generator

Harpoon is a tool for generating TCP and UDP packet traffic that mimics the IP flow level characteristics of traffic measured at an actual router. Harpoon traffic is generalized in the sense that it does not attempt to recreate flows specific to particular applications. Rather, the aggregate characteristics of TCP and UDP flows from the perspectives of small/large time scales and IP space diversity are embodied in a set of statistical distributions that are used by the tool to create packet flows. The tool itself has client and server components which generate/serve data transfer requests that use the transport protocols that reside on the host systems.
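
The distribution-driven approach can be illustrated with a small Python sketch that draws flow start times and sizes from empirical samples. It is not Harpoon's implementation: the choice of file sizes and inter-connection times as the two inputs, the function name sample_flows, and the sample values are assumptions for illustration only.

```python
import random


def sample_flows(file_sizes, interconnection_times, count, seed=None):
    """Draw `count` synthetic flows as (start_time, size_in_bytes) tuples.

    Sizes and gaps between flow starts are sampled uniformly from the
    supplied empirical lists, which stand in for measured distributions.
    """
    rng = random.Random(seed)
    flows = []
    start = 0.0
    for _ in range(count):
        start += rng.choice(interconnection_times)
        flows.append((start, rng.choice(file_sizes)))
    return flows


if __name__ == "__main__":
    # Hypothetical empirical samples, e.g. extracted from flow records.
    sizes = [512, 1_460, 20_000, 1_000_000]
    gaps = [0.01, 0.05, 0.2]
    for start, size in sample_flows(sizes, gaps, count=5, seed=7):
        print(f"t={start:.2f}s size={size} bytes")
```

Because the flows are drawn from aggregate distributions rather than replayed from a trace, the generated traffic matches the measured statistics without reproducing any particular application's behavior.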

Sommers, Joel; Barford, Paul. Self-Configuring Network Traffic Generation, In Proceedings of ACM Internet Measurement Conference, October, 2004. (abstract, paper)

Sommers, Joel; Kim, Hyungsuk; Barford, Paul. Harpoon: A Flow-Level Traffic Generator for Router and Network Tests, In Proceedings of ACM SIGMETRICS (poster), June, 2004. (abstract, paper)

The Wisconsin Advanced Internet Laboratory

The Wisconsin Advanced Internet Laboratory (WAIL) is a unique testbed environment for conducting network and distributed systems research. The vision of WAIL is to be able to recreate instances of the Internet from end-to-end-through-core in a laboratory environment. What sets WAIL apart from other network testbeds is that real IP networking hardware (routers, switches, firewalls, etc.) is used to create the network configurations used in tests. WAIL currently has over 75 IP routers and switches with a wide variety of interfaces, over 200 PC workstations for measurement and traffic generation, and a variety of support hardware and storage capability for experiments and analysis. WAIL has been enabled through a foundational grant from Cisco Systems and the TOSA foundation. WAIL is openly available to the greater networking community for research use. You can access the lab here: http://www.schooner.wail.wisc.edu.

Barford, Paul; Landweber, Larry. Bench-style Network Research in an Internet Instance Laboratory, In Proceedings of ACM SIGCOMM Computer Communications Review, 33(3), July, 2003. (abstract, paper).

Barford, Paul; Landweber, Larry. Bench-style Network Research in an Internet Instance Laboratory, In Proceedings of SPIE ITCom, Boston, MA, August, 2002. (abstract, paper).

The TCPEval TCP Analysis Tool

TCPEval is a TCP evaluation tool that generates a large variety of statistical information from tcpdump traces taken at the end points of a wide area connection. The statistics include transaction latencies, packet latencies, packet loss characteristics, and inter-packet spacing characteristics. TCPEval also generates both time line diagrams and sequence plots for transactions. It also extracts the critical path from TCP transactions, which enables the latency of individual transactions to be decomposed into delays caused by the server, the client and the network. It generates statistics on critical path delays for individual transactions as well as summary data for all transactions in a trace.

Barford, Paul; Crovella, Mark. Critical Path Analysis of TCP Transactions, June, 2001. In ACM Transactions on Networking, 9(3), pp. 238-248. (abstract, paper).

Barford, Paul; Crovella, Mark. Critical Path Analysis of TCP Transactions, January, 2000. (abstract, paper). In Proceedings of ACM SIGCOMM 2000 (also in proceedings of ACM SIGCOMM America Latina Conference, San Jose, Costa Rica, 2001)

The SURGE Web Workload Generator

SURGE is a Web workload generator based on a set of seven statistical characteristics of Web browsing by clients (file sizes, request sizes, document popularity, OFF times, object sizes, temporal locality and session lengths). SURGE consists of a statistical input generator, which can be configured by users to match specific client use characteristics, and a request generator, which can be run on multiple clients simultaneously to generate a wide variety of load levels. SURGE is currently being used at thousands of sites worldwide including major academic institutions, industrial labs, telecom companies and startups.

Barford, Paul. Web Server Performance Analysis, ACM SIGMETRICS '99 Tutorial, IEEE LCN '99 Tutorial. (abstract, slides).

Barford, Paul; Crovella, Mark. Generating Representative Web Workloads for Network and Server Performance Evaluation, May 5, 1997 (revised December 31, 1997). (abstract, paper). In Proceedings of ACM SIGMETRICS '98.

Barford, Paul; Crovella, Mark. An Architecture for a WWW Workload Generator, October 3, 1997. (abstract, paper). World Wide Web Consortium Workshop on Workload Characterization, October, 1997.

Self-Similarity and Long Range Dependence in Computer Networks

(Maintained in collaboration with Sally Floyd, ICIR.) This site provides links and brief summaries for many of the important papers that have been written in the area of self-similarity and long range dependence in computer networks. Self-similar and long range dependent characteristics in computer networks present a fundamentally new set of problems to people doing network analysis and/or design, and many of the previous assumptions upon which systems have been built are no longer valid in the presence of self-similarity. The web page can be found here.