Workshop on Internet Signal Processing 2004

Final Report

Paul Barford
University of Wisconsin

KC Claffy
CAIDA

Alfred Hero
University of Michigan

Craig Partridge
BBN

Walter Willinger
AT&T Labs-Research

1. Overview and Summary

The first Workshop on Internet Signal Processing (WISP1) was hosted by CAIDA on the University of California, San Diego campus on November 11 and 12, 2004. The goal of the workshop was to explore opportunities for interdisciplinary collaboration and cross-pollination through the application of novel analytic techniques to a variety of challenging networking problems. This invitation-only workshop brought together a diverse group of 36 researchers (10 of whom were students) from the networking, signal processing and applied mathematics communities for an exchange of ideas. Support for travel and local expenses for attendees was provided by generous donations from the National Science Foundation (CNS-0456821) and from Cisco Systems, which made it possible to assemble an outstanding group. Logistical support was provided by CAIDA staff, which enabled the workshop to run smoothly and professionally.

The workshop began with a keynote talk delivered by Craig Partridge, and was then organized as a series of sessions in which a wide variety of subjects were treated. Each session began with several 30-minute talks that treated broader issues of Internet science and/or analytic methods. These were followed by sets of 15-minute talks that treated more specific networking issues and/or analytic techniques. Discussion during talks was encouraged but moderated, and breaks/meals offered opportunities for continued discussion. The format enabled almost all of the attendees to give a presentation over the two days of the workshop.

WISP concluded with a wrap-up session in which the major themes that emerged over the course of the workshop were discussed. There was strong consensus on the following points:

        There are many opportunities for collaborations between network researchers and signal processing/applied math experts. Open areas include network characterization (e.g., RTT or bandwidth estimation), network anomaly detection and classification, intrusion detection, distributed event monitoring and analysis, and traffic forecasting among others.

        The development of mechanisms for establishing ground truth in all estimation, identification and classification studies is essential, but continues to pose significant practical and theoretical challenges.

        Data sets that are labeled and documented by domain experts are essential for minimizing misuse or inappropriate analysis of available measurements by other communities.

Finally, there was a discussion of the utility of WISP1 and of the interest in holding another WISP in the future. There was strong positive support for this year's workshop and a great deal of interest in participating in another WISP in 2005. In general, attendees would have liked the workshop to be expanded to include more time for discussion, panels and possibly tutorials. The organizers are in the process of reviewing the feedback provided on workshop questionnaires and expect to announce WISP2 in early 2005.

2. Objectives and Organization

2.1 Technical Description

Our basic understanding of Internet characteristics and behaviors is a key foundational component upon which new technologies will be developed. An important means for expanding the basic knowledge of how the Internet functions is through the application of mathematical, statistical and analytical techniques that lend themselves naturally to specific problem domains. The difficulty in this regard is that it is rarely the case that one can simply grab a technique, "turn the crank" and get results that are either correct or meaningful. Most methods have nuance in their application that is only understood within the community that developed them. Conversely, it is often difficult for experts in specific analytical domains to identify or address Internetworking problems because they lack the domain knowledge to make sense of the available data or the results.

The motivation for the first Workshop on Internet Signal Processing (WISP) was the emergence of new and powerful signal processing (SP) and multiresolution analysis (MRA) techniques in the networking domain. It is becoming increasingly apparent that SP/MRA-based or related techniques could lead to advances in areas such as network tomography; network data collection; data dimensionality reduction; data compression; traffic analysis in wired, ad-hoc wireless, and sensor networks; and network anomaly/intrusion detection. This workshop was also interested in new analysis techniques and approaches that are motivated by and account for the increasing availability of spatio-temporal network measurements from PlanetLab-like or Abilene-type infrastructures.

The goal of WISP was to shine light on the opportunities for analysis of spatio-temporal network data and to foster discussion between network researchers and groups from the traditional signal processing, statistics and applied mathematics areas. To that end, the technical focus of the workshop featured:

        Application of a range of signal analysis methods in the acquisition and evaluation of Internet behavior and performance

        Advances in signal processing and related areas

        Advances in applied mathematics specific to multiresolution analysis

        Advances in statistical analysis techniques

        Emerging communication and computing technologies over wireless, optical and quantum media

2.2 Format

WISP was conceived and organized by the following committee:

        Paul Barford - University of Wisconsin

        KC Claffy - CAIDA/University of California San Diego

        Alfred Hero - University of Michigan

        Craig Partridge - BBN

        Walter Willinger - AT&T Labs-Research

The primary organizational objective for the workshop was to assemble a small (targeted at 30) but strong and diverse group of students and senior researchers from the fields of networking, signal processing and applied math. The size was kept limited in order to facilitate full group discussion. Participation in the workshop was by invitation only. It turned out to be impossible to reconcile the desire for diversity among participants with keeping the size of the workshop below 30; the final number of participants was 36. Of critical importance among these were the 10 student attendees, all of whom made strong contributions to both presentations and technical discussions.

The workshop was hosted at CAIDA's facility on the University of California San Diego campus. It was arranged as a series of talks on a wide selection of topics over a two-day period (November 11 and 12). Each session consisted of either longer 30-minute talks aimed at outlining broad problem domains or analytic techniques, or shorter 15-minute talks aimed at highlighting a more specific technique or problem. Discussion during talks was encouraged but moderated in order to stay close to the target schedule. Discussions also took place during breaks and meals.

The final agenda for the workshop was as follows:

Thursday, November 11, 2004

        8:15 - 9:00 Breakfast

        9:00 - 10:00 Signal Processing

o        Craig Partridge - Whither Signal Processing and the Internet? (30 minutes)

o        kc claffy - Measurements fueling Internet SP (30 minutes)

        10:00 - 10:30 Break

        10:30 - 12:00 Short Presentations

o        Alefiya Hussain - Identification of Repeated Denial of Service Attacks (15 minutes)

o        Marcos Aguilera - Performance Debugging for Distributed Systems of Black Boxes (15 minutes)

o        Rui Castro - Hierarchical Clustering and Network Topology Identification (15 minutes)

o        Abhishek Kumar - Data-Streaming Algorithms for Monitoring High speed Traffic (15 minutes)

o        Nick Feamster - On BGP instabilities and end-to-end path failures (15 minutes)

o        Darryl Veitch - A Tricky Problem (15 minutes)

        12:00 - 1:30 Lunch

        1:30 - 3:00 Signal Processing, Methods

o        Bin Yu - Statistical Aspects of SP (30 minutes)

o        Mark Crovella - Applying the Subspace Method to Network Traffic Analysis (30 minutes)

o        Eric Moulines - Applied SP (30 minutes)

        3:00 - 3:30 Break

        3:30 - 4:30 Network and Wireless

o        Dave Marchette - Statistical and Visualization Techniques for Streaming Data (30 minutes)

o        Ramesh Rao - Reflections on Modeling, Analysis and Insights in Networking Research (30 minutes)

        4:30 - 5:00 Break

        5:00 - 6:00 Short Presentations

o        Rajesh Krishnan - Traffic and Topological Analysis of Wireless Networks (15 minutes)

o        Thomas Karagiannis - A Nonstationary Poisson View of Internet Traffic (15 minutes)

o        Joel Sommers - Phase plot analysis of Internet packet traffic (15 minutes)

o        Neal Patwari - Watching Traffic for an Anomaly: Data Visualization using Dimensionality Reduction (15 minutes)

        6:00 Reception

Friday, November 12, 2004

        8:15 - 9:00 Breakfast

        9:00 - 10:00 Trends

o        Alfred Hero - Network Inference and Signal Processing (30 minutes)

o        Paul Barford - Trends in Network Measurements (30 minutes)

        10:00 - 10:30 Break

        10:30 - 12:00 Short Presentations

o        Konstantina Papagiannaki - Long-Term Forecasting of Internet Backbone Traffic (15 minutes)

o        Harsha Madhyastha - Wasted Measurements in the Internet (15 minutes)

o        Felix Hernandez-Campos - From Traffic Measurement to Realistic Workload Generation (15 minutes)

o        Abhishek Kumar - Outwitting the Witty Worm -- Reconstruction and Analysis of an Internet-Scale Event by Exploiting Pseudo-Random Number Generation (15 minutes)

o        Nick Feamster - Open problems in anomaly detection in BGP data (15 minutes)

o        Christos Papadopoulos - ANT: Analysis of Network Traffic (15 minutes)

        12:00 - 1:30 Lunch

        1:30 - 2:30 Signal Processing, Methods

o        Amos Ron - Mathematical Aspects of SP (30 minutes)

o        Andre Broido - Spectroscopy Methods (30 minutes)

        2:30 - 3:00 Break

        3:00 - 4:30 Networks

o        Constantine Dovrolis - Why is the Internet traffic bursty in short (sub-RTT) time scales? (30 minutes)

o        Randy Moses - Distributed Sensing and Inference (30 minutes)

o        David Meyer - Quantum computing and internet signal processing (30 minutes)

        4:30 - 5:00 Break

        5:00 - 5:45 Short Presentations

o        Jorma Kilpi - Passive Monitoring of RTT spikes (15 minutes)

o        Vinay Ribeiro - Optimal probing schemes for estimation of multiscale traffic (15 minutes)

o        Ljupco Kocarev - Nonlinear Dynamics in TCP/IP networks (15 minutes)

        5:45 Wrap-up Discussions

2.3 Support

Support was sought to defray travel and local expenses for participants. WISP was extremely fortunate to receive support from both the National Science Foundation and Cisco Systems toward this objective. This funding was especially important for student participants, who otherwise might not have had funding available. Logistical, web page management and on-site support were provided by the CAIDA staff. Their participation and assistance enabled the workshop to run smoothly and professionally for all participants. The organizing committee is deeply grateful to NSF, Cisco and CAIDA for their support.

3. Technical Discussion

Talk abstracts, speaker bios and other materials (including talk slides) from the workshop can be found on the WISP web site: http://www.caida.org/outreach/isma/0411/. The goal of this section is to give an overview of some of the themes that emerged during the workshop.

3.1 Signal Processing and Statistical Techniques

In his keynote, Partridge urged participants to move beyond pretty pictures and to try to understand what underlying reality different signal processing techniques expose (i.e., why the pretty pictures look the way they do). While a few of the later speakers dutifully admitted that many of their results were, in fact, pretty pictures, the workshop spent a lot of time working on understanding.

A number of talks presented interesting algorithms. Aguilera talked about ways to treat distributed systems as black boxes and to use statistical techniques to find causal paths and understand how information moves through a distributed system. The presentation nicely complemented Krishnan's talk on statistical techniques (including causal techniques) for understanding traffic flows in encrypted wireless networks. Crovella gave a wonderful talk that showed how Principal Component Analysis (PCA), in the form of the subspace method, could be applied to network traffic analysis. Papagiannaki discussed the use of wavelet analysis and linear time series models to show that IP backbone traffic exhibits long-term trends, strong periodicities, and variability at multiple time scales. Yu talked about Internet tomography and compared different approaches to inferring traffic matrices. Castro discussed what sort of network problems can be addressed through the use of sandwich probes. Hussain presented work on using Fourier-based techniques to better understand the structure of DDoS attacks, and Papadopoulos discussed the application of spectral analysis techniques to the study of network traffic in general. Kumar presented his work on applying techniques from data streaming to the design of efficient monitoring applications. In a second talk, he used network telescope traces of the Witty worm to reconstruct the series of actions performed by the worm at each infected site. Computing statistics and estimates on streaming data was also the topic of Marchette's talk.

Overall, the talks clearly described a number of exciting research methods and several important problems to which they may be applied.  It is clear, however, that we are still in the early stages of exploration, and that validation (i.e., establishing ground truth) has to play a much more dominant role than in the past.
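To make the PCA idea mentioned above more concrete, the following is a minimal sketch of a subspace-style anomaly check. It is not the method as presented at the workshop: the traffic is synthetic, the link gains, noise level and injected spike are assumptions chosen for illustration, the "normal" subspace is reduced to a single principal component found by power iteration, and the largest residual is simply picked out rather than tested against a statistical threshold.

```python
import math
import random


def center_columns(rows):
    """Subtract each link's mean so PCA sees deviations, not absolute load."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_component(rows, iterations=200):
    """Leading principal direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def residual_energy(rows, direction):
    """Squared norm of each sample after removing its projection on `direction`."""
    energies = []
    for r in rows:
        score = sum(a * b for a, b in zip(r, direction))
        energies.append(sum((a - score * b) ** 2 for a, b in zip(r, direction)))
    return energies


def make_traffic(n_steps=200, anomaly_at=120, seed=0):
    """Hypothetical data: four links sharing one periodic load, plus one spike."""
    rng = random.Random(seed)
    gains = [1.0, 2.0, 1.5, 0.5]  # assumed per-link scaling of the shared load
    rows = []
    for t in range(n_steps):
        load = 10.0 + 5.0 * math.sin(2.0 * math.pi * t / 50.0)
        rows.append([g * load + rng.gauss(0.0, 0.3) for g in gains])
    rows[anomaly_at][2] += 8.0  # injected anomaly on a single link
    return rows


traffic = center_columns(make_traffic())
direction = top_component(traffic)
spe = residual_energy(traffic, direction)
suspect = max(range(len(spe)), key=spe.__getitem__)
print("time step with largest residual energy:", suspect)
```

The design point is that the shared periodic load is absorbed by the principal direction, so a spike confined to one link shows up mainly in the residual. Because the anomaly here is injected at a known time step, the sketch also illustrates why ground truth matters: without it there would be no way to check that the flagged time step is the right one.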

3.2 Plenty of Data but a Shortage of Good Data

There were a number of talks that focused simply on how to get good data to feed our nifty signal processing algorithms. Claffy talked about the difficulties of getting data in the first place. Feamster gave two talks, both of which dealt with BGP measurements and illustrated the difficulties in analyzing them. Barford's talk was forward looking and described the sort of network measurements that can be expected to be collected on a regular basis in the not-too-distant future.

The talks and ensuing discussions made it very clear that in order for the different research communities to interact in a productive manner, there is a dire need for data sets that are labeled and documented by domain experts to minimize misuse or inappropriate analysis of available measurements by other communities. NSF has recognized some of the difficulties of even getting high quality data sets to researchers and has funded CAIDA to design an Internet Measurement Data Catalog (IMDC) to index the heterogeneous datasets (both publicly accessible and restricted usage) into a database that researchers can query to find relevant data to support their work. The IMDC includes annotation capabilities for users so that bugs, novel features, and other information about datasets can be shared by investigators with experience in analyzing a particular dataset. In addition to providing the fodder for new inquiries, the IMDC will also facilitate more robust science by documenting exactly the data used in a study and enabling others to reproduce published results.

3.3 Statistical Analysis and Networking Reality

Veitch described an identifiability problem in statistical modeling of network traffic that illustrates the benefits of accounting for known network structures as compared to traditional time series modeling. Karagiannis presented a statistical analysis of recent backbone traces and claimed that traffic in today's large backbone networks is no longer self-similar and instead looks more like a Poisson process over the time scales relevant to the calculation of primary performance metrics such as delays and queue lengths. In stark contrast, relying on a more networking-centric analysis of trace data, Dovrolis showed that real network traffic exhibits pronounced burstiness over those time scales and is far from Poisson. This observation was supported by Sommers, who gave a demo of a tool for visualizing network traffic dynamics over small time scales. Broido's talk about spectroscopy methods applied to network traffic analysis also illustrated the presence of interesting phenomena in packet-level traffic over small time scales and tried to associate them with networking-specific mechanisms. Hernandez-Campos talked about his work, which tries to offer a more complete understanding of the interactions between the different forces behind Internet traffic dynamics. Finally, Moulines focused again on statistical issues and compared the use of Fourier versus wavelet methods for estimating the long-memory or Hurst parameter.

These talks demonstrated that the analysis of network traffic over small time scales is still a wide open problem, but that substantial progress will only occur when statistical analysis techniques are combined with more network-centric approaches to produce results that are consistent with respect to other available and relevant measurements and agree with networking reality and intuition.
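For readers unfamiliar with the Hurst parameter mentioned above, here is a minimal sketch of one way to estimate it. It uses the aggregated-variance (variance-time) method, a simpler time-domain estimator that is swapped in here for brevity; it is not one of the Fourier or wavelet estimators compared in the talk. The two input series are synthetic assumptions: independent Gaussian noise, for which the estimate should be near 0.5, and its running sum, a non-stationary random walk used only as an extreme stand-in for strong persistence and not as a traffic model.

```python
import math
import random


def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from how the variance of block means decays with block size.

    For a long-memory series the variance of means over blocks of size m
    scales roughly as m**(2H - 2), so H = 1 + slope / 2 on a log-log plot.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to compute a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((x - mu) ** 2 for x in means) / (n_blocks - 1)
        if var > 0.0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log(variance) against log(block size).
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = (sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
             / sum((x - mx) ** 2 for x in log_m))
    return 1.0 + slope / 2.0


rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(8192)]  # no memory at all

walk, total = [], 0.0
for step in white:  # running sum: an extreme, non-stationary stand-in
    total += step
    walk.append(total)

sizes = [1, 2, 4, 8, 16, 32, 64]
h_white = hurst_aggregated_variance(white, sizes)
h_walk = hurst_aggregated_variance(walk, sizes)
print("H estimate, independent noise:", round(h_white, 2))
print("H estimate, random walk:", round(h_walk, 2))
```

An estimator of this kind is easy to apply and easy to misread: trends, periodicities and non-stationarity can all push the estimate toward 1 without any genuine long memory, which is one reason why purely statistical results need to be checked against what is known about the network.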

3.4 Looking Ahead

There were a number of talks that focused on existing trends and future problems. Hero described how network inference from traffic measurements can be placed in the context of general spatio-temporal analysis, dimension reduction, and classification. Patwari illustrated this idea in his talk about visualizing traffic anomalies using dimensionality reduction.  Madhyastha discussed the problem of wasted measurements in planned triggered measurement architectures and talked about ways to be more efficient.  Ron's talk focused on multiresolution representations and their relevance to analyzing network measurements.  Moses described inference problems (detection, estimation, and tracking) using distributed sensors and signal processing. Finally, Meyer's talk covered basic ideas of quantum computing and surveyed existing quantum algorithms that are potentially applicable to signal processing.

4. Feedback

An on-line anonymous feedback form was used to solicit comments from attendees. Selected comments are given below. The organizers are using this feedback to help plan the future of WISP.

   Comments on utility/relevance of talks

"I found talks on signal processing methods and trends the most interesting."

 "All talks were relevant. The talks by Harsha and Neal have some relationship to my work and so are most useful to me."

 "For me, a signal processing person, the network talks were most useful - especially k c claffy, Craig Partridge, Paul Barford, Nick Feamster; they provided good insight that I didn't have prior to the workshop."

   Comments on utility of workshop

"Overall, I thought the workshop was very good, though, and I learned quite a lot.  Thanks for giving me the opportunity to come."

 "For someone like me who wants to enter this research area, this was the perfect opportunity."

 "The opportunity to meet excellent networking researchers with an SP perspective."

 "Really nice to get everybody together and encourage discussion and exchange of ideas between the statistical/signal processing group and networking group of researcher."

 "It was a great opportunity for discussing among people doing research in the same area and for exchanging ideas about new directions and issues to solve."

 "High degree of audience participation and interest. A well-focused and active community of interest and an excellent selection of talks and readings."

 "The organization was great, and the group of attendees made for interesting discussions."

   Suggestions to improve future workshops

"Have a 30-60min discussion session with a mediator at the end of each morning and afternoon."

 "Maybe a discussion panel/open discussion of what the group believed were the emerging problems/trends would have been interesting."

 "I think WISP was a fantastic idea and would be really nice to continue  getting together once a year."

 "My biggest, strongest suggestion would be, let's do it again!"

5. Next Steps

The organizing committee is currently discussing the format and organization for WISP 2005. It will take place in late summer or early fall and will be hosted on the University of Wisconsin - Madison campus. Dates are currently being considered within the context of other related events (e.g., SIGCOMM) and an announcement of WISP 2005 should be made in February 2005.