http://pages.cs.wisc.edu/~epaulson/cs764-spring12.html

Here are some footnotes to my CSx64 lecture on Big Data. There's no set reading order, and it certainly isn't necessary to read everything, or even anything. I just wanted to give some longer references for things I mentioned during the lecture, in case any of it sounded interesting.

General Overview/Jobs in the field

Jeff Hammerbacher: "Information Platforms and the Rise of the Data Scientist". This is Jeff's chapter in an O'Reilly book he edited titled 'Beautiful Data'. It's a good overview of how he gradually came to realize that he and his colleagues at Facebook were doing something that felt different, and that it might deserve a new name. Today, 'data scientist' is one of the hottest jobs in industry.
Information Platforms and the Rise of the Data Scientist

Steve Lohr of The New York Times: "For Today's Graduate, Just One Word: Statistics", Aug 5th 2009
"Data Science", Statistics, and "Big Data" are closely related. The Grey Lady ran a story on how statistics was turning into a hot field, and three years later that still holds true.
http://www.nytimes.com/2009/08/06/technology/06stats.html

Big Data in Science

The Fourth Paradigm: Data-Intensive Scientific Discovery
This is the book I had on my slide, inspired by Jim Gray's work. It is a collection of essays by domain and computer scientists talking about their work in the earth, astronomical, and oceanographic sciences, biology and health care, computational infrastructure and architecture, and the future of the scientific method and scholarly communication. There's a free PDF version, or you can buy it in print or as an eBook. Jim's last talk before he was lost at sea is the first essay in the book, and is a nice short read.

Moshe Vardi, editor of 'Communications of the ACM', pushed back on the notion that "data-intensive exploration" deserves to be a new pillar of science, and argued that even simulation/computational science is nothing more than a facet of the two main branches of science, experimental and theoretical. His one-page essay, "Science Has Only Two Legs", appeared in the September 2010 'Communications of the ACM'.

Bill Howe of the University of Washington coined the term "in ferro", as near as I can tell. Bill's idea was that "Hypotheses are increasingly tested by evaluating queries over massive datasets in secondary storage - in ferro experiments - rather than relying solely on in situ, in vitro, and in silico experiments as primary means of scientific discovery." (The 'in ferro' name comes from finding answers on magnetic disks, i.e. spinning rust.) His homepage lists his papers and projects, and I think it gives a nice overview of how computing can have an impact on science beyond just running a simulation faster.

Wired wrote an outlandish article a few years back titled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". Like many things Wired writes, it makes over-the-top claims and predictions that can look hilariously bad a few years later. (They famously predicted in 1997 that "we are watching the beginnings of a global economic boom on a scale never experienced before" and that the boom would last for 25 years. The dot-com bust came four years later, 21 years early. And let's not even talk about 2008.) Peter Norvig, director of research at Google (and, if you've taken AI, likely your textbook author), was a subject of the 'End of Theory' piece and felt compelled to write a rebuttal. While I wouldn't take this Wired piece as gospel, it is still a fun read, and there are a dozen or so vignettes in the sidebar that are much more reality-based and worth reading.

Speaking of Peter Norvig, he and some other Googlers/Washington faculty wrote an essay for IEEE titled "The Unreasonable Effectiveness of Data". It looks at how using large amounts of data can help attack some difficult problems in language processing and in extracting data from the web. It's just a few pages long, has no equations at all, and is a nice, gentle introduction to the subject.

Big Data Science Projects

Interesting Applications

Paper references