notes

type "?" to learn navigation, HOME

[2017-12-05 Tue] Statistics and Elections   statistics

Statistics can be a powerful tool for identifying fraud in elections. One of my favorite examples comes from the 2011 Russian election. See the wikipedia article and this figure. The distribution of the votes has very abnormal peaks at every 5%.

The Honduran election that just happened is also suspect to fraud and the economist did a quick analysis to test for any sign of interference in the voting. Check out their article here for the details. But the gist of their work investigates changes in the distribution of voting from one day to the next, with the premise being that Hernández's party saw they were losing and stuffed the ballots near the end of voting. I'm curious to see what comes of this. To me it seems like a recount is in order.

Thank you statistics.

[2017-10-18 Wed] Flyer to get citizen help with urban forest research.

Screenshot 2017-12-05 19.18.51.png Screenshot 2017-12-05 19.19.02.png

This is a beautiful flyer created by Cheyenne to leave on the doors of houses who don't answer when we knock to find out when a nearby tree was removed. As of today we've had a couple responses that have given us the exact date trees were removed. Thank you Sara Sandberg and Mike Bussan!

[2017-10-12 Thu] Madison East AP Environmental Studies Field Trip

I got to help students in Madison East's AP Environmental studies on their field trip to the Madison School Forest. With 85 students and just one teacher, it was a big undertaking, but their teacher, Angie Wilcox-Hull, did an awesome job organizing.

They learned how identify common Wisconsin tree species and also did a lab on carbon in forests. Students used a clinometer and diameter at breast height tape to measure forest trees, they estimated carbon content of the trees, and they compared this to the carbon emissions caused by their transportation to and from school. As always it was great to work with high school students and there were a lot of great questions and points brought up. Here are four that were especially salient to me:

  1. Students realized that we used the equation of a cylindar to approximate the volume of a tree, but a cone is usually more appropriate.
  2. When we talked about finding the volume of wood in leaning trees, one student used his knowledge of calculus to tell me it wasn't quite so hard. See here. I wonder if foresters use that idea for leaning trees.
  3. Carbon storage is not the same as carbon sequestration
  4. While we measured individual trees, carbon stored per area of land may be more interesting for managers.
none
Figure 1: Being outside is a great part of doing a forestry lab. Photo: Angie Wilcox-Hull

[2017-05-24 Wed] Second Trip to Washington, DC for NASA's Biodiversity and Ecological Forecasting Team Meeting   nasa travel

none
Figure 2: National Museum of African American History and Culture

[2017-05-16 Tue] Shotgun Training

none
none
none
none
Figure 6: Zhihui

[2017-04-25 Tue] Collecting Urban Heat Island Data with Carly Ziter   UrbanHeatIsland

none

[2017-04-24 Mon] Using OpenBLAS to speed up matrix operations in R (linux)

I use the foreach and doParallel packages in R to speed up my work that can be easily parallelized. However, sometimes work can't be easily parallelized and things are slower than I'd like. An example of this might be fitting a single very large and complex model. Andy Finley, who resently stopped by UW-Madison to give a workshop on hierarchical modeling, taught us about OpenBLAS as a way to speed up matrix operations in R. Here are the notes about computing from the workshop.

BLAS is Basic Linear Algebra Subprograms. R and other higher level languages call BLAS to do matrix operations. There are other versions of BLAS, such as OpenBLAS, which are faster than the default BLAS that comes with R because they are able to take advantage of multiple cores in a machine. This is the extent of my knowledge on the topic.

Below is how I installed OpenBLAS locally on our linux server and pointed R to use the OpenBLAS instead of its default BLAS. A benchmark test follows.

Getting OpenBLAS

cd src                         # move to src directory to download source code
wget http://github.com/xianyi/OpenBLAS/archive/v0.2.19.tar.gz    # your version may be different
tar xzf v0.2.19.tar.gz
cd OpenBLAS-0.2.19/
make clean
make USE_OPENMP=1               #OPENMP is a threading library recommended by Andy Finley
mkdir /home/erker/local
make PREFIX=/home/erker/local install       # You will have to change your install location

Pointing R to use OpenBLAS

I have R installed in my ~/local directory. libRblas.so is the default BLAS that comes with R. For me it is located in ~/local/lib/R/lib. Getting R to use OpenBLAS is as simple as changing the name of the default BLAS and creating a link in its place that points to OpenBLAS:

mv libRblas.so libRblas_default.so
ln -s ~/local/lib/libopenblas.so libRblas.so

Deleting the link and reverting the name of the default BLAS, will make R use the default BLAS again. Something like:

rm libRblas.so
mv libRblas_default.so libRblas.so

Benchmark Test

I copied how to do this benchmark test from here. The benchmark test time was cut from about 146 to about 38 seconds on our server. This is a very significant speed up. Thank you OpenBLAS and Andy Finley.

  • Default BLAS
    curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O
    cat R-benchmark-25.R | time R --slave
    
    Loading required package: Matrix
    Loading required package: SuppDists
    Warning messages:
    1: In remove("a", "b") : object 'a' not found
    2: In remove("a", "b") : object 'b' not found
    
    
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________:  3
    
    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec):  0.671333333333333
    2400x2400 normal distributed random matrix ^1000____ (sec):  0.499666666666667
    Sorting of 7,000,000 random values__________________ (sec):  0.701666666666667
    2800x2800 cross-product matrix (b = a' * a)_________ (sec):  10.408
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  4.877
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  1.31949354763381
    
    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec):  0.220333333333334
    Eigenvalues of a 640x640 random matrix______________ (sec):  0.717666666666664
    Determinant of a 2500x2500 random matrix____________ (sec):  3.127
    Cholesky decomposition of a 3000x3000 matrix________ (sec):  4.15
    Inverse of a 1600x1600 random matrix________________ (sec):  2.364
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  1.74407855808281
    
    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.503999999999981
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.259999999999991
    Grand common divisors of 400,000 pairs (recursion)__ (sec):  0.301000000000007
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.0393333333333317
    Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.305999999999983
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  0.288239673174189
    
    
    Total time for all 15 tests_________________________ (sec):  29.147
    Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.87211888350174
    --- End of test ---
    
    144.64user 0.94system 2:25.59elapsed 99%CPU (0avgtext+0avgdata 454464maxresident)k
    0inputs+0outputs (0major+290577minor)pagefaults 0swaps
    
  • OpenBLAS
    cat R-benchmark-25.R | time R --slave
    
    Loading required package: Matrix
    Loading required package: SuppDists
    Warning messages:
    1: In remove("a", "b") : object 'a' not found
    2: In remove("a", "b") : object 'b' not found
    
    
    R Benchmark 2.5
    ===============
    Number of times each test is run__________________________:  3
    
    I. Matrix calculation
    ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec):  0.689666666666667
    2400x2400 normal distributed random matrix ^1000____ (sec):  0.499
    Sorting of 7,000,000 random values__________________ (sec):  0.701
    2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.163000000000001
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.228
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  0.428112796718245
    
    II. Matrix functions
    --------------------
    FFT over 2,400,000 random values____________________ (sec):  0.224333333333332
    Eigenvalues of a 640x640 random matrix______________ (sec):  1.35366666666667
    Determinant of a 2500x2500 random matrix____________ (sec):  0.140666666666667
    Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.280333333333332
    Inverse of a 1600x1600 random matrix________________ (sec):  0.247000000000001
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  0.249510313157146
    
    III. Programmation
    ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.505000000000001
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.259333333333333
    Grand common divisors of 400,000 pairs (recursion)__ (sec):  0.299333333333332
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.039333333333334
    Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.256999999999998
    --------------------------------------------
    Trimmed geom. mean (2 extremes eliminated):  0.271216130718114
    
    
    Total time for all 15 tests_________________________ (sec):  5.88666666666666
    Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.30712894095638
    --- End of test ---
    
    176.85user 12.20system 0:38.00elapsed 497%CPU (0avgtext+0avgdata 561188maxresident)k
    0inputs+0outputs (0major+320321minor)pagefaults 0swaps
    

Next things

From comments here, I have heard that OpenBLAS doesn't play well with foreach and doParallel. I will have to test these next. If it is an issue, I may have to include a shell code chunk in a literate program to change between BLAS libraries.

[2017-02-28 Tue] Application Essay: Catalyzing Advocacy in Science and Engineering: 2017 Workshop

I just applied to the CASE 2017 Workshop in Washington, DC. The application process led to some interesting thoughts, so I thought I'd share the essay.

Application

"How do we know the earth is 4.5 billion years old?" I loved asking my students this question when I taught high school science. The students (and I) were hard pressed to explain how we know this to be true. Most of us don't have the time to fully understand radiometric dating, let alone collect our own data from meteorites to verify the earth's age. So unless it's a topic we can investigate ourselves, we must simply trust that scientists are following the scientific method and evaluate their results within the context of our own experience.

Trust between scientists and the public is therefore the necessary foundation upon which our society accepts scientific research, incorporates it into policy, and supports more science. The communication of science's benefits to society maintains this trust. Unfortunately, the public and scientists disagree in many critical areas of research, such as genetic modification, climate change, evolution, vaccinations, and the age of the earth (1) (2). I believe scientists must do more to directly address these discrepancies.

As a scientist I have the incredible opportunity to conduct research that I think will improve society, and I'm honored that the public pays me to do it. I'm making a withdrawal from the bank of public trust and feel strongly that I need to pay it back with interest. I see scientific communication as the way to do so. Effective scientific communication goes way beyond publishing quality work in reputable journals and requires that we place our findings into the public consciousness. I have taught at the university and have led a few guest labs at an area high school, but I want to have a greater impact. The CASE 2017 workshop excites me with the opportunity to learn how to make this impact.

My hope is that CASE will orient me to the landscape of science advocacy, policy, and communication. Despite benefiting from federal funds for science, I am mostly ignorant of how our nation allocates resources to research, and I look forward to CASE demystifying this process. I hope to learn effective methods to communicate science with the public and to discuss with elected officials the value of research for crafting smart policy.

Because scientists understand their work best, they are best suited to advocate for it. CASE will provide a unique opportunity to learn how to be an advocate for science and a leader in strengthening the trust between the scientific community and the public whom we serve. If selected, I would like to work with the other selected graduate student and the graduate school's office of professional development to host a mini-workshop to bring the knowledge and skills from CASE to our campus. I'd like to replicate the Capitol Hill visits at a state level and work to get more graduate students engaged with elected officials from across the state.

[2016-10-27 Thu] OBSOLETE:Installing R, gdal, geos, and proj4 on UW Madison's Center for High Throughput Computing

NOTE

This post is obsolete. Use Docker as the chtc website now recommends

R is the language I use most often for my work. The spatial packages of R that I use very frequently like rgdal, rgeos, and gdalUtils depend on external software, namely gdal, proj4, and geos.

Here I show how I installed gdal, proj4, and geos on chtc, and pointed the R packages to these so that they install correctly.

The R part of this tutorial comes from chtc's website. Their site should be considered authoritative. I quote them heavily below. My effort here is to help people in the future (including myself) to install gdal etc. on chtc.

Create the interactive submit file. Mine is called interactive_BuildR.sub

I save it in a directory called "Learn_CHTC"

universe = vanilla
# Name the log file:
log = interactive.log

# Name the files where standard output and error should be saved:
output = process.out
error = process.err

# If you wish to compile code, you'll need the below lines.
#  Otherwise, LEAVE THEM OUT if you just want to interactively test!
+IsBuildJob = true
requirements = (OpSysAndVer =?= "SL6") && ( IsBuildSlot == true )

# Indicate all files that need to go into the interactive job session,
#  including any tar files that you prepared:
# transfer_input_files = R-3.2.5.tar.gz, gdal.tar.gz
# I comment out the transfer_input_files line because I download tar.gz's from compute node

# It's still important to request enough computing resources. The below
#  values are a good starting point, but consider your file sizes for an
#  estimate of "disk" and use any other information you might have
#  for "memory" and/or "cpus".
request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue

transfer interactive submit file to condor submit node

change erker to your username and if you don't use submit-3, change that too. You'll have to be inside the directory that contains "interactive_BuildR.sub" for this to work.

rsync -avz interactive_BuildR.sub erker@submit-3.chtc.wisc.edu:~/

log into submit node and submit job

ssh submit-3.chtc.wisc.edu
condor_submit -i interactive_BuildR.sub

wait for job to start

Installing GDAL, Proj4, Geos

Each install is slightly different, but follows the same pattern. This worked for me on this date, but may not work in the future.

  • GDAL: Download, configure, make, make install gdal, then tar it up
    wget http://download.osgeo.org/gdal/gdal-1.9.2.tar.gz # download gdal tarball
    tar -xzf gdal-1.9.2.tar.gz # unzip it
    mkdir gdal # create a directory to install gdal into
    dir_for_build=$(pwd) # create a variable to indicate this directory (gdal doesn't like relative paths)
    cd gdal-1.9.2 # go into the unzipped gdal directory
    ./autogen.sh # run autogen.sh
    ./configure --prefix=$dir_for_build/gdal # run configure, pointing gdal to be installed in the directory you just created (You'll have to change the path)
    make
    make install
    cd ..
    tar -czf gdal.tar.gz gdal #zip up your gdal installation to send back and forth between compute and submit nodes
    
  • Proj4: Download, configure, make, make install proj4 then tar it up
    wget https://github.com/OSGeo/proj.4/archive/master.zip
    unzip master.zip
    mkdir proj4
    cd proj.4-master
    ./autogen.sh
    ./configure --prefix=$dir_for_build/proj4
    make
    make install
    cd ..
    tar -czf proj4.tar.gz proj4
    
  • Geos:
    wget http://download.osgeo.org/geos/geos-3.6.0.tar.bz2
    tar -xjf geos-3.6.0.tar.bz2 # need to use the "j" argumnet because .bz2 not gz
    mkdir geos
    cd geos-3.6.0
    ./configure --prefix=$dir_for_build/geos # no autogen.sh
    make
    make install
    cd ..
    tar -czf geos.tar.gz geos
    
    

Add libs to LD_LIBRARY_PATH

I don't actually know what this path is exactly, but adding gdal/lib, proj4/lib, and geos/lib to the LD_LIBRARY_PATH resolved errors I had related to files not being found when installing in R. For rgdal the error was

Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/erker/R-3.2.5/library/rgdal/libs/rgdal.

and lines like this:

...
./proj_conf_test: error while loading shared libraries: libproj.so.12: cannot open shared object file: No such file or directory
...
proj_conf_test.c:3: error: conflicting types for 'pj_open_lib'
/home/erker/proj4/include/proj_api.h:169: note: previous declaration of 'pj_open_lib' was here
./proj_conf_test: error while loading shared libraries: libproj.so.12: cannot open shared object file: No such file or directory
...

For rgeos the error was

"configure: error: cannot run C compiled programs"

Run this to fix these errors

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/gdal/lib:$(pwd)/proj4/lib # this is to install rgdal properly
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/geos/lib # and rgeos

If you run:

echo $LD_LIBRARY_PATH

The output should look something like

:/var/lib/condor/execute/slot1/dir_2924969/gdal/lib:/var/lib/condor/execute/slot1/dir_2924969/proj4/lib:/var/lib/condor/execute/slot1/dir_2924969/geos/lib

R: download, untar and move into R source directory, configure, make, make install

As of [2016-10-25 Tue] R 3.3.0 or higher isn't supported on chtc

wget https://cran.r-project.org/src/base/R-3/R-3.2.5.tar.gz
tar -xzf R-3.2.5.tar.gz
cd R-3.2.5
./configure --prefix=$(pwd)
make
make install
cd ..

Install R packages

The installation steps above should have generated an R installation in the lib64 subdirectory of the installation directory. We can start R by typing the path to that installation, like so:

R-3.2.5/lib64/R/bin/R

This should open up an R console, which is how we're going to install any extra R libraries. Install each of the library packages your code needs by using R's install.packages command. Use HTTP, not HTTPS for your CRAN mirror. I always download from wustl, my alma mater. For rgdal and rgeos you need to point the package to gdal, proj4 and geos using configure.args

Change your vector of packages according to your needs.

install.packages('rgdal', type = "source", configure.args=c(
     paste0('--with-gdal-config=',getwd(),'/gdal/bin/gdal-config'),
     paste0('--with-proj-include=',getwd(),'/proj4/include'),
     paste0('--with-proj-lib=',getwd(),'/proj4/lib')))

install.packages("rgeos", type = "source", configure.args=c(paste0("--with-geos-config=",getwd(),"/geos/bin/geos-config")))

      install.packages(c("gdalUtils",
                         "mlr",
                         "broom",
                         "raster",
                         "plyr",
                         "ggplot2",
                         "dplyr",
                         "tidyr",
                         "stringr",
                         "foreach",
                         "doParallel",
                         "glcm",
                         "randomForest",
                         "kernlab",
                         "irace",
                         "parallelMap",
                         "e1071",
                         "FSelector",
                         "lubridate",
                         "adabag",
                         "gbm"))

Exit R when packages installed

q()

Edit the R executable

nano R-3.2.5/lib64/R/bin/R

The above will open up the main R executable. You will need to change the first line, from something like:

R_HOME_DIR=/var/lib/condor/execute/slot1/dir_554715/R-3.1.0/lib64/R

to

R_HOME_DIR=$(pwd)/R

Save and close the file. (In nano, this will be CTRL-O, followed by CTRL-X.)

Move R installation to main directory and Tar so that it will be returned to submit node

mv R-3.2.5/lib64/R ./
tar -czvf R.tar.gz R/

Exit the interactive job

exit

Upon exiting, the tar.gz files created should be sent back to your submit node

[2016-09-23 Fri] Cool Science Image contest

none
Figure 8: MNF transformation of AVIRIS hyperspectral imagery over lakes Mendota, Monona, and Wingra

I created this image of Madison's lakes using hyperspectral imagery from NASA's AVIRIS sensor for the Cool Science Image Contest. I threw it together the week before the contest and was very pleased to be selected, but I wish that it had been more related to the science that I do. It is a minimum noise fraction transformation which is a way to transform/condense the data from the ~250 bands into the 3 visible channels (rgb) for maximum information viewing. Originally I intended to create an image over land, but had great difficulty getting the mosaicing of the 3 flightlines to be seamless. You can see the band across the northern part of lake Mendota from fox bluff to warner bay that is due to image processing, not something real in the water. The image is no doubt cool, but I wish I could say more what the colors meant (If you're a limnologist and see some meaning, please let me know). I think that pink may be related to sand, and green to bright reflections on the water. There's probably some algae detection going on too. My goal for next year is to make an image that is heavier on the science and still very cool.

[2016-09-20 Tue] Field work in northern Wisconsin

Field work provides the opportunity to be outside, help out on lab-wide projects, and to learn about new research that isn't exactly in my wheelhouse. September 8-10 I went to the north woods to help collect foliar samples as part of a NEON and Townsend lab project to ultimately predict foliar traits such as morphology, pigments, and other chemical constituents from hyperspectral imagery to create maps of these traits. This was the first year of a five year project. There's much more to the science behind the goal. But the aim of this post is not to explain all that, but rather, to share some images and the joy of being in the north woods.

none
Figure 9: Trout Lake Research Station, our lodging
none
Figure 10: Jablonski grilling Aditya's Famous Chicken
none
Figure 11: Always excited for field work
none
Figure 12: Always excited for field work
none
Figure 13: Aditya fake shooting leaves (for retrieval)
none
Figure 14: John fake writing
none
Figure 15: Larch Stand
none
Figure 16: NEON's Flux Tower. Measuring the exhange of carbon between atmosphere and biosphere. Sweet.

Flux tower of Ankur Desai's research group, much smaller than NEON's. Maples creating lovely dappled light.

[2016-08-02 Tue] Making this website   orgmode

I use emacs org-mode as the core application for my research. It makes sense to use the great org publishing features to create a website without having to learn many new skills. I had considered using jekyll, but ultimately realized that I could make a website that is just as beautiful and functional with emacs org-mode.

I've looked at tons of websites made with org-mode. I like Eric Schulte's best for an academic personal page, and I wanted to use the org-info.js for a blog with keyboard shortcuts for navigation and search.

If you're not familiar with org mode, check it out.

If you are already familiar with org mode, spend twenty minutes reading about exporting to html and publishing. The manual is pretty clear. Once you have a published webpage, check out some css stylesheets from other org sites that you like. Mine is a modified version of the stylesheet of eric schulte, who I asked permission from to use.

I spent no more than 3 hours setting up the site. Deciding that this was the approach I wanted to take and generating the content took a couple days.

You can clone the github repo to see how I have it set up.

It is great to be able to work on the content of the website in a very familiar way and export it to the internet with one command. Amazing.

[2016-08-02 Tue] Trip to Washington, DC for NASA's Biodiversity and Ecological Forecasting Team Meeting   nasa travel

none
Figure 17: Albert Einstein Memorial

[2016-08-01 Mon] Removing Stuck Aluminum Seatpost from a Steel Frame   bike

This document is created using Org-mode and Org-babel. Eric Schulte is inspiration. The original plain-text document is available at index.org