Debugging Nature


Today’s post is a bit of a “how to” on debugging database-specific problems with EZproxy. More to the point, it focuses on a command line utility that helps when troubleshooting problems of this sort if you’re using a Mac (there are similar tools, and more of them, for a Windows or Unix machine). First, a bit of background:

A user called to say that whenever he did a search on nature.com, despite having logged into our proxy server, he was prompted for a password when he asked to view the full text of an article. I configured my desktop machine to go through the proxy server and, while doing a search, saw the same behavior. Clearly the proxy server was dropping out of the loop somewhere between my hitting the search button and the results page loading (I could see that the “Full Text” link did not carry our proxy server’s URL stem).

We’ve made Nature available to library users for years without any issues, so clearly something had changed recently. First stop was the ‘database-specific issues’ page on the EZproxy website. Finding no mention of Nature, I dug deeper. On my machine, I noticed that as the search results page loaded, several URLs streaked by on the browser’s status bar, but they went by too fast for me to read.

What I needed was a way to watch all the traffic going between my machine and nature.com. Not a big deal in the Unix or Windows world, but finding just the right tool on a Mac took a bit longer. I first tried bending tcpdump to my will, but it was picking up all sorts of “extraneous” network traffic, which made diagnosis difficult. There are likely command line switches that will reduce the reported traffic, but I didn’t want to spend the morning reading documentation. I figured someone had developed a tool that was more tightly focused on web traffic (and not, for example, preoccupied with IP headers and local machine interaction). Turns out, the proper tool was a command line utility called tcpflow.

You can download a precompiled binary from entropy.ch or build your own from DarwinPorts. Once installed, open a terminal window and type this command as root:

tcpflow -i en0 -c port 80

If you don’t have root enabled, use this instead:

sudo tcpflow -i en0 -c port 80

This assumes that your Ethernet connection is on en0 (run ifconfig if you’re not sure) and that tcpflow is on your path. If you compile the DarwinPorts version, it likely will be. If you use the precompiled binary, it gets installed as /usr/local/bin/tcpflow.
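If the raw dump scrolls by too quickly, one option is to pipe it through grep and keep only the lines that matter for a proxy problem. The snippet below is a minimal sketch of that idea: the function name http_summary is made up for illustration, and it assumes the capture is plain HTTP text as printed by the -c switch.

```shell
# Keep only HTTP request lines plus the Host and Referer headers.
# Reads tcpflow -c style output on stdin.
# http_summary is a hypothetical helper name, not part of tcpflow.
http_summary() {
  # -a: treat the stream as text even if binary bytes show up
  # -i: match header names regardless of case
  grep -a -i -E '(GET|POST) [^ ]+ HTTP/|^(Host|Referer):'
}

# Live use (needs root; en0 is assumed to be the active interface):
#   sudo tcpflow -i en0 -c port 80 | http_summary
```

Request lines are matched anywhere on the line because tcpflow prefixes the first line of each chunk with the connection’s addresses and ports.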

So with a terminal window scrolling all my port 80 traffic, I fired up the browser again and went back to nature.com. It took a while to wade through the output, but I finally found the culprit: lots of JavaScript involved on Nature’s site. It was clear that after calls to ad.doubleclick.net and search.atomz.com, the Nature site was losing track of our proxy server in the ‘Referer’ headers it was passing around.
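Wading through the output by eye works, but the same check can be roughly automated: list every Referer header that does not mention the proxy server. This is only a sketch under stated assumptions. The function name referers_missing_proxy and the host proxy.example.edu are hypothetical, so substitute your own proxy’s hostname.

```shell
# Print Referer headers that do NOT contain the given proxy hostname.
# Usage: some-capture | referers_missing_proxy proxy.example.edu
# (referers_missing_proxy and proxy.example.edu are illustrative names.)
referers_missing_proxy() {
  awk -v proxy="$1" '
    # Compare in lower case so "referer:" and "Referer:" both match.
    tolower($0) ~ /^referer:/ && index(tolower($0), tolower(proxy)) == 0 { print }
  '
}

# Live use (needs root; en0 is assumed to be the active interface):
#   sudo tcpflow -i en0 -c port 80 | referers_missing_proxy proxy.example.edu
```

Any line this prints is a request where the site has stopped passing the proxied address along, which is the symptom described above.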

I modified our EZproxy configuration files to add JavaScript support for Nature (e.g., HJ search.nature.com and so on). Voilà! Things began working again. If you compare the Referer: block in the before and after captures, you’ll see that the ‘blue’ one (after adding the HJ lines to our configuration file) correctly stores the proxy server’s URL.
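For anyone who hasn’t edited one, a database stanza with HJ lines might look roughly like the sketch below. Only the HJ search.nature.com line comes from the fix described here; the Title, the starting URL and the www.nature.com line are assumptions for illustration, so check them against your own configuration rather than copying them.

```text
Title Nature
URL http://www.nature.com/
HJ www.nature.com
HJ search.nature.com
```

HJ is the short form of EZproxy’s HostJavaScript directive, which tells the proxy to rewrite JavaScript served by that host as well as the HTML.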

As Yogi Berra once said, “You can observe a lot by watching.” In this particular case, using tcpflow was the key to finding something to look at.