What does local use of Sci-Hub look like?

Product bundling thrives in markets with few competitive options.  Your cable company knows that and so do large academic publishers.   For years, they’ve sold collections of e-journals at a discount over what you’d pay to subscribe to each individual title in the bundle (though you probably wouldn’t if given the choice).  That’s a big deal, right?

But as the cost of these bundles has risen, far outpacing inflation, libraries have begun looking for alternatives. Some are letting their “big deals” expire, while others are developing strategies to help inform those looming (and often fraught) renewal decisions.

SPARC (the Scholarly Publishing and Academic Resources Coalition) has been carefully tracking this activity, and its work provides an easy way to keep up to date on most aspects of this issue.

But one question I’ve had for some time is what sort of gravitational pull are sites like Sci-Hub or ResearchGate exerting on the already disrupted orbits of users, libraries and publishers?  Put another way, if researchers are satisfying their content needs outside the library/publisher channel, shouldn’t we factor that into our strategy around these big deals?

I realize I’m not the first to ask who’s using Sci-Hub.  Here are just a few of the many articles that get at this topic:

Each talks about usage activity and traffic patterns but in a way that is little more than anecdotal background noise if you’re trying to fashion a local strategy and need to focus on what your local users are actually doing.  Simply asking who’s using these sites poses all sorts of problems.

I finally settled on analyzing DNS queries to our campus nameservers as a reasonable metric.  When a user on our campus network points a browser at researchgate.net, our campus nameserver logs the query.  It’s an imperfect measure to be sure (e.g., it ignores traffic to “shady” sites from off-campus affiliates using their ISP’s nameserver), but it does let me compare on-campus traffic to “pirate” sites with on-campus traffic to sites provided via our library’s subscriptions.

Mindful of privacy issues, I asked a friend in campus IT to take a list of 6 or 7 domains and derive an extract file from the DNS query logs, providing just date, time and query string for anything that matched the domain information I provided.  Here’s an excerpt of the result:

2019 07 10 13 53 29

Producing this extract is now part of a weekly cron job so I’ll be able to monitor the relative use of these sites over the coming months.  In this one particular instance, I can’t wait for the Fall term to begin…
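For anyone who wants to try something similar, here is a minimal sketch of the filtering step in Python. It is not the script my friend in IT runs. The input layout (`YYYY-MM-DD HH:MM:SS client qname`), the function names, and the `sci-hub.example` placeholder domain are all assumptions made for illustration, and real nameserver query logs will look different. The point is only to show matching on a domain list and dropping the client address before anything leaves IT.

```python
from typing import Iterable, Iterator

# Hypothetical watch list: researchgate.net is mentioned in the post,
# sci-hub.example is a placeholder, not a real domain.
WATCHED_DOMAINS = ("researchgate.net", "sci-hub.example")


def matches_watched(qname: str, domains: Iterable[str] = WATCHED_DOMAINS) -> bool:
    """Return True if qname is a watched domain or a subdomain of one."""
    name = qname.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in domains)


def extract(lines: Iterable[str],
            domains: Iterable[str] = WATCHED_DOMAINS) -> Iterator[str]:
    """Yield 'YYYY MM DD HH MM SS qname' for each matching query.

    Assumes each input line is 'YYYY-MM-DD HH:MM:SS client qname'.
    The client address is deliberately discarded.
    """
    domains = tuple(domains)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not fit the assumed layout
        date, time, _client, qname = parts
        if matches_watched(qname, domains):
            fields = date.split("-") + time.split(":")
            fields.append(qname.rstrip(".").lower())
            yield " ".join(fields)
```

Matching on a leading dot (`"." + d`) keeps a lookalike such as `notresearchgate.net` from being counted as ResearchGate traffic.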

So what did I find by monitoring DNS queries between July 3rd and July 13th?

The graph shows activity for users on the campus network.  A better name for this post might be, “What does local use of ResearchGate look like?”
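The tallying behind a comparison like this can be sketched in a few lines of Python. This is an illustration under assumptions, not the exact analysis I ran: it assumes every line of the extract has six date and time fields followed by the query string as a seventh field, and the function name and the `sci-hub.example` placeholder are hypothetical.

```python
from collections import Counter
from typing import Iterable


def daily_counts(extract_lines: Iterable[str],
                 domains: Iterable[str]) -> Counter:
    """Count queries per (day, watched domain).

    Assumes each line is 'YYYY MM DD HH MM SS qname'; lines that do
    not have exactly seven fields are skipped.
    """
    domains = tuple(domains)
    counts: Counter = Counter()
    for line in extract_lines:
        parts = line.split()
        if len(parts) != 7:
            continue
        day = "-".join(parts[:3])
        qname = parts[6].rstrip(".").lower()
        for d in domains:
            # Fold subdomains (www., cdn., etc.) into the parent domain.
            if qname == d or qname.endswith("." + d):
                counts[(day, d)] += 1
                break
    return counts


if __name__ == "__main__":
    demo = [
        "2019 07 10 13 53 29 www.researchgate.net",
        "2019 07 10 14 00 00 researchgate.net",
        "2019 07 11 09 00 00 sci-hub.example",
    ]
    for (day, domain), n in sorted(daily_counts(
            demo, ["researchgate.net", "sci-hub.example"]).items()):
        print(day, domain, n)
```

Keep in mind that a DNS query count is only a rough proxy for visits, since resolvers and browsers cache answers, so these numbers are best read as relative activity between sites rather than absolute usage.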