Why do we still have referrers? They don't allow us to do anything that we would...

JohnTHaller · on Dec 11, 2013

For lots of us using basic CDN services, we enable referrer checks to ensure that folks aren't hotlinking images or direct linking downloads from other sites. These CDNs allow basic blocking based on referrers. You usually set it to only permit when there is a referrer from your own domain as well as blank referrers (if the CDN supports it) since most privacy conscious folks will disable referrer rather than fake it. We don't actually care that you don't provide a referrer, we just don't want other sites using our images in their own pages (leeching our bandwidth we pay for) or direct linking to binary downloads (bypassing our site with our advertising or revenue possibilities or branding but using our bandwidth).
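
A minimal sketch of the kind of check described above, assuming a CDN or origin that sees the raw `Referer` header. The function name `referer_allowed` and the host `example.com` are hypothetical, not any particular CDN's API:

```python
from urllib.parse import urlsplit


def referer_allowed(referer, allowed_hosts, allow_blank=True):
    """Return True if a request carrying this Referer should be served.

    allowed_hosts: hostnames of the site itself (subdomains are accepted too).
    allow_blank:   serve requests that send no Referer at all, so visitors
                   who disable the header are not locked out.
    """
    if referer is None or referer.strip() == "":
        return allow_blank
    try:
        host = urlsplit(referer.strip()).hostname
    except ValueError:
        # Malformed header value: treat as a foreign referrer.
        return False
    if host is None:
        return False
    host = host.lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

With `allowed_hosts={"example.com"}`, a request from a page on `example.com` or with no referrer is served, while one embedded on another site is refused.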

chavesn · on Dec 11, 2013

I can think of one more; some sites use referrer to allow you to bypass a paywall if (and only if) you came from search results.

The problem is, all of these "features" allowed by referrers are user-hostile actions.

If referrers went away tomorrow, users wouldn't notice the difference or care. Publishers would get angry and think "we can't milk our content/visitors for as much money anymore!" But that doesn't really change the relationship with the customers who value your product or business so I personally can't believe it will make a sizable difference in the end.

JohnTHaller · on Dec 11, 2013

It's only hostile to sites that try and steal bandwidth resources by hotlinking/leeching images or direct linking downloads. It's not about milking visitors. It's about preventing unethical behavior by other sites.

I've spent 10s of thousands of dollars hosting free and open source software for millions of people over the years and I make sure to prevent bandwidth theft from other sites that cut into my ability to provide that service. Ad revenue (all responsible ads... no popups, no sound, etc) doesn't cover the cost of hosting and bandwidth even when sites are prevented from being unethical.

Take away referrers and it will be replaced by more complicated technology that serves the same purpose. CDN providers have secure links, for instance, that use an API to allow sites to generate a one-time use or limited time window link for a download from the CDN of a given file. It's more complex, but it's what I'd switch to tomorrow for downloads if referrers went away.
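
A rough sketch of how a limited-time "secure link" could work, assuming an HMAC-signed URL with an expiry timestamp. The secret, the `expires`/`token` parameter names, and both function names are illustrative assumptions; real CDNs each define their own scheme. A true one-time link would additionally need server-side state to remember used tokens, which is not shown:

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical key shared with the CDN


def sign_url(path, expires_at, secret=SECRET):
    """Build a download URL that is only valid until expires_at (Unix time)."""
    msg = f"{path}:{expires_at}".encode()
    token = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&token={token}"


def verify(path, expires_at, token, now=None, secret=SECRET):
    """Check the token matches the path and expiry, and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    msg = f"{path}:{expires_at}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

The page on the site calls `sign_url` when it renders the download link, which is why this approach needs a dynamic site.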

DerpDerpDerp · on Dec 11, 2013

So it sounds like there's already a solution to your problem that doesn't require leaking privacy all over the internet, since presumably the one-time links are on request from a page on your website, and tells you nothing but someone on your website wanted something from your website.

How is this a bad thing? How would removing referrals harm you in any way?

JohnTHaller · on Dec 11, 2013

Well, it would only work for downloads, not images. And it would require dynamic sites, so static sites couldn't take advantage of it. And it would require some coding ability as opposed to just knowing how to upload a file and point to it. So, it would mean expending additional resources programming-wise just to keep files moving instead of doing whatever actual service we're really working on, since we don't have money to throw at it. And it would increase the load on the website server, too, which would require additional resources, which means more money.

For images, the whole point of a CDN is to keep them in one place with a long expiry (a week or more) possibly downloaded from a nice geographically close edge node so that visitors load the images very quickly once and then cache them for the next pages and later visits of the site. The only way CDNs currently have of keeping folks from leeching/hotlinking images is to check referrals. The unique download link bit would negate the whole benefit of the CDN (you'd lose caching and the back and forth to generate the unique URL would slow it down), so that's out. Basically, lots of folks would ditch CDNs and host internally, possibly using server log checking to see if the requesting IP recently hit a page. Otherwise they have to deal with lots of bandwidth leeches. The end result would be slowing down visitors' experience.
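
The "has this IP recently hit a page" fallback mentioned above might look roughly like this. It is a sketch under stated assumptions: an in-memory table instead of real log parsing, a hypothetical class name `RecentVisitors`, and an arbitrary 5-minute window:

```python
import time


class RecentVisitors:
    """Remember when each IP last loaded a page, to gate image requests."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.last_seen = {}  # ip -> Unix time of most recent page view

    def record_page_view(self, ip, now=None):
        self.last_seen[ip] = time.time() if now is None else now

    def may_fetch_asset(self, ip, now=None):
        """Allow the asset only if this IP viewed a page within the window."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(ip)
        return seen is not None and now - seen <= self.window
```

Every image request then costs a lookup on the origin server, which is the extra load and lost edge caching the comment is pointing at.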

So, for both images and downloads, users wind up losing if referrals go away. It's far better to just leave it as is. Enable referrals by default. Let the privacy conscious disable them (sending blank ones). And build systems to take into account both userbases. Again, as a software developer, publisher, and host, I don't really care about referrals in terms of violating privacy, so I don't care if you disable them and send blank ones. I purposely set up my redirects and CDNs to allow for that. I care about them in terms of continuing to deliver services effectively to my users without competitors stealing my resources.

mintplant · on Dec 11, 2013

What if the page containing a one-time link was cached, but the resource itself was not? The "secure link" solution doesn't seem to work in all cases.

JohnTHaller · on Dec 11, 2013

Correct. And that end user would have to reload the page or clear cache. It's not as effective as just checking the referral. But without referrals, it would be what we'd have to resort to.

AnthonyMouse · on Dec 12, 2013

> For lots of us using basic CDN services, we enable referrer checks to ensure that folks aren't hotlinking images or direct linking downloads from other sites.

This is actually an interesting problem, because it's already solved but most people aren't using the solution: If you have a large file to distribute to a large number of people without authentication, use BitTorrent. As far as I can see there are two primary impediments to this:

A) Most browsers can't by default download large files P2P. You can actually write a BitTorrent client in javascript using Web Sockets if you really want to, but that's just horrible. What would be really nice is to be able to just e.g. embed a video into a webpage using a magnet link. There is no technical reason why this couldn't be implemented and rightly should be for large files.

B) Images are exactly the wrong size. They're big enough that you can't just ignore hotlinking but not big enough that you want to pay the overhead of connecting to 50 different peers instead of one to get a good transfer rate. But that just requires some adjustments to the protocol; if you're looking for realtime retrieval for display in a webpage you would probably want to use UDP and then use erasure coding to deal with slow/broken peers and packet loss. If you have a 60KB image, you can send a ~50 byte packet to each of a dozen peers and have ten of them each send 6KB (approximately four packets) to the target with 6KB worth of erasure bits from each of the others (which also allows the image to be constructed once 60KB of data is received in total from any collection of peers), and now the image is costing you ~600 bytes instead of 60KB. And if the image hasn't been received in 150ms, add more peers.
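
The arithmetic in point B can be checked with a short sketch. It assumes 1KB = 1000 bytes and that the origin is the one sending the ~50 byte request to each peer; the function name and dictionary keys are made up for illustration:

```python
def origin_cost(image_bytes, peers, data_peers, request_bytes=50):
    """Bandwidth figures for fetching one image from peers with erasure coding.

    data_peers peers each send an equal slice of the image; the remaining
    peers send an equally sized block of erasure-coded data, so any
    image_bytes worth of blocks is enough to rebuild the image.
    """
    share = image_bytes // data_peers
    origin_bytes = peers * request_bytes
    return {
        "share_per_peer": share,               # bytes each peer sends
        "origin_bytes": origin_bytes,          # what the site itself pays
        "max_bytes_to_client": peers * share,  # if every peer answers
        "savings_factor": image_bytes / origin_bytes,
    }


figures = origin_cost(image_bytes=60_000, peers=12, data_peers=10)
```

For the 60KB example this gives 6,000 bytes per peer, 600 bytes of origin traffic, and at most 72,000 bytes arriving at the client, of which any 60,000 suffice.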

JohnTHaller · on Dec 12, 2013

Our software is used from portable devices (usually usb) as users move between computers (PortableApps.com). As such, using bittorrent would be a technical option within our platform's app store/updater. It would, however, get our platform banned/blocked at many companies and universities that have policies forbidding bittorrent use. (And we can say all day long that it's a legit protocol with lots of legit uses like downloading linux ISOs, it doesn't change the facts and policies on the ground.) Additionally, most users will be behind NATs that they can't poke a hole through to be able to properly share.

As for images, the bittorrent protocol would just be way too slow even with some changes when compared to HTTP with SPDY and all the internal tweaks done at the geographically close CDN edge nodes to make them as fast as possible. 150ms before adding a new peer is an eternity in an age when 47% of people expect a web page to load in 2 seconds or less and the abandon rate increases with each second that passes with 40% abandoning at a little after the 3 second mark.

toomuchtodo · on Dec 12, 2013

Couldn't you use an inexpensive CDN like Cloudflare? Your origin would see transfer to Cloudflare, but they'd be able to offload the majority of your download traffic. You also wouldn't be charged per GB as happens with Cloudfront.

JohnTHaller · on Dec 13, 2013

I'm not that familiar with Cloudflare and unsure how well it would work with a Drupal-based site. They do appear to have a module ( https://drupal.org/project/cloudflare ) but it seems like it is a beta and hasn't been developed in a few months. It does seem like it would be more expensive for the level with an SLA ($200 for business plan) than the CDN we use now (which is $79 a month for our images and includes 1TB of bandwidth, about what we need, excluding binary downloads, of course). Do you have any direct experience with CloudFlare?

toomuchtodo · on Dec 13, 2013

I haven't used it with Drupal before; we're using it to cache a read-only JSON service in the event of failure, heavy load, etc. We've been pretty happy with them. You could always try their free or $20/month plan on a separate subdomain.

mike-cardwell · on Dec 11, 2013

There is a trivial solution to this. Introduce a new HTTP response header 6 months before phasing out the Referer header. This header would be optionally delivered with content and would specify which third party domains are allowed to access the content. Perhaps Content-Security-Policy could be extended for this purpose.

JohnTHaller · on Dec 11, 2013

Sure. And have it on by default with a correct content security policy. If it were off by default, it wouldn't be used by most folks and the bandwidth thieves would be content hotlinking images and direct linking binaries, just ignoring the small percentage of users who turned it on.

Of course, even if this were released today and referrers were phased out in June 2014, we'd still be able to use them for at least 5 years until you could safely assume that they were gone. Likely longer.

mike-cardwell · on Dec 11, 2013

If they released support for this header with Firefox and Chrome, almost immediately, people wouldn't bother hotlinking to sites which utilise it because a good proportion of their users wouldn't be able to see the content at all.

leephillips · on Dec 11, 2013

You ask, "Why do we still have referrers?", but then you answer your question: "as a web developer, it's useful to be able to see where people came from."

You are of course correct that we don't have a "right" to this information. But I've discovered, many times, through the referrers in my logs, links to my pages from some very interesting places that I might not have discovered otherwise (because the link information that Google discloses is woefully incomplete).

Any user who wants to hide referrer information can easily do so in a variety of ways. For example, I wrote a bookmarklet that does this for you: http://lee-phillips.org/norefBookmarklet/

mike-cardwell · on Dec 11, 2013

Defaults are important. Most users of the web don't know referrers exist.

It's irrelevant how useful you find the information. You'd probably find it useful to know the name and email address of everyone that visits your site too... So?

j_s · on Dec 11, 2013

Google is already killing referrer on search results (by redirecting), to force people to pay for Analytics

andrenotgiant · on Dec 11, 2013

What? That is not true. Paying for the Google Analytics Premium product gets you no additional data on referrers or search terms.

AdWords paid search clicks still send the info, if that's what you're talking about.

But this discussion is on completely removing referrers, not just stripping search keyword.

ceejayoz · on Dec 11, 2013

I'd pay for an intermediate level of GA in a heartbeat.

Right now, once you hit 10M pageviews a month you either have to sample or pay $150k/year for Premium.

I don't need support, an account manager, four-hour turnaround on data, an SLA, etc. I just need more pageviews sometimes.

richbradshaw · on Dec 11, 2013

Set up 2 GA accounts, then in your embed give even IP addresses version 1, and odd ones version 2, then you can track 20M pageviews. Admittedly it's a little rubbish. Guess you could use the API to combine them again for your own backend use.
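
A sketch of the even/odd split, done server-side when the page is rendered. The tracking IDs are placeholders and `pick_tracking_id` is a hypothetical helper, not part of any Google Analytics API:

```python
import ipaddress


def pick_tracking_id(ip, ids=("UA-XXXXX-1", "UA-XXXXX-2")):
    """Choose an account by the parity of the visitor's IP address.

    Works for IPv4 and IPv6: the address is treated as one big integer,
    so even addresses get ids[0] and odd addresses get ids[1].
    """
    return ids[int(ipaddress.ip_address(ip)) % len(ids)]
```

The chosen ID is then written into the embed snippet for that visitor, so each account sees roughly half of the pageviews.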

ceejayoz · on Dec 11, 2013

Clients aren't huge fans of the "check both of these accounts and sum them together" approach, and ours like direct access so the API isn't a great solution.

I've heard that running a $10/month AdWords campaign gets you higher caps, but it may be an internet old wives' tale.

dangayle · on Dec 11, 2013

Hate to burst your bubble, but even the premium GA uses samples. They don't give you a firehose of real data.

ceejayoz · on Dec 12, 2013

Some of the reports, yes, but not overall pageviews. Per their capabilities page:

> 1 billion hits per month

> Up to 3 million rows of data in unsampled reports

If you're doing conversion tracking etc. you're going to start getting sampled data at some point, but it's the pageviews our folks care for.

reginaldjcooper · on Dec 11, 2013

I use RefControl and just don't send the header, although I feel that's close to being the least of my worries.

I just thought I'm already using NoScript, AdBlock, RequestPolicy, BetterPrivacy, Cookie Monster, Blender, and HTTPS-Everywhere; might as well go all-in.