Using a Hosts File To Make The Internet Not Suck As Much (someonewhocares.org)
100 points by heelhook on Feb 22, 2013 | 97 comments


Think twice before using someone else's hosts file, unless you can actually review every entry. Not that I really think anything funny is happening here, but it is a huge security risk to use non-authoritative DNS like this by taking someone else at their word that their hosts file is legit.


If every entry routes to 127.0.0.1, what's the worst that could happen?


One can also null-route them via 0.0.0.0. Requests to those hosts tend to fail instantly that way, versus timing out with 127.0.0.1.


Years ago I maintained a popular ad-blocking hosts file. I got a lot of complaints when I switched from 127.0.0.1 to 0.0.0.0. Some TCP/IP stacks just didn't like it. Other users had little web servers running on 127.0.0.1 that quickly served up 404s for speed or black JPEGs to make things pretty. Someone out there maintained a little httpd for Windows that did just that. If it saw a GET for blahblah.jpg it would serve up a black graphic of some random size. It sure beat the default FF or IE error message.

Honestly, there's no need for this stuff in the age of browser-based ad blocking. I gave up on it when I saw how easy it was to write rules and wildcards in Adblock Plus. Interest in it fell. I'm surprised to see one still maintained.


Routing to 127.0.0.1 is dumb for this reason, especially since you may be running something on port 80.


Unless you have something listening on the ports that doesn't respond or a firewall that is dropping packets on the loopback interface, "connection refused" is returned right away.
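To see the "refused right away" behaviour for yourself, here is a minimal Python sketch that times a TCP connect attempt. The function name `probe` and the outcome labels are made up for illustration, and results will differ if a local firewall silently drops packets.

```python
import socket
import time

def probe(host, port, timeout=5.0):
    """Try a TCP connect and return (outcome, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # Nothing is listening: the stack answers with a reset immediately.
        outcome = "refused"
    except socket.timeout:
        # No answer at all: we waited for the full timeout.
        outcome = "timed out"
    except OSError as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start
```

Calling `probe("127.0.0.1", 80)` on a machine with nothing on port 80 should normally come back as "refused" in a few milliseconds, while an address that is never answered reports "timed out" only after the full timeout.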


Here is the 0.0.0.0 version: http://someonewhocares.org/hosts/zero/


If you were malevolent and stupid enough, you could 'remotely' hack yourself (and set your own HP printer on fire).


What's the concern? It's pretty easy to see all entries point to 127.0.0.1.


You've just illustrated his point because not all entries do point to loopback.


Looking at the body of the text, rather than the header/footer, and excluding comment lines, here is what I get with counts:

  [~]$ cat hosts_example.txt | grep -v '#' | awk '{print $1}'| sort | uniq -c
       56 
        1 ::1
     9674 127.0.0.1
        1 255.255.255.255
        1 fe80::1%lo0


If you pass it through grep, you can see that virtually every non-comment line refers to 127.0.0.1:

    ~/tmp% grep -v "^[[:space:]]*#" fred.txt | grep -v "^[[:space:]]*$" | grep -v 127.0.0.1
    books
    guestbook
    hosts
      text file
      rss feed
      0.0.0.0
      0 text file
      old macs
      0 old macs
    math links
    origami
    photos
    polls
    siteoftheday
    home
     Hosted by:
      theorem.ca
    how to make the internet not suck (as much)
    255.255.255.255	broadcasthost
    ::1		localhost
    fe80::1%lo0     localhost
    Fri Feb 22 08:05:36 2013top
    ~/tmp%

And I just realised now that the junk at the top of the output comes from me doing a Select All on the web page to highlight everything first, and it picked up the headers! Oops. So there are in fact 3 non-comment lines that don't refer to 127.0.0.1.
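The same check can be scripted so that stray page text does not get in the way. The following is a rough Python sketch, not anything the file's maintainer provides; the function names and the `sample` text are invented for illustration.

```python
from collections import Counter

def parse_hosts(text):
    """Yield (address, hostnames) for each non-comment, non-blank line."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and trailing comments
        fields = line.split()
        if len(fields) >= 2:                 # need an address plus at least one name
            yield fields[0], fields[1:]

def address_counts(text):
    """Count how many entries point at each address."""
    return Counter(addr for addr, _names in parse_hosts(text))

def unexpected_entries(text, expected=("127.0.0.1",)):
    """Return entries whose address is not one of the expected sink addresses."""
    return [(addr, names) for addr, names in parse_hosts(text) if addr not in expected]

# Illustrative input only; not the real file.
sample = """\
# comment
127.0.0.1  localhost
127.0.0.1  ads.example.com  # inline comment
255.255.255.255\tbroadcasthost
::1\t\tlocalhost
fe80::1%lo0     localhost
"""
```

Running `unexpected_entries` on a downloaded file before installing it gives a short list to review by eye, which is the whole point of the warning at the top of the thread.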


Wary of self-promotion here (sorry), but here's a GitHub repo that amalgamates and de-dupes recent hosts files from mvps.org, someonewhocares.org, and malwaredomainlist.com.

https://github.com/StevenBlack/hosts
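For readers wondering what "amalgamate and de-dupe" involves, here is a minimal Python sketch of the idea. It is not the code from that repository; `merge_hosts` and its parameters are hypothetical. It assumes only entries that already point at a sink address should be carried over, so a line mapping a name to some real IP is dropped rather than trusted.

```python
def merge_hosts(sources, sink="0.0.0.0",
                sinks=("127.0.0.1", "0.0.0.0"),
                keep=("localhost", "broadcasthost")):
    """Merge several hosts-format texts into one sorted, de-duplicated block list."""
    blocked = set()
    for text in sources:
        for raw in text.splitlines():
            fields = raw.split("#", 1)[0].split()
            if len(fields) < 2 or fields[0] not in sinks:
                continue  # blank, comment, malformed, or not a blocking entry
            for name in fields[1:]:
                name = name.lower().rstrip(".")
                if name and name not in keep:
                    blocked.add(name)
    return "\n".join(f"{sink} {name}" for name in sorted(blocked)) + "\n"
```

The system's own entries (localhost and friends) are deliberately left out of the result, so the output is meant to be appended to an existing hosts file rather than to replace it.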


awesome, I haven't been using mvps.org or malwaredomainlist, I'll have to add those into mine. Thanks


This is sweet. Good looking out.


I see a load of porn sites. Just how does this Make The Internet Not Suck? (damn it's even more annoying to write incorrect casing than read it.) I thought this was going to block the fifty "Share this! Email this!" buttons on websites or something... If you don't visit porn websites, you won't visit these domains anyway; what's the point in blocking them in a huge hosts file?


"Shock sites" (as the comments in the file call them) is probably a more accurate term for the sites listed at the beginning.

And the majority of host names below them are for tracking and ad sites.


You never know where a URL redirection service (e.g., bit.ly) is going to send you.


"Just how does this Make The Internet Not Suck? (damn it's even more annoying to write incorrect casing than read it.) "

With the exception of "The" the titling is just a proper use of titling case. Do you find it annoying to read newspapers? I imagine it is the awkward phrasing that you (and I) find annoying.


I once had to troubleshoot why a Windows (XP I believe) computer was taking so long to browse the web. Turns out, it had a hosts file like this (generated from Spybot, IIRC). Cleaned out the hosts file, and it ran much faster.

Just something to think about when using massive hosts files.


I tested exactly this recently and a hosts file dramatically helps browsing performance.

It's true there is an up-front cost paid when first loading it, and for the initial domain lookups. After initial lookups, DNS cache takes over.

The real benefit comes because countless requests for ads, tracking scripts and counters, etc, never leave your box. Network traffic is greatly reduced.

A comprehensive hosts file dramatically improved the performance of a relative's dialup internet connection a few years ago. A web that was borderline unusable became much snappier. And web pages had almost no ad adornments.

So on balance this is a very good thing. YMMV.


Yeah it was definitely Spybot. That was the first thing I thought of when I saw this. For the time, it was a good enough workaround in my experience. I don't really remember having network slowdown, but I may have been used to it since I think I was still on a blazing 56k modem at that point ;-)


Also did this. Once a domain is looked up and in the local DNS cache it's fine, but the initial DNS lookup can get stupidly slow, especially on older hardware.


XP must not have been storing them in a decent data structure. Geez. I wonder if this is the case for Windows 7?


The more I think about it, the more I think it may have been Vista. Still doesn't bode well for Win7.


I somehow noticed it included stat.livejournal.com in its list, which is funny because they must've assumed it's something related to statistics when in fact that's where static resources were hosted. (A user had already grabbed "static.livejournal.com".)


In Australia the government went on this rampage of attempting to "protect" people through censorship. Perhaps a better solution would have simply been to ask Microsoft and Apple to distribute computers with a pre-loaded host file. Nice and easy to opt out, all while providing some decent protection from obvious silly sites.


Governments don't know enough about the internet to know what a hosts file is and I don't think we want them to.


Naw. /etc/hosts can be explained to a government.

But this kind of thing doesn't give your average government nearly enough power. So: not interested.


Or they could declare another moral panic, and start kicking down doors and arresting people for the contents of their hosts file: "Bu-bu-but I didn't visit any of those sites!" - "Yeah, right - tell it to the judge, perv".


I'd recommend Gas Mask for anyone editing their hosts file:

(Plug for own site with similar stuff): http://pineapple.io/resources/gas-mask

Gas Mask direct link: https://code.google.com/p/gmask/


I tried Gas Mask a couple weeks ago, but it crashed every time I tried to edit the file list. I'm running Mountain Lion. At the time, I was busy on a project, so I didn't have any time to troubleshoot. I just gave up and uninstalled it.

It looks to be very useful so hopefully I can get it working sometime soon or at least be able to provide some crash feedback.


For Windows, use http://www.abelhadigital.com/hostsman (free, but closed source). It also has a list of popular third-party hosts files to block unwanted sites, with options to update them.


For Linux, you can use vim.


For domains you don't want to route, try 192.0.2.x. That range is reserved for examples, so you won't end up with spurious requests to stuff you're running on localhost.


Will browsers immediately fail the connection, or will they try to make the connection and wait for a timeout? I generally use 0.42.42.42 because Class-A addresses can't start with 0, so they fail immediately.


If you're running a web server locally on port 80, the web server will handle the request. If no service is listening on port 80, your networking stack will not be able to open a TCP connection and it will be actively refused, unless you're using some kind of terrible software firewall.


Using Firefox in Windows 7, it takes 4-5 seconds to fail. The other address I gave fails so fast sometimes the favicon doesn't even blink.


IIRC, 0/8 addresses are for multicast, so depending on your and your upstream's router configuration, that potentially sends those requests to a weird set of hosts on your class-A.


Multicast is 224.0.0.0/4 (224-239). 0/8 should only be used as a source address for "local network" traffic and never a destination.
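The ranges being argued about here can be checked with Python's standard `ipaddress` module. This is only a small sketch; the `classify` function and its labels are invented for illustration, and the 192.0.2.0/24 documentation range is the TEST-NET-1 block from RFC 5737.

```python
import ipaddress

THIS_NETWORK = ipaddress.ip_network("0.0.0.0/8")   # "this network" block
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")  # documentation range (RFC 5737)

def classify(addr):
    """Return a rough label for the special-purpose range an address falls in."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_multicast:  # 224.0.0.0/4 for IPv4
        return "multicast"
    if ip in TEST_NET_1:
        return "documentation (TEST-NET-1)"
    if ip in THIS_NETWORK:
        return "this-network (0/8)"
    return "other"
```

With these definitions, `classify("0.42.42.42")` lands in the 0/8 bucket and `classify("224.0.0.1")` in multicast, which matches the correction above.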


This would depend on your network setup - it seems unlikely that the OS would automatically refuse to route a request to there. On OS X:

    $ curl --connect-timeout 5 -v 192.0.2.1
    * About to connect() to 192.0.2.1 port 80 (#0)
    *   Trying 192.0.2.1...
    * Connection timed out after 5011 milliseconds
    * Closing connection 0
    curl: (28) Connection timed out after 5011 milliseconds

So your browser will likely end up trying the connection and timing out after a while.


It does wait for a timeout, so "route" was poorly chosen. It just won't get anywhere on most configurations because they respect the convention that 192.0.2.x is for examples only.


Another is the MVPs hosts file. http://winhelp2002.mvps.org/hosts.htm

Several commenters appear not to "get" the benefit of a comprehensive and recent hosts file.

With a good hosts file, many well-known trojans and viruses can't phone home.


I like the idea of this, just deciding how to balance it with my own hosts file that has local entries vs keeping this one updated (presuming it will be)

Interesting that something that looks to have taken a fair bit of work and is moderately useful hasn't had a comment on it.



Neat, thanks!


I have a script set up on my router (TomatoUSB) that downloads a known hosts list and maps all the domains to 0.0.0.0 to block ads and other bad content for all the devices on my network.

Generally, I enjoy not seeing all the ads, but some sites (Slickdeals, I'm looking at you) make such extensive use of affiliate and ad sites that it makes the site barely usable. For Slickdeals I ended up making a Chrome extension that finds URL-encoded URLs within a query parameter and redirects to that site directly instead of the affiliate link.
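The extension itself would be JavaScript, but the core trick of pulling a destination URL out of an affiliate link's query string is easy to sketch in Python. The function name `embedded_url` and the example hosts in the usage note are hypothetical; this is not the commenter's extension.

```python
from urllib.parse import parse_qsl, urlsplit

def embedded_url(link):
    """Return the first http(s) URL found in a query parameter value, else None."""
    # parse_qsl percent-decodes each value, so an encoded URL comes back readable.
    for _key, value in parse_qsl(urlsplit(link).query):
        if value.startswith(("http://", "https://")):
            return value
    return None
```

For instance, a link such as `https://affiliate.example.net/click?id=42&u=https%3A%2F%2Fshop.example.com%2Fitem%3Fsku%3D7` would yield `https://shop.example.com/item?sku=7`, which the browser can then visit directly.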


I like this approach, although I don't have a router that I can flash with TomatoUSB. Instead I set up adsuck[1], which is a DNS blacklisting daemon. It can be set up on individual machines, or on the perimeter like in your case. I prefer the DNS approach as hosts files can grow to be quite lengthy (the article proves it) and can in certain cases cause performance issues, as parsing several KB of entries prior to sending a DNS request is noticeable.

[1] https://opensource.conformal.com/wiki/adsuck


Jesus. Way to thread a needle with a sledgehammer.


Or run your own caching recursive DNS server, and have it cache entries for as long as possible.

Google DNS and Level 3 DNS can't compete with my local cache, which is of course to be expected.


Care to link a tutorial on how to do this?


Depending on your OS, it is possible there is already something like this in place. IIRC, Ubuntu shipped with dnsmasq [1] at some point, thus maintaining a local DNS cache which in turn speeds up browsing. Otherwise, it's as easy as installing dnsmasq via your preferred package manager, and configuring dnsmasq.conf / resolv.conf

dnsmasq.conf needs to point to the DNS server you wish to use; resolv.conf needs a line "nameserver 127.0.0.1" or whatever IP dnsmasq is listening on.

Make sure dnsmasq is started at boot, and that your resolv.conf isn't overwritten by your DHCP client, and you're good to go. Further configuration needs to be done to make dnsmasq provide authoritative answers for other domains.

If you don't need caching, then use adsuck (linked it in my previous comment in this thread).

[1] http://www.thekelleys.org.uk/dnsmasq/doc.html


I personally run a local caching server on my home network, rather than locally on my personal machine. That way I get the best of both worlds: when I move my laptop it will pick up new DNS fast and easy (especially if it is required for captive portals and the like), and at home I get fast speeds over the local network.

I run unbound, I don't have a tutorial or anything like that, mainly because I don't know what OS you are running or where you want to do this.

My personal unbound also connects to a locally hosted nsd, which is used to host the zone network.lan. That zone is where all of my hosts live: it has entries for my various internal servers, as well as all of my test sites.


Be aware that this blocks Google Analytics and similar services. IMO it's a really bad idea to encourage users to block these services. For example, they are used by pretty much every software company to IMPROVE the product, and if they can't get data on what people use the most in their product, it may not get the attention it needs.


The reality is that so few people will actually install a hosts file like this, that it's not going to make an appreciable difference in any statistics.


I would imagine it could put a significant dent in the already very small percentage of people using Linux, thus under-representing it even further. Not that I'm too worried about it though.

edit: Looking through the current version of the host file, it appears google analytics and some other "non-nefarious" tracking sites are commented out.


That is true, but it still conflicts with the purpose of this file, which is to "Make the internet suck less", while in the long term it may instead make the internet suck more. I don't see how blocking them makes it suck less anyhow.


Could someone post the same approach, but using bind (DNS), handling all those entries in one zone? Is it possible?


I run BIND at home for my network and take advantage of that to do the following: https://isc.sans.edu/diary/Easy+DNS+BIND+Sinkhole+Setup/7930

I also have a webserver that any black holed domain is sent to so that there is no waiting for a web request to time out. The first time I put it in place I didn't have the webserver so the performance when hitting one of these was horrible. I also use it for any ad server or analytic site I want to permanently block.

This way I don't have to distribute and update a hosts file to every machine in the house and I control where the offending sites redirect to.


Exactly this.

I have been creating master zones in my named.conf and pointing them to a blocked.txt file. My method is more cumbersome; the one you linked is very streamlined. Excellent, and thanks!


I also made a little thing for adding additional zones to the list when I was playing a little with Bootstrap. I wrote a little page to add, enable and disable (if I blocked something important) entries. Then a cron job to pull the enabled domains, build the conf file and reload BIND.


Yes, it's actually pretty easy.

First, create a zone file that all of the domains will share. I put mine in /var/named/master/dummy:

    $TTL    1d
    @               IN      SOA     ns1.localdomain.       hostmaster.ns1.localdomain. (
                            2012100601 ; serial
                            8h ; refresh
                            2h ; retry
                            7d ; expire
                            1h ; default_ttl
                            )
    ;
    ; Name servers
    ;
    @               IN      NS      ns1.localdomain.
    @               IN      NS      ns2.localdomain.
    ;
    ; Host addresses
    ; Leave commented to return NXDOMAIN
    ; Uncomment to resolve to IP address
    ;@               IN      A      127.0.0.1

I prefer not to resolve these hosts at all (NXDOMAIN), because it seems to be faster and I don't want client machines to probe themselves, but you can uncomment the A record and use whatever IP address you want (e.g. for sinkhole monitoring). Remember to increment the serial number with every edit (which will be rare or never, once you've set it to your liking).

Next, create a simple file with each domain you want to block on one line. I put mine in /var/named/dummy:

    ads.example.com
    tracker.example.com
    example.org

Now create a conf file in the format bind expects, pointing every domain to the zone file (for convenience, put this in /var/named/Makefile in a 'dummy' target):

    sed 's/.*/zone "&" { type master; file "master\/dummy"; };/' < dummy > dummy.conf

Which will result in /var/named/dummy.conf containing:

    zone "ads.example.com" { type master; file "master/dummy"; };
    zone "tracker.example.com" { type master; file "master/dummy"; };
    zone "example.org" { type master; file "master/dummy"; };

Finally, add to your named.conf:

    include "/var/named/dummy.conf";

Restart bind and you're now authoritative for those zones on your network!
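If the domain list is edited by hand, blank lines, comments, or repeated entries would go straight through the sed one-liner and produce zone statements that named is likely to reject. A slightly more defensive generator could look like this Python sketch; `zone_conf` is a hypothetical helper and the "#" comment convention for the list file is an assumption, not part of the original recipe.

```python
def zone_conf(domains_text, zone_file="master/dummy"):
    """Build BIND zone statements, one per unique domain, all sharing one zone file."""
    lines = []
    seen = set()
    for raw in domains_text.splitlines():
        # Assumed convention: '#' starts a comment in the domain list.
        domain = raw.split("#", 1)[0].strip().lower().rstrip(".")
        if not domain or domain in seen:
            continue  # skip blanks, comments, and duplicates
        seen.add(domain)
        lines.append(f'zone "{domain}" {{ type master; file "{zone_file}"; }};')
    return "\n".join(lines) + ("\n" if lines else "")
```

Writing the returned text to /var/named/dummy.conf gives the same result as the sed step for a clean list.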


Bind is really confusing.

One of these days I'm gonna roll my own DNS server in Python with a sane configuration syntax.


Is there an IP block list for servers? I see a lot of traffic from IP addresses that seem to probe my measly AWS EC2 instance, and I wish there was a list I could feed to my NGINX or iptables so they would just drop those packets.


Why would you want to block search engines and other crawlers?


I love the search engines; it's the other type of automated crawler I mean. It's kind of easy to spot them by the URLs in the access logs. Once a flaw/vulnerability is publicized, you can see script kiddies attempting to use it on your site.
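Spotting probes by URL can be roughed out in a few lines of Python. Everything specific here is an assumption: the `SUSPICIOUS` patterns are placeholders to be replaced with whatever shows up in your own logs, and the `LINE` regex expects the usual common/combined access log layout (client IP first, request in double quotes).

```python
import re
from collections import Counter

# Hypothetical probe patterns; swap in paths seen in your own logs.
SUSPICIOUS = re.compile(r"(/wp-login\.php|/phpmyadmin|/\.env|\.\./)", re.IGNORECASE)

# Client IP at the start of the line, then the quoted "METHOD path PROTOCOL" request.
LINE = re.compile(r'^(\S+) .*?"(?:[A-Z]+) (\S+) [^"]*"')

def suspicious_clients(log_lines):
    """Count requests per client IP whose path matches a suspicious pattern."""
    counts = Counter()
    for line in log_lines:
        match = LINE.match(line)
        if match and SUSPICIOUS.search(match.group(2)):
            counts[match.group(1)] += 1
    return counts
```

The resulting addresses are candidates for a firewall drop rule, though a human look at the list first seems wise, since shared or spoofed addresses can end up in it.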


I see entries to statcounter.com in there. I love that site!


There are frequent updates to this site, as new sites get added so often. In order to keep up with it, I wrote a little script to download this and make it my hosts file every so often (once a month in cron), saving the old one as a backup, of course. If I remember, I'll post it up on GitHub this evening.

edit: I'll also add that this hosts file blocks ads on YouTube and Pandora (and probably several others). I never wait for ads there anymore.
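Until that script shows up, here is a minimal Python sketch of the same idea: fetch the list, keep a backup of the current file, and swap the new one in. It is not the commenter's script; `fetch` and `install_hosts` are hypothetical names, the URL is left as a parameter, and writing to the real hosts file needs administrator rights.

```python
import os
import shutil
import tempfile
import urllib.request

def fetch(url, timeout=30):
    """Download a hosts list and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

def install_hosts(new_text, path, backup_suffix=".bak"):
    """Back up the file at `path` (if present), then replace it with new_text."""
    if os.path.exists(path):
        shutil.copy2(path, path + backup_suffix)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(new_text)
        os.chmod(tmp, 0o644)   # mkstemp creates 0600; the hosts file must stay readable
        os.replace(tmp, path)  # swap into place in one step
    except BaseException:
        os.unlink(tmp)
        raise
```

A monthly cron entry could then call something like `install_hosts(fetch(url), "/etc/hosts")`. Note that this overwrites local entries, so anything you need (localhost, internal machines) has to be merged into the text first.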


Does anyone else think a big hosts file slows down browsing? Is there any way to fix that besides flushing the cache at random times?


it's hella faster than making a network connection for DNS


But these are all routed to localhost, so they'll just spend ages timing out...


HostsMan is a pretty decent Hosts file manager for Windows, has some automation that can help with longer lists, also can update or auto merge lists from the web. (I'm not the author of this application, but I have used it).

http://www.abelhadigital.com/hostsman


Anyone got a version that is only blocking malicious sites? I don't want to block ads and stuff.


Why not just delete the entries which block ads?


Don't know about the list linked directly, but I have used hosts files to block plenty of sites. I avoid timeouts by running a small HTTP service on localhost so that the browser can get a quick 404 response instead of a timeout.
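A service like that fits in a few lines of Python's standard library. This is a sketch under assumptions, not the commenter's setup: `start_sink`, `NotFoundHandler`, and the `hits` counter are made-up names, and binding to port 80 usually needs elevated privileges. As a small extra it tallies the Host header of each request, so you can see which blocked domains get hit.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

hits = Counter()  # blocked hostname -> number of requests seen

class NotFoundHandler(BaseHTTPRequestHandler):
    def _reply(self):
        hits[self.headers.get("Host", "?")] += 1
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Answer every common method the same way.
    do_GET = do_HEAD = do_POST = _reply

    def log_message(self, format, *args):
        pass  # keep the console quiet

def start_sink(port=80, address="127.0.0.1"):
    """Start the 404 server in a background thread and return the server object."""
    server = ThreadingHTTPServer((address, port), NotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_sink()` from a long-running script and inspect `hits.most_common()` from time to time. This only covers plain HTTP; blocked HTTPS requests will still fail at the TLS stage.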


I used to run a local server as well until I saw a recommendation to use 0.0.0.0. Requests now fail instantly and I don't have to run a local server just for blocked domains.

There is one thing I do miss: being able to log requests and seeing which domains were accessed the most.


I started maintaining a hosts file (simple heuristic: if I see the browser status bar is waiting for a given domain, it gets added to the hosts file), but at some point it was less maintenance effort to simply install adblock.


I'd prefer using Disconnect and Adblock - I usually allow the ads on sites I read daily - at least they receive the impressions, even if I don't click on them.

Facebook and Twitter buttons on the other hand - sorry guys, you stay blocked.


There are some really poor ideas in there like blocking the domains of affiliate marketing networks. That won't just block ads, it'll prevent you from clicking through from affiliate sites to vendors' websites.


I may be in the minority, but I like that. If I really want to get to the vendor site I will type in the URL directly, this is generally safer than clicking links anyway.


I think that most users will be a lot lazier and less informed than you


Can most of this be accomplished more easily by using OpenDNS?


If you use the hosts file, your browser doesn't need to wait to make the DNS request to OpenDNS, since it already has the loopback/null route in hosts.


But you have an entire community addressing real time changes (like adding sites that get infected with malware or being used in phishing attacks) and dealing with false positives.


Also, OpenDNS entries should be cached after the first lookup.


My Internet doesn't suck and I have a fairly empty hosts file. I don't get it. It looks like OP was just browsing porn too much and needed to install AdBlock.


I like dark, high-contrast themes; this is not one. Smart and simple though.


>goatse.cx

It's actually an email service now. hello.jpg is gone.


For now, until everyone lets their guard down...


goatse.cx isn't objectionable any more. It's a (novelty, of course) email provider.


Wow, you're right. I was almost convinced you were being clever.


I was Kickstarter contributor #1 and got the first address handed out. I don't even remember how I heard of it.

... But damn, that would have been clever back in the day. Thumbs up!


Does anyone else see this as a Shock site gold mine?


I just wish they were up to date and all worked. goatse.cx hasn't been a shock site for ages.


Nice list to go through in boring times @ work. Thanks.


I think shutup.css is more effective at making the Internet suck less http://stevenf.com/shutup-css/



