Think twice before using someone else's hosts file, unless you can actually review every entry. Not that I really think anything funny is happening here, but it is a huge security risk to use non-authoritative DNS like this, taking someone at their word that their hosts file is legit.
Years ago I maintained a popular ad-blocking hosts file. I got a lot of complaints when I switched from 127.0.0.1 to 0.0.0.0. Some TCP/IP stacks just didn't like it. Other users had little webservers running on 127.0.0.1 that quickly served up 404s for speed or black JPEGs to make things pretty. Someone out there maintained a little httpd for Windows that did just that. If it saw a GET for blahblah.jpg it would serve up a black graphic of some random size. It sure beat the default FF or IE error message.
Honestly, there's no need for this stuff in the age of browser-based ad blocking. I gave up on it when I saw how easy it was to write rules and wildcards in Adblock Plus. Interest in it fell. I'm surprised to see one still maintained.
Unless you have something listening on the ports that doesn't respond or a firewall that is dropping packets on the loopback interface, "connection refused" is returned right away.
If you pass it through grep, you can see that virtually every non-comment line refers to 127.0.0.1:
~/tmp% grep -v "^[[:space:]]*#" fred.txt | grep -v "^[[:space:]]*$" | grep -v 127.0.0.1
books
guestbook
hosts
text file
rss feed
0.0.0.0
0 text file
old macs
0 old macs
math links
origami
photos
polls
siteoftheday
home
Hosted by:
theorem.ca
how to make the internet not suck (as much)
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
Fri Feb 22 08:05:36 2013top
~/tmp%
And I just realised now that the junk at the top of the output comes from me doing a Select All on the web page to highlight everything first, and it picked up the headers! Oops. So there are in fact 3 non-comment lines that don't refer to localhost.
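For anyone who wants to do the same review without grep, here is a minimal Python sketch of the idea: list every entry that does not point at a loopback or unspecified (0.0.0.0) address. The function name `unexpected_entries` is made up for illustration, and the check is only a first pass, not a substitute for reading the file.

```python
import ipaddress

def unexpected_entries(hosts_text):
    """Return (line_number, address, names) for every entry that does not
    point at a loopback or unspecified address."""
    flagged = []
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            # strip an IPv6 zone suffix such as "%lo0" before parsing
            addr = ipaddress.ip_address(fields[0].split("%", 1)[0])
        except ValueError:
            # first field is not an IP address at all (e.g. pasted page text)
            flagged.append((lineno, fields[0], fields[1:]))
            continue
        if not (addr.is_loopback or addr.is_unspecified):
            flagged.append((lineno, str(addr), fields[1:]))
    return flagged
```

Anything it returns (a real routable address in particular) is worth a closer look before the file goes anywhere near your machine.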
Wary of self-promotion here (sorry) but here's a GitHub repo that amalgamates and de-dupes recent hosts files from mvps.org, someonewhocares.org, and malwaredomainlist.com.
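The amalgamate-and-de-dupe step is simple enough to sketch in Python. This is not the code from that repo, just an illustration of the idea; `merge_hosts`, `SKIP`, and the `sink` parameter are hypothetical names, and it assumes blocking entries use 127.0.0.1 or 0.0.0.0.

```python
# Names that belong to the system, not to a block list (assumed set).
SKIP = {"localhost", "localhost.localdomain", "broadcasthost"}
BLOCK_ADDRESSES = {"127.0.0.1", "0.0.0.0"}

def merge_hosts(*sources, sink="0.0.0.0"):
    """Merge several hosts-file texts into sorted, de-duplicated lines
    that all point at `sink`."""
    names = set()
    for text in sources:
        for raw in text.splitlines():
            fields = raw.split("#", 1)[0].split()  # drop comments, tokenize
            if len(fields) < 2 or fields[0] not in BLOCK_ADDRESSES:
                continue  # blank line, or not a blocking entry
            for name in fields[1:]:
                name = name.lower().rstrip(".")  # hostnames are case-insensitive
                if name and name not in SKIP:
                    names.add(name)
    return [f"{sink} {name}" for name in sorted(names)]
```

Passing `sink="127.0.0.1"` gives the older style for stacks that dislike 0.0.0.0.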
I see a load of porn sites. Just how does this Make The Internet Not Suck? (damn it's even more annoying to write incorrect casing than read it.) I thought this was going to block the fifty "Share this! Email this!" buttons on websites or something... If you don't visit porn websites, you won't visit these domains anyway; what's the point in blocking them in a huge hosts file?
"Just how does this Make The Internet Not Suck? (damn it's even more annoying to write incorrect casing than read it.) "
With the exception of "The" the titling is just a proper use of titling case. Do you find it annoying to read newspapers? I imagine it is the awkward phrasing that you (and I) find annoying.
I once had to troubleshoot why a Windows (XP I believe) computer was taking so long to browse the web. Turns out, it had a hosts file like this (generated from Spybot, IIRC). Cleaned out the hosts file, and it ran much faster.
Just something to think about when using massive hosts files.
I tested exactly this recently and a hosts file dramatically helps browsing performance.
It's true there is an up-front cost paid when first loading it, and for the initial domain lookups. After initial lookups, DNS cache takes over.
The real benefit comes because countless requests for ads, tracking scripts and counters, etc, never leave your box. Network traffic is greatly reduced.
A comprehensive hosts file dramatically improved the performance of a relative's dialup internet connection a few years ago. A web that was borderline unusable became much snappier. And web pages had almost no ad adornments.
Yeah it was definitely Spybot. That was the first thing I thought of when I saw this. For the time, it was a good enough workaround in my experience. I don't really remember having network slowdown, but I may have been used to it since I think I was still on a blazing 56k modem at that point ;-)
Also did this. Once a domain is looked up and in the local DNS cache it's fine, but the initial DNS lookup can get stupidly slow, especially on older hardware.
I somehow noticed it included stat.livejournal.com in its list, which is funny because they must've assumed it's something related to statistics when in fact that's where static resources were hosted. (A user had already grabbed "static.livejournal.com".)
In Australia the government went on this rampage of attempting to "protect" people through censorship. Perhaps a better solution would have simply been to ask Microsoft and Apple to distribute computers with a pre-loaded hosts file. Nice and easy to opt out, all while providing some decent protection from obvious silly sites.
Or they could declare another moral panic, and start kicking down doors and arresting people for the contents of their hosts file: "Bu-bu-but I didn't visit any of those sites!" - "Yeah, right - tell it to the judge, perv".
I tried Gas Mask a couple weeks ago, but it crashed every time I tried to edit the file list. I'm running Mountain Lion. At the time, I was busy on a project, so I didn't have any time to troubleshoot. I just gave up and uninstalled it.
It looks to be very useful so hopefully I can get it working sometime soon or at least be able to provide some crash feedback.
For Windows, use http://www.abelhadigital.com/hostsman (free, but closed source). It also has a list of popular third-party hosts files to block unwanted sites, with options to update them.
For domains you don't want to route, try 192.0.2.x. That range is reserved for examples and won't end up with spurious requests to stuff you're running on localhost.
Will browsers immediately fail the connection, or will they try to make the connection and wait for a timeout? I generally use 0.42.42.42 because Class-A addresses can't start with 0, so they fail immediately.
If you're running a web server locally on port 80, the webserver will handle the request. If no service is listening on port 80, your networking stack will not be able to open a TCP connection and it will be actively refused, unless you're using some kind of terrible software firewall.
IIRC, 0/8 addresses are for multicast, so depending on your and your upstream's router configuration, that potentially sends those requests to a weird set of hosts on your class-A.
This would depend on your network setup - it seems unlikely that the OS would automatically refuse to route a request to there. On OS X:
$ curl --connect-timeout 5 -v 192.0.2.1
* About to connect() to 192.0.2.1 port 80 (#0)
* Trying 192.0.2.1...
* Connection timed out after 5011 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 5011 milliseconds
So your browser will likely end up trying the connection and timing out after a while.
It does wait for a timeout, so "route" was poorly chosen. It just won't get anywhere on most configurations because they respect the convention that 192.0.2.x is for examples only.
I like the idea of this, just deciding how to balance it with my own hosts file that has local entries vs keeping this one updated (presuming it will be).
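The two outcomes discussed above (an immediate "connection refused" versus waiting out a timeout) are easy to tell apart with a few lines of Python. This is only a sketch; `probe` is a made-up helper, and what you get for any particular address depends on your own network setup.

```python
import socket

def probe(host, port=80, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something accepted the connection
    except ConnectionRefusedError:
        return "refused"           # fast failure: nothing listening
    except socket.timeout:
        return "timeout"           # slow failure: packets went nowhere
    except OSError as exc:
        return f"error: {exc}"     # e.g. network unreachable
```

Running `probe("127.0.0.1")` and `probe("192.0.2.1")` on your own machine shows which behaviour a browser would see for each choice of sink address.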
Interesting that something that looks to have taken a fair bit of work and is moderately useful hasn't had a comment on it.
I have a script set up in my router (TomatoUSB) that downloads a known hosts list and maps all the domains to 0.0.0.0 to block ads and other bad content for all the devices on my network.
Generally, I enjoy not seeing all the ads, but some sites (Slickdeals, I'm looking at you) make such extensive use of affiliate and ad sites that it makes the site barely usable. For Slickdeals I ended up making a Chrome extension that finds URL-encoded URLs within a query parameter and redirects to that site directly instead of the affiliate link.
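The extension itself isn't shown, but the core trick (pulling the real destination out of an affiliate link's query string) looks roughly like this in Python. It is a sketch of the technique, not the extension's code; `embedded_url` and the example domains are made up.

```python
from urllib.parse import parse_qsl, urlsplit

def embedded_url(link):
    """Return the first http(s) URL found among the query parameters of
    `link`, or None if there is none."""
    for _key, value in parse_qsl(urlsplit(link).query):
        # parse_qsl has already percent-decoded the value once
        if value.startswith(("http://", "https://")):
            return value
    return None
```

For example, `embedded_url("https://affiliate.example.net/click?id=42&u=https%3A%2F%2Fshop.example.com%2Fitem%3Fsku%3D7")` returns `"https://shop.example.com/item?sku=7"`, which is where the browser would be sent instead.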
I like this approach, although I don't have a router that I can flash with TomatoUSB. Instead I set up adsuck[1], which is a DNS blacklisting daemon. It can be set up on individual machines, or on the perimeter like in your case. I prefer the DNS approach as hosts files can grow to be quite lengthy (the article proves it) and can in certain cases cause performance issues, as parsing several KBs of entries prior to sending a DNS request is noticeable.
Depending on your OS, it is possible there is already something like this in place. IIRC, Ubuntu shipped with dnsmasq [1] at some point, thus maintaining a local DNS cache which in turn speeds up browsing. Otherwise, it's as easy as installing dnsmasq via your preferred package manager and configuring dnsmasq.conf / resolv.conf.
dnsmasq.conf needs to point to the DNS server you wish to use.
resolv.conf needs a line "nameserver 127.0.0.1" or whatever IP dnsmasq is listening on.
Make sure dnsmasq is started at boot, and that your resolv.conf isn't overwritten by your DHCP client, and you're good to go.
Further configuration needs to be done to make dnsmasq provide authoritative answers for other domains.
If you don't need caching, then use adsuck (linked it in my previous comment in this thread).
I personally run a local caching server on my home network rather than locally on my personal machine. That way I get the best of both worlds: when I move my laptop it picks up new DNS quickly and easily (especially if that is required for captive portals and the like), and at home I get fast lookups over the local network.
I run unbound. I don't have a tutorial or anything like that, mainly because I don't know what OS you are running or where you want to do this.
My personal unbound also connects to a locally hosted nsd, which hosts the zone network.lan. That zone is where all of my hosts live: it has entries for my various internal servers as well as all of my test sites.
Be aware that this blocks Google Analytics and similar services. IMO it's a really bad idea to encourage users to block these services. For example, they are used by pretty much every software company to IMPROVE the product, and if they can't get data on what people use the most in their product, it may not get the attention it needs.
The reality is that so few people will actually install a hosts file like this, that it's not going to make an appreciable difference in any statistics.
I would imagine it could put a significant dent in the already very small percentage of people using Linux, thus underrepresenting it even further. Not that I'm too worried about it though.
edit: Looking through the current version of the hosts file, it appears Google Analytics and some other "non-nefarious" tracking sites are commented out.
That is true, but it still conflicts with the purpose of this file, which is to "Make the internet suck less", while it instead, in the long term, may make the internet suck more. I don't see how blocking them makes it suck less anyhow.
I also have a webserver that any black holed domain is sent to so that there is no waiting for a web request to time out. The first time I put it in place I didn't have the webserver so the performance when hitting one of these was horrible. I also use it for any ad server or analytic site I want to permanently block.
This way I don't have to distribute and update a hosts file to every machine in the house, and I control where the offending sites redirect to.
I have been creating master zones in my named.conf and pointing them to a blocked.txt file. My method is more cumbersome; the one you linked is very streamlined. Excellent, and thanks!
I also made a little thing for adding additional zones to the list when I was playing a little with Bootstrap. I wrote a little page to add, enable and disable (if I blocked something important) entries. Then a cron job to pull the enabled domains, build the conf file and reload BIND.
First, create a zone file that all of the domains will share. I put mine in /var/named/master/dummy:
$TTL 1d
@ IN SOA ns1.localdomain. hostmaster.ns1.localdomain. (
2012100601 ; serial
8h ; refresh
2h ; retry
7d ; expire
1h ; default_ttl
)
;
; Name servers
;
@ IN NS ns1.localdomain.
@ IN NS ns2.localdomain.
;
; Host addresses
; Leave commented to return NXDOMAIN
; Uncomment to resolve to IP address
;@ IN A 127.0.0.1
I prefer not to resolve these hosts at all (NXDOMAIN), because it seems to be faster and I don't want client machines to probe themselves, but you can uncomment the A record and use whatever IP address you want (e.g. for sinkhole monitoring). Remember to increment the serial number with every edit (which will be rare or never, once you've set it to your liking).
Next, create a simple file with each domain you want to block on one line. I put mine in /var/named/dummy:
ads.example.com
tracker.example.com
example.org
Now create a conf file in the format BIND expects, pointing every domain to the zone file (for convenience, put this in /var/named/Makefile in a 'dummy' target):
sed 's/.*/zone "&" { type master; file "master\/dummy"; };/' < dummy > dummy.conf
Which will result in /var/named/dummy.conf containing:
zone "ads.example.com" { type master; file "master/dummy"; };
zone "tracker.example.com" { type master; file "master/dummy"; };
zone "example.org" { type master; file "master/dummy"; };
Finally, add to your named.conf:
include "/var/named/dummy.conf";
Restart BIND and you're now authoritative for those zones on your network!
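If the domain list ever picks up blank lines, comments or duplicates, the sed one-liner will happily turn them into zone statements too. A slightly more defensive generator could look like the Python sketch below; `zone_statements` is a hypothetical helper, and it assumes the same one-domain-per-line file and the same "master/dummy" zone file as above.

```python
def zone_statements(domains_text, zone_file="master/dummy"):
    """Turn a one-domain-per-line list into BIND zone statements,
    skipping blank lines, '#' comments and duplicate names."""
    lines = []
    seen = set()
    for raw in domains_text.splitlines():
        domain = raw.split("#", 1)[0].strip().lower().rstrip(".")
        if not domain or domain in seen:
            continue
        seen.add(domain)
        # doubled braces produce literal { } in the f-string
        lines.append(f'zone "{domain}" {{ type master; file "{zone_file}"; }};')
    return lines
```

Writing `"\n".join(zone_statements(open("dummy").read()))` to dummy.conf gives the same output as the sed command for a clean list.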
Is there an IP block list for servers? I see a lot of traffic from IP addresses that seem to probe my measly AWS EC2 instance, and I wish there was a list I could feed to my NGINX or iptables so they would just drop those packets.
I love the search engines; it's the other type of automated crawler that bothers me. They're fairly easy to spot from the URLs in the access logs. Once a flaw/vulnerability is publicized, you can see script kiddies attempting to use it on your site.
There are frequent updates to this site, as new sites get added so often. In order to keep up with it, I wrote a lil script to download this and make it my hosts file every so often (once a month in cron), saving the old one as a backup, of course. If I remember, I'll post it up on GitHub this evening.
edit: I'll also add that this hosts file blocks ads on YouTube and Pandora (and probably several others). I never wait for ads there anymore.
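Until that script shows up, here is a rough Python sketch of the same download, back up and replace idea. It is not the script mentioned above; `update_hosts` and the ".bak" suffix are assumptions, and it blindly replaces the whole file, so any local entries you keep in your hosts file would need to be merged back in separately.

```python
import os
import shutil
import tempfile
import urllib.request

def update_hosts(url, dest, backup_suffix=".bak"):
    """Download `url` and install it at `dest`, keeping the previous file
    as `dest + backup_suffix`. Returns the number of bytes written."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    if not data.strip():
        raise ValueError("downloaded file is empty; leaving dest untouched")
    if os.path.exists(dest):
        shutil.copy2(dest, dest + backup_suffix)  # save the old one first
    # Write to a temp file in the same directory, then rename over dest,
    # so a failed write never leaves a half-written hosts file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)))
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; hosts must be world-readable
    os.replace(tmp, dest)
    return len(data)
```

Given the warning at the top of the thread, it is worth reviewing what was downloaded before letting a cron job install it unattended.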
HostsMan is a pretty decent hosts file manager for Windows. It has some automation that can help with longer lists, and it can also update or auto-merge lists from the web. (I'm not the author of this application, but I have used it.)
Don't know about the list linked directly, but I have used hosts files to block plenty of sites. I avoid timeouts by running a small HTTP service on localhost so that the browser gets a quick 404 response instead of a timeout.
I used to run a local server as well until I saw a recommendation to use 0.0.0.0. Requests now fail instantly and I don't have to run a local server just for blocked domains.
There is one thing I do miss: being able to log requests and seeing which domains were accessed the most.
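For what it's worth, the logging part needs very little code if you go back to pointing blocked names at 127.0.0.1. The Python sketch below answers every request with an empty 404 and counts hits per Host header. `SinkHandler`, `make_sink` and `hits` are made-up names; it only sees plain HTTP (HTTPS requests would fail at the TLS handshake), and binding port 80 usually needs elevated privileges.

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

hits = Counter()  # blocked domain -> number of requests seen

class SinkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The Host header tells us which blocked name the browser wanted.
        host = self.headers.get("Host", "unknown").split(":", 1)[0].lower()
        hits[host] += 1
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_POST = do_HEAD = do_GET  # same quick 404 for other common methods

    def log_message(self, fmt, *args):
        pass  # keep stderr quiet; the Counter is the log

def make_sink(port=80, bind="127.0.0.1"):
    """Create the sink server; call .serve_forever() on the result."""
    return HTTPServer((bind, port), SinkHandler)
```

After it has run for a while, `hits.most_common(10)` lists the ten most requested blocked domains.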
I started maintaining a hosts file (simple heuristic: if I see the browser status bar is waiting for a given domain, it gets added to the hosts file), but at some point it was less maintenance effort to simply install adblock.
I'd prefer using Disconnect and Adblock - I usually allow the ads on sites I read daily - at least they receive the impressions, even if I don't click on them.
Facebook and Twitter buttons on the other hand - sorry guys, you stay blocked.
There are some really poor ideas in there like blocking the domains of affiliate marketing networks. That won't just block ads, it'll prevent you from clicking through from affiliate sites to vendors' websites.
I may be in the minority, but I like that. If I really want to get to the vendor site I will type in the URL directly, this is generally safer than clicking links anyway.
If you use the hosts file, your browser doesn't need to wait to make the DNS request to OpenDNS, since it already has the loopback/null route in hosts.
But you have an entire community addressing real time changes (like adding sites that get infected with malware or being used in phishing attacks) and dealing with false positives.
My Internet doesn't suck and I have a fairly empty hosts file. I don't get it. It looks like OP was just browsing porn too much and needed to install AdBlock.