
How to use Squid as an easy web filter

by Anderson Silva

Have you ever mistyped a web address and ended up somewhere you definitely did not want to go? You miss one letter in the URL, and instead of getting to your favorite site, you end up in the virtual red light district! In this article, Anderson Silva explains how to set up a basic web filter.

So what if instead of you making this mistake, it’s your child accidentally going to these questionable sites? I have two kids, a five- and a seven-year-old, and both of them have been actively playing Flash-based kids’ games online since they were two years old. So lately I’ve been thinking of solutions to this problem.

There are plenty of non-open-source solutions to help parents filter the material that their little ones are exposed to on the web, but I didn’t find many simple open source solutions available online. That’s when I decided to use an open source web proxy called Squid as a quick, dirty, and simple solution to my web filtering problem.

I’ve used Squid to set up my system so that my kids’ browsers only access the web addresses that I want them to. Everything else out there is out of reach for them.

Note that this method can easily be circumvented. I don’t recommend it for computer-savvy older kids, but it should work fine for kindergarten- to elementary-school-aged kids.

In the instructions below, I assume that the Squid proxy service will be running on the same computer that the children will be using, but that is not a requirement. I also assume that the children are using Fedora 7 as their desktop OS.

Let’s begin:

1. Start a terminal session. (In Gnome: Applications > System Tools > Terminal)

2. Become root in your terminal session:

   	su 

3. Install Squid:

   	yum install squid

4. Set up Squid to start every time you boot:

   	/usr/sbin/chkconfig squid on

5. Edit the file /etc/squid/squid.conf.

6. Find the second occurrence of the line # Recommended minimum configuration: in the file. Under that line there will be a few rules starting with the word acl. At the end of the acl block, add the following line:

        acl safekids dstdomain .kidsite.com .kidsite2.com

Replace .kidsite.com and .kidsite2.com with the sites you want your children to be able to visit. You can list a full hostname like www.kids.com, but then if your child tried to go to a subdomain like games.kids.com, Squid would block it. Add a dot (.) in front of the domain to make a wildcard that lets any subdomain through.
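
If the whitelist grows, Squid can also read the domains from a file instead of listing them all on one line. A minimal sketch, assuming a hypothetical /etc/squid/safekids.txt containing one domain per line (e.g. .kidsite.com):

        acl safekids dstdomain "/etc/squid/safekids.txt"

Quoting the path tells Squid to load the list from that file, which makes it easier to add sites later without hunting through squid.conf.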

7. Find the line:

        # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

Below that line, find the line that says http_access allow localhost, and comment it out by adding a '#' in front of it:

        # http_access allow localhost

8. Above the line http_access deny all, add:

        http_access allow safekids
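
Order matters here: Squid evaluates http_access rules from top to bottom and applies the first one that matches, so the allow rule must sit above the final deny. After steps 7 and 8, the end of the access block should look roughly like this:

        # http_access allow localhost
        http_access allow safekids
        http_access deny all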

9. Start the Squid service:

        /sbin/service squid start

Your Squid proxy server should now be set up.
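
A quick way to confirm the filter works is to push a couple of requests through the proxy from the command line. This is just a sanity check, assuming curl is installed and .kidsite.com is still the placeholder domain from step 6:

        # a whitelisted site should come back with a normal HTTP response
        env http_proxy=http://localhost:3128 curl -I http://www.kidsite.com/
        # anything else should come back with a 403 from Squid
        env http_proxy=http://localhost:3128 curl -I http://www.example.com/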

If you are going to use Squid on a separate server, open up port 3128 on your firewall to allow the browser to talk to Squid.
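
On Fedora, one way to do that is to insert an iptables rule on the proxy machine; a sketch, assuming a 192.168.1.0/24 home network (adjust the subnet and save mechanism to match your setup):

        # allow LAN clients to reach Squid on port 3128
        /sbin/iptables -I INPUT -p tcp -s 192.168.1.0/24 --dport 3128 -j ACCEPT
        # make the rule survive a reboot
        /sbin/service iptables save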

Now that we have our server set up, we need to set up the child's browser to use our Squid proxy.

1. Start Firefox (in this case I am using Firefox 2).
2. Go to Edit > Preferences > Advanced.
3. Select the Network Tab.
4. Click “Settings...” under “Connection.”

[Screenshot: Firefox connection settings]

5. Under the new “Connection Settings” window, select “Manual proxy configuration.”

[Screenshot: manual proxy configuration]

6. For HTTP Proxy, enter “localhost” and port 3128. If your Squid service is running on another machine, use the IP address of that machine instead of “localhost.”
7. Click “OK.”
8. Close Firefox Preferences.

Now your kids' browser will only be able to reach the sites whose domains are listed in the Squid configuration. Everything else will produce a proxy error.

[Screenshot: Squid proxy error page]
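
One more housekeeping note: Squid only reads squid.conf at startup, so when you add new sites to the whitelist later, tell the running service to re-read its configuration:

        /usr/sbin/squid -k reconfigure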

Overall, this is not a foolproof filter. All you have to do to circumvent it is turn off the manual proxy on the browser, but I hope that with my five- and seven-year-olds, I still have a few years to come up with a more robust solution.

29 responses to “How to use Squid as an easy web filter”

  1. sean says:

    nice to see you are writing stuff ;)

  2. David B. says:

    I still have that boxed copy of RH5 you gave me. I don’t think my wife will ever forgive you.

  3. Steve says:

    Nice article! Where did you learn to write so well ;-).

  4. Scott says:

    Great work Anderson! Awesome article and I will definitely use it at my next conference.

  5. Jim says:

    Anderson, you are one inventive guy. Can’t wait to see what else you come up with.

  6. Solomon S. says:

    My 4yo’s computer is a linux box…she loves starfall.com (or is that .starfall.com?), but I really don’t want her browsing anything else…so good solution!

    As an alternative solution for when your kids get older, could you have a protected login script write the hosts file to redirect all unapproved traffic to a specific url?

  7. kurt b says:

    You could set up iptables to handle this one. Use the owner module of iptables to redirect traffic based on which user is logged in. So if a kid is logged in, redirect packets destined for port 80 or 443 to the Squid on the local machine. This way they are proxied no matter what they do, and you can browse unrestricted. Just a thought.
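
    A minimal sketch of the owner-match idea, run as root, where “kid” is a hypothetical login name and Squid listens locally on 3128 (note that Squid’s http_port needs the transparent option to accept intercepted traffic):

    # push the kid account's outbound web traffic into the local Squid
    iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner kid -j REDIRECT --to-ports 3128
    # HTTPS cannot be transparently proxied this way, so just refuse it for that user
    iptables -A OUTPUT -p tcp --dport 443 -m owner --uid-owner kid -j REJECT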

  8. Sebastien says:

    I would rather use a transparent web proxy (as kurt b was describing) with dansguardian ( http://dansguardian.org/ ) or squidguard ( http://www.squidguard.org/ ).

    I explain how to set up a transparent Squid proxy at this link:
    http://www.wains.be/index.php/2006/12/18/transparent-squid/

  9. Smitty says:

    ************
    # Jim says:
    August 31st, 2007 at 10:16 am
    …..
    **************

    Jim, I hate to burst your bubble, but this is pretty trivial functionality that has been available in Squid for years. Not only that, but it is a less than optimal solution. It relies upon a static list of filtered sites coded into the configuration of the Squid daemon. It also doesn’t do anything about users who don’t proxy their HTTP. In short, this is another disappointing article lacking any real technical depth. Unfortunately, these are becoming all too common on Red Hat’s website.

  10. dogwind says:

    Smitty thy name is douchebag. Nice work Anderson. Looking forward to future articles.

  11. Anderson Silva says:

    Smitty,

    You are right… the functionality has been available for years, but there are new Linux users out there every day, so it never hurts to revisit old topics. I disagree that all articles must have deep technical depth, since not everyone may be at your level of expertise. I tried to write this article with as few steps as possible so any Linux beginner could have a crack at it.

    Thanks for the feedback.

  12. Planet Malaysia says:

    I know this is one of the options, but adding websites to the list one by one is painful, and you can’t predict which websites they will try to go to.

  13. Smitty says:

    Anderson Silva,

    First of all, let me apologize for being a bit over-critical of your article. In fact, the article on its own is quite good, especially as an introduction to the use of Squid as a filtering proxy.

    What I was actually trying to address was the overzealous praise given by ‘Jim’ (and others) for an article which is actually quite basic and which contains no ‘inventive’-ness or novel ideas.

    Please accept my apologies for mis-speaking, and thereby casting aspersions, but I stand by my basic premise. The level of expertise exhibited in recent RedHat magazine articles appears to be falling to the point where the information presented is so superficial as to be almost incorrect.

  14. Anderson Silva says:

    Smitty,

    No sweat man. Take care.

  15. jef says:

    “The level of expertise exhibited in recent RedHat magazine articles appears to be falling to the point where the information presented is so superficial as to be almost incorrect.”

    Get off your high horse, dude. The article’s information is correct and the presentation is good.

  16. Henry Hertz Hobbit says:

    If you want a more robust filter, try mine, which is a composite of a blocking hosts file and a PAC filter:

    http://www.securemecca.com

    It consists of three parts:
    1. A pseudo web server written in Perl. Feel free to use something else.
    2. A blocking hosts file. This takes up the slack of the next unit.
    3. A Proxy Auto Configuration (PAC) filter. This is the biggie, since it does the majority of the work. Feel free to move the rules into Squid or another proxy server if you want to do that.

    In terms of locking things down, here is how you can do it on Linux (be sure everything is set in the browsers first).
    I still don’t have a good default for the user, but all of the following in a terminal locks things down.

    Firefox:
    ========
    $ su
    # cd /home/username/.mozilla/firefox
    # chown bin:bin profiles.ini
    # chmod 644 profiles.ini
    # # now they cannot create a new profile
    # cd 2m6tf5t7.default
    # # the stuff in front of the “.default” will be different
    # chmod 644 prefs.js
    # chown bin:bin prefs.js
    # cd ../../..
    # # the next is untested
    # chown bin:bin .mozilla
    # chmod 755 .mozilla

    Firefox, like everything else, has no concept of dynamic versus static (thinking of /etc/fstab on Linux versus /etc/vfstab and /etc/mnttab on Solaris), but I was able to go for some time without it complaining too much. It really desperately wants to write into the prefs.js file ALL the time.

    Opera:
    ======
    $ su
    # cd /home/username/.opera
    # chown bin:bin opera6.ini
    # chmod 644 opera6.ini
    # cd ..
    # chown bin:bin .opera
    # chmod 755 .opera

    I haven’t tested it exhaustively for either, but it works. Like I said, bin:bin may not be best; I am open to suggestions. My filters concentrate on sites that harm machines. Yes, believe it or not, they CAN cause problems on Linux, and I have had to torch my .mozilla folder and start all over again sometimes as I do my work. Just because you are immune to Windows executables and registry exploits does NOT mean that they can’t abuse Java, Flash Player, and JavaScript, and they HAVE read my files to find my email addresses when Java or JavaScript haven’t been properly sandboxed.

    Hey, if you don’t like the filter rules, DUMP THEM! This is just a starting point, but they just caught a new spy host on the way to writing this article – hint, hint – you will need to uncomment some rules.

  17. Brett says:

    To overcome the issue of them turning off the proxy, you could just block direct access on port 8080 or 3128 at your gateway, and only allow HTTP traffic that comes from the Squid server through to the outside world…

    Or if the proxy and gateway are the same then make squid act as a transparent proxy, then no configuration at the client end is required.
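
    A rough sketch of the transparent setup, assuming Squid 2.6 or later, a gateway whose LAN interface is eth0, and Squid listening on 3128:

    # in squid.conf: accept intercepted connections
    http_port 3128 transparent

    # on the gateway: silently divert outbound web traffic into Squid
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128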

  18. David Legg says:

    Setting up squid is indeed as easy as the author explains.
    However, something dynamic is also needed to protect kids from the Internet’s nasties. I tried squidguard on FC5 and FC6, but found it rather unmaintained and couldn’t get it working properly. Ironically, my son then tried dansguardian, which worked quite well. The bit of squidguard that was worth keeping, however, was the blacklists, which can also be used with dansguardian.

    So, be prepared to fiddle around and try a few things before settling on a solution. What’s really needed is a system that automatically updates blacklists on a daily basis, plus some content filtering to catch the millions of pages that will never reach your blacklists in time.

  19. STOO says:

    You might like to check out…

    http://dansguardian.org/

  20. chasq says:

    Great article about a topic near and dear to my heart. I have three sons, from 11 to 16, and believe me, before I installed Squid, if you typed Ctrl-H in one of their browser sessions, your jaw would drop, just from the names of these porn sites. So I installed Squid on my file server and configured their browsers to use it to connect to the Internet. They haven’t figured out that they can simply turn off the proxy.

    Next, I bought a copy of ‘Squid: The Definitive Guide’, published by O’Reilly. You really shouldn’t deploy Squid without this book. With it you will learn how to set up filters by time of day, workstation, and URL, and you will learn that you can point Squid at separate text files containing blacklists or regular expressions that filter out bad words in URLs.
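
    For example, pointing Squid at external list files might look like this (the file names here are hypothetical, and the deny rules must come before the allow rules):

    # domains read from a file, one per line
    acl pornhosts dstdomain "/etc/squid/pornhosts.txt"
    # case-insensitive regular expressions matched against the whole URL
    acl badwords url_regex -i "/etc/squid/badwords.txt"
    http_access deny pornhosts
    http_access deny badwords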

    What I have done is grep through the Squid logs for file extensions like .jpg and .gif. The part of the name to the left of the dot will clue you in on whether this is a porn site or something more appropriate like models, sports, or racing. Of course you need to grep for the bad words, too. It is very enlightening to discover how varied and clever these porn guys are. Then I add the new sites to my pornhosts text file and tell Squid to re-read its config file.
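
    A sketch of that log-mining routine, assuming the default log location and Squid’s native log format (the URL is the seventh field):

    # list the unique URLs of image requests in the access log
    grep -E '\.(jpg|gif)' /var/log/squid/access.log | awk '{print $7}' | sort -u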

    Sure, it is a pain to maintain your own blacklist. But when you show the contents to other parents, they instantly realize how big a problem this is. IMHO, Squid is one of the best open source success stories out there.

  21. Guy says:

    chasq

    I think you are correct, and you have explained some important features. To add to what you have already accomplished, it may be possible for you to configure your network to direct all HTTP requests through Squid and eliminate the manual proxy configuration. As a network administrator I come across some inventive people who endeavor to thwart any measure meant to keep them from doing things they are not allowed to do. One of the things you may need to do is track HTML access rather than HTTP access, which is application layer (“Layer 7”) filtering. I have not attempted to do this with open source products, but it should be possible. Alternatively, you can use open source software such as ngrep to “watch” for HTML that is not on TCP port 80; port 443 is much more difficult because it is encrypted. Some porn sites are designed to use alternative ports to enable people to access porn from work, where most systems only filter port 80. One way to find such sites is to examine the history and the bookmarks on the computer and look for a port designator such as “http://www.nastysite.cc:69/”, where the “:69” is the port used to access the site instead of port 80.

    I hope this helps some of you in your attempts to enforce the rules you have made known to those you are restricting.

  22. Henry Hertz Hobbit says:

    You cannot do it with a chmod. They can still move the folder and create another one. What really needs to be done is to move the filter to an egress point (the Linksys router, in a home situation). The sad thing is that people don’t understand that porn is three to five times riskier than the Internet at large. And like I just told somebody else, I have caught them stuffing garbage into the .mozilla folder on Linux (not just with porn sites). In other words, the perps are moving their bad stuff into the Linux world. It is usually Java / JavaScript that is being abused. You need to make the filter something that is NOT on the machine being used, so that no matter what is done to that machine the filter remains active. Perhaps Snort running on a transparent firewall machine, with a locked box where it goes out to the Internet.

    It looks like we got some really good thinking going. Just remember to pull what I have ASAP because I am out of money and out of time and haven’t worked for eleven years. If you want to know why contact me directly.

    Thanks for the good ideas.

  23. Comment on How to use Squid as an easy web filter by Henry Hertz … says:

    [...] You can read the rest of this blog post by going to the original source, here [...]

  24. Comment on How to use Squid as an easy web filter by Comment on … says:

    [...] You can read the rest of this blog post by going to the original source, here [...]

  25. Lou says:

    Great article. I was looking for an example of using Squid as a whitelist of sites. I will implement Squid with a Python script that updates the config, to allow non-technical users to modify the whitelist.

    Smitty ought to browse other sites if he does not care for the RH content. Perhaps we could have RH block him with SQUID, even if it is ineffective. :)

  26. smitty, jr says:

    All you need to do is have your router block all outgoing traffic from everything except the proxy server.

    Then the only way out is thru the proxy.

    But is there a better solution than using a blacklist like http://squidguard.shalla.de/Downloads/shallalist.tar.gz ?

    It eats my machine alive… :-(

  27. Ogdenous says:

    If you want to force the proxy settings, assuming they use XP or Vista, just remove their admin privileges, then log on as an admin and set up a group policy to force the proxy setting. It will gray out the setting so they can’t change it.

  28. Kavesa says:

    Nice guide.
    Question: is there a way or wildcard that I could use to whitelist all domains ending in something, like all .gov domains?
    Thanks

  29. ][AKEP says:

    Stupid crap for lamers. Gimme a pre-compiled version of DansGuardian for Windows! I don’t have the money to buy and maintain another PC just for a filter, and only for Linux. I want it all, and for free! For Squid/NT!!!