Archives for May, 2011

20
May

Python Script to Grab All CSS for Given URL(s)

While at work, I needed a script that, given a URL, would grab all the CSS a webpage uses, both internal (style blocks in the page) and external (linked stylesheets), and concatenate it into a single file. I came up with the following Python script. It needs only a little setup before it will run. First, you must have Python installed (the script uses Python 2 syntax and modules). I’m going to assume that 1) you do, and 2) you know how to use it. The next step is to download and install the BeautifulSoup package. This is how I did it on my Ubuntu box (the steps may vary depending on your distribution):

sudo apt-get install curl
curl -O http://python-distribute.org/distribute_setup.py
sudo python distribute_setup.py
sudo easy_install beautifulsoup

After that, you should be good to go. Copy and paste the following into a Python script (I called mine fetch_css.py) and then run it with python fetch_css.py.

# -*- coding: utf-8 -*-
import urllib2
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup

def fetch_css(url):
    # Returns (css_data, int_found, ext_found). css_data is a UTF-8 byte
    # string; on any error the result is ("", 0, 0).
    try:
        response = urllib2.urlopen(url)
        html_data = response.read()
        response.close()

        soup = BeautifulSoup(html_data)

        # Find all external style sheet references
        ext_styles = soup.findAll('link', rel="stylesheet")

        # Find all internal styles
        int_styles = soup.findAll('style', type="text/css")

        # TODO: Find styles defined inline?
        # Might not be useful... which <p style> is which?

        # Loop through all the found internal styles and collect their text.
        int_css_data = u''
        int_found = 1
        if len(int_styles) != 0:
            for i in int_styles:
                print "Found an internal stylesheet"
                # find(text=True) returns None for an empty <style> tag
                int_css_data += i.find(text=True) or u''
        else:
            int_found = 0
            print "No internal stylesheets found"

        # Loop through all the found external stylesheets and fetch each one.
        # urljoin() resolves a relative href against the page URL and leaves
        # an absolute href unchanged.
        ext_css_data = ''
        ext_found = 1
        if len(ext_styles) != 0:
            for i in ext_styles:
                css_url = urljoin(url, i['href'])
                print "Found external stylesheet: " + css_url

                response = urllib2.urlopen(css_url)
                ext_css_data += response.read()
                response.close()
        else:
            ext_found = 0
            print "No external stylesheets found"

        # Combine all internal and external styles into one stylesheet.
        # The internal styles are unicode and the external ones are raw bytes,
        # so encode the internal part and keep everything as bytes.
        # Decoding the external bytes with unicode(..., errors='ignore') uses
        # the ASCII codec and silently drops non-ASCII characters, e.g.:
        #    html[lang="ja-JP"] select{font-family:"Hiragino Kaku Gothic Pro", "ヒラギノ角ゴ Pro W3"
        # This assumes the external stylesheets are themselves UTF-8.
        all_css_data = int_css_data.encode('utf-8') + ext_css_data

        return all_css_data, int_found, ext_found
    except Exception as e:
        # Report what went wrong instead of hiding it behind a bare except
        print "Error: %r" % (e,)
        return "", 0, 0

################################################################################
# Specify URL(s) here
################################################################################
urls = {
    'jaresfencing': "http://jaresfencing.com",
    'derekhildreth': "http://derekhildreth.com",
    'thelinuxdaily': "http://thelinuxdaily.com",
    'myurl1': "http://myurl1.com"
}

for k, v in urls.items():
    print "\nFetching: " + v
    print "-" * 80
    out, int_found, ext_found = fetch_css(v)
    if ext_found == 1 or int_found == 1:
        filename = k + '_css.out'
        f = open(filename, 'w')
        f.write(out)
        print "Styles successfully written to: " + filename + "\n"
        f.close()
    elif out == "":
        print "Error: URL not found!"
    else:
        print "No styles found for " + v + "\n"

## OPTIONAL CSS PARSING STEP ##
# MUST INSTALL CSSUTILS with 'sudo easy_install cssutils'
#import cssutils
#sheet = cssutils.parseString(out)
#f2 = open('temp2', 'w')
#f2.write(sheet.cssText)
#f2.close()
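
The step in fetch_css that merges the internal and external CSS is sensitive to text encoding. The sketch below is not part of the script; it assumes Python 3 and a UTF-8 stylesheet, and it uses a made-up sample rule and a hypothetical helper named decode_css. It shows what happens to a Japanese font name when the bytes are decoded as ASCII with errors ignored, compared with decoding them as UTF-8.

```python
# Bytes as they might arrive from a UTF-8 encoded stylesheet (made-up sample).
css_bytes = 'select { font-family: "ヒラギノ角ゴ Pro W3"; }'.encode("utf-8")


def decode_css(raw, encoding="utf-8"):
    """Decode stylesheet bytes, marking undecodable bytes instead of dropping them."""
    return raw.decode(encoding, errors="replace")


# ASCII with errors ignored: every non-ASCII byte is silently discarded.
lossy = css_bytes.decode("ascii", errors="ignore")

# UTF-8: the font name survives intact.
kept = decode_css(css_bytes)

print(lossy)
print(kept)
```

If a stylesheet is served in some other encoding, the encoding argument would need to match it; the sketch does not try to detect that.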

Seems to work for me! I could see potential for many improvements, but it’s pretty robust as it is. Enjoy.
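
One possible improvement is to drop the external dependency. The sketch below is not the script above but a variation on it, assuming Python 3 and only the standard library: html.parser stands in for BeautifulSoup, and urljoin resolves each href against the page URL. StyleCollector and the example.com URLs are made up for illustration, and the HTML is fed in as a string so nothing is downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StyleCollector(HTMLParser):
    """Collect stylesheet URLs and <style> text from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.external = []      # absolute URLs of linked stylesheets
        self.internal = []      # text of each <style> block
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # rel may hold several space-separated tokens
            rels = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rels and attrs.get("href"):
                self.external.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "style":
            self._in_style = True
            self.internal.append("")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.internal[-1] += data


# Hypothetical page used only to exercise the parser.
sample_html = """
<html><head>
<link rel="stylesheet" href="css/main.css">
<link rel="stylesheet" href="/theme.css" />
<link rel="icon" href="favicon.ico">
<style type="text/css">p { color: red; }</style>
</head><body><p>Hello</p></body></html>
"""

collector = StyleCollector("http://example.com/blog/index.html")
collector.feed(sample_html)
collector.close()

print(collector.external)
print(collector.internal)
```

Resolving against the full page URL matters here: css/main.css lands under /blog/, while /theme.css goes to the site root, which simple string concatenation would not get right.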

18
May

Run Linux Kernel Inside a Browser

An excellently novel idea! Not sure what I would use it for, but it’s there. I can compile a program, run it, edit it in vi, and do all sorts of stuff. What would you find useful about having a Linux shell in a browser?

http://bellard.org/jslinux/

17
May

Give Normal User Write Access to /var/www

Here’s a quick way to give a normal user write access to the /var/www directory so that they can work with web server files without having to log in as root first. Open a terminal and issue the commands:

sudo chown -R `id --user` /var/www/
sudo chgrp -R `id --group` /var/www/

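If you don’t have sudo enabled on your distro, switch to root with ‘su’ first and run the commands without ‘sudo’. In that case, replace the backquoted id command with your own user name, because when run as root it would return root’s ID rather than yours.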

You should now be able to write to all the files within /var/www/. My only other suggestion is to be more selective about what gets changed: without the recursive option “-R”, the commands affect only the /var/www directory itself and not the files and subdirectories already inside it.
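
To check the result without trial and error, something like the following could be used. It is only a sketch, assuming Python 3 on a Unix-like system; describe_access is a name invented for this example.

```python
import os
import stat


def describe_access(path):
    """Report who owns a path, its mode, and whether the current user can write to it."""
    info = os.stat(path)
    return {
        "uid": info.st_uid,                       # numeric owner ID
        "gid": info.st_gid,                       # numeric group ID
        "mode": stat.filemode(info.st_mode),      # e.g. "drwxr-xr-x"
        "writable": os.access(path, os.W_OK),     # write check for the current user
    }
```

Calling describe_access("/var/www") after running the commands should show your own uid and gid and a writable value of True.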