Tuesday, November 18, 2008

Meetings and Boredom

I am sitting in a really boring meeting. While browsing reddit I found a very interesting Dilbert script. I am a huge fan of Dilbert and wanted to download Dilbert strips for offline viewing. I know it's not entirely ethical, but what the hell. I wrote a small Python screen scraper to do that. It requires Beautiful Soup, which you will have to download first. If I get bored enough, I will write a script to compile the strips into a PDF.

import urllib2
import datetime
import os.path
from BeautifulSoup import BeautifulSoup

# Directory where the strips are saved; it must already exist.
dilbert_dir = 'c:\\work\\toons\\dilbert\\'
year = 2007
start_date = datetime.date(year, 1, 1)
one_day = datetime.timedelta(days=1)
current_date = start_date
base_url = 'http://www.dilbert.com/fast/'

# Walk through every day of the year, stopping at today if the year is not over.
while current_date.year == year and current_date <= datetime.date.today():
    dtfmt = current_date.strftime('%Y-%m-%d')
    # Fetch the page for this date and parse it.
    contents = urllib2.urlopen(base_url + dtfmt + '/').read()
    soup = BeautifulSoup(contents)
    # The strip is assumed to be the last <img> on the page.
    img_loc = soup.findAll('img')[-1]['src']
    print 'downloading', img_loc, 'as', dtfmt + '.gif'
    img = urllib2.urlopen('http://www.dilbert.com/' + img_loc).read()
    # Save the image as YYYY-MM-DD.gif.
    f = open(os.path.join(dilbert_dir, dtfmt + '.gif'), 'wb')
    f.write(img)
    f.close()
    current_date = current_date + one_day
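
The date walking and URL building in the script can be pulled out into small functions that are easy to check without touching the network. The following is only a rough sketch of that idea, written for Python 3 rather than the Python 2 used above. The names strip_dates, page_url and local_path are made up for illustration, and the today parameter is an added assumption so the loop can be tested against a fixed date.

```python
import datetime
import os.path

# Same base URL as the script above.
BASE_URL = 'http://www.dilbert.com/fast/'


def strip_dates(year, today=None):
    """Yield every date in `year`, stopping at `today` if the year is not over."""
    if today is None:
        today = datetime.date.today()
    current = datetime.date(year, 1, 1)
    one_day = datetime.timedelta(days=1)
    while current.year == year and current <= today:
        yield current
        current += one_day


def page_url(date):
    """Build the page URL for one date, e.g. .../fast/2007-01-01/."""
    return BASE_URL + date.strftime('%Y-%m-%d') + '/'


def local_path(directory, date):
    """Build the file name the strip is saved under (YYYY-MM-DD.gif)."""
    return os.path.join(directory, date.strftime('%Y-%m-%d') + '.gif')
```

Looping over strip_dates(2007) and calling page_url and local_path on each date gives the same URLs and file names as the while loop above, so the download step could be dropped in between the two calls.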
