htdebug, a script for tracing HTTP redirects

Recently, I have written a simple Python script that helps trace HTTP redirects along with in­for­ma­tion on rel=canonical links. I called it htdebug.

Here it is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#!/usr/bin/env python

import urllib
import urllib2
import sys
import requests

import lxml
from lxml import etree
from cStringIO import StringIO

def get_canonical(html_content):
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(html_content), parser)
    canonical_hrefs = tree.xpath("/html/head/link[@rel='canonical']/@href")
    return canonical_hrefs

if __name__ == "__main__":
    url = sys.argv[1]
    print "Probing {} for redirect path.".format(url)
    response = requests.get(url)
    for resp in response.history+[response]:
        print
        print "[{}] {}".format(resp.status_code, resp.url)
        print "canonical link: {}".format(get_canonical(resp.content))

After placing it in a executable file named htdebug in your $PATH, you can use it like so:

$ htdebug http://yahoo.com
Probing http://yahoo.com for redirect path.

[301] http://yahoo.com/
canonical link: []

[200] https://www.yahoo.com/
canonical link: ['https://www.yahoo.com/']

HTTP status codes are shown in square brackets.

BuiltWith lookup script » « Enable ISO 8061 date format in Thunderbird (and the rest of userland)