htdebug, a script for tracing HTTP redirects
I recently wrote a simple Python script that traces HTTP redirects and reports the rel=canonical link found at each hop. I called it htdebug.
Here it is:
#!/usr/bin/env python3
"""Trace the HTTP redirects for a URL and show rel=canonical links."""
import sys
from io import BytesIO

import requests
from lxml import etree


def get_canonical(html_content):
    """Return the href values of all rel=canonical links in the document head."""
    if not html_content.strip():
        # Redirect responses often have an empty body, which lxml may fail to parse.
        return []
    parser = etree.HTMLParser()
    tree = etree.parse(BytesIO(html_content), parser)
    return tree.xpath("/html/head/link[@rel='canonical']/@href")


if __name__ == "__main__":
    url = sys.argv[1]
    print("Probing {} for redirect path.".format(url))
    response = requests.get(url)
    # response.history holds the intermediate redirect responses in order;
    # the final response is appended so that every hop is reported.
    for resp in response.history + [response]:
        print()
        print("[{}] {}".format(resp.status_code, resp.url))
        print("canonical link: {}".format(get_canonical(resp.content)))
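To get a feel for what get_canonical returns without making a network request, the same kind of lookup can be tried on an in-memory HTML string. The following is only a sketch under a few assumptions: it swaps lxml for the standard library's html.parser so that it runs without extra packages, and CanonicalFinder and find_canonical are hypothetical names that are not part of htdebug.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so compare token by token.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_canonical(html_text):
    """Return all canonical hrefs found in html_text, in document order."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs


sample = (
    '<html><head><link rel="canonical" href="https://www.example.com/">'
    "</head><body></body></html>"
)
print(find_canonical(sample))  # prints ['https://www.example.com/']
```

Unlike the XPath expression in the script, this version accepts a link tag anywhere in the document rather than only under /html/head, so the two can disagree on malformed pages.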
After placing the script in an executable file named htdebug somewhere in your $PATH, you can use it like so:
$ htdebug http://yahoo.com
Probing http://yahoo.com for redirect path.
[301] http://yahoo.com/
canonical link: []
[200] https://www.yahoo.com/
canonical link: ['https://www.yahoo.com/']
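The HTTP status code of each hop is shown in square brackets, followed by the URL that was requested. The canonical link list is empty when the response has no rel=canonical link in its head.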
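The list of hops comes from requests, which records each intermediate response in response.history. As a rough illustration of what following such a chain involves, here is a sketch that never touches the network. It rests on assumptions: fetch is a hypothetical callable that returns a status code and a Location value, and trace_redirects and fake_site are made-up names for this example, not anything htdebug or requests provides.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow Location values from url and return a list of (status, url) hops.

    fetch is a hypothetical callable: fetch(url) -> (status_code, location_or_None).
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            return hops
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("gave up after {} hops".format(max_hops))


# A made-up redirect table standing in for real HTTP responses.
fake_site = {
    "http://example.com": (301, "http://example.com/"),
    "http://example.com/": (301, "https://www.example.com/"),
    "https://www.example.com/": (200, None),
}

for status, hop_url in trace_redirects("http://example.com", fake_site.__getitem__):
    print("[{}] {}".format(status, hop_url))
```

The max_hops limit is there so that a redirect loop ends with an error instead of running forever.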