mechanize — Hints
Hints for debugging programs that use mechanize.
General
Enable logging.
Sometimes, a server wants particular HTTP headers set to the values it expects. For example, the User-Agent header may need to be set to a value like that of a popular browser.
Check that a web browser can do manually what you’re trying to achieve programmatically. Make sure that what you do manually is exactly the same as what you’re trying to do from Python. You may simply be hitting a server bug that only shows up if you view pages in a particular order, for example.
Try comparing the headers and data that your program sends with those that a browser sends. Often this will give you the clue you need. There are browser add-ons that let you see what the browser sends and receives, even when HTTPS is in use.
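Once you have both sets of headers, comparing them by hand gets tedious. The sketch below does the comparison mechanically. diff_headers is a hypothetical helper, not part of mechanize, and the header values are made-up placeholders.

```python
def diff_headers(browser_headers, program_headers):
    """Compare two {name: value} dicts of HTTP request headers.

    HTTP header names are case-insensitive, so names are lowercased first.
    Returns (missing, extra, different):
      missing   -- names the browser sends but the program does not
      extra     -- names the program sends but the browser does not
      different -- {name: (browser_value, program_value)} for shared names
    """
    b = {name.lower(): value for name, value in browser_headers.items()}
    p = {name.lower(): value for name, value in program_headers.items()}
    missing = sorted(set(b) - set(p))
    extra = sorted(set(p) - set(b))
    different = {
        name: (b[name], p[name])
        for name in sorted(set(b) & set(p))
        if b[name] != p[name]
    }
    return missing, extra, different


# Hypothetical values: copy the real ones from a browser add-on and from
# mechanize's debug output.
from_browser = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Firefox/115.0",
    "Accept-Language": "en-GB,en;q=0.5",
    "Cookie": "session=abc123",
}
from_program = {
    "User-agent": "my-script/1.0",
    "Cookie": "session=abc123",
    "Connection": "close",
}

missing, extra, different = diff_headers(from_browser, from_program)
print("missing:", missing)  # headers to try adding
print("extra:", extra)      # headers to try removing
for name, (b_val, p_val) in different.items():
    print("%s: browser=%r program=%r" % (name, b_val, p_val))
```

This only compares headers. Differences in request data, such as form fields, need a separate check.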
If nothing is obviously wrong with the requests your program is sending and you’re out of ideas, you can reliably locate the problem. Start from a copy of the headers that a browser sends, then change them one at a time towards what your program sends until the request stops working again. Temporarily switch to explicitly sending individual HTTP headers (by calling .add_header(), or by using httplib directly). Start by sending exactly the headers that Firefox or IE send. You may need to make sure that a valid session ID is sent, because the one you copied from your browser may no longer be valid. If that works, you can begin the tedious process of changing your headers and data until they match what your original code was sending. You should end up with a minimal set of changes. If you think that reveals a bug in mechanize, please report it.
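The header-by-header search described above can be sketched as a loop that applies one change at a time. The sketch assumes a hypothetical callback, request_succeeds(headers), that you write yourself. For instance, it could build a request, call .add_header() for each header, open it, and check the response. The function name find_breaking_change is also hypothetical. The search is greedy: each harmless change is kept, and the first change that makes the request fail is reported.

```python
def find_breaking_change(browser_headers, program_headers, request_succeeds):
    """Walk from a working header set towards a failing one, one header at a time.

    browser_headers  -- {name: value} copied from a browser (assumed to work)
    program_headers  -- {name: value} your program originally sent (fails)
    request_succeeds -- hypothetical callback: sends a request with the given
                        headers and returns True if the server behaves as hoped

    Returns (name, old_value, new_value) for the first change that makes the
    request fail, or None if every change could be applied without failure.
    A value of None means "header absent".
    """
    current = {name.lower(): value for name, value in browser_headers.items()}
    target = {name.lower(): value for name, value in program_headers.items()}
    if not request_succeeds(dict(current)):
        raise ValueError("the browser's headers do not work either; "
                         "check the session ID and request data first")
    for name in sorted(set(current) | set(target)):
        old, new = current.get(name), target.get(name)
        if old == new:
            continue
        trial = dict(current)
        if new is None:
            del trial[name]  # the program does not send this header
        else:
            trial[name] = new  # add or change to the program's value
        if not request_succeeds(trial):
            return name, old, new
        current = trial  # this change was harmless; keep it and carry on
    return None
```

If the server only objects to a combination of headers, the result is the change that tipped the request over into failure. The changes kept before it are part of the picture too.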
Logging
To enable logging to stdout:
import sys, logging
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)
You can reduce the amount of information shown by setting the level to logging.INFO instead of logging.DEBUG, or by enabling logging for only one of the following logger names instead of "mechanize":
"mechanize": Everything.
"mechanize.cookies": Why particular cookies are accepted or rejected, and why they are or are not returned. Requires logging enabled at the DEBUG level.
"mechanize.http_responses": HTTP response body data.
"mechanize.http_redirects": HTTP redirect information.
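Here is a sketch of enabling just one of those loggers, using only the standard logging module. The helper name enable_mechanize_logging is hypothetical. The logger names are the ones listed above.

```python
import logging
import sys


def enable_mechanize_logging(name="mechanize", level=logging.DEBUG, stream=None):
    """Attach a stream handler to one of mechanize's loggers and set its level.

    Records from child loggers (e.g. "mechanize.cookies" under "mechanize")
    propagate up, so enabling "mechanize" itself shows everything.
    """
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# Only cookie accept/reject decisions; these need the DEBUG level.
cookie_logger = enable_mechanize_logging("mechanize.cookies", logging.DEBUG)

# Only redirect information, at the less verbose INFO level.
redirect_logger = enable_mechanize_logging("mechanize.http_redirects", logging.INFO)
```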
HTTP headers
An example showing how to enable printing of HTTP headers to stdout, logging of HTTP response bodies, and logging of information about redirections:
import sys, logging
import mechanize
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)
browser = mechanize.Browser()
browser.set_debug_http(True)
browser.set_debug_responses(True)
browser.set_debug_redirects(True)
response = browser.open("http://python.org/")
Alternatively, you can examine request and response objects to see what’s going on. Note that requests may involve “sub-requests” in cases such as redirection, in which case you will not see everything that’s going on just by examining the original request and final response. It’s often useful to use the .get_data() method on responses during debugging.
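The hypothetical helper below makes this concrete by summarising one request and the response it produced. It only assumes the urllib2-style methods that mechanize's objects provide: get_full_url() and header_items() on requests, and geturl(), info() and the get_data() mentioned above on responses. It is a sketch, not a mechanize API.

```python
def describe_exchange(request, response, max_body=200):
    """Return a short text summary of a request and the final response.

    Assumes urllib2-style objects: request.get_full_url() and
    request.header_items(); response.geturl(), response.info() and
    response.get_data() (available on mechanize's seekable responses).
    """
    lines = ["request:  %s" % request.get_full_url()]
    for name, value in request.header_items():
        lines.append("  %s: %s" % (name, value))
    lines.append("response: %s" % response.geturl())
    lines.append(str(response.info()).rstrip())  # response headers
    body = response.get_data()[:max_body]
    lines.append("body (first %d bytes): %r" % (max_body, body))
    return "\n".join(lines)
```

One way to use it, assuming network access, takes three steps. First build the request yourself with request = mechanize.Request(url). Then open it with response = mechanize.Browser().open(request). Finally, print describe_exchange(request, response). If the response URL differs from the request URL, a redirect (and therefore a sub-request) was involved.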
Handlers
This section is not relevant if you use mechanize.Browser.
An example showing how to enable printing of HTTP headers to stdout at the HTTPHandler level:
import mechanize
url = "http://python.org/"  # the URL you are debugging
hh = mechanize.HTTPHandler()  # you might want HTTPSHandler, too
hh.set_http_debuglevel(1)  # print headers sent and received to stdout
opener = mechanize.build_opener(hh)
response = opener.open(url)
The following handlers are available:
HTTPRedirectDebugProcessor: logs information about redirections
HTTPResponseDebugProcessor: logs HTTP response bodies (including those that are read during redirections)
NOTE: as well as having these handlers in your OpenerDirector (for example, by passing them to build_opener()), you have to turn on logging at the INFO level or lower in order to see any output.
I prefer questions and comments to be sent to the mailing list rather than direct to me.
John J. Lee, April 2010.