CherryProxy - a filtering HTTP proxy extensible in Python
CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.
It has not been designed for operational use and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. However, it should be very useful for testing / demo / prototyping / educational purposes.
Why a new proxy
There are already quite a few HTTP proxies developed in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. But they were either too simple and lacking features, or too complex to extend, so I decided to develop a different one.
I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.
News:
- 2011-11-22: moved CherryProxy to its own bitbucket project
- 2011-11-15 v0.12: added parent proxy support
Download:
Get the zip archive from here, or use Mercurial to get the latest source code from here.
Install
On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.
License:
Open-source, BSD-style
Usage as a tool (simple proxy):
1) run CherryProxy.py [options]
Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
  -a ADDRESS, --address=ADDRESS
                        IP address of interface for HTTP proxy (0.0.0.0 for
                        all, default=localhost)
  -f PROXY, --forward=PROXY
                        Forward requests to parent proxy, specified as
                        hostname[:port] or IP address[:port]
  -v, --verbose
2) set up your browser to use localhost:8070 as proxy
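Besides a browser, any HTTP client can be pointed at the proxy. The following is a minimal sketch, assuming CherryProxy is already running with its default address and port (localhost:8070). The helper names make_opener and fetch_status are illustrative and not part of CherryProxy. Only the http scheme is routed through the proxy, since the current version lacks HTTPS support.

```python
try:
    from urllib.request import ProxyHandler, build_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, build_opener  # Python 2

PROXY_URL = "http://localhost:8070"  # CherryProxy defaults (see options above)


def make_opener(proxy_url=PROXY_URL):
    """Build an opener that routes plain-HTTP requests through the proxy."""
    return build_opener(ProxyHandler({"http": proxy_url}))


def fetch_status(url, proxy_url=PROXY_URL, timeout=10):
    """Fetch url through the proxy and return the HTTP status code."""
    response = make_opener(proxy_url).open(url, timeout=timeout)
    try:
        return response.getcode()
    finally:
        response.close()
```

With the proxy running, fetch_status("http://example.com/") should return the status code relayed by the proxy. Note that the opener raises an HTTPError for error statuses such as 403, which is what a blocked request would produce.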
Usage in a Python Application:
- import cherryproxy
- create a subclass of cherryproxy.CherryProxy
- implement methods filter_request and/or filter_response to enable filtering as
needed.
- see provided examples
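The steps above can be sketched as follows. This is only an illustration, not one of the provided examples: the class name LoggingProxy and the helpers describe_request and run are hypothetical, and the fallback base class exists only so that the snippet can be read and imported without CherryProxy installed. The proxy itself only runs with the real cherryproxy.CherryProxy base class.

```python
import logging

try:
    import cherryproxy
    BaseProxy = cherryproxy.CherryProxy
except ImportError:
    # Stand-in so the sketch can be imported without CherryProxy installed.
    BaseProxy = object


def describe_request(req):
    """Format the request line that gets logged (hypothetical helper)."""
    return "%s %s" % (req.method, req.full_url)


class LoggingProxy(BaseProxy):
    """Proxy that logs every request and passes it through unchanged."""

    def filter_request(self):
        logging.info(describe_request(self.req))


def run(port=8070):
    """Start the proxy; requires the real CherryProxy base class."""
    # The logging module must be initialized before the proxy is created.
    logging.basicConfig(level=logging.INFO)
    proxy = LoggingProxy(address="localhost", port=port)
    try:
        proxy.start()
    except KeyboardInterrupt:
        proxy.stop()
```

Calling run() starts the proxy on localhost:8070. The explicit KeyboardInterrupt handling is a precaution in case start() does not already stop the server on Ctrl-C.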
Filtering API:
CherryProxy: class implementing a filtering HTTP proxy
To use it, create a class inheriting from CherryProxy and implement the
methods filter_request and filter_response as desired.
Then call the start method to start the proxy.
Note: the logging module needs to be initialized before creating a
CherryProxy object.
See the example scripts for more information.
- __init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
- CherryProxy constructor
  address: IP address of interface to listen to, or 0.0.0.0 for all
  (localhost by default)
  port: TCP port for the proxy (8070 by default)
  server_name: server name used in HTTP responses
  debug: enable debugging messages if set to True
  log_level: logging level (use constants from logging module)
  options: None or optparse.OptionParser object to provide additional options
  parent_proxy: parent proxy, either IP address or hostname, with optional
  port (example: 'myproxy.local:8080')
- filter_request(self)
- Method to be overridden:
  Called to analyse/filter/modify the request received from the client,
  after reading the full request with its body if there is one,
  before it is sent to the server.
  This method may call set_response() if the request needs to be blocked
  before being sent to the server.
  The following attributes can be read and MODIFIED:
  self.req.data: data sent with the request (POST or PUT)
  (and also all listed in filter_request_headers)
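As an illustration of filter_request, the sketch below blocks POST or PUT requests whose body contains a given keyword. The marker value, the helper body_is_blocked and the class name BodyFilterProxy are hypothetical. It also assumes that self.req.data is a byte string (a plain str under Python 2.x) or empty when there is no body. The fallback base class is only there so the snippet can be imported without CherryProxy installed.

```python
try:
    import cherryproxy
    BaseProxy = cherryproxy.CherryProxy
except ImportError:
    BaseProxy = object  # stand-in when CherryProxy is not installed

BLOCKED_MARKER = b"confidential"  # hypothetical keyword to look for


def body_is_blocked(data, marker=BLOCKED_MARKER):
    """Return True if the request body is non-empty and contains marker."""
    return bool(data) and marker in data


class BodyFilterProxy(BaseProxy):
    """Reject uploads whose body contains the marker."""

    def filter_request(self):
        # The full body has been read at this point, so it can be inspected.
        if self.req.method in ("POST", "PUT") and body_is_blocked(self.req.data):
            self.set_response_forbidden(reason="Blocked request body")
```

Because filter_request is called before the request is sent to the server, a request blocked here should never leave the proxy.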
- filter_request_headers(self)
- Method to be overridden:
  Called to analyse/filter/modify the request received from the client,
  before reading the full request with its body if there is one,
  before it is sent to the server.
  This method may call set_response() if the request needs to be blocked
  before being sent to the server.
  The following attributes can be read and MODIFIED:
  self.req.headers: dictionary of HTTP headers, with lowercase names
  self.req.method: HTTP method, e.g. 'GET', 'POST', etc
  self.req.scheme: protocol from URL, e.g. 'http' or 'https'
  self.req.netloc: IP address or hostname of server, with optional
  port, for example 'www.google.com' or '1.2.3.4:8000'
  self.req.path: path in URL, for example '/folder/index.html'
  self.req.query: query string, found after question mark in URL
  The following attributes can be READ only:
  self.req.environ: dictionary of request attributes following WSGI
  format (PEP 333)
  self.req.url: partial URL containing 'path?query'
  self.req.full_url: full URL containing 'scheme:netloc/path?query'
  self.req.length: length of request data in bytes, 0 if none
  self.req.content_type: content-type, for example 'text/html'
  self.req.charset: charset, for example 'UTF-8'
  self.req.url_filename: filename extracted from URL path
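The attributes listed above are enough to filter on the destination host and to rewrite headers before the body is read. The following sketch blocks a hypothetical list of hosts and removes the Referer header from all other requests. The host names, the helper host_is_blocked and the class name HeaderFilterProxy are assumptions made for illustration, and the host parsing ignores IPv6 literals.

```python
try:
    import cherryproxy
    BaseProxy = cherryproxy.CherryProxy
except ImportError:
    BaseProxy = object  # stand-in when CherryProxy is not installed

# Hypothetical block list, for illustration only.
BLOCKED_HOSTS = frozenset(["ads.example.com", "tracker.example.net"])


def host_is_blocked(netloc, blocked=BLOCKED_HOSTS):
    """Check the host part of netloc, ignoring an optional ':port' suffix."""
    host = netloc.split(":")[0].lower()
    return host in blocked


class HeaderFilterProxy(BaseProxy):
    """Block some hosts and strip the Referer header from other requests."""

    def filter_request_headers(self):
        if host_is_blocked(self.req.netloc):
            self.set_response_forbidden(reason="Blocked host")
            return
        # Header names are stored in lowercase in self.req.headers.
        self.req.headers.pop("referer", None)
```

Filtering in filter_request_headers rather than filter_request means the decision is taken before the request body is read, which suits checks that depend only on the URL and headers.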
- filter_response(self)
- Method to be overridden:
  Called to analyse/filter/modify the response received from the server,
  after reading the full response with its body if there is one,
  before it is sent back to the client.
  This method may call set_response() if the response needs to be blocked
  (e.g. replaced by a simple response) before being sent to the client.
- filter_response_headers(self)
- Method to be overridden:
  Called to analyse/filter/modify the response received from the server,
  before reading the full response with its body if there is one,
  before it is sent back to the client.
  This method may call set_response() if the response needs to be blocked
  (e.g. replaced by a simple response) before being sent to the client.
- set_response(self, status, reason=None, data=None, content_type='text/plain')
- set a HTTP response to be sent to the client instead of the one from
  the server.
  - status: int, HTTP status code (see RFC 2616)
  - reason: str, optional text for the response line, standard text by default
  - data: str, optional body for the response, default="status reason"
  - content_type: str, content-type corresponding to data
- set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
- set a HTTP 403 Forbidden response to be sent to the client instead of
  the one from the server.
  - status: int, HTTP status code (see RFC 2616)
  - reason: str, optional text for the response line, standard text by default
  - data: str, optional body for the response, default="status reason"
  - content_type: str, content-type corresponding to data
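When the default plain-text body is not enough, set_response can carry a custom body and content type. The sketch below answers requests for certain file extensions with a small HTML page. The extension list, the page text, the helper is_blocked_download and the class name DownloadFilterProxy are all hypothetical, and the code assumes that self.req.url_filename may be empty when the URL path has no filename.

```python
try:
    import cherryproxy
    BaseProxy = cherryproxy.CherryProxy
except ImportError:
    BaseProxy = object  # stand-in when CherryProxy is not installed

# Hypothetical policy, for illustration only.
BLOCKED_EXTENSIONS = (".exe", ".scr")
BLOCK_PAGE = "<html><body><h1>Download blocked by proxy</h1></body></html>"


def is_blocked_download(filename):
    """Return True if filename ends with one of the blocked extensions."""
    return (filename or "").lower().endswith(BLOCKED_EXTENSIONS)


class DownloadFilterProxy(BaseProxy):
    """Answer requests for blocked file types with a custom HTML page."""

    def filter_request_headers(self):
        if is_blocked_download(self.req.url_filename):
            self.set_response(403, reason="Forbidden",
                              data=BLOCK_PAGE, content_type="text/html")
```

For a plain 403 with the default body, set_response_forbidden() is the shorter call; set_response is useful when the status, body or content type needs to differ.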
- start(self)
- start proxy server
- stop(self)
- stop proxy server