CherryProxy - a filtering HTTP proxy extensible in Python

CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.

It has not been designed for operational use and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. However, it should be very useful for testing / demo / prototyping / educational purposes.

Why a new proxy

There are already quite a few HTTP proxies developed in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. They were either too simple and lacking features, or too complex to extend, so I decided to develop a different one.

I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.

News:

  • 2011-11-22: moved CherryProxy to its own bitbucket project
  • 2011-11-15 v0.12: added parent proxy support

Download:

Get the zip archive from here, or use Mercurial to get the latest source code from here.

Install

On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.

License:

Open-source, BSD-style

Usage as a tool (simple proxy):

1) run CherryProxy.py [options]

Options:

  -h, --help            show this help message and exit
  -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
  -a ADDRESS, --address=ADDRESS
                        IP address of interface for HTTP proxy (0.0.0.0 for
                        all, default=localhost)
  -f PROXY, --forward=PROXY
                        Forward requests to parent proxy, specified as
                        hostname[:port] or IP address[:port]
  -v, --verbose

2) set up your browser to use localhost:8070 as proxy
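A script can be pointed at the proxy in the same way as a browser. The sketch below assumes the proxy is listening on localhost:8070 (the default port above). build_opener_for_proxy is a hypothetical helper name, not part of CherryProxy, and the import fallback is only there so the snippet works on both Python 2 and Python 3.

```python
try:
    import urllib2 as request_lib           # Python 2.x
except ImportError:
    import urllib.request as request_lib    # Python 3

def build_opener_for_proxy(host='localhost', port=8070):
    """Return a URL opener that sends HTTP requests through the proxy."""
    proxy_url = 'http://%s:%d' % (host, port)
    handler = request_lib.ProxyHandler({'http': proxy_url})
    return request_lib.build_opener(handler)

opener = build_opener_for_proxy()
# With CherryProxy running, this would fetch the page through it:
# html = opener.open('http://example.com/').read()
```

Only the 'http' scheme is mapped, since the current version has no HTTPS support.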

Usage in a Python Application:

- import cherryproxy

- create a subclass of cherryproxy.CherryProxy

- implement methods filter_request and/or filter_response to enable filtering as needed.

- see provided examples
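The steps above could look roughly like the following minimal sketch, which only logs each request. The class name LoggingProxy is made up for illustration, and the fallback class is a stand-in so the snippet can be run without the library installed. With CherryProxy available, the real base class is used instead.

```python
import logging

try:
    from cherryproxy import CherryProxy      # the real class (Python 2.x)
except ImportError:
    class CherryProxy(object):
        """Stand-in so the sketch can be read and run without the library."""

class LoggingProxy(CherryProxy):
    """Proxy that logs every request and otherwise changes nothing."""

    def filter_request(self):
        # self.req is filled in by CherryProxy before this hook is called.
        logging.info('%s %s', self.req.method, self.req.full_url)

# With the real library installed, the proxy would be started like this:
# logging.basicConfig(level=logging.INFO)
# LoggingProxy().start()
```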

Filtering API:

CherryProxy: class implementing a filtering HTTP proxy

To use it, create a class inheriting from CherryProxy and implement the methods filter_request and filter_response as desired. Then call the start method to start the proxy.

Note: the logging module needs to be initialized before creating a CherryProxy object.

See the example scripts for more information.

__init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
CherryProxy constructor

    address: IP address of interface to listen to, or 0.0.0.0 for all (localhost by default)
    port: TCP port for the proxy (8070 by default)
    server_name: server name used in HTTP responses
    debug: enable debugging messages if set to True
    log_level: logging level (use constants from logging module)
    options: None or optparse.OptionParser object to provide additional options
    parent_proxy: parent proxy, either IP address or hostname, with optional port (example: 'myproxy.local:8080')
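As an illustration of the constructor parameters, here is a small sketch that sets up logging first and then builds a proxy with explicit settings. make_proxy is a hypothetical helper and the address, port and parent proxy values are arbitrary examples. The class is passed in as an argument so the snippet does not depend on the library being importable.

```python
import logging

def make_proxy(proxy_class):
    """Initialise logging, then build a proxy with explicit settings."""
    # The logging module has to be set up before the proxy object is created.
    logging.basicConfig(level=logging.DEBUG)
    return proxy_class(
        address='0.0.0.0',                    # listen on all interfaces
        port=8080,                            # instead of the default 8070
        log_level=logging.DEBUG,              # constant from the logging module
        parent_proxy='myproxy.local:8080',    # forward to a parent proxy
    )

# Typical call, assuming the library is installed:
# import cherryproxy
# proxy = make_proxy(cherryproxy.CherryProxy)
```

The default log_level=20 in the signature corresponds to logging.INFO.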

filter_request(self)
Method to be overridden:

Called to analyse/filter/modify the request received from the client, after reading the full request with its body if there is one, before it is sent to the server.

This method may call set_response() if the request needs to be blocked before being sent to the server.

The following attributes can be read and MODIFIED:

    self.req.data: data sent with the request (POST or PUT)
    (and also all listed in filter_request_headers)
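A possible use of filter_request is to inspect the request body once it has been read in full. The sketch below blocks POST/PUT bodies containing certain words. BLOCKED_WORDS, body_is_allowed and BodyFilterProxy are illustrative names, and the fallback class is a stand-in that records the call so the example runs without the library.

```python
try:
    from cherryproxy import CherryProxy      # the real class (Python 2.x)
except ImportError:
    class CherryProxy(object):
        """Stand-in that records the response instead of sending it."""
        def set_response_forbidden(self, status=403, reason='Forbidden',
                                   data=None, content_type='text/plain'):
            self.blocked = (status, reason)

BLOCKED_WORDS = ('password=', 'secret=')     # hypothetical rule set

def body_is_allowed(method, data):
    """Return False when a POST/PUT body contains a blocked word."""
    if method not in ('POST', 'PUT') or not data:
        return True
    return not any(word in data for word in BLOCKED_WORDS)

class BodyFilterProxy(CherryProxy):
    def filter_request(self):
        # Called once the full request body is available in self.req.data.
        if not body_is_allowed(self.req.method, self.req.data):
            self.set_response_forbidden(reason='Request body not allowed')
```

Keeping the decision in a plain function makes the rule easy to test without running a proxy.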

filter_request_headers(self)
Method to be overridden:

Called to analyse/filter/modify the request received from the client, before reading the full request with its body if there is one, before it is sent to the server.

This method may call set_response() if the request needs to be blocked before being sent to the server.

The following attributes can be read and MODIFIED:

    self.req.headers: dictionary of HTTP headers, with lowercase names
    self.req.method: HTTP method, e.g. 'GET', 'POST', etc
    self.req.scheme: protocol from URL, e.g. 'http' or 'https'
    self.req.netloc: IP address or hostname of server, with optional port, for example 'www.google.com' or '1.2.3.4:8000'
    self.req.path: path in URL, for example '/folder/index.html'
    self.req.query: query string, found after question mark in URL

The following attributes can be READ only:

    self.req.environ: dictionary of request attributes following WSGI format (PEP 333)
    self.req.url: partial URL containing 'path?query'
    self.req.full_url: full URL containing 'scheme://netloc/path?query'
    self.req.length: length of request data in bytes, 0 if none
    self.req.content_type: content-type, for example 'text/html'
    self.req.charset: charset, for example 'UTF-8'
    self.req.url_filename: filename extracted from URL path
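To show how these attributes might be used together, here is a sketch that blocks a few hosts by name and removes the Referer header from everything else. BLOCKED_HOSTS, host_of and HostFilterProxy are hypothetical names, and the fallback class is a stand-in that records the call so the snippet runs without the library.

```python
try:
    from cherryproxy import CherryProxy      # the real class (Python 2.x)
except ImportError:
    class CherryProxy(object):
        """Stand-in that records the response instead of sending it."""
        def set_response_forbidden(self, status=403, reason='Forbidden',
                                   data=None, content_type='text/plain'):
            self.blocked = (status, reason)

BLOCKED_HOSTS = set(['ads.example.com', 'tracker.example.net'])  # made-up list

def host_of(netloc):
    """Strip the optional ':port' suffix and lowercase the host name."""
    return netloc.partition(':')[0].lower()

class HostFilterProxy(CherryProxy):
    def filter_request_headers(self):
        # Runs before the request body is read, so blocking here is cheap.
        if host_of(self.req.netloc) in BLOCKED_HOSTS:
            self.set_response_forbidden(reason='Host blocked by proxy policy')
            return
        # Header names are lowercase; drop Referer before forwarding.
        self.req.headers.pop('referer', None)
```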

filter_response(self)
Method to be overridden:

Called to analyse/filter/modify the response received from the server, after reading the full response with its body if there is one, before it is sent back to the client.

This method may call set_response() if the response needs to be blocked (e.g. replaced by a simple response) before being sent to the client.
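As a rough illustration, a response filter could replace Windows executables with a 403 response. The attributes of the response object are not listed in this description, so self.resp.data (the response body) is an assumption here and should be checked against the source and the example scripts. looks_like_windows_exe and ExeBlockingProxy are illustrative names, and the fallback class is a stand-in so the snippet runs without the library.

```python
try:
    from cherryproxy import CherryProxy      # the real class (Python 2.x)
except ImportError:
    class CherryProxy(object):
        """Stand-in that records the response instead of sending it."""
        def set_response_forbidden(self, status=403, reason='Forbidden',
                                   data=None, content_type='text/plain'):
            self.blocked = (status, reason)

def looks_like_windows_exe(data):
    """True when the body starts with 'MZ', the magic bytes of an EXE file."""
    return bool(data) and data[:2] == b'MZ'

class ExeBlockingProxy(CherryProxy):
    def filter_response(self):
        # ASSUMPTION: the full response body is exposed as self.resp.data.
        if looks_like_windows_exe(self.resp.data):
            self.set_response_forbidden(reason='Executable files not allowed')
```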

filter_response_headers(self)
Method to be overridden:

Called to analyse/filter/modify the response received from the server, before reading the full response with its body if there is one, before it is sent back to the client.

This method may call set_response() if the response needs to be blocked (e.g. replaced by a simple response) before being sent to the client.

set_response(self, status, reason=None, data=None, content_type='text/plain')
set a HTTP response to be sent to the client instead of the one from the server.

- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data
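For instance, set_response could be used to answer some requests locally with a custom HTML page. In this sketch, requests for paths under /private/ get a 404 page and are never forwarded. The path prefix, NOT_FOUND_PAGE and PrivateAreaProxy are made up for illustration, and the fallback class is a stand-in that records the arguments so the snippet runs without the library.

```python
try:
    from cherryproxy import CherryProxy      # the real class (Python 2.x)
except ImportError:
    class CherryProxy(object):
        """Stand-in that records the call instead of answering a client."""
        def set_response(self, status, reason=None, data=None,
                         content_type='text/plain'):
            self.sent = (status, reason, data, content_type)

NOT_FOUND_PAGE = ('<html><body><h1>Not available through this proxy</h1>'
                  '</body></html>')

class PrivateAreaProxy(CherryProxy):
    """Answers requests for /private/ locally with an HTML 404 page."""

    def filter_request_headers(self):
        if self.req.path.startswith('/private/'):
            # content_type must match the body, which is HTML here.
            self.set_response(404, reason='Not Found', data=NOT_FOUND_PAGE,
                              content_type='text/html')
```

If reason and data are left out, the documented defaults apply: the standard reason text for the status code, and a body of the form "status reason".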

set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
set a HTTP 403 Forbidden response to be sent to the client instead of the one from the server.

- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data

start(self)

start proxy server

stop(self)

stop proxy server
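One way to combine start and stop is a small wrapper that guarantees stop() is called when the proxy is interrupted. This is only a sketch: it assumes that start() blocks until the server ends and that calling stop() afterwards is harmless, neither of which is stated above. run_until_interrupted is a hypothetical helper that accepts any object with start() and stop() methods.

```python
def run_until_interrupted(proxy):
    """Start the proxy and make sure stop() is called, even on Ctrl-C."""
    try:
        proxy.start()             # assumed to block while the proxy runs
    except KeyboardInterrupt:
        pass                      # Ctrl-C is the normal way to end a test run
    finally:
        proxy.stop()

# Typical call, assuming the library is installed:
# import logging, cherryproxy
# logging.basicConfig(level=logging.INFO)
# run_until_interrupted(cherryproxy.CherryProxy())
```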