What is Methabot?
Methabot is an open source web crawler and command-line tool optimized for speed. It supports scripted filetype parsing and a wide variety of customization options, and is easily configured to fit anyone's particular needs.
WEBSITE MOVED: This project has moved to a new website: http://metha-sys.org/
Latest News
- Methanol/1.7.0 Released! (2009/06/23 15:18)
- Methabot/1.6.0.1 and lmm_mysql-1.0.0 (2009/02/23 21:59)
- Methabot/1.6.0 Released! (2009/02/21 12:59)
- Methabot/1.5.0 Released! (2009/01/15 22:27)
Features
Methabot has a rich feature set. Some of its features, but not all, are listed below.
- It's fast, designed from the ground up with speed optimization in mind.
- Scriptable through JavaScript with E4X
- User-defined filetype filtering (by MIME type, file extension or UMEX expression)
- Multi-threaded
- Highly configurable from the command line
- Extensible module system, supporting custom data parsers, filters and protocol handlers.
- MySQL support through the JavaScript-MySQL binding (lmm_mysql).
- Simple yet powerful filtering of URLs through UMEX.
- Automated downloading
- Support for automatic cookie handling when running over HTTP
- Support for the Robots Exclusion Standard
- Reliable, fault-tolerant networking, with redirect-loop detection and some spider-trap detection
- Parser chaining, which makes it easy to share data between C and JavaScript parsers
- Unix-friendly interface, piping data in and out for parsing and crawling
- HTML to XML/XHTML conversion
- Portable: tested successfully on 32-bit/64-bit Linux 2.6, 32-bit/64-bit FreeBSD 6.x/7.0 and Mac OS X. It should work on almost any Unix-like OS, with partial support for Windows. Older versions of Methabot have full support for Windows.
