Project Name: Nutch
Official Web Page: http://nutch.org/
Current Wiki: http://wiki.apache.org/nutch/
Old Wiki: http://www.nutch.org/cgi-bin/twiki/view/Main/Nutch
Nutch is an open source search engine application. It consists of a fetcher, an indexer, a parser and a searcher; each of these functional areas is provided as plugins.
Fetchers:
- HTTP
- FTP
- file
Indexer:
- Basic
Parsers:
- HTML
- MS Word
- text
- MP3
Searcher:
- Basic
- Site
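
The plugin layout above can be pictured as a registry keyed by functional area and plugin name. The following is only a minimal sketch of that idea in Python; it is not Nutch's actual plugin API (Nutch itself is written in Java), and the names REGISTRY, register and run_plugin, as well as the two toy parsers, are hypothetical and introduced purely for illustration.

```python
import re
from typing import Callable, Dict

# Hypothetical registry: functional area -> plugin name -> implementation.
# This mirrors the list above (fetcher, indexer, parser, searcher) in spirit only.
REGISTRY: Dict[str, Dict[str, Callable[[str], str]]] = {}


def register(area: str, name: str):
    """Decorator that files a function under a functional area and plugin name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY.setdefault(area, {})[name] = func
        return func
    return decorator


@register("parser", "text")
def parse_text(content: str) -> str:
    # Toy plain-text parser: trims surrounding whitespace.
    return content.strip()


@register("parser", "html")
def parse_html(content: str) -> str:
    # Toy HTML parser: removes anything that looks like a tag.
    # A real parser would use a proper HTML library instead of a regex.
    return re.sub(r"<[^>]+>", "", content).strip()


def run_plugin(area: str, name: str, payload: str) -> str:
    """Look up a plugin by area and name, then apply it to the payload."""
    try:
        plugin = REGISTRY[area][name]
    except KeyError:
        raise LookupError(f"no plugin '{name}' registered for area '{area}'")
    return plugin(payload)


if __name__ == "__main__":
    print(run_plugin("parser", "html", "<p>Hello</p>"))
    print(sorted(REGISTRY["parser"]))
```

In this sketch, supporting a new format means registering one more function under the "parser" area, without touching the lookup code. That is the general benefit of keeping each functional area behind a plugin boundary.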