Thursday, September 27, 2007

Web Crawler Application Design

The Web Crawler Application is divided into three main modules:

1. Controller
2. Fetcher
3. Parser

Controller Module - This module provides the Graphical User Interface (GUI) for the web crawler and controls the crawler's operations. Through the GUI, the user can enter the start URL, enter the maximum number of URLs to crawl, and view the URLs as they are fetched. The Controller also drives the Fetcher and Parser modules.
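
As a rough illustration of the Controller's role, here is a minimal Python sketch with the GUI left out. The names CrawlSettings, read_settings, and Controller are hypothetical and not taken from the actual application. The sketch assumes the Fetcher is a callable that takes a start URL and a limit and returns a mapping of URL to page text, and that the Parser is a callable that takes a URL and its page text.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlSettings:
    """The two values the user enters (hypothetical container)."""
    start_url: str
    max_urls: int


def read_settings(start_url_text, max_urls_text):
    """Validate the raw text typed into the two input fields."""
    start_url = start_url_text.strip()
    parts = urlparse(start_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("start URL must be an absolute http(s) URL")
    max_urls = int(max_urls_text)  # raises ValueError for non-numeric text
    if max_urls < 1:
        raise ValueError("maximum number of URLs must be at least 1")
    return CrawlSettings(start_url, max_urls)


class Controller:
    """Wires a fetcher and a parser together (assumed interfaces)."""

    def __init__(self, fetcher, parser, on_url=print):
        self.fetcher = fetcher  # callable(start_url, max_urls) -> {url: html}
        self.parser = parser    # callable(url, html)
        self.on_url = on_url    # called per fetched URL, e.g. to update a list view

    def run(self, settings):
        pages = self.fetcher(settings.start_url, settings.max_urls)
        for url, html in pages.items():
            self.on_url(url)        # let the user see the URL
            self.parser(url, html)  # hand the page to the parser
        return len(pages)
```

In a real GUI, on_url would append to a list widget instead of printing, and run would probably execute on a background thread so the window stays responsive.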


Fetcher Module - This module starts by fetching the page at the start URL specified by the user. It then retrieves the links found in each fetched page and follows them, repeating until the maximum number of URLs is reached.
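
The fetch loop described above could be sketched in Python roughly as follows. This is only one possible reading: the breadth-first order, the names LinkExtractor, download, and crawl, and the choice to skip pages that fail to download are all assumptions, not details from the actual application. The download function is passed in as a parameter so the loop can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def download(url):
    """Default downloader: return the page body as text (uses the network)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_urls, download=download):
    """Fetch pages breadth-first from start_url until max_urls pages are fetched.

    Returns a dict mapping each fetched URL to its HTML text.
    """
    pages = {}
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, to avoid fetching twice
    while queue and len(pages) < max_urls:
        url = queue.popleft()
        try:
            html = download(url)
        except (OSError, ValueError):
            continue  # assumption: skip pages that cannot be fetched
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

A call such as crawl("http://example.com/", 20) would then return at most 20 pages. A production crawler would also need to respect robots.txt and rate limits, which this sketch ignores.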

Parser Module - This module parses the pages fetched by the Fetcher module and saves their contents to disk.
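
The post does not say what the Parser extracts or how files are named, so the following Python sketch fills those gaps with assumptions: it pulls out the page title as an example of parsing, and it names each file after a hash of the URL so that any URL maps to a safe file name. TitleParser and save_page are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Extracts the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def save_page(url, html, out_dir):
    """Parse one page and write its contents to a file under out_dir.

    Returns the path written and the page title (empty if none found).
    """
    parser = TitleParser()
    parser.feed(html)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: a hash of the URL is used as the file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    path = out_dir / name
    path.write_text(html, encoding="utf-8")
    return path, parser.title.strip()
```

Hashing the URL avoids problems with characters such as "/" and "?" in file names, at the cost of file names that are not human-readable.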
