Web Crawling Configuration

Overview

The Web Crawling Configuration page manages the configurations for Web crawling.

Select Crawler > Web in the left menu to display the list of Web crawling configurations.

Click a configuration name if you want to edit it.

Click the Create New button to display the form for a new Web crawling configuration.

Configuration name.

These URLs are the locations where crawling starts.

Regular expression (Java format) for URL patterns that the Fess crawler is allowed to crawl.

Regular expression (Java format) for URL patterns that the Fess crawler must not crawl.

Regular expression (Java format) for URL patterns that the Fess indexer is allowed to index.

Regular expression (Java format) for URL patterns that the Fess indexer must not index.
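
As an illustration of how allowed and rejected patterns in Java regular expression format can interact, here is a minimal sketch. It is not Fess's own implementation: the class name UrlFilterSketch, the method accepts, and the image-extension pattern are hypothetical, and the sketch assumes that a pattern must match the whole URL and that a rejected pattern overrides an allowed one.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Fess.
public class UrlFilterSketch {
    private final List<Pattern> allowed;
    private final List<Pattern> rejected;

    public UrlFilterSketch(List<String> allowedRegexes, List<String> rejectedRegexes) {
        this.allowed = allowedRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.rejected = rejectedRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    // Assumption of this sketch: the whole URL must match a pattern (Matcher.matches),
    // an empty allowed list permits every URL, and rejected patterns take precedence.
    public boolean accepts(String url) {
        boolean isAllowed = allowed.isEmpty()
                || allowed.stream().anyMatch(p -> p.matcher(url).matches());
        boolean isRejected = rejected.stream().anyMatch(p -> p.matcher(url).matches());
        return isAllowed && !isRejected;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(
                List.of("https://fess.codelibs.org/.*"),   // allow everything under the site
                List.of(".*\\.(png|jpg|gif)"));            // hypothetical: skip image files

        System.out.println(filter.accepts("https://fess.codelibs.org/ja/index.html"));   // true
        System.out.println(filter.accepts("https://fess.codelibs.org/images/logo.png")); // false
        System.out.println(filter.accepts("https://example.com/index.html"));            // false
    }
}
```

Because the sketch matches the whole URL, a pattern such as https://fess.codelibs.org/ without a trailing .* would accept only that exact URL. Note also that an unescaped dot in a Java regular expression matches any character, so write \. when a literal dot is required.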

You can specify the crawl configuration information.

The number of linked URLs.

The number of indexed URLs.

Name of Fess crawler.

The number of crawler threads for this configuration.

Interval between URL crawls for each thread.

The boost value is a weight applied to documents indexed by this configuration.

Roles for this configuration.

Labels for this configuration.

If enabled, this configuration is included in the scheduled job of Default Crawler.

Click a configuration on the list page, then click the Delete button to display a confirmation dialog. Click the Delete button in the dialog to delete the configuration.

To create a Web crawling configuration that crawls pages under https://fess.codelibs.org/, set the parameters as follows:

For the other parameters, use the default values.