Overview
The WebConfig API manages Fess web crawl configurations. Use it to set crawl target URLs, crawl depth, exclusion patterns, and more.
Base URL
Endpoint List
| Method | Path | Description |
|---|---|---|
| GET/PUT | /settings | List web crawl configurations |
| GET | /setting/{id} | Get web crawl configuration |
| POST | /setting | Create web crawl configuration |
| PUT | /setting | Update web crawl configuration |
| DELETE | /setting/{id} | Delete web crawl configuration |
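The endpoint table can be read as a small dispatch map. The following is a minimal sketch, not the official client: it builds (but does not send) a request for each operation using Python's standard library. `BASE_URL` is a hypothetical placeholder, since this page does not state the base URL, and authentication headers are left out because they are not described here. `build_request` and `ENDPOINTS` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the base URL of your own Fess instance.
BASE_URL = "http://localhost:8080/api/admin/webconfig"

# (HTTP method, path template) per operation, taken from the endpoint table.
# The list endpoint is documented as GET/PUT; this sketch uses GET.
ENDPOINTS = {
    "list": ("GET", "/settings"),
    "get": ("GET", "/setting/{id}"),
    "create": ("POST", "/setting"),
    "update": ("PUT", "/setting"),
    "delete": ("DELETE", "/setting/{id}"),
}


def build_request(operation, config_id=None, body=None, base_url=BASE_URL):
    """Build, without sending, a urllib Request for one WebConfig operation."""
    method, template = ENDPOINTS[operation]
    if "{id}" in template:
        if config_id is None:
            raise ValueError(f"{operation} requires config_id")
        # Percent-encode the ID so it is safe to place in the path.
        template = template.replace(
            "{id}", urllib.parse.quote(str(config_id), safe="")
        )
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + template, data=data, headers=headers, method=method
    )
```

A request built this way can be passed to `urllib.request.urlopen` once the base URL and any required authentication headers are filled in for your environment.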
List Web Crawl Configurations
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| size | Integer | No | Number of items per page (default: 20) |
| page | Integer | No | Page number (starts from 0) |
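As a rough illustration of the paging parameters above, this sketch builds the list URL with `size` and `page` as a query string. The base URL is a hypothetical placeholder and `list_url` is an illustrative helper, not part of Fess.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the base URL of your own Fess instance.
BASE_URL = "http://localhost:8080/api/admin/webconfig"


def list_url(size=20, page=0, base_url=BASE_URL):
    """Return the list URL for one page; page numbering starts at 0."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if page < 0:
        raise ValueError("page must be 0 or greater")
    query = urlencode({"size": size, "page": page})
    return f"{base_url}/settings?{query}"
```

Because pages are zero-based, `page=2` with `size=50` corresponds to items 100 through 149.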
Response
Get Web Crawl Configuration
Request
Response
Create Web Crawl Configuration
Request
Request Body
Field Description
| Field | Required | Description |
|---|---|---|
| name | Yes | Configuration name |
| urls | Yes | Crawl start URLs (newline-separated for multiple URLs) |
| includedUrls | No | Regex pattern for URLs to crawl |
| excludedUrls | No | Regex pattern for URLs to exclude from crawling |
| includedDocUrls | No | Regex pattern for URLs to index |
| excludedDocUrls | No | Regex pattern for URLs to exclude from indexing |
| configParameter | No | Additional configuration parameters |
| depth | No | Crawl depth (default: -1 = unlimited) |
| maxAccessCount | No | Maximum access count (default: 100) |
| userAgent | No | Custom User-Agent |
| numOfThread | No | Number of parallel threads (default: 1) |
| intervalTime | No | Request interval in milliseconds (default: 0) |
| boost | No | Search result boost value (default: 1.0) |
| available | No | Enable/disable (default: true) |
| sortOrder | No | Display order |
| permissions | No | Access permission roles |
| virtualHosts | No | Virtual hosts |
| labelTypeIds | No | Label type IDs |
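The field table can be turned into a small payload builder. This is a sketch under assumptions: the JSON value types (integers, a float, a boolean) are inferred from the listed defaults rather than stated by the API, and sending the defaults explicitly is optional. `make_web_config` is an illustrative helper name.

```python
# Defaults as listed in the field table; value types are an assumption.
DEFAULTS = {
    "depth": -1,
    "maxAccessCount": 100,
    "numOfThread": 1,
    "intervalTime": 0,
    "boost": 1.0,
    "available": True,
}

# Every optional field named in the table.
OPTIONAL_FIELDS = set(DEFAULTS) | {
    "includedUrls", "excludedUrls", "includedDocUrls", "excludedDocUrls",
    "configParameter", "userAgent", "sortOrder", "permissions",
    "virtualHosts", "labelTypeIds",
}


def make_web_config(name, urls, **options):
    """Build a request body dict from a name, a list of start URLs, and options."""
    if not name:
        raise ValueError("name is required")
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one start URL is required")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Multiple start URLs are joined with newlines, as the table describes.
    body = {"name": name, "urls": "\n".join(cleaned), **DEFAULTS}
    body.update(options)
    return body
```

Rejecting unknown field names locally catches typos such as `maxAccesCount` before a request is ever sent.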
Response
Update Web Crawl Configuration
Request
Request Body
Response
Delete Web Crawl Configuration
Request
Response
URL Pattern Examples
includedUrls / excludedUrls
| Pattern | Description |
|---|---|
| `.*example\\.com.*` | All URLs containing example.com |
| `https://example\\.com/docs/.*` | Only URLs under /docs/ |
| `.*\\.(pdf\|doc\|docx)$` | PDF, DOC, DOCX files |
| `.*\\?.*` | URLs with query parameters |
| `.*/(login\|logout\|admin)/.*` | URLs containing specific paths |
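To check how patterns like these behave before saving them, they can be tried against sample URLs locally. The sketch below makes two assumptions: that the doubled backslashes in the table are JSON string escaping (so the regex itself contains a single backslash), and that a pattern must match the whole URL. Python's `re` stands in for the crawler's own regex engine, so edge cases may differ.

```python
import re

# The table's patterns as plain regexes (single backslash, see assumption above).
PATTERNS = {
    "contains_example": r".*example\.com.*",
    "under_docs": r"https://example\.com/docs/.*",
    "office_files": r".*\.(pdf|doc|docx)$",
    "has_query": r".*\?.*",
    "auth_paths": r".*/(login|logout|admin)/.*",
}


def matching_patterns(url):
    """Return the names of all patterns that match the entire URL."""
    return [name for name, pattern in PATTERNS.items()
            if re.fullmatch(pattern, url)]


print(matching_patterns("https://example.com/docs/guide.pdf"))
# → ['contains_example', 'under_docs', 'office_files']
print(matching_patterns("https://example.com/login/?next=/"))
# → ['contains_example', 'has_query', 'auth_paths']
```

Note that `https://example.com/login` (no trailing slash) does not match the last pattern, because it requires a `/` after the path segment.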
Usage Examples
Corporate Site Crawl Configuration
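A request body for this scenario might look like the following sketch. All values (host name, depth, access count, thread count, interval) are hypothetical choices for illustration, not recommended settings. The example also shows why regex backslashes appear doubled once the body is serialized as JSON.

```python
import json

# Hypothetical corporate site configuration; every value is illustrative.
corporate_site = {
    "name": "Corporate Site",
    "urls": "https://www.example.com/",
    "includedUrls": r"https://www\.example\.com/.*",
    "excludedUrls": r".*/(login|logout|admin)/.*",
    "depth": 5,
    "maxAccessCount": 1000,
    "numOfThread": 2,
    "intervalTime": 1000,  # milliseconds between requests
}

# Serializing escapes each regex backslash, so \. becomes \\. in the JSON text.
request_body = json.dumps(corporate_site, indent=2)
print(request_body)
```

The resulting string is what would be sent as the body of `POST /setting`.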
Documentation Site Crawl Configuration
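For a documentation site, a body along these lines restricts crawling to the `/docs/` subtree and skips URLs with query parameters. This is a sketch with hypothetical values: the start URLs and the boost of 2.0 are illustrative, and the patterns are taken from the URL pattern table.

```python
import json

# Hypothetical start URLs; multiple URLs are sent as one newline-separated string.
start_urls = [
    "https://example.com/docs/",
    "https://example.com/docs/api/",
]

docs_site = {
    "name": "Documentation Site",
    "urls": "\n".join(start_urls),
    "includedUrls": r"https://example\.com/docs/.*",
    "excludedUrls": r".*\?.*",
    "boost": 2.0,  # illustrative: rank documentation above the default 1.0
}

request_body = json.dumps(docs_site, indent=2)
print(request_body)
```

Raising `boost` above the default is one way to favor these pages in search results; whether that is appropriate depends on the rest of your index.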
Reference
Admin API Overview - Overview of the Admin API
FileConfig API - File Crawl Configuration API
DataConfig API - Data Store Configuration API
Web Crawling - Web Crawl Configuration Guide