Overview
Fess allows you to implement custom logic using scripts in various scenarios. By utilizing scripts, you can flexibly control data processing during crawling, URL transformation, and scheduled job execution.
Supported Scripting Languages
Fess supports the following scripting languages:
| Language | Identifier | Description |
|---|---|---|
| Groovy | groovy | The scripting language registered by default. Compatible with Java syntax and able to call Java classes directly |
Note
The only scripting engine registered in Fess by default is Groovy. The default scripting language is groovy (Constants.DEFAULT_SCRIPT). All script examples in this documentation are written in Groovy syntax.
Use Cases for Scripts
Data Store Configuration
Data store connectors use scripts to map retrieved data to index fields. Configuration is written one line per entry in the format field=expression, and each line is evaluated as a single independent Groovy expression.
The variable names available in data store scripts differ by connector type. In the CSV and JSON data stores, each column name or field name is available directly as a variable (no common prefix such as data is added). File-based connectors (Box, Google Drive, OneDrive, etc.) use the prefix file.*, Slack uses message.*, and so on; each connector has its own prefix convention. Refer to the documentation for each data store connector for details on available variables.
Note
Because each line in a data store script is evaluated as a single expression, multi-line if blocks, import statements, and variable declarations using def cannot be used. To conditionally assign a value, use the ternary operator on a per-field basis (e.g., title=enabled == "true" ? name : null). When referencing a class, write its fully qualified class name (FQCN) inline.
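As a rough illustration of the one-expression-per-line rule, the sketch below evaluates the right-hand sides of a few mapping lines with GroovyShell, using a Binding as a stand-in for the variables a CSV data store would expose. The column names enabled, name, and id and the example.com URL are hypothetical, and GroovyShell is used here only as a convenient way to try expressions outside Fess, not as a description of how Fess evaluates them.

```groovy
// Hypothetical CSV columns, exposed directly as variables (no prefix).
def vars = new Binding([enabled: "true", name: "Product A", id: "42"])
def shell = new GroovyShell(vars)

// Right-hand sides of three mapping lines, each a single expression:
//   title=enabled == "true" ? name : null
//   url="https://example.com/items/" + id
//   label=java.net.URLEncoder.encode(name, "UTF-8")
def title = shell.evaluate('enabled == "true" ? name : null')
def url = shell.evaluate('"https://example.com/items/" + id')

// A fully qualified class name is written inline because import is unavailable.
def label = shell.evaluate('java.net.URLEncoder.encode(name, "UTF-8")')

println "title=${title}, url=${url}, label=${label}"
```

Trying an expression this way before pasting it into a data store configuration can help catch syntax errors early, since each line has to stand on its own.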
Path Mapping
Path mapping is a feature for normalizing and transforming crawl target URLs. By default, it is configured as a pair of a regular expression and a replacement string, and is not a Groovy script. For example, specifying http:// as the regular expression and https:// as the replacement string replaces the URL scheme.
A replacement string is evaluated as a Groovy script only when it is prefixed with groovy:. Inside this script, url (the URL string being transformed) and matcher (the java.util.regex.Matcher for the regular expression) are available.
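The following is a minimal sketch of what a groovy: replacement might compute. The variables url and matcher are created by hand here so the snippet runs outside Fess; in an actual path mapping they are supplied by Fess, and only the final expression would follow the groovy: prefix. The sample URL and regular expression are hypothetical, and the sketch calls matches() itself because it makes no assumption about the state of the matcher that Fess passes in.

```groovy
import java.util.regex.Pattern

// Stand-ins for the variables available inside a groovy: replacement.
def url = "http://example.com/docs/Page.HTML"
def matcher = Pattern.compile('^http://([^/]+)(/.*)$').matcher(url)

// Expression body: switch the scheme to https and lowercase the path,
// falling back to the original URL when the pattern does not match.
def result = matcher.matches()
        ? "https://" + matcher.group(1) + matcher.group(2).toLowerCase()
        : url

println result
```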
Scheduled Jobs
Scheduled jobs allow you to write custom processing logic in Groovy scripts. Because the entire script is evaluated as a single Groovy script, multi-line expressions, import statements, and variable declarations using def are all supported.
Methods such as logLevel("info") are methods of the job class (ExecJob and its subclasses) and can be chained. For the executor variable, see “Execution Context and Available Objects”.
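A job script in this style can be sketched as follows. StubJob and StubContainer are hypothetical stand-ins that exist only so the snippet runs outside Fess; in a real job, container and executor are provided by Fess and the stubs are omitted. The component name crawlJob is an assumption based on the default crawler job and should be checked against your installation.

```groovy
import java.time.LocalDate

// Hypothetical stubs standing in for the objects Fess provides.
class StubJob {
    String level
    // Returns this, which is what makes chained calls possible.
    StubJob logLevel(String level) { this.level = level; return this }
    String execute(Object executor) { return "executed at " + level }
}
class StubContainer {
    Object getComponent(String name) { return new StubJob() }
}
def container = new StubContainer()
def executor = new Object()

// Script body: import, def, and multi-line statements are all allowed.
def today = LocalDate.now()
def level = "info"
println "Starting crawl job on ${today}"

// In a real job script this chained call would typically be the returned value.
def result = container.getComponent("crawlJob").logLevel(level).execute(executor)
println result
```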
Basic Syntax
The following are basic Groovy syntax examples. Comments use // (line comments) or /* */ (block comments). Note that comments starting with # cannot be used in Groovy.
Variable Access
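A small sketch of common ways to read values in Groovy. The data map is a hypothetical stand-in for whatever variables the execution context provides; it is not a Fess object.

```groovy
// Hypothetical record; real variable names depend on the execution context.
def data = [title: "Fess Guide", tags: ["search", "crawler"], meta: null]

def title = data.title               // property-style access to a map entry
def firstTag = data.tags[0]          // index access to a list element
def author = data.meta?.author       // safe navigation: null instead of an exception
def label = data.author ?: "unknown" // Elvis operator: default when the value is null

println "${title} / ${firstTag} / ${author} / ${label}"
```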
String Operations
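The snippet below sketches a few string operations that tend to be useful when shaping field values. All sample values are made up for illustration.

```groovy
def name = "  Fess Search  "

def trimmed = name.trim()                   // remove surrounding whitespace
def upper = trimmed.toUpperCase()           // convert to upper case
def greeting = "Hello, ${trimmed}!"         // interpolation in a double-quoted string
def parts = "a,b,c".split(",")              // split into a String array
def secure = "http://example.com".replaceFirst("^http://", "https://")
def prefix = trimmed.take(4)                // first four characters

println "${upper} | ${greeting} | ${parts.length} | ${secure} | ${prefix}"
```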
Conditional Branching
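A brief sketch of the branching forms Groovy offers. The classify helper and the sample values are hypothetical. Multi-line if and switch blocks fit contexts that evaluate a whole script, such as scheduled jobs; in single-expression contexts such as data store lines, only the ternary form applies.

```groovy
// Hypothetical helper that labels a URL by its suffix.
def classify(String url) {
    if (url.endsWith(".pdf")) {
        return "pdf"
    } else if (url.endsWith(".html") || url.endsWith("/")) {
        return "html"
    }
    return "other"
}

def size = 1500
def sizeLabel = size > 1000 ? "large" : "small"   // ternary operator

def mimeLabel
switch ("image/png") {
    case ~/image\/.*/: mimeLabel = "image"; break  // a regex pattern used as a case
    case "text/html":  mimeLabel = "page";  break
    default:           mimeLabel = "other"
}

println "${classify('https://example.com/a.pdf')} ${sizeLabel} ${mimeLabel}"
```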
Date Operations
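Date handling can be sketched with the standard java.time classes, which Groovy can call directly. The dates are arbitrary examples. The import lines assume a context that evaluates a whole script; in a single-expression context the fully qualified class names would be written inline instead.

```groovy
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

def date = LocalDate.of(2024, 1, 31)

def formatted = date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"))
def nextMonth = date.plusMonths(1)            // clamped to the last day of February
def parsed = LocalDate.parse("2024-03-15")    // ISO-8601 date string
def days = ChronoUnit.DAYS.between(date, parsed)

println "${formatted} -> ${nextMonth}, ${days} days until ${parsed}"
```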
Execution Context and Available Objects
The objects available inside a script depend on the context in which the script runs. Only container is available in all contexts.
| Execution Context | Available Objects | Description |
|---|---|---|
| All contexts | container | The DI container, used to look up individual components |
| Data store scripts | Connector-specific field variables | Each field retrieved from the data store is available as a variable (variable names and prefixes differ by connector; CSV/JSON use the field name directly) |
| Path mapping | url, matcher | The URL string being transformed and the java.util.regex.Matcher for the regular expression |
| Scheduled jobs | executor | The job execution instance (JobExecutor). Used to control job shutdown |
Note
Objects other than container are injected only in specific contexts. For example, executor is available only in scheduled jobs and cannot be used in data store scripts or path mapping.
Security
Warning
Scripts are powerful, so run only scripts that come from trusted sources.
Scripts are executed on the server
Access to the file system and network is possible
Ensure that only users with administrator privileges can edit scripts
Script execution is recorded in the audit log (audit.log). Whether recording is enabled is controlled by script.audit.log.enabled, which defaults to true. The maximum length of the script string that is recorded is controlled by script.audit.log.max.length, which defaults to 100 characters.
Performance
Tips for optimizing script performance:
Avoid complex processing: Data store scripts are executed for each document
Minimize external resource access: Network calls cause delays
Use caching: Consider caching values that are used repeatedly
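One way to cache repeated values inside a single script is Groovy's built-in Closure.memoize(). The sketch below is an illustration under assumptions: resolveLabel is a hypothetical lookup, and the cache lives only as long as the closure itself, so it helps within one script evaluation rather than across separate evaluations.

```groovy
def calls = 0

// Hypothetical expensive lookup; the counter shows how often it really runs.
def resolveLabel = { String code ->
    calls++
    return code.toUpperCase()
}.memoize()

def first = resolveLabel("ja")
def second = resolveLabel("ja")   // served from the memoized result

println "first=${first}, second=${second}, calls=${calls}"
```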
Debugging
In scheduled job scripts, because the entire script is evaluated as a single Groovy script, you can use log output for debugging. (Data store scripts evaluate one line as one expression, so import statements and multi-line processing cannot be used.)
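A minimal sketch of log output from a scheduled job script, assuming the Log4j 2 API is on the classpath (as the log4j2.xml configuration file suggests). The logger name fess.script and the messages are illustrative, and the list is a placeholder for real job logic.

```groovy
import org.apache.logging.log4j.LogManager

// Obtain a named logger; output appears only if this name is configured.
def logger = LogManager.getLogger("fess.script")

def startedAt = System.currentTimeMillis()
logger.info("Job script started")

// Placeholder for the actual job logic.
def count = [1, 2, 3].size()

// {} placeholders are filled in by Log4j 2 when the message is written.
logger.debug("Processed {} items in {} ms", count, System.currentTimeMillis() - startedAt)
```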
To output messages written through a logger named fess.script, add the corresponding logger configuration to app/WEB-INF/classes/log4j2.xml.
To enable debug logging for the scripting engine itself, set the log level of the org.codelibs.fess.script package to DEBUG.
Reference Information
Groovy Scripting Guide - Groovy Scripting Guide
Data Store Crawling - Data Store Configuration Guide
Scheduler - Scheduler Configuration Guide