Overview
The Index Export feature exports search documents indexed in OpenSearch to HTML or JSON files on the local filesystem. This functionality is useful for:
Creating static backups of indexed content
Generating offline copies of documents for archival purposes
Building static search result pages
Content migration to other systems
The exported files maintain the original URL path structure from the source documents, making it easy to manage the exported content.
How It Works
When the Index Export job runs, it performs the following steps:
Document Retrieval: Fetches documents from OpenSearch in efficient batches using the Scroll API
Content Processing: Extracts document fields (title, content, URL, etc.) and removes any excluded fields
Directory Structure Creation: Replicates the URL path structure in the export directory based on each document’s url field
File Generation: Creates files (HTML or JSON) containing the document content
Continue Until Complete: Continues batch processing until the index is fully exported
Because the Scroll API returns results one batch at a time, large indexes can be exported without loading every document into memory at once.
Note
Only documents in the search index (fess.search) are eligible for export. Documents that do not have a url field are skipped.
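The retrieval and processing steps above can be sketched in Python. This is only an illustration under stated assumptions: the in-memory list stands in for OpenSearch, and the names scroll_batches, prepare, and export_all are hypothetical, not part of Fess.

```python
from typing import Dict, FrozenSet, Iterator, List, Optional


def scroll_batches(documents: List[Dict], scroll_size: int = 100) -> Iterator[List[Dict]]:
    """Yield documents in fixed-size batches (in-memory stand-in for the Scroll API)."""
    for start in range(0, len(documents), scroll_size):
        yield documents[start:start + scroll_size]


def prepare(doc: Dict, exclude_fields: FrozenSet[str]) -> Optional[Dict]:
    """Return the document without excluded fields, or None if it has no url field."""
    if not doc.get("url"):
        return None  # documents without a url field are skipped
    return {name: value for name, value in doc.items() if name not in exclude_fields}


def export_all(documents: List[Dict], scroll_size: int = 100,
               exclude_fields: FrozenSet[str] = frozenset({"cache"})) -> List[Dict]:
    """Process every batch until the whole (hypothetical) index has been handled."""
    exported = []
    for batch in scroll_batches(documents, scroll_size):
        for doc in batch:
            prepared = prepare(doc, exclude_fields)
            if prepared is not None:
                exported.append(prepared)
    return exported
```

Only one batch is held by the loop at a time, which is the property that keeps memory use bounded when the batches come from a real scroll cursor.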
Configuration Properties
Configure the Index Export feature in fess_config.properties:
| Property | Default Value | Description |
|---|---|---|
| index.export.path | /var/lib/fess/export | Directory where exported files are stored |
| index.export.exclude.fields | cache | Comma-separated list of fields to exclude from export |
| index.export.scroll.size | 100 | Number of documents processed per batch |
| index.export.format | html | Export file format (html or json) |
Example configuration:
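As a sketch, the fragment below writes out the four properties from the table with their documented default values. The values are taken from the table and are illustrative only; adjust them to the deployment.

```properties
# Directory where exported files are stored
index.export.path=/var/lib/fess/export
# Comma-separated list of fields to leave out of each exported file
index.export.exclude.fields=cache
# Number of documents processed per batch
index.export.scroll.size=100
# Export file format: html or json
index.export.format=html
```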
Enabling the Job
The Index Export job is registered as a scheduled job but is disabled by default.
To enable the job:
Log in to the Fess administration console
Navigate to System > Scheduler
Find Index Exporter in the job list
Click to edit the job settings
Set the schedule using a cron expression
Save the settings
Example cron expressions:
0 0 2 * * ? - Run daily at 2:00 AM
0 0 3 ? * SUN - Run every Sunday at 3:00 AM
0 0 0 1 * ? - Run on the first day of each month at midnight
Custom Query Filtering
You can customize the export to target only specific documents by modifying the job script.
The default script for the Index Exporter job exports all documents:
To add a custom query filter:
Navigate to System > Scheduler
Edit the Index Exporter job
Modify the job script to include a query filter
Example date filter (export only documents from the last 7 days):
Example site filter (export only documents from a specific site):
Example to export in JSON format:
Exported File Structure
Exported files are organized to mirror the original URL structure.
For example, a document with URL https://example.com/docs/guide/intro.html would be exported to:
The file path is determined from the document’s url field according to the following rules:
The hostname becomes the top-level directory. If the URL contains no hostname, _local is used.
If the path ends with a slash or has no path component, an index file (index.html or index.json) is created.
If the path contains no file extension, an extension matching the format (.html or .json) is appended.
Characters that are invalid in file names (< > : " | ? * \) are replaced with _, and each path component is truncated to a maximum of 200 characters.
If the URL cannot be parsed or a path traversal is detected, the document is saved under the _invalid directory using a hash of the URL as the filename.
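To make the rules above concrete, here is a minimal Python sketch of the mapping. It is not the Fess implementation: the function name export_path, the use of SHA-256 for the hash, the extension on files under _invalid, the ".." check for path traversal, dropping the query string, and truncating after the extension is appended are all assumptions made for illustration.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

INVALID_CHARS = re.compile(r'[<>:"|?*\\]')  # characters replaced with "_"
MAX_COMPONENT_LENGTH = 200                  # per path component


def _invalid_path(url: str, ext: str) -> PurePosixPath:
    """Fallback location: _invalid/<hash of URL><ext> (hash algorithm assumed)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return PurePosixPath("_invalid", digest + ext)


def export_path(url: str, fmt: str = "html") -> PurePosixPath:
    """Return the path, relative to the export directory, for a document URL."""
    ext = "." + fmt
    try:
        parts = urlsplit(url)
        host = parts.hostname or "_local"   # no hostname: use _local
    except ValueError:
        return _invalid_path(url, ext)      # URL could not be parsed
    segments = [s for s in parts.path.split("/") if s]
    if ".." in segments:
        return _invalid_path(url, ext)      # treated as path traversal
    if not segments or parts.path.endswith("/"):
        segments.append("index" + ext)      # trailing slash or empty path
    elif "." not in segments[-1]:
        segments[-1] += ext                 # no file extension present
    cleaned = [INVALID_CHARS.sub("_", s)[:MAX_COMPONENT_LENGTH]
               for s in [host, *segments]]
    return PurePosixPath(*cleaned)
```

Under these assumptions, export_path("https://example.com/docs/") yields example.com/docs/index.html, and a URL without a hostname such as /a/b yields _local/a/b.html.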
For HTML format, each file is generated with the following structure:
title field → <title> element
lang field → lang attribute of the <html> element
content field → body of the <body> element
All other non-excluded fields → <meta name="fess:fieldname" content="value"> tags inside <head>
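The field mapping above can be illustrated with a short Python sketch. The function name render_html, the exact markup layout, and the HTML-escaping of every value (including content) are assumptions; the real exporter may lay out or escape the output differently.

```python
from html import escape
from typing import Dict, Iterable


def render_html(doc: Dict, exclude_fields: Iterable[str] = ("cache",)) -> str:
    """Render one document as HTML following the field mapping described above."""
    excluded = set(exclude_fields)
    fields = {name: value for name, value in doc.items() if name not in excluded}
    # title, lang, and content get dedicated places in the page
    title = escape(str(fields.pop("title", "")))
    lang = escape(str(fields.pop("lang", "")))
    content = escape(str(fields.pop("content", "")))
    # every remaining field becomes a <meta name="fess:..."> tag
    metas = "\n".join(
        f'<meta name="fess:{escape(name)}" content="{escape(str(value))}">'
        for name, value in fields.items()
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        "<head>\n"
        f"<title>{title}</title>\n"
        f"{metas}\n"
        "</head>\n"
        f"<body>\n{content}\n</body>\n"
        "</html>\n"
    )
```

Escaping is applied here so that field values cannot break the surrounding markup; whether the exported content field is escaped or written as raw HTML is not stated in this document.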
For JSON format, each file is a JSON object containing all non-excluded fields:
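As a rough Python sketch of the JSON case, the function below serializes a document after dropping excluded fields. The name render_json, the indentation, and the sample field names in the usage note are hypothetical; the actual key order and formatting of exported files may differ.

```python
import json
from typing import Dict, Iterable


def render_json(doc: Dict, exclude_fields: Iterable[str] = ("cache",)) -> str:
    """Serialize all non-excluded fields of a document as one JSON object."""
    excluded = set(exclude_fields)
    fields = {name: value for name, value in doc.items() if name not in excluded}
    # ensure_ascii=False keeps non-ASCII text readable in the exported file
    return json.dumps(fields, ensure_ascii=False, indent=2)
```

For a document with title, url, content, and cache fields, the result is a JSON object holding title, url, and content only.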
Best Practices
Storage Considerations
Ensure sufficient disk space in the export directory
Consider using dedicated storage for large document sets
Implement regular cleanup of old exports if running periodic exports
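One possible approach to the cleanup mentioned above is a small script run on a schedule outside Fess. The sketch below is an assumption, not a Fess feature: the function name prune_old_exports and the use of file modification time as the age criterion are illustrative choices.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_exports(export_dir: str, max_age_days: float,
                      now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than max_age_days and return the removed paths."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400  # seconds per day
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal
    for path in list(Path(export_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The sketch leaves directories in place, including ones that become empty, and should be pointed only at the export directory because it deletes files without confirmation.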
Performance Tips
Adjust index.export.scroll.size based on document size:
- Smaller documents: larger batch size (200-500)
- Larger documents: smaller batch size (50-100)
Schedule exports during low-usage periods
Monitor disk I/O during export operations
Security Recommendations
Set appropriate file permissions on the export directory
Do not expose the export directory directly to the web
Consider encrypting exported content if it contains sensitive information
Regularly audit access to exported files
Troubleshooting
Export Job Does Not Run
Verify the job is enabled in Scheduler
Check the cron expression syntax
Review Fess logs for error messages:
Empty Export Directory
Confirm documents exist in the index
Check the export path permissions
Verify the query filter (if custom) matches documents
Export Fails Midway
Check available disk space
Review logs for memory or timeout errors
Consider reducing index.export.scroll.size for large documents
Check OpenSearch scroll context timeout settings
Files Not Accessible
Verify file permissions: ls -la /var/lib/fess/export
Check that directory ownership matches the Fess process user
Confirm SELinux or AppArmor policies allow access
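The permission and ownership checks above can also be scripted. The following Python sketch uses a hypothetical helper, describe_access; it reports access for the user running the script, so it is only meaningful when run as the Fess process user, and it does not detect SELinux or AppArmor denials.

```python
import os
import stat
from pathlib import Path
from typing import Dict


def describe_access(path: str) -> Dict:
    """Summarize mode, owner, and effective access for an export directory."""
    directory = Path(path)
    info: Dict = {"exists": directory.is_dir()}
    if info["exists"]:
        st = directory.stat()
        info["mode"] = stat.filemode(st.st_mode)  # e.g. "drwxr-x---"
        info["owner_uid"] = st.st_uid
        # Directories need execute permission to be traversed
        info["readable"] = os.access(directory, os.R_OK | os.X_OK)
        info["writable"] = os.access(directory, os.W_OK | os.X_OK)
    return info
```

Calling describe_access("/var/lib/fess/export") returns a dictionary whose writable entry should be True for the user that runs the export job.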