Index Export Feature

Overview

The Index Export feature allows you to export search documents indexed in OpenSearch to HTML files on the local filesystem. This functionality is useful for:

Creating static backups of indexed content
Generating offline copies of documents for archival purposes
Building static search result pages
Migrating content to other systems

The exported files maintain the original URL path structure from the source documents, making it easy to navigate and manage the exported content.

How It Works

When the Index Export job runs, it performs the following steps:

Query Documents: Retrieves documents from OpenSearch using the scroll API for efficient batch processing
Process Content: Extracts document fields (title, content, URL, etc.)
Create Directory Structure: Replicates the URL path structure in the export directory
Generate HTML Files: Creates HTML files containing the document content
Continue Until Complete: Processes all documents in batches until the index is fully exported

Because the scroll API returns results in fixed-size batches, the job never has to hold the whole index in memory, which keeps large exports practical.
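
The batch loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Fess implementation: `make_fake_fetch` and `scroll_all` are hypothetical names, and an in-memory list stands in for the OpenSearch index so the example runs without a server.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Document = dict
# A fetch function takes a cursor (None for the first request) and returns
# (documents, next_cursor). An empty document list means the scroll is done.
FetchBatch = Callable[[Optional[int]], Tuple[List[Document], Optional[int]]]


def make_fake_fetch(docs: List[Document], scroll_size: int) -> FetchBatch:
    """Hypothetical stand-in for a scroll request against an index."""
    def fetch_batch(cursor: Optional[int]) -> Tuple[List[Document], Optional[int]]:
        start = cursor or 0
        batch = docs[start:start + scroll_size]
        return batch, start + len(batch)
    return fetch_batch


def scroll_all(fetch_batch: FetchBatch) -> Iterator[List[Document]]:
    """Yield batches until an empty batch signals the end of the index."""
    cursor = None
    while True:
        batch, cursor = fetch_batch(cursor)
        if not batch:
            break
        yield batch


docs = [{"url": f"https://example.com/doc{i}.html"} for i in range(250)]
batch_sizes = [len(b) for b in scroll_all(make_fake_fetch(docs, scroll_size=100))]
print(batch_sizes)  # [100, 100, 50]
```

Only one batch is held at a time, so peak memory depends on the batch size rather than on the total number of documents.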

Configuration Properties

Configure the Index Export feature in fess_config.properties:

Property	Default Value	Description
`index.export.path`	`/var/fess/export`	Directory where exported files are stored
`index.export.exclude.fields`	`cache`	Comma-separated list of fields to exclude from export
`index.export.scroll.size`	`100`	Number of documents processed per batch

Example configuration:

index.export.path=/data/fess/export
index.export.exclude.fields=cache,boost,role
index.export.scroll.size=200
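
To illustrate how overrides combine with the defaults in the table above, here is a small Python sketch. It assumes plain `key=value` lines and is not how Fess itself loads its configuration; `load_export_config` and the returned dictionary keys are hypothetical names chosen for this example.

```python
# Defaults copied from the property table above.
DEFAULTS = {
    "index.export.path": "/var/fess/export",
    "index.export.exclude.fields": "cache",
    "index.export.scroll.size": "100",
}


def load_export_config(text: str) -> dict:
    """Parse simple key=value lines and fill in the documented defaults."""
    values = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    excluded = values["index.export.exclude.fields"].split(",")
    return {
        "path": values["index.export.path"],
        "exclude_fields": [name.strip() for name in excluded if name.strip()],
        "scroll_size": int(values["index.export.scroll.size"]),
    }


config = load_export_config("""
index.export.path=/data/fess/export
index.export.exclude.fields=cache,boost,role
""")
print(config)
```

Because `index.export.scroll.size` is omitted from the input here, the sketch falls back to the default of 100.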

Enabling the Job

The Index Export job is registered as a scheduled job but is disabled by default.

To enable the job:

Log in to the Fess administration console
Navigate to System > Scheduler
Find Index Export Job in the job list
Click to edit the job settings
Set the schedule using a cron expression
Save the settings

Example cron expressions (the Fess scheduler uses the five-field format: minute, hour, day of month, month, day of week):

0 2 * * * - Run daily at 2:00 AM
0 3 * * 0 - Run every Sunday at 3:00 AM
0 0 1 * * - Run on the first day of each month at midnight

Custom Query Filtering

You can customize the export job to export only specific documents by modifying the job script.

To add a custom query filter:

Navigate to System > Scheduler
Edit the Index Export Job
Modify the job script to include a query filter

Example script with date filter:

import org.codelibs.fess.exec.IndexExportJob

// Export only documents created within the last seven days
def job = new IndexExportJob()
job.query = "created:>=now-7d"
job.execute()

Example script with site filter:

import org.codelibs.fess.exec.IndexExportJob

// Export only documents whose URL contains example.com
def job = new IndexExportJob()
job.query = "url:*example.com*"
job.execute()

Exported File Structure

Exported files are organized to mirror the original URL structure.

For example, a document with URL https://example.com/docs/guide/intro.html would be exported to:

/var/fess/export/
└── example.com/
    └── docs/
        └── guide/
            └── intro.html
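
The mapping from URL to file path can be sketched as follows. This is an illustration only: `export_path_for` is a hypothetical helper, and the handling of trailing slashes, missing host names, and query strings (ignored here) is an assumption rather than documented Fess behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def export_path_for(url: str, export_root: str = "/var/fess/export") -> PurePosixPath:
    """Map a document URL to a file path under the export root."""
    parts = urlsplit(url)
    # Drop empty, "." and ".." segments so the result cannot escape export_root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    # Assumption: a URL ending in "/" (or with no path) is stored as index.html.
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    # Assumption: URLs without a host go under a placeholder directory.
    return PurePosixPath(export_root, parts.hostname or "unknown-host", *segments)


print(export_path_for("https://example.com/docs/guide/intro.html"))
# /var/fess/export/example.com/docs/guide/intro.html
```

Filtering out `..` segments matters for any exporter of this kind, since URL paths come from crawled content and should not be able to write outside the export directory.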

Each exported HTML file contains:

Document title
Main content body
Metadata (last modified date, content type, etc.)
Original URL reference
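
As a rough picture of how such a file could be assembled, the sketch below renders one document to HTML and skips excluded fields. The markup layout, the `render_document` name, and the sample field names are assumptions made for illustration; the files Fess actually writes may look different.

```python
from html import escape


def render_document(doc: dict, exclude_fields=("cache",)) -> str:
    """Render one document as a small HTML page (layout is illustrative only)."""
    main_fields = {"title", "content", "url"}
    # Every remaining field becomes a <meta> tag unless it is excluded.
    meta_rows = "\n".join(
        f'    <meta name="{escape(key)}" content="{escape(str(value))}">'
        for key, value in sorted(doc.items())
        if key not in main_fields and key not in exclude_fields
    )
    title = escape(str(doc.get("title", "")))
    url = escape(str(doc.get("url", "")))
    body = escape(str(doc.get("content", "")))
    return (
        "<!DOCTYPE html>\n<html>\n  <head>\n"
        f"    <title>{title}</title>\n"
        f"{meta_rows}\n"
        "  </head>\n  <body>\n"
        f"    <h1>{title}</h1>\n"
        f'    <p>Source: <a href="{url}">{url}</a></p>\n'
        f"    <div>{body}</div>\n"
        "  </body>\n</html>\n"
    )


doc = {
    "title": "Intro & Setup",
    "content": "Hello <world>",
    "url": "https://example.com/docs/guide/intro.html",
    "content_type": "text/html",
    "cache": "raw cached page",
}
page = render_document(doc)
print(page)
```

All field values are HTML-escaped before being written, which keeps indexed content from being interpreted as markup in the exported page.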

Best Practices

Storage Considerations

Ensure sufficient disk space in the export directory
Consider using dedicated storage for large document sets
Implement regular cleanup of old exports if running periodic exports
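
One way to implement periodic cleanup is a small script that removes exported files older than a chosen age. The sketch below is an example under assumptions, not a Fess feature: `remove_old_exports` and the `max_age_days` parameter are hypothetical, and the retention period is yours to choose.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_exports(export_root: str, max_age_days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Delete files whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(export_root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Example (commented out because it deletes files):
# remove_old_exports("/var/fess/export", max_age_days=30)
```

The sketch leaves empty directories in place. Test it against a copy of the export directory before running it on real data.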

Performance Tips

Adjust index.export.scroll.size based on document size:
    - Smaller documents: larger batch size (200-500)
    - Larger documents: smaller batch size (50-100)
Schedule exports during low-usage periods
Monitor disk I/O during export operations

Security Recommendations

Set appropriate file permissions on the export directory
Do not expose the export directory directly to the web
Consider encrypting exported content if it contains sensitive information
Regularly audit access to exported files
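
As an example of the first recommendation, the following sketch restricts an export tree to its owner on a POSIX system. The function name `restrict_export_permissions` and the chosen modes (0700 for directories, 0600 for files) are assumptions; pick modes that match how the exported files need to be read in your environment.

```python
import os
import stat


def restrict_export_permissions(export_root: str) -> None:
    """Make directories rwx and files rw for the owner only (POSIX systems)."""
    for dirpath, _dirnames, filenames in os.walk(export_root):
        os.chmod(dirpath, stat.S_IRWXU)  # 0o700
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            os.chmod(file_path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600


# Example:
# restrict_export_permissions("/var/fess/export")
```

Run it as the user that owns the export directory, which should normally be the Fess process user.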

Troubleshooting

Export Job Does Not Run

Verify the job is enabled in System > Scheduler
Check the cron expression syntax
Review Fess logs for error messages:

tail -f /var/log/fess/fess.log | grep IndexExport

Empty Export Directory

Confirm documents exist in the index
Check the export path permissions
Verify the query filter (if custom) matches documents

# Check index document count (replace YYYYMMDD with the date suffix of your index)
curl -X GET "localhost:9201/fess.YYYYMMDD/_count?pretty"

Export Fails Midway

Check available disk space
Review logs for memory or timeout errors
Consider reducing index.export.scroll.size for large documents
Check OpenSearch scroll context timeout settings
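
The disk space check can be scripted so it runs before each export. This is a minimal sketch: `has_free_space` is a hypothetical helper, and the required size is an estimate you supply yourself.

```python
import shutil


def has_free_space(export_path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding export_path has enough free space."""
    return shutil.disk_usage(export_path).free >= required_bytes


# Report free space for the current directory in GiB.
free_gib = shutil.disk_usage(".").free / 2**30
print(f"free: {free_gib:.1f} GiB")
```

For example, `has_free_space("/var/fess/export", 5 * 2**30)` checks for at least 5 GiB before an export starts.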

Files Not Accessible

Verify file permissions: ls -la /var/fess/export
Check that the directory ownership matches the Fess process user
Confirm SELinux or AppArmor policies allow access