Overview
The Elasticsearch/OpenSearch Connector provides functionality to retrieve data from Elasticsearch or OpenSearch clusters and register it in the Fess index.
This feature requires the fess-ds-elasticsearch plugin.
Supported Versions
Elasticsearch 7.x / 8.x
OpenSearch 1.x / 2.x
Prerequisites
Plugin installation is required
Read access to the Elasticsearch/OpenSearch cluster is required
Query execution permissions are required
Plugin Installation
Method 1: Place JAR file directly
Method 2: Install from admin console
Open “System” -> “Plugins”
Upload the JAR file
Restart Fess
Configuration
Configure from the admin console via “Crawler” -> “Data Store” -> “Create New”.
Basic Settings
| Item | Example |
|---|---|
| Name | External Elasticsearch |
| Handler Name | ElasticsearchDataStore / ElasticsearchListDataStore |
| Enabled | On |
Note
ElasticsearchListDataStore is an extension of ElasticsearchDataStore that processes retrieved data as a file list and supports multi-threaded index registration. The number of threads can be specified with the numOfThreads parameter (default: 1).
Parameter Settings
Basic connection:
Authenticated connection:
Parameter List
Additional Connection Parameters
Parameters with the settings. prefix are passed as configuration to the internal Elasticsearch/OpenSearch client (fesen HTTP client). The main additional settings are as follows.
| Parameter | Description |
|---|---|
| settings.http.ssl.certificate_authorities | Path to the CA certificate file (X.509 format) to trust for HTTPS connections |
| settings.http.compression | Whether to enable HTTP compression (default: true) |
| settings.http.proxy_host | Proxy server hostname (settings.https.proxy_host also works) |
| settings.http.proxy_port | Proxy server port number (settings.https.proxy_port also works) |
| settings.http.proxy_username | Proxy authentication username (settings.https.proxy_username also works) |
| settings.http.proxy_password | Proxy authentication password (settings.https.proxy_password also works) |
Script Settings
Basic mapping:
Accessing nested fields:
Available Fields
source.<field_name> - Elasticsearch document _source field
id - Document ID
index - Index name
score - Search score
version - Document version
seqNo - Sequence number
primaryTerm - Primary term
clusterAlias - Cluster alias (for cross-cluster search)
hit - SearchHit object (advanced usage)
Query Configuration
Retrieve All Documents
By default, all documents are retrieved. If the query parameter is not specified, match_all is used.
Filtering with Specific Conditions
Range query:
Multiple conditions:
Note
The query parameter accepts only the query body. The outer {"query":...} wrapper is not needed. Search-level options such as sort cannot be specified in this parameter.
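As a rough illustration of the note above, the following Python sketch takes either a bare query body or a full search request and returns only the part that belongs in the query parameter. The helper name to_query_parameter and the field name timestamp are hypothetical, and emitting the value as compact single-line JSON is an assumption made for convenience, not a documented requirement.

```python
import json


def to_query_parameter(search_request: str):
    """Return (query_value, dropped_keys) for a search request given as JSON text.

    If the JSON has an outer "query" wrapper, only the inner query body is kept
    and every other top-level key (sort, aggs, size, ...) is reported as dropped.
    A bare query body is returned unchanged, apart from compact formatting.
    """
    body = json.loads(search_request)
    if not isinstance(body, dict):
        raise ValueError("a query must be a JSON object")

    dropped = []
    if "query" in body:
        # Full search request: search-level options are not accepted by the parameter.
        dropped = sorted(key for key in body if key != "query")
        body = body["query"]

    # Compact single-line JSON (an assumption, chosen for easy pasting).
    return json.dumps(body, separators=(",", ":")), dropped
```

For example, passing `{"query": {"range": {"timestamp": {"gte": "now-30d"}}}, "sort": ["timestamp"]}` yields the range query alone and reports sort as dropped, which mirrors the restriction described in the note.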
Retrieving Specific Fields Only
Limiting fields with the fields parameter
To retrieve all fields, do not specify fields or leave it empty.
Usage Examples
Basic Index Crawl
Parameters:
Script:
Authenticated Cluster Crawl
Parameters:
Script:
Multiple Indices Crawl
Parameters:
Script:
OpenSearch Cluster Crawl
Parameters:
Script:
Crawl with Limited Fields
Parameters:
Script:
Load Balancing Across Multiple Hosts
Specifying multiple comma-separated hosts in settings.http.hosts distributes requests across those hosts.
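A minimal sketch of assembling the comma-separated value, assuming each host is written as a full URL (protocol, hostname, port) and that parameters are entered as key=value lines. The helper hosts_parameter and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def hosts_parameter(hosts):
    """Join host URLs into a single settings.http.hosts line.

    Each entry is checked for an http/https scheme and a hostname so that a
    value such as "es1.example.com:9200" (no protocol) is rejected early.
    """
    cleaned = []
    for host in hosts:
        host = host.strip()
        parts = urlsplit(host)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"not a valid host URL: {host!r}")
        cleaned.append(host)
    return "settings.http.hosts=" + ",".join(cleaned)
```

Called with two hosts, the function returns one line with both URLs separated by a comma and no surrounding whitespace.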
Parameters:
Script:
Troubleshooting
Connection Error
Symptom: Connection refused or No route to host
Check:
Verify host URL is correct (protocol, hostname, port)
Verify Elasticsearch/OpenSearch is running
Check firewall settings
For HTTPS, verify certificate is valid
Authentication Error
Symptom: 401 Unauthorized or 403 Forbidden
Check:
Verify username and password are correct
Verify user has appropriate permissions:
Read permission on index
Scroll API usage permission
If Elasticsearch Security (X-Pack) is enabled, verify proper configuration
Index Not Found
Symptom: index_not_found_exception
Check:
Verify index name is correct (including case)
Verify index exists:
Verify wildcard pattern is correct (e.g., logs-*)
Query Error
Symptom: parsing_exception or search_phase_execution_exception
Check:
Verify query JSON is correct
Verify query is compatible with Elasticsearch/OpenSearch version
Verify field names are correct
Test query directly on Elasticsearch/OpenSearch:
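One way to run such a direct test is to send the query body to the cluster's _search endpoint, wrapped in the outer {"query": ...} object that the data store parameter itself omits. The sketch below uses only the Python standard library. The host, index name, and helper names are assumptions, and authentication and TLS options are deliberately left out.

```python
import json
import urllib.request


def build_search_request(host, index, query_body, size=1):
    """Build a POST <host>/<index>/_search request for the given query body."""
    payload = json.dumps({"query": query_body, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/{index}/_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response.

    An HTTP error status (for example 400 for a malformed query) raises
    urllib.error.HTTPError, whose body carries the cluster's error details.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `send(build_search_request("http://localhost:9200", "myindex", {"match_all": {}}))` returns the search response. If the cluster rejects the query here, the same query body will also fail when used in the data store.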
Scroll Timeout
Symptom: No search context found or Scroll timeout
Solution:
Increase scroll
Decrease size
Check cluster resources
Large Data Crawl
Symptom: Crawl is slow or times out
Solution:
Adjust size (too large can slow down)
Limit fields with fields
Filter documents with query
Split into multiple data stores (by index, time range, etc.)
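Splitting by time range can be sketched as generating one range query per calendar month, each of which would go into the query parameter of its own data store. The field name timestamp, the monthly window, and the helper name are assumptions for illustration. The index needs a date field for this approach to apply.

```python
import json
from datetime import date


def monthly_range_queries(field, start, end):
    """Return one compact range query body per calendar month in [start, end)."""
    queries = []
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month (rolls over to January after December).
        following = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
        window = {
            "gte": max(current, start).isoformat(),  # inclusive lower bound
            "lt": min(following, end).isoformat(),   # exclusive upper bound
        }
        queries.append(json.dumps({"range": {field: window}}, separators=(",", ":")))
        current = following
    return queries
```

Because each window uses an inclusive lower bound and an exclusive upper bound, adjacent data stores do not crawl the same document twice.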
Out of Memory
Symptom: OutOfMemoryError
Solution:
Decrease size
Limit fields with fields
Increase Fess heap size
Exclude large fields (binary data, etc.)
SSL/TLS Connection
Self-Signed Certificate
Warning
Use properly signed certificates in production environments.
Method 1: Specify the CA certificate with the settings.http.ssl.certificate_authorities parameter (recommended)
Specify the path to the CA certificate file (X.509 format) to trust. This method does not affect the Fess-wide keystore.
Method 2: Add certificate to Java keystore
Add the certificate to the trust store of the JVM that starts Fess.
Connecting via Proxy
To connect through a proxy server, specify settings.http.proxy_host and settings.http.proxy_port.
Advanced Query Examples
Query with Aggregation
Note
The query parameter accepts only the query body. Aggregations (aggs), sort, and other search-level options cannot be specified. Only documents are retrieved.
Script Fields
Note
Elasticsearch/OpenSearch script fields are not included in _source, so they cannot be accessed via the source.* prefix. To use script fields, access them via the hit object using hit.getFields().
Reference
Data Store Connector Overview
Database Connector
Data Store Crawling - Data Store Configuration Guide