Overview
The Elasticsearch/OpenSearch Connector retrieves data from Elasticsearch or OpenSearch clusters and registers it in the Fess index.
This feature requires the fess-ds-elasticsearch plugin.
Supported Versions
Elasticsearch 7.x / 8.x
OpenSearch 1.x / 2.x
Prerequisites
Plugin installation is required
Read access to the Elasticsearch/OpenSearch cluster is required
Query execution permissions are required
Plugin Installation
Method 1: Place JAR file directly
# Download from Maven Central
wget https://repo1.maven.org/maven2/org/codelibs/fess/fess-ds-elasticsearch/X.X.X/fess-ds-elasticsearch-X.X.X.jar
# Place the file
cp fess-ds-elasticsearch-X.X.X.jar $FESS_HOME/app/WEB-INF/lib/
# or
cp fess-ds-elasticsearch-X.X.X.jar /usr/share/fess/app/WEB-INF/lib/
Method 2: Install from admin console
Open “System” -> “Plugins”
Upload the JAR file
Restart Fess
Configuration
Configure the connector from the admin console via “Crawler” -> “Data Store” -> “Create New”.
Basic Settings
| Item | Example |
|---|---|
| Name | External Elasticsearch |
| Handler Name | ElasticsearchDataStore |
| Enabled | On |
Parameter Settings
Basic connection:
hosts=http://localhost:9200
index=myindex
scroll_size=100
scroll_timeout=5m
Authenticated connection:
hosts=https://elasticsearch.example.com:9200
index=myindex
username=elastic
password=changeme
scroll_size=100
scroll_timeout=5m
Multiple hosts:
hosts=http://es-node1:9200,http://es-node2:9200,http://es-node3:9200
index=myindex
scroll_size=100
scroll_timeout=5m
Parameter List
| Parameter | Required | Description |
|---|---|---|
| hosts | Yes | Elasticsearch/OpenSearch hosts (comma-separated for multiple) |
| index | Yes | Target index name |
| username | No | Authentication username |
| password | No | Authentication password |
| scroll_size | No | Number of documents per scroll (default: 100) |
| scroll_timeout | No | Scroll timeout (default: 5m) |
| query | No | Query JSON (default: match_all) |
| fields | No | Fields to retrieve (comma-separated) |
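When many data stores are configured, it can help to generate the parameter block rather than type it by hand. The following is a minimal sketch, not part of Fess: the helper name `build_parameters` is hypothetical, and it only checks the two parameters marked as required in the table above before rendering `key=value` lines.

```python
REQUIRED = ("hosts", "index")  # marked "Yes" in the parameter list


def build_parameters(params):
    """Render a dict as key=value lines for the Parameter field."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return "\n".join(f"{key}={value}" for key, value in params.items())


print(build_parameters({
    "hosts": "http://localhost:9200",
    "index": "myindex",
    "scroll_size": 100,
    "scroll_timeout": "5m",
}))
```

The printed text can be pasted into the Parameter field as is.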
Script Settings
Basic mapping:
url=data.url
title=data.title
content=data.content
last_modified=data.timestamp
Accessing nested fields:
url=data.metadata.url
title=data.title
content=data.body.content
author=data.author.name
created=data.created_at
last_modified=data.updated_at
Available Fields
data.<field_name> - Elasticsearch document field
data._id - Document ID
data._index - Index name
data._type - Document type (Elasticsearch < 7)
data._score - Search score
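The dotted expressions used in the script read values out of the nested source document. As a rough illustration only (Fess evaluates the script itself, and this is not its implementation), the lookup behaves like walking a nested dictionary. The `resolve` helper and the sample document are hypothetical.

```python
def resolve(document, path):
    """Follow a dotted path such as 'metadata.url' through nested dicts."""
    value = document
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # field is absent, so there is nothing to map
        value = value[key]
    return value


document = {
    "title": "Release notes",
    "metadata": {"url": "https://example.com/notes"},
    "author": {"name": "Alice"},
}

print(resolve(document, "metadata.url"))   # nested field that exists
print(resolve(document, "author.email"))   # nested field that is missing
```

Checking a sample document this way before a crawl can reveal mappings that point at fields the index does not contain.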
Query Configuration
Retrieve All Documents
By default, all documents are retrieved. If the query parameter is not specified, match_all is used.
Filtering with Specific Conditions
query={"query":{"term":{"status":"published"}}}
Range query:
query={"query":{"range":{"timestamp":{"gte":"2024-01-01","lte":"2024-12-31"}}}}
Multiple conditions:
query={"query":{"bool":{"must":[{"term":{"category":"news"}},{"range":{"views":{"gte":100}}}]}}}
Sorting:
query={"query":{"match_all":{}},"sort":[{"timestamp":{"order":"desc"}}]}
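Because the query value is written on a single line, hand-edited JSON is easy to break. One option, sketched below under the assumption that the query is prepared outside Fess, is to build the body as a Python dictionary and serialize it compactly. The helper name `query_parameter` is hypothetical.

```python
import json


def query_parameter(body):
    """Serialize a search body as a compact, single-line query= parameter."""
    return "query=" + json.dumps(body, separators=(",", ":"))


body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"category": "news"}},
                {"range": {"views": {"gte": 100}}},
            ]
        }
    }
}

print(query_parameter(body))
```

This prints the same line as the multiple-conditions example above, with balanced braces guaranteed by the serializer.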
Retrieving Specific Fields Only
Limiting fields with fields parameter
hosts=http://localhost:9200
index=myindex
fields=title,content,url,timestamp
scroll_size=100
To retrieve all fields, do not specify fields or leave it empty.
Usage Examples
Basic Index Crawl
Parameters:
hosts=http://localhost:9200
index=articles
scroll_size=100
scroll_timeout=5m
Script:
url=data.url
title=data.title
content=data.content
created=data.created_at
last_modified=data.updated_at
Authenticated Cluster Crawl
Parameters:
hosts=https://es.example.com:9200
index=products
username=elastic
password=changeme
scroll_size=200
scroll_timeout=10m
Script:
url="https://shop.example.com/product/" + data.product_id
title=data.name
content=data.description + " " + data.specifications
digest=data.category
last_modified=data.updated_at
Multiple Indices Crawl
Parameters:
hosts=http://localhost:9200
index=logs-2024-*
query={"query":{"term":{"level":"error"}}}
scroll_size=100
Script:
url="https://logs.example.com/view/" + data._id
title=data.message
content=data.stack_trace
digest=data.service + " - " + data.level
last_modified=data.timestamp
OpenSearch Cluster Crawl
Parameters:
hosts=https://opensearch.example.com:9200
index=documents
username=admin
password=admin
scroll_size=100
scroll_timeout=5m
Script:
url=data.url
title=data.title
content=data.body
last_modified=data.modified_date
Crawl with Limited Fields
Parameters:
hosts=http://localhost:9200
index=myindex
fields=id,title,content,url,timestamp
scroll_size=100
Script:
url=data.url
title=data.title
content=data.content
last_modified=data.timestamp
Load Balancing with Multiple Hosts
Parameters:
hosts=http://es1.example.com:9200,http://es2.example.com:9200,http://es3.example.com:9200
index=articles
scroll_size=100
scroll_timeout=5m
Script:
url=data.url
title=data.title
content=data.content
last_modified=data.timestamp
Troubleshooting
Connection Error
Symptom: Connection refused or No route to host
Check:
Verify host URL is correct (protocol, hostname, port)
Verify Elasticsearch/OpenSearch is running
Check firewall settings
For HTTPS, verify certificate is valid
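The first check in the list above (protocol, hostname, port) can be automated before the value is saved. The sketch below is an assumption-laden helper, not a Fess feature: `check_hosts` is a hypothetical name, and it only inspects the syntax of each comma-separated URL without contacting the cluster.

```python
from urllib.parse import urlsplit


def check_hosts(hosts):
    """Return a list of problems found in a comma-separated hosts value."""
    problems = []
    for raw in hosts.split(","):
        host = raw.strip()
        try:
            parts = urlsplit(host)
            if parts.scheme not in ("http", "https"):
                problems.append(f"{host}: protocol should be http or https")
            elif not parts.hostname:
                problems.append(f"{host}: hostname is missing")
            else:
                parts.port  # raises ValueError for a non-numeric or out-of-range port
        except ValueError as error:
            problems.append(f"{host}: {error}")
    return problems


print(check_hosts("http://es-node1:9200,http://es-node2:9200"))
print(check_hosts("es-node1:9200"))
```

An empty list means every entry is syntactically valid; reachability still has to be confirmed separately.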
Authentication Error
Symptom: 401 Unauthorized or 403 Forbidden
Check:
Verify username and password are correct
Verify user has appropriate permissions:
Read permission on index
Scroll API usage permission
If Elasticsearch Security (X-Pack) is enabled, verify proper configuration
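To check credentials independently of Fess, they can be sent with HTTP Basic authentication, which both Elasticsearch security and the OpenSearch security plugin accept. The sketch below only builds the header; `basic_auth_header` is a hypothetical helper, and whether the connector itself uses this scheme is an assumption not confirmed here.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header for a manual test request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


headers = basic_auth_header("elastic", "changeme")
print(sorted(headers))  # header names only, to avoid printing the secret
```

A 401 response to a request carrying this header points at the credentials, while a 403 points at missing permissions for the user.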
Index Not Found
Symptom: index_not_found_exception
Check:
Verify index name is correct (including case)
Verify index exists:
GET /_cat/indices
Verify wildcard pattern is correct (e.g., logs-*)
Query Error
Symptom: parsing_exception or search_phase_execution_exception
Check:
Verify query JSON is correct
Verify query is compatible with Elasticsearch/OpenSearch version
Verify field names are correct
Test query directly on Elasticsearch/OpenSearch:
POST /myindex/_search
{ "query": {...} }
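The same test request can be prepared from a script. The following sketch uses only the Python standard library and builds the request without sending it. The helper name `build_search_request`, the host and the index are placeholders.

```python
import json
from urllib.request import Request


def build_search_request(host, index, body):
    """Build (but do not send) a POST /<index>/_search request."""
    return Request(
        url=f"{host.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:9200",
    "myindex",
    {"query": {"term": {"status": "published"}}},
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen() and read the JSON reply.
```

If the cluster rejects the body here, it will also reject it when Fess runs the crawl, so the error message from this request is usually the quickest way to locate the problem.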
Scroll Timeout
Symptom: No search context found or Scroll timeout
Solution:
Increase scroll_timeout:
scroll_timeout=10m
Decrease scroll_size:
scroll_size=50
Check cluster resources
Large Data Crawl
Symptom: Crawl is slow or times out
Solution:
Adjust scroll_size (too large can slow down):
scroll_size=100 # Default
scroll_size=500 # Larger
Limit fields with fields
Filter documents with query
Split into multiple data stores (by index, time range, etc.)
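Since scroll_size is the number of documents returned per scroll request, the number of round trips for a crawl is roughly the document count divided by scroll_size, rounded up. The sketch below makes that arithmetic explicit. The function name and the one-million-document figure are illustrative assumptions, not measurements.

```python
import math


def scroll_requests(total_docs, scroll_size):
    """Estimate how many scroll requests are needed to read total_docs."""
    if scroll_size <= 0:
        raise ValueError("scroll_size must be positive")
    return math.ceil(total_docs / scroll_size)


for size in (50, 100, 500):
    print(size, scroll_requests(1_000_000, size))
```

Fewer, larger batches reduce round trips but make each response heavier, so the best value depends on document size and available memory.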
Out of Memory
Symptom: OutOfMemoryError
Solution:
Decrease scroll_size
Limit fields with fields
Increase Fess heap size
Exclude large fields (binary data, etc.)
SSL/TLS Connection
Self-Signed Certificate
Warning
Use properly signed certificates in production environments.
For self-signed certificates, add the certificate to the Java keystore:
keytool -import -alias es-cert -file es-cert.crt -keystore $JAVA_HOME/lib/security/cacerts
Client Certificate Authentication
For client certificate authentication, additional parameter configuration is required. Refer to Elasticsearch client documentation for details.
Advanced Query Examples
Query with Aggregation
Note
Aggregation results are not retrieved; only the matching documents are.
query={"query":{"match_all":{}},"aggs":{"categories":{"terms":{"field":"category"}}}}
Script Fields
query={"query":{"match_all":{}},"script_fields":{"full_url":{"script":"doc['protocol'].value + '://' + doc['host'].value + doc['path'].value"}}}
Script:
url=data.full_url
title=data.title
content=data.content
Reference
Data Store Connector Overview
Database Connector
Data Store Crawling - Data Store Configuration Guide