Overview
The CSV Connector reads data from CSV files and registers it in the Fess index.
This feature requires the fess-ds-csv plugin.
Prerequisites
Plugin installation is required
Access to the CSV file is required
You must know the character encoding of the CSV file
Plugin Installation
Method 1: Place JAR file directly
Method 2: Install from admin console
Open “System” -> “Plugins”
Upload the JAR file
Restart Fess
Configuration
Configure from the admin console via “Crawler” -> “Data Store” -> “Create New”.
Basic Settings
| Item | Example |
|---|---|
| Name | Products CSV |
| Handler Name | CsvDataStore |
| Enabled | On |
Parameter Settings
Local file:
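A minimal sketch of the parameter field for a single local file. The path is a placeholder, and has_header_line=true assumes the file starts with a header row; the other lines only restate the documented defaults.

```properties
files=/path/to/data.csv
file_encoding=UTF-8
has_header_line=true
separator_character=,
```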
Multiple files:
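A sketch for several files, using the comma-separated form described in the parameter list. Both paths are placeholders, and the header setting is an assumption about the files.

```properties
files=/path/to/data1.csv,/path/to/data2.csv
file_encoding=UTF-8
has_header_line=true
```

If the files live in one folder, the directories parameter could be used in place of files; only .csv and .tsv files in that folder would be picked up.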
Note
Quote processing and escape processing are disabled by default. If you need to handle CSV files where fields enclosed in quotes contain delimiters or line breaks (RFC 4180 compliant), explicitly set quote_disabled=false to enable quote processing. See “Enabling Quote and Escape Processing” below for details.
Parameter List
| Parameter | Required | Description |
|---|---|---|
| files | No | CSV file path (local path; multiple paths can be specified separated by commas). Either files or directories must be specified. If both are specified, files takes precedence. Files must have a .csv or .tsv extension; files with any other extension are skipped. |
| directories | No | Path to a directory containing CSV files (multiple paths can be specified separated by commas). Only .csv and .tsv files within the directory are processed. Used when files is not specified. |
| file_encoding | No | Character encoding (default: UTF-8) |
| has_header_line | No | Whether a header row exists (default: false) |
| separator_character | No | Separator character (default: comma ,). Escape sequences such as \t can be specified (for tab-separated files). |
| quote_character | No | Quote character (default: double quote "). Note that quote processing is disabled by default (see quote_disabled). |
| escape_character | No | Escape character (default: backslash \). Note that escape processing is disabled by default (see escape_disabled). |
Note
If both files and directories are empty, an error (DataStoreException) is raised. At least one of them must be specified.
Advanced Parameters
The following parameters provide fine-grained control over CSV parsing behaviour:
| Parameter | Description |
|---|---|
| quote_disabled | Whether to disable quote processing (default: true). Set to false to handle RFC 4180 quoted fields. |
| escape_disabled | Whether to disable escape processing (default: true). Set to false to enable escaping via escape_character. |
| skip_lines | Number of leading lines to skip (default: 0) |
| ignore_line_patterns | Regular expression pattern for lines to ignore (e.g., ^#.* to ignore comment lines) |
| ignore_empty_lines | Whether to ignore empty lines (default: false) |
| ignore_trailing_whitespaces | Whether to ignore trailing whitespace (default: false) |
| ignore_leading_whitespaces | Whether to ignore leading whitespace (default: false) |
| null_string | String value to treat as null |
| break_string | String used to replace line breaks within field values |
| readInterval | Wait time in milliseconds between processing each record (default: 0) |
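As an illustration, the following sketch combines several of these parameters for a file that is assumed to contain # comment lines, blank lines, and padded values. The path is a placeholder, and the combination is only one plausible setup.

```properties
files=/path/to/data.csv
has_header_line=true
ignore_line_patterns=^#.*
ignore_empty_lines=true
ignore_leading_whitespaces=true
ignore_trailing_whitespaces=true
```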
Script Settings
Field values are assembled by referencing the values of each CSV column. CSV columns are referenced directly in scripts as variables without any prefix (there is no data. prefix).
With header row (reference by column name):
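A sketch for a CSV whose header row is assumed to contain columns named id, name, and description (hypothetical names). The left-hand sides assume the usual Fess index fields url, title, and content, and the URL prefix is a placeholder.

```properties
url="https://example.com/item/" + id
title=name
content=description
```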
Without header row (reference by column index):
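A sketch for a file with no header row, assuming the first three columns hold an identifier, a title, and body text (a hypothetical layout). The index fields url, title, and content are assumed to be the usual Fess fields, and the URL prefix is a placeholder.

```properties
url="https://example.com/item/" + cell1
title=cell2
content=cell3
```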
Available Fields
<column_name> - Reference by header row column name (only when has_header_line=true and the column name is not blank)
cell<N> - Reference by column index (1-based: cell1, cell2, …; available regardless of whether a header row is present)
csvfile - Full path of the CSV file being processed
csvfilename - File name of the CSV file being processed
Note
If a column name contains characters that are invalid as a Groovy identifier, such as spaces or hyphens, the column cannot be referenced by name. Use cell<N> instead.
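For example, suppose the header of the second column is "Product Name" (a hypothetical header containing a space). It cannot be written as a variable, so a script along these lines would reference it by position instead; title is assumed to be the target index field.

```properties
title=cell2
```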
CSV Format Details
Standard CSV (RFC 4180 compliant)
Note
To include a delimiter inside a field by enclosing it in quotes, as in "Book, Programming" above, you must set quote_disabled=false to enable quote processing. When quote processing is disabled (the default), quotes are treated as ordinary characters and fields are split on the delimiter character.
Enabling Quote and Escape Processing
Quote processing and escape processing are disabled by default. Enable them explicitly as follows.
To enable quote processing:
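A minimal sketch; the quote_character line only restates the documented default and could be omitted.

```properties
quote_disabled=false
quote_character="
```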
To enable escape processing:
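A minimal sketch; the escape_character line only restates the documented default (backslash) and could be omitted.

```properties
escape_disabled=false
escape_character=\
```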
Changing Separator
Tab-separated (TSV):
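A sketch using the \t escape sequence mentioned in the parameter list. The file path is a placeholder.

```properties
files=/path/to/data.tsv
separator_character=\t
```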
Semicolon-separated:
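A sketch for a semicolon-delimited file. The path is a placeholder; note that the file still needs a .csv or .tsv extension to be processed.

```properties
files=/path/to/data.csv
separator_character=;
```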
Custom Quote Character
Single quote (quote processing must be enabled):
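A sketch assuming fields are wrapped in single quotes. Both lines are needed, since changing quote_character alone has no effect while quote processing is disabled.

```properties
quote_character='
quote_disabled=false
```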
Encoding
Non-ASCII file (Shift_JIS):
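A sketch assuming the file was saved as Shift_JIS. The path is a placeholder.

```properties
files=/path/to/data.csv
file_encoding=Shift_JIS
```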
Non-ASCII file (EUC-JP):
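A sketch assuming the file was saved as EUC-JP. The path is a placeholder.

```properties
files=/path/to/data.csv
file_encoding=EUC-JP
```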
Usage Examples
Product Catalog CSV
CSV file (products.csv):
Parameters:
Script:
Filtering by stock status:
Employee Directory CSV
CSV file (employees.csv):
Parameters:
Script:
CSV Without Header
CSV file (data.csv):
Parameters:
Script:
Multiple CSV Files Integration
Parameters:
Script:
Tab-Separated (TSV) File
TSV file (data.tsv):
Parameters:
Script:
Troubleshooting
File Not Found
Symptom: The crawl runs but no files are processed, and a message containing "is not found" appears in the log
Check:
Verify the file path is correct (absolute path recommended)
Verify the file exists
Verify the file extension is .csv or .tsv (files with other extensions are skipped)
Verify the file has read permissions
Verify the file is accessible by the Fess process user
Character Encoding Issues
Symptom: Non-ASCII characters are not displayed correctly
Solution:
Specify the correct character encoding:
Check file encoding:
Columns Not Recognized Correctly
Symptom: Column separation is not recognized correctly, or a quoted field is split
Check:
Verify the separator is correct:
To handle quoted fields (fields that contain the delimiter character), enable quote processing:
Verify the CSV file format (RFC 4180 compliant)
Header Row Handling
Symptom: The first row is recognized as data
Solution:
When a header row is present:
When no header row is present:
No Data Retrieved
Symptom: Crawl succeeds but the document count is 0
Check:
Verify the CSV file is not empty
Verify the script settings are correct (column names and cell<N> references must be used without a data. prefix)
Verify the column names are correct (when has_header_line=true)
Check the log for error messages
Large CSV Files
Symptom: Out of memory or timeout
Solution:
Split the CSV file into multiple smaller files
Use only the necessary columns in the script
Increase the Fess heap size
Filter out unnecessary rows
Fields with Line Breaks
In RFC 4180 format, fields containing line breaks can be handled by enclosing them in quotes. Since quote processing is disabled by default, quote_disabled=false must be specified:
Parameters:
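A minimal sketch, assuming a file with a header row whose multi-line fields are enclosed in the default double quotes. The path is a placeholder.

```properties
files=/path/to/data.csv
has_header_line=true
quote_disabled=false
```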
CsvListDataStore
The fess-ds-csv plugin also includes the CsvListDataStore handler in addition to CsvDataStore.
CsvListDataStore extends CsvDataStore and provides the following additional features:
Multi-threaded processing (controlled by the numOfThreads parameter)
Automatic deletion of processed CSV files
Timestamp-based file filtering (skips files that may still be written to)
All parameters and script settings of CsvDataStore are available as-is.
Basic Settings
| Item | Example |
|---|---|
| Handler Name | CsvListDataStore |
Additional Parameters
| Parameter | Required | Description |
|---|---|---|
| timestamp_margin | No | Minimum time in milliseconds that must have elapsed since the file's last modification. Files modified more recently than this are assumed to still be in the process of being written and are skipped (default: 10000). |
| numOfThreads | No | Number of processing threads (default: 1) |
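As an illustration, a CsvListDataStore parameter field might look like the sketch below. The directory is a placeholder, and the 30-second margin and two threads are arbitrary example values, not recommendations.

```properties
directories=/path/to/incoming
file_encoding=UTF-8
has_header_line=true
timestamp_margin=30000
numOfThreads=2
```

Because processed files are deleted, it is safer to point this handler at a copy of the data rather than at the only copy.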
Note
CsvListDataStore automatically deletes CSV files after processing is complete. If an error occurs during processing, the file is renamed to .txt (if renaming fails, the file is deleted).
Advanced Script Examples
Data Processing
Conditional Indexing
Combining Multiple Columns
Date Formatting
Reference
Data Store Connector Overview
JSON Connector
Database Connector
Data Store Crawling - Data Store Configuration Guide