Overview
AI mode uses Retrieval-Augmented Generation (RAG) to combine Fess search results with a Large Language Model (LLM), so that users can find information through conversation. Users ask questions in natural language and receive detailed answers grounded in the search results.
How AI Mode Works
AI mode operates through the following multi-stage flow.
Intent Analysis Phase: Analyzes the user’s question and extracts optimal keywords for search
Search Phase: Uses Fess search engine to find documents with the extracted keywords
Evaluation Phase: Evaluates relevance of search results and selects the most appropriate documents
Generation Phase: LLM generates a response based on the selected documents
Output Phase: Returns the response and source information to the user
Because the LLM answers from documents selected for the specific question, this flow produces responses that reflect the question's context better than a plain keyword search does.
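The five phases can be pictured as a chain of small functions. The following is a toy sketch only: the corpus, the stopword list, the keyword-overlap scoring, and every function name are hypothetical stand-ins, and in Fess the analysis, evaluation, and generation steps are handled by the configured LLM rather than by string matching.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    url: str
    content: str


# Tiny in-memory corpus standing in for the Fess index (hypothetical data).
CORPUS = [
    Document("Installation Guide", "https://example.com/install",
             "How to install Fess on Linux"),
    Document("Crawler Guide", "https://example.com/crawl",
             "How to configure the web crawler"),
]

STOPWORDS = {"how", "do", "i", "the", "a", "to", "on", "what", "is"}


def analyze_intent(question):
    """Intent analysis: reduce the question to search keywords."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]


def search(keywords):
    """Search: return documents containing at least one keyword."""
    return [d for d in CORPUS if any(k in d.content.lower() for k in keywords)]


def evaluate(keywords, docs, max_relevant=3):
    """Evaluation: rank by keyword overlap and keep the top documents."""
    def score(doc):
        return sum(k in doc.content.lower() for k in keywords)
    return sorted(docs, key=score, reverse=True)[:max_relevant]


def generate(question, docs):
    """Generation: placeholder for the LLM call; cites sources as [1], [2], ..."""
    cites = " ".join(f"[{i}]" for i in range(1, len(docs) + 1))
    return f"Answer to '{question}' based on {len(docs)} document(s) {cites}"


def answer(question):
    """Output: return the response text together with its sources."""
    keywords = analyze_intent(question)
    selected = evaluate(keywords, search(keywords))
    return {
        "content": generate(question, selected),
        "sources": [{"title": d.title, "url": d.url} for d in selected],
    }


print(answer("How do I install Fess?"))
```

The point of the sketch is the shape of the data flow: each phase narrows or transforms the output of the previous one, and the sources travel alongside the generated text to the final response.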
Basic Configuration
Basic settings for enabling AI mode functionality.
app/WEB-INF/conf/system.properties:
# Enable AI mode functionality
rag.chat.enabled=true
# Select LLM provider (ollama, openai, gemini)
rag.llm.type=ollama
For detailed LLM provider configuration, refer to:
Ollama Configuration
OpenAI Configuration
Google Gemini Configuration
Generation Parameters
Parameters that control LLM generation behavior.
| Property | Description | Default |
|---|---|---|
| rag.chat.max.tokens | Maximum number of tokens to generate | 4096 |
| rag.chat.temperature | Generation randomness (0.0-1.0) | 0.7 |
Temperature Setting
0.0: Deterministic responses (always the same response for the same input)
0.3-0.5: Consistent responses (appropriate for fact-based questions)
0.7: Balanced responses (default)
1.0: Creative responses (appropriate for brainstorming, etc.)
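To see why lower values give more consistent output, it helps to look at how temperature is commonly applied: token scores are divided by the temperature before being turned into probabilities. The sketch below illustrates that general technique with made-up scores; whether each provider behind Fess applies temperature in exactly this way is an assumption.

```python
import math


def token_probabilities(logits, temperature):
    """Convert raw token scores to probabilities, scaled by temperature."""
    if temperature == 0.0:
        # Limit case: all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.3, 0.7, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

As the temperature rises, probability shifts from the top-scoring token toward the alternatives, which is what makes responses more varied.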
Context Settings
Settings for the context passed from search results to the LLM.
| Property | Description | Default |
|---|---|---|
| rag.chat.context.max.documents | Maximum number of documents to include in context | 5 |
| rag.chat.context.max.chars | Maximum number of characters in context | 4000 |
| rag.chat.content.fields | Fields to retrieve from documents | title,url,content,... |
| rag.chat.evaluation.max.relevant.docs | Maximum number of relevant documents to select in evaluation phase | 3 |
Content Fields
Fields that can be specified in rag.chat.content.fields:
title - Document title
url - Document URL
content - Document body
doc_id - Document ID
content_title - Content title
content_description - Content description
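The two context limits interact: the document limit caps how many results are considered, and the character limit caps how much text reaches the LLM. A minimal sketch of one way such limits could be applied follows. The function name, the entry layout, and the choice to truncate the last entry are assumptions for illustration, not Fess's actual implementation.

```python
def build_context(documents, max_documents=5, max_chars=4000):
    """Concatenate document text until either limit is reached."""
    parts = []
    used = 0
    for index, doc in enumerate(documents[:max_documents], start=1):
        remaining = max_chars - used
        if remaining <= 0:
            break
        entry = f"[{index}] {doc['title']} ({doc['url']})\n{doc['content']}\n"
        entry = entry[:remaining]  # truncate the last entry to fit the budget
        parts.append(entry)
        used += len(entry)
    return "".join(parts)


# Hypothetical search results, each with a 1000-character body.
results = [
    {"title": f"Doc {i}", "url": f"https://example.com/{i}", "content": "x" * 1000}
    for i in range(1, 8)
]
context = build_context(results)
print(len(context))
```

With long documents the character limit is reached before the document limit, so raising rag.chat.context.max.documents alone may not add more material to the context.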
System Prompt
The system prompt defines the basic behavior of the LLM.
Default Setting
rag.chat.system.prompt=You are an AI assistant for Fess search engine. Answer questions based on the search results provided. Always cite your sources using [1], [2], etc.
Customization Examples
For prioritizing Japanese responses:
rag.chat.system.prompt=You are an AI assistant for the Fess search engine. Please answer questions based on the provided search results. Respond in Japanese and always cite your sources using [1], [2], etc.
For specialized domains:
rag.chat.system.prompt=You are a technical documentation assistant. Provide detailed and accurate answers based on the search results. Include code examples when relevant. Always cite your sources using [1], [2], etc.
Session Management
Settings for chat session management.
| Property | Description | Default |
|---|---|---|
| rag.chat.session.timeout.minutes | Session timeout duration (in minutes) | 30 |
| rag.chat.session.max.size | Maximum number of concurrent sessions | 10000 |
| rag.chat.history.max.messages | Maximum number of messages to retain in conversation history | 20 |
Session Behavior
When a user starts a new chat, a new session is created
Conversation history is saved in the session, enabling context-aware dialogue
Sessions are automatically deleted after the timeout period
When conversation history exceeds the maximum message count, older messages are deleted first
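The behavior described above, oldest-first trimming plus expiry after a period of inactivity, can be modeled in a few lines. This is an illustrative sketch: the class and its fields are hypothetical, and it assumes the timeout is measured from the last message, which the description above does not state.

```python
import time
from collections import deque


class ChatSession:
    """Toy model of a chat session with bounded history and a timeout."""

    def __init__(self, max_messages=20, timeout_minutes=30, now=time.time):
        self._now = now  # injectable clock, so expiry can be tested
        self.timeout_seconds = timeout_minutes * 60
        # A deque with maxlen discards the oldest entry when full.
        self.history = deque(maxlen=max_messages)
        self.last_access = now()

    def add_message(self, role, text):
        self.history.append((role, text))
        self.last_access = self._now()

    def is_expired(self):
        return self._now() - self.last_access > self.timeout_seconds


session = ChatSession(max_messages=3)
for i in range(5):
    session.add_message("user", f"message {i}")
print(list(session.history))
```

Only the three most recent messages remain after five are added, which mirrors how older turns drop out of the context the LLM sees.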
Rate Limiting
Rate limiting settings to prevent API overload.
| Property | Description | Default |
|---|---|---|
| rag.chat.rate.limit.enabled | Enable rate limiting | true |
| rag.chat.rate.limit.requests.per.minute | Maximum requests per minute | 10 |
Rate Limiting Considerations
Account for the LLM provider’s own rate limits when choosing a value
In high-load environments, stricter limits are recommended
When rate limits are reached, users will see an error message
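For intuition about what a per-minute limit means in practice, here is a minimal sliding-window limiter. It is a sketch of the general technique under stated assumptions: the class name is hypothetical, and whether Fess counts requests per user, per session, or globally, and with which algorithm, is not specified above.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `requests_per_minute` calls in any 60-second window."""

    def __init__(self, requests_per_minute=10, now=time.monotonic):
        self.limit = requests_per_minute
        self._now = now  # injectable clock, so the window can be tested
        self._timestamps = deque()

    def allow(self):
        current = self._now()
        # Forget requests that have left the 60-second window.
        while self._timestamps and current - self._timestamps[0] >= 60:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # the caller would return an error to the user
        self._timestamps.append(current)
        return True


limiter = RateLimiter(requests_per_minute=10)
decisions = [limiter.allow() for _ in range(12)]
print(decisions.count(True), "allowed,", decisions.count(False), "rejected")
```

A rejected call corresponds to the error message the user sees once the limit is reached.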
API Usage
AI mode functionality is available through REST APIs.
Non-Streaming API
Endpoint: POST /api/v1/chat
Parameters:
| Parameter | Required | Description |
|---|---|---|
| message | Yes | User’s message |
| sessionId | No | Session ID (when continuing a conversation) |
| clear | No | Set to true to clear the session |
Request Example:
curl -X POST "http://localhost:8080/api/v1/chat" \
  -d "message=How do I install Fess?"
Response Example:
{
  "status": "ok",
  "sessionId": "abc123",
  "content": "To install Fess...",
  "sources": [
    {"title": "Installation Guide", "url": "https://..."}
  ]
}
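The same request can be issued from a script. The sketch below uses only the Python standard library and the parameters and response fields shown above. The helper names are hypothetical, and treating any status other than "ok" as a failure is an assumption, since the error format is not described here.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(base_url, message, session_id=None, clear=False):
    """Build a form-encoded POST request for the non-streaming endpoint."""
    params = {"message": message}
    if session_id:
        params["sessionId"] = session_id  # continue an existing conversation
    if clear:
        params["clear"] = "true"
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(f"{base_url}/api/v1/chat", data=body, method="POST")


def parse_chat_response(raw):
    """Extract the session ID, answer text, and sources from a JSON response."""
    payload = json.loads(raw)
    if payload.get("status") != "ok":  # assumed failure convention
        raise RuntimeError(f"chat request failed: {payload}")
    return payload["sessionId"], payload["content"], payload.get("sources", [])


def ask(base_url, message, session_id=None):
    """Send one question and return (session_id, content, sources)."""
    request = build_chat_request(base_url, message, session_id)
    with urllib.request.urlopen(request) as response:
        return parse_chat_response(response.read().decode("utf-8"))
```

To hold a conversation, pass the session ID returned by the first call into the next one, for example `ask("http://localhost:8080", "And on Windows?", session_id)`.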
Streaming API
Endpoint: POST /api/v1/chat/stream
Streams responses in Server-Sent Events (SSE) format.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| message | Yes | User’s message |
| sessionId | No | Session ID (when continuing a conversation) |
Request Example:
curl -X POST "http://localhost:8080/api/v1/chat/stream" \
  -d "message=What are the features of Fess?" \
  -H "Accept: text/event-stream"
SSE Events:
| Event | Description |
|---|---|
| session | Session information (sessionId) |
| phase | Processing phase start/completion (intent_analysis, search, evaluation, generation) |
| chunk | Generated text fragments |
| sources | Reference document information |
| done | Processing complete (sessionId, htmlContent) |
| error | Error information |
For detailed API documentation, see Chat API.
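A client consuming the streaming endpoint needs to split the response into events. The following sketch parses the standard SSE line format (`event:` and `data:` fields, with a blank line ending each event). The sample stream is hypothetical: the event names come from the table above, but the payload of each `data:` line is invented for illustration, so the parser returns it as an uninterpreted string.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events without data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # drop the single optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; real payloads may be structured differently.
sample = [
    "event: session\n", 'data: {"sessionId": "abc123"}\n', "\n",
    "event: phase\n", "data: search\n", "\n",
    ": keep-alive\n",
    "event: chunk\n", "data: To install Fess\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]
for name, payload in parse_sse(sample):
    print(name, payload)
```

A real client would dispatch on the event name: append `chunk` payloads to the visible answer, show `phase` as progress, render `sources` as links, and stop on `done` or `error`.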
Web Interface
In the Fess web interface, AI mode functionality is available from the search screen.
Starting a Chat
Access the Fess search screen
Click the chat icon
The chat panel will be displayed
Using the Chat
Enter your question in the text box
Click the send button or press Enter
The AI assistant’s response will be displayed
Responses include links to reference sources
Continuing a Conversation
You can continue conversations within the same chat session
Responses will consider the context of previous questions
Click “New Chat” to reset the session
Troubleshooting
AI Mode Won’t Enable
Check the following:
Is rag.chat.enabled=true configured?
Is the LLM provider correctly configured?
Can Fess connect to the LLM provider?
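The first two checks can be automated against the properties file. The sketch below is a simplified illustration: the parser handles only plain `key=value` lines and `#` comments, not the full Java properties syntax, and both function names are hypothetical. The accepted provider names are the ones listed in the Basic Configuration section.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_ai_mode(props):
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    if props.get("rag.chat.enabled", "").lower() != "true":
        problems.append("rag.chat.enabled is not set to true")
    if props.get("rag.llm.type") not in {"ollama", "openai", "gemini"}:
        problems.append("rag.llm.type is not one of ollama, openai, gemini")
    return problems


sample = """
# Enable AI mode functionality
rag.chat.enabled=true
# Select LLM provider (ollama, openai, gemini)
rag.llm.type=ollama
"""
print(check_ai_mode(load_properties(sample)))
```

In practice the text would be read from app/WEB-INF/conf/system.properties. Connectivity to the provider still has to be verified separately.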
Low Response Quality
Improvements:
Use a higher-performance LLM model
Increase rag.chat.context.max.documents
Customize the system prompt
Adjust rag.chat.temperature
Slow Responses
Improvements:
Use a faster LLM model (e.g., Gemini Flash)
Decrease rag.chat.max.tokens
Decrease rag.chat.context.max.chars
Session Not Maintained
Check the following:
Is sessionId being sent correctly from the client?
Check the rag.chat.session.timeout.minutes setting
Check session storage capacity
Debug Settings
When investigating issues, adjust log levels to output detailed logs.
app/WEB-INF/classes/log4j2.xml:
<Logger name="org.codelibs.fess.llm" level="DEBUG"/>
<Logger name="org.codelibs.fess.api.chat" level="DEBUG"/>
<Logger name="org.codelibs.fess.chat" level="DEBUG"/>
References
LLM Integration Overview
Ollama Configuration
OpenAI Configuration
Google Gemini Configuration
Chat API - Chat API Reference
AI Chat Search - End User Chat Search Guide