CrawlingInfo API

Overview

The CrawlingInfo API retrieves crawl information from Fess. It lets you view crawl session status, progress, and statistics.

Base URL

/api/admin/crawlinginfo

Endpoint List

Method   Path            Description
GET      /               List crawl information
GET      /{sessionId}    Get crawl session details
DELETE   /{sessionId}    Delete crawl session

List Crawl Information

Request

GET /api/admin/crawlinginfo

Parameters

Parameter   Type      Required   Description
size        Integer   No         Number of items per page (default: 20)
page        Integer   No         Page number (starts from 0)
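
When more sessions exist than fit on one page, the size and page parameters can be combined to walk through the full list. A minimal sketch using curl and jq, assuming the total field of the list response reports the overall number of sessions:

#!/bin/bash
# Fetch all crawl sessions page by page (page size of 20 used here)
SIZE=20
PAGE=0
while true; do
  RESP=$(curl -s "http://localhost:8080/api/admin/crawlinginfo?size=${SIZE}&page=${PAGE}" \
              -H "Authorization: Bearer YOUR_TOKEN")
  echo "$RESP" | jq -r '.response.sessions[].sessionId'
  TOTAL=$(echo "$RESP" | jq '.response.total')
  # Stop once every session has been listed
  if [ $(( (PAGE + 1) * SIZE )) -ge "$TOTAL" ]; then
    break
  fi
  PAGE=$((PAGE + 1))
done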

Response

{
  "response": {
    "status": 0,
    "sessions": [
      {
        "sessionId": "session_20250129_100000",
        "name": "Default Crawler",
        "status": "running",
        "startTime": "2025-01-29T10:00:00Z",
        "endTime": null,
        "crawlingInfoCount": 567,
        "createdDocCount": 234,
        "updatedDocCount": 123,
        "deletedDocCount": 12
      },
      {
        "sessionId": "session_20250128_100000",
        "name": "Default Crawler",
        "status": "completed",
        "startTime": "2025-01-28T10:00:00Z",
        "endTime": "2025-01-28T10:45:23Z",
        "crawlingInfoCount": 1234,
        "createdDocCount": 456,
        "updatedDocCount": 678,
        "deletedDocCount": 23
      }
    ],
    "total": 10
  }
}

Response Fields

Field               Description
sessionId           Session ID
name                Crawler name
status              Status (running/completed/failed)
startTime           Start time
endTime             End time (null while the session is running)
crawlingInfoCount   Number of crawl info records
createdDocCount     Number of created documents
updatedDocCount     Number of updated documents
deletedDocCount     Number of deleted documents
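
Because each session reports created, updated, and deleted document counts, a per-session summary can be computed client-side. A small sketch with curl and jq, using only the fields documented above:

# Summarize document activity per session
curl -s "http://localhost:8080/api/admin/crawlinginfo" \
     -H "Authorization: Bearer YOUR_TOKEN" | \
     jq '.response.sessions[] | {sessionId, status, totalDocs: (.createdDocCount + .updatedDocCount + .deletedDocCount)}'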

Get Crawl Session Details

Request

GET /api/admin/crawlinginfo/{sessionId}

Response

{
  "response": {
    "status": 0,
    "session": {
      "sessionId": "session_20250129_100000",
      "name": "Default Crawler",
      "status": "running",
      "startTime": "2025-01-29T10:00:00Z",
      "endTime": null,
      "crawlingInfoCount": 567,
      "createdDocCount": 234,
      "updatedDocCount": 123,
      "deletedDocCount": 12,
      "infos": [
        {
          "url": "https://example.com/page1",
          "status": "OK",
          "method": "GET",
          "httpStatusCode": 200,
          "contentLength": 12345,
          "executionTime": 123,
          "lastModified": "2025-01-29T09:55:00Z"
        }
      ]
    }
  }
}
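
The infos array can be filtered client-side, for example to find pages that did not return HTTP 200 or that were slow to fetch. A sketch with curl and jq, using the example session ID from above (executionTime is assumed to be in milliseconds):

# URLs that did not return HTTP 200
curl -s "http://localhost:8080/api/admin/crawlinginfo/session_20250129_100000" \
     -H "Authorization: Bearer YOUR_TOKEN" | \
     jq '.response.session.infos[] | select(.httpStatusCode != 200) | {url, httpStatusCode}'

# Ten slowest pages by executionTime
curl -s "http://localhost:8080/api/admin/crawlinginfo/session_20250129_100000" \
     -H "Authorization: Bearer YOUR_TOKEN" | \
     jq '[.response.session.infos[]] | sort_by(-.executionTime) | .[:10] | .[] | {url, executionTime}'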

Delete Crawl Session

Request

DELETE /api/admin/crawlinginfo/{sessionId}

Response

{
  "response": {
    "status": 0,
    "message": "Crawling session deleted successfully"
  }
}

Usage Examples

List Crawl Information

curl -X GET "http://localhost:8080/api/admin/crawlinginfo?size=50&page=0" \
     -H "Authorization: Bearer YOUR_TOKEN"

Get Running Crawl Sessions

# Get all sessions and filter for running ones
curl -X GET "http://localhost:8080/api/admin/crawlinginfo" \
     -H "Authorization: Bearer YOUR_TOKEN" | jq '.response.sessions[] | select(.status=="running")'

Get Specific Session Details

curl -X GET "http://localhost:8080/api/admin/crawlinginfo/session_20250129_100000" \
     -H "Authorization: Bearer YOUR_TOKEN"

Delete Old Sessions

curl -X DELETE "http://localhost:8080/api/admin/crawlinginfo/session_20250101_100000" \
     -H "Authorization: Bearer YOUR_TOKEN"

Monitor Progress

# Periodically check progress of running sessions
while true; do
  curl -s "http://localhost:8080/api/admin/crawlinginfo" \
       -H "Authorization: Bearer YOUR_TOKEN" | \
       jq '.response.sessions[] | select(.status=="running") | {sessionId, crawlingInfoCount, createdDocCount}'
  sleep 10
done
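
Wait for Crawl Completion

A variant of the loop above that blocks until no running sessions remain, useful when a script should wait for a crawl to finish before proceeding; the 30-second polling interval is an arbitrary choice.

# Block until all running crawl sessions have finished
while true; do
  RUNNING=$(curl -s "http://localhost:8080/api/admin/crawlinginfo" \
                 -H "Authorization: Bearer YOUR_TOKEN" | \
                 jq '[.response.sessions[] | select(.status=="running")] | length')
  if [ "$RUNNING" -eq 0 ]; then
    echo "No running crawl sessions."
    break
  fi
  sleep 30
done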

Reference