WebConfig API

Overview

The WebConfig API manages the Web crawl configurations of Fess. With it you can list, read, create, update, and delete configurations, including settings such as the start URLs to crawl, the crawl depth, and the URL exclusion patterns.

Base URL

/api/admin/webconfig

Endpoints

Method    Path            Description
GET/PUT   /settings       List Web crawl configurations
GET       /setting/{id}   Get a single Web crawl configuration
POST      /setting        Create a Web crawl configuration
PUT       /setting        Update a Web crawl configuration
DELETE    /setting/{id}   Delete a Web crawl configuration
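For scripting, the endpoint table can be turned into a lookup that yields the HTTP method and full URL of each operation. This is a minimal Python sketch: the host http://localhost:8080 is taken from the curl examples on this page, and the operation names (list, get, create, update, delete) and the endpoint helper are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Assumption: Fess listens on localhost:8080, as in the curl examples on this page.
BASE_URL = "http://localhost:8080/api/admin/webconfig"

# Hypothetical operation names mapped to (HTTP method, path template) from the table.
ENDPOINTS = {
    "list": ("GET", "/settings"),
    "get": ("GET", "/setting/{id}"),
    "create": ("POST", "/setting"),
    "update": ("PUT", "/setting"),
    "delete": ("DELETE", "/setting/{id}"),
}


def endpoint(operation: str, config_id: Optional[str] = None) -> Tuple[str, str]:
    """Return the (method, URL) pair for a WebConfig API operation."""
    method, path = ENDPOINTS[operation]
    if "{id}" in path:
        if not config_id:
            raise ValueError(f"operation {operation!r} needs a configuration id")
        # Percent-encode the id so it is safe inside a single path segment.
        path = path.replace("{id}", quote(config_id, safe=""))
    return method, BASE_URL + path


print(endpoint("delete", "webconfig_id_1"))
# ('DELETE', 'http://localhost:8080/api/admin/webconfig/setting/webconfig_id_1')
```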

List Web crawl configurations

Request

GET /api/admin/webconfig/settings
PUT /api/admin/webconfig/settings

Parameters

Response

{
  "response": {
    "status": 0,
    "settings": [
      {
        "id": "webconfig_id_1",
        "name": "Example Site",
        "urls": "https://example.com/",
        "includedUrls": ".*example\\.com.*",
        "excludedUrls": ".*\\.(pdf|zip)$",
        "includedDocUrls": "",
        "excludedDocUrls": "",
        "configParameter": "",
        "depth": 3,
        "maxAccessCount": 1000,
        "userAgent": "",
        "numOfThread": 1,
        "intervalTime": 1000,
        "boost": 1.0,
        "available": true,
        "sortOrder": 0
      }
    ],
    "total": 5
  }
}
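A client usually needs only a few fields from the list response. The sketch below extracts the id, name, and depth of each entry. It assumes that a status of 0 means success (every successful response on this page shows 0). Because the sample reports "total": 5 while listing a single entry, the code does not assume that total equals the number of items in settings; summarize_settings is a hypothetical helper.

```python
import json
from typing import List, Tuple


def summarize_settings(body: str) -> List[Tuple[str, str, int]]:
    """Return (id, name, depth) for each configuration in a list response."""
    response = json.loads(body)["response"]
    # Assumption: successful responses carry status 0, so any other value is an error.
    if response.get("status") != 0:
        raise RuntimeError(f"WebConfig API returned status {response.get('status')}")
    return [(s["id"], s["name"], s["depth"]) for s in response.get("settings", [])]


sample = """
{"response": {"status": 0,
              "settings": [{"id": "webconfig_id_1", "name": "Example Site", "depth": 3}],
              "total": 5}}
"""
print(summarize_settings(sample))  # [('webconfig_id_1', 'Example Site', 3)]
```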

Get a single Web crawl configuration

Request

GET /api/admin/webconfig/setting/{id}

Response

{
  "response": {
    "status": 0,
    "setting": {
      "id": "webconfig_id_1",
      "name": "Example Site",
      "urls": "https://example.com/",
      "includedUrls": ".*example\\.com.*",
      "excludedUrls": ".*\\.(pdf|zip)$",
      "includedDocUrls": "",
      "excludedDocUrls": "",
      "configParameter": "",
      "depth": 3,
      "maxAccessCount": 1000,
      "userAgent": "",
      "numOfThread": 1,
      "intervalTime": 1000,
      "boost": 1.0,
      "available": true,
      "sortOrder": 0,
      "permissions": ["admin"],
      "virtualHosts": [],
      "labelTypeIds": []
    }
  }
}
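To work with a single configuration in code, the setting object can be mapped onto a typed structure. The WebConfig dataclass below is hypothetical and covers only part of the fields shown above. Splitting urls on line breaks assumes that several start URLs can be given one per line.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebConfig:
    """Hypothetical typed view of some fields returned by GET /setting/{id}."""
    id: str
    name: str
    urls: List[str]
    depth: int
    max_access_count: int
    available: bool
    permissions: List[str] = field(default_factory=list)

    @classmethod
    def from_response(cls, body: str) -> "WebConfig":
        s = json.loads(body)["response"]["setting"]
        return cls(
            id=s["id"],
            name=s["name"],
            # Assumption: "urls" may hold several start URLs, one per line.
            urls=[u.strip() for u in s["urls"].splitlines() if u.strip()],
            depth=s["depth"],
            max_access_count=s["maxAccessCount"],
            available=s["available"],
            permissions=list(s.get("permissions", [])),
        )


sample = """{"response": {"status": 0, "setting": {
  "id": "webconfig_id_1", "name": "Example Site", "urls": "https://example.com/",
  "depth": 3, "maxAccessCount": 1000, "available": true, "permissions": ["admin"]}}}"""
config = WebConfig.from_response(sample)
print(config.urls, config.max_access_count)  # ['https://example.com/'] 1000
```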

Create a Web crawl configuration

Request

POST /api/admin/webconfig/setting
Content-Type: application/json

Request body

{
  "name": "Corporate Site",
  "urls": "https://www.example.com/",
  "includedUrls": ".*www\\.example\\.com.*",
  "excludedUrls": ".*\\.(pdf|zip|exe)$",
  "depth": 5,
  "maxAccessCount": 5000,
  "numOfThread": 3,
  "intervalTime": 500,
  "boost": 1.0,
  "available": true,
  "permissions": ["admin", "user"],
  "labelTypeIds": ["label_id_1"]
}

Field descriptions

Response

{
  "response": {
    "status": 0,
    "id": "new_webconfig_id",
    "created": true
  }
}
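The creation call can also be made from Python with only the standard library. This sketch builds the request without sending it; the host and the YOUR_TOKEN placeholder come from the curl examples, and build_create_request and created_id are hypothetical helper names. In the Python literal ".*www\\.example\\.com.*" each doubled backslash stands for a single one, and json.dumps doubles it again in the body, so the request matches the JSON shown above.

```python
import json
import urllib.request

# Assumptions: Fess on localhost:8080 and a bearer token, as in the curl examples.
BASE_URL = "http://localhost:8080/api/admin/webconfig"


def build_create_request(token: str, config: dict) -> urllib.request.Request:
    """Build (without sending) a POST /setting request for a new configuration."""
    return urllib.request.Request(
        f"{BASE_URL}/setting",
        data=json.dumps(config).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def created_id(body: str) -> str:
    """Return the id of the new configuration from a creation response."""
    response = json.loads(body)["response"]
    if response.get("status") != 0 or response.get("created") is not True:
        raise RuntimeError(f"creation failed: {response}")
    return response["id"]


request = build_create_request("YOUR_TOKEN", {
    "name": "Corporate Site",
    "urls": "https://www.example.com/",
    "includedUrls": ".*www\\.example\\.com.*",
    "depth": 5,
    "available": True,
})
# Sending it: urllib.request.urlopen(request).read().decode("utf-8")
print(created_id('{"response": {"status": 0, "id": "new_webconfig_id", "created": true}}'))
# new_webconfig_id
```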

Update a Web crawl configuration

Request

PUT /api/admin/webconfig/setting
Content-Type: application/json

Request body

{
  "id": "existing_webconfig_id",
  "name": "Updated Corporate Site",
  "urls": "https://www.example.com/",
  "includedUrls": ".*www\\.example\\.com.*",
  "excludedUrls": ".*\\.(pdf|zip|exe|dmg)$",
  "depth": 10,
  "maxAccessCount": 10000,
  "numOfThread": 5,
  "intervalTime": 300,
  "boost": 1.2,
  "available": true,
  "versionNo": 1
}

Response

{
  "response": {
    "status": 0,
    "id": "existing_webconfig_id",
    "created": false
  }
}
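The update body carries id and versionNo in addition to the settings. A cautious pattern is read-modify-write: fetch the current record, apply the changes, and send it back with its id and versionNo untouched. This sketch assumes that the current record returned by the API includes versionNo and that this value serves as an optimistic-locking check; since this page does not say whether omitted fields keep their old values, it sends the full record. The helper names are hypothetical.

```python
import copy
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/admin/webconfig"  # assumption, as in the curl examples


def build_update_payload(current: dict, changes: dict) -> dict:
    """Merge changes into the current settings, keeping id and versionNo intact."""
    for key in ("id", "versionNo"):
        if key not in current:
            raise ValueError(f"current settings lack {key!r}; fetch them with GET /setting/{{id}} first")
    payload = copy.deepcopy(current)
    payload.update(changes)
    # Assumption: versionNo must be the version being replaced, so changes cannot override it.
    payload["id"] = current["id"]
    payload["versionNo"] = current["versionNo"]
    return payload


def build_update_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a PUT /setting request."""
    return urllib.request.Request(
        f"{BASE_URL}/setting",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )


current = {"id": "existing_webconfig_id", "name": "Corporate Site", "depth": 5, "versionNo": 1}
payload = build_update_payload(current, {"name": "Updated Corporate Site", "depth": 10})
request = build_update_request("YOUR_TOKEN", payload)
print(payload)
# {'id': 'existing_webconfig_id', 'name': 'Updated Corporate Site', 'depth': 10, 'versionNo': 1}
```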

Delete a Web crawl configuration

Request

DELETE /api/admin/webconfig/setting/{id}

Response

{
  "response": {
    "status": 0,
    "id": "deleted_webconfig_id",
    "created": false
  }
}
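Before treating a deletion as done, a script can check that the response has status 0 and echoes the requested id, as the sample above does. A minimal sketch, with hypothetical helper names and the host from the curl examples:

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8080/api/admin/webconfig"  # assumption, as in the curl examples


def build_delete_request(token: str, config_id: str) -> urllib.request.Request:
    """Build (without sending) a DELETE /setting/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/setting/{quote(config_id, safe='')}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def confirm_deleted(body: str, config_id: str) -> bool:
    """Check that a deletion response succeeded and refers to the requested id."""
    response = json.loads(body)["response"]
    return response.get("status") == 0 and response.get("id") == config_id


request = build_delete_request("YOUR_TOKEN", "deleted_webconfig_id")
print(request.get_method(), request.full_url)
# DELETE http://localhost:8080/api/admin/webconfig/setting/deleted_webconfig_id
```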

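URL pattern examples

includedUrls / excludedUrls

The patterns are regular expressions, shown as they are written inside a JSON string: \\. in JSON stands for \. in the regular expression, that is, a literal dot.

Pattern                          Description
.*example\\.com.*                All URLs containing example.com
https://example\\.com/docs/.*    Only URLs under /docs/
.*\\.(pdf|doc|docx)$             PDF, DOC, and DOCX files
.*\\?.*                          URLs with query parameters
.*/(login|logout|admin)/.*       URLs containing a /login/, /logout/, or /admin/ path segment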
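The table's patterns can be tried against sample URLs before they are saved. The sketch first decodes each pattern from its JSON-escaped form, then matches it. It assumes the crawler requires the whole URL to match, as Java's Matcher.matches() does; Python's re.fullmatch is used as a stand-in, and the sample URLs and helper names are illustrative.

```python
import json
import re


def to_regex(pattern_as_json: str) -> str:
    """Decode a pattern written as inside a JSON string (a doubled backslash becomes one)."""
    return json.loads('"' + pattern_as_json + '"')


def url_matches(pattern_as_json: str, url: str) -> bool:
    # Assumption: the whole URL must match, like Java's Matcher.matches();
    # re.fullmatch has the same whole-string semantics.
    return re.fullmatch(to_regex(pattern_as_json), url) is not None


# Patterns copied from the table, exactly as they would appear inside JSON.
examples = [
    (r".*example\\.com.*", "https://www.example.com/about"),
    (r"https://example\\.com/docs/.*", "https://example.com/docs/guide.html"),
    (r".*\\.(pdf|doc|docx)$", "https://example.com/files/report.pdf"),
    (r".*\\?.*", "https://example.com/search?q=fess"),
    (r".*/(login|logout|admin)/.*", "https://example.com/admin/users"),
]
for pattern, url in examples:
    print(pattern, url, url_matches(pattern, url))
```

Each line of this loop prints True. Under the whole-URL assumption, .*\\.(pdf|doc|docx)$ does not match https://example.com/report.pdf?download=1, because the URL no longer ends with the extension.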

Usage examples

Crawl configuration for a corporate website

curl -X POST "http://localhost:8080/api/admin/webconfig/setting" \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "Corporate Website",
       "urls": "https://www.example.com/",
       "includedUrls": ".*www\\.example\\.com.*",
       "excludedUrls": ".*/(login|admin|api)/.*",
       "depth": 5,
       "maxAccessCount": 10000,
       "numOfThread": 3,
       "intervalTime": 500,
       "available": true,
       "permissions": ["guest"]
     }'
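How includedUrls and excludedUrls combine in this corporate example can be checked offline. This sketch assumes a URL is crawled only if it matches at least one included pattern (when any are set) and no excluded pattern, with whole-URL matching; should_crawl and the sample URLs are hypothetical.

```python
import re
from typing import List

# Patterns from the corporate-site example, written as Python raw strings,
# i.e. after JSON decoding (a single backslash before each dot).
INCLUDED = [r".*www\.example\.com.*"]
EXCLUDED = [r".*/(login|admin|api)/.*"]


def should_crawl(url: str, included: List[str], excluded: List[str]) -> bool:
    """Assumed rule: match an included pattern (if any are set) and no excluded one."""
    if included and not any(re.fullmatch(p, url) for p in included):
        return False
    return not any(re.fullmatch(p, url) for p in excluded)


for url in ["https://www.example.com/products/",
            "https://www.example.com/admin/settings",
            "https://blog.example.com/"]:
    print(url, should_crawl(url, INCLUDED, EXCLUDED))
# https://www.example.com/products/ True
# https://www.example.com/admin/settings False
# https://blog.example.com/ False
```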

Crawl configuration for a documentation site

curl -X POST "http://localhost:8080/api/admin/webconfig/setting" \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "Documentation Site",
       "urls": "https://docs.example.com/",
       "includedUrls": ".*docs\\.example\\.com.*",
       "excludedUrls": "",
       "includedDocUrls": ".*\\.(html|htm)$",
       "depth": -1,
       "maxAccessCount": 50000,
       "numOfThread": 5,
       "intervalTime": 200,
       "boost": 1.5,
       "available": true,
       "labelTypeIds": ["documentation_label_id"]
     }'
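This example also sets includedDocUrls. The sketch below encodes the reading that includedUrls decides which pages are crawled, while includedDocUrls decides which crawled pages are indexed; treat that split as an assumption. Under it, a URL ending in / on docs.example.com is followed but not indexed. classify and the sample URLs are hypothetical, and excludedUrls is ignored because it is empty here.

```python
import re
from typing import List

# Patterns from the documentation-site example, after JSON decoding.
INCLUDED_URLS = [r".*docs\.example\.com.*"]
INCLUDED_DOC_URLS = [r".*\.(html|htm)$"]


def _matches_any(patterns: List[str], url: str) -> bool:
    return any(re.fullmatch(p, url) for p in patterns)


def classify(url: str) -> str:
    """Assumed semantics: includedUrls gates crawling, includedDocUrls gates indexing."""
    if not _matches_any(INCLUDED_URLS, url):
        return "skipped"
    if INCLUDED_DOC_URLS and not _matches_any(INCLUDED_DOC_URLS, url):
        return "crawled, not indexed"
    return "crawled and indexed"


for url in ["https://docs.example.com/guide/install.html",
            "https://docs.example.com/guide/",
            "https://www.example.com/index.html"]:
    print(url, "->", classify(url))
# https://docs.example.com/guide/install.html -> crawled and indexed
# https://docs.example.com/guide/ -> crawled, not indexed
# https://www.example.com/index.html -> skipped
```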

Additional information