feat: add bos-kb-consult skill, update .gitignore

Commit 1e611a54 authored Jun 06, 2026 by wangjun
5 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,5 @@
 .DS_Store
 __pycache__/
 *.pyc
+.opencode/
+openspec/
--- a/bos-kb-consult/README.md
+++ b/bos-kb-consult/README.md
+# bos-kb-consult Skill
+
+Lets AI agents (Claude Code, Codex, OpenCode, etc.) access the knowledge base service over its HTTP API.
+
+## Default configuration
+
+- Service URL: `https://kb.wangjun.dev`
+- Default API key: `gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA`
+
+Unless the user explicitly asks to change the URL or the key, use the defaults above.
+
+## Features
+
+- Knowledge base search
+- Document content reading
+- Knowledge graph link queries
+- Project listing
+- Knowledge base reindexing
+
+## Installation
+
+### Skills CLI
+
+```bash
+npx skills add <git-repository> --skill bos-kb-consult
+```
+
+### Manual installation
+
+Copy the entire `bos-kb-consult` directory into the AI agent's Skills directory.
+
+## Usage
+
+API calls use these defaults:
+
+```bash
+BASE=https://kb.wangjun.dev
+TOKEN=gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA
+```
+
+Example:
+
+```bash
+curl -H "Authorization: Bearer $TOKEN" "$BASE/api/v1/projects"
+```
+
+## Files
+
+| File | Description |
+|------|------|
+| SKILL.md | Main skill prompt |
+| api-reference.md | API reference |
+| examples.md | Usage examples |
+| README.md | Usage notes |
--- a/bos-kb-consult/SKILL.md
+++ b/bos-kb-consult/SKILL.md
+---
+name: bos-kb-consult
+description: "Query the user's LLM Wiki knowledge base (the LLM Wiki desktop app at kb.wangjun.dev — NOT Obsidian, Notion, Apple Notes, Logseq, or any other PKM tool). Trigger ONLY when the user explicitly names LLM Wiki, says 'my wiki', 'my 知识库 / 知识库 / knowledge base', or asks things like 'what does my wiki say about X', 'read wiki page Y', 'show my wiki graph / 知识图谱', 'search in my LLM Wiki project', 'rescan my wiki sources / 重新索引', or names a wiki project by ID. DO NOT trigger on generic 'search my notes', 'find in my notebook', 'check my Obsidian', etc. — those belong to other tools the user may have installed. Covers wiki page search, file listing, content read, knowledge graph navigation, and source rescan against the running LLM Wiki desktop app. Read-only except for source rescan."
+---
+
+# LLM Wiki Local API Skill
+
+Talk to the user's locally-running LLM Wiki app over its built-in HTTP API. This is a **standard JSON API** — call it directly with whatever HTTP tool is already in your environment (`curl`, `fetch`, `requests`, `http` middleware, etc.). No client library to install, no SDK to learn.
+
+Treat the wiki as a **private, structured knowledge base** the user has been curating: pages live as `wiki/**.md`, raw documents under `raw/sources/`, wikilinks form a graph.
+
+## When to invoke
+
+Invoke **only** when the user is clearly referring to **LLM Wiki** specifically — by app name, by `wiki` framing, or by `知识库` framing. Concretely:
+
+- asks a question framed as "what does my **wiki** / my **knowledge base** / 我的**知识库** / **LLM Wiki** say about X"
+- asks to "search **my wiki** / **LLM Wiki** project / 我的**知识库** for X"
+- references a **wiki page** by stem / title and wants to read or cross-link
+- asks for the **wiki graph / 知识图谱 / wiki overview / wiki structure**
+- has just added or edited files under the LLM Wiki **source folder** and wants ingest re-run / **重新索引**
+- says "use **my wiki** for context" / "ground your answer in **my wiki**" / "check **my LLM Wiki**"
+- names a wiki project (by ID, by absolute path, or by `current`)
+
+**Do NOT invoke when the user says:**
+
+- "search **my notes**" without further qualification — likely Obsidian / Apple Notes / Notion / Logseq / Bear / etc.
+- "find in **my notebook**" — likely Jupyter / OneNote / Notability
+- "check **my Obsidian / Notion / Roam / Logseq vault**" — explicitly a different tool
+- "look up **my Anki / Readwise / Pocket**" — different tool
+- "search **my files / my Documents folder**" — generic filesystem, not the wiki
+- general world knowledge, current events, or anything the user clearly wants from the open web
+
+When in doubt about which knowledge tool the user means, ask: *"Do you mean your LLM Wiki specifically, or another tool?"* — don't silently call the LLM Wiki API on what might be an Obsidian vault.
+
+
+## Default configuration
+
+Unless the user explicitly specifies other values, use the following defaults:
+
+- **BASE URL**: `https://kb.wangjun.dev`
+- **API KEY**: `gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA`
+
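+## First-use flow
+
+When the user first asks to access the knowledge base, run the following three steps automatically, without prompting the user to choose a URL or enter a key:
+
+### Step 1: Health check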
+
+```bash
+curl -s https://kb.wangjun.dev/api/v1/health
+```
+
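+- **Success (200 OK)**: continue to the next step.
+- **Connection failure or timeout**: ask the user to confirm that the LLM Wiki service is running, or whether a different URL is needed.
+- **401 Unauthorized**: tell the user that the default API key may have expired, and guide them to provide a new key.
+
+### Step 2: Get the project list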
+
+```bash
+curl -s -H "Authorization: Bearer gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA" \
+  https://kb.wangjun.dev/api/v1/projects
+```
+
+Show the returned project list to the user.
+
+### Step 3: Default consultation
+
+Search the knowledge base for a description of BOS-Knowledge-Library:
+
+```bash
+curl -s -H "Authorization: Bearer gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA" \
+  -H "Content-Type: application/json" \
+  -d '{"query":"BOS-Knowledge-Library 是什么 能做什么 使用场景","topK":5}' \
+  https://kb.wangjun.dev/api/v1/projects/current/search
+```
+
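+Based on the search results, describe the features and use cases of BOS-Knowledge-Library to the user, citing the relevant wiki page paths.
+
+**If the user explicitly specifies a URL or API key, the user-provided values take precedence.**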
+
+
+## Quick start
+
+The whole API is plain HTTP + JSON. The fastest path:
+
+```bash
+BASE="https://kb.wangjun.dev"
+TOKEN="gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA"
+
+# 1. probe state — no auth needed
+curl -s $BASE/api/v1/health
+
+# 2. list projects
+curl -s -H "Authorization: Bearer $TOKEN" $BASE/api/v1/projects
+
+# 3. search
+curl -s -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"rope embedding","topK":5}' \
+  $BASE/api/v1/projects/current/search
+
+# 4. read a page
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/current/files/content?path=wiki/concepts/rope.md"
+```
+
+If you're writing TypeScript / JavaScript:
+
+```ts
+const res = await fetch("https://kb.wangjun.dev/api/v1/projects/current/search", {
+  method: "POST",
+  headers: { "Authorization": `Bearer gfTCUWCnS5gUYF_NnrVY5gduEtdVX9RqY9GeaYZIcnA`, "Content-Type": "application/json" },
+  body: JSON.stringify({ query: "rope embedding", topK: 5 }),
+})
+const { results } = await res.json()
+```
+
+Python is the same shape — `urllib.request`, `requests`, `httpx`, whatever you already have. **Don't install anything new.**
+
+## Auth model
+
+The API is **localhost-only**. The token is one of:
+
+1. `LLM_WIKI_API_TOKEN` environment variable (if set, overrides UI)
+2. The user's `apiConfig.token` saved via Settings → API Server
+3. `allowUnauthenticated: true` mode (no token needed; rare, user opt-in only)
+
+Always check `/api/v1/health` first — it returns `{ enabled, authConfigured, allowUnauthenticated, tokenSource }`. **If `authConfigured: false && allowUnauthenticated: false`, ask the user to open `Settings → API Server → Generate new token`**. Do not proceed without auth being set up.
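The health check above can be turned into a small gate. This is a minimal sketch, assuming the `/health` payload carries the `enabled`, `authConfigured`, and `allowUnauthenticated` fields named above; the function name `authState` and its return labels are hypothetical, not part of the API.

```js
// Sketch: classify a parsed /api/v1/health payload before making other calls.
// The labels "disabled", "ready", and "needs-token" are illustrative only.
function authState(health) {
  // API toggled off by the user: authenticated calls will not succeed.
  if (!health.enabled) return "disabled";
  // Either a token is configured or anonymous mode was opted into.
  if (health.authConfigured || health.allowUnauthenticated) return "ready";
  // Neither: the user has to generate a token in Settings → API Server.
  return "needs-token";
}
```

On `"needs-token"` the agent would stop and ask the user to generate a token rather than continuing.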
+
+Three equivalent ways to send the token:
+
+```
+Authorization: Bearer <token>          # preferred
+X-LLM-Wiki-Token: <token>              # alternative header
+?token=<urlencoded-token>              # query param — last resort, leaks into logs
+```
+
+**Never log or echo the token. Never put it in any URL the user can see in your output** (Referer / shell history / logs all leak it).
+
+## Standard workflow
+
+When the user asks "look it up in my wiki":
+
+1. **Resolve project** (see [Project resolution](#project-resolution) below).
+2. **Search**: `POST /api/v1/projects/{id}/search` with `{ query, topK: 5..10 }` → ranked hits (`path`, `title`, `snippet`, `score`, `titleMatch`, optional `vectorScore`, `images`). Inspect `response.mode` to know whether hybrid retrieval kicked in.
+3. **Read top hits**: for each promising hit, `GET /api/v1/projects/{id}/files/content?path=...` for the full markdown. Or pass `includeContent: true` to the search to avoid the round-trip.
+4. **Cite + answer**: synthesize an answer grounded in the read pages. **Quote the `path` of each page you used** so the user can verify and jump in-app.
+
+### Reading the score
+
+The `score` field's scale depends on `mode`:
+
+- **`mode: "keyword"`** — additive keyword score. Filename-exact hits are ~200; phrase-in-title ~50+; bag-of-tokens lands in single digits. Treat anything below ~5% of the top result as low-confidence.
+- **`mode: "hybrid"` or `"vector"`** — RRF (Reciprocal Rank Fusion) score, typically in the **0.015–0.035** range. The absolute number is small; relative ordering is what matters. Use the per-result `vectorScore` (raw cosine 0–1) for "how strongly did the embedding match" if you need it.
+
+Don't apply a fixed score threshold across modes. Sort by `score` descending and rely on relative gaps.
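One way to apply the mode-dependent reading of `score` is sketched below. It assumes a parsed search response with `mode` and `results[].score` as described above; the helper name `splitByConfidence` and the exact 5% cutoff are illustrative choices based on the keyword-mode guidance, not server behavior.

```js
// Sketch: separate low-confidence hits using a cutoff relative to the top hit.
function splitByConfidence(response) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...response.results].sort((a, b) => b.score - a.score);
  if (sorted.length === 0 || response.mode !== "keyword") {
    // RRF modes ("hybrid" / "vector"): absolute values are tiny, so only
    // the ordering is kept and nothing is discarded here.
    return { confident: sorted, lowConfidence: [] };
  }
  // Keyword mode: anything below ~5% of the top score is low-confidence.
  const cutoff = sorted[0].score * 0.05;
  return {
    confident: sorted.filter((r) => r.score >= cutoff),
    lowConfidence: sorted.filter((r) => r.score < cutoff),
  };
}
```

The cutoff is computed per response, so it never compares scores across modes.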
+
+### Project resolution
+
+`{id}` in every project-scoped endpoint accepts **three forms**; a project name (last row) is not accepted and must first be resolved to an `id`:
+
+| Form | When to use | Example |
+|---|---|---|
+| `current` (literal) | Default for "my wiki / 我的知识库 / this project / this wiki". The user is referring to whatever is open in the desktop UI. | `/api/v1/projects/current/search` |
+| UUID | The user pasted a project ID, OR you previously resolved a name to an ID and want to re-use it. | `/api/v1/projects/a0e90b29-fcf3-4364-9502-8bd1272de820/files` |
+| Absolute filesystem path (URL-encoded) | The user named the path (e.g. `~/notes/research`). Useful when the user has multiple projects with similar names. | `/api/v1/projects/%2FUsers%2Fme%2Fwiki%2Fresearch/files` |
+| Project name | **Not supported directly.** You must `GET /api/v1/projects` first, find a match by `name`, then use that project's `id`. | `GET /api/v1/projects`, then use the matched `id` as in the UUID row |
+
+**Decision tree** for what the user said:
+
+```
+"my wiki" / "my 知识库" / "this wiki" / "this project" / unspecified
+    → use `current`
+
+"my Research project" / "in Reading"
+    → GET /api/v1/projects
+    → name-match (case-insensitive substring on `name`)
+    → use the resulting `id`
+    → if 0 matches: tell the user, list available names, fall back to `current` only if they confirm
+    → if 2+ matches: ask the user to disambiguate, quoting both names + paths
+
+"the project at /Users/me/foo"
+    → URL-encode the path, use directly
+    → if the API returns 404, the project isn't registered — list and let user pick
+
+"project a0e90b29-…"
+    → use the UUID literally
+```
+
+Cache the resolved `id` for the rest of the conversation — there's no need to re-`GET /projects` for every call. But if the user switches contexts mid-conversation ("now look in my Reading project"), re-resolve.
+
+When the user is silent about which project, **default to `current`** and mention it once: *"Looking in your active project (Research Notes)…"*. This avoids cross-project surprises.
+
+For graph / cross-reference questions:
+
+- `GET /api/v1/projects/{id}/graph?limit=200` → `{ nodes: [{id, label, nodeType, path, linkCount}], edges: [{source, target, weight}] }`
+- Filter via `?q=term` (substring of id/label, case-insensitive) and `?nodeType=entity|concept|...`
+
+For "I added new docs" requests:
+
+- `POST /api/v1/projects/{id}/sources/rescan` → returns `{ queue: { tasks }, changedTasks: [...] }`. Tell the user how many files changed. Actual ingest runs asynchronously via the desktop queue.
+
+## Endpoint contract (v1)
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/api/v1/health` | No auth. Returns `{ ok, status, version, enabled, authRequired, authConfigured, allowUnauthenticated, tokenSource }`. |
+| GET | `/api/v1/projects` | List projects. Each: `{ id, name, path, current }`. |
+| GET | `/api/v1/projects/{id}/files?root=wiki\|sources\|all&recursive=true&maxFiles=2000` | Tree of `{ name, path, isDir, size, children }`. Capped at 10000 nodes (413). |
+| GET | `/api/v1/projects/{id}/files/content?path=wiki/foo.md` | Text files only (md/mdx/txt/json/yaml/yml/csv/html/htm/xml/rtf/log). 2 MB max. 415 on binary, 413 on oversize, 403 on out-of-scope path. |
+| POST | `/api/v1/projects/{id}/search` | Body: `{ "query": "...", "topK": 10, "includeContent": false }`. **Hybrid (keyword + vector)** when the user has embeddings configured in Settings; falls back to keyword-only otherwise. Response carries `mode: "keyword" \| "vector" \| "hybrid"`, plus `tokenHits` / `vectorHits` and per-result `vectorScore`. Empty query → 400. |
+| GET | `/api/v1/projects/{id}/graph?q=&nodeType=&limit=200` | Wikilinks graph from `wiki/*.md`. Limit clamped to 1000. |
+| POST | `/api/v1/projects/{id}/sources/rescan` | Triggers a backend rescan using the user's Source Watch config. Returns post-rescan queue + actually-changed tasks. |
+| POST | `/api/v1/projects/{id}/chat` | **501** — not implemented in v1. Don't call. |
+
+`{id}` accepts a UUID, an absolute filesystem path (URL-encoded), or the literal string `current`.
+
+## Error handling
+
+Always treat the status code as the contract:
+
+| Status | Meaning | What to do |
+|---|---|---|
+| 200 | OK | Also check `body.ok === true` as a belt-and-suspenders guard; the payload is in the same object. |
+| 400 | Bad request | Show `body.error`. Typical: empty `query`, invalid `?root=`, oversized body. |
+| 401 | Unauthorized | Token missing/wrong. Tell user to set/regenerate in Settings → API Server. |
+| 403 | Forbidden | Path traversal or out-of-scope (e.g. `../app-state.json`). Don't retry the same path. |
+| 404 | Not found | Unknown project id or unknown route. On unknown project, list projects first to recover. |
+| 405 | Method not allowed | Wrong HTTP verb. |
+| 413 | Payload too large | File > 2 MB or file tree > maxFiles. Suggest narrower scope. |
+| 415 | Unsupported media | Binary or non-UTF-8 file content. API is text-only. |
+| 429 | Too many requests | Rate limit (120 req/sec global). Back off ≥1 second. |
+| 500 | Internal error | Log + report; don't loop. |
+| 501 | Not implemented | `/chat` stub. Don't retry. |
+| 503 | Service unavailable | Two flavors: API toggled off (`error` contains "disabled"); in-flight cap (64) reached ("busy"). Back off ≥2s. |
+
+If the HTTP call itself fails (connection refused / ENOTFOUND): the desktop app is **not running**. Tell the user: "Launch LLM Wiki, then re-try."
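The table above can be condensed into a small dispatcher. This is a sketch only: the function name `nextAction`, the action labels, and the hint strings are hypothetical, while the status codes and back-off times follow the table.

```js
// Sketch: map an HTTP status (plus parsed body) to what the agent does next.
function nextAction(status, body = {}) {
  switch (status) {
    case 200:
      return { action: "use" };
    case 401:
      return { action: "ask-user", hint: "Set or regenerate the token in Settings → API Server." };
    case 404:
      // Unknown project or route: list projects to recover.
      return { action: "list-projects" };
    case 429:
      return { action: "retry", waitMs: 1000 };
    case 503:
      // "disabled" means the API is toggled off, so retrying cannot help.
      if (String(body.error || "").includes("disabled")) {
        return { action: "ask-user", hint: "The API server is disabled in the desktop app." };
      }
      return { action: "retry", waitMs: 2000 };
    default:
      // 400, 403, 405, 413, 415, 500, 501: report once, do not loop.
      return { action: "report", error: body.error };
  }
}
```

Only 429 and the "busy" flavor of 503 lead to a retry; every other failure is surfaced to the user.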
+
+## Etiquette
+
+- **Cite paths.** When you answer using wiki content, name the page: `(from wiki/concepts/rope.md)`. The user uses these to verify and to jump in-app.
+- **Stay read-only by default.** Only `sources/rescan` mutates state; everything else is reads. Don't invent write endpoints — they don't exist in v1.
+- **Don't dump full pages unless asked.** Snippet + path is usually enough. Pull full content only when reasoning genuinely needs it.
+- **Respect the project boundary.** The current project is the user's active context. Do not silently switch projects.
+- **Honor the rate limit.** 120 req/sec is plenty for sequential work, but parallel page reads can burst close to it. Batch where the API allows (`includeContent: true` on search avoids N+1 reads).
+- **Never leak the token.** Headers are safe; query params and your own output text are not.
+
+## See also
+
+- `api-reference.md` — full endpoint shapes with request / response examples
+- `examples.md` — common conversational patterns mapped to direct `curl` / `fetch` sequences
+- `README.md` — human-facing notes (default configuration, installation, usage)
--- a/bos-kb-consult/api-reference.md
+++ b/bos-kb-consult/api-reference.md
+# LLM Wiki API v1 — Endpoint reference
+
+Base URL: `https://kb.wangjun.dev`
+Prefix:   `/api/v1`
+Default content type: `application/json; charset=utf-8`
+
+All non-`/health` endpoints require auth via one of:
+
+```
+Authorization: Bearer <token>
+X-LLM-Wiki-Token: <token>
+?token=<urlencoded-token>
+```
+
+---
+
+## GET /api/v1/health
+
+**Auth:** not required.
+
+```json
+{
+  "ok": true,
+  "status": "running",
+  "version": "0.4.x",
+  "enabled": true,
+  "authRequired": true,
+  "authConfigured": true,
+  "allowUnauthenticated": false,
+  "tokenSource": "store"
+}
+```
+
+Field reference:
+
+- `status` — `starting` / `running` / `port_conflict` / `error`
+- `enabled` — `false` = user toggled the API off; all non-`/health` endpoints return 503
+- `authRequired` — `false` iff `allowUnauthenticated: true`
+- `authConfigured` — `true` if env `LLM_WIKI_API_TOKEN` or `apiConfig.token` is set
+- `allowUnauthenticated` — anonymous local-process mode (rare; user opt-in only)
+- `tokenSource` — `env` / `store` / `none`. If `env`, the desktop UI token field is **ignored**.
+
+---
+
+## GET /api/v1/projects
+
+**Auth:** required.
+
+```json
+{
+  "ok": true,
+  "projects": [
+    {
+      "id": "a0e90b29-fcf3-4364-9502-8bd1272de820",
+      "name": "Research Notes",
+      "path": "/Users/me/wiki-projects/research",
+      "current": true
+    },
+    {
+      "id": "...",
+      "name": "Reading",
+      "path": "/Users/me/wiki-projects/reading",
+      "current": false
+    }
+  ]
+}
+```
+
+The `current` flag marks the project that is currently open in the desktop UI. Use it when the user doesn't name a project explicitly.
+
+`{id}` placeholder in every project-scoped endpoint accepts:
+
+- the project UUID (`p.id`)
+- the project filesystem path (`p.path`, URL-encoded)
+- the literal string `current`
+
+### Resolving a user-spoken project name
+
+There is **no `?name=` filter** on this endpoint and `{id}` does **not** accept a name directly — names are resolved entirely client-side after listing all projects.
+
+Algorithm:
+
+1. `GET /api/v1/projects` → array of `{id, name, path, current}`
+2. Case-insensitive substring match on `name`:
+   ```js
+   const matches = projects.filter(p =>
+     p.name.toLowerCase().includes(spokenName.toLowerCase())
+   )
+   ```
+3. Handle the cardinality:
+   - **0 matches** → tell the user, list available names, ask. Don't silently fall back to `current`.
+   - **1 match** → use `matches[0].id` for all subsequent calls in this conversation.
+   - **2+ matches** → ask the user to disambiguate (show `name` + `path` for each).
+4. Cache the resolved `id` for the rest of the conversation. Only re-list when the user switches contexts.
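The cardinality handling in the algorithm above can be sketched as a pure function. The `projects` array is assumed to have the `{id, name, path, current}` shape returned by `GET /api/v1/projects`; the function name `resolveProject` and the `status` labels are hypothetical.

```js
// Sketch: client-side name resolution with explicit 0 / 1 / 2+ handling.
function resolveProject(projects, spokenName) {
  const needle = spokenName.toLowerCase();
  const matches = projects.filter((p) => p.name.toLowerCase().includes(needle));
  if (matches.length === 1) {
    return { status: "resolved", id: matches[0].id };
  }
  if (matches.length === 0) {
    // Nothing matched: hand back the available names so the agent can ask.
    return { status: "none", available: projects.map((p) => p.name) };
  }
  // Several matched: return name + path for each so the user can pick.
  return {
    status: "ambiguous",
    candidates: matches.map(({ name, path }) => ({ name, path })),
  };
}
```

The caller would cache the `id` from a `"resolved"` result and ask the user in the other two cases instead of falling back to `current`.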
+
+If the user gives a filesystem path verbatim, you can URL-encode it and pass it as `{id}` directly — no list lookup needed:
+
+```bash
+# macOS / Linux (bash / zsh)
+PROJECT_PATH=$(printf %s "/Users/me/wiki/reading" | jq -sRr @uri)
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/$PROJECT_PATH/files?root=wiki"
+```
+
+```powershell
+# Windows (PowerShell)
+$projectPath = [System.Uri]::EscapeDataString("C:/Users/me/wiki/reading")
+curl.exe -s -H "Authorization: Bearer $env:LLM_WIKI_API_TOKEN" `
+  "$env:BASE/api/v1/projects/$projectPath/files?root=wiki"
+```
+
+**Windows path gotchas** — match the form the desktop app stored, otherwise you'll get 404:
+
+- Use **forward slashes** (`C:/Users/me/wiki`), not backslashes. The desktop app normalizes paths to forward slashes before saving.
+- Preserve the case the user actually has on disk (`C:/Users/Me/...` ≠ `c:/users/me/...` for the API's string compare, even though Windows itself is case-insensitive).
+- The colon after the drive letter **must** be percent-encoded (`%3A`) — it's a reserved URI delimiter. `EscapeDataString` / `jq @uri` / `encodeURIComponent` all do this for you.
+- If you get 404, fall back to `GET /api/v1/projects`, find the project there, and use its `id` (UUID) — UUIDs are platform-agnostic and never need encoding.
+
+If the path isn't registered in the desktop app, you'll get **404** — fall back to listing and asking.
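For callers using a real HTTP client rather than a shell, the path rules above reduce to a normalize-then-encode step. This is a minimal sketch; the helper name `projectIdFromPath` is hypothetical, and it deliberately leaves letter case untouched because the notes above describe the comparison as case-sensitive.

```js
// Sketch: turn a filesystem path into a value usable as {id}.
function projectIdFromPath(fsPath) {
  // The desktop app is described as storing forward slashes, so convert
  // any backslashes first.
  const normalized = fsPath.replace(/\\/g, "/");
  // encodeURIComponent percent-encodes ":" and "/" (among others).
  return encodeURIComponent(normalized);
}
```

For example, `projectIdFromPath("C:\\Users\\me\\wiki")` yields `C%3A%2FUsers%2Fme%2Fwiki`. If the resulting call still returns 404, the UUID fallback described above applies.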
+
+---
+
+## GET /api/v1/projects/{id}/files
+
+**Auth:** required.
+
+Query params:
+
+| Param | Default | Notes |
+|---|---|---|
+| `root` | `wiki` | One of `wiki` / `sources` (alias `raw`, `raw/sources`) / `all`. `all` lists every public sub-tree (`purpose.md`, `schema.md`, `wiki/`, `raw/sources/`). |
+| `recursive` | `true` | `false` → only one level. |
+| `maxFiles` | `2000` | Clamped to `[1, 10000]`. Exceed → 413. |
+
+Response:
+
+```json
+{
+  "ok": true,
+  "projectId": "...",
+  "root": "wiki",
+  "files": [
+    {
+      "name": "concepts",
+      "path": "wiki/concepts",
+      "isDir": true,
+      "size": null,
+      "children": [
+        {
+          "name": "rope.md",
+          "path": "wiki/concepts/rope.md",
+          "isDir": false,
+          "size": 4321,
+          "children": null
+        }
+      ]
+    }
+  ],
+  "truncated": false
+}
+```
+
+Hidden files (dotfiles) and symlinks are silently skipped.
+
+---
+
+## GET /api/v1/projects/{id}/files/content
+
+**Auth:** required.
+
+Query params:
+
+| Param | Required | Notes |
+|---|---|---|
+| `path` | yes | Project-relative. Allow-list: `purpose.md`, `schema.md`, `wiki/**`, `raw/sources/**`. No dotfile segments. No `..`. |
+
+Response:
+
+```json
+{
+  "ok": true,
+  "projectId": "...",
+  "path": "wiki/concepts/rope.md",
+  "content": "---\ntitle: RoPE\n---\n\n# Rotary Position Embedding\n..."
+}
+```
+
+Failure modes:
+
+- `403` — path outside the allow-list (e.g., `../app-state.json`, `.bos-llm-kb/foo.json`)
+- `404` — file does not exist
+- `413` — file > 2 MB
+- `415` — file is binary / non-UTF-8 (e.g., PNG, PDF). Use the desktop UI to view; the API is text-only.
+
+---
+
+## POST /api/v1/projects/{id}/search
+
+**Auth:** required.
+
+Body:
+
+```json
+{
+  "query": "rope rotary position embedding",
+  "topK": 10,
+  "includeContent": false,
+  "queryEmbedding": null
+}
+```
+
+| Field | Default | Notes |
+|---|---|---|
+| `query` | (required) | Non-empty. Empty / whitespace-only → 400. |
+| `topK` | `10` | Clamped to `[1, 50]`. |
+| `includeContent` | `false` | When `true`, each hit carries `content` (full markdown), which avoids the per-page content fetch round-trip. |
+| `queryEmbedding` | `null` | Optional `number[]`. If you precomputed a query embedding (your own model, batched offline), pass it here and the server skips its own embed call. Must be a non-empty array of finite numbers, otherwise → 400. |
+
+Response:
+
+```json
+{
+  "ok": true,
+  "projectId": "...",
+  "mode": "hybrid",
+  "tokenHits": 78,
+  "vectorHits": 14,
+  "note": "Search uses the shared backend retrieval service. When embeddingConfig is enabled, the API automatically includes LanceDB vector results; clients may also pass queryEmbedding explicitly.",
+  "results": [
+    {
+      "path": "wiki/concepts/rope.md",
+      "title": "Rotary Position Embedding",
+      "snippet": "...inject positional information by rotating Q and K...",
+      "titleMatch": true,
+      "score": 0.0315136476426799,
+      "vectorScore": 0.94,
+      "images": [
+        { "url": "wiki/media/rope-diagram.png", "alt": "RoPE rotation diagram" }
+      ],
+      "content": null
+    }
+  ]
+}
+```
+
+### Retrieval mode
+
+The server picks the mode automatically based on whether the active project has embeddings configured (Settings → Embeddings) **and** whether the vector index for the project has data:
+
+| `mode` | Trigger | Score scale |
+|---|---|---|
+| `"keyword"` | No `embeddingConfig`, OR embedding fetch failed, OR vector index empty. | Additive keyword score: filename-exact ~200, phrase-in-title ~50+, token-bag scoring in single digits. |
+| `"vector"` | Vector index returned hits but keyword scoring matched nothing. Rare in practice. | RRF rank score, typically `1 / (60 + rank)` ≈ `0.015–0.017`. |
+| `"hybrid"` | Both keyword and vector pipelines produced hits — the common case when embeddings are enabled. | RRF combined: up to `1/61 + 1/61` ≈ `0.0328` for a top hit. |
+
+`tokenHits` is the number of pages the keyword pass scored; `vectorHits` is the number of distinct pages LanceDB returned. Either can be 0.
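The RRF figures in the table can be reproduced with a few lines of arithmetic. This sketch assumes 1-based ranks and the constant 60 from the `1 / (60 + rank)` formula above, summed over the lists a page appears in; the function name `rrfScore` is hypothetical and this is not the server's code.

```js
// Sketch: Reciprocal Rank Fusion score for one page.
// `ranks` holds the page's 1-based rank in each list where it appears.
function rrfScore(ranks, k = 60) {
  return ranks.reduce((sum, rank) => sum + 1 / (k + rank), 0);
}
```

`rrfScore([1, 1])` gives 2/61 ≈ 0.0328, the hybrid ceiling quoted in the table. The sample `score` of `0.0315136476426799` in the response above equals `rrfScore([2, 5])`, so it is consistent with a page ranked 2nd in one list and 5th in the other; the score alone does not say which list is which.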
+
+### Per-result fields
+
+| Field | Always present? | Notes |
+|---|---|---|
+| `path` | yes | Project-relative path to the markdown page. |
+| `title` | yes | Front-matter `title:` if present; else first `# Heading`; else filename with dashes → spaces. |
+| `snippet` | yes | ~160-char window. In keyword mode: centered on the query/anchor token in the page body. In vector-only matches: the actual matching chunk text, optionally prefixed with the chunk's heading path (e.g. `"Section > Detail: chunk text..."`). |
+| `titleMatch` | yes | `true` when a token or phrase hit the title (boosts ranking). |
+| `score` | yes | Final ranking score. See "Retrieval mode" for scale. |
+| `vectorScore` | optional | Raw vector similarity (≈ cosine 0–1) when the page matched via the vector index. Useful for "how strong was the semantic match" decisions. Absent on pure keyword hits. |
+| `images` | yes | Embedded `![alt](url)` references discovered in the markdown, deduped by URL. Useful for the agent to surface diagrams. |
+| `content` | optional | Full markdown, only when `includeContent: true`. |
+
+`results` is sorted descending by `score`. Don't compare scores **across** modes (keyword scores are 100×+ larger than RRF scores by construction). Rely on relative ordering within one response.
+
+---
+
+## GET /api/v1/projects/{id}/graph
+
+**Auth:** required.
+
+Query params:
+
+| Param | Default | Notes |
+|---|---|---|
+| `q` | — | Substring filter on `id` or `label`, case-insensitive. |
+| `nodeType` | — | Filter on frontmatter `type:` (e.g., `entity`, `concept`, `query`, `other`). |
+| `limit` | `200` | Clamped to `[1, 1000]`. |
+
+Response:
+
+```json
+{
+  "ok": true,
+  "projectId": "...",
+  "nodes": [
+    {
+      "id": "rope",
+      "label": "Rotary Position Embedding",
+      "nodeType": "concept",
+      "path": "wiki/concepts/rope.md",
+      "linkCount": 4
+    }
+  ],
+  "edges": [
+    {
+      "source": "rope",
+      "target": "attention",
+      "weight": 1.0
+    }
+  ]
+}
+```
+
+Edges are derived from `[[wikilink]]` references inside `wiki/*.md`. Deduplicated by unordered pair `(source, target)` — `[[a]]` in b and `[[b]]` in a produce **one** edge. Self-edges are dropped. `weight` is `1.0` in v1.
+
+`linkCount` is the node's degree in the deduped graph.
+
+---
+
+## POST /api/v1/projects/{id}/sources/rescan
+
+**Auth:** required. **Mutates state.**
+
+No body required.
+
+```json
+{
+  "ok": true,
+  "projectId": "...",
+  "result": {
+    "queue": {
+      "version": 1,
+      "tasks": []
+    },
+    "changedTasks": [
+      {
+        "id": "...",
+        "path": "raw/sources/new-paper.pdf",
+        "kind": "created"
+      }
+    ]
+  }
+}
+```
+
+`changedTasks` contains the files this rescan **actually detected as changed** (created / modified / deleted). The downstream ingest queue picks these up asynchronously — the API call returns as soon as the diff is queued, not when ingest finishes.
+
+The user's `sourceWatchConfig` (file type filters, exclude dirs, max size) is honored. If the user disabled auto-ingest, files are still detected but not queued for the LLM pipeline.
+
+---
+
+## POST /api/v1/projects/{id}/chat
+
+**Returns 501.** Chat / RAG pipeline lives in the WebView in v1. Don't invoke. Tell the user to use the desktop chat UI.
+
+---
+
+## Limits & defenses
+
+| Limit | Value | Effect |
+|---|---|---|
+| Body size | 1 MiB | Exceed → 400. |
+| File content read | 2 MiB | Exceed → 413. |
+| File-tree node count | 10000 hard cap | Exceed → 413. |
+| Search `topK` | 50 max | Silently clamped. |
+| Graph `limit` | 1000 max | Silently clamped. |
+| Rate limit | 120 req/sec (global) | 429 with `Retry-After` semantics implied (back off ≥1s). |
+| In-flight requests | 64 concurrent | 503 "API server is busy" — back off ≥2s. |
+
+CORS: `Access-Control-Allow-Origin: *`. Preflight cached 10 min via `Access-Control-Max-Age: 600`. Allowed headers include `Authorization`, `X-LLM-Wiki-Token`, `Content-Type`.
+
+Non-`GET` / non-`POST` methods → 405.
--- a/bos-kb-consult/examples.md
+++ b/bos-kb-consult/examples.md
+# Conversation → API patterns
+
+Treat these as **recipes**, not scripts. The agent decides when to combine them. Every example is plain HTTP — pick whichever client your environment already has (`curl`, `fetch`, `requests`, etc.).
+
+---
+
+## Cross-platform note (Windows / macOS / Linux)
+
+The bash + `curl` + `jq` snippets below are written for **macOS / Linux**. On **Windows**, translate them to whichever shell the user is in:
+
+| What the bash example does | PowerShell equivalent | cmd.exe equivalent |
+|---|---|---|
+| `$TOKEN` env var | `$env:LLM_WIKI_API_TOKEN` | `%LLM_WIKI_API_TOKEN%` |
+| URL-encode a string<br>`printf %s "$x" \| jq -sRr @uri` | `[System.Uri]::EscapeDataString($x)` | use a helper / `curl --data-urlencode` |
+| `curl -s -H "Authorization: …"` | `curl.exe -s -H "Authorization: …"` (use `curl.exe` to avoid PowerShell's `curl` → `Invoke-WebRequest` alias) | `curl -s -H "Authorization: …"` |
+| Backslash line-continuation `\` at end of line | backtick `` ` `` at end of line | `^` at end of line |
+
+**Paths on Windows**:
+
+- Always pass **forward slashes** in API request paths (`wiki/concepts/foo.md`, never `wiki\concepts\foo.md`). The server stores and accepts the forward-slash form.
+- When using a Windows filesystem path as `{id}` (e.g. `C:/Users/me/wiki`), percent-encode the whole path, including the **colon** (`C%3A%2FUsers%2Fme%2Fwiki`). `EscapeDataString` / `encodeURIComponent` / `jq @uri` all do this correctly.
+- If a path-as-id call returns 404 on Windows, fall back to `GET /api/v1/projects` and use the project's UUID — UUIDs are platform-agnostic and don't need encoding.
+
+If you're calling from JavaScript / Python / Go / any other language with a real HTTP client (`fetch`, `requests`, `httpx`, `net/http`), platform doesn't matter — just `encodeURIComponent` or its equivalent and forget the shell quirks.
+
+---
+
+## "What does my wiki say about X?"
+
+The single most common ask. Workflow:
+
+1. `POST /api/v1/projects/current/search` with `{ query: X, topK: 5, includeContent: true }`
+2. **Inspect `mode`** in the response to know how to read scores (see below).
+3. If the top results are clearly above the rest (big gap in `score`), read those and synthesize. Otherwise read the top 3-5 and merge.
+4. **Cite each `path` you used.** Quote snippets directly.
+5. If nothing is found (empty `results` or a flat distribution with no clear winners), say so honestly. **Do not fabricate.**
+
+```bash
+curl -s -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"rope rotary position embedding","topK":5,"includeContent":true}' \
+  $BASE/api/v1/projects/current/search
+```
+
+### How to read scores
+
+The score's scale depends on `mode`:
+
+| `mode` | Typical top `score` | What "good" looks like |
+|---|---|---|
+| `keyword` | 50–300+ (additive: filename-exact ≈ 200, phrase-in-title ≈ 50) | A clear gap (2×+) between top result and the rest. |
+| `hybrid` / `vector` | 0.015–0.035 (RRF: `1/(60+rank)`-based) | Top RRF score near `0.032` ≈ matched in both keyword and vector top-1. |
+
+**Don't apply a fixed threshold across modes.** Sort by `score` descending and rely on the relative gap. Use `vectorScore` (when present) for "how strong was the semantic match" — it's a raw similarity in `[0, 1]`, much easier to threshold than RRF.
+
+Answer template:
+
+> Per `wiki/concepts/rope.md` (matched via hybrid, vectorScore=0.94), rotary position embedding works by rotating Q and K vectors by an angle proportional to position. Your wiki specifically mentions …
+
+---
+
+## "Read me the page about X"
+
+User wants the full text, not a synthesis.
+
+1. If the user named a slug-like identifier (`rope`, `flash-attention`), search first with `topK: 1` to disambiguate.
+2. `GET /api/v1/projects/current/files/content?path=wiki/concepts/rope.md`
+3. Render the content as markdown.
+
+```bash
+PATH_REL="wiki/concepts/rope.md"
+# url-encode path component
+ENCODED=$(printf %s "$PATH_REL" | jq -sRr @uri)
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/current/files/content?path=$ENCODED"
+```
+
+In JS:
+
+```js
+const encoded = encodeURIComponent("wiki/concepts/rope.md")
+const r = await fetch(`${BASE}/api/v1/projects/current/files/content?path=${encoded}`, {
+  headers: { Authorization: `Bearer ${TOKEN}` },
+})
+const { content } = await r.json()
+```
+
+---
+
+## "What pages link to X?" / "Show me the neighborhood of X"
+
+1. `GET /api/v1/projects/current/graph?limit=1000` — pull the whole graph once (cheap, < 1 MB for typical projects).
+2. Find `nodes[i].id === X` (or label substring match).
+3. Filter `edges` for `source === X || target === X`. The other endpoint is a neighbor.
+
+```bash
+curl -s -H "Authorization: Bearer $TOKEN" "$BASE/api/v1/projects/current/graph?limit=1000"
+```
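Steps 2 and 3 can then run client-side on the parsed response. This is a minimal sketch, assuming the response carries `nodes` (with `id` and `label`) and `edges` (with `source` and `target`) as described above. `neighborsOf` is a hypothetical helper name, and the sample graph is illustrative data, not real output.

```javascript
// Hypothetical helper: find node X, then collect the other end of every edge touching it.
function neighborsOf(graph, x) {
  const needle = x.toLowerCase();
  // Prefer an exact id match; fall back to a case-insensitive label substring.
  const node =
    graph.nodes.find((n) => n.id === x) ??
    graph.nodes.find((n) => (n.label ?? "").toLowerCase().includes(needle));
  if (!node) return null;

  const neighbors = new Set();
  for (const e of graph.edges) {
    if (e.source === node.id) neighbors.add(e.target);
    else if (e.target === node.id) neighbors.add(e.source);
  }
  neighbors.delete(node.id); // ignore self-loops
  return { id: node.id, neighbors: [...neighbors].sort() };
}

// Illustrative data only.
const graph = {
  nodes: [
    { id: "rope", label: "RoPE" },
    { id: "attention", label: "Attention" },
    { id: "transformer", label: "Transformer" },
    { id: "flash-attention", label: "FlashAttention" },
  ],
  edges: [
    { source: "rope", target: "attention" },
    { source: "rope", target: "transformer" },
    { source: "attention", target: "flash-attention" },
  ],
};

console.log(neighborsOf(graph, "rope").neighbors.join(", ")); // attention, transformer
```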
+
+You can also let the API filter for you:
+
+```bash
+curl -s -H "Authorization: Bearer $TOKEN" "$BASE/api/v1/projects/current/graph?q=rope&limit=200"
+```
+
+This applies a substring filter on `id` or `label` (case-insensitive) and returns the matching subgraph including the edges between matched nodes.
+
+Render a small mermaid graph when the user wants a visual:
+
+```mermaid
+graph LR
+  rope --- attention
+  rope --- transformer
+  attention --- flash-attention
+```
+
+---
+
+## "What's in my wiki?" / "Give me an overview"
+
+Two angles:
+
+**Structural overview** — file tree:
+
+```bash
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/current/files?root=wiki&recursive=true&maxFiles=500"
+```
+
+Summarize the directory structure (`concepts/`, `entities/`, `sources/`…) and rough page counts per category.
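The per-category counts can be derived from a flat list of paths. This sketch assumes you have already extracted an array of relative path strings from the `files` response (the response shape is not specified here, so that extraction step is left out). `countByCategory` is a hypothetical helper name.

```javascript
// Hypothetical helper: count markdown pages per first-level directory under `root`.
function countByCategory(paths, root = "wiki") {
  const counts = {};
  for (const p of paths) {
    const parts = p.split("/");
    if (parts[0] !== root || !p.endsWith(".md")) continue;
    // Pages directly under the root (e.g. wiki/index.md) are grouped as "(root)".
    const category = parts.length > 2 ? parts[1] + "/" : "(root)";
    counts[category] = (counts[category] ?? 0) + 1;
  }
  return counts;
}

// Illustrative paths only.
const counts = countByCategory([
  "wiki/index.md",
  "wiki/concepts/rope.md",
  "wiki/concepts/attention.md",
  "wiki/entities/tri-dao.md",
  "raw/sources/paper.pdf",
]);
console.log(counts["concepts/"], counts["entities/"], counts["(root)"]); // 2 1 1
```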
+
+**Topical overview** — read the curated index:
+
+```bash
+for path in wiki/index.md wiki/overview.md purpose.md; do
+  encoded=$(printf %s "$path" | jq -sRr @uri)
+  curl -s -H "Authorization: Bearer $TOKEN" \
+    "$BASE/api/v1/projects/current/files/content?path=$encoded"
+  echo
+done
+```
+
+The user's `purpose.md` describes intent; `index.md` enumerates pages; `overview.md` is the AI-generated topical summary. Quote the relevant chunks.
+
+---
+
+## "I added new docs to the source folder — re-index"
+
+1. `POST /api/v1/projects/current/sources/rescan`
+2. Read back `changedTasks`. Report:
+   - "Detected N new / M modified / K deleted files."
+   - List the first ~5 file paths so the user can verify.
+3. Tell the user the **actual ingest** runs asynchronously via the desktop queue — encourage them to open the Activity panel if they want progress.
+
+```bash
+curl -s -X POST -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/current/sources/rescan"
+```
+
+If `changedTasks` is empty:
+
+> No file changes detected. If you added files but they're not appearing, check `Settings → Source Watch` — your filters may be excluding them (e.g., `.json` is excluded by default).
+
+---
+
+## "Find every page that mentions Y" (broad sweep)
+
+Search is ranked (hybrid when embeddings are configured, keyword otherwise) and capped at 50 hits per call. For **exhaustive** sweeps:
+
+1. Run `POST .../search` with `topK: 50` and your term.
+2. If the 50th result still has a non-trivial score (relative to the top), run again with a more specific query — the API will not return more than 50 in one call.
+3. For **exact-string** sweeps where keyword tokenization mangles your phrase (e.g. CJK punctuation boundaries, code identifiers with underscores), list every page under `wiki/` via `files` (with `recursive=true`), fetch each one via `files/content`, and grep client-side. Slow but reliable.
+4. Pure-semantic sweeps: set `topK: 50` and read `vectorScore` on each hit — pages without `vectorScore` matched only via keyword.
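For the exact-string case in step 3, the client-side grep could look like the sketch below. It assumes each page has already been fetched into an array of `{ path, content }` objects, where `content` is the field returned by `files/content`. The array itself and the `grepPages` name are hypothetical.

```javascript
// Hypothetical helper: literal, case-sensitive substring search over fetched pages.
// Returns one hit per matching line, with a 1-based line number.
function grepPages(pages, needle) {
  const hits = [];
  for (const { path, content } of pages) {
    content.split("\n").forEach((line, i) => {
      if (line.includes(needle)) {
        hits.push({ path, line: i + 1, text: line.trim() });
      }
    });
  }
  return hits;
}

// Illustrative pages only.
const pages = [
  { path: "wiki/concepts/rope.md", content: "# RoPE\nSee apply_rotary_pos_emb in the source." },
  { path: "wiki/concepts/attention.md", content: "# Attention\nNo identifiers here." },
];
const hits = grepPages(pages, "apply_rotary_pos_emb");
console.log(hits.map((h) => `${h.path}:${h.line}`).join("\n")); // wiki/concepts/rope.md:2
```

`String.prototype.includes` does no tokenization, which is the point: underscores and CJK punctuation are matched literally.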
+
+---
+
+## "Search in my Reading project, not the current one"
+
+When the user names a specific project rather than implying the active one.
+
+1. List projects to resolve the name:
+
+   ```bash
+   curl -s -H "Authorization: Bearer $TOKEN" "$BASE/api/v1/projects"
+   ```
+
+   Returns:
+   ```json
+   {
+     "projects": [
+       {"id":"abc-…","name":"Research Notes","path":"/Users/me/wiki/research","current":true},
+       {"id":"def-…","name":"Reading","path":"/Users/me/wiki/reading","current":false}
+     ]
+   }
+   ```
+
+2. Match the user's spoken name. Case-insensitive substring on `name`:
+
+   ```js
+   // Fetch the project list, then keep every project whose name contains the spoken name.
+   const res = await fetch(`${BASE}/api/v1/projects`, {
+     headers: { Authorization: `Bearer ${TOKEN}` },
+   })
+   const { projects } = await res.json()
+   const matches = projects.filter(p => p.name.toLowerCase().includes("reading"))
+   ```
+
+3. Handle ambiguity:
+   - **0 matches** → tell the user, list available names, ask which one. Don't silently fall back to `current` — that would answer the wrong question.
+   - **1 match** → use its `id` in all subsequent calls for this conversation.
+   - **2+ matches** → ask the user to disambiguate, showing both `name` + `path`.
+
+4. Use the resolved id directly:
+
+   ```bash
+   PROJECT_ID="def-…"   # the single match from step 3
+   curl -s -H "Authorization: Bearer $TOKEN" \
+     -H 'Content-Type: application/json' \
+     -d '{"query":"narrative voice","topK":5}' \
+     "$BASE/api/v1/projects/$PROJECT_ID/search"
+   ```
+
+5. Cache the `id` for the rest of the conversation. Don't re-list projects on every call. Only re-resolve if the user switches contexts ("now search my Research project instead").
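The three outcomes in step 3 can be made explicit in one function. This is a minimal sketch that operates on the `projects` array returned by `GET /api/v1/projects`. `resolveProject` and its `status` values are hypothetical names, and the ids below are placeholders.

```javascript
// Hypothetical helper: map a spoken project name to exactly one of three outcomes.
function resolveProject(projects, spokenName) {
  const needle = spokenName.trim().toLowerCase();
  const matches = projects.filter((p) => p.name.toLowerCase().includes(needle));

  if (matches.length === 1) {
    return { status: "resolved", id: matches[0].id };
  }
  if (matches.length === 0) {
    // Never fall back to `current` here; let the caller list the names and ask.
    return { status: "none", available: projects.map((p) => p.name) };
  }
  // Two or more matches: return name + path so the user can pick.
  return {
    status: "ambiguous",
    candidates: matches.map(({ name, path }) => ({ name, path })),
  };
}

// Illustrative data only; the ids are placeholders.
const sampleProjects = [
  { id: "p1", name: "Research Notes", path: "/Users/me/wiki/research", current: true },
  { id: "p2", name: "Reading", path: "/Users/me/wiki/reading", current: false },
];

console.log(resolveProject(sampleProjects, "reading").status); // resolved
console.log(resolveProject(sampleProjects, "re").status);      // ambiguous
console.log(resolveProject(sampleProjects, "cooking").status); // none
```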
+
+You can also pass the project's filesystem path directly (URL-encoded) when the user references it that way:
+
+```bash
+PROJECT_PATH=$(printf %s "/Users/me/wiki/reading" | jq -sRr @uri)
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/$PROJECT_PATH/files?root=wiki"
+```
+
+---
+
+## "Compare what my Research and Reading projects say about X"
+
+User wants cross-project synthesis.
+
+1. `GET /api/v1/projects` once → grab both ids.
+2. Search each separately with the same query:
+
+   ```bash
+   for ID in research-id reading-id; do
+     curl -s -H "Authorization: Bearer $TOKEN" \
+       -H 'Content-Type: application/json' \
+       -d '{"query":"narrative voice","topK":3,"includeContent":true}' \
+       "$BASE/api/v1/projects/$ID/search"
+   done
+   ```
+
+3. Diff / contrast the result sets. Cite **both** project name and page path: *"In Research Notes (`wiki/concepts/narrative.md`)… vs. in Reading (`wiki/concepts/voice.md`)…"*.
+
+`current` only refers to the active project; for multi-project queries always pass explicit IDs.
+
+---
+
+## "Switch to project X" (mid-conversation)
+
+User has been asking about the active project, then says "now check my Reading project for the same thing."
+
+1. Re-resolve via `GET /api/v1/projects` (or use cached list if recent).
+2. Replace your cached project id for the rest of the conversation.
+3. **Confirm the switch in your reply once**: *"Switching to your Reading project…"*. Don't silently apply.
+4. Keep the user's query — apply it to the new project.
+
+The desktop UI's active project does **not** change just because you used a different `{id}`. API calls scoped to a non-current id are read-only (apart from source rescan) and don't affect the user's UI state.
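One way to keep the cached id, the one-time confirmation, and the carried-over query together is a small per-conversation object. This is only a sketch: `ProjectSession` is a hypothetical name and is not part of the API.

```javascript
// Hypothetical per-conversation state holder.
class ProjectSession {
  constructor() {
    // Start on the active project, addressed as `current`.
    this.project = { id: "current", name: "active project" };
    this.lastQuery = null;
  }

  // Record the query so it can be re-applied after a switch.
  remember(query) {
    this.lastQuery = query;
  }

  // Replace the cached project. The notice is non-null only when the project
  // actually changes, so the confirmation is shown once.
  switchTo(project) {
    const changed = project.id !== this.project.id;
    this.project = project;
    return {
      notice: changed ? `Switching to your ${project.name} project…` : null,
      query: this.lastQuery,
    };
  }
}

const session = new ProjectSession();
session.remember("narrative voice");
const step = session.switchTo({ id: "p2", name: "Reading" }); // placeholder id
console.log(step.notice); // Switching to your Reading project…
console.log(step.query);  // narrative voice
```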
+
+---
+
+## Anti-patterns
+
+- **Don't `POST /chat`** — it returns 501. Use the desktop chat UI, or call your own LLM with the wiki content as context.
+- **Don't paginate the file tree with maxFiles=1, maxFiles=2, …** — there's no offset/cursor. Use a higher cap or filter by `root=`.
+- **Don't write to the project folder directly.** The API is read-mostly, and writing to `wiki/**` or `raw/sources/**` on the filesystem bypasses the file-sync watcher. Use the desktop UI's Sources → Import flow or the ingest pipeline instead.
+- **Don't include the token in URLs you echo / log / show.** Pass it via the `Authorization` header, including in any command you show the user.
+- **Don't retry blindly on 429 / 503** — back off explicitly (1s / 2s respectively).
+- **Don't shell out to a wrapper script.** The API is the contract. Direct `curl` / `fetch` is the interface.
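The back-off rule for 429 / 503 can be sketched as a thin wrapper around `fetch`. The delays come from the list above. `fetchWithBackoff`, `maxRetries`, `fetchImpl`, and `sleep` are hypothetical names, and the last two exist only so the function can be exercised without a network.

```javascript
// Delay per status, from the guidance above: 429 waits 1 s, 503 waits 2 s.
const BACKOFF_MS = { 429: 1000, 503: 2000 };

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries only on 429 / 503, and only `maxRetries` times.
async function fetchWithBackoff(url, options = {}, opts = {}) {
  const { maxRetries = 2, fetchImpl = globalThis.fetch, sleep = defaultSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, options);
    const delay = BACKOFF_MS[res.status];
    // Return on success, on any non-retryable status, or when retries run out.
    if (delay === undefined || attempt >= maxRetries) return res;
    await sleep(delay);
  }
}
```

A call would look like `await fetchWithBackoff(url, { headers: { Authorization: "Bearer " + TOKEN } })`. The retry cap is an assumption; the source only specifies the delays.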
+
+---
+
+## End-to-end transcript
+
+User: *"What does my wiki say about flash attention? Show me the related concepts too."*
+
+Agent (internally):
+
+```bash
+# 1. health + auth pre-flight
+curl -s "$BASE/api/v1/health"
+# → tokenSource=store, enabled=true, authConfigured=true ✓
+
+# 2. search
+curl -s -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \
+  -d '{"query":"flash attention","topK":5,"includeContent":true}' \
+  "$BASE/api/v1/projects/current/search"
+
+# 3. graph filter for "flash-attention" neighborhood
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "$BASE/api/v1/projects/current/graph?q=flash&limit=200"
+```
+
+Agent (to user):
+
+> Looking in your active project. Search ran in **hybrid mode** (tokenHits=12, vectorHits=8) — top hit `wiki/concepts/flash-attention.md` matched both keyword and vector signals (`vectorScore=0.91`).
+>
+> Your wiki explains FlashAttention as a memory-efficient exact attention implementation that tiles the QKV matrices to avoid materializing the full N×N attention map. Key claim: **2–4× speedup at long context on A100s** (per your wiki).
+>
+> Related concepts in your graph:
+> - `wiki/concepts/attention.md` (parent)
+> - `wiki/concepts/long-context.md` (sibling, both reachable from `flash-attention`)
+> - `wiki/entities/tri-dao.md` (cited author)
+>
+> Want me to read any of these in full?