Cherry Studio client optimization

2025-12-10 17:40:43 +08:00
parent 0840f35408
commit 95722c97e4
10 changed files with 1515 additions and 69 deletions
--- a/docs/CHERRY_STUDIO_OPTIMIZATION.md
+++ b/docs/CHERRY_STUDIO_OPTIMIZATION.md
@@ -0,0 +1,354 @@
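+# Cherry Studio Client Optimization
+
+This document describes the gateway optimizations built specifically for the Cherry Studio client.
+
+## Optimizations
+
+### 1. Client name in logs
+
+**Description**:
+- Extracts the client name from the `x-title` request header
+- Shows the client name in the logs to make tracing and debugging easier
+- Works for any client that sets the `x-title` header, not only Cherry Studio
+
+**Log format**: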
+```
+2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
+```
+
+**Implementation**:
+- [src/api/routers/chat.py](../src/api/routers/chat.py#L295-L296)
+
+**Example**:
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -H "x-title: Cherry Studio" \
+  -d '{
+    "model": "google.gemini-2.5-pro",
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+```
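Review note: the handler code at `chat.py#L295-L296` is not shown in this document, so the following is only a sketch of what the header lookup might look like. `extract_client_name` is a hypothetical helper name, and the `"Unknown"` fallback is taken from the backward-compatibility notes later in the document.

```python
from typing import Mapping


def extract_client_name(headers: Mapping[str, str], default: str = "Unknown") -> str:
    """Return the x-title header value, or `default` when it is absent or blank.

    Hypothetical helper: header names are compared case-insensitively because
    HTTP header names are case-insensitive.
    """
    for name, value in headers.items():
        if name.lower() == "x-title" and value.strip():
            return value.strip()
    return default
```

With a plain dict standing in for the request headers, `extract_client_name({"X-Title": "Cherry Studio"})` yields the client name, and an empty dict yields the fallback.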
+
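+### 2. Automatic mapping from thinking_budget to reasoning_effort
+
+**Description**:
+- Cherry Studio uses Google Gemini's `thinking_budget` parameter to control reasoning depth
+- The gateway automatically maps `thinking_budget` to the OCI SDK's `reasoning_effort` parameter
+- Supported for models from the meta, xai, google, and openai providers (Cohere is not supported)
+- Transparent to other clients; standard OpenAI API compatibility is unaffected
+
+**Mapping rules**:
+
+| thinking_budget value | reasoning_effort | Notes |
+|-------------------|------------------|------|
+| ≤ 1760 (except -1) | `low` | Fast responses, less reasoning |
+| 1760 < X ≤ 16448 | `medium` | Balances speed and reasoning depth |
+| > 16448 | `high` | Deep reasoning, more complete answers |
+| -1 | None | Use the model default (checked before the ranges above) |
+
+**extra_body structure**:
+
+Cherry Studio passes Google Gemini-specific configuration through `extra_body`: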
+
+```json
+{
+  "model": "google.gemini-2.5-pro",
+  "messages": [...],
+  "extra_body": {
+    "google": {
+      "thinking_config": {
+        "thinking_budget": 1760,
+        "include_thoughts": true
+      }
+    }
+  }
+}
+```
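Review note: for testing from a script instead of curl, the request body above can be assembled in Python. This is a minimal sketch using only the standard library; `build_chat_payload` is a hypothetical helper, and it only builds the JSON body without sending it.

```python
import json
from typing import Any, Dict, Optional


def build_chat_payload(
    model: str,
    prompt: str,
    thinking_budget: Optional[int] = None,
    include_thoughts: bool = False,
) -> Dict[str, Any]:
    """Build a chat completion body, nesting thinking_budget the way the document shows."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only attach extra_body when a budget is requested, so plain requests stay unchanged.
    if thinking_budget is not None:
        payload["extra_body"] = {
            "google": {
                "thinking_config": {
                    "thinking_budget": thinking_budget,
                    "include_thoughts": include_thoughts,
                }
            }
        }
    return payload


if __name__ == "__main__":
    body = build_chat_payload("google.gemini-2.5-pro", "Hello", 1760, True)
    print(json.dumps(body, indent=2))
```

Leaving `thinking_budget` as `None` produces a body with no `extra_body` key, which matches the "no extra_body" case the document describes as behaving exactly as before.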
+
+**Implementation**:
+- Mapping functions: [src/api/routers/chat.py](../src/api/routers/chat.py#L37-L102)
+  - `map_thinking_budget_to_reasoning_effort()`: maps a numeric thinking_budget to a reasoning_effort enum value
+  - `extract_reasoning_effort_from_extra_body()`: extracts thinking_budget from extra_body and applies the mapping
+- OCI client: [src/core/oci_client.py](../src/core/oci_client.py#L333-L336)
+
+**Log output**:
+```
+2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
+2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
+2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
+```
+
+## Cherry Studio usage examples
+
+### Basic chat
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -H "x-title: Cherry Studio" \
+  -d '{
+    "model": "google.gemini-2.5-pro",
+    "messages": [
+      {"role": "user", "content": "Hello, how are you?"}
+    ]
+  }'
+```
+
+### Using thinking_budget (low reasoning depth)
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -H "x-title: Cherry Studio" \
+  -d '{
+    "model": "google.gemini-2.5-pro",
+    "messages": [
+      {"role": "user", "content": "What is 2+2?"}
+    ],
+    "extra_body": {
+      "google": {
+        "thinking_config": {
+          "thinking_budget": 1000
+        }
+      }
+    }
+  }'
+```
+
+### Using thinking_budget (medium reasoning depth)
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -H "x-title: Cherry Studio" \
+  -d '{
+    "model": "google.gemini-2.5-pro",
+    "messages": [
+      {"role": "user", "content": "Explain quantum entanglement"}
+    ],
+    "extra_body": {
+      "google": {
+        "thinking_config": {
+          "thinking_budget": 5000
+        }
+      }
+    }
+  }'
+```
+
+### Using thinking_budget (high reasoning depth)
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -H "x-title: Cherry Studio" \
+  -d '{
+    "model": "google.gemini-2.5-pro",
+    "messages": [
+      {"role": "user", "content": "Solve this complex math problem: ..."}
+    ],
+    "extra_body": {
+      "google": {
+        "thinking_config": {
+          "thinking_budget": 20000
+        }
+      }
+    }
+  }'
+```
+
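+## Verifying the logs
+
+Start the service and watch the logs to verify the Cherry Studio optimizations:
+
+```bash
+# Start the service (development mode)
+cd src
+python main.py
+
+# Watch the logs (in another terminal)
+tail -f logs/app.log | grep -E "(client:|thinking_budget|reasoning_effort)"
+```
+
+Expected log output: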
+```
+2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
+2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
+2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
+```
+
+## Technical implementation
+
+### Schema changes
+
+An `extra_body` field was added in [src/api/schemas.py](../src/api/schemas.py):
+
+```python
+class ChatCompletionRequest(BaseModel):
+    # ... other fields ...
+    extra_body: Optional[Dict[str, Any]] = None  # Cherry Studio and other client extensions
+```
+
+### Mapping functions
+
+Two helper functions handle Cherry Studio's thinking_budget:
+
+1. **map_thinking_budget_to_reasoning_effort**: maps a numeric thinking_budget to a reasoning_effort enum value
+2. **extract_reasoning_effort_from_extra_body**: extracts thinking_budget from extra_body and applies the mapping
+
+```python
+import logging
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
+    """Map Cherry Studio's thinking_budget to OCI's reasoning_effort parameter."""
+    if thinking_budget == -1:
+        return None
+    elif thinking_budget <= 1760:
+        return "low"
+    elif thinking_budget <= 16448:
+        return "medium"
+    else:
+        return "high"
+
+def extract_reasoning_effort_from_extra_body(extra_body: Optional[dict]) -> Optional[str]:
+    """Extract reasoning_effort from Cherry Studio's extra_body parameter."""
+    if not extra_body:
+        return None
+
+    try:
+        google_config = extra_body.get("google", {})
+        thinking_config = google_config.get("thinking_config", {})
+        thinking_budget = thinking_config.get("thinking_budget")
+
+        if thinking_budget is not None and isinstance(thinking_budget, (int, float)):
+            effort = map_thinking_budget_to_reasoning_effort(int(thinking_budget))
+            if effort:
+                logger.info(f"Cherry Studio thinking_budget {thinking_budget} mapped to reasoning_effort: {effort}")
+            return effort
+    except (AttributeError, TypeError, KeyError) as e:
+        logger.debug(f"Failed to extract thinking_budget from extra_body: {e}")
+
+    return None
+```
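Review note: the thresholds are easiest to trust when the boundary values are checked directly. The sketch below copies the mapping function from the snippet above so it runs on its own; `BOUNDARY_CASES` and `find_mismatches` are hypothetical names added for illustration.

```python
from typing import List, Optional, Tuple


def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
    """Copy of the documented mapping, repeated here so the check is self-contained."""
    if thinking_budget == -1:
        return None
    elif thinking_budget <= 1760:
        return "low"
    elif thinking_budget <= 16448:
        return "medium"
    else:
        return "high"


# Values on both sides of each threshold, plus the -1 sentinel.
BOUNDARY_CASES: List[Tuple[int, Optional[str]]] = [
    (-1, None),
    (0, "low"),
    (1760, "low"),
    (1761, "medium"),
    (16448, "medium"),
    (16449, "high"),
]


def find_mismatches() -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (budget, expected, actual) for every case that does not match."""
    return [
        (budget, expected, map_thinking_budget_to_reasoning_effort(budget))
        for budget, expected in BOUNDARY_CASES
        if map_thinking_budget_to_reasoning_effort(budget) != expected
    ]
```

One thing this makes visible: only `-1` is treated as a sentinel, so any other negative value (for example `-2`) falls into the `≤ 1760` branch and maps to `low`.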
+
+### OCI SDK integration
+
+The `OCIGenAIClient.chat()` and `_build_generic_request()` methods were updated to pass the `reasoning_effort` parameter through to the OCI SDK's `GenericChatRequest`.
+
+## Compatibility
+
+### Supported models
+
+**reasoning_effort support** (via the thinking_budget mapping):
+
+- ✅ Google Gemini models (google.gemini-2.5-pro, google.gemini-2.0-flash-exp)
+- ✅ Meta Llama models (meta.llama-3.1-405b-instruct, meta.llama-3.2-90b-vision-instruct)
+- ✅ xAI models
+- ✅ OpenAI models
+- ❌ Cohere models (the reasoning_effort parameter is not supported)
+
+**Note**: reasoning_effort is optional. If a model does not support it, the parameter is ignored automatically and a warning is logged.
+
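Review note: the provider list above suggests a simple gate on the model ID. The sketch below assumes the provider is the part of the model ID before the first dot, as in `google.gemini-2.5-pro`; the actual check in `oci_client.py` is not shown in this document, and `supports_reasoning_effort` is a hypothetical name.

```python
# Providers the document lists as accepting reasoning_effort.
SUPPORTED_PROVIDERS = frozenset({"meta", "xai", "google", "openai"})


def supports_reasoning_effort(model_id: str) -> bool:
    """Return True when the model ID's provider prefix is in the supported set.

    Assumption: model IDs have the form "<provider>.<model-name>".
    """
    provider = model_id.split(".", 1)[0].lower()
    return provider in SUPPORTED_PROVIDERS
```

Under that assumption, a caller could skip setting `reasoning_effort` and log a warning whenever this returns `False`, which is the behavior the note above describes.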
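+### Backward compatibility
+
+- ✅ Without `extra_body`, behavior is identical to before
+- ✅ Without `x-title`, the client name is shown as "Unknown"
+- ✅ Other clients are unaffected and keep working as usual
+- ✅ Standard OpenAI API compatibility is fully preserved
+
+### Compatibility with other clients
+
+Although this optimization was designed for Cherry Studio, the implementation ensures that:
+
+1. **Other clients are unaffected**: clients that do not send `extra_body.google.thinking_config` see no change in behavior
+2. **Standard API compatibility**: all standard OpenAI API features continue to work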
+
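+## Troubleshooting
+
+### Problem 1: thinking_budget has no effect
+
+**Symptom**: the "mapped to reasoning_effort" message does not appear in the logs
+
+**Solution**:
+1. Confirm the `extra_body` structure is correct; the nested path is `extra_body.google.thinking_config.thinking_budget`
+2. Confirm the model is supported (meta, xai, google, openai; Cohere is not supported)
+3. Check that the thinking_budget value is valid (a non-null number); note that -1 selects the model default and does not produce a "mapped" log line
+4. Look for errors or warnings in the logs
+
+**Verify the extra_body structure**:
+```jsonc
+// Correct structure
+{
+  "extra_body": {
+    "google": {                    // the key must be "google"
+      "thinking_config": {         // the key must be "thinking_config"
+        "thinking_budget": 5000    // the key must be "thinking_budget", with a numeric value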
+      }
+    }
+  }
+}
+```
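Review note: when the "mapped" log line is missing, it helps to know which level of the nested path is wrong. The sketch below walks the documented path and reports the first problem; `diagnose_extra_body` and `THINKING_BUDGET_PATH` are hypothetical names, and the numeric check mirrors the `isinstance(..., (int, float))` test in the extraction function shown earlier.

```python
from typing import Any, Optional

# The nested path the gateway reads, as documented above.
THINKING_BUDGET_PATH = ("google", "thinking_config", "thinking_budget")


def diagnose_extra_body(extra_body: Optional[dict]) -> str:
    """Return "ok" or a short message naming the first problem found on the path."""
    if not extra_body:
        return "extra_body is missing or empty"
    node: Any = extra_body
    for depth, key in enumerate(THINKING_BUDGET_PATH):
        if not isinstance(node, dict) or key not in node:
            parent = ".".join(("extra_body",) + THINKING_BUDGET_PATH[:depth])
            return f"missing key '{key}' under {parent}"
        node = node[key]
    # After the loop, node holds the thinking_budget value itself.
    if not isinstance(node, (int, float)):
        return f"thinking_budget must be a number, got {type(node).__name__}"
    return "ok"
```

Running it against the body a client actually sends (for example, one captured from a debug log) narrows the issue to a misspelled key or a non-numeric value before looking at model support.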
+
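+### Problem 2: client name shown as "Unknown"
+
+**Symptom**: the logs show the client as "Unknown" instead of "Cherry Studio"
+
+**Solution**:
+1. Confirm the request includes the `x-title` header
+2. Check that Cherry Studio is configured to send the custom header
+3. Add the header manually to test
+
+**Test command**: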
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "x-title: Cherry Studio" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-oci-genai-default-key" \
+  -d '{"model": "google.gemini-2.5-pro", "messages": [{"role": "user", "content": "test"}]}'
+```
+
+### Problem 3: thinking_budget maps to an unexpected reasoning_effort
+
+**Symptom**: the resulting reasoning_effort differs from what you expected
+
+**Check the mapping rules**:
+- thinking_budget ≤ 1760 → low
+- 1760 < thinking_budget ≤ 16448 → medium
+- thinking_budget > 16448 → high
+- thinking_budget = -1 → None (model default)
+
+**Examples**:
+```python
+# thinking_budget = 1000 → low ✓
+# thinking_budget = 5000 → medium ✓
+# thinking_budget = 20000 → high ✓
+# thinking_budget = -1 → None (default) ✓
+```
+
+## Testing
+
+### Automated tests
+
+Run the Cherry Studio optimization test script:
+
+```bash
+./tests/test_cherry_studio_optimization.sh
+```
+
+The test script covers the following scenarios:
+1. thinking_budget = 1000 → reasoning_effort = low
+2. thinking_budget = 5000 → reasoning_effort = medium
+3. thinking_budget = 20000 → reasoning_effort = high
+4. thinking_budget = -1 → model default
+5. No extra_body (a normal request)
+6. Different client names (verifies x-title detection)
+
+## References
+
+- [OCI GenAI Python SDK - GenericChatRequest](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/generative_ai_inference/models/oci.generative_ai_inference.models.GenericChatRequest.html)
+- [OpenAI API - Reasoning Models](https://platform.openai.com/docs/guides/reasoning)
+- [Google Gemini - Thinking](https://ai.google.dev/gemini-api/docs/thinking)