oracle-openai/docs/CHERRY_STUDIO_OPTIMIZATION.md
Wang Defa 95722c97e4
Cherry Studio client optimization
2025-12-10 17:40:43 +08:00

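# Cherry Studio Client Optimization
This document describes the optimizations the gateway provides specifically for the Cherry Studio client.
## Optimizations
### 1. Client name in logs
**Description**
- Extracts the client name from the `x-title` request header
- Shows the client name in the logs to make tracing and debugging easier
- Works with any client that sets the `x-title` header, not only Cherry Studio
**Log format**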
```
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
```
**Implementation**
- [src/api/routers/chat.py](../src/api/routers/chat.py#L295-L296)
**Usage example**
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-H "x-title: Cherry Studio" \
-d '{
"model": "google.gemini-2.5-pro",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
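The header lookup described above can be sketched in a few lines. This is a minimal illustration, not the code in `src/api/routers/chat.py`: the helper name `client_name_from_headers` is hypothetical, and the `"Unknown"` fallback follows the backward-compatibility notes later in this document.

```python
from typing import Mapping


def client_name_from_headers(headers: Mapping[str, str], default: str = "Unknown") -> str:
    """Return the client name carried in the x-title header, or a default."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    for name, value in headers.items():
        if name.lower() == "x-title" and value.strip():
            return value.strip()
    return default


print(client_name_from_headers({"X-Title": "Cherry Studio"}))
print(client_name_from_headers({"Content-Type": "application/json"}))
```

Any client that sets `x-title` is handled the same way, which is why the feature is not limited to Cherry Studio.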
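### 2. Automatic mapping from thinking_budget to reasoning_effort
**Description**
- Cherry Studio uses Google Gemini's `thinking_budget` parameter to control reasoning depth
- The gateway automatically maps `thinking_budget` to the OCI SDK's `reasoning_effort` parameter
- Supported for models from the meta, xai, google, and openai providers (Cohere is not supported)
- Transparent to other clients; standard OpenAI API compatibility is unaffected
**Mapping rules**
| thinking_budget value | reasoning_effort | Notes |
|-----------------------|------------------|-------|
| ≤ 1760 (except -1) | `low` | Fast response, less reasoning |
| 1760 < X ≤ 16448 | `medium` | Balances speed and reasoning depth |
| > 16448 | `high` | Deep reasoning, more complete answers |
| -1 | None | Use the model default |
**extra_body structure**
Cherry Studio passes Google Gemini-specific configuration through `extra_body`: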
```json
{
  "model": "google.gemini-2.5-pro",
  "messages": [...],
  "extra_body": {
    "google": {
      "thinking_config": {
        "thinking_budget": 1760,
        "include_thoughts": true
      }
    }
  }
}
```
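For scripted testing, the same request body can be assembled in Python. The sketch below uses only the standard library and builds the request without sending it; the function name `build_chat_request` is hypothetical, and the base URL and API key are the placeholder values used in the curl examples of this document.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, thinking_budget: int) -> urllib.request.Request:
    """Build (but do not send) a chat completion request carrying thinking_budget."""
    payload = {
        "model": "google.gemini-2.5-pro",
        "messages": [{"role": "user", "content": "Hello"}],
        # The gateway reads extra_body.google.thinking_config.thinking_budget.
        "extra_body": {
            "google": {
                "thinking_config": {
                    "thinking_budget": thinking_budget,
                    "include_thoughts": True,
                }
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "x-title": "Cherry Studio",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8000", "sk-oci-genai-default-key", 1760)
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running gateway: urllib.request.urlopen(req)
```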
**Implementation**
- Mapping functions: [src/api/routers/chat.py](../src/api/routers/chat.py#L37-L102)
  - `map_thinking_budget_to_reasoning_effort()`: maps a thinking_budget number to a reasoning_effort enum value
  - `extract_reasoning_effort_from_extra_body()`: extracts thinking_budget from extra_body and performs the mapping
- OCI client: [src/core/oci_client.py](../src/core/oci_client.py#L333-L336)
**Log output**
```
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
```
## Cherry Studio usage examples
### Basic conversation
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-H "x-title: Cherry Studio" \
-d '{
"model": "google.gemini-2.5-pro",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'
```
### Using thinking_budget (low reasoning depth)
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-H "x-title: Cherry Studio" \
-d '{
"model": "google.gemini-2.5-pro",
"messages": [
{"role": "user", "content": "What is 2+2?"}
],
"extra_body": {
"google": {
"thinking_config": {
"thinking_budget": 1000
}
}
}
}'
```
### Using thinking_budget (medium reasoning depth)
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-H "x-title: Cherry Studio" \
-d '{
"model": "google.gemini-2.5-pro",
"messages": [
{"role": "user", "content": "Explain quantum entanglement"}
],
"extra_body": {
"google": {
"thinking_config": {
"thinking_budget": 5000
}
}
}
}'
```
### Using thinking_budget (high reasoning depth)
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-H "x-title: Cherry Studio" \
-d '{
"model": "google.gemini-2.5-pro",
"messages": [
{"role": "user", "content": "Solve this complex math problem: ..."}
],
"extra_body": {
"google": {
"thinking_config": {
"thinking_budget": 20000
}
}
}
}'
```
## Verifying the logs
Start the service and watch the logs to verify the Cherry Studio optimizations:
```bash
# Start the service (development mode)
cd src
python main.py
# Watch the logs (in another terminal)
tail -f logs/app.log | grep -E "(client:|thinking_budget|reasoning_effort)"
```
Expected log output:
```
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
```
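When checking many requests, the mapping lines can also be pulled out of the log programmatically instead of with `grep`. The following is a rough sketch that assumes the log line format shown above; the names `MAPPING_LINE` and `find_mappings` are illustrative only.

```python
import re

# Matches the "thinking_budget N mapped to reasoning_effort: X" message shown above.
MAPPING_LINE = re.compile(
    r"thinking_budget (?P<budget>-?\d+(?:\.\d+)?) mapped to reasoning_effort: (?P<effort>\w+)"
)


def find_mappings(lines):
    """Return (thinking_budget, reasoning_effort) pairs found in the given log lines."""
    results = []
    for line in lines:
        match = MAPPING_LINE.search(line)
        if match:
            results.append((float(match.group("budget")), match.group("effort")))
    return results


sample = [
    "2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio",
    "2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low",
    "2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model",
]
print(find_mappings(sample))
```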
## Technical implementation
### Schema changes
An `extra_body` field was added in [src/api/schemas.py](../src/api/schemas.py):
```python
class ChatCompletionRequest(BaseModel):
    # ... other fields ...
    extra_body: Optional[Dict[str, Any]] = None  # Cherry Studio and other client extensions
```
### Mapping functions
Two utility functions handle Cherry Studio's thinking_budget:
1. **map_thinking_budget_to_reasoning_effort**: maps a thinking_budget number to a reasoning_effort enum value
2. **extract_reasoning_effort_from_extra_body**: extracts thinking_budget from extra_body and performs the mapping
```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
    """Map Cherry Studio's thinking_budget to OCI's reasoning_effort parameter."""
    # -1 is checked first: it means "use the model default".
    if thinking_budget == -1:
        return None
    elif thinking_budget <= 1760:
        return "low"
    elif thinking_budget <= 16448:
        return "medium"
    else:
        return "high"


def extract_reasoning_effort_from_extra_body(extra_body: Optional[dict]) -> Optional[str]:
    """Extract reasoning_effort from Cherry Studio's extra_body parameter."""
    if not extra_body:
        return None
    try:
        # Walk extra_body.google.thinking_config.thinking_budget, tolerating missing keys.
        google_config = extra_body.get("google", {})
        thinking_config = google_config.get("thinking_config", {})
        thinking_budget = thinking_config.get("thinking_budget")
        if thinking_budget is not None and isinstance(thinking_budget, (int, float)):
            effort = map_thinking_budget_to_reasoning_effort(int(thinking_budget))
            if effort:
                logger.info(f"Cherry Studio thinking_budget {thinking_budget} mapped to reasoning_effort: {effort}")
            return effort
    except (AttributeError, TypeError, KeyError) as e:
        logger.debug(f"Failed to extract thinking_budget from extra_body: {e}")
    return None
```
### OCI SDK integration
The `OCIGenAIClient.chat()` and `_build_generic_request()` methods were updated to pass the `reasoning_effort` parameter through to the OCI SDK's `GenericChatRequest`.
## Compatibility
### Supported models
**reasoning_effort support** (via the thinking_budget mapping):
- ✅ Google Gemini models (google.gemini-2.5-pro, google.gemini-2.0-flash-exp)
- ✅ Meta Llama models (meta.llama-3.1-405b-instruct, meta.llama-3.2-90b-vision-instruct)
- ✅ xAI models
- ✅ OpenAI models
- ❌ Cohere models (the reasoning_effort parameter is not supported)
**Note**: reasoning_effort is an optional parameter. If the model does not support it, it is ignored automatically and a warning is logged.
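The provider rule above can be pictured as a small gate applied before the value is handed to the OCI SDK. The sketch below is not the code in `src/core/oci_client.py`. It assumes the provider can be read from the prefix of the model ID (as in `google.gemini-2.5-pro`) and that the value is upper-cased before use, which is inferred only from the `Setting reasoning_effort to LOW` log line. The helper name, the logger name, and the Cohere model ID are illustrative.

```python
import logging
from typing import Optional

logger = logging.getLogger("reasoning_effort_sketch")  # hypothetical logger name

# Providers listed in this document as supporting reasoning_effort.
SUPPORTED_PROVIDERS = {"meta", "xai", "google", "openai"}


def reasoning_effort_for_model(model_id: str, effort: Optional[str]) -> Optional[str]:
    """Return the effort to pass on for this model, or None if it should be dropped."""
    if effort is None:
        return None
    # Assumption: the provider is the part of the model ID before the first dot.
    provider = model_id.split(".", 1)[0].lower()
    if provider not in SUPPORTED_PROVIDERS:
        logger.warning("reasoning_effort ignored for unsupported provider: %s", provider)
        return None
    # Assumption: upper-case form, matching the "LOW" seen in the log output.
    return effort.upper()


print(reasoning_effort_for_model("google.gemini-2.5-pro", "low"))
print(reasoning_effort_for_model("cohere.command-r-plus", "low"))  # illustrative model ID
```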
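### Backward compatibility
- ✅ Without `extra_body`, behavior is identical to before
- ✅ Without `x-title`, the client name is shown as "Unknown"
- ✅ Other clients are unaffected and continue to work normally
- ✅ Standard OpenAI API compatibility is fully preserved
### Compatibility with other clients
Although this optimization was designed for Cherry Studio, the implementation ensures that:
1. **Other clients are unaffected**: clients that do not send `extra_body.google.thinking_config` see no change in behavior
2. **Standard API compatibility**: all standard OpenAI API features continue to work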
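## Troubleshooting
### Problem 1: thinking_budget has no effect
**Symptom**: the "mapped to reasoning_effort" message does not appear in the logs
**Solution**
1. Confirm that the `extra_body` structure is correct; the nested path is `extra_body.google.thinking_config.thinking_budget`
2. Confirm that the model is from a supported provider (meta, xai, google, openai); Cohere is not supported
3. Check that the thinking_budget value is valid (a non-null number); note that -1 maps to None and therefore logs no mapping message
4. Check the logs for errors or warnings
**Verify the extra_body structure**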
```bash
# Correct structure
{
  "extra_body": {
    "google": {                    # the key must be "google"
      "thinking_config": {         # the key must be "thinking_config"
        "thinking_budget": 5000    # the key must be "thinking_budget" and the value a number
      }
    }
  }
}
```
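A small checker can report which part of the nested path is wrong before a request is sent. This is a sketch under the structure documented above; `check_extra_body` and `EXPECTED_PATH` are hypothetical names, and the check is slightly stricter than the gateway's own extraction in that it rejects booleans.

```python
from typing import Any, Optional

# The nested path the gateway reads, relative to extra_body.
EXPECTED_PATH = ("google", "thinking_config", "thinking_budget")


def check_extra_body(extra_body: Any) -> Optional[str]:
    """Return None if the structure is valid, else a message for the first problem found."""
    node = extra_body
    walked = "extra_body"
    for key in EXPECTED_PATH:
        if not isinstance(node, dict):
            return f"{walked} must be an object, got {type(node).__name__}"
        if key not in node:
            return f"missing key {walked}.{key}"
        node = node[key]
        walked = f"{walked}.{key}"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, bool) or not isinstance(node, (int, float)):
        return f"{walked} must be a number, got {type(node).__name__}"
    return None


print(check_extra_body({"google": {"thinking_config": {"thinking_budget": 5000}}}))
print(check_extra_body({"google": {"thinkingConfig": {"thinking_budget": 5000}}}))
```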
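### Problem 2: client name is shown as "Unknown"
**Symptom**: the logs show the client as "Unknown" instead of "Cherry Studio"
**Solution**
1. Confirm that the request headers include `x-title`
2. Check that Cherry Studio is configured to send the custom request header
3. Add the header manually to test
**Test command**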
```bash
curl http://localhost:8000/v1/chat/completions \
-H "x-title: Cherry Studio" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-oci-genai-default-key" \
-d '{"model": "google.gemini-2.5-pro", "messages": [{"role": "user", "content": "test"}]}'
```
### Problem 3: thinking_budget maps to the wrong reasoning_effort
**Symptom**: the actual reasoning_effort differs from the expected one
**Check the mapping rules**
- thinking_budget ≤ 1760 → low
- 1760 < thinking_budget ≤ 16448 → medium
- thinking_budget > 16448 → high
- thinking_budget = -1 → None (use the model default)
**Examples**
```python
# thinking_budget = 1000 → low ✓
# thinking_budget = 5000 → medium ✓
# thinking_budget = 20000 → high ✓
# thinking_budget = -1 → None (default) ✓
```
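The boundary values are where mistakes are most likely, so it can help to check them directly. The sketch below restates the mapping function from this document and asserts the expected result on each side of both thresholds; the `cases` table is illustrative.

```python
from typing import Optional


def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
    """Same rules as the mapping table in this document."""
    if thinking_budget == -1:
        return None
    elif thinking_budget <= 1760:
        return "low"
    elif thinking_budget <= 16448:
        return "medium"
    else:
        return "high"


# Values on both sides of each threshold, plus the -1 sentinel.
cases = {
    -1: None,
    0: "low",
    1760: "low",
    1761: "medium",
    16448: "medium",
    16449: "high",
}

for budget, expected in cases.items():
    actual = map_thinking_budget_to_reasoning_effort(budget)
    assert actual == expected, f"{budget}: expected {expected}, got {actual}"
    print(f"thinking_budget={budget} -> {actual}")
```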
## Testing
### Automated tests
Run the Cherry Studio optimization test script:
```bash
./tests/test_cherry_studio_optimization.sh
```
The test script verifies the following scenarios:
1. thinking_budget = 1000 → reasoning_effort = low
2. thinking_budget = 5000 → reasoning_effort = medium
3. thinking_budget = 20000 → reasoning_effort = high
4. thinking_budget = -1 → model default is used
5. No extra_body (normal request)
6. Different client names (verifies `x-title` detection)
## References
- [OCI GenAI Python SDK - GenericChatRequest](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/generative_ai_inference/models/oci.generative_ai_inference.models.GenericChatRequest.html)
- [OpenAI API - Reasoning Models](https://platform.openai.com/docs/guides/reasoning)
- [Google Gemini - Thinking](https://ai.google.dev/gemini-api/docs/thinking)