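Cherry Studio Client Optimizations
This document describes the optimizations the gateway provides specifically for the Cherry Studio client.
Optimizations
1. Client name in logs
Description:
- Extracts the client name from the x-title request header
- Shows the client name in the logs, which makes tracing and debugging easier
- Works with any client that sets the x-title header, not only Cherry Studio
Log format: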
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
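As a rough illustration of the behavior described above, the sketch below reads x-title from a header mapping and emits a line in the documented log format. The helper names `get_client_name` and `log_chat_request`, the case-insensitive lookup, the treatment of a blank header, and the logging configuration are all assumptions made for illustration, not the gateway's actual code. The "Unknown" fallback follows the behavior this document describes for requests without the header.

```python
import logging
from typing import Mapping

# Format chosen to match the log lines shown in this document (an assumption
# about how the gateway configures logging).
logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)
logger = logging.getLogger("api.routers.chat")


def get_client_name(headers: Mapping[str, str]) -> str:
    """Return the x-title header value, or "Unknown" if absent or blank.

    Hypothetical helper: HTTP header names are case-insensitive, so the
    lookup compares lowercased keys.
    """
    for key, value in headers.items():
        if key.lower() == "x-title" and value.strip():
            return value.strip()
    return "Unknown"


def log_chat_request(model: str, headers: Mapping[str, str]) -> str:
    """Log the request in the documented format and return the message."""
    message = (
        f"Chat completion request for model: {model}, "
        f"client: {get_client_name(headers)}"
    )
    logger.info(message)
    return message
```

Calling `log_chat_request("google.gemini-2.5-pro", {"x-title": "Cherry Studio"})` should produce a message ending in `client: Cherry Studio`, while an empty header mapping yields `client: Unknown`.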
Implementation location:
Usage example:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
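2. Automatic mapping from thinking_budget to reasoning_effort
Description:
- Cherry Studio uses Google Gemini's thinking_budget parameter to control reasoning depth
- The gateway automatically maps thinking_budget to the OCI SDK's reasoning_effort parameter
- Supported for models from the meta, xai, google, and openai providers (Cohere is not supported)
- Transparent to other clients; standard OpenAI API compatibility is unaffected
Mapping rules: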
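| thinking_budget value | reasoning_effort | Description |
|---|---|---|
| ≤ 1760 (other than -1) | low | Fast responses, less reasoning |
| 1760 < X ≤ 16448 | medium | Balances speed and reasoning depth |
| > 16448 | high | Deep reasoning, more complete answers |
| -1 | None | Uses the model default |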
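The boundary values in the table are easy to get wrong, so here is a minimal table-driven sketch of the same rules. It uses `bisect_left` rather than an if/elif chain, and the name `effort_for_budget` is illustrative only; it is not a function from the gateway.

```python
from bisect import bisect_left
from typing import Optional

# Inclusive upper bounds of the "low" and "medium" ranges from the table.
THRESHOLDS = [1760, 16448]
EFFORTS = ["low", "medium", "high"]


def effort_for_budget(thinking_budget: int) -> Optional[str]:
    """Table-driven form of the mapping rules (illustrative name)."""
    if thinking_budget == -1:
        return None  # defer to the model default
    # bisect_left counts the thresholds strictly below the budget, so a
    # budget equal to a threshold stays in the lower bucket.
    return EFFORTS[bisect_left(THRESHOLDS, thinking_budget)]


for budget in (-1, 1760, 1761, 16448, 16449):
    print(budget, effort_for_budget(budget))
```

The loop prints the result for each boundary: 1760 and 16448 fall in the lower bucket of their respective ranges, and 1761 and 16449 fall in the next one up.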
extra_body structure:
Cherry Studio passes Google Gemini-specific configuration through extra_body:
{
  "model": "google.gemini-2.5-pro",
  "messages": [...],
  "extra_body": {
    "google": {
      "thinking_config": {
        "thinking_budget": 1760,
        "include_thoughts": true
      }
    }
  }
}
Implementation location:
- Mapping functions: src/api/routers/chat.py
  - map_thinking_budget_to_reasoning_effort(): maps a numeric thinking_budget to a reasoning_effort enum value
  - extract_reasoning_effort_from_extra_body(): extracts thinking_budget from extra_body and applies the mapping
- OCI client: src/core/oci_client.py
Log output:
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
Cherry Studio usage examples
Basic chat
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
Using thinking_budget (low reasoning depth)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 1000
        }
      }
    }
  }'
Using thinking_budget (medium reasoning depth)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Explain quantum entanglement"}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 5000
        }
      }
    }
  }'
Using thinking_budget (high reasoning depth)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Solve this complex math problem: ..."}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 20000
        }
      }
    }
  }'
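For scripted testing, the same requests can be built from Python. The following is a minimal standard-library sketch: the helper `build_chat_request` is hypothetical, and the URL, API key, and default model are simply the values used in the curl examples in this document. The function only constructs the request; it does not send it.

```python
import json
import urllib.request
from typing import Optional

# Values taken from the curl examples in this document; adjust as needed.
GATEWAY_URL = "http://localhost:8000/v1/chat/completions"
API_KEY = "sk-oci-genai-default-key"


def build_chat_request(
    prompt: str,
    thinking_budget: Optional[int] = None,
    model: str = "google.gemini-2.5-pro",
    client_name: str = "Cherry Studio",
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only attach extra_body when a budget is given, so plain requests
    # look exactly like standard OpenAI-style requests.
    if thinking_budget is not None:
        payload["extra_body"] = {
            "google": {"thinking_config": {"thinking_budget": thinking_budget}}
        }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
            "x-title": client_name,
        },
        method="POST",
    )
```

With a gateway running locally, passing the result of `build_chat_request("Explain quantum entanglement", thinking_budget=5000)` to `urllib.request.urlopen` should reproduce the medium-depth example.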
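Verifying the logs
Start the service and watch the logs to verify the Cherry Studio optimizations:
# Start the service (development mode)
cd src
python main.py
# Watch the logs (in another terminal)
tail -f logs/app.log | grep -E "(client:|thinking_budget|reasoning_effort)"
Expected log output: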
2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
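To check the mapping programmatically rather than by eye, a small script can pull the budget and effort out of the "mapped to reasoning_effort" lines. This is a sketch that assumes the message wording shown in the log output above; `extract_mappings` is an illustrative name, not part of the gateway.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "mapped to reasoning_effort" message shown in the log output.
MAPPING_RE = re.compile(
    r"thinking_budget (-?\d+(?:\.\d+)?) mapped to reasoning_effort: (\w+)"
)


def extract_mappings(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (thinking_budget, reasoning_effort) pairs found in log lines."""
    pairs = []
    for line in lines:
        match = MAPPING_RE.search(line)
        if match:
            pairs.append((float(match.group(1)), match.group(2)))
    return pairs
```

For example, `extract_mappings(open("logs/app.log", encoding="utf-8"))` would list every mapping the gateway has logged, assuming the log file path used in the tail command above.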
Technical implementation
Schema changes
An extra_body field was added in src/api/schemas.py:
from typing import Any, Dict, Optional
from pydantic import BaseModel

class ChatCompletionRequest(BaseModel):
    # ... other fields ...
    extra_body: Optional[Dict[str, Any]] = None  # Cherry Studio and other client extensions
Mapping functions
Two utility functions handle Cherry Studio's thinking_budget:
- map_thinking_budget_to_reasoning_effort: maps a numeric thinking_budget to a reasoning_effort enum value
- extract_reasoning_effort_from_extra_body: extracts thinking_budget from extra_body and applies the mapping
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
    """Map Cherry Studio's thinking_budget to OCI's reasoning_effort parameter."""
    if thinking_budget == -1:
        # -1 means "use the model default", so no reasoning_effort is set
        return None
    elif thinking_budget <= 1760:
        return "low"
    elif thinking_budget <= 16448:
        return "medium"
    else:
        return "high"


def extract_reasoning_effort_from_extra_body(extra_body: Optional[dict]) -> Optional[str]:
    """Extract reasoning_effort from Cherry Studio's extra_body parameter."""
    if not extra_body:
        return None
    try:
        google_config = extra_body.get("google", {})
        thinking_config = google_config.get("thinking_config", {})
        thinking_budget = thinking_config.get("thinking_budget")
        if thinking_budget is not None and isinstance(thinking_budget, (int, float)):
            effort = map_thinking_budget_to_reasoning_effort(int(thinking_budget))
            if effort:
                logger.info(f"Cherry Studio thinking_budget {thinking_budget} mapped to reasoning_effort: {effort}")
            return effort
    except (AttributeError, TypeError, KeyError) as e:
        # Malformed extra_body (e.g. a non-dict value on the path) is ignored
        logger.debug(f"Failed to extract thinking_budget from extra_body: {e}")
    return None
OCI SDK integration
The OCIGenAIClient.chat() and _build_generic_request() methods were updated to pass the reasoning_effort parameter through to the OCI SDK's GenericChatRequest.
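Compatibility
Supported models
Support for the reasoning_effort parameter (mapped from thinking_budget):
- ✅ Google Gemini models (google.gemini-2.5-pro, google.gemini-2.0-flash-exp)
- ✅ Meta Llama models (meta.llama-3.1-405b-instruct, meta.llama-3.2-90b-vision-instruct)
- ✅ xAI models
- ✅ OpenAI models
- ❌ Cohere models (the reasoning_effort parameter is not supported)
Note: reasoning_effort is an optional parameter. If a model does not support it, it is ignored automatically and a warning is logged.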
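The provider check described in this section could look roughly like the sketch below. It is not the code in src/core/oci_client.py: the function name `resolve_reasoning_effort`, the assumption that the provider is the part of the model ID before the first dot, and the wording of the warning message are all illustrative. No OCI SDK calls are made here.

```python
import logging
from typing import Optional

logger = logging.getLogger("core.oci_client")

# Providers listed as supported in this document.
SUPPORTED_PROVIDERS = {"meta", "xai", "google", "openai"}


def resolve_reasoning_effort(model_id: str, effort: Optional[str]) -> Optional[str]:
    """Return the effort to send (upper-cased), or None to omit it.

    Assumptions: the provider is the model ID prefix before the first dot,
    and the value is upper-cased as in the "LOW" log line shown earlier.
    """
    if effort is None:
        return None
    provider = model_id.split(".", 1)[0].lower()
    if provider not in SUPPORTED_PROVIDERS:
        # Illustrative warning text; the gateway's real message may differ.
        logger.warning("reasoning_effort ignored for unsupported model: %s", model_id)
        return None
    logger.info("Setting reasoning_effort to %s for %s model", effort.upper(), provider)
    return effort.upper()
```

Under these assumptions, a Gemini model ID with effort "low" resolves to "LOW", while any model ID with a cohere prefix resolves to None and the parameter is left out of the request.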
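Backward compatibility
- ✅ Without extra_body, behavior is exactly the same as before
- ✅ Without x-title, the client name is shown as "Unknown"
- ✅ Other clients are unaffected and continue to work normally
- ✅ Standard OpenAI API compatibility is fully preserved
Compatibility with other clients
Although this optimization was designed for Cherry Studio, the implementation ensures that:
- Other clients are unaffected: clients that do not use extra_body.google.thinking_config see no change in behavior
- Standard API compatibility: all standard OpenAI API features continue to work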
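Troubleshooting
Issue 1: thinking_budget has no effect
Symptom: the logs contain no "mapped to reasoning_effort" message
Solution:
- Confirm that the extra_body structure is correct; the nested path is extra_body.google.thinking_config.thinking_budget
- Confirm that the model is supported (meta, xai, google, openai; Cohere is not supported)
- Check that the thinking_budget value is valid (a non-null number); note that -1 maps to None, so it produces no mapping message
- Look for errors or warnings in the logs
Verify the extra_body structure: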
# Correct structure
{
  "extra_body": {
    "google": {                    # the key must be "google"
      "thinking_config": {         # the key must be "thinking_config"
        "thinking_budget": 5000    # the key must be "thinking_budget", with a numeric value
      }
    }
  }
}
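When a request is not being mapped, it can help to pinpoint which level of the nesting is wrong. The following sketch walks the expected path and reports the first problem it finds. `diagnose_extra_body` is a hypothetical helper for local debugging, and it is deliberately stricter than the gateway's isinstance check, which would accept booleans because bool is a subclass of int in Python.

```python
from typing import Any, Optional

EXPECTED_PATH = ("google", "thinking_config", "thinking_budget")


def diagnose_extra_body(extra_body: Any) -> Optional[str]:
    """Return a description of the first problem found, or None if valid."""
    current = extra_body
    walked = "extra_body"
    for key in EXPECTED_PATH:
        if not isinstance(current, dict):
            return f"{walked} must be an object, got {type(current).__name__}"
        if key not in current:
            return f"missing key '{key}' under {walked}"
        current = current[key]
        walked = f"{walked}.{key}"
    # Reject booleans explicitly, then require a numeric value.
    if isinstance(current, bool) or not isinstance(current, (int, float)):
        return f"{walked} must be a number, got {type(current).__name__}"
    return None
```

A common mistake such as writing thinkingConfig instead of thinking_config is reported as a missing key under extra_body.google, and a quoted budget such as "5000" is reported as a non-numeric value.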
Issue 2: client name is shown as "Unknown"
Symptom: the logs show the client as "Unknown" instead of "Cherry Studio"
Solution:
- Confirm that the request includes the x-title header
- Check that Cherry Studio is configured to send the custom request header
- Try adding the header manually to test
Test command:
curl http://localhost:8000/v1/chat/completions \
  -H "x-title: Cherry Studio" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -d '{"model": "google.gemini-2.5-pro", "messages": [{"role": "user", "content": "test"}]}'
Issue 3: thinking_budget maps to the wrong reasoning_effort
Symptom: the actual reasoning_effort differs from the expected one
Check the mapping rules:
- thinking_budget ≤ 1760 → low
- 1760 < thinking_budget ≤ 16448 → medium
- thinking_budget > 16448 → high
- thinking_budget = -1 → None (uses the model default)
Examples:
# thinking_budget = 1000 → low ✓
# thinking_budget = 5000 → medium ✓
# thinking_budget = 20000 → high ✓
# thinking_budget = -1 → None (default) ✓
Testing
Automated tests
Run the Cherry Studio optimization test script:
./tests/test_cherry_studio_optimization.sh
The test script verifies the following scenarios:
- thinking_budget = 1000 → reasoning_effort = low
- thinking_budget = 5000 → reasoning_effort = medium
- thinking_budget = 20000 → reasoning_effort = high
- thinking_budget = -1 → uses the model default
- No extra_body (a normal request)
- Different client names (verifies x-title detection)