wangdefa/oracle-openai


Wang Defa 95722c97e4


2025-12-10 17:40:43 +08:00


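Cherry Studio Client Optimizations

This document describes the optimizations the gateway provides specifically for the Cherry Studio client.

Optimizations

1. Client name in logs

Description:

Extracts the client name from the x-title request header
Shows the client name in the logs, which makes requests easier to trace and debug
Works for any client that sets the x-title header, not only Cherry Studio

Log format: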

2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio

Implementation:

src/api/routers/chat.py

Usage example:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
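The header handling described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in src/api/routers/chat.py: the function names client_name_from_headers and log_chat_request are hypothetical, and treating a blank x-title as "Unknown" is an assumption.

```python
import logging
from typing import Mapping

logger = logging.getLogger("api.routers.chat")


def client_name_from_headers(headers: Mapping[str, str], default: str = "Unknown") -> str:
    """Return the x-title header value, or `default` when it is missing or blank."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if key.lower() == "x-title" and value.strip():
            return value.strip()
    return default


def log_chat_request(model: str, headers: Mapping[str, str]) -> str:
    """Log the request in the format shown above and return the message."""
    client = client_name_from_headers(headers)
    message = f"Chat completion request for model: {model}, client: {client}"
    logger.info(message)
    return message
```

A request without the header falls back to "Unknown", which matches the backward-compatibility behavior the document describes.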

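2. Automatic mapping from thinking_budget to reasoning_effort

Description:

Cherry Studio uses Google Gemini's thinking_budget parameter to control reasoning depth
The gateway automatically maps thinking_budget to the reasoning_effort parameter of the OCI SDK
Supported for models from the meta, xai, google, and openai providers (Cohere is not supported)
Transparent to other clients; standard OpenAI API compatibility is unaffected

Mapping rules: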

thinking_budget value	reasoning_effort	Description
≤ 1760	`low`	Fast responses, less reasoning
1760 < X ≤ 16448	`medium`	Balances speed and reasoning depth
> 16448	`high`	Deep reasoning, more thorough answers
-1	None	Use the model default

extra_body structure:

Cherry Studio passes Google Gemini-specific configuration through extra_body:

{
  "model": "google.gemini-2.5-pro",
  "messages": [...],
  "extra_body": {
    "google": {
      "thinking_config": {
        "thinking_budget": 1760,
        "include_thoughts": true
      }
    }
  }
}
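For testing without Cherry Studio, the same request body can be assembled in Python. The sketch below only builds the JSON structure shown above; build_chat_payload is a hypothetical helper name, and sending the request is left to whatever HTTP client is in use.

```python
import json
from typing import Any, Dict, Optional


def build_chat_payload(model: str, prompt: str,
                       thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Build a chat completion body, nesting thinking_budget the way Cherry Studio does."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_budget is not None:
        # Path expected by the gateway: extra_body.google.thinking_config.thinking_budget
        payload["extra_body"] = {
            "google": {"thinking_config": {"thinking_budget": thinking_budget}}
        }
    return payload


if __name__ == "__main__":
    print(json.dumps(build_chat_payload("google.gemini-2.5-pro", "Hello", 1760), indent=2))
```

Leaving thinking_budget unset omits extra_body entirely, so the payload stays a plain OpenAI-style request.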

Implementation:

Mapping functions: src/api/routers/chat.py
- map_thinking_budget_to_reasoning_effort(): maps a numeric thinking_budget to a reasoning_effort value
- extract_reasoning_effort_from_extra_body(): extracts thinking_budget from extra_body and applies the mapping
OCI client: src/core/oci_client.py

Log output:

2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model

Cherry Studio usage examples

Basic chat

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Using thinking_budget (low reasoning depth)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 1000
        }
      }
    }
  }'

Using thinking_budget (medium reasoning depth)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Explain quantum entanglement"}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 5000
        }
      }
    }
  }'

Using thinking_budget (high reasoning depth)

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -H "x-title: Cherry Studio" \
  -d '{
    "model": "google.gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Solve this complex math problem: ..."}
    ],
    "extra_body": {
      "google": {
        "thinking_config": {
          "thinking_budget": 20000
        }
      }
    }
  }'

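Verifying the logs

Start the service and watch the logs to confirm that the Cherry Studio optimizations are active:

# Start the service (development mode)
cd src
python main.py

# Follow the logs (in another terminal)
tail -f logs/app.log | grep -E "(client:|thinking_budget|reasoning_effort)"

Expected log lines: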

2025-12-10 15:09:17 - api.routers.chat - INFO - Chat completion request for model: google.gemini-2.5-pro, client: Cherry Studio
2025-12-10 15:09:17 - api.routers.chat - INFO - Cherry Studio thinking_budget 1760 mapped to reasoning_effort: low
2025-12-10 15:09:17 - core.oci_client - INFO - Setting reasoning_effort to LOW for google model
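Where grep is not available, the same filter can be applied in Python. This is a small sketch that reuses the regular expression from the grep command above; the helper name matching_lines is hypothetical, and the log path is taken from the tail command.

```python
import re
from typing import Iterable, List

# Same alternation as the grep -E pattern above.
LOG_PATTERN = re.compile(r"client:|thinking_budget|reasoning_effort")


def matching_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that mention the client name or the mapping."""
    return [line.rstrip("\n") for line in lines if LOG_PATTERN.search(line)]


# Usage (assumes the log file exists at the path used by the tail command):
# with open("logs/app.log", encoding="utf-8") as fh:
#     for line in matching_lines(fh):
#         print(line)
```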

Technical implementation

Schema changes

An extra_body field was added in src/api/schemas.py:

from typing import Any, Dict, Optional
from pydantic import BaseModel

class ChatCompletionRequest(BaseModel):
    # ... other fields ...
    extra_body: Optional[Dict[str, Any]] = None  # Cherry Studio and other client extensions

Mapping functions

Two helper functions handle Cherry Studio's thinking_budget:

map_thinking_budget_to_reasoning_effort: maps a numeric thinking_budget to a reasoning_effort value
extract_reasoning_effort_from_extra_body: extracts thinking_budget from extra_body and applies the mapping

import logging
from typing import Optional

logger = logging.getLogger(__name__)

def map_thinking_budget_to_reasoning_effort(thinking_budget: int) -> Optional[str]:
    """Map Cherry Studio's thinking_budget to OCI's reasoning_effort parameter."""
    if thinking_budget == -1:
        # -1 means "no explicit budget": leave reasoning_effort unset.
        return None
    elif thinking_budget <= 1760:
        return "low"
    elif thinking_budget <= 16448:
        return "medium"
    else:
        return "high"

def extract_reasoning_effort_from_extra_body(extra_body: Optional[dict]) -> Optional[str]:
    """Extract reasoning_effort from Cherry Studio's extra_body parameter."""
    if not extra_body:
        return None

    try:
        google_config = extra_body.get("google", {})
        thinking_config = google_config.get("thinking_config", {})
        thinking_budget = thinking_config.get("thinking_budget")

        if thinking_budget is not None and isinstance(thinking_budget, (int, float)):
            effort = map_thinking_budget_to_reasoning_effort(int(thinking_budget))
            if effort:
                logger.info(f"Cherry Studio thinking_budget {thinking_budget} mapped to reasoning_effort: {effort}")
            return effort
    # ValueError/OverflowError cover int() on NaN or infinite floats.
    except (AttributeError, TypeError, KeyError, ValueError, OverflowError) as e:
        logger.debug(f"Failed to extract thinking_budget from extra_body: {e}")

    return None
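The threshold boundaries are the part of this mapping most likely to be misread, so it can help to check them explicitly. The sketch below is an alternative, table-driven formulation using the standard bisect module, not the project's implementation; effort_for_budget is a hypothetical name, and the thresholds are the ones from the mapping rules above.

```python
from bisect import bisect_left
from typing import Optional

# Inclusive upper bounds for "low" and "medium"; anything larger is "high".
_UPPER_BOUNDS = [1760, 16448]
_EFFORTS = ["low", "medium", "high"]


def effort_for_budget(thinking_budget: int) -> Optional[str]:
    """Table-driven equivalent of the if/elif chain above."""
    if thinking_budget == -1:
        return None  # let the model use its default
    # bisect_left returns 0 for values <= 1760, 1 for values <= 16448, else 2.
    return _EFFORTS[bisect_left(_UPPER_BOUNDS, thinking_budget)]


if __name__ == "__main__":
    for budget in (-1, 1000, 1760, 1761, 16448, 16449, 20000):
        print(budget, effort_for_budget(budget))
```

Both boundary values are inclusive on the lower tier: 1760 is still low and 16448 is still medium.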

OCI SDK integration

OCIGenAIClient.chat() and _build_generic_request() were updated to pass the reasoning_effort parameter through to the OCI SDK's GenericChatRequest.
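How the parameter might be gated by provider can be sketched as follows. This is an illustration under stated assumptions, not the code in src/core/oci_client.py: reasoning_kwargs is a hypothetical helper, the provider is assumed to be the model ID prefix before the first dot, and the upper-case value is inferred from the "Setting reasoning_effort to LOW" log line. The sketch does not call the OCI SDK; it only prepares the keyword argument that would be passed on.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("core.oci_client")

# Providers the document lists as supporting reasoning_effort.
SUPPORTED_PROVIDERS = {"meta", "xai", "google", "openai"}


def reasoning_kwargs(model_id: str, reasoning_effort: Optional[str]) -> Dict[str, str]:
    """Return the extra keyword argument for the chat request, or {} if not applicable."""
    if reasoning_effort is None:
        return {}
    # Assumption: model IDs look like "<provider>.<model-name>".
    provider = model_id.split(".", 1)[0].lower()
    if provider not in SUPPORTED_PROVIDERS:
        logger.warning("reasoning_effort is not supported for %s models; ignoring it", provider)
        return {}
    value = reasoning_effort.upper()
    logger.info("Setting reasoning_effort to %s for %s model", value, provider)
    return {"reasoning_effort": value}
```

Returning an empty dict for unsupported providers keeps the call site simple: the result can be unpacked into the request arguments unconditionally.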

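Compatibility

Supported models

reasoning_effort support (via the thinking_budget mapping):

✅ Google Gemini models (google.gemini-2.5-pro, google.gemini-2.0-flash-exp)
✅ Meta Llama models (meta.llama-3.1-405b-instruct, meta.llama-3.2-90b-vision-instruct)
✅ xAI models
✅ OpenAI models
❌ Cohere models (the reasoning_effort parameter is not supported)

Note: reasoning_effort is optional. If a model does not support it, the parameter is ignored and a warning is logged.

Backward compatibility

✅ Without extra_body, behavior is identical to earlier versions
✅ Without x-title, the client name is logged as "Unknown"
✅ Other clients are unaffected and keep working as before
✅ Standard OpenAI API compatibility is fully preserved

Compatibility with other clients

Although this optimization was designed for Cherry Studio, the implementation ensures that:

Other clients are unaffected: clients that do not send extra_body.google.thinking_config see no change in behavior
Standard API compatibility: all standard OpenAI API features continue to work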

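Troubleshooting

Issue 1: thinking_budget has no effect

Symptom: the "mapped to reasoning_effort" message does not appear in the logs

Resolution:

Confirm that the extra_body structure is correct; the nested path is extra_body.google.thinking_config.thinking_budget
Confirm that the model is supported (meta, xai, google, openai; Cohere is not supported)
Check that thinking_budget is valid (a non-null number); a value of -1 maps to None, so no "mapped" message is logged in that case
Check the logs for errors or warnings

Verify the extra_body structure:

# Correct structure
{
  "extra_body": {
    "google": {                    # the key must be "google"
      "thinking_config": {         # the key must be "thinking_config"
        "thinking_budget": 5000    # the key must be "thinking_budget", with a numeric value
      }
    }
  }
}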
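A small checker can pinpoint which level of the nesting is wrong. The sketch below is a hypothetical debugging aid (find_extra_body_problem is not part of the gateway); it walks the expected path and reports the first mismatch. It is slightly stricter than the gateway code shown in this document, because it also rejects booleans.

```python
from typing import Any, Optional

EXPECTED_PATH = ("google", "thinking_config", "thinking_budget")


def find_extra_body_problem(extra_body: Any) -> Optional[str]:
    """Return a description of the first structural problem, or None if the body is valid."""
    node = extra_body
    walked = "extra_body"
    for key in EXPECTED_PATH:
        if not isinstance(node, dict):
            return f"{walked} must be an object"
        if key not in node:
            return f"{walked} is missing the key '{key}'"
        node = node[key]
        walked = f"{walked}.{key}"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, bool) or not isinstance(node, (int, float)):
        return f"{walked} must be a number"
    return None
```

For example, a body that uses thinkingConfig instead of thinking_config is reported as missing the thinking_config key under extra_body.google.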

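Issue 2: the client name is shown as "Unknown"

Symptom: the logs show the client as "Unknown" instead of "Cherry Studio"

Resolution:

Confirm that the request includes the x-title header
Check that Cherry Studio is configured to send the custom request header
Add the header manually to test

Test command: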

curl http://localhost:8000/v1/chat/completions \
  -H "x-title: Cherry Studio" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-oci-genai-default-key" \
  -d '{"model": "google.gemini-2.5-pro", "messages": [{"role": "user", "content": "test"}]}'

Issue 3: thinking_budget maps to the wrong reasoning_effort

Symptom: the resulting reasoning_effort differs from the expected value

Check the mapping rules:

thinking_budget ≤ 1760 → low
1760 < thinking_budget ≤ 16448 → medium
thinking_budget > 16448 → high
thinking_budget = -1 → None (use the model default)

Examples:

# thinking_budget = 1000 → low ✓
# thinking_budget = 5000 → medium ✓
# thinking_budget = 20000 → high ✓
# thinking_budget = -1 → None (default) ✓

Testing

Automated tests

Run the Cherry Studio optimization test script:

./tests/test_cherry_studio_optimization.sh

The script verifies the following scenarios:

thinking_budget = 1000 → reasoning_effort = low
thinking_budget = 5000 → reasoning_effort = medium
thinking_budget = 20000 → reasoning_effort = high
thinking_budget = -1 → model default is used
No extra_body (a normal request)
Different client names (verifies x-title detection)
