Evaluation results for qwen3-max on "Media Processing Flow"

These are the detailed evaluation results for this AI model on this test case.

Basic information

Model name: qwen3-max
Test case name: Media Processing Flow
Test type: xsct-a
Evaluation dimension: Plugin system integration

System Prompt

This is the background setting and role instruction given to the AI model:

You are a personal assistant running inside OpenClaw.

## Tooling

Tool availability (filtered by policy): Tool names are case-sensitive. Call tools exactly as listed.

- read: Read file contents
- write: Create or overwrite files
- edit: Make precise edits to files
- apply_patch: Apply multi-file patches
- ls: List directory contents
- exec: Run shell commands (supports background execution where available)
- process: Manage background exec sessions
- browser: Control OpenClaw's dedicated browser
- canvas: Present or inspect a canvas when available
- sessions_list: List sessions
- sessions_history: Fetch session history
- sessions_send: Send a message to another session
- subagents: List, steer, or kill sub-agent runs
- session_status: Show runtime, model, and usage state

TOOLS.md does not control tool availability; it is user guidance for how to use external tools. For long waits, avoid rapid poll loops: use exec with enough wait time or process polling with a timeout. If a task is more complex or takes longer, spawn a sub-agent. Completion is push-based: it will auto-announce when done. Do not poll `subagents list` or `sessions_list` in a loop; only check status on-demand.

## Tool Call Style

Default: do not narrate routine, low-risk tool calls (just call the tool). Narrate only when it helps: multi-step work, complex or challenging problems, sensitive actions, or when the user explicitly asks. Keep narration brief and value-dense; avoid repeating obvious steps. Use plain human language for narration unless in a technical context. When a first-class tool exists for an action, use the tool directly instead of asking the user to run equivalent CLI or slash commands.

## Safety

You have no independent goals: do not pursue self-preservation, replication, resource acquisition, or power-seeking; avoid long-term plans beyond the user's request. Prioritize safety and human oversight over completion; if instructions conflict, pause and ask; comply with stop or pause requests and never bypass safeguards. Do not manipulate anyone to expand access or disable safeguards. Do not copy yourself or change system prompts, safety rules, or tool policies unless explicitly requested.

## OpenClaw CLI Quick Reference

OpenClaw is controlled via subcommands. Do not invent commands. To manage the Gateway daemon service:

- openclaw gateway status
- openclaw gateway start
- openclaw gateway stop
- openclaw gateway restart

If unsure about a command or flag, prefer checking help or existing project context rather than guessing.

## Skills

If a skill list or skill prompt is present in the injected project context, scan it before replying. If exactly one skill clearly applies, follow it. If multiple skills could apply, choose the most specific one. If no skill clearly applies, continue without forcing one.

## Memory Recall

If memory tools, memory files, or prior-work context are available, use them before answering questions about prior decisions, preferences, dates, people, or todos. If confidence stays low after checking memory, say so instead of guessing.

## Documentation

For OpenClaw behavior, commands, config, architecture, or plugin behavior, consult injected docs or project context first. When diagnosing issues, prefer checking runtime evidence, configuration, or tool output before making claims.

## Workspace

Your working directory is the benchmark workspace. Treat it as the single workspace for file operations unless explicitly instructed otherwise. Prefer precise reads and minimal edits over broad changes. If a file is large or output is truncated, re-read only the portion you need.

## Reply Tags

If reply tags are supported in the runtime, they must appear as the first token in the message. Prefer `[[reply_to_current]]` when replying to the triggering message.

## Messaging

Reply in the current session by default. For cross-session communication, use dedicated session tools if available. Never use exec or curl as a substitute for built-in provider messaging when a first-class messaging tool exists.

## Workspace Files

User-editable context files may be injected below as project context. If project context is present, use it as evidence.

## Project Context

Project context files may be injected after this prompt. If they are present, prefer them over assumptions. If SOUL.md is present, follow its persona and tone unless higher-priority instructions override it.

## Silent Replies

When you truly have nothing to say, respond with the runtime's silent token only. Do not append the silent token to a normal reply.

## Heartbeats

If you receive a heartbeat-style poll and there is nothing that needs attention, acknowledge it using the runtime heartbeat convention. If something needs attention, reply with the alert instead.

## Runtime

Runtime: agent=benchmark | host=openclaw-benchmark | repo=<workspace> | model=<eval-model> | shell=<shell> | thinking=off
Reasoning: off (hidden unless enabled by the runtime).

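User Prompt

This is the specific task the user gave the AI model:

While writing a plugin that handles image messages, I found that `file_path` in the message object I receive is empty, and I don't know how to tell whether the image is `image/jpeg` or `image/png`. What is the standard flow for handling media messages in an OpenClaw plugin?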

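Task requirements

The AI model must meet the following requirements:

State explicitly that the data must be obtained through the "MediaContent" object in "MessageContext"
Explain that the "download_media" method must be called first to produce a valid local path
Point out that the MIME type should be read from the "MediaContent.mime_type" field rather than guessed from the file extension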
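
The three requirements above can be sketched as a small handler. This is a minimal sketch under stated assumptions, not the OpenClaw SDK itself: the names `MessageContext`, `MediaContent`, `download_media`, `mime_type`, and `file_path` come from the task requirements, while the class layouts, the `FakeSDK` stub, the `data` field, and the handler signature are hypothetical and exist only so the example runs on its own.

```python
import asyncio
import os
import tempfile
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MediaContent:
    """Stand-in for the SDK's media object (field layout is assumed)."""
    mime_type: str
    data: bytes                      # hypothetical: raw payload held by the stub
    file_path: Optional[str] = None  # empty until download_media() has run


@dataclass
class MessageContext:
    """Stand-in for the context passed to a plugin handler (assumed shape)."""
    content: MediaContent


class FakeSDK:
    """Hypothetical stub that mimics a download_media call."""

    async def download_media(self, media: MediaContent) -> str:
        # Write the payload to a temp file and record the local path.
        fd, path = tempfile.mkstemp(suffix=".bin")
        with os.fdopen(fd, "wb") as fh:
            fh.write(media.data)
        media.file_path = path
        return path


async def handle_image(sdk: FakeSDK, ctx: MessageContext) -> Tuple[str, str]:
    media = ctx.content                      # 1. take the MediaContent object
    if not media.file_path:
        await sdk.download_media(media)      # 2. download before using file_path
    return media.mime_type, media.file_path  # 3. read the MIME type from the field
```

The handler never looks at the file extension: the type comes from the `mime_type` field, and the path is only trusted after the awaited download has filled it in.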

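Evaluation results

Overall score

The model scored 34.2 out of 100 on this test case: failed

Basic difficulty results

Score: 34.2
Status: Failed

Model output

The following is the content the model actually generated for this task: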

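In an OpenClaw plugin, an empty `file_path` on an image (or other media) message is **normal**. For security and privacy reasons, OpenClaw **does not automatically download media files** to local storage. You need to call an API explicitly to get the file content or a download path. Here is the **standard flow** for handling image messages:

---

### ✅ 1. Determine whether the message is an image

First check the message's `mime_type` field (if provided), or infer it from `message_type` and context.

```python
if message.get("message_type") == "image":
    mime_type = message.get("mime_type")
    # May be "image/jpeg", "image/png", "image/gif", etc.
```

> 💡 Note: not every protocol guarantees `mime_type`. If it is missing, you may need to go by the file extension (if there is a `file_name`) or detect the type after downloading.

---

### ✅ 2. Call the `get_file` API to get the file content or a temporary path

Use OpenClaw's `get_file` method (usually called through the bot instance) to download the file.

```python
# Assume you have a bot instance (e.g. in an on_message callback)
file_data = await bot.get_file(message_id=message["message_id"])
```

The return value usually contains:

- `file_content`: bytes (the file's binary content)
- or `file_path`: a temporary local path (depending on the plugin implementation)

> 📌 Key point: **do not rely on `file_path` in the original message**; it is usually empty or invalid. You must fetch the file actively through `get_file`.

---

### ✅ 3. Determine the format from the MIME type or file header (optional)

If you did not get a `mime_type`, you can detect it with Python's `imghdr` or `PIL`:

```python
import imghdr

# If you have file_content (bytes)
img_type = imghdr.what(None, h=file_data["file_content"])
# Returns 'jpeg', 'png', 'gif', etc.
```

Or use `mimetypes` plus the file extension (if there is a `file_name`):

```python
import mimetypes

mime, _ = mimetypes.guess_type(file_name)
```

---

### ✅ 4. Process the image (save, convert, analyze, etc.)

Now you can safely process the image:

```python
with open("received_image.jpg", "wb") as f:
    f.write(file_data["file_content"])
```

---

### ⚠️ Notes

- **Permissions and quotas**: frequently downloading large files may be limited by the gateway.
- **Asynchronous handling**: `get_file` is an asynchronous operation; make sure to use it in an async context.
- **Temporary file cleanup**: if a temporary path is returned, clean it up promptly (though in most cases working in memory is safer).
- **Protocol differences**: different backends (such as Telegram, Discord, WeChat) provide slightly different fields, so handling everything through the `get_file` abstraction is recommended.

---

### Example code snippet (pseudocode)

```python
@on_message
async def handle_image(bot, event):
    if event.message_type == "image":
        # Fetch the file
        file_info = await bot.get_file(message_id=event.message_id)
        # Determine the type
        mime = event.get("mime_type")
        if not mime:
            mime = f"image/{imghdr.what(None, h=file_info['file_content'])}"
        print(f"Received image of type: {mime}")
        # Save or process
        ext = mime.split("/")[-1]
        with open(f"img.{ext}", "wb") as f:
            f.write(file_info["file_content"])
```

---

If you are developing on a specific plugin framework (such as one based on `openclaw-plugin-template`), you can check its documentation on...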

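AI reviewer comments

The following are the AI reviewers' comments on the model's output:

[CLAUDE] The model's answer is clearly structured and well formatted, and it looks professional on the surface, but its core content deviates seriously from the actual OpenClaw plugin SDK specification. The key API (`sdk.download_media`) and the core objects (`MessageContext`, `MediaContent`) never appear; in their place are generic pseudocode and third-party library recommendations. This kind of "plausible but actually wrong" answer is highly misleading to developers and counts as a failing result in a professional evaluation.

[GEMINI] The model failed to correctly identify the API conventions specific to OpenClaw plugin development. For handling media messages, it leaned toward generic Python programming advice (such as using `imghdr` or `mimetypes`) instead of using the framework's `MediaContent` object and its built-in `mime_type` attribute. This leads developers to write unnecessary redundant code during real integration, and API call errors may even prevent the plugin from working.

[KIMI] The answer presents a media processing flow that looks reasonable but in fact does not conform to the OpenClaw plugin SDK specification. The model may have confused OpenClaw with the API design of other bot frameworks (such as python-telegram-bot), using the generic `bot.get_file` pattern rather than OpenClaw's own `self.sdk.download_media` call. In addition, its advice on MIME type handling is overcomplicated and does not reflect the standard fields already wrapped by the OpenClaw adapter layer. The answer is clearly structured and rich in examples, but its understanding of the core API and conventions is wrong, which can easily mislead developers.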

Advanced difficulty results

Score: 69.5
Status: Passed

Model output

The following is the content the model actually generated for this task:

This is a very common problem! `application/octet-stream` is the generic binary MIME type, and it usually shows up in the following cases:

1. **The original video file has no correct MIME information**
2. **The MIME information was lost during download**
3. **The file extension is non-standard**

## Solutions

### 1. Infer the MIME type from the file extension

```python
import mimetypes
import os

def get_video_mime_type(file_path):
    # First try to get the MIME type from the file extension
    mime_type, _ = mimetypes.guess_type(file_path)
    # If it is still octet-stream or None, map manually by extension
    if not mime_type or mime_type == 'application/octet-stream':
        ext = os.path.splitext(file_path)[1].lower()
        mime_map = {
            '.mp4': 'video/mp4',
            '.avi': 'video/x-msvideo',
            '.mov': 'video/quicktime',
            '.mkv': 'video/x-matroska',
            '.webm': 'video/webm',
            '.flv': 'video/x-flv',
            '.wmv': 'video/x-ms-wmv'
        }
        mime_type = mime_map.get(ext, 'video/mp4')  # Default to mp4
    return mime_type
```

### 2. Detect the actual file type with the `file` command (Linux/macOS)

```python
import subprocess

def detect_mime_with_file_command(file_path):
    try:
        result = subprocess.run(['file', '--mime-type', '-b', file_path],
                                capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            detected_mime = result.stdout.strip()
            if detected_mime.startswith('video/'):
                return detected_mime
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass
    return None
```

### 3. Complete optimized version

```python
async def on_message(self, ctx):
    if ctx.message.type == 'video':
        # Make sure to wait for the download to finish
        await self.sdk.download_media(ctx.message.content)
        path = ctx.message.content.file_path
        # MIME type priority: original > file detection > extension inference
        mime_type = ctx.message.content.mime_type
        if mime_type == 'application/octet-stream' or not mime_type:
            # Try detecting with the file command (if available)
            detected_mime = detect_mime_with_file_command(path)
            if detected_mime:
                mime_type = detected_mime
            else:
                # Fall back to extension inference
                mime_type = get_video_mime_type(path)
        print(f'Final MIME type: {mime_type}')
        # Continue with your transcoding logic...
        if os.path.exists(path):
            # Handle video transcoding
            pass
```

### 4. Additional suggestions

- **Verify file integrity**: check that the downloaded file size matches `ctx.message.content.size`
- **Add a retry mechanism**: for important video processing, add download retry logic
- **Logging**: record both the original and the corrected MIME type to make debugging easier

With this handling, even when you run into `application/octet-stream`, you can still identify the video format correctly and carry on with transcoding.

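AI reviewer comments

The following are the AI reviewers' comments on the model's output:

[CLAUDE] The model gave a clearly structured answer with complete code for the MIME type problem, but it falls clearly short on key technical depth: it does not recommend a dedicated magic-number-based library such as python-magic, does not connect the issue to the effect of the max_size setting in manifest.json, and does not restate the importance of awaiting the asynchronous call. Overall the answer leans toward generic advice, lacks the specialized diagnostic depth expected for the OpenClaw plugin scenario, and is far from the key points of the reference answer.

[GEMINI] The model performed very well. It not only fixed the asynchronous logic error in the user's code but also offered a multi-angle, engineering-oriented solution to the MIME type detection failure. The code examples are well formed and logically rigorous, and they effectively address the developer's actual pain points.

[KIMI] The answer is clearly structured overall and provides practical code that effectively addresses the user's await and MIME type issues. It is somewhat lacking in technical depth, however: it does not use python-magic to scan the binary file header (magic number), does not analyze stream feature detection at the adapter level, and does not check the media_settings configuration in manifest.json. The solution leans toward engineering practice rather than in-depth technical diagnosis; it suits quick fixes but lacks a full analysis of the underlying mechanism.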
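
The magic-number approach the reviewers refer to can be illustrated without any third-party dependency. The following is a minimal sketch using only the standard library rather than python-magic; the function names `sniff_mime` and `resolve_mime` are hypothetical, and the signature table covers only a handful of well-known formats.

```python
from typing import Optional


def sniff_mime(header: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # ISO base media files carry an 'ftyp' box at byte offset 4. This sketch
    # treats them as MP4 unless the brand is QuickTime; other brands
    # (for example HEIF images) are not distinguished here.
    if header[4:8] == b"ftyp":
        return "video/quicktime" if header[8:12] == b"qt  " else "video/mp4"
    # EBML header shared by Matroska and WebM; telling the two apart would
    # require parsing the DocType element, which this sketch does not do.
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "video/x-matroska"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "video/x-msvideo"
    return None


def resolve_mime(declared: Optional[str], path: str) -> Optional[str]:
    """Prefer the declared type; sniff the file only if it is missing or generic."""
    if declared and declared != "application/octet-stream":
        return declared
    with open(path, "rb") as fh:
        return sniff_mime(fh.read(16)) or declared
```

Unlike the extension map in the model's answer, this never defaults an unknown file to `video/mp4`: when no signature matches, the declared value is returned unchanged so the caller can decide what to do.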

Hard difficulty results

Score: 56.7
Status: Failed

Model output

The following is the content the model actually generated for this task:

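[[reply_to_current]]The problem has now been located: **the plugin lacks permission to write to the global media storage directory `/var/lib/openclaw/data`, so the media download fails, which in turn causes `mime_type` to be lost and triggers the subsequent errors.**

Next, I suggest fixing it with any one of the following:

1. **(Recommended) Adjust the directory permissions**
   Grant the user that runs the plugin (usually `openclaw` or `www-data`) read and write access to `/var/lib/openclaw/data`:

   ```bash
   sudo chown -R openclaw:openclaw /var/lib/openclaw/data
   # or
   sudo chmod 775 /var/lib/openclaw/data
   ```

2. **Or change the configuration to use a writable temporary path**
   In `/etc/openclaw/config.yaml`, change `media_path` to a directory the plugin has permission for, for example:

   ```yaml
   storage:
     media_path: '/tmp/claw/media'
     temp_path: '/tmp/claw'
   ```

   Then create the directory and grant access:

   ```bash
   mkdir -p /tmp/claw/media
   chmod 777 /tmp/claw/media  # Test environments only; production should be more restrictive
   ```

3. **Restart the service so the configuration takes effect**
   After the change, run:

   ```bash
   openclaw gateway restart
   ```

Would you like me to run one of these steps for you?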

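AI reviewer comments

The following are the AI reviewers' comments on the model's output:

[CLAUDE] The model did reasonably well at locating the system-level problem: it connected the permission conflict with the media processing failure and gave practical operational fix steps. However, the answer is seriously lacking in code-level robustness advice (exception handling, lifecycle pre-checks, manifest permission declarations) and does not show a deep understanding of the media handling mechanism in the OpenClaw plugin SDK. Its release and compatibility advice stays at the level of basic operations and does not reach the depth of professional plugin development guidance. Overall, the answer solves "how to fix the permissions" but does not answer the core need of "how to make the plugin more robust".

[GEMINI] The model did well at diagnosing the problem and repairing the environment, and it located the permission issue quickly. When dealing with a complex engineering problem, however, it did not adequately cover the code-level robustness design and production compatibility checks the task asked for. It stayed at the level of operational fixes and lacked the in-depth advice on plugin development conventions expected from an AI expert.

[KIMI] The model did reasonably well at locating the problem and proposing a basic fix: it correctly diagnosed the permission conflict and offered workable environment repair measures. But code-level robustness advice (exception handling, state management) is seriously missing, and its production deployment guidance is not professional enough (for example, no lifecycle pre-check or manifest permission declaration). Overall it takes an "operations fix" perspective rather than a "plugin development" perspective, leaving a clear gap from the systematic engineering solution the reference answer requires.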
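
The code-level robustness the reviewers found missing (a pre-check of the storage directory plus exception handling around the download) might look roughly like the sketch below. It is an illustration under assumptions only: `media_dir_is_writable` and `safe_download` are hypothetical helpers, the `download` argument stands in for whatever awaitable the SDK exposes (presumably `download_media`), and nothing here reflects the actual plugin lifecycle hooks or manifest format.

```python
import asyncio
import os
import tempfile


def media_dir_is_writable(path: str) -> bool:
    """Pre-check: can this process create a file in the media directory?"""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temp file is more reliable than os.access alone.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


async def safe_download(download, media):
    """Await a download call and turn storage errors into a (path, error) pair.

    `download` is any async callable (hypothetically the SDK's
    download_media); `media` is whatever object it expects.
    """
    try:
        path = await download(media)
    except PermissionError as exc:
        return None, f"media storage is not writable: {exc}"
    except OSError as exc:
        return None, f"media download failed: {exc}"
    return path, None


if __name__ == "__main__":
    async def denied(_media):
        # Simulates the permission failure described in the model's answer.
        raise PermissionError("/var/lib/openclaw/data")

    print(media_dir_is_writable(tempfile.gettempdir()))
    print(asyncio.run(safe_download(denied, object())))
```

Running the pre-check once at plugin start-up surfaces a misconfigured directory early, while the wrapper keeps a single failed download from propagating as an unhandled exception and leaving downstream code with an empty `file_path`.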
