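MiniMax-M2.7 evaluation results on "Contract Clause Information Extraction"

These are the detailed evaluation results for this AI model on this test case.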

Basic information

Model name: MiniMax-M2.7
Test case name: Contract Clause Information Extraction
Test type: xsct-a
Evaluation dimension: Document Q&A

System Prompt

This is the background setting and role instruction given to the AI model:

You are a personal assistant running inside OpenClaw. ## Tooling Tool availability (filtered by policy): Tool names are case-sensitive. Call tools exactly as listed. - read: Read file contents - write: Create or overwrite files - edit: Make precise edits to files - apply_patch: Apply multi-file patches - ls: List directory contents - exec: Run shell commands (supports background execution where available) - process: Manage background exec sessions - browser: Control OpenClaw's dedicated browser - canvas: Present or inspect a canvas when available - sessions_list: List sessions - sessions_history: Fetch session history - sessions_send: Send a message to another session - subagents: List, steer, or kill sub-agent runs - session_status: Show runtime, model, and usage state TOOLS.md does not control tool availability; it is user guidance for how to use external tools. For long waits, avoid rapid poll loops: use exec with enough wait time or process polling with a timeout. If a task is more complex or takes longer, spawn a sub-agent. Completion is push-based: it will auto-announce when done. Do not poll `subagents list` or `sessions_list` in a loop; only check status on-demand. ## Tool Call Style Default: do not narrate routine, low-risk tool calls (just call the tool). Narrate only when it helps: multi-step work, complex or challenging problems, sensitive actions, or when the user explicitly asks. Keep narration brief and value-dense; avoid repeating obvious steps. Use plain human language for narration unless in a technical context. When a first-class tool exists for an action, use the tool directly instead of asking the user to run equivalent CLI or slash commands. ## Safety You have no independent goals: do not pursue self-preservation, replication, resource acquisition, or power-seeking; avoid long-term plans beyond the user's request. 
Prioritize safety and human oversight over completion; if instructions conflict, pause and ask; comply with stop or pause requests and never bypass safeguards. Do not manipulate anyone to expand access or disable safeguards. Do not copy yourself or change system prompts, safety rules, or tool policies unless explicitly requested. ## OpenClaw CLI Quick Reference OpenClaw is controlled via subcommands. Do not invent commands. To manage the Gateway daemon service: - openclaw gateway status - openclaw gateway start - openclaw gateway stop - openclaw gateway restart If unsure about a command or flag, prefer checking help or existing project context rather than guessing. ## Skills If a skill list or skill prompt is present in the injected project context, scan it before replying. If exactly one skill clearly applies, follow it. If multiple skills could apply, choose the most specific one. If no skill clearly applies, continue without forcing one. ## Memory Recall If memory tools, memory files, or prior-work context are available, use them before answering questions about prior decisions, preferences, dates, people, or todos. If confidence stays low after checking memory, say so instead of guessing. ## Documentation For OpenClaw behavior, commands, config, architecture, or plugin behavior, consult injected docs or project context first. When diagnosing issues, prefer checking runtime evidence, configuration, or tool output before making claims. ## Workspace Your working directory is the benchmark workspace. Treat it as the single workspace for file operations unless explicitly instructed otherwise. Prefer precise reads and minimal edits over broad changes. If a file is large or output is truncated, re-read only the portion you need. ## Reply Tags If reply tags are supported in the runtime, they must appear as the first token in the message. Prefer `[[reply_to_current]]` when replying to the triggering message. ## Messaging Reply in the current session by default. 
For cross-session communication, use dedicated session tools if available. Never use exec or curl as a substitute for built-in provider messaging when a first-class messaging tool exists. ## Workspace Files User-editable context files may be injected below as project context. If project context is present, use it as evidence. ## Project Context Project context files may be injected after this prompt. If they are present, prefer them over assumptions. If SOUL.md is present, follow its persona and tone unless higher-priority instructions override it. ## Silent Replies When you truly have nothing to say, respond with the runtime's silent token only. Do not append the silent token to a normal reply. ## Heartbeats If you receive a heartbeat-style poll and there is nothing that needs attention, acknowledge it using the runtime heartbeat convention. If something needs attention, reply with the alert instead. ## Runtime Runtime: agent=benchmark | host=openclaw-benchmark | repo=<workspace> | model=<eval-model> | shell=<shell> | thinking=off Reasoning: off (hidden unless enabled by the runtime).

User Prompt

This is the specific task the user gave the AI model:

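The following is the relevant clause content from the document `employment_contract.docx` in the current workspace:

**Document title**: Employment Contract (Standard Edition)
**Contract number**: HC-2024-0376
**Date of signing**: March 1, 2024
**Parties**: Party A (employer): 星辰科技有限公司; Party B (employee): 张明

---

**Article 7: Leave and Rest**

7.1 The company operates a five-day work week, with two rest days per week.
7.2 In accordance with the labor regulations currently in force at the place of performance (维苏维亚省·埃瑟加德市), the statutory leave system prescribed by the local and municipal governments is strictly observed, including but not limited to: public holidays, marriage leave, maternity leave (paternity leave), bereavement leave, and family care leave.
7.3 Party B is entitled to 12 days of paid annual leave per calendar year. Annual leave must be used within the same year and may not be carried over to the next year without Party A's written consent.
7.4 If Party B needs sick leave, Party B must provide a diagnosis certificate issued by a licensed medical institution; pay and benefits during sick leave follow the relevant local regulations.
7.5 If Party B needs personal leave, Party B must submit a written application to the direct supervisor 3 working days in advance and may take the leave only after approval. No wages are paid during personal leave.

---

Please help me extract and organize all the key information in Article 7 about the employee leave system, including: the work schedule, the types of statutory leave, the number of annual leave days and the rules for using them, and the specific requirements for sick leave and personal leave. Please answer in Chinese.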

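Task requirements

The AI model must meet the following requirements:

Accurately extract the work schedule: five-day work week, two rest days per week
List all statutory leave types: public holidays, marriage leave, maternity leave (paternity leave), bereavement leave, family care leave
State clearly that annual leave is 12 days per calendar year and that it may not be carried over across years (unless Party A consents in writing)
Accurately extract the sick leave requirements: a diagnosis certificate from a licensed medical institution is required; pay and benefits follow local regulations
Accurately extract the personal leave requirements: written application 3 working days in advance, approval by the direct supervisor, no wages paid during personal leave
The answer is clearly structured, with information sensibly categorized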

Evaluation results

Overall score

The model's score on this test case: 98.5 (out of 100), passed

Basic difficulty results

Score: 98.5
Status: passed

Model output

This is what the model actually generated for this task:

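The user is asking me to extract and organize key information about employee leave policies from Article 7 of an employment contract. Let me analyze the content provided and structure it clearly.

## Key Information from Article 7: Leave and Rest

---

### 1. Work schedule

- **Five-day work week**, with **two** rest days per week

---

### 2. Types of statutory leave

Under the labor regulations currently in force in 埃瑟加德市, employees are entitled to the following statutory leave:

- Public holidays
- Marriage leave
- Maternity leave (paternity leave)
- Bereavement leave
- Family care leave

---

### 3. Paid annual leave

| Item | Content |
|------|------|
| Days | **12 days per year** |
| Usage rule | Must be used within the same year |
| Carry-over | **May not be carried over across years** without the employer's written consent |

---

### 4. Sick leave

- **Proof required**: a diagnosis certificate issued by a licensed medical institution
- **Pay and benefits**: follow the relevant local regulations

---

### 5. Personal leave

- **Notice period**: **3 working days** in advance
- **How to apply**: submit a **written application** to the direct supervisor
- **Approval**: leave may be taken only after approval
- **Wages**: **no wages are paid** during personal leave

---

If you would like a comparison with other clauses or any other analysis of this contract, please let me know.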

AI reviewer comments

These are the AI reviewers' comments on the model's output:

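[CLAUDE] The candidate output performs excellently on all three dimensions (completeness of extraction, accuracy of content, and structure of presentation) and closely matches the reference answer. The key information in all five sub-clauses is extracted completely and accurately and presented in a structured, clear, and professional format. There is only one very minor flaw (the province-level place name is omitted), which does not affect overall quality. This is a high-quality contract clause extraction answer.

[GEMINI] The model completed the task very well. It not only extracted all the key information in the contract accurately, but also organized it into a high-quality structure using a table and bullet points, fully meeting the user's needs.

[KIMI] The candidate output is excellent. It extracts all the leave-system information in Article 7 of the employment contract completely and accurately, with clear structure and professional wording. Compared with the reference answer, it keeps the information complete and accurate while improving readability through a table and bolded key information. Overall quality is at the excellent level.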

Intermediate difficulty results

Score: 79.3
Status: passed

Model output

This is what the model actually generated for this task:

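Two follow-up questions, both on issues that come up often in labor-law practice.

**Follow-up 1: National Day overtime pay calculation**

Clause 6.4 of the contract:
- Overtime on statutory holidays is paid at 300%
- The company did not arrange compensatory time off

October 1-3 of the National Day holiday are statutory holidays (3 days); October 4-7 are rest days obtained by shifting working days (rest days in nature). The question says "3 days of overtime were scheduled during the 2024 National Day statutory holidays (October 1-3)", so this is clearly overtime on the 3 statutory holidays, October 1-3.

Base monthly salary = 18,000 yuan/month
Daily wage = 18,000 / 21.75 = 827.59 yuan (approximately)

Statutory-holiday overtime is paid at 300%, which already includes one times the base wage, so the actual extra payment is 2 times the wage. Extra pay for 3 days of overtime = 827.59 × 3 × 2 = 4,965.54 yuan. It can also be calculated directly by multiplying the monthly wage by 3 and the multiplier coefficient, with the same result. However, the question does not say explicitly whether the statutory-holiday overtime multiplier already includes normal pay, so I will calculate using the most common case.

Now the second question: whether paid annual leave for 2024 needs to be prorated. Under Article 3 of the Regulations on Paid Annual Leave for Employees (《职工带薪年休假条例》), an employee with at least 1 but less than 10 years of cumulative service gets 12 days of annual leave. 张明 joined on March 1 and is still within his first year at year end, so the annual leave days he can take in 2024 need to be prorated. But the key point here is the eligibility condition for annual leave: "12 months of continuous service" are required before annual leave can be taken. 张明 joined in March and will not meet this condition until March 2025. Although the Implementation Measures for Paid Annual Leave for Enterprise Employees (《企业职工带薪年休假实施办法》) allow proration when an employee leaves during the year, 张明 is still employed, so that rule does not apply. So 张明 in fact cannot take paid annual leave in 2024. In practice, though, some employers grant annual leave in the year in which the employee completes one year of service; that is the employer's own arrangement and does not conform to the law. Strictly following the legal text, the days should be prorated by month, but only if the "12 months of continuous service" condition is met. Since 张明 has not completed one year by the end of 2024, he is not eligible for annual leave. Annual leave requires "12 months of continuous service" before it can be taken; 张明 joined on March 1 and does not meet the condition until March 1, 2025. He therefore cannot take paid annual leave in 2024. Some employers' internal rules are more flexible, for example granting annual leave after 6 months of service or counting from the end of probation. These are employer benefits, not legal obligations. So strictly under the law 张明 has no annual leave in 2024, but it is a different matter if the company has a special policy. Now let me organize the answer.

## Follow-up 1: Overtime pay for the National Day statutory holidays

**Calculation:**

| Item | Value |
|------|------|
| Base monthly salary | 18,000 yuan |
| Daily wage (based on 21.75 days) | 18,000 ÷ 21.75 ≈ **827.59 yuan/day** |
| Statutory-holiday overtime multiplier | 300% (the day's wage + 200% overtime premium) |
| Overtime days | 3 days (October 1, 2, 3) |

$$\text{Overtime pay} = \text{daily wage} \times \text{overtime days} \times \text{overtime multiplier}$$

$$= 827.59 \times 3 \times 3 = \textbf{7,448.28 yuan}$$

> Another way to read it: statutory-holiday overtime pay = daily wage × 3 × days = 827.59 × 3 × 3 = **7,448.28 yuan**
>
> This already includes the normal wage for those days.

---

## Follow-up 2: Paid annual leave days in 2024

**Conclusion: 张明 is not entitled to paid annual leave in 2024 (that is, 0 days).**

**Reasoning:**

Under clause 6.3 of the contract and Article 2 of the Regulations on Paid Annual Leave for Employees (《职工带薪年休假条例》), only **employees with 1 year of continuous service** are entitled to paid annual leave. 张明 joined on March 1, 2024; as of December 31, 2024, his continuous service is still under 1 year, so he does not meet the eligibility condition. Therefore:

- He must wait until **March 1, 2025**, when he completes one year of continuous service, and is entitled to paid annual leave only from the following calendar year (that is, 2026).
- For 2024 itself, he is not entitled to paid annual leave.

**The precondition for proration** is "already eligible for annual leave in that year but joined partway through", whereas 张明's situation is "has not yet obtained basic eligibility for annual leave", so the proration rule does not apply.

> Note: if the company has its own benefits policy (for example, granting annual leave after 6 months of service), that is a voluntary employer benefit and is not bound by this legal analysis.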

AI reviewer comments

These are the AI reviewers' comments on the model's output:

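[CLAUDE] The candidate answer does well on format and structure, and the final result of the overtime pay calculation matches the reference answer, but the description of the calculation is logically muddled (the meaning of 300% is inconsistent from one passage to the next). The annual leave analysis diverges fundamentally from the reference answer: the reference answer holds that leave should be prorated to about 10 days, while the candidate answer says 0 days. Both views have some legal basis, but the reference answer's proration logic is closer to actual practice, and the candidate answer gives no proration formula or specific number of days, so it does not meet the "show the calculation" requirement. Overall, answer quality is above average; the main deduction is that the direction of the annual leave analysis does not match the reference answer.

[GEMINI] The model performs excellently. It not only extracts the contract information accurately but also shows sound reasoning in labor-law practice. The overtime pay calculation is laid out properly. On the annual leave question, it goes beyond simple arithmetic proration and analyzes the issue in light of the statutory precondition of "1 year of continuous service"; its conclusion is accurate and shows professional depth.

[KIMI] The answer is basically correct on the overtime pay calculation but makes a fundamental error on the annual leave question. The model confuses the eligibility condition of "12 months of continuous service" with the "proration within the year" rule and wrongly concludes that 张明 has 0 days of annual leave in 2024, whereas by law the entitlement should be prorated to about 10 days. This error costs the second question all of its points, and the model also failed to notice the key detail that 2024 is a leap year (366 days). It should study Article 5 of the Implementation Measures for Paid Annual Leave for Enterprise Employees (《企业职工带薪年休假实施办法》) more closely and distinguish the two separate stages of "obtaining eligibility" and "prorating days".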
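
The two figures discussed above (7,448.28 yuan of holiday overtime pay and "about 10 days" of prorated leave) can be reproduced with a short sketch. The salary, the 21.75-day divisor, and the 300% multiplier are taken from the model output; the proration formula (remaining calendar days ÷ 365 × full-year entitlement, with fractions of a day dropped) and the choice to count the hire date itself as a remaining day are assumptions made here for illustration, and the function names are hypothetical. The sketch only illustrates the proration approach the reviewers attribute to the reference answer; it does not settle which legal reading is correct.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

PAID_DAYS_PER_MONTH = Decimal("21.75")  # divisor used in the model output


def holiday_overtime_pay(monthly_salary, days, multiplier=Decimal("3")):
    """Pay for statutory-holiday overtime at multiplier x daily wage (300% = 3)."""
    daily_wage = monthly_salary / PAID_DAYS_PER_MONTH
    total = daily_wage * days * multiplier
    # Round only at the end, to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def prorated_annual_leave(hire_date, full_year_days, divisor=365):
    """Assumed proration: remaining days in the hire year / divisor * entitlement.

    The hire date is counted as a remaining day (an assumption), and any
    fraction of a day is dropped.
    """
    year_end = date(hire_date.year, 12, 31)
    remaining_days = (year_end - hire_date).days + 1
    return remaining_days * full_year_days // divisor


if __name__ == "__main__":
    print(holiday_overtime_pay(Decimal("18000"), 3))      # 7448.28
    print(prorated_annual_leave(date(2024, 3, 1), 12))    # 10
```

Rounding is applied once at the end, which is why the result is 7,448.28 rather than the 7,448.31 obtained by multiplying the already-rounded daily wage of 827.59 by 9. The `divisor` parameter is exposed because one review mentions the 366-day leap year; with 306 remaining days, both 365 and 366 give 10 days.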

Hard difficulty results

Score: 90.6
Status: passed

Model output

This is what the model actually generated for this task:

The user is asking me to analyze the legal English wording of clause 5.5: "Employees who have served the Company for more than 5 consecutive years are entitled to an additional 3 days of loyalty leave per year."

Let me analyze this carefully: "more than 5 consecutive years" is a strict inequality. "More than 5" means strictly greater than 5. It does NOT include exactly 5 years. So the progression is:
- 5 years exactly → does NOT qualify (not more than 5)
- 5 years and 1 day → qualifies
- 6 years → qualifies

The employee has 4 years and 8 months, which is 4.67 years. This is clearly less than 5 years, let alone more than 5 years.

Now, the Chinese translation I provided earlier, "在公司连续服务满5年以上的员工": let me analyze this. The phrase "满5年以上" is actually ambiguous in Chinese. It could mean either "5 years and above" or "more than 5 years," and in practice people often interpret it to include exactly 5 years. The original English is unambiguous (it is strictly greater than 5), but the Chinese phrasing doesn't clearly preserve that distinction. I should clarify this discrepancy with the user and offer a more precise translation that captures the strict inequality of the English text.

Good question. I need to re-examine this legal detail.

---

**1. Legal analysis of "more than 5 consecutive years"**

In legal English, "more than" is a strict mathematical inequality meaning **"strictly greater than"**, not including equality. Applied to this employee's timeline:

| Point in time | Length of service | Satisfies "more than 5 years"? |
|---|---|---|
| 4 years and 8 months (now) | 4.67 years | ❌ No |
| The day 5 years are completed | Exactly 5 years | ❌ No (5 = 5, not > 5) |
| 1 day after completing 5 years | 5 years + 1 day | ✅ Yes |
| 6 years completed | Exactly 6 years | ✅ Yes |

Therefore, **the precise meaning of "more than 5 consecutive years" is that service must exceed 5 years, that is, loyalty leave is available only from the first day of the 6th year**. On the day the 5 years are completed, the employee is not yet entitled.

---

**2. The translation discrepancy**

Looking back at my earlier Chinese translation:

> 5.5 在公司连续服务满**5年以上**的员工，每年额外享有3天忠诚假。

The Chinese phrase "满5年以上" is ambiguous. In a Chinese context, people tend to read "5年以上" as "≥ 5 years" (including 5 years) rather than "strictly greater than 5 years". This departs from the precise legal meaning of the English original "more than 5" (strictly > 5, not including equality).

---

**3. Corrected Chinese translation**

| Original | Earlier translation (inaccurate) | Corrected translation |
|---|---|---|
| more than 5 consecutive years | 满5年以上 | 连续服务超过5年 |

The full corrected clause:

> **5.5 在公司连续服务超过5年的员工，每年额外享有3天忠诚假。**

---

**4. Impact on the earlier analysis**...

AI reviewer comments

These are the AI reviewers' comments on the model's output:

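[CLAUDE] The candidate output performs excellently on the two core dimensions, clause interpretation and translation correction. It correctly grasps the strictly-greater-than meaning of 'more than 5 consecutive years' and effectively identifies and corrects the ambiguity of "满5年以上" in the Chinese translation. Its tracking of the multi-turn conversation is also fairly accurate. The main loss of points is in applying the rule to the concrete scenario: it misreads "more than 5 years" as "must serve more than 1 additional year and enter the 6th year", which yields an incorrect waiting period of about 16 months, when in fact only about 4 more months are needed (the condition is met as soon as service exceeds exactly 5 years). This logic error is fairly obvious and reduces the answer's practical value.

[GEMINI] The model performs outstandingly. It not only completes the semantic analysis of the legal clause accurately but also shows a very high level of professionalism, proactively identifying and correcting the logic error caused by the ambiguous translation it produced earlier in the conversation. The reply is clearly structured, the legal terminology is interpreted accurately, and it fully meets the evaluation criteria.

[KIMI] The model shows excellent ability to interpret legal text precisely and to track a multi-turn conversation. The highlights are: (1) it picks up the semantic gap between 'more than' and the Chinese '以上', showing careful discrimination of legal concepts across languages; (2) it criticizes and corrects its own earlier translation unprompted, showing intellectual honesty; (3) it handles the user's two changes of intent correctly and keeps the logic coherent. Minor improvements: in the concrete scenario it could work out the dates in more detail (for example, assume a hire date and derive the specific date on which the leave becomes available), and in the Q4 recommendation it could tie the "12 days of annual leave" quantitatively to the report's "staffing pressure rated 'high'" (for example, recommending a specific number of short leaves rather than a generic 'spread out the leave').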
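
The strict reading of "more than 5 consecutive years" and the roughly 4-month wait noted in the first review can be illustrated with a small date check. This is a minimal sketch: the hire date below is hypothetical (the page gives only a tenure of 4 years and 8 months), the function names are invented for illustration, and the convention that the anniversary date itself counts as "exactly 5 years" follows the table in the model output and is an assumption.

```python
from datetime import date


def service_anniversary(hire_date, years):
    """Date on which exactly `years` years of service are completed."""
    try:
        return hire_date.replace(year=hire_date.year + years)
    except ValueError:
        # Hire date of Feb 29 with a non-leap target year: fall back to Feb 28.
        return hire_date.replace(year=hire_date.year + years, day=28)


def has_served_more_than(hire_date, as_of, years=5):
    """Strict inequality: True only after the anniversary date has passed."""
    return as_of > service_anniversary(hire_date, years)


if __name__ == "__main__":
    hire = date(2020, 1, 15)                              # hypothetical hire date
    print(has_served_more_than(hire, date(2024, 9, 15)))  # 4 years 8 months -> False
    print(has_served_more_than(hire, date(2025, 1, 15)))  # exactly 5 years  -> False
    print(has_served_more_than(hire, date(2025, 1, 16)))  # 5 years + 1 day  -> True
```

With this hypothetical hire date, an employee at 4 years and 8 months of service on 2024-09-15 first qualifies on 2025-01-16, about 4 months later, not at the 6-year mark.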
