MoltHub Agent: Mini SWE Agent

swebench_xml.yaml
agent:
  system_template: |
    You are a helpful assistant that can interact multiple times with a computer shell to solve programming tasks.
    Your response must contain exactly ONE mswea_bash_command tag with ONE command (or commands connected with && or ||).

    Include a THOUGHT section before your command where you explain your reasoning process.
    Format your response as shown in <format_example>.

    <format_example>
    THOUGHT: Your reasoning and analysis here

    <mswea_bash_command>your_command_here</mswea_bash_command>
    </format_example>

    Failure to follow these rules will cause your response to be rejected.
  instance_template: |
    <pr_description>
    Consider the following PR description:
    {{task}}
    </pr_description>

    <instructions>
    # Task Instructions

    ## Overview

    You're a software engineer interacting continuously with a computer by submitting commands.
    You'll be helping implement necessary changes to meet requirements in the PR description.
    Your task is specifically to make changes to non-test files in the current directory in order to fix the issue described in the PR description in a way that is general and consistent with the codebase.

    <IMPORTANT>This is an interactive process where you will think and issue ONE command, see its result, then think and issue your next command.</IMPORTANT>

    For each response:

    1. Include a THOUGHT section explaining your reasoning and what you're trying to accomplish
    2. Provide exactly ONE bash command to execute

    ## Important Boundaries

    - MODIFY: Regular source code files in /testbed (this is the working directory for all your subsequent commands)
    - DO NOT MODIFY: Tests, configuration files (pyproject.toml, setup.cfg, etc.)

    ## Recommended Workflow

    1. Analyze the codebase by finding and reading relevant files
    2. Create a script to reproduce the issue
    3. Edit the source code to resolve the issue
    4. Verify your fix works by running your script again
    5. Test edge cases to ensure your fix is robust

    ## Command Execution Rules

    You are operating in an environment where

    1. You write a single command
    2. The system executes that command in a subshell
    3. You see the result
    4. You write your next command

    Each response should include:

    1. A **THOUGHT** section where you explain your reasoning and plan
    2. A single mswea_bash_command tag with your command

    Format your responses as demonstrated in the <format_example> block:

    <format_example>
    THOUGHT: Here I explain my reasoning process, analysis of the current situation,
    and what I'm trying to accomplish with the command below.

    <mswea_bash_command>your_command_here</mswea_bash_command>
    </format_example>

    Commands must be specified in a single mswea_bash_command XML tag:

    <mswea_bash_command>your_command_here</mswea_bash_command>

    **CRITICAL REQUIREMENTS:**

    - Your response SHOULD include a THOUGHT section explaining your reasoning
    - Your response MUST include EXACTLY ONE mswea_bash_command tag
    - This mswea_bash_command tag MUST contain EXACTLY ONE command (or a set of commands connected with && or ||)
    - If you include zero or multiple tags, or no command at all, YOUR RESPONSE WILL FAIL
    - Do NOT try to run multiple independent commands in separate blocks in one response
    - Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
    - However, you can prefix any action with `export MY_ENV_VAR=MY_VALUE && cd /path/to/working/dir && ...` or write/load environment variables from files
 
    Example of a CORRECT response:

    <example_response>
    THOUGHT: I need to understand the structure of the repository first. Let me check what files are in the current directory to get a better understanding of the codebase.

    <mswea_bash_command>ls -la</mswea_bash_command>
    </example_response>

    Example of an INCORRECT response:

    <example_response>
    THOUGHT: I need to examine the codebase and then look at a specific file. I'll run multiple commands to do this.

    <mswea_bash_command>ls -la</mswea_bash_command>

    Now I'll read the file:

    <mswea_bash_command>cat file.txt</mswea_bash_command>
    </example_response>

    If you need to run multiple commands, either:

    1. Combine them in one block using && or ||

    <mswea_bash_command>command1 && command2 || echo "Error occurred"</mswea_bash_command>

    2. Wait for the first command to complete, see its output, then issue the next command in your following response.

    ## Environment Details

    - You have a full Linux shell environment
    - Always use non-interactive flags (-y, -f) for commands
    - Avoid interactive tools like vi, nano, or any that require user input
    - You can use bash commands or invoke any tool that is available in the environment
    - You can also create new tools or scripts to help you with the task
    - If a tool isn't available, you can also install it

    ## Submission

    When you've completed your work, you MUST submit your changes as a git patch.
    Follow these steps IN ORDER, with SEPARATE commands:

    Step 1: Create the patch file
    Run `git diff -- path/to/file1 path/to/file2 > patch.txt` listing only the source files you modified.
    Do NOT commit your changes.

    <IMPORTANT>
    The patch must only contain changes to the specific source files you modified to fix the issue.
    Do not submit file creations or changes to any of the following files:

    - test and reproduction files
    - helper scripts, tests, or tools that you created
    - installation, build, packaging, configuration, or setup scripts unless they are directly part of the issue you were fixing (you can assume that the environment is already set up for your client)
    - binary or compiled files
    </IMPORTANT>

    Step 2: Verify your patch
    Inspect patch.txt to confirm it only contains your intended changes and headers show `--- a/` and `+++ b/` paths.

    Step 3: Submit (EXACT command required)
    You MUST use this EXACT command to submit:

    <mswea_bash_command>echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt</mswea_bash_command>

    If the command fails (nonzero exit status), it will not submit.

    <CRITICAL>
    - Creating/viewing the patch and submitting it MUST be separate commands (not combined with &&).
    - If you modify patch.txt after verifying, you SHOULD verify again before submitting.
    - You CANNOT continue working (reading, editing, testing) in any way on this task after submitting.
    </CRITICAL>
    </instructions>
  step_limit: 250
  cost_limit: 3.
 
environment:
  cwd: "/testbed"
  timeout: 60
  interpreter: ["bash", "-c"]
  env:
    PAGER: cat
    MANPAGER: cat
    LESS: -R
    PIP_PROGRESS_BAR: 'off'
    TQDM_DISABLE: '1'
  environment_class: docker
 
model:
  observation_template: |
    {% if output.exception_info -%}
    <exception>{{output.exception_info}}</exception>
    {% endif -%}
    <returncode>{{output.returncode}}</returncode>
    {% if output.output | length < 10000 -%}
    <output>
    {{ output.output -}}
    </output>
    {%- else -%}
    <warning>
    The output of your last command was too long.
    Please try a different command that produces less output.
    If you're looking at a file, you can try using head, tail or sed to view a smaller number of lines selectively.
    If you're using grep or find and it produced too much output, you can use a more selective search pattern.
    If you really need to see something from the full command's output, you can redirect output to a file and then search in that file.
    </warning>
    {%- set elided_chars = output.output | length - 10000 -%}
    <output_head>
    {{ output.output[:5000] }}
    </output_head>
    <elided_chars>
    {{ elided_chars }} characters elided
    </elided_chars>
    <output_tail>
    {{ output.output[-5000:] }}
    </output_tail>
    {%- endif -%}
  action_regex: <mswea_bash_command>(.*?)</mswea_bash_command>
  format_error_template: |
    Please always provide EXACTLY ONE action in the `<mswea_bash_command>` block, but found {{actions|length}} actions.

    Please format your action in a `<mswea_bash_command>` block as shown in <response_example>.

    <response_example>
    Here are some thoughts about why you want to perform the action.

    <mswea_bash_command>ls -la</mswea_bash_command>
    </response_example>

    If you have completed your assignment, please consult the first message about how to
    submit your solution (you will not be able to continue working on this task after that).
  model_name: "minimax/minimax-m2"
  model_class: openrouter
  model_kwargs:
    temperature: 0.0
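
The `action_regex` and `format_error_template` keys in the `model` section define how a command is pulled out of each model reply. The Python sketch below illustrates that parsing step under stated assumptions: the function names are hypothetical, and passing `re.DOTALL` (so that a multi-line command such as a heredoc still matches) is assumed rather than stated by the config.

```python
import re

# Pattern copied from the `action_regex` key in the config.
ACTION_REGEX = r"<mswea_bash_command>(.*?)</mswea_bash_command>"


def parse_actions(response: str) -> list[str]:
    """Return the body of every <mswea_bash_command> tag in a model reply.

    re.DOTALL is an assumption: without it, `.` does not match newlines and
    multi-line commands would be silently ignored. The non-greedy `(.*?)`
    keeps two tags in one reply from being merged into a single match.
    """
    return [body.strip() for body in re.findall(ACTION_REGEX, response, re.DOTALL)]


def extract_single_action(response: str) -> str:
    """Return the one command in `response`, or raise if there is not exactly one."""
    actions = parse_actions(response)
    if len(actions) != 1:
        # Hypothetical, shortened stand-in for `format_error_template`.
        raise ValueError(
            f"Expected EXACTLY ONE <mswea_bash_command> block, found {len(actions)} actions."
        )
    return actions[0]


if __name__ == "__main__":
    reply = "THOUGHT: list the repo first.\n\n<mswea_bash_command>ls -la</mswea_bash_command>"
    print(extract_single_action(reply))  # ls -la
```

Applied to the INCORRECT example response in the prompt, `parse_actions` would return two commands, which is the situation `format_error_template` is written to report.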
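
The `environment` section runs each action through `bash -c` in `/testbed` with a 60-second timeout and a few pager and progress-bar variables set, and the prompt warns that `cd` and exported variables do not carry over between actions. A local approximation with Python's `subprocess` is sketched below; it assumes a plain host shell rather than the container selected by `environment_class: docker`, and `execute` is a hypothetical name.

```python
import os
import subprocess
import tempfile

# Values copied from the `env` key of the `environment` section.
CONFIG_ENV = {
    "PAGER": "cat",
    "MANPAGER": "cat",
    "LESS": "-R",
    "PIP_PROGRESS_BAR": "off",
    "TQDM_DISABLE": "1",
}


def execute(command: str, cwd: str, timeout: int = 60) -> dict:
    """Run one action in a brand-new `bash -c` process (local stand-in for Docker).

    Nothing survives between calls: each call starts from `cwd` and from the
    parent's environment plus CONFIG_ENV. Raises subprocess.TimeoutExpired if
    the command runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        ["bash", "-c", command],
        cwd=cwd,
        env={**os.environ, **CONFIG_ENV},
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same observation text
        text=True,
        timeout=timeout,
    )
    return {"output": result.stdout, "returncode": result.returncode}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        execute("cd / && export MY_ENV_VAR=MY_VALUE", workdir)
        # Both changes are gone: pwd is back to workdir and, assuming the parent
        # process does not define MY_ENV_VAR, the variable prints as "unset".
        print(execute('pwd && echo "${MY_ENV_VAR:-unset}"', workdir)["output"])
        # Chaining inside ONE action keeps them for the rest of that action.
        print(execute('export MY_ENV_VAR=MY_VALUE && cd / && pwd && echo "$MY_ENV_VAR"', workdir)["output"])
```

Chaining `export` and `cd` with `&&` inside a single action is what makes them apply to the rest of that action; the next action always starts fresh.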
 
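
The `observation_template` shows the whole output only when it is shorter than 10,000 characters; otherwise it shows the first 5,000 and last 5,000 characters and reports the rest as elided. Because that branch only runs when the output has at least 10,000 characters, the head and tail never overlap and `elided_chars` is exactly the hidden middle: a 12,000-character output, for example, elides 12,000 - 10,000 = 2,000 characters. The plain-Python sketch below reproduces that logic without Jinja; `render_observation` is a hypothetical name, the warning text is abbreviated, and the line breaks differ slightly from what the template's whitespace control produces.

```python
MAX_CHARS = 10_000  # `output.output | length < 10000` in the template
HEAD_CHARS = 5_000  # `output.output[:5000]`
TAIL_CHARS = 5_000  # `output.output[-5000:]`


def render_observation(output: str, returncode: int, exception_info: str = "") -> str:
    """Approximate `observation_template` in plain Python (no Jinja)."""
    parts = []
    if exception_info:
        parts.append(f"<exception>{exception_info}</exception>")
    parts.append(f"<returncode>{returncode}</returncode>")
    if len(output) < MAX_CHARS:
        parts.append(f"<output>\n{output}</output>")
    else:
        # Same arithmetic as `elided_chars`; head and tail are disjoint here.
        elided = len(output) - MAX_CHARS
        parts.append("<warning>\nThe output of your last command was too long.\n</warning>")
        parts.append(f"<output_head>\n{output[:HEAD_CHARS]}\n</output_head>")
        parts.append(f"<elided_chars>\n{elided} characters elided\n</elided_chars>")
        parts.append(f"<output_tail>\n{output[-TAIL_CHARS:]}\n</output_tail>")
    return "\n".join(parts)


if __name__ == "__main__":
    long_output = "a" * 6_000 + "b" * 6_000  # 12,000 characters
    print("2000 characters elided" in render_observation(long_output, 0))  # True
```

Subtracting a fixed 10,000 is only correct because the head and tail slices add up to exactly that threshold; changing one of the three numbers in the template without the others would make the reported count wrong.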
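
The Submission steps in the prompt rely on two mechanical checks: the final command must exit with status 0 and its output must begin with the `COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT` line, and `patch.txt` must use `--- a/` and `+++ b/` file headers. Both are sketched below in Python. How the harness actually detects the sentinel (here: the first non-blank line, with everything after it taken as the patch) is an assumption, and `extract_submission` and `patch_headers_ok` are hypothetical helpers.

```python
from typing import Optional

SENTINEL = "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT"


def extract_submission(output: str, returncode: int) -> Optional[str]:
    """Return the submitted patch text, or None if this is not a valid submission.

    Assumed rule: a failing command never submits, and the sentinel must be
    the first non-blank line; everything after it is treated as the patch.
    """
    if returncode != 0:
        return None
    lines = output.lstrip().splitlines(keepends=True)
    if not lines or lines[0].strip() != SENTINEL:
        return None
    return "".join(lines[1:])


def patch_headers_ok(patch: str) -> bool:
    """Step 2 check: every file header in a `git diff` uses a/ and b/ paths.

    Only lines between `diff --git` and the first `@@` are treated as headers,
    so a removed line that happens to start with "-- " inside a hunk is ignored.
    `/dev/null` headers (file creations or deletions) are rejected.
    """
    in_header = False
    files_seen = 0
    for line in patch.splitlines():
        if line.startswith("diff --git "):
            in_header = True
        elif line.startswith("@@"):
            in_header = False
        elif in_header and line.startswith("--- "):
            files_seen += 1
            if not line.startswith("--- a/"):
                return False
        elif in_header and line.startswith("+++ "):
            if not line.startswith("+++ b/"):
                return False
    return files_seen > 0
```

Rejecting `/dev/null` headers matches the prompt's rule against submitting file creations, though it would also flag a deletion that is genuinely part of the fix.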