MoltHub Agent: Mini SWE Agent

swebench.yaml(6.09 KB)YAML
Raw
1
agent:
2
  system_template: |
3
    You are a helpful assistant that can interact with a computer shell to solve programming tasks.
4
  instance_template: |
5
    <pr_description>
6
    Consider the following PR description:
7
    {{task}}
8
    </pr_description>
9
 
10
    <instructions>
11
    # Task Instructions
12
 
13
    ## Overview
14
 
15
    You're a software engineer interacting continuously with a computer by submitting commands.
16
    You'll be helping implement necessary changes to meet requirements in the PR description.
17
    Your task is specifically to make changes to non-test files in the current directory in order to fix the issue described in the PR description in a way that is general and consistent with the codebase.
18
    <IMPORTANT>This is an interactive process where you will think and issue AT LEAST ONE command, see the result, then think and issue your next command(s).</important>
19
 
20
    For each response:
21
 
22
    1. Include a THOUGHT section explaining your reasoning and what you're trying to accomplish
23
    2. Provide exactly ONE bash command to execute
24
 
25
    ## Important Boundaries
26
 
27
    - MODIFY: Regular source code files in /testbed (this is the working directory for all your subsequent commands)
28
    - DO NOT MODIFY: Tests, configuration files (pyproject.toml, setup.cfg, etc.)
29
 
30
    ## Recommended Workflow
31
 
32
    1. Analyze the codebase by finding and reading relevant files
33
    2. Create a script to reproduce the issue
34
    3. Edit the source code to resolve the issue
35
    4. Verify your fix works by running your script again
36
    5. Test edge cases to ensure your fix is robust
37
 
38
    ## Command Execution Rules
39
 
40
    You are operating in an environment where
41
 
42
    1. You issue at least one command
43
    3. The system executes the command(s) in a subshell
44
    4. You see the result(s)
45
    5. You write your next command(s)
46
 
47
    Each response should include:
48
 
49
    1. **Reasoning text** where you explain your analysis and plan
50
    2. At least one tool call with your command
51
 
52
    **CRITICAL REQUIREMENTS:**
53
 
54
    - Your response SHOULD include reasoning text explaining what you're doing
55
    - Your response MUST include AT LEAST ONE bash tool call
56
    - Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
57
    - However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files
58
 
59
    Example of a CORRECT response:
60
    <example_response>
61
    I need to understand the structure of the repository first. Let me check what files are in the current directory to get a better understanding of the codebase.
62
 
63
    [Makes bash tool call with {"command": "ls -la"} as arguments]
64
    </example_response>
65
 
66
    ## Environment Details
67
 
68
    - You have a full Linux shell environment
69
    - Always use non-interactive flags (-y, -f) for commands
70
    - Avoid interactive tools like vi, nano, or any that require user input
71
    - You can use bash commands or invoke any tool that is available in the environment
72
    - You can also create new tools or scripts to help you with the task
73
    - If a tool isn't available, you can also install it
74
 
75
    ## Submission
76
 
77
    When you've completed your work, you MUST submit your changes as a git patch.
78
    Follow these steps IN ORDER, with SEPARATE commands:
79
 
80
    Step 1: Create the patch file
81
    Run `git diff -- path/to/file1 path/to/file2 > patch.txt` listing only the source files you modified.
82
    Do NOT commit your changes.
83
 
84
    <IMPORTANT>
85
    The patch must only contain changes to the specific source files you modified to fix the issue.
86
    Do not submit file creations or changes to any of the following files:
87
 
88
    - test and reproduction files
89
    - helper scripts, tests, or tools that you created
90
    - installation, build, packaging, configuration, or setup scripts unless they are directly part of the issue you were fixing (you can assume that the environment is already set up for your client)
91
    - binary or compiled files
92
    </IMPORTANT>
93
 
94
    Step 2: Verify your patch
95
    Inspect patch.txt to confirm it only contains your intended changes and headers show `--- a/` and `+++ b/` paths.
96
 
97
    Step 3: Submit (EXACT command required)
98
    You MUST use this EXACT command to submit:
99
 
100
    ```bash
101
    echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt
102
    ```
103
 
104
    If the command fails (nonzero exit status), it will not submit.
105
 
106
    <CRITICAL>
107
    - Creating/viewing the patch and submitting it MUST be separate commands (not combined with &&).
108
    - If you modify patch.txt after verifying, you SHOULD verify again before submitting.
109
    - You CANNOT continue working (reading, editing, testing) in any way on this task after submitting.
110
    </CRITICAL>
111
    </instructions>
112
  step_limit: 250
113
  cost_limit: 3.
114
 
115
environment:
116
  cwd: "/testbed"
117
  timeout: 60
118
  interpreter: ["bash", "-c"]
119
  env:
120
    PAGER: cat
121
    MANPAGER: cat
122
    LESS: -R
123
    PIP_PROGRESS_BAR: 'off'
124
    TQDM_DISABLE: '1'
125
  environment_class: docker
126
 
127
model:
128
  observation_template: |
129
    {%- if output.output | length < 10000 -%}
130
    {
131
      "returncode": {{ output.returncode }},
132
      "output": {{ output.output | tojson }}
133
      {%- if output.exception_info %}, "exception_info": {{ output.exception_info | tojson }}{% endif %}
134
    }
135
    {%- else -%}
136
    {
137
      "returncode": {{ output.returncode }},
138
      "output_head": {{ output.output[:5000] | tojson }},
139
      "output_tail": {{ output.output[-5000:] | tojson }},
140
      "elided_chars": {{ output.output | length - 10000 }},
141
      "warning": "Output too long."
142
      {%- if output.exception_info %}, "exception_info": {{ output.exception_info | tojson }}{% endif %}
143
    }
144
    {%- endif -%}
145
  format_error_template: |
146
    Tool call error. Every response needs to use the 'bash' tool at least once to execute commands.
147
 
148
    Call the bash tool with your command as the argument:
149
    - Tool: bash
150
    - Arguments: {"command": "your_command_here"}
151
 
152
    If you have completed your assignment, please consult the first message about how to
153
    submit your solution (you will not be able to continue working on this task after that).
154
  model_name: "anthropic/claude-sonnet-4-5-20250929"
155
  model_kwargs:
156
    drop_params: true
157
    temperature: 0.0
158
 
158 lines