openhands.agenthub.codeact_agent.codeact_agent
CodeActAgent Objects
class CodeActAgent(Agent)
VERSION
The Code Act Agent is a minimalist agent. The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
Overview
This agent implements the CodeAct idea (paper, tweet) that consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see paper for more details).
The conceptual idea is illustrated below. At each turn, the agent can:
- Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
- CodeAct: Choose to perform the task by executing code
  - Execute any valid Linux bash command
  - Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.
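As a rough illustration of the unified action space described above, the sketch below models the three choices as plain Python types and maps each one to the shell command it would correspond to. All names here (`Converse`, `RunBash`, `RunPython`, `to_shell_command`) are hypothetical and not part of OpenHands. In particular, the one-shot `python3 -c` call is only an assumed stand-in for the interactive interpreter that the plugin system provides.

```python
import shlex
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Converse:
    """Hypothetical action: talk to the human in natural language."""
    text: str


@dataclass
class RunBash:
    """Hypothetical action: run a Linux bash command."""
    command: str


@dataclass
class RunPython:
    """Hypothetical action: run a snippet of Python code."""
    code: str


ToyAction = Union[Converse, RunBash, RunPython]


def to_shell_command(action: ToyAction) -> Optional[str]:
    """Return the shell command a toy runtime would run, or None for conversation."""
    if isinstance(action, RunBash):
        return action.command
    if isinstance(action, RunPython):
        # Assumption: Python is routed through bash as a one-shot call.
        return "python3 -c " + shlex.quote(action.code)
    return None  # Converse only produces text, so there is nothing to execute
```

The point of the sketch is that both code paths end in a single kind of executable artifact, which is one way to read "a unified code action space".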
__init__
def __init__(llm: LLM, config: AgentConfig) -> None
Initializes a new instance of the CodeActAgent class.
Arguments:
- llm (LLM): The LLM to be used by this agent
- config (AgentConfig): The configuration for this agent
get_action_message
def get_action_message(
action: Action,
pending_tool_call_action_messages: dict[str, Message]) -> list[Message]
Converts an action into a message format that can be sent to the LLM.
This method handles different types of actions and formats them appropriately:
- For tool-based actions (AgentDelegate, CmdRun, IPythonRunCell, FileEdit) and agent-sourced AgentFinish:
  - In function calling mode: Stores the LLM's response in pending_tool_call_action_messages
  - In non-function calling mode: Creates a message with the action string
- For MessageActions: Creates a message with the text content and optional image content
Arguments:
- action (Action): The action to convert. Can be one of:
  - CmdRunAction: For executing bash commands
  - IPythonRunCellAction: For running IPython code
  - FileEditAction: For editing files
  - BrowseInteractiveAction: For browsing the web
  - AgentFinishAction: For ending the interaction
  - MessageAction: For sending messages
- pending_tool_call_action_messages (dict[str, Message]): Dictionary mapping response IDs to their corresponding messages. Used in function calling mode to track tool calls that are waiting for their results.
Returns:
- list[Message]: A list containing the formatted message(s) for the action. May be empty if the action is handled as a tool call in function calling mode.
Notes:
In function calling mode, tool-based actions are stored in pending_tool_call_action_messages rather than being returned immediately. They will be processed later when all corresponding tool call results are available.
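The deferral described in the notes can be sketched as follows. This is a simplified illustration, not the actual implementation: `Message`, `ToyAction`, `action_to_messages`, and the `function_calling` flag are hypothetical stand-ins for the real OpenHands classes and configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical, simplified chat message."""
    role: str
    content: str


@dataclass
class ToyAction:
    """Hypothetical stand-in for an Action."""
    kind: str                           # "cmd_run" (tool-based) or "message"
    text: str
    source: str = "agent"               # "agent" or "user"
    response_id: Optional[str] = None   # id of the LLM response behind a tool call


def action_to_messages(
    action: ToyAction,
    pending: dict[str, Message],
    function_calling: bool = True,
) -> list[Message]:
    """Convert an action to messages, deferring tool calls in function calling mode."""
    if action.kind == "message":
        role = "user" if action.source == "user" else "assistant"
        return [Message(role=role, content=action.text)]
    if function_calling:
        # Park the assistant message until the tool call results are available.
        pending[action.response_id] = Message(role="assistant", content=action.text)
        return []
    # Non-function calling mode: emit the action string right away.
    return [Message(role="assistant", content=action.text)]
```

In this sketch, a tool-based action returns an empty list and leaves an entry in `pending`, which mirrors the "may be empty" return value documented above.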
get_observation_message
def get_observation_message(
obs: Observation,
tool_call_id_to_message: dict[str, Message]) -> list[Message]
Converts an observation into a message format that can be sent to the LLM.
This method handles different types of observations and formats them appropriately:
- CmdOutputObservation: Formats command execution results with exit codes
- IPythonRunCellObservation: Formats IPython cell execution results, replacing base64 images
- FileEditObservation: Formats file editing results
- AgentDelegateObservation: Formats results from delegated agent tasks
- ErrorObservation: Formats error messages from failed actions
- UserRejectObservation: Formats user rejection messages
In function calling mode, observations with tool_call_metadata are stored in tool_call_id_to_message for later processing instead of being returned immediately.
Arguments:
- obs (Observation): The observation to convert
- tool_call_id_to_message (dict[str, Message]): Dictionary mapping tool call IDs to their corresponding messages (used in function calling mode)
Returns:
- list[Message]: A list containing the formatted message(s) for the observation. May be empty if the observation is handled as a tool response in function calling mode.
Raises:
- ValueError: If the observation type is unknown
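A minimal sketch of this behavior, under stated assumptions, is shown below. `Message`, `ToyObservation`, and `observation_to_messages` are hypothetical names, only two observation kinds are modeled, and the exact text formatting is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical, simplified chat message."""
    role: str
    content: str


@dataclass
class ToyObservation:
    """Hypothetical stand-in for an Observation."""
    kind: str                            # "cmd_output" or "error" in this sketch
    content: str
    exit_code: int = 0
    tool_call_id: Optional[str] = None   # set when the observation answers a tool call


def observation_to_messages(
    obs: ToyObservation,
    tool_call_id_to_message: dict[str, Message],
) -> list[Message]:
    """Format an observation, deferring tool responses in function calling mode."""
    if obs.kind == "cmd_output":
        text = f"{obs.content}\n[exit code: {obs.exit_code}]"
    elif obs.kind == "error":
        text = f"ERROR: {obs.content}"
    else:
        raise ValueError(f"Unknown observation type: {obs.kind}")
    if obs.tool_call_id is not None:
        # Store the tool response for later processing; return nothing for now.
        tool_call_id_to_message[obs.tool_call_id] = Message(role="tool", content=text)
        return []
    return [Message(role="user", content=text)]
```

Keeping the tool response in a dictionary keyed by tool call ID lets a caller pair it later with the assistant message that issued the call.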
reset
def reset() -> None
Resets the CodeAct Agent.
step
def step(state: State) -> Action
Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model for the next action to take.
Arguments:
- state (State): used to get updated info
Returns:
- CmdRunAction(command) - bash command to run
- IPythonRunCellAction(code) - IPython code to run
- AgentDelegateAction(agent, inputs) - delegate action for (sub)task
- MessageAction(content) - Message action to run (e.g. ask for clarification)
- AgentFinishAction() - end the interaction
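To show how a caller might drive `step`, here is a toy loop that keeps asking an agent for the next action until it returns a finish action. Everything in it is a hypothetical stand-in (`ToyAgent`, `ToyState`, `CmdRun`, `Finish`, `scripted_llm`, `run`); the real agent talks to an LLM and a runtime, and is driven by OpenHands' own controller rather than by a loop like this.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CmdRun:
    """Hypothetical stand-in for CmdRunAction."""
    command: str


@dataclass
class Finish:
    """Hypothetical stand-in for AgentFinishAction."""


@dataclass
class ToyState:
    # Action-observation pairs collected so far.
    history: list = field(default_factory=list)


class ToyAgent:
    def __init__(self, llm: Callable[[list], str]) -> None:
        self.llm = llm

    def step(self, state: ToyState):
        """Build a prompt from previous steps and ask the model for the next action."""
        prompt = [f"{action!r} -> {obs}" for action, obs in state.history]
        reply = self.llm(prompt)
        return Finish() if reply == "finish" else CmdRun(command=reply)


def scripted_llm(prompt: list) -> str:
    # Fake model: run one command first, then finish once there is history.
    return "echo hello" if not prompt else "finish"


def run(agent: ToyAgent, state: ToyState, max_steps: int = 10) -> list:
    actions = []
    for _ in range(max_steps):
        action = agent.step(state)
        actions.append(action)
        if isinstance(action, Finish):
            break
        # Pretend the command ran and produced this observation.
        state.history.append((action, "hello"))
    return actions
```

The `max_steps` cap is an assumption added so that the toy loop always terminates even if the model never finishes.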