Aller directement au contenu principal

openhands.agenthub.codeact_agent.codeact_agent

CodeActAgent Objects

class CodeActAgent(Agent)

VERSION

The Code Act Agent is a minimalist agent. The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.

Overview

This agent implements the CodeAct idea (paper, tweet) that consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

  1. Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
  2. CodeAct: Choose to perform the task by executing code
  • Execute any valid Linux bash command
  • Execute any valid Python code with an interactive Python interpreter. This is simulated through bash command, see plugin system below for more details.

image

__init__

def __init__(llm: LLM, config: AgentConfig) -> None

Initializes a new instance of the CodeActAgent class.

Arguments:

  • llm (LLM): The llm to be used by this agent

get_action_message

def get_action_message(
action: Action,
pending_tool_call_action_messages: dict[str,
Message]) -> list[Message]

Converts an action into a message format that can be sent to the LLM.

This method handles different types of actions and formats them appropriately:

  1. For tool-based actions (AgentDelegate, CmdRun, IPythonRunCell, FileEdit) and agent-sourced AgentFinish:
  • In function calling mode: Stores the LLM's response in pending_tool_call_action_messages
  • In non-function calling mode: Creates a message with the action string
  1. For MessageActions: Creates a message with the text content and optional image content

Arguments:

  • action Action - The action to convert. Can be one of:
    • CmdRunAction: For executing bash commands
    • IPythonRunCellAction: For running IPython code
    • FileEditAction: For editing files
    • BrowseInteractiveAction: For browsing the web
    • AgentFinishAction: For ending the interaction
    • MessageAction: For sending messages
  • pending_tool_call_action_messages dict[str, Message] - Dictionary mapping response IDs to their corresponding messages. Used in function calling mode to track tool calls that are waiting for their results.

Returns:

  • list[Message] - A list containing the formatted message(s) for the action. May be empty if the action is handled as a tool call in function calling mode.

Notes:

In function calling mode, tool-based actions are stored in pending_tool_call_action_messages rather than being returned immediately. They will be processed later when all corresponding tool call results are available.

get_observation_message

def get_observation_message(
obs: Observation,
tool_call_id_to_message: dict[str, Message]) -> list[Message]

Converts an observation into a message format that can be sent to the LLM.

This method handles different types of observations and formats them appropriately:

  • CmdOutputObservation: Formats command execution results with exit codes
  • IPythonRunCellObservation: Formats IPython cell execution results, replacing base64 images
  • FileEditObservation: Formats file editing results
  • AgentDelegateObservation: Formats results from delegated agent tasks
  • ErrorObservation: Formats error messages from failed actions
  • UserRejectObservation: Formats user rejection messages

In function calling mode, observations with tool_call_metadata are stored in tool_call_id_to_message for later processing instead of being returned immediately.

Arguments:

  • obs Observation - The observation to convert
  • tool_call_id_to_message dict[str, Message] - Dictionary mapping tool call IDs to their corresponding messages (used in function calling mode)

Returns:

  • list[Message] - A list containing the formatted message(s) for the observation. May be empty if the observation is handled as a tool response in function calling mode.

Raises:

  • ValueError - If the observation type is unknown

reset

def reset() -> None

Resets the CodeAct Agent.

step

def step(state: State) -> Action

Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute.

Arguments:

  • state (State): used to get updated info

Returns:

  • CmdRunAction(command) - bash command to run
  • IPythonRunCellAction(code) - IPython code to run
  • AgentDelegateAction(agent, inputs) - delegate action for (sub)task
  • MessageAction(content) - Message action to run (e.g. ask for clarification)
  • AgentFinishAction() - end the interaction