Skip to main content

๐Ÿง  Agents and Capabilities

CodeAct Agentโ€‹

Descriptionโ€‹

This agent implements the CodeAct idea (paper, tweet) that consolidates LLM agentsโ€™ actions into a unified code action space for both simplicity and performance (see paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

  1. Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
  2. CodeAct: Choose to perform the task by executing code
  • Execute any valid Linux bash command
  • Execute any valid Python code with an interactive Python interpreter. This is simulated through bash command, see plugin system below for more details.

image

Plugin Systemโ€‹

To make the CodeAct agent more powerful with only access to bash action space, CodeAct agent leverages OpenDevin's plugin system:

Demoโ€‹

https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

Example of CodeActAgent with gpt-4-turbo-2024-04-09 performing a data science task (linear regression)

Actionsโ€‹

Action, CmdRunAction, IPythonRunCellAction, AgentEchoAction, AgentFinishAction, AgentTalkAction

Observationsโ€‹

CmdOutputObservation, IPythonRunCellObservation, AgentMessageObservation, UserMessageObservation

Methodsโ€‹

MethodDescription
__init__Initializes an agent with llm and a list of messages list[Mapping[str, str]]
stepPerforms one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute.
search_memoryNot yet implemented

Monologue Agentโ€‹

Descriptionโ€‹

The Monologue Agent utilizes long and short term memory to complete tasks. Long term memory is stored as a LongTermMemory object and the model uses it to search for examples from the past. Short term memory is stored as a Monologue object and the model can condense it as necessary.

Actionsโ€‹

Action, NullAction, CmdRunAction, FileWriteAction, FileReadAction, AgentRecallAction, BrowseURLAction, GithubPushAction, AgentThinkAction

Observationsโ€‹

Observation, NullObservation, CmdOutputObservation, FileReadObservation, AgentRecallObservation, BrowserOutputObservation

Methodsโ€‹

MethodDescription
__init__Initializes the agent with a long term memory, and an internal monologue
_add_eventAppends events to the monologue of the agent and condenses with summary automatically if the monologue is too long
_initializeUtilizes the INITIAL_THOUGHTS list to give the agent a context for its capabilities and how to navigate the /workspace
stepModifies the current state by adding the most recent actions and observations, then prompts the model to think about its next action to take.
search_memoryUses VectorIndexRetriever to find related memories within the long term memory.

Planner Agentโ€‹

Descriptionโ€‹

The planner agent utilizes a special prompting strategy to create long term plans for solving problems. The agent is given its previous action-observation pairs, current task, and hint based on last action taken at every step.

Actionsโ€‹

NullAction, CmdRunAction, CmdKillAction, BrowseURLAction, GithubPushAction, FileReadAction, FileWriteAction, AgentRecallAction, AgentThinkAction, AgentFinishAction, AgentSummarizeAction, AddTaskAction, ModifyTaskAction,

Observationsโ€‹

Observation, NullObservation, CmdOutputObservation, FileReadObservation, AgentRecallObservation, BrowserOutputObservation

Methodsโ€‹

MethodDescription
__init__Initializes an agent with llm
stepChecks to see if current step is completed, returns AgentFinishAction if True. Otherwise, creates a plan prompt and sends to model for inference, adding the result as the next action.
search_memoryNot yet implemented