openhands.agenthub.visualbrowsing_agent.visualbrowsing_agent

VisualBrowsingAgent Objects

class VisualBrowsingAgent(Agent)

VERSION

VisualBrowsing Agent that can use webpage screenshots during browsing.

__init__

def __init__(llm: LLM, config: AgentConfig) -> None

Initializes a new instance of the VisualBrowsingAgent class.

Arguments:

  • llm (LLM): The LLM to be used by this agent
  • config (AgentConfig): The configuration for this agent

reset

def reset() -> None

Resets the VisualBrowsingAgent.

step

def step(state: State) -> Action

Performs one step using the VisualBrowsingAgent.

This includes gathering information from previous steps and prompting the model to produce a browsing command to execute.

Arguments:

  • state (State): The current state, used to get updated information about previous steps

Returns:

  • BrowseInteractiveAction(browsergym_command) - BrowserGym commands to run
  • MessageAction(content) - Message action to run (e.g. ask for clarification)
  • AgentFinishAction() - end the interaction
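
Because step can return any of three action types, a caller usually branches on the type of the result. The sketch below illustrates one way to do that. It is a minimal illustration, not the OpenHands implementation: the three dataclasses are local stand-ins that borrow only the names and fields shown in the Returns list above, and the describe helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Union


# Local stand-ins for the three action types listed under "Returns".
# These are NOT the OpenHands classes; only the names and the fields
# shown in the documentation above are reused here.
@dataclass
class BrowseInteractiveAction:
    browsergym_command: str


@dataclass
class MessageAction:
    content: str


@dataclass
class AgentFinishAction:
    pass


Action = Union[BrowseInteractiveAction, MessageAction, AgentFinishAction]


def describe(action: Action) -> str:
    """Hypothetical helper: say what a caller would do with each action."""
    if isinstance(action, BrowseInteractiveAction):
        # A BrowserGym command string to execute in the browser.
        return f"run in browser: {action.browsergym_command}"
    if isinstance(action, MessageAction):
        # A message for the user, e.g. a request for clarification.
        return f"show to user: {action.content}"
    if isinstance(action, AgentFinishAction):
        # The agent considers the interaction complete.
        return "stop"
    raise TypeError(f"unexpected action type: {type(action).__name__}")


if __name__ == "__main__":
    for act in (
        BrowseInteractiveAction("click('12')"),
        MessageAction("Which account should I use?"),
        AgentFinishAction(),
    ):
        print(describe(act))
```

In real use the action would come from agent.step(state) rather than being constructed by hand, and the branches would hand the action to whatever executes browser commands or relays messages.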