openhands.agenthub.visualbrowsing_agent.visualbrowsing_agent
VisualBrowsingAgent Objects
class VisualBrowsingAgent(Agent)
VERSION
VisualBrowsing agent that uses webpage screenshots during browsing.
__init__
def __init__(llm: LLM, config: AgentConfig) -> None
Initializes a new instance of the VisualBrowsingAgent class.
Arguments:
- llm (LLM): The LLM to be used by this agent
- config (AgentConfig): The configuration for this agent
reset
def reset() -> None
Resets the VisualBrowsingAgent.
step
def step(state: State) -> Action
Performs one step using the VisualBrowsingAgent.
This includes gathering information about previous steps and prompting the model to produce a browsing command to execute.
Arguments:
- state (State): The current state, used to get up-to-date information about the interaction
Returns:
- BrowseInteractiveAction(browsergym_command) - BrowserGym commands to run
- MessageAction(content) - A message to send (e.g. to ask for clarification)
- AgentFinishAction() - end the interaction
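The step() contract above means a caller repeatedly invokes step() and branches on which of the three action types comes back. The following is a minimal, self-contained sketch of such a driver loop. It does not use OpenHands itself: the action classes, State, ScriptedAgent, and run_episode are hypothetical stand-ins written for illustration, and the real OpenHands classes carry more fields and are dispatched by the OpenHands controller, not by code like this.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for the action types listed above; the real
# OpenHands classes have more fields than these.
@dataclass
class BrowseInteractiveAction:
    browsergym_command: str  # a BrowserGym command string to run


@dataclass
class MessageAction:
    content: str  # e.g. a clarification question


@dataclass
class AgentFinishAction:
    pass


Action = Union[BrowseInteractiveAction, MessageAction, AgentFinishAction]


@dataclass
class State:
    # Hypothetical minimal state: only the history of actions taken so far.
    history: List[Action] = field(default_factory=list)


class ScriptedAgent:
    """Hypothetical agent that replays a fixed script, mimicking step()/reset()."""

    def __init__(self, script: List[Action]) -> None:
        self.script = script
        self.reset()

    def reset(self) -> None:
        # Rewind to the start of the script.
        self._pos = 0

    def step(self, state: State) -> Action:
        # Once the script is exhausted, end the interaction.
        if self._pos >= len(self.script):
            return AgentFinishAction()
        action = self.script[self._pos]
        self._pos += 1
        return action


def run_episode(agent: ScriptedAgent, max_steps: int = 10) -> List[str]:
    """Call agent.step() until it finishes, logging how each action type is handled."""
    state = State()
    log: List[str] = []
    for _ in range(max_steps):
        action = agent.step(state)
        state.history.append(action)
        if isinstance(action, BrowseInteractiveAction):
            log.append(f"browse: {action.browsergym_command}")
        elif isinstance(action, MessageAction):
            log.append(f"message: {action.content}")
        elif isinstance(action, AgentFinishAction):
            log.append("finish")
            break
    return log
```

For example, an agent scripted with BrowseInteractiveAction('goto("https://example.com")') followed by MessageAction("Which link?") yields the log ['browse: goto("https://example.com")', 'message: Which link?', 'finish']. The max_steps bound is a design choice of this sketch so that a misbehaving agent cannot loop forever.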