Local LLM with SGLang or vLLM

warning

When using a local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for the best experience.

News

  • 2025/03/31: We released an open model, OpenHands LM 32B v0.1, which achieves 37.1% on SWE-Bench Verified (blog, model).

Download the Model from Hugging Face

For example, to download OpenHands LM 32B v0.1:

huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
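
The huggingface-cli tool ships with the huggingface_hub Python package. If it is not already installed, the following should get it (the [cli] extra only pulls in optional command-line niceties):

pip install -U "huggingface_hub[cli]"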

Create an OpenAI-Compatible Endpoint With a Model Serving Framework

Serving with SGLang

SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
--model my_folder/openhands-lm-32b-v0.1 \
--served-model-name openhands-lm-32b-v0.1 \
--port 8000 \
--tp 2 --dp 1 \
--host 0.0.0.0 \
--api-key mykey --context-length 131072

Serving with vLLM

vllm serve my_folder/openhands-lm-32b-v0.1 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name openhands-lm-32b-v0.1 \
--enable-prefix-caching
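
Either server exposes an OpenAI-compatible API on port 8000. As a quick sanity check (assuming the server is running on the same machine and you kept mykey as the API key), you can list the served models and send a test chat completion:

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer mykey"

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer mykey" \
  -H "Content-Type: application/json" \
  -d '{"model": "openhands-lm-32b-v0.1", "messages": [{"role": "user", "content": "Say hello"}]}'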

Run and Configure OpenHands

Run OpenHands

Using Docker

Run OpenHands using the official docker run command.
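A rough sketch of that command is shown below. The image tags are placeholders (check the installation instructions for the current versions), and the --add-host flag is what lets the container reach your model server at host.docker.internal:

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:<version> \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:<version>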

Using Development Mode

Use the instructions in Development.md to build OpenHands. Ensure config.toml exists by running make setup-config, which will create one for you. Then add the following to config.toml:

[core]
workspace_base="/path/to/your/workspace"

[llm]
embedding_model="local"
ollama_base_url="http://localhost:8000"

Start OpenHands using make run.

Configure OpenHands

Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings:

  1. Enable Advanced options.
  2. Set the following:
  • Custom Model to openai/<served-model-name> (e.g. openai/openhands-lm-32b-v0.1)
  • Base URL to http://host.docker.internal:8000 (use http://localhost:8000 instead if you are running OpenHands directly on the host in development mode)
  • API key to the same string you set when serving the model (e.g. mykey)
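
If you prefer to keep this configuration in config.toml (development mode) instead of the UI, the [llm] section accepts the same values. The key names below follow config.template.toml, so double-check them against your checkout:

[llm]
model="openai/openhands-lm-32b-v0.1"
base_url="http://localhost:8000"
api_key="mykey"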