Local LLMs

warning

When using a local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for an optimal experience.

Quickstart: Running OpenHands on Your MacBook

Serve the model on your MacBook

We recommend using LM Studio to serve these models locally.

  1. Download LM Studio and install it.

  2. Download the model:

    • Option 1: Directly download the LLM from this link or by searching for the name Devstral-Small-2505 in LM Studio
    • Option 2: Download an LLM in GGUF format. For example, to download Devstral Small 2505 GGUF, run huggingface-cli download mistralai/Devstral-Small-2505_gguf --local-dir mistralai/Devstral-Small-2505_gguf. Then, in a bash terminal, run lms import {model_name} in the directory where you downloaded the model checkpoint (e.g., run lms import devstralQ4_K_M.gguf inside mistralai/Devstral-Small-2505_gguf); see the consolidated sketch after this list.
  3. Open the LM Studio application, switch to power user mode, and then open the developer tab.

  4. Then click Select a model to load at the top of the application.

  5. Choose the model you want to use, holding Option on Mac to enable advanced loading options.

  6. Pick an appropriate context window for OpenHands based on your hardware configuration (larger than 32768 tokens is recommended, but too large a window may cause you to run out of memory). Flash Attention is also recommended if it works on your machine.

  7. Start the server (if it is not already in Running status), toggle off Serve on Local Network, and note the port number of the LM Studio URL (1234 is the port for http://127.0.0.1:1234 in this example).

  8. Finally, click the copy button near the model name to copy it (imported-models/uncategorized/devstralq4_k_m.gguf in this example).

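For reference, here is the Option 2 download-and-import flow from step 2 as a consolidated sketch (devstralQ4_K_M.gguf is the example quantization from above; substitute the file you actually downloaded):

# Download the GGUF checkpoint from Hugging Face
huggingface-cli download mistralai/Devstral-Small-2505_gguf --local-dir mistralai/Devstral-Small-2505_gguf

# Import the downloaded checkpoint into LM Studio
cd mistralai/Devstral-Small-2505_gguf
lms import devstralQ4_K_M.gguf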

Start OpenHands with the locally served model

Check the installation guide to make sure you have all the prerequisites for running OpenHands.

export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LM Studio
export LMSTUDIO_URL="http://host.docker.internal:1234" # <- Replace the port with the one from LM Studio
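Optionally, you can sanity-check that LM Studio's OpenAI-compatible server is reachable before starting OpenHands (a quick check from your host, assuming the port 1234 shown above):

# Should return a JSON list that includes your imported model
curl http://127.0.0.1:1234/v1/models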

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json
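For reference, the command above writes a settings.json equivalent to the following (pretty-printed here; llm_model and llm_base_url are filled in from the variables you exported):

{
  "language": "en",
  "agent": "CodeActAgent",
  "max_iterations": null,
  "security_analyzer": null,
  "confirmation_mode": false,
  "llm_model": "lm_studio/imported-models/uncategorized/devstralq4_k_m.gguf",
  "llm_api_key": "dummy",
  "llm_base_url": "http://host.docker.internal:1234/v1",
  "remote_runtime_resource_factor": null,
  "github_token": null,
  "enable_default_condenser": true,
  "user_consents_to_analytics": true
}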

docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.39
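If OpenHands cannot reach the model, you can verify that containers resolve host.docker.internal to your host (a debugging sketch using the public curlimages/curl image, not part of the official steps):

docker run --rm --add-host host.docker.internal:host-gateway \
  curlimages/curl:latest http://host.docker.internal:1234/v1/models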

Once the server is running, you can visit http://localhost:3000 in your browser to use OpenHands with the local Devstral model:

Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.39
Starting OpenHands...
Running OpenHands as root
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
INFO: Started server process [8]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)

Advanced: Serving an LLM on GPUs

Download model checkpoints

note

The model checkpoints downloaded here should NOT be in GGUF format.

For example, to download OpenHands LM 32B v0.1:

huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir all-hands/openhands-lm-32b-v0.1

Create an OpenAI-Compatible Endpoint with SGLang

SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
--model all-hands/openhands-lm-32b-v0.1 \
--served-model-name openhands-lm-32b-v0.1 \
--port 8000 \
--tp 2 --dp 1 \
--host 0.0.0.0 \
--api-key mykey --context-length 131072

Create an OpenAI-Compatible Endpoint with vLLM

vllm serve all-hands/openhands-lm-32b-v0.1 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name openhands-lm-32b-v0.1 \
--enable-prefix-caching
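Both commands expose an OpenAI-compatible API on port 8000, so either server can be sanity-checked the same way (a sketch assuming the host, port, and mykey API key from the commands above):

# Should list openhands-lm-32b-v0.1 among the served models
curl -H "Authorization: Bearer mykey" http://localhost:8000/v1/models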

Advanced: Run and Configure OpenHands

Run OpenHands

Using Docker

Run OpenHands using the official docker run command.

Using Development Mode

Use the instructions in Development.md to build OpenHands. Ensure a config.toml exists by running make setup-config, which will create one for you. In config.toml, enter the following:

[core]
workspace_base="/path/to/your/workspace"

[llm]
model="openhands-lm-32b-v0.1"
ollama_base_url="http://localhost:8000"

Start OpenHands using make run.

Configure OpenHands

Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings under the LLM tab:

  1. Enable Advanced options.
  2. Set the following:
  • Custom Model to openai/<served-model-name> (e.g. openai/openhands-lm-32b-v0.1)
  • Base URL to http://host.docker.internal:8000
  • API key to the same string you set when serving the model (e.g. mykey)