After setting up the TensorRT-LLM inference server in either a single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a browser-based chat interface. Before you begin, make sure the inference server is running and reachable.
Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running. For multi-node setup, this would be the primary node.
NOTE
If you used a different port for your OpenAI-compatible API server, adjust OPENAI_API_BASE_URL="http://localhost:8355/v1" to match the address and port of your TensorRT-LLM inference server.
docker run \
-d \
-e OPENAI_API_BASE_URL="http://localhost:8355/v1" \
-v open-webui:/app/backend/data \
--network host \
--add-host=host.docker.internal:host-gateway \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
This command:
- runs the container in the background (-d)
- points Open WebUI at your OpenAI-compatible endpoint (OPENAI_API_BASE_URL)
- persists chat data in a Docker volume named open-webui (-v open-webui:/app/backend/data)
- shares the host network so the container can reach the inference server on localhost (--network host)
- restarts the container automatically if it exits or the node reboots (--restart always)
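To confirm the container came up and that it can reach your inference server, a quick check like the following can help. This is a sketch assuming the default ports used in this guide; adjust them if your setup differs.

```shell
# Check that the open-webui container is running
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}'

# Verify the OpenAI-compatible endpoint responds
# (adjust the port if your inference server uses a different one)
curl -s http://localhost:8355/v1/models

# Tail the Open WebUI logs if something looks wrong
docker logs open-webui
```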
Open your web browser and navigate to:
http://localhost:8080
You should see the Open WebUI interface.
Select your model(s) from the dropdown menu in the top-left corner. That's all you need to do to start using Open WebUI with your deployed models.
NOTE
If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
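If a model does not appear in the dropdown, you can sanity-check the backend that Open WebUI talks to by sending a request to the TensorRT-LLM server directly. This is an illustrative sketch; the model name below is a placeholder, so substitute one returned by the /v1/models endpoint.

```shell
# Send a test chat completion to the inference server directly
# (YOUR_MODEL_NAME is a placeholder; use a name listed by /v1/models)
curl -s http://localhost:8355/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "YOUR_MODEL_NAME",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```

If this request returns a completion but Open WebUI still shows no models, the issue is likely in the OPENAI_API_BASE_URL setting rather than the inference server.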
WARNING
Removing the container and its volume deletes all chat data and uploaded files; you will need to re-upload them for future runs.
To tear down Open WebUI, stop and remove the container, its data volume, and the image with the following commands:
docker stop open-webui
docker rm open-webui
docker volume rm open-webui
docker rmi ghcr.io/open-webui/open-webui:main
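After running the commands above, you can confirm nothing was left behind with a quick check like this:

```shell
# Both commands should produce no open-webui entries after cleanup
docker ps -a --filter name=open-webui
docker volume ls --filter name=open-webui
```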