lollms-webui is a web interface for hosting Large Language Models (LLMs) using many different models and bindings.
This Dockerfile installs lollms and lollms-webui as libraries in a Docker image.
The Dockerfile is based on nvidia/cuda with Ubuntu and cuDNN. It should be used with the NVIDIA Container Toolkit to enable GPU support in Docker.
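To check that the NVIDIA Container Toolkit is working before building, you can run nvidia-smi inside a CUDA base image (the exact image tag below is only an example):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi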
Build the image locally:
docker build -t lollms-webui:0.0.1 .
When rebuilding, remember to clean up the build cache so that the latest git versions of lollms and lollms-webui are fetched.
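For example, either of the following forces a fresh fetch on the next build (both are standard Docker commands):
docker builder prune
docker build --no-cache -t lollms-webui:0.0.1 .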
Create a cache directory:
mkdir -p ~/.cache/lollms
It will be used for storing LLMs and configuration files.
Download a model supporting the new (as of June 2023) k-quant methods in llama.cpp and place it in the cache directory ~/.cache/lollms/models/llama_cpp_official/.
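For example, assuming the model was downloaded to ~/Downloads (the filename below is hypothetical):
mkdir -p ~/.cache/lollms/models/llama_cpp_official
mv ~/Downloads/some-model.q4_K_M.bin ~/.cache/lollms/models/llama_cpp_official/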
Run the container:
docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
-p 8080:8080 --name lollms lollms-webui:0.0.1
The option -e CPU_THREADS=4 can be used to limit the number of CPU threads used by the LLMs; otherwise, all available threads are used.
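For example, to limit the container to four threads (the thread count is just an illustration):
docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  -e CPU_THREADS=4 -p 8080:8080 --name lollms lollms-webui:0.0.1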
The option --entrypoint bash can be used to start a shell instead of the web interface.
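For example, to open an interactive shell in the container for inspection or debugging:
docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  --entrypoint bash --name lollms lollms-webui:0.0.1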
The web interface will be available at http://localhost:8080.
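To quickly check from the host that the server is responding, you can, for example, send a request with curl:
curl -I http://localhost:8080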