“Today I Learned” are instruction-focused posts about things I’ve learned.
Llamafiles are these cool little files that have llama.cpp and model weights embedded, and can run on Mac/Linux/Windows. Cool, let’s make some!
In a previous post on quantizing Llama LLMs, I described how to download some Llama models and quantize them to a format that can run on your machine with llama.cpp. That was pretty easy, but it can get even easier!
I took the release of the tiny 1B and 3B Llama 3.2 models as an excuse to try creating my own llamafiles. Llamafiles are basically runnable zip files with llama.cpp, the model in GGUF format, and some cross-platform trickery code embedded, so they can run on the big three computing platforms with just a single file.
And they’re quite easy to build. Basically, you need to:
1. Download the llamafile zip from GitHub,
2. create an .args file, and
3. use a special zipalign tool to create your llamafile.
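If you want to do those steps by hand, a minimal sketch looks roughly like this, assuming the release zip from the Mozilla-Ocho/llamafile repo on GitHub has been unzipped into a llamafile/ directory, and using placeholder file names:

# Start from the llamafile binary itself; the model gets zipped into it.
cp llamafile/bin/llamafile my-model.llamafile
# One llama.cpp argument per line; more on .args below.
printf -- '-m\nmy-model.gguf\n-c\n0\n...\n' >.args
# Embed the model weights and the .args file into the llamafile.
./llamafile/bin/zipalign -j0 my-model.llamafile my-model.gguf .args
chmod a+x my-model.llamafile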
Because I like to make computers do what I want, I’ve scripted this. You can find the result in the maragudk/llamafile repo. Have a look at the Makefile. I even uploaded some GGUF models for you to play with; see the model list at the top of the Makefile.
The most important part is this:
.PHONY: build
build: llamafile/bin/llamafile clean
	mkdir -p build
	# Start from the llamafile binary itself; the model and args are zipped into it below.
	cp llamafile/bin/llamafile build/$(model).llamafile
	echo "-m\n$(model).gguf\n-c\n0\n..." >build/.args
	# Embed the GGUF model, the .args file, and the licenses into the llamafile.
	./llamafile/bin/zipalign -j0 build/$(model).llamafile models/$(model).gguf build/.args LICENSE-Llama-3.1 LICENSE-Llama-3.2
	chmod a+x build/$(model).llamafile
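With the repo cloned, building and running one of these then looks roughly like this. The model name here is a guess based on the Docker image tags further down; check the model list at the top of the Makefile for the real ones:

make build model=Llama-3.2-1B-Instruct-Q5_K_M
# Run it; the trailing ... in .args means extra llama.cpp flags can be added here too.
./build/Llama-3.2-1B-Instruct-Q5_K_M.llamafile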
Note that you have to embed the llamafile binary from the downloaded zip. I missed that step at first and it didn’t work; I thought zipalign would do it for me.
The .args
are passed directly to llama.cpp
. Here, it’s telling it where the model is, loading the context size from the model itself, and making sure we can pass additional parameters to the llamafile if we need to.
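Concretely, the generated build/.args ends up with one argument per line, something like this (the model name is a placeholder):

-m
Llama-3.2-1B-Instruct-Q5_K_M.gguf
-c
0
...

The ... on the last line marks where any extra command-line arguments get inserted at runtime.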
For fun, and because these models are actually small enough to run in CI, I’ve uploaded some of them to the Docker Hub.
The Dockerfile currently looks like this:
FROM debian:stable-slim AS runner
WORKDIR /bin
COPY LICENSE* ./
COPY build/*.llamafile ./model
EXPOSE 8080
# Run through sh; the cross-platform (APE) llamafile binary is also a valid shell script.
ENTRYPOINT ["/bin/sh", "./model"]
CMD ["--host", "0.0.0.0"]
Then it’s a one-liner to build the image:
.PHONY: build-docker
build-docker: build
	docker build --platform linux/amd64,linux/arm64 -t maragudk/`echo $(model) | tr A-Z a-z`:latest .
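Running and talking to one of the images locally could then look roughly like this. The llamafile serves llama.cpp’s HTTP API, including an OpenAI-compatible chat completions endpoint; the prompt is just an example:

docker run --rm -p 8080:8080 maragudk/llama-3.2-1b-instruct-q5_k_m
# In another terminal:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a haiku about llamas."}]}'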
You could also add them easily to your docker compose file:
services:
  llama32-1b:
    image: maragudk/llama-3.2-1b-instruct-q5_k_m
    ports:
      - "8090:8080"
  llama32-3b:
    image: maragudk/llama-3.2-3b-instruct-q5_k_m
    ports:
      - "8091:8080"
Or your GitHub workflow:
jobs:
  test:
    name: Test
    runs-on: ubuntu-latest

    services:
      llama32-1b:
        image: "maragudk/llama-3.2-1b-instruct-q4_k_m"
        ports:
          - "8090:8080"

    steps:
      - name: Checkout
        uses: actions/checkout@v4
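A later run step in the job can then hit the model server on localhost:8090. This is just a sketch, assuming the llama.cpp server’s /health endpoint is available for waiting on startup:

# Wait for the model server to come up, then ask it for something.
curl --retry 10 --retry-connrefused --retry-delay 2 --silent http://localhost:8090/health
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a short poem about CI."}]}'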
And voila, you can auto-generate poems in CI! 😁
I’m Markus, an independent software consultant and developer. 🤓✨ Reach me at markus@maragu.dk.