October 2024

I own my LLM chat history, and so should you

Like the rest of the world, I’ve embraced chatting to large language models (LLMs) as part of my professional and personal life. I rarely use their output directly, but they help me think and brainstorm, give me ideas, and force me to write out my own thoughts, clarifying and refining them in the process (much like writing articles like this one does, by the way). These thoughts-in-writing are important to me, and that’s why I keep a history of them on my own machine. A conversational diary, so to speak. And so should you.

The big LLM developers and hosting providers (currently OpenAI, Anthropic, Google) really want you to use their models exclusively and stay in their ecosystems, but the truth is, they're currently pretty interchangeable. Yes, some do things a bit differently, or are better in some areas than others, and they come up with novel ideas like building artifacts through chat, which the other players then quickly copy. But in the end, they can all do the same things: you can chat with them, give them system prompts, add context as text or other media, and so on.

Here’s what you don’t get with them: a history of your conversations on your own machine, or the ability to mix and match LLMs within different parts of the same conversation. Yes, you can probably export conversations somewhere, but that’s not the same as keeping them local in the first place. If one of them suddenly charges a much larger premium for their service, makes exports harder, or just plain goes out of business overnight, your conversations could be lost.

And then there’s the privacy aspect of keeping your writings and thoughts stored at a third party, but I probably don’t need to delve into that. You’re reading a technical blog, you know the implications.

Your conversations are probably more valuable than you think! If you store them locally, you can easily add search, or personal summarization across conversations, or do all kinds of analytical Fun Stuff with them later. Maybe you need to reference that one solution you ended up pasting into a conversation, or just want to look up what you were doing a year ago and what you’ve achieved since then.

So I’m here to tell you: stop using their pretty, nifty, and definitely useful chat UIs, and instead take control with some API keys and local chat software. There’s plenty out there. I won’t link to any in particular because I’ve got no real experience with any of them, so your favorite search engine is your friend here. Try a few and pick your favorite, just make sure you know how it stores conversations, and that you can access them directly without the software.

I’m a nerd and wrote my own goat

The absolutely marvelous thing about being a software developer is that you can just write software for yourself, tailored to your specific needs. Because I’m a nerd and I think it’s fun and my work is also my hobby and maybe I have a bit of not-invented-here-syndrome, I wrote my own LLM chat program, which I call goat. Partly because I like goats (they’re fun and weird), but also because GOAT as an acronym means Greatest Of All Time and I have far too much self-confidence. But GOAT LLM CLI (Greatest Of All Time Large Language Model Command Line Interface) is also just a weird and long acronym sentence and I love it.

Anyway, the rest of this post is going to be about that. You can safely skip it if you’d rather just find some existing chat software and use that. If you’re interested in the design of an LLM chat app based on Go, SQLite, and local LLM inference with llama.cpp and friends, read on.

Here’s what the app currently looks like:

Demo of the goat CLI app.

Storage model

All good software design starts with a model of your domain. I think understanding the different entities involved, and doing the hard work of finding good names and relationships between them, really improves the whole code base from the start. This is how you’ll talk about your software throughout its lifetime, and if you’ve got this step right, the code flows out more easily.

The model in goat centers around conversations. Speakers take part in conversations, and their act of speaking in one is called a conversation turn. A speaker is an instance of a particular model, which in turn has a model type.

Here’s what that looks like in SQLite:

create table model_types (
  v text primary key
) strict;

create table models (
  id text primary key default ('m_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  updated text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  name text not null,
  type text not null references model_types (v),
  config text not null default '{}'
) strict;

create table speakers (
  id text primary key default ('s_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  updated text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  name text unique not null,
  modelID text not null references models (id),
  system text not null default '',
  config text not null default '{"avatar":"🤖"}'
) strict;

create table conversations (
  id text primary key default ('c_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  updated text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  topic text not null default ''
) strict;

create table turns (
  id text primary key default ('t_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  updated text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  conversationID text not null references conversations (id) on delete cascade,
  speakerID text not null references speakers (id),
  content text not null default ''
) strict;

Let’s look at some details:

Models have names and types. The name would be something like llama-3.2-3b, with the type then being llamacpp. Similarly, we could have gpt-4o and openai. Models have IDs distinct from their names so we can update a model without having to update all speakers referencing it.

Models can be configured with a JSON blob that we never need to cross-reference across tables, only look up. Configuration could include API keys and endpoint URLs.

Speakers are uniquely named (so we can refer to them in conversations), and refer to a particular model by ID. They also have a system prompt and their own configuration. The system prompt can be used to give them a persona. Other systems call this a character card. The configuration, again in JSON, is currently just used for giving the speaker an emoji avatar.

Conversations just have a topic, but they act as an anchor in time and tie the conversation turns together.

Turns have text content and refer to the speaker ID and conversation ID. Turns are ordered by their creation time within the conversation.

And that’s it! Because it’s SQLite, it’s just a file on disk which can be opened with a multitude of programs.1 Also, they’re all strict tables because I’m not insane.
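To make these relationships concrete, here’s a sketch that builds the schema in an in-memory database and replays a conversation. goat itself is in Go, but Python’s built-in sqlite3 keeps the sketch short and self-contained. The schema is trimmed from the one above (strict and the defaults we override are omitted so it runs on older SQLite builds), and the model config keys are illustrative guesses, not goat’s actual format.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed copy of the schema above; `strict` omitted for older SQLite builds.
db.executescript("""
create table model_types (v text primary key);

create table models (
  id text primary key,
  name text not null,
  type text not null references model_types (v),
  config text not null default '{}'
);

create table speakers (
  id text primary key,
  name text unique not null,
  modelID text not null references models (id),
  system text not null default '',
  config text not null default '{"avatar":"🤖"}'
);

create table conversations (
  id text primary key,
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  topic text not null default ''
);

create table turns (
  id text primary key default ('t_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  conversationID text not null references conversations (id) on delete cascade,
  speakerID text not null references speakers (id),
  content text not null default ''
);
""")

db.execute("insert into model_types (v) values ('llamacpp')")
# The config keys (base_url) are made up for illustration.
db.execute("""insert into models (id, name, type, config)
              values ('m_1', 'llama-3.2-3b', 'llamacpp',
                      '{"base_url":"http://localhost:8080"}')""")
db.execute("insert into speakers (id, name, modelID) values ('s_me', 'me', 'm_1')")
db.execute("""insert into speakers (id, name, modelID, system)
              values ('s_goat', 'goat', 'm_1', 'You are terse.')""")
db.execute("insert into conversations (id, topic) values ('c_1', 'naming things')")
db.executemany(
    "insert into turns (created, conversationID, speakerID, content) values (?, 'c_1', ?, ?)",
    [
        ("2024-10-01T10:00:00.000Z", "s_me", "Why goats?"),
        ("2024-10-01T10:00:01.000Z", "s_goat", "Fun and weird."),
    ],
)

# Replaying a conversation is one join, ordered by turn creation time.
transcript = db.execute("""
    select s.name, t.content
    from turns t join speakers s on s.id = t.speakerID
    where t.conversationID = 'c_1'
    order by t.created
""").fetchall()
for name, content in transcript:
    print(f"{name}: {content}")
```

Note how the ISO 8601 timestamps sort correctly as plain strings, which is why ordering turns by created needs no special date handling.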

Using local LLMs

As described in the introduction, not only does this give me a local conversation history that I can search and augment later, it also gives me the ability to pull in different LLMs within a given conversation if I think one would give me better (or just different) answers than another.

This includes local LLMs running on your machine!

I’ve previously written about llamafiles, these wonderful, self-contained LLM files that you can start directly on all major platforms and have an LLM running on your own machine. So if you don’t want to spend money on remote LLMs, or send your private thoughts to them, or just want to work offline, this is a great way to have one or more conversation partners readily available wherever you are.
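The nice thing is that llamafile (via llama.cpp’s built-in server) exposes an OpenAI-compatible chat completions endpoint, so local and remote models can share one request shape. Here’s a sketch of the translation a schema like goat’s implies: a speaker’s system prompt plus the conversation turns become the messages array. The function and the speaker/turn dicts are illustrative, not goat’s actual code.

```python
import json

def chat_request(speaker, turns):
    """Build an OpenAI-style chat completions payload from stored turns.

    Turns spoken by this speaker become "assistant" messages; everything
    else becomes "user" messages. This mapping is an assumption for the
    sketch, not necessarily how goat does it.
    """
    messages = []
    if speaker["system"]:
        messages.append({"role": "system", "content": speaker["system"]})
    for t in turns:
        role = "assistant" if t["speakerID"] == speaker["id"] else "user"
        messages.append({"role": role, "content": t["content"]})
    return {"model": speaker["model"], "messages": messages}

speaker = {"id": "s_goat", "model": "llama-3.2-3b", "system": "Be terse."}
turns = [
    {"speakerID": "s_me", "content": "Summarize our last chat."},
    {"speakerID": "s_goat", "content": "We named the CLI."},
    {"speakerID": "s_me", "content": "Right. What next?"},
]
payload = chat_request(speaker, turns)
print(json.dumps(payload, indent=2))
# POST this JSON to the server's /v1/chat/completions endpoint
# (llamafile typically listens on localhost:8080).
```

Because the payload is the same either way, switching a speaker from a local llamafile to a remote provider is just a different base URL and API key in the model’s config.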

The future of goat

Like I said, goat is tailored to my own needs. But my default is to open source my software and make it at least somewhat usable to others, so that’s what I’m currently doing when I’m not working on MyFavPeople.

In the near future, I intend to make it easier to install and set up, so that everyone (or at least other developers) can:

  • install the CLI and get up and running with a local Llama model quickly,

  • add API keys for popular remote models,

  • and create speaker personas easily and use them.

Stay tuned either on this blog (I’ve got RSS and email newsletters, see the box below) or at www.goatcli.com.

Tudeloo! 😁

Footnotes

  1. I use TablePlus and think it’s pretty nice. ↩︎

A picture of me, Markus.

I’m Markus, a professional software consultant and developer. 🤓✨ Reach me at markus@maragu.dk.

Podcast on Apple Podcasts and Spotify. Streaming on Youtube. Subscribe to this blog by RSS or newsletter: