How to build a tiny language model

Generative storytelling algorithms

Apr 15, 2024

Let’s build a tiny language model. An algorithm for creating infinite, unique stories.

I love science fiction and fantasy. Growing up, I spent far too long holed up in my room reading stories like Dune, Lord of the Rings, Foundation, and Rendezvous with Rama. I love fantastic stories. I love 1,000-year plans and globe-spanning guilds. Ships the size of planets and wars fought on microscopic scales.

In September of last year I released my ADFL collection, an algorithm to generate tiny fairy tales, complete with a story seed, designed in part to mimic the surreal questions and comments of a 3-year-old.

The algorithm behind these stories was surprisingly simple to code (yet much more difficult to seed). Although I did not know it at the time, I was recreating (almost exactly) the code behind the first piece of electronic literature from 1952 (the Strachey love letter algorithm). As such, you can think of this as part tutorial and part archeological excavation.

Let’s start by writing a simple, singular, non-generative sentence. As my prompt, I want something that might appear at the top of a chapter in a fantasy novel or on an introductory card in a board game.

There were 10 guilds in Stormrock, but only one knew the secret of color and texture.

Your intuition here might be to jump straight to Markov chains or LLMs. But we can start much simpler, with no math. Instead, think back to childhood games like Mad Libs or Exquisite Corpse. We’ll start small and replace that 10 with a random number generator:

There were [choose a random number] guilds in Stormrock, but only one knew the secret of color and texture.

In code, this takes just two lines.
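The author's original code is not reproduced in this post, so here is a minimal sketch of the same first step in Python. The function name `guild_sentence` and the range 2 to 12 are assumptions made for illustration, not the author's values.

```python
import random

def guild_sentence(rng=random):
    """Return the fixed sentence with only the number chosen at random."""
    # The range 2..12 is an arbitrary choice for this sketch.
    count = rng.randint(2, 12)
    return (f"There were {count} guilds in Stormrock, "
            "but only one knew the secret of color and texture.")

print(guild_sentence())
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when a particular sentence is worth keeping.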

Now we repeat this with every part of the sentence. We write a simple, small list of village names and verbs and types of groups and then we randomly pull from that list for each iteration.

There were [choose a number] [choose a type of group] in [choose a name], but only one [choose a verb] the secret of [choose a noun(s)].
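The full template can be sketched the same way, with one small list per slot. Every list below is a hypothetical stand-in (only "guilds", "Stormrock", "knew", and "color and texture" come from the sentence above); the real seed lists are the hard part and are not shown in the post.

```python
import random

# Hypothetical seed lists, one per slot in the template.
GROUPS = ["guilds", "covens", "orders", "caravans"]
PLACES = ["Stormrock", "Ashfen", "Lowmere", "Tallow Hill"]
VERBS = ["knew", "guarded", "remembered", "sold"]
NOUNS = ["color and texture", "salt and shadow", "rain", "sleeping maps"]

def story_seed(rng=random):
    """Fill every slot of the template with an independent random pick."""
    return (f"There were {rng.randint(2, 12)} {rng.choice(GROUPS)} "
            f"in {rng.choice(PLACES)}, but only one {rng.choice(VERBS)} "
            f"the secret of {rng.choice(NOUNS)}.")

# Each call is one iteration of the generator.
for _ in range(3):
    print(story_seed())
```

Each slot is drawn independently here, so the quality of the output depends almost entirely on how well the lists are curated.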

And we’ve done it. This has a surprisingly large number of possibilities. Many will be surprising and delightful: things I would never have found if I were trying to write an entire sentence instead of a list of components.
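To put a rough number on "surprisingly large": when slots are filled independently, the count of distinct sentences is the product of the slot sizes. The sizes below are assumed for illustration (11 possible numbers and four lists of 4 words each), not taken from the author's lists.

```python
from math import prod

# Hypothetical slot sizes: 11 possible numbers and four lists of 4 items each.
slot_sizes = {"number": 11, "group": 4, "place": 4, "verb": 4, "noun": 4}

total = prod(slot_sizes.values())
print(total)  # 11 * 4 * 4 * 4 * 4 = 2816

# Growing every word list to 20 entries multiplies the count quickly.
bigger = prod([11, 20, 20, 20, 20])
print(bigger)  # 11 * 20**4 = 1760000
```

Even very short lists produce thousands of sentences, and each added entry in one list multiplies the total rather than adding to it.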

And you can imagine how we can push this even further by combining more complex phrases, adding visuals dictated by the text, or introducing intra-sentence dependencies. And we still haven’t touched a real formula or any sort of math.
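One possible reading of an intra-sentence dependency is that a later slot draws from a pool selected by an earlier choice. The sketch below assumes that reading; the mapping `SECRETS_BY_GROUP` and its contents are hypothetical.

```python
import random

# Hypothetical data: each group type carries its own pool of secrets,
# so the final slot depends on an earlier choice in the same sentence.
SECRETS_BY_GROUP = {
    "guilds": ["color and texture", "honest weights"],
    "covens": ["salt and shadow", "the ninth moon"],
    "caravans": ["sleeping maps", "quiet roads"],
}

def dependent_seed(rng=random):
    """Pick a group first, then a secret from that group's own pool."""
    group = rng.choice(sorted(SECRETS_BY_GROUP))
    secret = rng.choice(SECRETS_BY_GROUP[group])
    count = rng.randint(2, 12)
    return (f"There were {count} {group} in Stormrock, "
            f"but only one knew the secret of {secret}.")

print(dependent_seed())
```

The trade-off is fewer total combinations in exchange for sentences that hang together more often.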

Can a story be good without meaning or intent? It’s clear that, at the very least, this is a good brainstorming tool. And you can also see how easy it would be to take a story and spruce it up by personalizing it with this sort of tool (e.g., replacing the protagonist’s name with the reader’s name).

This also reminds me of Tarot and ancient divination (e.g., casting sticks before battles). Even if you don’t believe in the universe dictating what card you get, there’s value to be found in letting your subconscious find meaning in randomness. Maybe you see defeat in a Tarot card or tea leaves not because of the universe, but because deep down you’re afraid and your subconscious knows you’re not ready.

I wanted to test this idea by creating a set of fortune teller twins. One algorithm to cast sticks (e.g., draw a piece of abstract, scribbled art) and then a separate algorithm to read it and tell my fortune.

Calling these algorithms is a stretch. They are both dumb, tiny language models. In fact, the scribbles and the text are completely unrelated. Even knowing that, I think you’ll agree that it’s easy to find meaning and relation in these cards. You might find that the brightness of the red depicts fame itself and/or the messiness of trying to avoid it. And you might find that the volatility on the right side of the second scribble depicts the added life of embracing your alter ego or twin.
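The "twins" idea can be sketched as two generators that share nothing. This is an assumption-heavy illustration: the function names, the point-list representation of a scribble, and the fortune fragments are all hypothetical, loosely echoing the readings described above.

```python
import random

# Hypothetical fortune fragments; the post does not show the author's lists.
OPENERS = ["You will avoid", "You will embrace", "Beware of"]
SUBJECTS = ["fame", "your twin", "an old promise"]

def cast_sticks(rng, strokes=5):
    """Return a 'scribble' as a list of (x, y) points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(strokes)]

def read_fortune(rng):
    """Return a fortune built from word lists; it never looks at the scribble."""
    return f"{rng.choice(OPENERS)} {rng.choice(SUBJECTS)}."

# Separate generators: nothing is shared between the drawing and the reading.
scribble = cast_sticks(random.Random())
fortune = read_fortune(random.Random())
print(scribble)
print(fortune)
```

Because `read_fortune` takes no scribble as input, any connection a reader sees between the two outputs is supplied entirely by the reader.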

I recently had an extreme feeling of disgust towards this work. In one of my favorite books (Blindsight), an alien race chances upon radio signals from Earth’s early broadcasts. They decode them and find them meaningless. Not a message, but an artifact. They interpret the signal as a denial-of-service attack; a waste of resources.

This is a fair response. When someone wastes your time it is an attack. They are, quite literally, killing a small part of you. This becomes obvious when you reframe: If you were set to die tomorrow, how much would you pay to extend your life one day?

This is why social media can feel so infuriating and dangerous. I can’t imagine how bad this will get with cheap LLMs and the ability to easily waste someone’s time with zero effort. Imagine the hacking attempts! You can set a bot to befriend someone and, only after 10 years have passed, ask for their seed phrase or bank account.

I feel very torn by this all. I love asemic art. I love my tiny language models. But am I building false oracles? Wasting your time? Where is the line between the beauty of finding meaning in anything and the disgust of information as DDoS?

—

Notes

1| Scenes with Simon is a fantastic newsletter that often explores LLMs and storytelling.

2| Sudowrite and Lex are two interesting AI writing tools I was reminded of when writing this piece.

3| One of the best pieces of feedback I got on ADFL was that the text felt outdated and cookie cutter compared to an LLM. It almost prevented me from releasing the collection. But in the end I love that every pixel and letter in ADFL was written by me and not some agent of me.

4| If you liked this, search out [computational poetry] for much deeper thoughts and work in this vein.

5| I’m excited to see Amy Goodchild’s upcoming release in this category. The WIPs have been easy to lose yourself in.


—

This newsletter was adorned with a magnificence of capital letters and produced without the use of a pen (although it lacks clearly divided rubrics). If you enjoyed it, please feel free to forward it to a friend. If you have a question or request, please feel free to reply here.

Generative Light
