I’m a little proud to have been so early on the hyperlinks train. I first got my ticket punched in 1991, when the company where I was working, Interleaf, was developing a product that let you create digital documents hyperlinked to other documents on your office network. But the web supersized that feature—plus, it was open and free. Oops.
I loved hyperlinks because, as a writer and former academic, I’d become acutely aware of how much of writing consists of isolating ideas in a container, whether it’s a paper, a chapter, or a column. Of course, you refer to other people’s work, and you put in the appropriate footnotes, but ultimately, what you’re writing has to stand on its own.
I was such a links fanatic that in 1999, I wrote, “Hyperlinks subvert hierarchies” in The Cluetrain Manifesto. I still think that’s true, although it’s not nearly as true as I’d hoped. I continue to believe that the world and our knowledge of it look different when we are shown evidence that all ideas are just one click away from other ideas that may support, deny, elaborate on, or thoroughly misunderstand what they’re linked to.
That’s an effect of the presence of hyperlinks as a tool. But now, with the onset of the Age of AI, the idea of hyperlinks may be ramping up in an unexpected way. If I were prone to overstating a meme (for example, “Hyperlinks subvert hierarchies”), I might say that links are now eating our words. Then I’d spend the second half of a column trying to make sense of that ridiculous claim.
Links are eating our words
How many words are in a large language model (LLM) of the sort that powers generative AI (GenAI) tools and chatbots?
Zero. But don’t feel bad. It was a trick question, albeit one wrapped around a non-tricky fact that bears directly on how these conversational models work.
The tricky part is that every word an LLM encounters in its training data is assigned a meaningless number, called a “token.” And it’s not just whole words: Word parts, phrases, and punctuation may also be given tokens of their own. So if you were to look inside a fully trained model, you wouldn’t see any words at all.
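To make that concrete, here is a minimal sketch in Python using OpenAI’s open-source tiktoken library as one example of a tokenizer. The encoding name is just one commonly used vocabulary, and the specific IDs it produces are illustrative of the idea, not a peek inside any particular chatbot:

```python
# A rough illustration of tokenization: text goes in, meaningless numbers come out.
# Assumes the tiktoken package is installed (pip install tiktoken).
import tiktoken

# "cl100k_base" is one widely used tokenizer vocabulary; others will differ.
enc = tiktoken.get_encoding("cl100k_base")

text = "Hyperlinks subvert hierarchies."
token_ids = enc.encode(text)

print(token_ids)       # a short list of integers; no words in sight
print(len(token_ids))  # common words map to a single ID, rarer words get split up

# You can map each number back to the text fragment it stands for:
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```

The point of the sketch is only that the model’s raw material is these numbers, not the words we typed.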
If you were to examine the internal workings of an LLM, you would typically “see” each token represented as a vector: a set of numbers that captures its relationships with other tokens in a highly multidimensional space. Those dimensions collectively capture various aspects of the token’s meaning and usage, such as its semantic properties, its degree of formality (“hello” versus “howdy”), or other characteristics that help the model generate appropriate responses. Some dimensions may represent abstract or complex features that aren’t directly interpretable by humans. The overall result isn’t a map of the relationships among tokens but rather a representation that gives the system the data it needs to compute the distances among tokens in any given context.
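As a cartoon of how that distance computation works, here is a sketch with made-up three-dimensional vectors. Real models learn these vectors during training and use hundreds or thousands of dimensions; the words, numbers, and dimension count below are invented purely for illustration:

```python
# Toy illustration: tokens as vectors, and "closeness" as cosine similarity.
# The vectors here are invented; real embeddings are learned during training.
import numpy as np

embeddings = {
    "hello":   np.array([0.9, 0.1, 0.3]),
    "howdy":   np.array([0.8, 0.2, 0.7]),  # similar meaning, less formal
    "invoice": np.array([0.1, 0.9, 0.2]),  # unrelated to greetings
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["hello"], embeddings["howdy"]))    # relatively high
print(cosine_similarity(embeddings["hello"], embeddings["invoice"]))  # noticeably lower
```

In this made-up geometry, “hello” sits much closer to “howdy” than to “invoice,” which is the kind of relationship the real vectors encode, at vastly greater scale.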