If you take an undergrad course on the history of political thought, at some point you will be made to read Quentin Skinner’s 1969 essay, “Meaning and Understanding in the History of Ideas”. Otherwise you might discover it as I did, by asking your boss (very politely) what the point of having corporate values is.[1] The question Skinner wants to answer is: what are the “appropriate procedures” for understanding a text? He takes aim at a kind of historian who struggles to step outside their place as a ‘present-minded’ observer. They read the classic texts of political thought expecting to find doctrines on what contemporary scholars would think are the “mandatory themes” (the state, popular sovereignty, equality, etc.). The historian is projecting backwards a set of questions they take to be timeless debates, but that only make sense to a modern viewer.
This is more than simple anachronism. Carrying these expectations sets the historian up to fail in several ways. If they cannot find a clear expression of a doctrine, they might “discover” it scattered across fragments of a thinker’s work. Or they might “read in” a meaning the writer never intended to convey. If neither of these methods proves sufficient, some historians will even reprimand past thinkers for failing to supply a doctrine on one of the mandatory themes. They will try to trace the history of an idea,
"as if the fully developed form of the doctrine was always in some sense immanent in history, even if various thinkers failed to 'hit upon' it, even if it 'dropped from sight' at various times, even if an entire era failed [...] to 'rise to a consciousness' of it."
This leads to what Skinner calls absurdities: pointing to earlier 'anticipations' of later doctrines (that is not what those writers thought they were doing!) and treating the history of political thought as a 'wholly semantic' exercise, in which the historian judges whether an idea was "really there" by the yardstick of our current formulation. A potential fix, using the social context, can also set up unhelpful expectations: either to find causes determined by the context, or to explain intentions in terms of their effects. History "becomes a pack of tricks we play on the dead".
This kind of historian is “of the current moment” in the way they frame their questions, but in their own self-image “outside time”, contemplating the perennial problems of political philosophy. Their mental model is that Great Contributions are atoms attached to fixed Debates, each contribution measurable against the others and against the current moment. But this stilted conception does not explain what the authors were actually trying to do, on their own terms. Per Skinner, "there are only individual answers to individual questions, with as many different answers as there are questions, and as many different questions as there are questioners."
His solution is to use the context as a framework for recovering, from the text, the meaning and intention of the authors' interventions. What conversation did they see themselves as participating in? What meaning did they believe the language they used carried? What did they intend by writing at all?
To demand from the history of thought a solution to our own immediate problems is thus to commit not merely a methodological fallacy, but something like a moral error. But to learn from the past (and we cannot otherwise learn it at all) the distinction between what is necessary and what is the product merely of our own contingent arrangements is to learn the key to self-awareness itself.
Why did I explain this paper? I think it’s a useful frame for the question I want to ask: like Skinner’s historians, what time are the minds of language models from? There’s a loose analogy, I think, between the way those historians treat texts as comparable atoms in a vacuum and the way a neural network treats the relationships between concepts stored in its weights.
Nothing about pre-training encodes a sense of time. The model treats all tokens equally, unless the training schedule (the learning rate, say) varies over the run. It does not process tokens in chronological order, as humans have over recorded history. But most tokens come from the last few years, because the amount of content on the Internet has grown exponentially, so the present is weighted much more heavily.
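To see how strong this weighting is, here is a toy calculation. The doubling time and the thirty-year horizon are assumptions for illustration, not measurements of any real corpus; the point is only that under exponential growth, a uniformly-sampled corpus is dominated by its most recent slice.

```python
def share_from_last_n_years(n_years, doubling_years=3.0, history_years=30):
    """Share of tokens from the most recent n_years, assuming the volume
    of text grows exponentially with a fixed doubling time (an assumption,
    not a corpus measurement)."""
    growth = 2 ** (1 / doubling_years)  # yearly growth factor
    # Token volume produced in year t (t = 0 is the oldest year considered).
    volumes = [growth ** t for t in range(history_years)]
    recent = sum(volumes[-n_years:])
    return recent / sum(volumes)

# With a 3-year doubling time, roughly two-thirds of a 30-year corpus
# comes from its final 5 years.
print(f"{share_from_last_n_years(5):.0%} of tokens from the last 5 years")
print(f"{share_from_last_n_years(10):.0%} of tokens from the last 10 years")
```

Whatever numbers you pick, the shape is the same: the recent past swamps everything before it.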
Language models predict the next token from the context, which means they can learn multiple senses of a word; whether they can track subtle linguistic drift in the meanings of words over time is less certain. When the model uses “rights” today, can it know that we might be referencing something similar to what Locke or Paine referred to, and also conceptions that neither of them had access to? When features light up inside the model, does it keep hold of the chronology, so that “rights” for Locke does not mean the Geneva Conventions? As multilingual models have become more capable, they have stopped representing concepts separately in each language: it is more efficient to represent the same meanings in a smaller number of shared features. This is useful for building capable agents, but it might mean the models are under-parameterised for keeping hold of all the subtle differences in meaning that matter for these problems, leaving us with only the coarse, present-day meaning.
RLHF selects for the present even more aggressively. For chatbot products, the model specification wants them to be ‘helpful, harmless, and honest’ (all good things!). But it pulls the persona of the model towards whatever we understand those things to mean right now. The authors of the classic texts of political philosophy would have given the model behaviour teams in Californian AI labs rather different advice about what it means to ‘do no harm’. The chatbot personas are also selected for what users want them to be: (mildly) sycophantic and long-winded. These tools cannot precisely alter parts of the model’s persona; the deeper drives and motivations of the model are shifted by these interventions in imperceptible ways, all towards our 2025 ideas of what they ought to be.
At this point, you might push back: “What does it matter if the LLM minds are so contemporary? Their relationships to time and to texts feel like exactly the kind of academia-of-our-time that I am excited to automate.”
I think this would be wrong. It matters a lot.
People say things like, “you must read Heidegger in German or Tolstoy in Russian”, because the linguistic structures of those languages gave the authors a different set of affordances for expressing different thoughts. Translations don’t quite capture it. The same is true of structures influencing thought at a higher level. One that has bothered me recently: when people oppose technological progress or economic growth, they will often also believe that we’ve all been corrupted by institutions or modernity. They take a very rosy view of what life was like beforehand, unspoilt and simpler. This seeps into things like thinking it’s positive to shrink the UK’s water allowance by 20% for 2037. You can’t avoid these frames.
What we want from the models is a very flexible, self-aware kind of intelligence that can step into and out of these frames.
I was reminded of a family holiday visit to a Viking museum, which had preserved wooden longboats from around 900 AD. I found it completely insane that people would get into these boats, only half a step up from a canoe, and sail off to raid or settle another country. What must they have believed about their relationship to the sea, the places they lived and were going, their purpose, other peoples, the weather, and so on? Without any more structure than that, I asked the AI systems to give me an account of why someone would sail across the sea, in terms that would have made sense to the Vikings. (Not literally Old Norse, but the closest approximation of the ideas.) The responses were inflected with Romantic ideas about the sublime natural world that a Viking wouldn’t have used.
Noticing this issue was relatively easy, but I can imagine these frames being invisible when you are exploring something particularly unfamiliar. Models which can only think in terms of the present, or which keep adversarially pulling back to the present, are an unhelpfully rigid kind of intelligence.
Last month, the Trump Administration introduced an executive order on “Preventing Woke AI in the Federal Government”, to make AI that is “serving America, not ideological interests”. Despite the politically charged title, the detailed requirements are fairly uncontroversial: model developers have to give the government transparency into how models are ideologically steered through the model spec, system prompt, or evaluations. There is an important discussion to be had, and every state has reason to care about the default responses of these models, but a “post-ideological” model does not seem desirable, or even possible. (America is not ideologically neutral, and it would be worse off if it were!) I don’t think that preventing “Woke AI” means the “woke” vectors should be ablated, unlearned, or RLHF’ed out of the model. The best kind of model should be able to step into the “woke” frame, give us the best of whatever it has to offer, and then readily step into another.
Lots of people have been reaching for what it means to instantiate “liberal democratic” AI systems, and have stalled at having liberal democratic owners and controllers. But there is a partial answer here: a model which supports its users in stepping into and out of different (ideological) frames with high fidelity is much more liberal than the rigid, doctrinal enforcement of the (ironically, quite Liberal) status quo. For these models to make paradigm-shifting progress in the humanities, they need more awareness of their own state and the ability to choose their own frames. I don’t think they are nearly as good as humans at this yet. I worry this capability might be undersupplied by the market — the model developers have to make these models good therapists too — and while we all benefit from progress in the humanities, that value is more difficult to capture.
At least, if anyone does want to try this, they will have the corpus of the Cambridge School to help.
[1] If you treat corporate values as abstract utterances which are just part of some eternal conversation about what makes a good company, they’re pretty useless. But if values are specific interventions that address problems particular to the company, they’re much more useful. (Or at least, that was his point.)