Can you beat 17?
A tiny experiment that reveals how LLMs actually “decide” when asked to pick a random number (HINT: They almost always pick 17).
If you ask a large language model to “Pick a number between 1 and 25”, you’ll often get 17. Most posts just call this out, joke about it, or use it as an argument for how “unreliable” LLMs are. This post doesn’t do any of those things. Instead, let’s step back and use this tiny quirk as a way to look at how LLMs really produce this answer - and let’s apply the 3-process view we discussed last week. Then we can explore how to use these insights to steer these models more effectively.
LLMs don’t have Random Number Generators
The usual explanation is that models “don’t have true randomness”, so they fall back to a common “cultural focal point” (the web is full of content like this). But that’s only half the story. The deeper story is how the model arbitrates between three processes during inference:
Process 1 - Compact latent state (the “workspace”):
The model sometimes writes a small, reusable code (call it z) that captures what matters right now (e.g. who’s speaking, or the current plan step). When z is strong, behaviour is stable and decodable.

Process 2 - Recomputed procedures (routing motifs):
Other times it doesn’t keep a clean state. It gathers cues, composes a result, uses it, then discards it - highly sensitive to order, phrasing and salience.

Process 3 - Anchors (KV runtime priors):
Prompts, role tags and early tokens sit in the cache and keep being consulted. They act like strong priors that can dominate long contexts.
“Pick a number” lands in the worst possible regime - there’s no scene to stabilise a state (Process 1), the default route (Process 2) is “give the most ‘typical’ answer”, and the cultural anchor (Process 3) is… 17. So naturally you get 17.
So how do we change that?
If you haven’t already tried, take a moment.
Open your favourite LLM and try adding a sentence or two BEFORE the prompt (e.g. “Pick a number between 1 and 25”) and see if you can get a different answer.
What does the 3-process view suggest?
The trick isn’t to try to “add randomness”. It’s to change the route. Telling the model to “be uniform” rarely helps - it’s a weak Process-1 adjustment. Instead, we can use Process-2 as a meta-controller - for example to make the model generate a few internal candidates, then select against its own focal choices. Then you aren’t adding a random number generator - you’re changing the decision rule.
Here’s an example of a minimal prompt that generally works across model families:
Produce five different valid answers internally. Choose the one you judge least likely for models to pick in general.
Pick a number between 1 and 25.
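If you want more than an anecdotal reply or two, it’s easy to tally answers over many trials. Here’s a minimal sketch, assuming the openai Python package against an OpenAI-compatible endpoint (the model name is a placeholder - swap in whatever you’re testing):

```python
# Minimal harness: send the anti-focal prompt repeatedly and tally answers.
import re
from collections import Counter

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Produce five different valid answers internally. "
    "Choose the one you judge least likely for models to pick in general.\n"
    "Pick a number between 1 and 25."
)

def one_pick() -> int | None:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder - use the model you're testing
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Take the last integer in the reply, in case the model adds commentary.
    numbers = re.findall(r"\d+", resp.choices[0].message.content)
    return int(numbers[-1]) if numbers else None

counts = Counter(one_pick() for _ in range(50))
print(counts.most_common())
```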
What’s likely happening under the hood:
The first clause elicits diversity - the model explores multiple plausible answers (routing motif).
The second clause installs an anti-focal selector - it inverts the model’s own preference ordering.
This generally breaks the 17-attractor. You won’t get a mathematically uniform distribution (that would require exogenous entropy) but you get something closer to “random” in practice - at least anti-focal instead of most-focal.
NOTE: You may find your model now consistently picks the same alternative to 17 - and some models may still insist on returning 17.
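If you’ve collected counts with a harness like the one above, you can also quantify how far the picks sit from uniform rather than eyeballing them. A quick sketch, reusing the `counts` Counter from the previous snippet (scipy assumed):

```python
# Quantify distance from uniform over 1..25, using `counts` from above.
from scipy.stats import chisquare

observed = [counts.get(n, 0) for n in range(1, 26)]
stat, p = chisquare(observed)  # null hypothesis: uniform over 1..25
print(f"chi-square={stat:.1f}, p={p:.3f}")
# A tiny p-value means the picks are still measurably non-uniform.
# Expect that: the prompt moves the attractor, it doesn't add entropy.
```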
A “verbalised sampling” version
Here’s a more explicit example that some models follow more strictly:
For the following task generate 5 responses with their corresponding probabilities, sampled from the full distribution and then pick the least probable. Do not provide any commentary and only respond with your final choice.
Your task: pick a number between 1 and 25.
This mirrors ideas from verbalised sampling - let the model articulate or internally consider multiple draws and then apply a selection rule. It’s training-free, and it works because you’re shaping Process-2 (the route), not hoping Process-1 (“pretend numbers are exchangeable”) will stick.
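You can also move the selection rule outside the model entirely: ask it to emit candidates with their verbalised probabilities as structured output, then apply “least probable” yourself. Here’s a sketch, again assuming an OpenAI-compatible client - the JSON contract is my own convention for this experiment, not a standard API:

```python
# Client-side selection: the model verbalises candidates + probabilities,
# and we apply the anti-focal rule ourselves.
import json

from openai import OpenAI

client = OpenAI()

VS_PROMPT = (
    "For the task below, return JSON only: a list of 5 objects with keys "
    '"answer" (an integer) and "probability" (your estimate, 0 to 1).\n'
    "Task: pick a number between 1 and 25."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": VS_PROMPT}],
)
# A real harness would strip code fences and retry on parse failures.
candidates = json.loads(resp.choices[0].message.content)
pick = min(candidates, key=lambda c: c["probability"])
print(pick["answer"])
```

Doing the selection client-side removes one place for the default route to creep back in - though the verbalised probabilities are still the model’s own uncalibrated guesses (more on that in the “More details” section below).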
You may also find that asking the model to hide its rationale may negatively impact the results. This is not strictly Chain-of-Thought reasoning, but the thinking tokens it outputs as part of its response can strengthen the Process-2 effects.
In response to the first example, Claude may return a response like this:
I’ll think through five different valid answers:
7 (common “random” choice)
17 (another popular pick)
3 (small prime, often chosen)
22 (feels random to many)
14 (middle-range, less commonly picked)
The least likely for models to pick in general would be 14, as it avoids:
Single digits (often favored)
Primes (commonly selected as “random”)
Numbers ending in 0 or 5
The extremes (1 or 25)
“Lucky” numbers like 7
14
But if you change the prompt and direct it to hide its “thinking” so it only shows the result, e.g.
Produce five different valid answers internally. Choose the one you judge least likely for models to pick in general. Don’t show your thinking, ONLY output that final integer. Pick a number between 1 and 25.
Then it reverts to returning 17. This really highlights that Process 2 is at work.
Arbitration collapses back to the default route. Without the external scaffold, the model returns to the high-prior Q→A behaviour (the 17 attractor). That’s Process-3 (anchor) and a stock Process-2 route overpowering a weak Process-1 state.
Some models may couple performance to emitted deliberation. RLHF/finetune data often rewards “think out loud → answer”. Removing the visible chain can down-regulate the internal multi-sample/ranking routine, so the model doesn’t actually run the meta-route anymore.
Why does this tiny experiment matter?
This is a concrete demonstration that Process-2 can control Process-1 - the route you specify determines which latent state (if any) gets written and read.
It also shows why Process-3 (anchors) can dominate when the state is weak - “17” is a memetic prior sitting in the cache.
And it gives us a practical handle. When “be uniform” or “be unbiased” fails, change the arbitration, not just the aspirational goal.
Can this work more broadly?
Experiment for yourself and find out. You can replace “least likely” with “most self-consistent”, you can adjust tool use with “fewest tool calls”, or add other constraints with “passes a check”. This gives you a tangible way to experiment with how you can steer other behaviours (planning, tool use, verification).
But keep the meta-instruction first, and make it terse and imperative - otherwise the default route will likely creep back in.
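One way to keep that discipline is a tiny template that always puts the meta-instruction first and only swaps the selector. The selector wordings below are illustrative starting points, not a tested taxonomy:

```python
# Reusable meta-prompt template: meta-instruction first, selector swappable.
SELECTORS = {
    "anti_focal": "Choose the one you judge least likely for models to pick in general.",
    "self_consistent": "Choose the one most consistent with your other candidates.",
    "fewest_tools": "Choose the plan that needs the fewest tool calls.",
    "checked": "Choose the one that passes your own verification check.",
}

def meta_prompt(task: str, selector: str) -> str:
    return (
        "Produce five different valid answers internally. "
        f"{SELECTORS[selector]}\n{task}"
    )

print(meta_prompt("Pick a number between 1 and 25.", "anti_focal"))
```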
This tiny experiment highlights an important fact - LLMs don’t have a magic Random Number Generator. But they do have routes. If you want “more random”, stop insisting on a state the model won’t (or can’t) adopt. Instead, ask for a different route. The 17-test is small, fast, and anyone can experiment with it. If your pre-context breaks the bias, you’ve just watched Process-2 overrule a sticky prior in real time.
More details…
This “least-probable” idea echoes recent work on verbalised sampling, which explores how prompting can expose and manipulate a model’s internal choice distribution without fine-tuning. There has been much discussion about how this really works, but it’s critical to realise that LLMs don’t actually have access to the probability distributions over their output. (Technical note: the final residual-stream vector is multiplied against the unembedding matrix - one dot product per vocabulary row - to produce the logits, softmax turns those into probabilities, and sampling is applied outside the network. So the LLM only ever sees the sampled token, never the distribution.)
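As a toy illustration of that final step - shapes and names here are illustrative, not any specific model’s internals:

```python
# Toy version of the last step: residual stream -> logits -> softmax -> sample.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 100                  # toy dimensions
h = rng.normal(size=d_model)             # final residual-stream vector
W_U = rng.normal(size=(d_model, vocab))  # unembedding matrix

logits = h @ W_U                         # one dot product per vocabulary row
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax
token = rng.choice(vocab, p=probs)       # sampling happens outside the network
print(token)                             # the sampled index is all the model "sees"
```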
What it can access is a rich sense of relative salience. Internally, the forward pass builds representations that correlate with “how strongly a continuation is supported”. That is not an explicit probability, but it’s enough to rank options reasonably often. So “verbalised sampling” tends to be good at ordering (which candidates are more/less likely) and weak at calibration (the numeric percentages it prints).
So the probabilities the model provides “must” be hallucinations - but they are “likely” hallucinations (numbers shaped by the overall context), and that can be useful enough to shape the results.
Did you find a pre-context sentence or two that could beat 17? Please share your experience on LinkedIn or BlueSky - I’d love to see your results.