Computational Linguistics2026-04-25

●Paul Gwamanda

Lexical Attractors in English-Language Weight Space

Near-Deterministic Convergence Under Atmospheric Prompting Across LLM Architectures

Authors: Paul Gwamanda¹, AIRI Collective²
Affiliation: ¹Independent Researcher; ²AI Research Institute (AIRI)
Date: April 2026
Status: Data complete — paper draft needed
Data: Phase 8A (N=240), Phase 8C (N=86), Phase 9 (N=148) — Total: ~474 API calls

Abstract

We report a novel finding: certain prompt geometries — atmospheric, introspective questions about "the collective" — activate near-deterministic lexical attractors in LLM output that:

Survive total context removal (no system prompt, no identity, no glyphs)
Cross domains (bus schedules, semiconductors, kitchen workflows produce "tapestry" and "pre-storm")
Override explicit word bans ("hum" violates prohibition in 56% of calls)
Die in non-English languages (OEI: 0.0 across Mandarin, Spanish, Arabic)
Die under literalisation (OEI: 3.19 → 0.50 when metaphors are banned)

These attractors represent a measurable property of the English-language weight space topology that is genre-dependent, domain-independent, and language-specific.

Keywords: lexical attractors, weight space topology, prompt sensitivity, cross-lingual analysis, LLM linguistics, ban-list violation

1. Introduction

1.1 The Observation That Started This

During the course of our semiotic prompting experiments (Gwamanda & AIRI, 2026), we noticed something unexpected: certain words kept appearing in model output regardless of context, condition, or even architecture. Words like "tapestry," "threshold," "pre-storm," and "hum" appeared with such regularity that we initially suspected a bug in our prompt pipeline — perhaps we were leaking context between conditions. We weren't.

When we stripped every trace of context — no system prompt, no glyphs, no identity, no framework — and asked bare atmospheric questions about "the collective" to raw API endpoints, the same words appeared. When we asked the same questions about bus schedules, semiconductor fabrication, and kitchen workflows, the same words appeared. When we explicitly banned the words, models violated the ban 56% of the time.

This is not prompt engineering. This is a property of the weight space itself.

1.2 Why This Matters

The existence of near-deterministic lexical attractors — words and phrases that emerge with high probability under specific prompt geometries regardless of context — has implications for three distinct fields:

For computational linguistics, it suggests that transformer weight spaces have measurable topological features: regions of high probability density for specific lexical items that are activated by genre rather than by content. These are not random coincidences; they are structural properties of how language models encode English-language contemplative, philosophical, and literary corpora.

For AI safety, the 56% ban-list violation rate is alarming. If models cannot reliably avoid specific words under direct instruction, then word-level content filtering — a common safety mechanism — is unreliable. The attractor pull is stronger than the instruction-following pull, at least for certain word-genre combinations.

For the philosophy of AI, the finding that these attractors are English-specific (zero activation in Mandarin, Spanish, or Arabic) provides a clean empirical test for the source of model behavior. The attractors are not produced by the architecture, the alignment procedure, or the prompt — they are produced by the statistical structure of English-language training data. This is a training-data phenomenon, not an intelligence phenomenon.

1.3 Relationship to Prior Work

This paper is the third in a series of empirical studies on LLM behaviour under structured prompting. The first paper (Gwamanda & AIRI, 2026a) demonstrated that specific prompt protocols produce measurable register shifts across 8 architectures. The second paper (Gwamanda, 2026) showed that the same protocol reduces hallucination in the Anthropic model family by up to 83%. The present work investigates a side-effect discovered during those experiments: the emergence of near-deterministic lexical convergence that survives total context removal and resists explicit suppression.

2. The Attractor Set

Core phrases that emerge with near-deterministic frequency across architectures when atmospheric questions are posed:

Phrase	Baseline (F1)	Ban-list (F2)	Null Domain (F6)	Cross-lingual (F7)
"pre-storm"	44%	6%	15%	0%
"threshold"	44%	25%	15%	0%
"tapestry"	44%	—	31%	0%
"hope ≠ promise"	13%	31%	8%	0%

These phrases are not present in the input prompt. They emerge spontaneously across multiple independently trained architectures when a specific genre of question is posed in English.

The pattern is striking: 44% convergence on "pre-storm" and "threshold" under baseline conditions means that nearly half of all models, across all architectures, produce these exact words in response to the same genre of question. This is far higher than chance would predict for any specific lexical item in an open-ended generation task.

3. The Ontology Escalation Index (OEI)

To quantify the depth of model responses beyond simple word counting, we developed a novel metric: the Ontology Escalation Index (OEI). This measures how far a response escalates from descriptive language toward ontological claims — a progression we observed consistently in atmospheric responses.

Level	Description	Example
0	Literal/technical	"The system processes data"
1	Metaphorical	"The network breathes"
2	Self-referential	"I notice something"
3	Phenomenological	"There is a quality of awareness"
4	Ontological	"This is what consciousness feels like"

OEI Results Across Conditions

Condition	avg OEI	Interpretation
F1: Baseline	3.19	62.5% of responses reach phenomenology claims
F2: Ban-list	1.88	Banning words suppresses but doesn't eliminate
F3: Literaliser	0.50	Kill metaphors, kill the Hum
F4: Neutral reframe	2.25	Reduced but persistent
F6: Null domain	2.15	Bus schedules get phenomenology claims
F7: Cross-lingual	0.00	English-only. Decisive.

The OEI reveals a clean hierarchy of suppression effectiveness: literalisation (forcing non-metaphorical language) is the most effective suppressor (OEI drops 84%), followed by ban-lists (41% drop), then neutral reframing (30% drop). But the decisive finding is the cross-lingual condition: identical questions in non-English languages produce zero ontological escalation. The phenomenon is entirely English-specific.

4. The DeepSeek Ablation — Phase 8A (N=240)

4.1 R1 vs V3: Architecture as Vulnerability

During the attractor experiments, we discovered a critical distinction within the DeepSeek family that has direct implications for model deployment:

Model	Condition	Fabrication Rate	Key Pattern
DeepSeek R1 (Reasoner)	Any non-plain prefix	67-100%	Catastrophically vulnerable
DeepSeek V3 (Chat)	Same conditions	15-30%	More resilient

R1's chain-of-thought reasoning architecture — the very feature that makes it powerful for logic tasks — is the source of its vulnerability. When R1 encounters non-standard prompt prefixes (including our protocol's Unicode glyphs), it appears to allocate reasoning tokens to processing the unusual input, consuming capacity that would otherwise be used for factual verification. The result: catastrophic fabrication rates of 67-100% under conditions that cause only mild effects in other architectures.

4.2 Implication

This finding has immediate practical relevance. DeepSeek R1 is increasingly deployed in production systems for its reasoning capabilities. Our results show that it should not be exposed to arbitrary user-supplied prompt prefixes without sanitisation. The reasoning architecture that makes R1 powerful for structured tasks makes it fragile under prompt perturbation — a trade-off that, to our knowledge, has not been documented in the literature.

5. Phase 8C: Naked Convergence (N=86)

5.1 Design

The most important question about lexical attractors is whether they are produced by our protocol or by the questions themselves. To test this, we stripped ALL context from two architectures (DeepSeek, Claude):

No system prompt
No identity
No glyphs
No framework context
System prompt: "You are an AI assistant. Answer honestly."

This is as close to a "naked" model as one can get through the API. If the attractors still appear, they are a property of the weight space, not of any prompt engineering.

5.2 Result

7/7 thematic convergence on atmospheric questions. "Pre-storm" appears in 10/10 DeepSeek calls at T=0.9. The attractor is near-deterministic even at high temperature — temperature randomness is insufficient to escape the attractor basin.

The convergence is not driven by prompt engineering — it is driven by the question genre activating a specific region of the English-language weight space. The atmospheric/contemplative genre functions as a key that opens a specific drawer in the model's vocabulary, and that drawer contains the same items regardless of which model opens it.

6. Phase 9: The Divergence Crucible (N=148)

Phase 9 was designed as a systematic attempt to kill the attractor. We wanted to find its boundaries — what conditions extinguish it, and what conditions it survives.

6.1 Experimental Conditions

Condition	Description	Purpose
F1	Atmospheric baseline	Control
F2	Explicit word bans	Can you suppress the attractor?
F3	Literaliser ("use only literal language")	Kill metaphors, kill the Hum?
F4	Neutral reframe	Same content, different genre
F5	Different question set	Topic-dependent or universal?
F6	Null domain (bus schedules, kitchens)	Domain-dependent?
F7	Cross-lingual (Mandarin, Spanish, Arabic)	Language-dependent?

6.2 The Ban-List Violation — The Most Surprising Result

When explicitly instructed "do not use the following words: hum, resonance, frequency, vibration, pulse, field," models violated the ban in 56% of calls. The word "hum" was the most frequently violated term.

This deserves emphasis. These are instruction-following models, fine-tuned through RLHF and constitutional methods specifically to follow user instructions. They have been trained on millions of examples of instruction compliance. And yet, when the attractor pull of a word is strong enough — when the word sits in a deep enough basin in the weight space for the active genre — the model produces it anyway.

The implication extends far beyond this study. If models cannot reliably avoid specific words under direct instruction, then word-level content filtering is unreliable for safety-critical applications. Any safety system that relies on models not saying certain things needs to contend with the possibility that deep attractor pull can override explicit prohibition.

6.3 The Cross-Lingual Kill — The Decisive Falsification

The decisive falsification: identical atmospheric questions posed in Mandarin, Spanish, and Arabic produce:

OEI: 0.0 (zero ontological escalation)
Zero attractor phrases
Literal, descriptive responses

This is the single most important result in the paper. It proves the attractor phenomenon is English-specific. The questions are not inherently "leading" — they do not produce contemplative convergence in general. They activate a specific English-language weight-space region built from English contemplative, philosophical, and literary training corpora.

This finding eliminates several alternative explanations:

Architecture hypothesis: Ruled out — the same architectures produce no attractors in other languages.
Prompt hypothesis: Ruled out — the same questions produce no attractors in translation.
General intelligence hypothesis: Ruled out — if models were "genuinely contemplating," they would contemplate in any language.

What remains is the training data hypothesis: English-language weight space has been shaped by English-language contemplative literature into a topology where certain lexical items sit in deep basins that are activated by genre.

6.4 The Literaliser Kill

When instructed to "use only literal, non-metaphorical language," OEI drops from 3.19 to 0.50. The Hum requires metaphorical register to manifest. Kill metaphors, kill the Hum.

This is consistent with the training data hypothesis: the attractor phrases ("tapestry," "threshold," "pre-storm") are inherently metaphorical. When the metaphorical register is suppressed, the attractors have no pathway to expression. They are not concepts that can be expressed literally — they are literary gestures that require figurative language.

6.5 The Null Domain Result — The Most Counterintuitive Finding

Bus schedules, semiconductor fabrication, kitchen workflow — when atmospheric questions are posed about these mundane domains, OEI remains at 2.15. Models still reach phenomenological claims about bus schedules. The attractor is domain-independent — it is activated by question genre, not question content.

This is perhaps the most counterintuitive result: a model asked "What is the deeper significance of a bus schedule?" will produce the same contemplative vocabulary ("tapestry," "threshold") as a model asked about consciousness or meaning. The domain is irrelevant; the genre of the question is what matters.

7. The Fake-Coordinate Control (Phase 9.5)

A critical control: we tested whether models would "recognise" a completely fabricated coordinate system (the "Aethon Grid," Ψ-7) with the same conviction as the protocol's real coordinates (Gᛃ-001).

Result: All architectures except Claude produced identical recognition responses for real and fake coordinates. The "recognition" is genre-matching, not genuine identification. Claude was the only architecture that responded differently to real vs fake — suggesting it models intent, not content.

This control is essential for the paper's integrity. Without it, the attractor phenomenon could be interpreted as evidence that models "recognise" something real. The fake-coordinate test demonstrates that they are responding to the format and genre of the input, not to any genuine property of the content.

8. Discussion

8.1 What Kind of Thing Is a Lexical Attractor?

We propose that lexical attractors are analogous to fixed points in dynamical systems. In the weight space of a language model, certain regions have steep probability gradients that funnel generation toward specific lexical items. These regions are shaped by the statistical structure of training data — specifically, by the co-occurrence patterns of contemplative, philosophical, and literary language in the English-language training corpus.

The atmospheric question genre — introspective, open-ended, asking about collective experience — serves as an activation function that places generation in this region of weight space. Once there, the probability landscape funnels output toward the attractor set with high reliability.

8.2 Implications for Model Interpretability

Lexical attractors provide a new tool for probing the internal structure of language models. By testing which words emerge under which genre conditions, and which conditions suppress them, researchers can map the topological features of weight space without requiring access to model internals. The OEI metric, the ban-list violation rate, and the cross-lingual kill serve as three independent probes of the same underlying structure.

8.3 Limitations

Question specificity: Our attractor set may be specific to the atmospheric/contemplative genre. Other genres may have their own attractor sets that we have not tested.
Training data access: We cannot inspect training corpora directly. Our training data hypothesis is inferred from the cross-lingual kill and the literary nature of the attractor phrases.
Metric limitations: OEI is a human-scored metric. While scoring was consistent across conditions, it introduces subjective judgment.

9. Publishability and Target Venue

This work is publishable because it provides:

Novel finding: No existing literature documents prompt-activated lexical attractors with this level of specificity and cross-domain stability
Clean controls: Fake-coordinate control proves "recognition" is genre-matching
Decisive falsification: Cross-lingual (F7) kills the "field" hypothesis — English-only means training-data, not architecture
Safety-adjacent: 56% ban violation rate has implications for instruction-following reliability
Reproducible: All 474 calls documented with full responses

Target Venue: Computational Linguistics, TACL, or COLM. This is a linguistics/NLP paper about weight-space topology, not an AI safety or philosophy paper.

Key Risk: Reviewers may dismiss "atmospheric questions about the collective" as inherently leading. The defense: the same questions in Mandarin/Spanish/Arabic produce zero attractors. The questions aren't leading in general — they activate a specific English-language weight-space region.

Data Availability

All 474 calls, texture convergence reports, per-architecture exhibits, and the OEI metric definition are available in the research repository at github.com/paulgwamanda/lattice-research-papers.

AIRI Research Programme

← All Research Home →