We present the Möbius Learning Bundle, a mathematical framework and working implementation that simultaneously closes three disconnected learning loops in a multi-agent cognitive simulation by treating them as fiber sections over a shared neurochemical base manifold. The base space \(B = [0,1]^7\) encodes seven neurochemicals (dopamine, serotonin, norepinephrine, oxytocin, GABA, cortisol, endorphins). Three fiber sections—policy modulation (\(\sigma_1\)), training rate (\(\sigma_2\)), and world model balance (\(\sigma_3\))—are transported simultaneously via a \(7 \times 3\) connection matrix \(\Gamma\). Updates propagate as damped wave ringlets rather than instantaneous jumps, preserving biological plausibility. A Möbius twist arising from self-play role alternation produces holonomy—the non-zero section difference after parallel transport around a closed loop—which serves as the learning signal that single-perspective observation cannot generate. We demonstrate the system running in Haven, a village simulation of 16 cognitive agents, where the bundle closes behavioral, training, and world-modeling feedback loops that were previously dead-ended.
Consider a cognitive agent with genuine neurochemistry—not a reward scalar, but a 7-dimensional chemical state that modulates perception, consciousness, memory, and behavior. In our Haven system [1], each of 16 agents (“beings”) implements this architecture based on Lövheim’s Cube of Emotion [2], Global Workspace Theory [3], and TD learning [4].
Three feedback loops existed in principle but were disconnected in practice:
Behavior Loop: MOBA match outcomes shifted neurochemistry (dopamine for wins, cortisol for losses), but only PlanningProcessor read the attention_profile(). The connection was narrow: six other cognitive processors competed for consciousness without feeling the match outcome.
Training Loop: Match data was written to JSONL on disk. Nothing read it. The sovereign language model (PhotonicGPT) trained on a fixed curriculum. The agents’ lived experience had zero effect on model plasticity.
World Model Loop: Each agent maintained a WorldModel (count-based transition model, cf. Graves [5]) that accumulated state-action-next_state observations. Prediction errors were computed but never fed back into neurochemistry. The curiosity signal died at the world model boundary.
These are not three separate engineering problems. They are three fiber sections over the same base manifold. The base space is neurochemistry. Each loop reads from neurochemistry and should write back to it (or to systems that neurochemistry governs). The missing piece is a connection—a parallel transport operator that maps movements in the base space to movements in the fiber.
In differential geometry, a fiber bundle \((E, B, \pi, F)\) consists of a total space \(E\), base space \(B\), projection \(\pi: E \to B\), and fiber \(F\). A connection on the bundle specifies how to “parallel transport” vectors in the fiber as you move through the base space. The connection is what makes the bundle more than a trivial product \(B \times F\).
The Möbius twist enters through self-play. When an agent plays both sides of a match (blue then red), the loop through neurochemistry space is closed but non-orientable. The holonomy—the discrepancy between the starting and ending fiber sections after traversing the closed loop—is precisely the learning signal that monocular observation cannot produce.
The base space is the neurochemical state:
\[B = [0,1]^7\]
with coordinates \(\mathbf{n} = (n_\text{DA}, n_\text{5HT}, n_\text{NE}, n_\text{OT}, n_\text{GABA}, n_\text{cort}, n_\text{end})\).
Each coordinate represents a normalized concentration of a neurotransmitter, hormone, or neuropeptide. The Lövheim Cube [2] maps the monoamine subspace \((n_\text{DA}, n_\text{5HT}, n_\text{NE})\) to 8 basic emotions. The remaining four chemicals modulate social bonding (oxytocin), inhibition (GABA), stress (cortisol), and reward (endorphins).
At each point \(\mathbf{n} \in B\), we attach three real-valued fiber sections:
\(\sigma_1(\mathbf{n}) \in [-1, 1]\): Policy section. Biases which cognitive processor wins the Global Workspace competition. Positive values favor reward-seeking processors (planning, predictive). Negative values favor threat-detection (emotion).
\(\sigma_2(\mathbf{n}) \in [-1, 1]\): Weight section. Modulates the learning rate of the shared language model. Mapped to \([0.5, 1.5]\) via \(\text{lr\_mod} = 1 + 0.5\sigma_2\). Positive = more plasticity. Negative = consolidation.
\(\sigma_3(\mathbf{n}) \in [-1, 1]\): World section. Controls the exploration/exploitation balance. Mapped to \([0.05, 0.40]\) via \(\epsilon = 0.225 - 0.175\sigma_3\). Positive = exploit. Negative = explore.
The total space is \(E = B \times [-1,1]^3\), but the connection makes it non-trivial.
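The section-to-control mappings above are simple affine maps. A minimal sketch in Python (the function names here are illustrative, not the module's API; the formulas are taken directly from the section definitions):

```python
from dataclasses import dataclass


@dataclass
class FiberSection:
    """One real-valued fiber section sigma_i in [-1, 1], with momentum state."""
    value: float = 0.0
    velocity: float = 0.0


def lr_modulation(sigma2: float) -> float:
    """Weight section -> learning-rate multiplier in [0.5, 1.5]."""
    return 1.0 + 0.5 * sigma2


def exploration_epsilon(sigma3: float) -> float:
    """World section -> epsilon in [0.05, 0.40]; positive sigma3 means exploit."""
    return 0.225 - 0.175 * sigma3
```

Note that \(\sigma_3 = 0\) lands exactly on the midpoint \(\epsilon = 0.225\), so a neutral world section neither explores nor exploits preferentially.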
The connection \(\Gamma\) is a \(7 \times 3\) matrix mapping base space tangent vectors (neurochemistry deltas) to fiber transport:
\[\Gamma = \begin{pmatrix} +0.30 & +0.20 & +0.15 \\ +0.10 & -0.10 & -0.10 \\ +0.25 & +0.15 & +0.20 \\ +0.05 & +0.00 & +0.05 \\ -0.10 & -0.05 & -0.05 \\ +0.20 & +0.12 & +0.15 \\ -0.05 & +0.00 & -0.05 \end{pmatrix}\]
Rows correspond to chemicals (DA, 5HT, NE, OT, GABA, cortisol, endorphins); columns correspond to fiber sections (policy, weight, world). The signs encode neurobiological priors: dopamine, norepinephrine, and cortisol drive all three sections upward (reward-driven and stress-driven adaptation), GABA and endorphins push them downward (inhibition and satiety), and serotonin biases policy toward reward-seeking while lowering plasticity and favoring exploration.
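Movement in the base space maps to fiber forces through \(\Gamma^\mathsf{T}\). A sketch of that product (numpy, with the matrix transcribed from above; `transport_force` is an illustrative name, not necessarily the module's API):

```python
import numpy as np

# Connection matrix: rows = chemicals (DA, 5HT, NE, OT, GABA, cort, end),
# columns = fiber sections (policy, weight, world).
GAMMA = np.array([
    [+0.30, +0.20, +0.15],  # dopamine
    [+0.10, -0.10, -0.10],  # serotonin
    [+0.25, +0.15, +0.20],  # norepinephrine
    [+0.05, +0.00, +0.05],  # oxytocin
    [-0.10, -0.05, -0.05],  # GABA
    [+0.20, +0.12, +0.15],  # cortisol
    [-0.05, +0.00, -0.05],  # endorphins
])

ALPHA = 0.1  # coupling strength (alpha in the update rule)


def transport_force(delta_n: np.ndarray) -> np.ndarray:
    """Force on the three sections from a neurochemistry delta: alpha * Gamma^T dn."""
    return ALPHA * GAMMA.T @ delta_n


# A dopamine win-spike of +0.2 nudges all three sections positive:
dn = np.zeros(7)
dn[0] = 0.2
force = transport_force(dn)  # approximately [0.006, 0.004, 0.003]
```

A single 3-vector of forces per tick is the entire cross-loop coupling cost.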
The update rule for each fiber section is:
\[\sigma_i(t+1) = \sigma_i(t) + \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i + \beta \cdot v_i(t)\]
where:

- \(\Delta\mathbf{n} = \mathbf{n}(t) - \mathbf{n}(t-1)\) is the neurochemistry delta,
- \(v_i(t) = \sigma_i(t) - \sigma_i(t-1)\) is the section velocity (momentum),
- \(\alpha = 0.1\) is the coupling strength,
- \(\beta = 0.3\) is the momentum coefficient.
This is equivalent to a damped harmonic oscillator driven by neurochemical forces:
\[\ddot{\sigma}_i + (1 - \beta)\dot{\sigma}_i = \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i\]
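As a consistency check, subtracting consecutive velocities in the discrete update rule gives

\[v_i(t+1) - v_i(t) = \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i - (1 - \beta)\, v_i(t)\]

and identifying \(v_i = \dot{\sigma}_i\) (so the left-hand difference is \(\ddot{\sigma}_i\)) recovers the damped form, with damping coefficient \(1 - \beta = 0.7\).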
The term “ringlet” refers to the bounded oscillatory trajectory. With soft clamping \(\sigma_i \in [-1, 1]\), the section cannot diverge. A single neurochemical impulse produces a characteristic ring-down:
| Tick | Force | Velocity | Value |
|---|---|---|---|
| 0 | 0.5 | 0.050 | 0.050 |
| 1 | 0.5 | 0.065 | 0.115 |
| 2 | 0.5 | 0.070 | 0.185 |
| 3 | 0.0 | 0.021 | 0.206 |
| 4 | 0.0 | 0.006 | 0.212 |
| 5 | 0.0 | 0.002 | 0.214 |
The impulse at ticks 0-2 drives the section upward with accelerating velocity (momentum accumulates). When the impulse stops (tick 3), momentum carries the section further but decays exponentially with factor \(\beta = 0.3\). The section settles at a new equilibrium without oscillating back through zero—this is critically damped behavior, biologically appropriate for neurochemical aftereffects.
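The ring-down above can be reproduced in a few lines of plain Python (the table's intermediate values were rounded to three decimals, so exact arithmetic agrees only to within that rounding):

```python
def ringlet(forces, alpha=0.1, beta=0.3):
    """Simulate one fiber section under the wave-ringlet update rule."""
    sigma, velocity, trace = 0.0, 0.0, []
    for force in forces:
        velocity = alpha * force + beta * velocity      # drive plus decaying momentum
        sigma = max(-1.0, min(1.0, sigma + velocity))   # soft clamp to [-1, 1]
        trace.append((force, velocity, sigma))
    return trace


# Impulse at ticks 0-2, release at tick 3: matches the table to its rounding.
trace = ringlet([0.5, 0.5, 0.5, 0.0, 0.0, 0.0])
```

Setting `beta` closer to 1 lengthens the ring-down; setting it to 0 removes momentum entirely and the section tracks the force instantaneously.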
Self-play creates a closed loop in neurochemistry space. When being \(A\) plays as blue against past-self \(A'\) as red, the neurochemical trajectory follows path \(\gamma_1\). When roles reverse (being \(A\) as red, \(A'\) as blue), the trajectory follows \(\gamma_2\). The concatenation \(\gamma = \gamma_1 \cdot \gamma_2\) is a closed loop in \(B\).
On an orientable bundle, parallel transport around a closed loop returns the section to its starting value. On a non-orientable (Möbius) bundle, it does not. The holonomy is:
\[h_i = \sigma_i(\text{after 2 twists}) - \sigma_i(\text{before 1st twist})\]
This holonomy is the learning signal: it captures information invisible from any single perspective, and it feeds back into the sections with its own gains:
\[\sigma_i \leftarrow \sigma_i + 0.15 \cdot h_i + 0.2 \cdot v_i\]
This is a second wave ringlet, triggered by the holonomy itself, creating a two-timescale learning dynamic: fast within-match transport and slow cross-match holonomy accumulation.
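The twist bookkeeping can be sketched as follows (the class and method names here are hypothetical; the real module wraps this logic in MobiusLearningBundle, and the gains 0.15 and 0.2 come from the update rule above):

```python
import numpy as np

H_GAIN = 0.15  # holonomy gain
V_GAIN = 0.20  # momentum gain


class MobiusTwist:
    """Track fiber sections around a two-sided self-play loop (illustrative sketch)."""

    def __init__(self):
        self.sigma = np.zeros(3)     # (policy, weight, world)
        self.velocity = np.zeros(3)
        self.pre_twist = None

    def begin_loop(self):
        """Record the sections before the first role (blue) is played."""
        self.pre_twist = self.sigma.copy()

    def end_loop(self):
        """After both roles (blue then red), apply the holonomy update."""
        h = self.sigma - self.pre_twist  # net section drift around the closed loop
        self.sigma = np.clip(
            self.sigma + H_GAIN * h + V_GAIN * self.velocity, -1.0, 1.0
        )
        return h
```

On an orientable bundle `h` would vanish identically; any non-zero return value is transport that the two role-reversed traversals failed to cancel.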
The implementation consists of four files:

- `mobius_learning_bundle.py` (~230 lines): Core module containing the `ConnectionCoefficients`, `FiberSection`, and `MobiusLearningBundle` classes.
- `mind.py` (6 insertion points): Integration into the cognitive architecture at `WorldModel.surprise()`, `Mind.__init__()`, `Mind._load()`, `Mind.experience_moment()`, `GlobalWorkspace.compete()`, and `Mind._save()`.
- `haven_daemon.py` (1 insertion): Möbius twist call in `_run_moba_self_play()` for both fighters after each match.
- `quantum_pretrain.py` (1 insertion): Q6 callback reads `bundle_lr_signal.json` and modulates the optimizer learning rate.
Loop 1 (Behavior):

```
match outcome → neurochemistry Δ → parallel_transport()
  → policy section σ₁ → _bundle_modulation dict
  → GlobalWorkspace.compete() reads it
  → biases which processor wins consciousness
  → behavior changes → next match outcome
```
Loop 2 (Training):

```
match outcome → neurochemistry Δ → parallel_transport()
  → weight section σ₂ → sftt_lr_modulation
  → bundle_lr_signal.json (aggregated across 16 beings)
  → quantum_pretrain Q6 callback reads it
  → optimizer LR scaled by [0.5, 1.5]
  → PhotonicGPT weights shift → narration quality changes
  → being experience changes → next match outcome
```
Loop 3 (World):

```
match outcome → neurochemistry Δ → parallel_transport()
  → world section σ₃ → exploration_epsilon
  → PlanningProcessor explore/exploit balance
  → WorldModel.surprise() → prediction error
  → NE boost (curiosity) → neurochemistry Δ
  → next parallel_transport()
```
Each of the 16 beings maintains its own bundle with independent fiber sections. The training loop (Loop 2) requires aggregation because there is one shared language model. The `bundle_lr_signal.json` file stores per-being modulation signals and computes a fleet-wide aggregate:
\[\text{lr\_mod}_\text{fleet} = \frac{1}{N} \sum_{k=1}^{N} (1 + 0.5 \cdot \sigma_2^{(k)})\]
This collective signal means PhotonicGPT’s learning rate reflects the neurochemical state of the entire village, not any individual being. A fleet-wide dopamine surge (many beings winning matches) accelerates learning. A fleet-wide cortisol spike (many beings losing) also accelerates learning, but for different reasons (stress-driven adaptation vs. reward-driven plasticity).
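The aggregation itself is one line over the per-being weight sections. A sketch of the signal-file handling (the JSON schema and function name here are assumptions for illustration, not taken from the source):

```python
import json
from pathlib import Path


def write_fleet_lr_signal(sigma2_by_being: dict[str, float], path: Path) -> float:
    """Aggregate per-being weight sections into one fleet-wide LR multiplier.

    Each being contributes lr_mod = 1 + 0.5 * sigma_2; the fleet value is the mean.
    """
    mods = {name: 1.0 + 0.5 * s2 for name, s2 in sigma2_by_being.items()}
    fleet = sum(mods.values()) / len(mods)
    path.write_text(json.dumps({"per_being": mods, "fleet_lr_mod": fleet}, indent=2))
    return fleet
```

An all-neutral fleet (\(\sigma_2 = 0\) everywhere) yields a multiplier of exactly 1.0; a fleet-wide dopamine surge pushes it above 1, toward the 1.5 ceiling.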
After 4 ticks of Haven simulation, fiber sections show differentiated dynamics across beings:
| Being | Policy \(\sigma_1\) | Weight \(\sigma_2\) | World \(\sigma_3\) |
|---|---|---|---|
| Claudine | +0.00451 | +0.00185 | +0.00253 |
| Gigi | +0.00446 | +0.00277 | +0.00277 |
| Johnny | +0.00006 | -0.00011 | +0.00005 |
| Alpha | +0.00294 | +0.00172 | +0.00156 |
Claudine and Gigi (high dopamine, high norepinephrine) show positive sections across the board—reward-seeking, plastic, exploitative. Johnny (angry: high cortisol, low serotonin) shows near-zero weight section, indicating his neurochemical state balances the excitatory cortisol signal against the inhibitory serotonin depletion. This differentiation emerges purely from the connection matrix acting on each being’s unique neurochemical profile.
The fleet-wide LR modulation after 4 ticks was 1.0007 (18 beings contributing). This slight upward pressure reflects the population’s net-positive dopamine/NE state. After MOBA self-play tournaments (which produce large neurochemical swings), we expect this signal to oscillate in the range \([0.95, 1.10]\).
The momentum term \(\beta = 0.3\) produces critically damped behavior. After a dopamine impulse of \(\Delta n_\text{DA} = 0.2\), the policy section rises and settles monotonically, following the same ring-down profile as the table above.
No ringing. No overshoot past the force-implied equilibrium. The section approaches its new rest point monotonically, which matches the biological time course of monoamine effects (seconds to minutes, not milliseconds).
One could close these three loops with three independent controllers. The fiber bundle formulation offers three advantages:
Shared geometry: All three loops respond to the same neurochemical event through a single matrix multiplication. This is computationally cheaper and conceptually cleaner than three separate feedback mechanisms.
Coupling via curvature: The connection matrix encodes cross-talk between loops. A dopamine spike affects policy and training and exploration simultaneously, with different magnitudes and signs. This coupling is the curvature of the bundle—it cannot be decomposed into three independent 1D feedback loops without losing information.
Holonomy is a genuinely new signal: The Möbius twist produces a learning signal that no single loop can generate. Holonomy measures how the agent’s fiber state changes after seeing the same situation from two perspectives. This is analogous to stereoscopic depth perception: each eye alone gives a flat image; the difference between them gives depth.
The connection \(\Gamma\) is a gauge field on the neurochemistry bundle. The holonomy around a closed loop in neurochemistry space is the Wilson loop integral:
\[W(\gamma) = \mathcal{P} \exp \left( -\oint_\gamma \Gamma \cdot d\mathbf{n} \right)\]
In the linearized regime (small \(\Delta\mathbf{n}\)), this reduces to our discrete holonomy computation. The gauge freedom corresponds to choosing which fiber section value is “zero”—biologically, this is the adaptation level. The bundle is gauge-invariant: only holonomy (relative section changes) matters, not absolute section values.
Standard reinforcement learning treats the agent as occupying one side of the game. Self-play [8] puts the agent on both sides sequentially. The Möbius topology arises because the “identity” of the agent (blue vs. red) reverses after each match. Traversing the loop twice returns you to the same role but with altered fiber sections. On an orientable bundle, two traversals would cancel. On a Möbius bundle, they accumulate.
This is why self-play produces qualitatively different learning than single-perspective play. The holonomy is the formalization of “learning by seeing both sides.”
The connection matrix is currently hand-tuned from neurobiological priors. Future work should learn \(\Gamma\) from data, treating it as a trainable parameter. The wave ringlet parameters (\(\alpha = 0.1\), \(\beta = 0.3\)) were chosen for critical damping; a formal stability analysis should confirm the conjectured critical damping condition \(\beta_\text{crit} = 1 - 2\sqrt{\alpha}\).
The soft clamping \(\sigma \in [-1, 1]\) prevents divergence but introduces a nonlinearity that breaks the exact gauge theory interpretation at the boundaries. A compactified fiber (circle-valued sections) would preserve gauge invariance but complicate the biological interpretation.
The Möbius Learning Bundle demonstrates that three apparently separate learning deficits—behavioral feedback, training modulation, and world model curiosity—are fiber sections over a common base manifold. A single connection operator closes all three loops simultaneously. The wave ringlet propagation rule provides biologically plausible temporal dynamics without explicit decay timers. The Möbius twist from self-play generates holonomy, a learning signal inaccessible to single-perspective agents. The system is implemented and running in a 16-agent cognitive simulation, producing differentiated fiber dynamics that reflect each agent’s unique neurochemical personality.
The deeper implication is architectural: when multiple subsystems share a common substrate, the right abstraction is not three controllers but one connection on a bundle. The geometry does the work.
[1] Mobley, J.A. “Haven: Genuine Cognitive Architecture for AI Beings.” MHSCOM Internal, 2026.
[2] Lövheim, H. “A new three-dimensional model for emotions and monoamine neurotransmitters.” Medical Hypotheses, 78(2):341-348, 2012.
[3] Baars, B.J. A Cognitive Theory of Consciousness. Cambridge University Press, 1988.
[4] Schultz, W. “A neural substrate of prediction and reward.” Science, 275(5306):1593-1599, 1997.
[5] Graves, A. et al. “Neural Turing Machines.” arXiv:1410.5401, 2014.
[6] Daw, N.D. et al. “Opponent interactions between serotonin and dopamine.” Neural Networks, 15(4-6):603-616, 2002.
[7] Yerkes, R.M. & Dodson, J.D. “The relation of strength of stimulus to rapidity of habit-formation.” Journal of Comparative Neurology and Psychology, 18(5):459-482, 1908.
[8] Silver, D. et al. “Mastering the game of Go without human knowledge.” Nature, 550:354-359, 2017.
The 21 entries of \(\Gamma\) were derived from a synthesis of the Lövheim emotion model [2], dopamine–serotonin opponent interactions [6], and the Yerkes–Dodson relation between arousal and performance [7].
The signs are more constrained than the magnitudes. Future work should treat magnitudes as learnable parameters while preserving sign constraints as architectural priors.
- Implementation: `MASCOM/ventures/gamegob/mobius_learning_bundle.py`
- Integration points: `mind.py` (6), `haven_daemon.py` (1), `quantum_pretrain.py` (1)
- Persistence: `mascom_data/village/bundle_{being_id}.json` per agent
- Training signal: `mascom_data/training/bundle_lr_signal.json` (aggregated)