Mobley Helms Strategic Systems John Mobley, Founder & CEO | Ron Helms, General Partner
February 2026
We present PhotonicMind, a novel cognitive architecture for artificial general intelligence (AGI) that rejects the prevailing paradigm of scaling language models and instead builds intelligence from first principles of biological perception. PhotonicMind processes raw screen photons through a complete biological vision pipeline — from sRGB gamma decoding to LMS cone excitation, through retinal circuits, saccadic eye movements, object binding, and semantic word understanding — before making decisions through a neural network trained via Hebbian plasticity and teacher-student imitation learning. No large language model operates in the perception-action loop. The system learns from experience, accumulates memory, predicts outcomes before acting, regulates its own energy through emotional state transitions, and evolves its cognitive parameters through MAP-Elites quality-diversity search. Operating as the core intelligence of MASCOM (Mobleysoft Autonomous Systems Commander), PhotonicMind autonomously manages a portfolio of 124+ digital ventures across defense, finance, AI, developer tools, and entertainment. This paper describes the architecture, its biological foundations, the seven integrated subsystems, and early operational results.
The dominant paradigm in AI — training ever-larger transformer models on internet-scale text corpora — has produced systems that are remarkably fluent but fundamentally brittle. These systems lack grounded perception, cannot learn from a single experience, have no persistent memory across sessions, cannot predict the consequences of their actions before executing them, and have no mechanism for knowing when they are stuck. They generate plausible text about the world but do not perceive it.
The biological brain solves intelligence differently. Vision is not an API call — it is a cascade of photochemical, neural, and computational processes that transforms light into actionable understanding in under 200 milliseconds. Memory is not a database query — it is associative, context-dependent, and strengthened by emotional salience. Decision-making is not token generation — it is a competition between neural populations, modulated by neurotransmitter systems that encode confidence, novelty, and reward history.
Intelligence emerges from the interaction between grounded perception, predictive modeling, and energetic regulation — not from statistical language generation.
PhotonicMind implements this thesis. Every computational layer is modeled on how biological systems actually process information, from the photoreceptor mosaic in the retina to the dopaminergic reward signals that modulate learning. The system is entirely proprietary — no OpenCV, no pretrained vision models, no LLM in the loop. The logic is ours. numpy provides matrix algebra; scipy.ndimage provides fast convolution; PIL loads images. Everything else is built from scratch.
PhotonicMind is the perception-cognition-action core of MASCOM, a fully autonomous system that manages MobCorp’s venture portfolio. MASCOM operates macOS applications (Safari, Terminal, Finder) through screen perception and mouse/keyboard control, executing tasks ranging from website deployment to system health monitoring. PhotonicMind provides the eyes, brain, and hands.
PhotonicMind implements a seven-layer cognitive architecture. Each layer has a clear biological analog and a formal computational specification.
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 7: EVOLUTIONARY DISCOVERY (MAP-Elites + CMA-ES) │
│ Discovers which cognitive configurations work best for which │
│ task types. 52-parameter genome. Quality-diversity search. │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 6: METABOLIC KNOWLEDGE (SADIE Cycle) │
│ Search → Absorb → Dissolve → Integrate → Emerge │
│ Composes KnowledgeBase, Braid, TaskMaster, Weaves, Complexity │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 5: THALAMIC INTEGRATION │
│ Central relay hub. 12 modalities. Global workspace. │
│ Temporal binding. Attention gating. │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 4: COGNITIVE BRAIN (8 Subsystems) │
│ PFC, Cerebellum, Hippocampal Replay, Neuromodulation, │
│ Default Mode Network, Salience, Metacognition, Mirror System │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 3: PREDICTION-REALITY ALIGNMENT (FeedbackLoop) │
│ Predict → Act → Compare. Emotional states. Energy regulation. │
│ Contract enforcement. Action suppression. Introspection. │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 2: DECISION & LEARNING │
│ NeuralDecisionEngine: 42-dim features → 6 actions. │
│ Hebbian plasticity. Teacher-student imitation learning. │
│ Hippocampal memory. Pattern consolidation. │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 1: BIOLOGICAL PERCEPTION │
│ Photon Capture → Eye Optics → Cone Mosaic → Phototransduction │
│ → Retinal Circuit → Saccades → Object Binding → VWFA → Scene │
└─────────────────────────────────────────────────────────────────┘
PhotonicMind does not call a vision API. It implements the complete pathway by which light becomes perception in the mammalian visual system.
Stage 1 — Photon Capture (PhotonSource). Screen pixels emit RGB light. We convert through the full physical pathway: sRGB gamma decoding (IEC 61966-2-1), linear RGB to CIE XYZ tristimulus values, then XYZ to LMS cone excitation space via the Hunt-Pointer-Estevez transform. The resulting tensor represents actual photon catch rates for each of the three cone types (L: 564nm, M: 534nm, S: 420nm). This is what the retina physically receives.
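The conversion chain above can be sketched directly. This is a minimal illustration using the published IEC 61966-2-1 decoding curve, the standard D65 sRGB→XYZ matrix, and the Hunt-Pointer-Estevez XYZ→LMS matrix — standard colorimetry constants, not code taken from PhotonicMind itself; the function name is ours.

```python
import numpy as np

# Standard D65 linear-sRGB -> CIE XYZ matrix.
SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])

# Hunt-Pointer-Estevez XYZ -> LMS cone-excitation transform.
HPE_XYZ_TO_LMS = np.array([[ 0.38971, 0.68898, -0.07868],
                           [-0.22981, 1.18340,  0.04641],
                           [ 0.00000, 0.00000,  1.00000]])

def srgb_to_lms(rgb):
    """Map sRGB pixels in [0, 1] with shape (H, W, 3) to LMS cone excitations."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # sRGB gamma decoding (IEC 61966-2-1): linear segment below 0.04045.
    linear = np.where(rgb <= 0.04045,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ SRGB_TO_XYZ.T       # linear RGB -> CIE XYZ tristimulus
    return xyz @ HPE_XYZ_TO_LMS.T      # XYZ -> LMS cone space

white = srgb_to_lms(np.ones((1, 1, 3)))   # a white pixel excites all cones
```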
Stage 2 — Eye Optics (EyeOptics). Pupil diameter adapts to mean luminance via the Watson & Yellott (2012) model: D = 4.9 − 3·tanh(0.4·log₁₀(L)), clamped to the biological range of 2–8mm. Foveal resolution follows the cone density gradient: 200,000 cones/mm² at the foveal center, falling to 10,000 cones/mm² at 20° eccentricity. This is modeled by spatially-varying Gaussian blur (σ=0 at fovea, σ=2 at parafovea, σ=6 at periphery).
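The pupil model quoted above is a one-liner; a sketch, with luminance in cd/m² and an illustrative function name of our own:

```python
import numpy as np

def pupil_diameter_mm(luminance_cd_m2):
    """D = 4.9 - 3*tanh(0.4*log10(L)), clamped to the 2-8 mm biological range."""
    d = 4.9 - 3.0 * np.tanh(0.4 * np.log10(luminance_cd_m2))
    return float(np.clip(d, 2.0, 8.0))

bright = pupil_diameter_mm(100.0)   # bright screen -> constricted pupil
dark = pupil_diameter_mm(0.01)      # dark scene -> dilated pupil
```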
Stage 3 — Cone Mosaic (ConeMosaic). An irregular array of L (62%), M (32%), and S (6%) cones tiles the retinal image. Each cone samples only its wavelength channel, producing a sparse, interleaved signal. This matches the biological mosaic where each photoreceptor type has its own spatial distribution and the brain must reconstruct full-color images from incomplete sampling.
Stage 4 — Phototransduction. The Naka-Rushton compressive nonlinearity (R = R_max · I^n / (I^n + σ^n), with Hill coefficient n=0.74) converts photon catch rates to neural currents. Critically, photoreceptors signal by hyperpolarization — more light produces less output. σ adapts slowly to the ambient light level, giving the system a dynamic range spanning 14 orders of magnitude.
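A minimal sketch of the compressive stage, using the formula and Hill coefficient quoted above. The sign flip models hyperpolarization (more light, less output); in the real system σ adapts slowly, whereas here it is a fixed parameter for illustration:

```python
import numpy as np

def photocurrent(intensity, sigma=1.0, n=0.74, r_max=1.0):
    """Naka-Rushton compression followed by hyperpolarizing inversion."""
    i_n = np.power(intensity, n)
    response = r_max * i_n / (i_n + sigma ** n)   # saturating response in [0, r_max)
    return r_max - response                        # hyperpolarization: invert sign

dark_current = photocurrent(0.0)   # darkness -> maximal output
```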
Stage 5 — Retinal Circuit (RetinalCircuit). Horizontal cells compute lateral inhibition (center-surround antagonism). Bipolar cells split into ON and OFF pathways — two parallel processing streams for light increments and decrements. Ganglion cells produce the output: Midget/P cells (80%, high spatial resolution, color-opponent), Parasol/M cells (10%, motion/transients), and Bistratified/K cells (blue-yellow opponent). Color opponency channels (L−M red-green, S−(L+M) blue-yellow) are computed from the interpolated (filled-in) cone responses.
Stage 6 — Saccadic Eye Movements (SaccadeController). Four fixations per frame, planned from a saliency map with inhibition of return. Each fixation captures high-resolution foveal detail at one location; the scene percept accumulates across fixations via max pooling.
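Saliency-driven fixation planning with inhibition of return can be sketched as follows: pick the saliency peak, then suppress a neighborhood around it so the next fixation lands elsewhere. The suppression radius and box-shaped inhibition region are illustrative choices, not PhotonicMind's actual parameters:

```python
import numpy as np

def plan_fixations(saliency, n_fix=4, radius=2):
    """Greedy fixation planning: argmax of saliency, then inhibit a neighborhood."""
    s = np.array(saliency, dtype=np.float64)
    fixations = []
    for _ in range(n_fix):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: suppress a box around the chosen location.
        s[max(0, y - radius):y + radius + 1,
          max(0, x - radius):x + radius + 1] = -np.inf
    return fixations

grid = np.zeros((10, 10))
grid[2, 3], grid[7, 8] = 5.0, 4.0
fixations = plan_fixations(grid, n_fix=2)   # visits both peaks, not one twice
```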
The ObjectBinding layer performs the function of inferotemporal cortex: combining shape (from retinal edges), color (from opponent channels), and text (from OCR) into unified object percepts. Classification is purely visual — aspect ratio, brightness, edge density, position, and color determine whether a rectangle is a button, input field, tab, link, or panel. No keyword heuristics. No DOM access.
Named for the left fusiform gyrus region that converts visual word forms into semantic representations, our VWFA bridges perception and understanding. Recognized text from the OCR pipeline is embedded into 768-dimensional vectors via a local embedding model (nomic-embed-text running on Ollama). These vectors are matched against a vocabulary of 36 semantic concepts spanning UI elements, actions, states, and domain knowledge. The system does not “ask” what something means — it perceives meaning directly from the visual form of words.
Scene classification combines visual structure (number of inputs, buttons, interactive elements) with text content to categorize the current screen as login, landing page, dashboard, or unknown. A scene hash (MD5 of sorted element labels) enables memory lookup and change detection.
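The scene hash described above is a small but load-bearing mechanism; a sketch of one plausible implementation, assuming labels are joined with a delimiter after sorting (the exact canonicalization is our assumption):

```python
import hashlib

def scene_hash(element_labels):
    """MD5 of sorted element labels: a stable fingerprint for memory lookup."""
    canonical = "|".join(sorted(element_labels))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

a = scene_hash(["Login", "Password", "Submit"])
b = scene_hash(["Submit", "Login", "Password"])   # order-insensitive: a == b
```

Because the hash depends only on the set of element labels, any layout change that adds or removes a labeled element changes the hash, which is what makes it usable for change detection.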
The complete perception pipeline — from screen capture through retinal processing, OCR, VWFA, object binding, and scene classification — executes in under 500ms on commodity hardware. No GPU required. No cloud API calls.
The NeuralDecisionEngine maps perception to action through learned weights, not rules. The architecture:
encode(element, context) → 42-dimensional feature vector:
- 8 visual features (brightness, edge density, aspect ratio, area, position, text presence)
- 7 element-type features (one-hot)
- 9 color features (one-hot)
- 4 scene-type features (one-hot)
- 4 task-relevance features (word overlap, keyword signals)
- 6 sequence features (last action, session history)
- 2 memory features (recall confidence, best known action)
- 2 history features (times acted on, last outcome)

features @ W + bias → 6 action scores (click, type, clear_and_type, key, done, stuck)
argmax → selected (element, action) pair
When the CognitiveBrain (Layer 4) is attached, 32 additional cognitive features are grown via neurogenesis — extending the feature vector to 74 dimensions. An optional hidden layer is born when cognitive features are added, creating a two-layer network with ReLU activation.
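The base scoring path can be sketched in a few lines. Shapes follow the text (42 features, 6 actions); the weight values here are random placeholders standing in for learned weights, and the function name is ours:

```python
import numpy as np

ACTIONS = ["click", "type", "clear_and_type", "key", "done", "stuck"]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(42, 6))   # placeholder for learned weights
bias = np.zeros(6)

def decide(candidates):
    """candidates: one 42-dim feature vector per perceived element.
    Returns the (element index, action name) pair with the highest score."""
    scores = np.stack(candidates) @ W + bias               # (n_elements, 6)
    elem, act = np.unravel_index(np.argmax(scores), scores.shape)
    return int(elem), ACTIONS[act]

elem_idx, action = decide([rng.normal(size=42) for _ in range(3)])
```

Because selection is a single matrix multiply plus argmax, a decision costs microseconds to low milliseconds — the source of the latency numbers quoted later in the comparison table.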
The decision engine learns through a biologically-inspired teacher-student paradigm. The “teacher” is a reflexive pattern-matching system that parses task descriptions (“click X”, “type Y”) and identifies the correct element and action. The neural network observes these teacher decisions and trains to reproduce them via Hebbian learning:
ΔW = η · reward · features^T · (target − prediction)
When the student’s imitation accuracy exceeds 80% over accumulated decisions, it “graduates” and can make autonomous decisions when the teacher has no applicable rule. This mirrors how motor skills transfer from conscious (cortical) to automatic (cerebellar) control.
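The update rule and graduation check above can be sketched concretely. This is a minimal illustration of the quoted ΔW formula under simplifying assumptions (linear network, fixed unit-norm features, constant reward); the helper names are ours:

```python
import numpy as np

def hebbian_update(W, features, target, reward, eta=0.1):
    """dW = eta * reward * features^T (target - prediction)."""
    prediction = features @ W                        # 6 action scores
    return W + eta * reward * np.outer(features, target - prediction)

def graduated(correct, total, threshold=0.80):
    """Student graduates once imitation accuracy exceeds the threshold."""
    return total > 0 and correct / total >= threshold

W = np.zeros((42, 6))
features = np.ones(42) / np.sqrt(42)   # unit-norm feature vector
target = np.eye(6)[0]                  # teacher demonstrates action 0
for _ in range(200):
    W = hebbian_update(W, features, target, reward=1.0)

learned_action = int(np.argmax(features @ W))   # converges to action 0
```

With unit-norm features, each update moves the prediction a fraction η of the way toward the teacher's target, so repeated demonstrations converge geometrically.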
Every action and its outcome are stored in a SQLite-backed hippocampal memory system. Two learning mechanisms operate: episodic recall, in which individual remembered situations bias decisions on similar screens, and pattern consolidation, which aggregates episodes into statistical knowledge over time.
The hippocampus also persists neural network weights, ensuring that learning survives across sessions.
Depression and anxiety are not bugs — they are features. When a biological organism’s prediction systems fail repeatedly, the brain drains energy to force introspection. You cannot just keep clicking the same button. You must stop, reflect, update your model, and cautiously test new predictions. PhotonicMind implements this insight directly.
Before every action, the FeedbackLoop formulates a prediction: “If I click this button, the screen should change.” After the action, it compares prediction to reality.
The system transitions through four emotional states based on recent prediction accuracy (5-step window):
| State | Prediction Accuracy | Behavior |
|---|---|---|
| Active | > 60% | Full energy, normal operation |
| Frustrated | 30–60% | Reduced energy, starting to suppress failed actions |
| Anxious | 10–30% | Low energy, many suppressed actions |
| Depressed | < 10% | Energy depleted, forced introspection, task termination |
These are not metaphorical labels. They are functional states that directly alter the system’s behavior — just as biological emotional states alter an organism’s engagement with its environment.
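The four-state mapping in the table above reduces to a small function; a sketch, where the assignment of exact boundary values to the milder state is our choice:

```python
def emotional_state(recent_outcomes):
    """recent_outcomes: last <= 5 booleans, True = prediction confirmed."""
    window = list(recent_outcomes)[-5:]
    accuracy = sum(window) / len(window) if window else 1.0
    if accuracy > 0.60:
        return "active"        # full energy, normal operation
    if accuracy > 0.30:
        return "frustrated"    # reduced energy, suppressing failed actions
    if accuracy > 0.10:
        return "anxious"       # low energy, many suppressed actions
    return "depressed"         # energy depleted, forced introspection

state = emotional_state([True, True, True, False, True])   # 80% -> active
```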
Four hard contracts prevent pathological behavior:
- stuck (pure repetition detection)
- stuck (stagnation detection)
- stuck (model failure detection)

When a contract triggers, the system performs forced introspection — analyzing which actions were most repeated, how many unique screen states were seen, and generating a self-diagnosis of why predictions are failing.
The teacher system includes a “done signal” — if the task is “open X” and the system has (a) clicked X, (b) observed a screen change, and (c) confirmed X is visible in the current elements, it returns done. This solves the fundamental problem of knowing when to stop.
PhotonicMind’s base perception-action loop (Layers 1–3) is augmented by eight brain subsystems, each modeling a distinct cognitive function:
Maintains a bounded working memory (capacity 3–12 items, tunable) with temporal decay. Decomposes compound goals (“open X then click Y”) into sub-goal sequences. Tracks time on goal and stuckness. Produces an 8-dimensional context vector encoding goal depth, sub-goal progress, working memory load, recency, and stuck duration.
Predicts the outcome of each action before execution. Maintains internal models that learn from prediction errors. When predicted failure confidence exceeds a threshold, the cerebellum inhibits the action — preventing execution before the error occurs. Learning rate, prediction horizon, and confidence threshold are all evolvable parameters.
Stores experiences in a prioritized replay buffer. During idle periods, replays batches of high-priority experiences (weighted by prediction error), reinforcing successful patterns and weakening failed ones. This mirrors the biological process where the hippocampus replays daily experiences during sleep to consolidate them into cortical long-term memory.
Models four neurotransmitter systems that modulate learning and behavior — among them, dopamine-like signals encoding reward history and a norepinephrine-like signal that sets arousal and attention breadth.
Activates during idle periods (no active task). Runs consolidation cycles that replay experiences, update forward models, and “imagine” action sequences. Produces insight reports. This mirrors the biological DMN that activates during mind-wandering and is associated with creativity and planning.
Filters the full set of perceived elements down to the most task-relevant subset. Combines top-down (working memory, goal relevance) and bottom-up (visual saliency, novelty) signals. Attention breadth is modulated by norepinephrine levels. High salience elements are prioritized for decision-making.
Monitors the decision engine’s own confidence. Tracks calibration (does 80% confidence correspond to 80% success?). When confidence drops below a threshold or calibration diverges, triggers a strategy switch — forcing exploration of alternative actions. Implements the “knowing what you don’t know” capacity that prevents overconfident repetition.
Learns from recorded demonstrations (training traces). When live decision confidence is low, retrieves similar situations from the trace database and biases the decision toward the demonstrated action. Learning rate and demo weight are evolvable parameters.
The biological thalamus is not merely a relay station. It is the central integrator that normalizes disparate sensory modalities into a common format, gates attention, and creates the unified “global workspace” that constitutes conscious awareness. MASCOM faces the same challenge: it has 12 input modalities (vision, task queue, event bus, HAL state, captain’s log, terminal, drive, venture health, motor actions, verification, observer) each speaking different languages with different latencies and bandwidths.
The Thalamus module (thalamus.py) wraps every inter-component message in a normalized event schema: {seq, ts, modality, source, data}.

A critical design principle: no subsystem talks directly to another subsystem. All inter-component communication flows through the thalamus. This prevents the combinatorial explosion of point-to-point connections and ensures that every event is logged, normalized, and attention-filtered before reaching any consumer.
Knowledge is not storage. It is a metabolic process — the cognitive analog of digestion. Raw information must be searched for, absorbed, dissolved into primitives, integrated with existing understanding, and reconstituted as emergent insight. The Cognitive Search Engine implements this as the SADIE cycle:
SEARCH (KnowledgeBase): Query across 75 knowledge domains containing 2,961 concepts. Identify gaps — what do we not know that we should? Generate synthesis targets — which concepts should be cross-referenced?
ABSORB (TheBraid): Structure raw results using braid topology — a mathematical framework for tracking how knowledge strands interweave. Pattern detection identifies recurring structural similarities across domains.
DISSOLVE (ComplexityTheory): Break structured knowledge into atomic primitives. Compute implementation codons — the minimal units of actionable knowledge. Score complexity using information-theoretic metrics.
INTEGRATE (TaskMaster): Inject dissolved primitives into the belief system and task planning hierarchy. Update the knowledge tree. Track which facts support which beliefs.
EMERGE (WeaveManager): Asynchronous recombination. Weave dissolved primitives together to discover novel concepts that did not exist in any input. Identify emergent patterns. Generate new search targets — completing the metabolic cycle.
Every cycle is persisted to SQLite (cycles, discoveries, knowledge graph, search queue tables). The engine supports continuous operation — running SADIE cycles indefinitely, accumulating knowledge, and feeding contextual enrichment back to the CognitiveBrain during live decision-making.
The Cognitive Brain (Layer 4) has 52 tunable parameters: working memory capacity, decay rates, prediction horizons, neurotransmitter baselines, attention thresholds, confidence calibration, learning rates. Setting these by hand is intractable. Different task types demand different configurations — a navigation task benefits from broad attention and high exploration; a data entry task benefits from narrow focus and low exploration.
All 52 parameters are encoded as a genome — a real-valued vector in [0, 1]⁵² that maps to the actual parameter ranges of each brain subsystem and supports the standard evolutionary operators over that normalized space.
Rather than optimizing for a single best genome, we use MAP-Elites (Mouret & Clune, 2015) to maintain an archive of diverse high-performing configurations indexed by two behavioral descriptors: task type and estimated difficulty.
Each cell in the 7×5 grid holds the best-performing genome for that behavioral niche. New genomes compete to enter the archive only against the occupant of their own cell, preserving diversity across the entire task space.
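The cell-competition rule can be sketched in a few lines. The (task_type, difficulty) descriptor pair is inferred from the 7×5 grid and the runtime selector's inputs; class and method names are ours:

```python
class MapElitesArchive:
    """MAP-Elites archive over a 7x5 grid of behavioral niches."""

    def __init__(self, n_task_types=7, n_difficulties=5):
        self.shape = (n_task_types, n_difficulties)
        self.cells = {}   # (task_type, difficulty) -> (fitness, genome)

    def try_insert(self, task_type, difficulty, fitness, genome):
        """A genome competes only against the occupant of its own cell."""
        cell = (task_type, difficulty)
        incumbent = self.cells.get(cell)
        if incumbent is None or fitness > incumbent[0]:
            self.cells[cell] = (fitness, genome)
            return True
        return False

archive = MapElitesArchive()
archive.try_insert(2, 1, fitness=0.70, genome=[0.5] * 52)
archive.try_insert(2, 1, fitness=0.60, genome=[0.4] * 52)   # loses its cell
archive.try_insert(3, 1, fitness=0.60, genome=[0.4] * 52)   # new niche: kept
```

The second genome loses in niche (2, 1) but the identical genome wins the empty niche (3, 1) — this is precisely how MAP-Elites preserves diversity that a single global optimum would discard.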
Once MAP-Elites identifies promising niches, CMA-ES (Covariance Matrix Adaptation Evolution Strategy; Hansen, 2006) performs continuous optimization within each niche. CMA-ES adapts the mutation distribution’s covariance matrix to follow the local fitness landscape, providing efficient optimization in 52-dimensional space without gradient information.
The RuntimeBrainSelector module selects the appropriate genome from the MAP-Elites archive based on the incoming task’s type and estimated difficulty, instantiates a CognitiveBrain with that genome’s parameters, and hot-swaps it into the live system. This means the system’s cognitive configuration changes for each task — a form of adaptive intelligence that static architectures cannot achieve.
PhotonicMind operates under a graduated autonomy model enforced by the HAL State Machine. Eight states define the system’s authority level:
| State | Name | Authority |
|---|---|---|
| o | Off | Dormant, no perception |
| g | Green | User in control, screen capture active |
| y | Yellow | Shared control, idle detection active |
| a | Orange | Recording mode, learning at scale |
| r | Red | HAL in command (user stepped away) |
| p | Purple | Self-operate + self-record + self-learn |
| i | Indigo | Deep autonomy, nightmode |
| w | White | Self-learning training mode (gauntlet) |
Not every state is reachable from every other state. Transitions are validated against a formal transition graph stored as data, not code. Auto-transition rules handle common patterns (yellow + idle → red; red + user activity → yellow). Every transition is logged with timestamp, source, and reason.
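Storing the transition graph as data rather than code can be sketched as follows. The edge set below is a partial, illustrative subset (the real graph is proprietary); only the two auto-transition rules quoted above are encoded:

```python
# Partial, illustrative edge set — transitions stored as data, not code.
ALLOWED = {
    "o": {"g"},
    "g": {"y", "o"},
    "y": {"r", "g"},
    "r": {"y", "p"},
}

def transition(state, target, log):
    """Validate a transition against the graph; every transition is logged."""
    if target not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    log.append((state, target))
    return target

def auto_transition(state, user_idle):
    """The two auto-transition rules from the text."""
    if state == "y" and user_idle:        # yellow + idle -> red
        return "r"
    if state == "r" and not user_idle:    # red + user activity -> yellow
        return "y"
    return state

log = []
state = transition("y", auto_transition("y", user_idle=True), log)
```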
All components operate under formal Design by Contract (Meyer, 1992): preconditions, postconditions, and invariants are stated explicitly and checked at runtime.
Example — Task Lifecycle Contract:
PRECONDITION: task exists in tasks.db with status='pending'
ACTION: TaskSource.get_next_task()
POSTCONDITION: task.status='in_progress' AND task.started_at IS NOT NULL
INVARIANT: status transitions follow: pending → in_progress → {completed, failed}
Contract violations are detected and reported via the thalamic verification modality with the highest attention weight (10), ensuring immediate system response.
PhotonicMind runs entirely on commodity hardware. No GPU. No cloud API in the perception-action loop. No pretrained models (except the optional local embedding model for VWFA). The system can operate air-gapped.
Unlike systems that require millions of examples, PhotonicMind learns from every single interaction. One successful click on a button labeled “DEPLOY” creates a hippocampal memory that biases future decisions. Pattern consolidation aggregates these into statistical knowledge over time.
The full perception-decision-action cycle executes in under 500ms. The retinal pipeline (photon capture through object binding) takes 200–350ms. The decision engine takes 10–50ms. Motor execution (human-kinematic mouse movement) takes 150–500ms depending on distance.
The MotorSystem implements Fitts’ Law for mouse movement timing, minimum-jerk trajectories (6t⁵ − 15t⁴ + 10t³), Gaussian position noise, and human typing patterns including fast bigrams (th, he, in, er), hand alternation effects, and random micro-pauses. The system’s mouse and keyboard behavior is designed to be indistinguishable from a human operator’s.
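The trajectory and timing pieces above can be sketched directly. The minimum-jerk polynomial is the one quoted in the text; the Fitts' law constants a and b are illustrative placeholders, and the function names are ours:

```python
import numpy as np

def minimum_jerk(t):
    """Normalized path position s(t) = 6t^5 - 15t^4 + 10t^3 for t in [0, 1].
    Velocity and acceleration are zero at both endpoints."""
    return 6 * t**5 - 15 * t**4 + 10 * t**3

def fitts_duration(distance, width, a=0.10, b=0.15):
    """Movement time ~ a + b * log2(distance/width + 1); a, b illustrative."""
    return a + b * np.log2(distance / width + 1)

def trajectory(start, end, n=50):
    """Straight-line path reparametrized by the minimum-jerk profile."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    s = minimum_jerk(np.linspace(0.0, 1.0, n))[:, None]
    return start + s * (end - start)   # (n, 2) points, dense in mid-flight

path = trajectory((0, 0), (800, 600))
```

Gaussian position noise would be added per point on top of this path; it is omitted here to keep the profile itself inspectable.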
| Dimension | LLM-Based Agents | PhotonicMind |
|---|---|---|
| Perception | Screenshot → API call → text description | Photons → biological retina → neural features |
| Decision | Token generation (100ms–10s) | Weight matrix multiplication (10ms) |
| Memory | Context window (fixed) | Persistent hippocampal DB (unbounded) |
| Learning | Fine-tuning (offline, expensive) | Hebbian plasticity (online, per-action) |
| Prediction | None | Cerebellum forward models + FeedbackLoop |
| Self-regulation | None | Emotional states, energy, introspection |
| Adaptation | Static architecture | 52-parameter evolutionary optimization per task |
| Dependencies | Cloud GPU, API keys, bandwidth | Local CPU, no network required |
| Latency | 1–30s per action | < 500ms per action |
PhotonicMind is a biologically-grounded cognitive architecture — not a simulation of biology, but an engineering system inspired by biological principles. We call this discipline cognitive architecture engineering: the design of artificial minds using computational models of biological cognitive processes, validated by operational performance rather than biological fidelity.
PhotonicMind is not a brain simulation. We do not model individual neurons, synaptic vesicle dynamics, or ion channel kinetics. We model the computational functions that biological systems perform — center-surround contrast enhancement, temporal change detection, predictive coding, dopaminergic reward signaling — at the level of abstraction where they can be implemented efficiently on digital hardware while preserving their functional role.
We believe AGI will not emerge from scaling a single architecture. It will emerge from the integration of multiple specialized cognitive systems — perception, prediction, memory, emotion, attention, metacognition, knowledge metabolism, and evolutionary self-improvement — under a unified control architecture. PhotonicMind is our implementation of this belief. The system is operational, learning, and managing real-world tasks today.
PhotonicMind demonstrates that an alternative path to capable AI systems exists — one that builds from physics and biology rather than from statistical language modeling. By implementing grounded perception (photons → retinal circuits → object binding), predictive control (FeedbackLoop), emotional self-regulation (energy and state transitions), thalamic integration (global workspace), metabolic knowledge processing (SADIE), and evolutionary cognitive optimization (MAP-Elites + CMA-ES), we have created a system that perceives, decides, acts, learns, predicts, reflects, and evolves — without a single LLM call in the loop.
The system is not a research prototype. It is the operational intelligence managing a portfolio of 124+ digital ventures. Every architectural decision described in this paper is implemented, tested, and running in production.
We invite the research community and potential collaborators to engage with these ideas. The dominant paradigm of “make the language model bigger” is a local optimum. There are other mountains to climb.
Hansen, N. (2006). The CMA Evolution Strategy: A Tutorial. arXiv:1604.00772.
Meyer, B. (1992). Applying “Design by Contract”. IEEE Computer, 25(10), 40–51.
Mouret, J.-B., & Clune, J. (2015). Illuminating search spaces by mapping elites. arXiv:1504.04909.
Naka, K. I., & Rushton, W. A. H. (1966). S-potentials from luminosity units in the retina of fish. Journal of Physiology, 185(3), 587–599.
Watson, A. B., & Yellott, J. I. (2012). A unified formula for light-adapted pupil size. Journal of Vision, 12(10), 12.
Contact: Mobley Helms Strategic Systems — mobleyhelms.com
System: MASCOM (Mobleysoft Autonomous Systems Commander) — mobcorp.cc
Copyright 2026 Mobley Helms Strategic Systems. All rights reserved.