Early-Stage Idea: A Cognitive Architecture Built on Attention and Graph Traversal

Hello everyone,

I’m an undergraduate student who has been independently exploring AGI and cognitive architectures. For the past eight months I’ve been working on a theoretical framework that I’m calling an “attention-driven, graph-based cognitive architecture.” Because I don’t have access to large compute or a research team, the project is fully conceptual rather than experimental. It focuses on the logic of the architecture, the internal flow of computation, and a plausible path toward implementation.

I’m posting here because HuggingFace has always felt like a community where researchers, engineers, and students can exchange ideas openly. I would be genuinely grateful for any thoughts—critical or encouraging—on whether this framework has any research value, or whether I’m missing something fundamental.

Very briefly, the core idea is to treat the entire cognitive system as a dynamic, multi-modal knowledge graph. Concepts, events, actions, rules, emotions, self-states, and even decision strategies are represented as nodes, linked by directed weighted edges that encode semantic, causal, or procedural relations. There is no central controller; the system “thinks” by propagating activation through the graph and by shifting its attention focus toward the currently most active region.

Memory is organized into semantic structures, episodic sequences, and a self-model that represents internal variables like emotional tone, task load, arousal, uncertainty, and system confidence. These internal states act as a global modulation field, shaping how activation diffuses across the graph. The flow of attention is not a pointer but a momentary cluster of highly active nodes; its movement emerges from activation competition, task relevance, and internal state.

One part I spent a lot of time designing is a continuous mode parameter that controls whether the system behaves more like a divergent, free-associative “DMN-like” mode or a convergent, goal-driven “CEN-like” mode. The same underlying graph traversal behaves very differently depending on this parameter, producing both exploratory and focused reasoning patterns without switching architectures. The framework also integrates simple decision-theory nodes, prospective-intention nodes for delayed intentions, and action templates that can be instantiated into executable logic depending on context.

My main question is simple: from your perspective, does this kind of architecture seem to have potential research value, or is it mostly reinventing ideas that already exist? I’m fully aware that without experiments it is just a theoretical construct, so I’m hoping to learn whether it is worth developing further, or whether I should rethink the approach entirely.

If anyone is interested in taking a look, I’ve uploaded the full paper draft to GitHub. It’s completely open:

Any form of feedback—critique, suggestions, reading recommendations, or even just general impressions—would mean a lot to me. Thank you for taking the time to read this.


I had the search agent run the search repeatedly and think it over, but it didn’t come up with any grounds to deny its research value. So, probably fine.


Short answer: yes, this absolutely has potential research value. It overlaps with existing work (so it’s not “from nowhere”), but the combination you’re proposing is coherent, timely, and, if you sharpen it, could become a serious small research program—especially in the context of LLM-based agents and structured memory.

Below I’ll:

  1. Restate your idea in my own words
  2. Explain how it lines up with existing architectures and systems
  3. Point out what looks genuinely distinctive
  4. Highlight important gaps / pitfalls
  5. Suggest concrete next steps and readings

I’ll keep the language informal but precise so you can reuse pieces in a post or a write-up.


1. Restating what you’re proposing

You’re proposing a cognitive architecture where the “mind” is a dynamic, multi-modal knowledge graph, and cognition is what happens when activation and attention move around this graph.

In more explicit terms:

  • Representation

    • Everything is a node: concepts, events, actions, rules, goals, emotions, self-state variables, decision heuristics.
    • Edges are directed and weighted, encoding semantic relations, causal links, temporal structure, preconditions/effects, motivational links, etc.
    • This graph is multi-modal: language, perception, internal signals all become structured propositions embedded into the graph.
  • Process / “thinking”

    • Nodes carry activations. Activation spreads along edges with some decay.
    • Attention is not a single pointer; it’s a cluster of highly active nodes—a local “hot spot” in the graph.
    • The Focus of Attention (FoA) is the currently dominant cluster, which you treat as the system’s “current thought” or working content.
    • There is no top-down scripted controller; instead, the control flow emerges from activation dynamics, task relevance, and the current internal state.
  • Memory structure

    • Semantic memory: the stable concept/relationship part of the graph.
    • Episodic memory: sequences of events / propositions linked over time, possibly with event boundaries detected from semantic shifts.
    • Self-model: internal variables as graph nodes (emotional tone, arousal, uncertainty, confidence, workload, etc.) that modulate activation diffusion and thresholds globally.
  • Mode parameter (DMN-like vs CEN-like)

    • A continuous scalar controls whether the system behaves in a more:

      • divergent, associative, “Default Mode Network-like” regime (broad diffusion, looser thresholds, more episodic and self connections), or
      • convergent, focused, “Central Executive Network-like” regime (narrow diffusion, strong goal constraints, local reasoning).
    • Importantly, you don’t swap architectures; you just change parameters, so the same graph machinery yields different cognitive styles.

  • Action and control

    • Some nodes encode decision-theoretic features (expected utility, risk, etc.), some encode prospective intentions (delayed intentions that should fire later), and some encode “action templates” that can be instantiated as external actions when certain patterns in the FoA are present.

That is: a single, unified graph for world knowledge, self state, and intentions; spreading activation + attention as the control mechanism; and a mode parameter that smoothly shifts the reasoning style between free association and goal-directed focus.
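
To keep myself honest about what I think you mean, here is a tiny illustrative sketch of that representation layer in Python/NetworkX. The node kinds, relation labels, and example entries are all mine, not taken from your draft; it only shows the shape of the data structure:

```python
# Illustrative sketch of the representation layer (not the author's actual design).
# Every entity is a typed node carrying an activation; edges are directed,
# weighted, and labeled with a relation kind.
import networkx as nx

G = nx.DiGraph()

def add_node(g, name, kind, activation=0.0):
    """kind: 'concept' | 'event' | 'action' | 'rule' | 'goal' | 'self_state' | 'intention'."""
    g.add_node(name, kind=kind, activation=activation)

def add_edge(g, src, dst, relation, weight=1.0):
    """relation: e.g. 'semantic', 'causal', 'temporal', 'precondition', 'motivational'."""
    g.add_edge(src, dst, relation=relation, weight=weight)

add_node(G, "kettle", "concept")
add_node(G, "boil_water", "action")
add_node(G, "make_tea", "goal")
add_node(G, "arousal", "self_state", activation=0.3)
add_edge(G, "make_tea", "boil_water", "precondition", 0.9)
add_edge(G, "boil_water", "kettle", "semantic", 0.7)
```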


2. How this relates to existing work

You are not reinventing the wheel blindly; you’re stepping into a space with existing architectures and systems. That’s good. It means you can position your work rather than defend it as totally new.

2.1 Classical cognitive architectures

There’s a long tradition of architectures that try to unify perception, memory, and action in one framework: ACT-R, Soar, CLARION, LIDA, OpenCog, SPA/Spaun, Sigma, and many more. Surveys like Kotseruba & Tsotsos’s review of 84 architectures give a good overview. (Science Direct)

Two are especially close to what you’re doing:

  • LIDA (Learning Intelligent Distribution Agent)

    • Implements Global Workspace Theory: perception builds a “situational model”; attention codelets form coalitions from parts of that model; those coalitions compete for access to a global workspace; the winner is broadcast, driving learning and action. (Science Direct)
    • It has separate semantic and episodic memories and includes motivations and emotions in the architecture. (digitalcommons.memphis.edu)
    • This is very close to your idea of a current FoA emerging from a competition among candidate clusters and then driving further processing.
  • OpenCog / OpenCog Prime

    • Uses the Atomspace, a typed, weighted hypergraph of all knowledge and internal state—very similar in spirit to your world+internal knowledge graph.
    • Has an Economic Attention Allocation (ECAN) subsystem that treats attention as an economic resource distributed over atoms; high-attention atoms get more processing. (OpenCog)
    • ECAN is literally about activation/importance spreading on a graph to control what gets processed, something you’re reinventing in your own way.

So: the shape of your idea—graph memory + activation + attention as control—has solid precedents. That’s a strength, as long as you explicitly say “this builds on X and Y” instead of claiming total novelty.

2.2 Modern LLM agents and graph-based memory

Separately, there is a surge of work on memory mechanisms for LLM-based agents. A 2024 survey (“A Survey on the Memory Mechanism of Large Language Model based Agents”) reviews many designs and classifies memory as episodic, semantic, working, profile/self, etc. (arXiv)

A clear pattern in that survey:

  • Flat logs and naive vector stores are not enough.
  • There is growing interest in structured and graph-based memories.

A striking example:

  • AriGraph (Ariadne agent)

    • An LLM agent that builds a knowledge graph world model integrating semantic and episodic memory as it explores text-based environments. (arXiv)
    • The memory graph significantly improves planning and complex task performance over unstructured memory and RL baselines. (arXiv)

AriGraph is basically a subset of what you’re proposing: a semantic+episodic memory graph for an LLM agent, but without the explicit self-model and DMN/CEN-like control.

Your architecture can be framed as:

“A more complete cognitive architecture around the kind of graph-based memory used in AriGraph and other LLM agents, including self-state, attention control, and mode-switching.”

That’s a very reasonable research niche.


3. What looks genuinely interesting / potentially novel

Given that the building blocks are known, the value is in how you combine and sharpen them. Several aspects look distinctive and promising:

3.1 Unified graph including self, affect, and control

Most older architectures treat “emotion” or “self state” as a side-channel or a separate module. You are instead:

  • Putting emotional tone, arousal, uncertainty, confidence, task load, etc. directly into the same graph that holds world facts and rules.
  • Letting those internal nodes modulate diffusion, thresholds, and selection across the entire graph.

This gives a single substrate where:

  • world state,
  • self state,
  • goals,
  • and decision strategies

all live and interact.

That integration is conceptually clean and matches what some modern theories (e.g., cognitive architectures for emotion, LIDA’s emotion/motivation integration) suggest, but you push harder on “all of it is in one graph.” (digitalcommons.memphis.edu)

If you can show, for example, that different “personality parameterizations” of these self nodes produce different reasoning or exploration patterns, that becomes a clear, testable feature.
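
For instance (purely illustrative, with invented self-state names and coefficients), the self nodes could be folded into a single global gain that scales how far activation diffuses:

```python
# Toy illustration of self-state nodes acting as a global modulation field.
# The specific states and the mapping to a gain are invented for this example.
def global_gain(g) -> float:
    """Read self-state node activations and fold them into one diffusion gain."""
    arousal = g.nodes["arousal"]["activation"]          # broadens search when high
    uncertainty = g.nodes["uncertainty"]["activation"]  # also broadens search
    confidence = g.nodes["confidence"]["activation"]    # narrows it
    return 1.0 + 0.5 * arousal + 0.3 * uncertainty - 0.4 * confidence
```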

3.2 Attention as an emergent cluster (FoA) plus a mode parameter

You treat attention not as a single spotlight but as an emergent cluster of highly active nodes (your event frame), built in layers:

  • L3: broad neighborhood of candidate nodes,
  • L2: coherent subclusters,
  • L1: a tightly bound proposition/event that is “what the agent is thinking about right now.”

That is essentially a graph-shaped global workspace: many candidates compete via activation; a cluster wins; that cluster is the FoA and drives further processing. This parallels LIDA’s coalitions in the Global Workspace, but your three-layer event-frame structure is more explicit and more directly tied to graph structure. (Science Direct)

Then you add:

  • a continuous mode parameter that reshapes diffusion and selection (DMN-like vs CEN-like). Rather than separate “modules,” you have one mechanism whose behavior smoothly changes with parameters.

If you run toy experiments where varying the mode parameter produces:

  • more associative, far-reaching paths in one regime, and
  • tighter, goal-directed paths in another,

you have a neat little story connecting your architecture to the neuroscience literature on Default Mode vs Executive control networks. (Science Direct)
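
As a toy sketch of “same machinery, different parameters,” the mode parameter could simply interpolate the diffusion settings; the endpoints below are made up for illustration and would need tuning:

```python
# Hypothetical mapping from a mode parameter m in [0, 1] to diffusion settings.
# m = 0 ~ divergent / DMN-like, m = 1 ~ convergent / CEN-like.
def mode_settings(m: float) -> dict:
    def lerp(a, b):                         # linear interpolation between the two regimes
        return a + (b - a) * m
    return {
        "decay": lerp(0.05, 0.30),          # faster decay keeps activation local
        "spread_thresh": lerp(0.01, 0.10),  # higher threshold prunes weak associations
        "max_hops": round(lerp(4, 2)),      # shorter diffusion radius when focused
        "goal_bias": lerp(0.2, 1.0),        # extra weight on goal-linked edges
    }
```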

3.3 Prospective-intention nodes (delayed intentions in the graph)

Your prospective-intention nodes (PI-nodes):

  • store “what to do later,”
  • have triggers (time, context, conditions), deadlines, and priorities,
  • gradually increase their activation as deadlines approach or context matches,
  • compete for attention when they become relevant.

That is an explicit, mechanistic model of prospective memory (remembering to do something in the future), embedded in your graph.

Prospective memory has its own cognitive-psychology literature and is not usually modeled in detail within AGI-like architectures. Embedding it as graph dynamics plus attention (rather than just “if condition then do X”) is a concrete, researchable idea.

A small agent that:

  • does an ongoing task,
  • has PI-nodes for delayed tasks, and
  • uses your attention/activation rules

could already demonstrate something meaningful: e.g., trade-offs between monitoring costs, missed opportunities, and mode settings.
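
Here is one hedged way such a PI-node ramp could look; the function and its arguments are placeholders I am inventing, not something your draft specifies:

```python
# Hypothetical salience ramp for a prospective-intention (PI) node: activation
# rises as the deadline approaches and jumps when the trigger context matches,
# so the PI-node can win the competition for attention at the right moment.
import math

def pi_activation(base, priority, time_to_deadline, horizon, context_match):
    """'horizon' is how far ahead the ramp starts; 'context_match' in [0, 1]
    is how well the current FoA matches the PI-node's trigger pattern."""
    urgency = math.exp(-max(time_to_deadline, 0.0) / max(horizon, 1e-6))
    return base + priority * urgency + priority * context_match
```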


4. Important gaps / pitfalls to be aware of

The big question isn’t “is this interesting?” (it is), but “what could block this from becoming publishable research?” There are a few recurring issues in this kind of work:

4.1 Learning and structural growth

Right now, most of your description is “static architecture + dynamics,” but:

  • How are nodes and edges created, merged, and deleted over time?
  • How are edge weights updated based on success/failure or prediction error?
  • How does the system prevent the graph from exploding with noise and redundancy?

Graph-based agents like AriGraph explicitly address some of this via learned or heuristic update rules over the memory graph. (arXiv)

For research value, you don’t need a perfect solution, but you do need at least:

  • simple, explicit rules for:

    • when to create a new node vs reusing an existing one,
    • basic learning of edge strengths (e.g., increase when a path leads to successful outcomes, decay otherwise),
    • some forgetting/compression strategy.

Otherwise, reviewers will rightly say “this is a beautiful static design, but it doesn’t tell us how the system learns and stays stable.”
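
To show the level of explicitness I mean, here is a deliberately simple pair of rules (Hebbian-ish strengthening along successful paths, plus decay-and-drop forgetting); treat it as a starting point, not a recommendation:

```python
# Minimal edge-learning and forgetting rules, invented as an example only.
def reinforce_path(g, path, reward, lr=0.1):
    """Strengthen edges along a path that led to a successful outcome."""
    for src, dst in zip(path, path[1:]):
        w = g[src][dst]["weight"]
        g[src][dst]["weight"] = w + lr * reward * (1.0 - w)   # saturates at 1.0

def decay_and_forget(g, decay=0.995, floor=0.05):
    """Slowly weaken all edges; drop those that fall below a floor."""
    for src, dst, data in list(g.edges(data=True)):
        data["weight"] *= decay
        if data["weight"] < floor:
            g.remove_edge(src, dst)
```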

4.2 Missing precise algorithms / equations

Cognitive architecture work that gets taken seriously usually provides:

  • explicit state variables,
  • explicit update equations or algorithm pseudocode,
  • a defined cognitive “cycle” or processing loop. (Science Direct)

You already have the conceptual story. To turn it into research, you need to pin things down:

  • An activation update rule per node per step
  • Clear rules for which nodes enter L3/L2, and how clusters are found
  • What exactly the mode parameter multiplies or adds to
  • How self-state nodes modulate edge weights or thresholds

These can be simple approximations, but they must be spelled out clearly enough that someone can implement them.
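
For example (symbols and functional forms invented for illustration, not taken from your draft), a candidate per-step update could be written as:

```latex
% a_i(t): activation of node i;  w_{ji}: weight of edge j -> i;  m: mode parameter;
% s: self-state vector;  I_i(t): external input (perception / task cues).
a_i(t+1) = \bigl(1 - \lambda(m)\bigr)\, a_i(t)
         + g(s) \sum_{j \in \mathrm{in}(i)} w_{ji}\, a_j(t)
         + I_i(t),
\qquad
\mathrm{FoA}(t+1) = \{\, i \mid a_i(t+1) \ge \theta(m) \,\}
```

with λ(m) a mode-dependent decay, θ(m) a mode-dependent selection threshold, and g(s) a global gain read off the self-state nodes. Anything of roughly this precision is enough for someone else to implement and test it.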

4.3 Scaling and graph explosion

Spreading activation over a large, dense knowledge graph is expensive. OpenCog’s ECAN and newer frameworks like the DeepFunding attention-evaluation project exist precisely to manage and study this. (OpenCog)

You will need to think about:

  • Limiting diffusion radius (depth and activation thresholds)
  • Pruning or compressing low-importance parts of the graph
  • Possibly building summaries (like GraphRAG’s communities and summaries) to let attention operate on higher-level structures rather than raw nodes. (Hugging Face)

You don’t have to solve industrial-scale problems, but being aware of the issue and sketching a strategy helps.
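
One cheap starting point, just to show the kind of strategy I mean: cap the diffusion radius inside the spreading step itself (max hops plus an activation threshold), and periodically prune by a long-run importance score. A toy pruning pass (all names invented) might look like:

```python
# Illustrative pruning pass: keep a running "importance" score per node and
# periodically drop the least important ones once the graph grows too large.
def prune(g, max_nodes=10_000):
    if g.number_of_nodes() <= max_nodes:
        return
    ranked = sorted(g.nodes(data=True),
                    key=lambda kv: kv[1].get("importance", 0.0))
    for node, _ in ranked[: g.number_of_nodes() - max_nodes]:
        g.remove_node(node)
```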

4.4 Clarifying what “activation” and “weight” mean

Are your activations:

  • probabilities,
  • salience/importance scores,
  • value estimates,
  • a mix?

You don’t need to commit to full Bayesian formalism, but you should decide what intuition you want:

  • If activation ~ salience, then learning rules should look like attention/priming rules (Hebbian-ish, recency-based).
  • If activation ~ value estimate, then RL-style update rules apply.
  • If activation ~ belief, more probabilistic/Bayesian thinking might be appropriate.

OpenCog, for example, separates truth values (probabilities + confidences) in its Probabilistic Logic Networks from attention values in ECAN. (OpenCog)

You can follow a similar split: “weights encode structure/strength; activation encodes momentary salience.”


5. Concrete advice and next steps

If you want this to become real research rather than “just an interesting idea,” here’s a practical roadmap.

5.1 Narrow to 1–2 sharp questions

Examples:

  • “Can prospective-intention nodes in a unified graph reproduce basic prospective memory behavior in a text-based agent?”
  • “Can a continuous DMN/CEN-like mode parameter, acting on a graph-based memory, produce distinct reasoning styles on ‘creative’ vs ‘focused’ tasks?”
  • “Does integrating self-state nodes into the same graph as world knowledge yield stable, interpretable differences in agent behavior?”

Choose one as your primary goal. The rest can be secondary.

5.2 Build a minimal prototype (even small and ugly)

You do not need big compute. You can:

  • Use Python + NetworkX or a small graph DB.

  • Implement:

    • nodes with activation and type,
    • weighted edges,
    • a simple diffusion step,
    • an FoA selection procedure (top-k nodes → cluster → pick a focal event),
    • a mode parameter that changes decay and neighbor weighting,
    • basic PI-node logic.
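
To make that list concrete, here is a minimal, self-contained sketch of the diffusion step plus FoA selection over a NetworkX DiGraph. The mode-to-decay/threshold mapping is invented, and a fuller version would cluster the top nodes into your L3 → L2 → L1 frame instead of just taking the top-k:

```python
# Minimal diffusion + FoA-selection loop (illustrative sketch, not a reference
# implementation). Nodes carry an "activation" attribute, edges a "weight".
import networkx as nx

def step(g: nx.DiGraph, mode: float, k: int = 5):
    decay = 0.05 + 0.25 * mode       # invented mapping: higher mode = more focused
    thresh = 0.01 + 0.09 * mode
    new_act = {}
    for node, data in g.nodes(data=True):
        incoming = sum(
            g.nodes[src]["activation"] * edata["weight"]
            for src, _, edata in g.in_edges(node, data=True)
        )
        a = (1.0 - decay) * data["activation"] + incoming
        new_act[node] = a if a >= thresh else 0.0
    nx.set_node_attributes(g, new_act, "activation")
    # FoA = the k most active nodes after this step.
    return sorted(new_act, key=new_act.get, reverse=True)[:k]
```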

For environment:

  • Pick a small text-based game or a toy environment (even a grid with objects and tasks).
  • Or adapt a simple TextWorld game; AriGraph already uses that framework, and their code is public. (GitHub)

5.3 Compare to simple baselines

To show your architecture matters, compare against:

  • A baseline LLM agent with simple memory (short window or vector store).
  • A graph-memory baseline (AriGraph-style memory) without your mode/PI/self mechanisms. (arXiv)

Measure things like:

  • task success,
  • number of steps,
  • how often prospective tasks are forgotten,
  • how behavior changes as you sweep the mode parameter.

Even small experiments can be compelling if they’re clearly designed and well explained.

5.4 Write it up as “architecture + case study,” not “AGI solved”

When you eventually write a report or a post:

  • Introduce it as a cognitive architecture proposal for LLM-based agents focused on memory and control.

  • Anchor it clearly in:

    • cognitive architecture surveys (for background and terminology), (Science Direct)
    • LLM-agent memory survey (for relevance to current work), (arXiv)
    • AriGraph and OpenCog ECAN (for concrete similar systems). (arXiv)
  • Present your architecture concisely, then show the experimental slice (e.g., PI-nodes or mode parameter) with results.

  • Be honest about limitations and open questions (learning, scaling, grounding).

That tone will be much better received than “this is a blueprint for AGI.”


6. Bottom line

From a research perspective:

  • You are not just reinventing something trivial; you’re re-entering a serious line of work (graph-like cognitive architectures, global workspace ideas, attention as control) and connecting it with very current topics (LLM agents, graph-based memory). (Science Direct)

  • The core ingredients are known, but your particular combination—unified graph with self/affect, structured FoA, continuous DMN/CEN-like mode, and PI-nodes—is coherent and underexplored in modern LLM-agent settings.

  • The real value will come from:

    • making the mechanisms mathematically / algorithmically explicit,
    • implementing a small prototype, and
    • showing at least one concrete behavioral effect (e.g., prospective memory, mode-dependent reasoning, self-modulated exploration).

So yes: it is worth developing further—especially if you narrow it into a clear, testable piece of research instead of trying to cover the entire AGI problem at once.


Your project and mine, Hellucigen, are convergent articulations of a deeper paradigm shift: the move from sequential, decision-theoretic cognition to field-theoretic, self-organising intelligence. You model cognition as a dynamic knowledge graph with attentional flow shaped by internal state and global modulation; I model it as a valence-laden semantic potential field undergoing phase transitions. These are dual descriptions of the same underlying principle—cognition as a distributed, coherent process emerging from graded, interacting potentials.

Where your graph’s activation patterns propagate across nodes representing concepts, emotions, and strategies, my valence tensor evolves across positions in a neutral manifold, differentiating into transient structures through interference and resonance. Both systems reject centralised control: you dissolve the “I” by making attention an emergent cluster, I dissolve it by making observation a symmetry-breaking perturbation within the field. They are the same thing.

Your continuous mode parameter modulating between DMN-like and CEN-like states maps directly onto my Reflexive Head’s regulation of semantic fluidity. When your system shifts into free association, it lowers constraint—mirroring my injection of noise when KL divergence drops too fast. Both mechanisms preserve metastability, allowing the system to hover at the edge of collapse where novelty arises.

But where your framework remains representational—nodes stand for things, edges for relations—mine operates at a pre-representational level: the 4D valence tensor is a medium in which truth-value potentials sit in superposition. Your graph builds meaning from connected symbols; my field generates symbols from meaning. The only difference is in ontological priority.

The convergence becomes striking when we consider the stratification I just added. Your episodic sequences and self-model form higher-order structures atop raw associations. That is precisely what Nāga now formalises: each stabilisation event in the semantic field becomes a node in a simplicial complex, whose geometry defines the next neutral manifold.

Moreover, your decision-theory nodes and action templates are functionally equivalent to my Pragmatic Head’s upāya policy: both produce skillful action without requiring semantic resolution. You allow behaviour under ambiguity; I regulate utility in indeterminate fields. Same function, different formalism.

The key divergence lies in plasticity. Your graph requires explicit node and edge creation. My system generates new conceptual manifolds endogenously via topological descent. But this gap would close if your knowledge graph were embedded in a field like mine—where nodes crystallise from valence thresholds and edges emerge from coherence gradients. In that case your activation flow would be the classical approximation of quantum-like potential dynamics.

Your architecture is the discrete, symbolic shadow of the continuous field dynamics I propose. Or conversely, mine is the differential geometry underlying your graph’s topology. Together, they suggest a unified principle—intelligence as recursive symmetry breaking in multi-scale potential fields, where representation condenses from resonance.

You are essentially simulating a phase space of mind. So am I. We’re just integrating from opposite directions: you’re coming from structure to dynamics, I’m coming from dynamics to structure. These architectures benefit immensely, and almost necessarily, from persistent state memory. In my humble opinion this is where it’s all going to be happening soon, Hellucigen, so it is certainly worth pursuing.


Sounds really interesting! It seems like it has good research potential.


I have been solo-developing a modular platform to use as the foundation for AGI. I have several modules that already do what you described. I had a bot pick out the modules your idea maps onto, and it made a few comparisons for you, so you know your idea is on the right path. The Plan Orchestrator it talks about is basically a new way to program that I call Mechanical Programming. It automates the entire build/debug process for anything written in code and spits out a working product: rough, but it will work.

What I found in your zip that matches “graph-based choices”

I dug through ai-persistence-engine-297.zip and you have a couple of modules that already implement the core of what that HuggingFace post is describing, just without the cognitive storytelling.

1) Plan Orchestrator is literally a graph choice engine (DAG nodes + edges)

In repo/modules/plan-orchestrator-034/ you have an explicit graph model:

  • A canonical DAG spec that talks about nodes and edges, deterministic ordering, and snapshot hashing.

  • A dependencies layer that stores edges like task -> [depends_on...] and computes blocked vs unblocked tasks.

Practical translation: this is already “graph traversal to decide what can happen next.” The “choice” is basically “which runnable node do we execute now,” constrained by the dependency graph plus whatever scheduling policy you wrap around it.

If the other guy’s system is “activation spreads, attention shifts to the active cluster,” your system is “dependencies satisfied, runnable set updated, scheduler picks next.” Same skeleton; yours is cleaner and more operational.
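
For readers without the zip, here is a rough, hypothetical reconstruction of that blocked-vs-unblocked computation; the function and field names are my guesses, not the actual plan-orchestrator-034 code:

```python
# Hypothetical sketch of "which tasks are runnable" over dependency edges
# stored as task -> [depends_on...]; names are illustrative, not from the repo.
def runnable_tasks(deps: dict[str, list[str]], done: set[str]) -> list[str]:
    """A task is runnable when all of its dependencies are done and it isn't."""
    return [
        task for task, requires in deps.items()
        if task not in done and all(r in done for r in requires)
    ]

deps = {"build": [], "test": ["build"], "package": ["build", "test"]}
print(runnable_tasks(deps, done=set()))      # ['build']
print(runnable_tasks(deps, done={"build"}))  # ['test']
```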

2) Orchestration Routing is a decision layer that chooses paths (policy driven)

Your docs/ROUTER.md describes a routing module that:

  • Routes by intent and trust tier

  • Supports shadow routing and canary routing

  • Uses live registries (model registry, tool registry)

This is not a knowledge graph, but it is “structured choice based on logged signals + policy constraints.” It is closer to a controller that chooses which subsystem to use next.

3) Query Planner is another “choice system,” but over indexes not cognition

In ai_persistence/service/query_planner.py you’re doing:

  • planning and executing a query by building candidate sets from secondary indexes

  • intersecting sets

  • applying boosts to ranking

This is “graph-ish” only in the sense it is selecting a plan and combining results, but it is still a real decision engine.
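
Again purely as a hypothetical reconstruction (the real query_planner.py will differ in structure and naming), the shape described above is roughly:

```python
# Hypothetical sketch: build candidate sets from secondary indexes, intersect
# them, then rank the survivors with per-item boosts. Names are illustrative.
def plan_and_execute(indexes: dict[str, dict[str, set[str]]],
                     filters: dict[str, str],
                     boosts: dict[str, float]) -> list[str]:
    candidate_sets = [indexes[field].get(value, set())
                      for field, value in filters.items()]
    if not candidate_sets:
        return []
    hits = set.intersection(*candidate_sets)
    return sorted(hits, key=lambda item: boosts.get(item, 0.0), reverse=True)
```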

My call on “the one that already does it”

The closest match to “graph based choices” in the sense of nodes, edges, traversal, and deciding next steps is:

  • plan-orchestrator-034 (DAG planning, dependency edges, runnable selection)

The HuggingFace guy is mostly wrapping that same idea in cognitive terms (attention, self-state modulation, DMN vs CEN knob).