Safer LLMs require open search - Building the AI Memory Layer
TL;DR: arrowspace reframes AI retrieval as an open, topology-aware search problem so that systems can keep the benefits of intuitive pattern-finding while structurally detecting and constraining subtle hallucinations.
arrowspace v0.24.3 is out with improvements.
You can find arrowspace in:
- the Rust repository: `cargo add arrowspace`
- the Python repository: `pip install arrowspace`
I am building a datastore based on arrowspace; follow me for news.
Intro
arrowspace was designed to go beyond treating embedding spaces as mere geometric spaces: every dataset also hides a graph with energy flowing over the manifold of features, and focusing only on the connections between its elements gives a partial perspective. This same shift, from closed, purely geometric similarity to open, energy-informed topology, is exactly what is needed to tackle subtle hallucinations without giving up on useful intuitions.
When retrieval becomes a hallucination engine
A relevant fraction of LLM hallucinations does not come from bad generation steps but from the slow accumulation of errors in retrieval and semantic caching. Recent work shows that as you incrementally inject context containing relevant but slightly off information, the LLM's internal representations drift in a consistent direction until the model "locks in" to an incorrect answer that still looks well supported by the surrounding text. Other studies on multi-turn systems call this context drift: a gradual erosion of the original intent where summaries and follow-up answers stay fluent while silently drifting away from the user's goal.[1][2][3]
Semantic caches and naive RAG pipelines amplify this effect. If the cache is tuned only on geometric similarity, even small threshold mistakes can produce extremely high false-positive rates, to the point where almost every cache hit is wrong. Systematic analyses of RAG failures show that retrieval errors and domain-mismatched hits are a primary driver of hallucinations, especially when the model confidently stitches together partially wrong snippets into a coherent narrative.[4][5][6]
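To make the failure mode concrete, here is a minimal sketch (hypothetical data and threshold, not any particular cache implementation) of a geometry-only semantic cache returning a confident false positive:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class GeometricCache:
    """Naive semantic cache: reuse a cached answer whenever a query
    embedding falls within a cosine threshold of a cached query."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, answer) pairs

    def put(self, emb, answer):
        self.entries.append((emb, answer))

    def get(self, emb):
        for cached_emb, answer in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return answer  # geometric hit, even across domains
        return None

# Two domain-different queries can easily land within the threshold,
# so the second silently inherits the first one's (wrong) answer.
cache = GeometricCache(threshold=0.9)
cache.put(np.array([0.9, 0.1, 0.4]), "answer for domain A")
print(cache.get(np.array([0.8, 0.2, 0.4])))  # hit: "answer for domain A"
```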
Intuitions, subtle hallucinations, and safety
My 2022 paper "Cybernetics Interfaces and Networks: intuitions as a toolbox" reflects on intuitions in tool usage: intuitions are pattern proposals that are valuable exactly because they are not yet commitments. In an LLM context, the danger appears when those intuitive jumps are silently promoted to "facts" by the surrounding retrieval context and ranking logic, instead of being flagged and interrogated as hypotheses. How can these patterns be skimmed off when every search focuses on the geometric relations of documents and concepts?
The common consensus is that the safety issue is not the easily debunked mistake but the subtle, plausible error that passes through ranking, caching, and user interfaces without any visible sign that the system has left the reliable part of its knowledge graph.[1]
Why geometry-only search is brittle
Today's vector databases mostly operate as geometric engines: each embedding is a point, similarity is a distance (cosine, dot product, L2), and the system returns the nearest neighbors. This setup has no explicit notion of manifold structure, communities, or how information flows across the dataset; it only knows that points happen to be close in a high-dimensional space, which is precisely where semantic caches accumulate false positives under domain shift and noisy inputs. The selection between intuitive and hallucinatory pathways is stochastic, hence the need for redundancy in retrieval to keep drift from setting in.
Pure geometry also has no language to talk about the drift or coherence of a retrieved set. Evaluation metrics like recall@k treat each item independently and ignore whether the top-k results form a tight, well-connected region in the corpus graph or a scattered, unstable subgraph that signals emerging hallucination risk. In other words, the current search layer is blind to exactly the structural signals that could distinguish intuitions from subtle hallucinations.
arrowspace as "open search"
arrowspace starts from the opposite assumption: any vector dataset is a graph. Vectors define local edges (via similarity), features induce topology over those edges, and from that arrowspace builds a graph Laplacian whose eigenvectors and eigenvalues encode both global topology and local geometry. An energy dispersion network is built on the feature graph to simulate semantic coherence (calling geometric-only search based on datapoint distances "semantic" is indeed a subtle hallucination in my opinion, as there is no meaning involved, but that is a philosophical, non-actionable point).
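As a minimal sketch of this construction, not arrowspace's actual API, the following builds a cosine kNN graph over an embedding matrix and derives its Laplacian with numpy/scipy; the function name and the value of k are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian

def build_knn_laplacian(X: np.ndarray, k: int = 5):
    """Build a symmetric cosine kNN graph over the rows of X and
    return its graph Laplacian L = D - W plus the weight matrix W."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                     # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)      # exclude self-edges
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argpartition(S[i], -k)[-k:]  # top-k neighbours of i
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)            # symmetrize
    W = np.clip(W, 0.0, None)         # keep weights non-negative
    return laplacian(csr_matrix(W)), W

# Small eigenvalues/eigenvectors of L correspond to smooth,
# community-aligned signals; large ones to rough, noisy signals.
```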
From the Laplacian, arrowspace derives eigenmaps and energymaps: indices that capture how smooth or rough signals are over the graph and how energy disperses across communities. The taumode scorer then blends classic semantic similarity with Rayleigh-quotient energy and dispersion statistics into bounded, comparable scores, so that datapoints are "close in the manifold", not merely close in the metric. The embedding space becomes more representative and closer to actual semantic data (metadata, the data that represents relations among the properties of datapoints, as in "Semantic Web" or "Web 2.0" data).
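The Rayleigh quotient x^T L x / x^T x is the standard energy measure here. A hedged sketch of blending it with cosine similarity follows; the exact taumode weighting in arrowspace may differ, and `alpha` and `signal` are illustrative stand-ins:

```python
import numpy as np

def rayleigh_quotient(L, x: np.ndarray) -> float:
    """Graph energy of signal x: x^T L x / x^T x.
    Low = smooth over the graph, high = rough/incoherent."""
    return float(x @ (L @ x)) / float(x @ x)

def blended_score(query, doc, L, signal, alpha: float = 0.7):
    """Illustrative taumode-style blend: geometric similarity
    attenuated by the spectral roughness of `signal`, a per-node
    signal over the graph associated with the candidate doc."""
    cos = float(query @ doc) / (np.linalg.norm(query) * np.linalg.norm(doc))
    energy = rayleigh_quotient(L, signal)
    # Bounded combination: a smooth (low-energy) signal keeps its
    # geometric score; a rough one is pushed down.
    return alpha * cos + (1 - alpha) / (1.0 + energy)
```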
This combination is what I mean by open search here:
- The index is not just a black-box distance function; it exposes topological coordinates, energy levels, and community structure that can be inspected, logged, and audited.
- Search is not restricted to a local metric ball; you can reason about trajectories, flows, and motifs in the result subgraph, and compare snapshots over time to detect drift.
- The outcome is an open space in which to search for intuitions, not a closed space prone to drift and self-replication. Energy maps based on topology build the spotting of faulty pathways directly into the search.
Spotting subtle hallucinations in the topology
Once retrieval lives in a topological, energy-aware space, subtle hallucinations become detectable as structural anomalies (pathways with excess smoothness or roughness) rather than just content mistakes. If a query has historically pulled a cluster with low conductance, high modularity, and stable spectral signatures, a new batch of results that is geometrically similar but spectrally incoherent can be flagged and removed from the context.
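Conductance, the fraction of edge weight leaving a node set, is one such cheap structural coherence check. A minimal sketch over the dense weight matrix W from the earlier Laplacian sketch (the guardrail baseline is an assumption, not an arrowspace feature):

```python
import numpy as np

def conductance(W: np.ndarray, retrieved) -> float:
    """Conductance of the retrieved node set S: edge weight leaving S
    divided by the smaller side's total volume.
    Low conductance = tight, well-connected cluster."""
    S = np.zeros(W.shape[0], dtype=bool)
    S[list(retrieved)] = True
    cut = W[S][:, ~S].sum()             # weight of edges leaving S
    vol = min(W[S].sum(), W[~S].sum())  # volume of the smaller side
    return float(cut / vol) if vol > 0 else 1.0

# Guardrail idea: flag a retrieval set whose conductance is far above
# the historical baseline observed for similar queries.
```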
arrowspace's evaluation work already leans into this idea. The proposed MRR-Top0 metric extends classic MRR by weighting reciprocal ranks with topology factors derived from personalized PageRank, conductance, and modularity, so a ranking that looks fine geometrically but sits in a noisy, poorly connected subgraph scores worse. The same machinery can run online as a guardrail: if the current context's retrieval set shows weak topological communities and high energy dispersion where past queries were smooth, the system can downgrade trust, widen the search, or explicitly mark the answer as "speculative". This also allows pre-computing search pathways that follow, or deliberately avoid, the speculative assumptions; these pathways can themselves be represented as ArrowSpaces.
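A sketch of what such a topology-weighted MRR could look like; the exact MRR-Top0 definition may differ, and `topo_factors` is a stand-in for the PageRank/conductance/modularity blend:

```python
def topology_weighted_mrr(first_hit_ranks, topo_factors):
    """Mean reciprocal rank where each query's reciprocal rank is
    scaled by a [0, 1] topology factor for its result subgraph.

    first_hit_ranks: 1-based rank of the first relevant hit per query
                     (None if nothing relevant was retrieved).
    topo_factors:    per-query subgraph quality in [0, 1], e.g. a blend
                     of personalized-PageRank mass, (1 - conductance),
                     and modularity.
    """
    scores = [
        (1.0 / r if r is not None else 0.0) * t
        for r, t in zip(first_hit_ranks, topo_factors)
    ]
    return sum(scores) / len(scores) if scores else 0.0

# A geometrically perfect hit (rank 1) in an incoherent subgraph
# (factor 0.2) scores like a rank-5 hit in a coherent one.
print(topology_weighted_mrr([1, 2], [0.2, 1.0]))  # 0.35
```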
Keeping the intuitions, losing the risk
Intuitions are exactly the kind of pattern jumps that live off the main geometric manifold: weak signals, long-tail neighborhoods, surprising but potentially valuable connections. arrowspace's energy-informed search is designed to surface these by following dispersion patterns and spectral signatures, not just local distances, so models can discover associations that traditional cosine search never sees. This leads naturally to extending the manifold via selected pathways.
Hypothetically, these off-manifold results can be treated as different "modes" of search. In practice, this means (see the sketch after this list):
- Maintaining two simultaneous views for a query: a high-confidence, low-energy, topologically coherent core, and an exploratory, higher-energy fringe where intuitions live.
- Using topology-aware scores (like MRR-Top0 and related subgraph quality indices) to gate what enters the "authoritative" context fed to the LLM, while still exposing intuitive candidates as marked hypotheses the model can reason about explicitly.
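A minimal sketch of that dual view, splitting results by their Rayleigh energies; the threshold value is illustrative, not an arrowspace default:

```python
def split_core_and_fringe(results, energies, core_max_energy=0.3):
    """Partition retrieved items into a low-energy authoritative core
    and a higher-energy exploratory fringe.

    results:  document ids, best-first.
    energies: per-document Rayleigh-quotient energies over the graph.
    """
    core, fringe = [], []
    for doc, energy in zip(results, energies):
        (core if energy <= core_max_energy else fringe).append(doc)
    return core, fringe

core, fringe = split_core_and_fringe(["d1", "d2", "d3"], [0.1, 0.8, 0.2])
# core   == ["d1", "d3"] -> grounded context fed to the LLM
# fringe == ["d2"]       -> surfaced as explicitly marked hypotheses
```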
Takeaways
Under this lens, "making LLMs good at patterns beyond geometry" and "spotting subtle hallucinations early" are just two sides of the same safety coin. The same arrowspace machinery that lets you follow energy dispersion into interesting, non-local neighborhoods is also what tells you when the system is leaving the safe manifold and when an apparently reasonable intuition should be treated as a suspect hallucination.
- Hallucinations often arise from accumulated retrieval and semantic-cache errors: geometrically close but structurally wrong documents slowly pollute context until fluent but unreliable answers emerge.
- Geometry-only vector search cannot see this drift because it ignores the dataset's manifold and motif-subgraph structure, so it has no way to score whether a retrieved set is coherent or topologically unstable.
- arrowspace adds a graph Laplacian and energy-informed spectral indexing (taumode) on top of embeddings, letting search reason about smoothness, dispersion, and subgraph quality, not just distance, so it can flag risky, off-manifold neighborhoods where subtle hallucinations live.
- This same machinery supports a dual mode of operation: a stable, low-energy core for factual grounding and a clearly marked, higher-energy fringe for intuitive exploration, preserving creativity while making its risks spottable.
References
1. https://arxiv.org/html/2510.07777v1
2. https://www.nature.com/articles/s41598-025-15203-5
3. https://www.tredence.com/blog/mitigating-hallucination-in-large-language-models
4. https://www.scalegen.ai/blog-v1.0/addressing-llm-hallucination-with-rag-innovative-solutions
5. https://arxiv.org/html/2507.18910v1