Research Papers in Vector Search & AI Systems

Field of application: Vector Databases, Spectral Methods, and Agentic AI

A curated collection of foundational and emerging papers that inform the design and implementation of arrowspace, optical compression, and next-generation retrieval systems.


Graph Embeddings

Ontology Embedding: A Survey of Methods, Applications and Resources

Authors: Jiaoyan Chen, Olga Mashkova, Ernesto Jiménez-Ruiz, Ian Horrocks, Diego M. López, Przemyslaw Andrzej Nowak, et al.
arXiv: 2406.10964 (2024), accepted to IEEE TKDE

Comprehensive survey of ontology embedding covering formal definitions, method categories, resources, and applications in ontology engineering, machine learning augmentation, and the life sciences; it consolidates work published in both AI and bioinformatics venues. This connects to arrowspace by framing how logical structure can be embedded into vector spaces to complement spectral and graph-based similarity in hybrid search.

Key contributions: Taxonomy of ontology embedding approaches, resource catalog, application landscape, and open challenges and future directions for integrating symbolic semantics with embeddings.
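Geometric (region-based) models are one common way to embed logical structure. For example, EL Embeddings (ELEm) represent each class as an n-ball and treat a subsumption axiom C ⊑ D as ball containment. The sketch below is a minimal, hypothetical rendering of that containment penalty; the function name, the `margin` parameter, and the toy coordinates are illustrative assumptions, not code from the survey or from arrowspace.

```python
import numpy as np


def subsumption_loss(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Penalty for the axiom C ⊑ D when classes are embedded as n-balls.

    The axiom holds (loss 0) when ball C lies inside ball D, i.e. when
    ||center_c - center_d|| + radius_c <= radius_d + margin.
    """
    dist = float(np.linalg.norm(np.asarray(center_c) - np.asarray(center_d)))
    return max(0.0, dist + radius_c - radius_d - margin)


# Hypothetical 2-D embedding: "Dog" should sit inside "Animal", "Car" should not.
animal = (np.array([0.0, 0.0]), 2.0)
dog = (np.array([0.5, 0.5]), 0.5)
car = (np.array([3.0, 0.0]), 0.5)

print(subsumption_loss(*dog, *animal))  # Dog ⊑ Animal satisfied: 0.0
print(subsumption_loss(*car, *animal))  # Car ⊑ Animal violated: 1.5
```

Training would minimize this penalty over all asserted axioms, so the learned regions come to respect the ontology's class hierarchy.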

The RDF2vec Family of Knowledge Graph Embedding Methods

Authors: Petar Ristoski, Simone Paolo Ponzetto, Heiko Paulheim (and collaborators across variants)
Journal: Semantic Web – Interoperability, Usability, Applicability (SWJ)

In-depth study of RDF2vec variants that generate embeddings from random walks over RDF graphs, with a comprehensive evaluation revealing representational strengths and weaknesses relative to other KG embedding methods. For arrowspace, this informs walk-based feature extraction that can be blended with Laplacian-based distances for structure- and path-aware retrieval.

Key contributions: Unified overview of RDF2vec techniques, large-scale comparative evaluation, and practical guidance for selecting variants by task characteristics.
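The walk-extraction step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: walks follow outgoing edges only, each hop is sampled uniformly, and `build_adjacency` and `random_walks` are hypothetical helpers. RDF2vec variants differ precisely in how walks are chosen, and a full pipeline would then train word2vec on the resulting token sequences.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Index outgoing (predicate, object) edges for each subject."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(adj, start, depth, n_walks, seed=0):
    """Generate RDF2vec-style walks: entity, predicate, entity, ...

    Each walk takes at most `depth` hops and stops early at entities
    without outgoing edges. Each walk later serves as one 'sentence'.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adj.get(node)  # .get avoids inserting empty lists
            if not edges:
                break
            pred, node = rng.choice(edges)
            walk.extend([pred, node])
        walks.append(walk)
    return walks


triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "EU"),
    ("Berlin", "locatedIn", "Europe"),
    ("EU", "hasHQ", "Brussels"),
]
adj = build_adjacency(triples)
for w in random_walks(adj, "Berlin", depth=3, n_walks=3, seed=42):
    print(" -> ".join(w))
```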


Graph Signal Processing & Spectral Methods

The Emerging Field of Signal Processing on Graphs

Authors: David I Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, Pierre Vandergheynst
arXiv: 1211.0053 (2012)

Foundational work extending classical signal processing operations to graph-structured data. This paper establishes the theoretical framework for spectral graph analysis used in arrowspace’s energy-distance metrics and Laplacian-based search.

Key contributions: Graph Fourier Transform, spectral filtering, and multi-resolution analysis on irregular graph domains.
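A minimal NumPy sketch of the Graph Fourier Transform and spectral filtering on a 5-node path graph. The ideal low-pass filter and the toy signal are illustrative choices rather than examples from the paper, which treats general spectral filters.

```python
import numpy as np

# Path graph on 5 nodes: adjacency A, degree matrix D, combinatorial Laplacian L = D - A.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors of L form the graph Fourier basis; eigenvalues act as frequencies.
eigvals, U = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal columns

x = np.array([1.0, 2.0, 3.0, 2.0, 5.0])  # a signal with one value per node
x_hat = U.T @ x  # forward graph Fourier transform

# Ideal low-pass filter: keep only the two lowest graph frequencies.
h = (np.arange(len(eigvals)) < 2).astype(float)
x_smooth = U @ (h * x_hat)  # inverse GFT of the filtered spectrum


def dirichlet_energy(signal):
    """x^T L x = sum over edges of (x_i - x_j)^2, a measure of roughness."""
    return float(signal @ L @ signal)


print(np.allclose(U @ x_hat, x))  # inverse GFT reconstructs the signal
print(dirichlet_energy(x), dirichlet_energy(x_smooth))
```

Because every eigenvalue of L is non-negative, discarding high-frequency components can only lower the Dirichlet energy, which is why low-pass filtering acts as smoothing on the graph.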

What Is Positive Geometry?

Authors: Kristian Ranestad, Bernd Sturmfels, Simon Telen
arXiv: 2502.12815 (2025)

Foundational introduction to positive geometry—an interdisciplinary field bridging particle physics, cosmology, and algebraic geometry. Positive geometries are tuples \((X, X_{\geq 0}, \Omega(X_{\geq 0}))\) consisting of a complex algebraic variety, a semi-algebraic positive region, and a canonical differential form satisfying recursive axioms. The framework represents physical observables (scattering amplitudes, cosmological correlators) as geometric structures like amplituhedra and cosmological polytopes.

Relevance to arrowspace: For a polytope \(P \subset \mathbb{R}^d\), the canonical form is recovered from a volume, \(\Omega(P) = \operatorname{vol}\big((P-x)^{\circ}\big)\, dx_1 \wedge \cdots \wedge dx_d\), where \(x\) lies in the interior of \(P\) and \((P-x)^{\circ}\) is the polar dual of the translated polytope (up to a normalizing constant that depends on the volume convention). This construction parallels arrowspace’s energy map pipeline. Just as positive geometry “linearizes” high-dimensional semi-algebraic varieties into canonical differential forms, arrowspace’s energymaps.rs constructs a graph Laplacian over the data manifold and uses it to project each item onto a 1-dimensional taumode spectrum (Rayleigh quotients). Both frameworks encode complex geometric structures (amplituhedra and energy graphs, respectively) as scalar fields that preserve topological invariants while enabling efficient computation.
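A one-dimensional check makes the polar-dual volume formula concrete: for a segment, the coefficient of the canonical form and the length of the polar dual coincide exactly. This worked example is illustrative and not reproduced from the paper; in higher dimensions a dimension-dependent constant may appear, depending on how volume is normalized.

```latex
% Segment P = [a, b] in R^1, with a < x < b.
% Canonical form: simple poles at the two endpoints, residues +1 and -1.
\Omega([a,b]) = \frac{dx}{x-a} - \frac{dx}{x-b} = \frac{b-a}{(x-a)(b-x)}\,dx

% Polar dual of the translated segment P - x = [a-x, b-x], where a-x < 0 < b-x:
(P-x)^{\circ} = \{\, y \in \mathbb{R} : y z \le 1 \ \text{for all } z \in [a-x,\, b-x] \,\}
             = \left[ \tfrac{1}{a-x},\ \tfrac{1}{b-x} \right]

% Its length reproduces the coefficient of dx:
\operatorname{vol}\big((P-x)^{\circ}\big) = \frac{1}{b-x} - \frac{1}{a-x}
  = \frac{1}{b-x} + \frac{1}{x-a} = \frac{b-a}{(x-a)(b-x)}
```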

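Key contributions: Definition of positive geometries through canonical forms governed by recursive residue axioms, illustrated with examples ranging from polytopes to amplituhedra and cosmological polytopes.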

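The Rayleigh-quotient projection mentioned for energymaps.rs can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the Gaussian-kernel graph over feature columns, `sigma`, and the helper names `feature_laplacian` and `rayleigh_quotient` are hypothetical and are not arrowspace's Rust API, which may build its graph differently.

```python
import numpy as np


def feature_laplacian(X, sigma=1.0):
    """Hypothetical: Gaussian-kernel graph over the feature columns of X,
    returned as the combinatorial Laplacian L = D - W (shape d x d)."""
    F = X.T  # one row per feature
    sq_dists = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W


def rayleigh_quotient(L, x):
    """Scalar energy of item vector x on the feature graph: x^T L x / x^T x."""
    return float(x @ L @ x) / float(x @ x)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # 50 items, 8 features
L = feature_laplacian(X, sigma=3.0)
lambdas = np.array([rayleigh_quotient(L, x) for x in X])  # one scalar per item

eig = np.linalg.eigvalsh(L)
print(lambdas.shape)  # (50,)
print(bool(np.all(lambdas >= eig[0] - 1e-9) and np.all(lambdas <= eig[-1] + 1e-9)))
```

Each item collapses to a single scalar that always lies between the smallest and largest Laplacian eigenvalues, which is what makes such a 1-dimensional spectrum cheap to index and compare.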


Agentic Systems & Planning

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Authors: ZeroRepo Team
arXiv: 2509.16198 (2025)

Introduces the Repository Planning Graph (RPG), a graph-driven framework for generating complete software repositories. Relevant to formal agent protocols and structured generation workflows.

Key contributions: Persistent graph representations unifying proposal- and implementation-level planning for autonomous code generation.
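The core idea of planning over an explicit graph, then generating in dependency order, can be sketched with the standard library. This is a heavily simplified, hypothetical illustration: the paper's graph carries richer, multi-level information, whereas here the planning graph is reduced to file-level dependency edges, and the file names and `generation_order` helper are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical planning graph: each planned file maps to the set of files
# it depends on (its predecessors in the graph).
plan = {
    "data/loader.py": set(),
    "models/linear.py": {"data/loader.py"},
    "train.py": {"data/loader.py", "models/linear.py"},
    "eval.py": {"models/linear.py"},
    "cli.py": {"train.py", "eval.py"},
}


def generation_order(graph):
    """Order nodes so every dependency is generated before its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = generation_order(plan)
print(order)
```

Keeping the plan as a persistent graph means a generator can revisit any node (for example, after a failing test) without losing the global structure.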


Retrieval Fundamentals & Limitations

On the Theoretical Limitations of Embedding-Based Retrieval

Authors: Orion Weller et al. (Google DeepMind)
arXiv: 2508.21038 (2025)

Theoretical analysis proving fundamental limitations of single-vector embeddings for complex retrieval tasks. Introduces the LIMIT benchmark to expose failure modes in cosine-similarity-based retrieval.

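Key contributions: Sign-rank bounds showing that the embedding dimension limits how many distinct top-k document subsets a single-vector retriever can return. For arrowspace, these bounds motivate energy-distance and spectral approaches beyond cosine similarity.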
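The structural reason behind these bounds can be seen directly. With d-dimensional single-vector embeddings, the full query-document score matrix is a product of two thin matrices and therefore has rank at most d, however many queries and documents there are. The paper relates which relevance patterns are achievable to the sign-rank of the relevance matrix; the sizes and random data below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_docs, d = 40, 60, 4

# Unit-normalized embeddings, so S holds cosine similarities.
Q = rng.normal(size=(n_queries, d))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D = rng.normal(size=(n_docs, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)

S = Q @ D.T  # every query-document score at once, shape (40, 60)

print(np.linalg.matrix_rank(S))  # at most d

# Top-2 retrieval for each query is read off S; which top-k sets are
# reachable at all is constrained by this rank-d structure.
top2 = np.argsort(-S, axis=1)[:, :2]
```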



Document Reranking

jina-reranker-v3: Last but Not Late Interaction for Document Reranking

Authors: Feng Wang, Yuqing Li, Han Xiao
arXiv: 2509.25085v2 (2025)

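State-of-the-art 0.6B-parameter multilingual document reranker achieving 61.94 nDCG@10 on BEIR. Demonstrates a lightweight alternative to generative listwise reranking.

Key contributions: “Last but not late” interaction. Unlike ColBERT-style late interaction, which encodes query and documents separately, the query and a set of candidate documents are processed together in one context window with causal self-attention, and each document’s contextual embedding is taken from its last token, enabling cross-document interaction before scoring.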
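nDCG@10, the metric behind the BEIR figure above, rewards placing highly relevant documents near the top of the ranking. A minimal sketch with linear gains and a log2 rank discount follows. One simplification to note: evaluation toolkits used with BEIR compute the ideal DCG from all judged documents for a query, whereas this version derives it from the returned list only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the system ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance of documents, in the order a reranker returned them.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], k=10), 4))
```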


Context Compression & Recursive Models

Recursive Language Models

Authors: Alex Zhang, Omar Khattab (MIT CSAIL)
Paper: alexzhang13.github.io/blog/2025/rlm

Proposes Recursive Language Models (RLMs), where models recursively call themselves to decompose and interact with unbounded context. RLM with GPT-4-mini outperforms full GPT-4 by 87% on long-context benchmarks.

Key contributions: Divide-and-conquer strategy for handling 10M+ token contexts without performance degradation, mitigating “context rot.”
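The divide-and-conquer principle can be illustrated with a toy recursion in which a stand-in "model" only accepts short inputs. This is a deliberate simplification: in the blog post the model itself decides how to inspect and split the context (via a REPL environment), while here the split is a fixed halving. `MAX_TOKENS`, `stub_model_count`, and `recursive_count` are hypothetical names, and the stub answers exactly, which real model calls do not.

```python
MAX_TOKENS = 8  # hypothetical context budget of the stub model


def stub_model_count(tokens, word):
    """Hypothetical model call: exact answer, but only within its budget."""
    if len(tokens) > MAX_TOKENS:
        raise ValueError("context too long for a single call")
    return tokens.count(word)


def recursive_count(tokens, word):
    """Answer directly if the context fits; otherwise split it in half,
    recurse on each half, and combine the partial answers."""
    if len(tokens) <= MAX_TOKENS:
        return stub_model_count(tokens, word)
    mid = len(tokens) // 2
    return recursive_count(tokens[:mid], word) + recursive_count(tokens[mid:], word)


context = ("alpha beta gamma alpha " * 1000).split()  # 4000 tokens
print(recursive_count(context, "alpha"))  # 2000
```

No single call ever sees more than `MAX_TOKENS` tokens, yet the combined answer covers the whole context; aggregation only works this simply because counting decomposes additively.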


Quantum Computing & Hybrid Systems

Mind the Gaps: The Fraught Road to Quantum Advantage

Authors: Jens Eisert, John Preskill
arXiv: 2510.19928 (2025)

Perspectives on the transition from noisy intermediate-scale quantum (NISQ) devices to fault-tolerant application-scale quantum computing. Identifies four key hurdles including error mitigation, scalable fault tolerance, and verifiable algorithms.

Relevance: Explores hybrid classical-quantum systems for optimization and simulation tasks relevant to graph algorithms and energy minimization.


Computational Methods in Software Engineering

Vulnerability2Vec: A Graph-Embedding Approach for Enhancing Vulnerability Classification

Source: Computer Modeling in Engineering & Sciences (CMES), Tech Science Press, 2025

Vulnerability2Vec converts the textual descriptions of Common Vulnerabilities and Exposures (CVE) entries into semantic graphs, which are then embedded to improve vulnerability classification.

Keywords: security vulnerability; graph representation; graph embedding; deep learning; node classification.
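To make the text-to-graph step tangible, here is a deliberately simple stand-in: a weighted word co-occurrence graph built from a CVE-style description. This is an assumption for illustration only. The paper's semantic-graph construction and its embedding model are not reproduced, and the stopword list, window size, and `cooccurrence_graph` helper are hypothetical.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "in", "of", "to", "via", "allows", "and"}


def cooccurrence_graph(text, window=2):
    """Build an undirected weighted graph: nodes are content words, and an edge
    (u, v) counts how often u and v appear within `window` words of each other."""
    words = [w.strip(".,").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    edges = Counter()
    for i, u in enumerate(words):
        for v in words[i + 1 : i + 1 + window]:
            if u != v:
                edges[tuple(sorted((u, v)))] += 1
    return edges


desc = ("Buffer overflow in the parser allows remote attackers to execute "
        "arbitrary code via a crafted packet.")
for (u, v), w in sorted(cooccurrence_graph(desc).items()):
    print(u, v, w)
```

A graph-embedding method would then map each node (or the whole graph) to a vector for downstream classification.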



Implementation Resources

For practical implementations informed by these papers:


Interested in research collaboration or sponsorship? Check the Contact page to discuss how these methods can accelerate your data infrastructure.