Showing 33 papers for 2026-05-20
This paper presents a knowledge-based approach to automated data quality assessment for big data, leveraging knowledge graph embeddings. It predicts missing edges between the input dataset's context representation and relevant quality rules and dimensions within a contextual knowledge graph of data characteristics and quality operations. The goal is to achieve context-aware, automated data quality evaluation, addressing limitations of prior solutions.
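Predicting missing edges with knowledge graph embeddings can be illustrated with a minimal sketch. The summary does not say which embedding model the paper uses, so this assumes a TransE-style score; the entity names, the relation name, and the untrained random vectors are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical entities: one dataset-context node and candidate quality rules/dimensions.
entities = ["dataset_ctx", "completeness_rule", "timeliness_rule", "accuracy_dim"]
relations = ["requires_check"]  # hypothetical relation name

# Random stand-ins for embeddings that would normally be learned from the graph.
E = {name: rng.normal(size=dim) for name in entities}
R = {name: rng.normal(size=dim) for name in relations}

def transe_score(head, rel, tail):
    """TransE plausibility of (head, rel, tail): closer to 0 means more plausible."""
    return -np.linalg.norm(E[head] + R[rel] - E[tail])

def rank_candidates(head, rel, candidates):
    """Order candidate tails from most to least plausible missing edge."""
    return sorted(candidates, key=lambda t: transe_score(head, rel, t), reverse=True)

# For illustration only: make one rule an exact translation of the context node,
# so it must rank first among the candidates.
E["completeness_rule"] = E["dataset_ctx"] + R["requires_check"]
ranking = rank_candidates("dataset_ctx", "requires_check", entities[1:])
print(ranking)
```

In a trained model the top-ranked tails would be the quality rules suggested for the dataset's context.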
The paper proposes a graph-driven real-time anti-money laundering monitoring framework for cross-industry integrated mobility-energy supply chain networks. It builds a cross-industry heterogeneous graph (CIHG) spanning EV rental platforms, energy suppliers, fintechs, etc., integrating industry semantics for real-time analysis. The framework uses graph analytics to detect suspicious money flows and enable cross-domain risk assessment.
Graph condensation for scalable GNNs faces a fundamental contradiction with gradient matching, which often requires training on the full dataset. This work argues for moving beyond full-dataset training and model-dependence, proposing a reset-based or alternative condensation paradigm. The goal is to enable scalable graph condensation without relying on full data or specific models.
HypergraphFormer introduces a novel and efficient approach to floor plan generation by learning hypergraph representations with a large language model (LLM). Trained via supervised fine-tuning to encode spatial relationships and connectivity in a hypergraph-based textual representation, it is evaluated on the RPLAN dataset and a held-out distribution dataset. The method demonstrates generalizability for editable floor plan generation.
Deep Neural Sheaf Diffusion analyzes the depth-related limitations of deep GNNs, where stacking layers leads to representation collapse and diminishing sensitivity. Although Neural Sheaf Diffusion offers theoretical guarantees, these do not translate well in practice as the disagreement signal vanishes with depth. The paper identifies the mechanism and proposes remedies to sustain expressive depth.
This work introduces ELGIN, a physics-informed graph neural network surrogate to predict turbulent nanoparticle dispersion in dental clinical environments. It couples Eulerian-Lagrangian particle tracking with graph-based surrogates to forecast carrier-flow dynamics, enabling real-time clinical decision support. The surrogate substantially speeds up simulations while preserving accuracy.
B-cos GNNs are an inherently explainable GNN family where predictions decompose exactly into per-node, per-feature contributions via a dynamic input-dependent linear map. They use linear aggregation and replace non-linear message/update functions with B-cos transforms, enabling direct interpretation. Instance-level explanations come from a single forward and backward pass without extra modules.
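The exact decomposition can be sketched for a single layer. This assumes the standard B-cos transform (unit-norm weight rows, output scaled by |cos|^(B-1)) applied after a linear aggregation `A @ X`; the function names and the single-layer setup are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def bcos_linear(x, W, B=2.0):
    """B-cos transform of one vector x with weights W of shape (out, in).

    Returns the output and the input-dependent linear map W_dyn with y = W_dyn @ x.
    """
    W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm rows
    lin = W_hat @ x
    cos = lin / (np.linalg.norm(x) + 1e-12)               # cosine between x and each row
    scale = np.abs(cos) ** (B - 1.0)
    W_dyn = scale[:, None] * W_hat
    return W_dyn @ x, W_dyn

def bcos_gnn_layer(A, X, W, B=2.0):
    """Linear aggregation followed by B-cos; also returns exact contributions.

    contribs[i, k, j, f] is the share of output k at node i due to feature f of node j.
    """
    Z = A @ X
    outputs, contribs = [], []
    for i in range(X.shape[0]):
        y_i, W_dyn = bcos_linear(Z[i], W, B)
        c = W_dyn[:, None, :] * (A[i][:, None] * X)[None, :, :]
        outputs.append(y_i)
        contribs.append(c)
    return np.stack(outputs), np.stack(contribs)

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
X = rng.normal(size=(3, 4))
W = rng.normal(size=(2, 4))
Y, C = bcos_gnn_layer(A, X, W)
print(np.allclose(C.sum(axis=(2, 3)), Y))
```

Summing the contribution tensor over source nodes and features recovers each output, which is the sense in which the explanation is exact rather than approximate.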
ST-TGExplainer tackles temporal GNN interpretability by disentangling stability and transition patterns. While existing methods emphasize stability patterns (historical interactions), this work also captures transition patterns, i.e., newly emerging interactions, that influence predictions. The approach improves interpretability by considering both pattern types.
HCLBind is a self-supervised framework that decouples geometric representation learning from affinity regression for multi-domain protein-ligand binding prediction. It uses hierarchical contrastive learning with a general-to-specific pre-training pipeline on the Q-BioLiP database, improving robustness to inter-domain dynamics.
SCAFDS presents edge-feature graph attention for interbank fraud detection and generates attribution-grounded suspicious activity reports (SARs). It models fraud propagation across interbank networks using edge features and provides forensic-anchored narratives that trace outputs to specific detections, enabling regulatory auditability.
GOAL introduces a graph-based diffusion solver for dynamic multi-objective optimization, enabling controllable decision generation by conditioning on user-defined objectives. It uses a heterogeneous graph encoding with multiple edge types to define constraint classes that guide message passing. The framework extends neural combinatorial optimization to dynamic, multi-objective settings.
The study tests whether better realized volatility forecasts from graph neural networks translate into improved portfolio performance. Using weekly realized volatility for 465 S&P 500 stocks and multiple graph constructions, the authors compare GNN-based forecasts to AR/LSTM baselines, with and without macro regime features. The results show mixed evidence: higher forecast accuracy does not always yield better portfolio outcomes.
DAG-DC-ADMM provides a unified framework for structure-aware clustering and heterogeneous causal graph learning across subjects, built on Structural Equation Modeling. It jointly learns DAG-based dependency structures and clusters to capture subpopulation-specific dependencies, solved via alternating direction method of multipliers. This addresses structural heterogeneity in complex systems.
This paper analyzes inferring sensitive attributes from knowledge graph embeddings, presenting attack and defense strategies. It highlights privacy risks when KG embeddings enable inferences about users, and discusses defensive approaches to mitigate leakage and protect sensitive information.
The survey examines GNN-based community detection within the graph signal analysis framework, reviewing architectures and results for clustering on large, high-dimensional graphs. It discusses how GNNs learn embeddings that enable effective community discovery in graph signal processing contexts.
GraphPINE introduces a GNN for drug response prediction that uses domain-specific prior knowledge to initialize node importance, which is refined during training to guide explainability by encoding known relationships among predictive features. The approach aims to improve interpretability without sacrificing performance.
This work proposes warm-starting dual active-set QP solvers with a GNN that predicts the active constraint subset, represented as bipartite graphs. The learned active set speeds up solving in real-time control and optimization across varying problem sizes.
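Why a predicted active set helps can be seen in a small sketch: if the guess is right, one linear KKT solve yields the optimum, and the solver only has to verify it. The code below is not a dual active-set solver and the GNN is replaced by a hard-coded guess; the problem form (minimize 0.5 x'Hx + f'x subject to Ax <= b) and the function name are assumptions for illustration.

```python
import numpy as np

def solve_with_active_set(H, f, A, b, active):
    """Treat constraints in `active` as equalities, solve the KKT system, verify."""
    active = list(active)
    n, m = H.shape[0], len(active)
    A_w, b_w = A[active], b[active]
    # KKT system: [H A_w'; A_w 0] [x; lam] = [-f; b_w]
    K = np.block([[H, A_w.T], [A_w, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-f, b_w]))
    x, lam = sol[:n], sol[n:]
    primal_ok = np.all(A @ x <= b + 1e-9)  # all constraints satisfied
    dual_ok = np.all(lam >= -1e-9)         # multipliers nonnegative
    return x, lam, bool(primal_ok and dual_ok)

H = np.eye(2)
f = np.array([-2.0, -2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0]])
b = np.array([2.0, 1.5])

# A correct guess (constraint 0 active) is certified optimal in a single solve.
x_good, lam_good, ok_good = solve_with_active_set(H, f, A, b, [0])
# A wrong guess (constraint 1 active) fails verification and would need further iterations.
x_bad, lam_bad, ok_bad = solve_with_active_set(H, f, A, b, [1])
print(x_good, ok_good, ok_bad)
```

A real solver would continue its active-set iterations from the guess when verification fails, so a poor prediction costs time but not correctness.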
GESC introduces Gauge-Equivariant Graph Networks with Self-Interference Cancellation, replacing additive aggregation with a projection-based interference cancellation to mitigate self-interference in gauge-equivariant networks. This improves robustness on heterophilic graphs where traditional GNNs struggle.
DynaSTy offers a spatio-temporal node attribute prediction framework for dynamic graphs using a transformer-based model that ingests node attribute time series and a time-varying adjacency matrix to forecast multiple future steps. It emphasizes end-to-end modeling of evolving graphs with an edge-biased temporal mechanism.
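One common way to bias attention with graph structure is to add an adjacency-derived term to the attention logits. The sketch below assumes that form for a single time step; the summary does not specify DynaSTy's actual mechanism, so the function name, `bias_scale`, and the additive bias are illustrative assumptions.

```python
import numpy as np

def edge_biased_attention(X, A, Wq, Wk, Wv, bias_scale=1.0):
    """Self-attention over nodes with an additive bias from the adjacency matrix A."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores + bias_scale * A                  # connected pairs get higher logits
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = np.ones((3, 4))                                   # identical features isolate the bias effect
A_t = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])  # adjacency at one time step
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, attn = edge_biased_attention(X, A_t, Wq, Wk, Wv, bias_scale=2.0)
print(attn.round(3))
```

With a time-varying adjacency, the same function would be called with a different `A_t` at each step.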
S2Aligner presents pair-efficient and transferable pre-training for sparse text-attributed graphs, addressing the challenge of weak supervision. It moves beyond relying solely on abundant textual anchors by learning to align graph and text representations under sparse supervision, improving transferability of TAG-based models.
This paper introduces the DAG Convolutional Network (DCN), a graph neural network architecture built for signals on directed acyclic graphs. It uses causal graph filters to respect the partial order of DAGs, providing a strong inductive bias that standard GNNs lack, and it demonstrates improved performance on DAG-structured learning tasks.
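A filter that respects a DAG's partial order has one defining property: a node's output depends only on itself and its ancestors. The sketch below shows this with a plain polynomial filter in the DAG adjacency, which is a simple stand-in; the paper's causal graph filters may be defined differently, and the convention `A[i, j] = 1` for an edge from j to i is an assumption.

```python
import numpy as np

def dag_filter(A, x, h):
    """Polynomial filter y = sum_k h[k] * A^k x, with A[i, j] = 1 for an edge j -> i."""
    y = np.zeros(len(x), dtype=float)
    Ak_x = np.asarray(x, dtype=float)
    for coeff in h:
        y += coeff * Ak_x
        Ak_x = A @ Ak_x          # move the signal one hop further downstream
    return y

# DAG with edges 0 -> 1, 1 -> 2, 0 -> 2.
A = np.zeros((3, 3))
A[1, 0] = A[2, 1] = A[2, 0] = 1.0
h = [1.0, 0.5, 0.25]

x = np.array([1.0, 0.0, 0.0])
y = dag_filter(A, x, h)

# Perturbing the sink node 2 must not change outputs at its ancestors 0 and 1.
x_perturbed = x.copy()
x_perturbed[2] = 10.0
y_perturbed = dag_filter(A, x_perturbed, h)
print(y, y_perturbed)
```

Because the adjacency of a DAG is nilpotent, the polynomial has finitely many nonzero terms and information never flows from descendants back to ancestors.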
NERVE is a network-aware bilinear tokenization method for resting-state brain functional connectivity representation learning. By tokenizing FC matrices in a way that aligns with the brain's modular organization and using bilinear pooling to capture inter-module interactions, NERVE improves self-supervised representations over region-centric or homogeneous-token approaches.
Discoverable Agent Knowledge proposes a formal framework for agentic KG affordances, bridging how agents with different ontologies can discover, reason about, and invoke web services via knowledge graphs. It extends service description paradigms like OWL-S and WSMO to specify what an agent can do, what it must know to invoke a service, and how ontological mismatches can be formally bridged.
Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling addresses the fragility of LLM-based multi-agent systems under conflicting signals. By modeling agents and their interactions with signed graphs, the approach enables conflict-aware aggregation and improves robustness and reasoning quality when disagreements occur.
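Conflict-aware aggregation on a signed graph can be illustrated with a toy update in which positive edges pull an agent toward a neighbor's belief and negative edges push it away. The update rule, the function name, and the scalar beliefs are assumptions made for illustration; the summary does not describe the paper's aggregation rule.

```python
import numpy as np

def signed_aggregate(beliefs, S):
    """One round of signed aggregation.

    beliefs: per-agent scalar belief in a claim, in [-1, 1].
    S: signed adjacency, +1 for agreeing pairs, -1 for conflicting pairs, 0 otherwise.
    """
    degree = np.abs(S).sum(axis=1)
    return (beliefs + S @ beliefs) / (1.0 + degree)

# Agents 0 and 1 agree with each other; agent 2 is in conflict with both.
S = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, -1.0],
              [-1.0, -1.0, 0.0]])

consistent = signed_aggregate(np.array([1.0, 1.0, -1.0]), S)   # beliefs match the edge signs
inconsistent = signed_aggregate(np.array([1.0, 1.0, 1.0]), S)  # opponents hold the same belief
print(consistent, inconsistent)
```

Beliefs that are consistent with the edge signs are preserved, while beliefs that contradict them are damped, which is the intuition behind treating disagreement as signal rather than noise.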
BLINKG provides a benchmark for evaluating LLM-integrated knowledge graph generation. It targets the costly alignment between input sources and ontology terms and offers tasks and metrics to assess how well LLMs can generate semantically coherent KGs from heterogeneous data.
Query-Conditioned Graph Retrieval for Contextualized LLM Reasoning in Personalized Wearable Data (WAG) introduces a graph-based context retrieval framework that organizes wearable metrics and user signals into a personalized graph. The retrieval is conditioned on the user query to enable LLMs to reason with context that is both relevant and scalable.
STAR delivers a semantic-tuned and tail-adaptive retriever for graph-augmented generation to cope with sparse graph semantics. It mitigates semantic shortcut bias and long-tail path bias by adjusting retrieval to semantic richness and tail-case information, improving GraphRAG performance.
Agentic GraphRAG presents a collaborative AI framework for analyzing unstructured financial data with a knowledge-graph-based approach. It builds a Neo4j graph from structured registry data and large volumes of unstructured legal text in a three-phase pipeline, enabling deterministic ingestion of strong nodes and cohesive analytics.
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees proposes QAFD-RAG, a retrieval-augmented generation method that uses a flow-diffusion process guided by the query. The approach provides theoretical guarantees on subgraph quality and relevance, improving retrieval quality for multi-hop reasoning.
ContextRAG offers an extraction-free hierarchical graph construction for retrieval-augmented generation. It builds a fuzzy concept graph over chunk embeddings using residual-quantization k-means and formal concept analysis, bypassing LLM-based extraction to enable scalable indexing.
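Residual-quantization k-means can be sketched as repeated clustering of what the previous level failed to explain, so each chunk embedding receives a coarse-to-fine code sequence. The sketch below covers only that step, with a small hand-written k-means on random stand-in embeddings; the function names and parameters are illustrative, and the formal concept analysis stage is not shown.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centers and nearest-center labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, dists.argmin(1)

def residual_quantize(X, k, levels):
    """Quantize X level by level, clustering the residual left by earlier levels."""
    residual = np.asarray(X, dtype=float).copy()
    codebooks, codes = [], []
    for level in range(levels):
        centers, labels = kmeans(residual, k, seed=level)
        codebooks.append(centers)
        codes.append(labels)
        residual = residual - centers[labels]
    return codebooks, np.stack(codes, axis=1), residual

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 4))             # stand-in for chunk embeddings
codebooks, codes, residual = residual_quantize(X, k=4, levels=3)

# Each embedding equals the sum of its selected centers plus the final residual.
recon = sum(cb[codes[:, lvl]] for lvl, cb in enumerate(codebooks))
print(codes[:3], float((residual ** 2).sum()))
```

Chunks that share a code prefix fall into the same coarse cluster, which is what makes the codes usable as a hierarchy without any LLM-based extraction.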
TERGAD introduces Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection, integrating textual content with structural context to detect anomalies that arise from misalignment between a node's content and its topological role. The approach yields richer representations and stronger detection performance.
GraphInstruct provides a progressive benchmark for diagnosing capability gaps in LLM-based graph generation. It evaluates LLMs across graph types and task domains to reveal instruction-following limitations and guide future improvements.
Toward Robust GraphRAG investigates retrieval drift and hallucination caused by imperfect knowledge graphs in GraphRAG systems. It identifies spurious noise and incomplete information as recurring failure modes and proposes robust retrieval strategies to reduce hallucination and improve evidence consistency.