1. GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
Authors: Hang Yin, Haoyu Wei, Xiuwei Xu, Wenxuan Guo, Jie Zhou, Jiwen Lu
Published: 2025-09-12
Source: arXiv
In this paper, we propose a training-free framework for vision-and-language navigation (VLN). Existing zero-shot VLN methods are mainly designed for discrete environments or involve unsupervised training in continuous simulator environments, which makes it challenging to generalize and deploy them in real-world scenarios. To achieve a training-free framework in continuous environments, our framework formulates navigation guidance as graph constraint optimization by decomposing instructions into explicit spatial constraints. The constraint-driven paradigm decodes spatial semantics through constraint solving, enabling zero-shot adaptation to unseen environments. Specifically, we construct a spatial constraint library covering all types of spatial relationships mentioned in VLN instructions. The human instruction is decomposed into a directed acyclic graph, with waypoint nodes, object nodes and edges, which are used as queries to retrieve the library to build the graph constraints. The graph constraint optimization is solved by the constraint solver to determine the positions of waypoints, obtaining the robot's navigation path and final goal. To handle cases with no solution or multiple solutions, we construct a navigation tree with a backtracking mechanism. Extensive experiments on standard benchmarks demonstrate significant improvements in success rate and navigation efficiency compared to state-of-the-art zero-shot VLN methods. We further conduct real-world experiments to show that our framework can effectively generalize to new environments and instruction sets, paving the way for a more robust and autonomous navigation framework.
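The core idea of placing waypoints by constraint solving can be illustrated with a toy sketch. Everything here is hypothetical: the constraint predicates, the grid abstraction, and the function names are illustrative stand-ins, not the paper's solver.

```python
from itertools import product

# Hypothetical sketch: waypoints are placed on a discrete grid so that every
# spatial constraint parsed from the instruction holds. A real system would use
# a proper constraint solver and backtrack on no/multiple solutions.

def near(a, b, r=1.5):
    # True if points a and b lie within radius r of each other.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= r

def left_of(a, b):
    # True if a lies to the left of b along the x-axis.
    return a[0] < b[0]

def solve_waypoints(grid, objects, constraints):
    """Brute-force search over grid cells for two waypoint positions (w1, w2)
    satisfying all (predicate, name_a, name_b) constraints."""
    for w1, w2 in product(grid, repeat=2):
        positions = {"w1": w1, "w2": w2, **objects}
        if all(fn(positions[a], positions[b]) for fn, a, b in constraints):
            return w1, w2
    return None  # no solution -> a real system would backtrack / re-query
```

For instance, "go to a point left of a waypoint near the chair" becomes the constraint list `[(near, "w2", "chair"), (left_of, "w1", "w2")]` over detected object positions.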
2. The CHARA Array Polarization Model and Prospects for Spectropolarimetry
Authors: Linling Shuai, John D. Monnier, Benjamin R. Setterholm, Stefan Kraus, Narsireddy Anugu, Tyler Gardner, Jean-Baptiste Le Bouquin, Gail H. Schaefer
Published: 2025-09-12
Source: arXiv
Polarimetric data provide key insights into infrared emission mechanisms in the inner disks of YSOs and the details of dust formation around AGB stars. While polarization measurements are well-established in radio interferometry, they remain challenging at visible and near-infrared wavelengths due to the significant time-variable birefringence introduced by the complex optical beamtrain. In this study, we characterize instrumental polarization effects within the optical path of the CHARA Array, focusing on the H-band MIRC-X and K-band MYSTIC beam combiners. Using Jones matrix formalism, we developed a comprehensive model describing diattenuation and retardance across the array. By applying this model to an unpolarized calibrator, we derived the instrumental parameters for both MIRC-X and MYSTIC. Our results show differential diattenuation consistent with >= 97% reflectivity per aluminum-coated surface at 45 deg incidence. The differential retardance exhibits small wavelength-dependent variations, in some cases larger than expected. Notably, telescope W2 exhibits a significantly larger phase shift in the Coude path, attributable to a fixed aluminum mirror (M4) used in place of deformable mirrors present on the other telescopes during the observing run. We also identify misalignments in the LiNbO_3 birefringent compensator plates on S1 (MIRC-X) and W2 (MYSTIC). After correcting for night-to-night offsets, we achieve calibration accuracies of $\pm$ 3.4% in visibility ratio and $\pm$ 1.4 deg in differential phase for MIRC-X, and $\pm$ 5.9% and $\pm$ 2.4 deg, respectively, for MYSTIC. Given that the differential intrinsic polarization of spatially resolved sources, such as AGB stars and YSOs, is typically greater than these instrumental uncertainties, our results demonstrate that CHARA is now capable of achieving high-accuracy measurements of intrinsic polarization in astrophysical targets.
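The Jones-matrix bookkeeping behind such a model can be sketched in a few lines. The numbers and the diagonal-matrix simplification below are purely illustrative, not the CHARA fit: each reflection is modeled by amplitude ratios for the s and p polarizations (diattenuation) and a differential phase shift (retardance), and a beamtrain is the product of its surfaces' matrices.

```python
import numpy as np

# Minimal Jones-matrix sketch of diattenuation and retardance; values illustrative.

def jones_element(r_s, r_p, delta):
    """Diagonal Jones matrix for one reflection: amplitude ratios r_s, r_p and a
    differential phase shift delta (radians) between s and p polarizations."""
    return np.array([[r_s, 0.0],
                     [0.0, r_p * np.exp(1j * delta)]])

def beamtrain(surfaces):
    """Compose the Jones matrices of successive surfaces (applied left to right)."""
    J = np.eye(2, dtype=complex)
    for r_s, r_p, delta in surfaces:
        J = jones_element(r_s, r_p, delta) @ J
    return J

# Example: three aluminum-coated reflections with slightly different s/p
# reflectivities and a small retardance each.
J = beamtrain([(0.985, 0.975, 0.02)] * 3)
E_in = np.array([1.0, 1.0]) / np.sqrt(2)   # 45-deg linearly polarized input
E_out = J @ E_in
diattenuation = abs(J[1, 1] / J[0, 0])     # cumulative differential amplitude ratio
```

Fitting such per-surface parameters against an unpolarized calibrator is, in spirit, what the instrumental model above does, with the full (non-diagonal) matrices accounting for mirror geometry.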
3. MatSKRAFT: A framework for large-scale materials knowledge extraction from scientific tables
Authors: Kausik Hira, Mohd Zaki, Mausam, N. M. Anoop Krishnan
Published: 2025-09-12
Source: arXiv
Scientific progress increasingly depends on synthesizing knowledge across vast literature, yet most experimental data remains trapped in semi-structured formats that resist systematic extraction and analysis. Here, we present MatSKRAFT, a computational framework that automatically extracts and integrates materials science knowledge from tabular data at unprecedented scale. Our approach transforms tables into graph-based representations processed by constraint-driven GNNs that encode scientific principles directly into model architecture. MatSKRAFT significantly outperforms state-of-the-art large language models, achieving F1 scores of 88.68 for property extraction and 71.35 for composition extraction, while processing data $19$-$496\times$ faster than these models (relative to the slowest and the fastest, respectively) with modest hardware requirements. Applied to nearly 69,000 tables from more than 47,000 research publications, we construct a comprehensive database containing over 535,000 entries, including 104,000 compositions that expand coverage beyond major existing databases, pending manual validation. This systematic approach reveals previously overlooked materials with distinct property combinations and enables data-driven discovery of composition-property relationships, a cornerstone of materials discovery.
4. DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Authors: Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong
Published: 2025-09-12
Source: arXiv
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
5. A study of ferronematic thin films including a stray field energy
Authors: Shilpa Dutta, James Dalby, Apala Majumdar, Anja Schlömerkemper
Published: 2025-09-12
Source: arXiv
Ferronematic materials are colloidal suspensions of magnetic particles in liquid crystals. They are complex materials with potential applications in display technologies, sensors, microfluidic devices, etc. We consider a model for ferronematics in a 2D domain using a variational approach. The proposed free energy of the ferronematic system depends on the Landau-de Gennes (LdG) order parameter $\mathbf{Q}$ and the magnetization $\mathbf{M}$, and incorporates the complex interaction between the liquid crystal molecules and the magnetic particles in the presence of an external magnetic field $\mathbf{H}_{ext}$. The energy functional combines the Landau-de Gennes nematic energy density and energy densities from the theory of micromagnetics, including (an approximation of) the stray field energy and energetic contributions from an external magnetic field. For the proposed ferronematic energy, we first prove the existence of an energy minimizer and then the uniqueness of the minimizer in certain parameter regimes. Secondly, we numerically compute stable ferronematic equilibria by solving the gradient flow equations associated with the proposed ferronematic energy. The numerical results show that the stray field influences the localization of the interior nematic defects and magnetic vortices.
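The equilibria in the numerical step are obtained as long-time limits of a gradient flow; schematically, for the free energy $\mathcal{F}[\mathbf{Q}, \mathbf{M}]$ (a generic $L^2$ gradient flow, not necessarily the paper's exact scaling or boundary conditions):

```latex
\partial_t \mathbf{Q} = -\frac{\delta \mathcal{F}}{\delta \mathbf{Q}}, \qquad
\partial_t \mathbf{M} = -\frac{\delta \mathcal{F}}{\delta \mathbf{M}},
```

so that the energy decreases monotonically along trajectories and stationary points of the flow are critical points of $\mathcal{F}$.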
6. InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Authors: Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai
Published: 2025-09-12
Source: arXiv
Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for producers and consumers. Current diffusion models increase computational demand quadratically with resolution, delaying 4K image generation beyond 100 seconds. To solve this, we explore a second generation stage on top of latent diffusion models: the fixed-size latent produced by the diffusion model is regarded as the content representation, and we propose to decode arbitrary-resolution images from this compact latent with a one-step generator. We thus present \textbf{InfGen}, which replaces the VAE decoder with the new generator to produce images at any resolution from a fixed-size latent without retraining the diffusion models. This simplifies the process, reduces computational complexity, and can be applied to any model using the same latent space. Experiments show InfGen can bring many models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
7. Viewing heat through ice: an infrared camera monitors hydrogel freezing and thawing during cryoapplication
Authors: Gennadiy O. Kovalov, Mykola O. Chyzh, Vyacheslav Yu. Globa, Oleksandr F. Todrin, Galyna V. Shustakova, Eduard Yu. Gordiyenko, Yuliya V. Fomenko, Oleh V. Ivakhnenko, Polina O. Kofman, Sergey N. Shevchenko
Published: 2025-09-12
Source: arXiv
Cryosurgery employs a safe and relatively simple exposure technique and is an advantageous, well-regarded method. For its effective application, it is necessary to control both the volume of the expanding freezing zone and the volumetric thermal field dynamics. The aim of this study was to perform a thermal imaging study of freezing and thawing in a model system (gel phantom) to predict the dynamics of the freezing zone during cryodestruction of biological tissues in vivo. Here, the thermal imager is an effective tool for demonstrating the surface temperature distribution. We have studied how the observed infrared image relates to the distribution and change of the thermal field in depth. For this purpose, we created test measuring equipment for simultaneous analysis of the dynamics of thermal fields on the surface, video recording of freezing and thawing on the surface as well as in the depth of the gel phantom, measuring the temperature at any given point in the depth, and modeling vessels with different blood flow parameters in the zone of low-temperature exposure. With a modeled vessel in the low-temperature exposure zone, the surface thermal fields deformed, taking the shape of butterfly wings. Our experimental study in a gel phantom is supported by numerical calculations, demonstrating how the freezing zone and thermal isotherms on the surface and in depth evolve under real conditions, thereby providing a basis for assessing the cryoeffect time and intensity in practice. Key words: cryoapplication; freezing; thawing; temperature field dynamics; infrared thermography; gel phantom; testing measuring equipment; vessel simulation.
8. Robust Localization in Modern Cellular Networks using Global Map Features
Authors: Junshi Chen, Xuhong Li, Russ Whiton, Erik Leitinger, Fredrik Tufvesson
Published: 2025-09-12
Source: arXiv
Radio frequency (RF) signal-based localization using modern cellular networks has emerged as a promising solution to accurately locate objects in challenging environments. One of the most promising solutions for situations involving obstructed-line-of-sight (OLoS) and multipath propagation is multipath-based simultaneous localization and mapping (MP-SLAM), which employs map features (MFs), such as virtual anchors. This paper presents an extended MP-SLAM method that is augmented with a global map feature (GMF) repository. This repository stores consistent MFs of high quality that are collected during prior traversals. We integrate these GMFs back into the MP-SLAM framework via a probability hypothesis density (PHD) filter, which propagates GMF intensity functions over time. Extensive simulations, together with a challenging real-world experiment using LTE RF signals in a dense urban scenario with severe multipath propagation and inter-cell interference, demonstrate that our framework achieves robust and accurate localization, thereby showcasing its effectiveness in realistic modern cellular networks such as 5G or future 6G networks. It outperforms conventional proprioceptive sensor-based localization and conventional MP-SLAM methods, and achieves reliable localization even under adverse signal conditions.
9. Human Body Segment Volume Estimation with Two RGB-D Cameras
Authors: Giulia Bassani, Emilio Maoddi, Usman Asghar, Carlo Alberto Avizzano, Alessandro Filippeschi
Published: 2025-09-12
Source: arXiv
In the field of human biometry, accurately estimating the volume of the whole body and its individual segments is of fundamental importance. Such measurements support a wide range of applications that include assessing health, optimizing ergonomic design, and customizing biomechanical models. In this work, we present a Body Segment Volume Estimation (BSV) system that automatically computes whole-body and segment volumes using only two RGB-D cameras, thus limiting system complexity. To maintain accuracy comparable to 3D laser scanners, we enhance the As-Rigid-As-Possible (ARAP) non-rigid registration technique, decoupling its energy from a single triangle mesh. This improves the geometric coherence of the reconstructed mesh, especially in the lateral gap areas. We evaluate BSV starting from the RGB-D camera performance, through results obtained on FAUST dataset human body models and a comparison with a state-of-the-art method, up to real acquisitions. BSV shows superior accuracy in estimating human body volumes and enables evaluation of volume ratios between proximal and distal body segments, which are useful indices in many clinical applications.
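For context, the standard ARAP deformation energy that such registration builds on takes the form (notation generic, as in the surface-modeling literature; the enhancement above decouples this energy from a single triangle mesh):

```latex
E_{\mathrm{ARAP}}(\mathbf{p}') \;=\; \sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij}
\left\| (\mathbf{p}'_i - \mathbf{p}'_j) - \mathbf{R}_i\, (\mathbf{p}_i - \mathbf{p}_j) \right\|^2,
```

where $\mathbf{p}_i$ and $\mathbf{p}'_i$ are vertex positions before and after deformation, $\mathcal{N}(i)$ is the one-ring neighborhood of vertex $i$, $w_{ij}$ are edge weights, and $\mathbf{R}_i$ is the best-fitting per-vertex rotation. Minimizing it penalizes locally non-rigid distortion of the mesh.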
10. Bitcoin Cross-Chain Bridge: A Taxonomy and Its Promise in Artificial Intelligence of Things
Authors: Guojun Tang, Carylyne Chan, Ning Nan, Spencer Yang, Jiayu Zhou, Henry Leung, Mohammad Mamun, Steve Drew
Published: 2025-09-12
Source: arXiv
Bitcoin's limited scripting capabilities and lack of native interoperability mechanisms have constrained its integration into the broader blockchain ecosystem, especially decentralized finance (DeFi) and multi-chain applications. This paper presents a comprehensive taxonomy of Bitcoin cross-chain bridge protocols, systematically analyzing their trust assumptions, performance characteristics, and applicability to Artificial Intelligence of Things (AIoT) scenarios. We categorize bridge designs into three main types: naive token swapping, pegged-asset bridges, and arbitrary-message bridges. Each category is evaluated across key metrics such as trust model, latency, capital efficiency, and DeFi composability. Emerging innovations like BitVM and recursive sidechains are highlighted for their potential to enable secure, scalable, and programmable Bitcoin interoperability. Furthermore, we explore practical use cases of cross-chain bridges in AIoT applications, including decentralized energy trading, healthcare data integration, and supply chain automation. This taxonomy provides a foundational framework for researchers and practitioners seeking to design secure and efficient cross-chain infrastructures in AIoT systems.
11. Knotted DNA Configurations in Bacteriophage Capsids: A Liquid Crystal Theory Approach
Authors: Pei Liu, Zhijie Wang, Tamara Christiani, Mariel Vazquez, M. Carme Calderer, Javier Arsuaga
Published: 2025-09-12
Source: arXiv
Bacteriophages, viruses that infect bacteria, store their micron-long DNA inside an icosahedral capsid with a typical diameter of 40 nm to 100 nm. Consistent with experimental observations, such confinement conditions induce an arrangement of DNA that corresponds to a hexagonal chromonic liquid-crystalline phase, and increase the topological complexity of the genome in the form of knots. A mathematical model that implements a chromonic liquid-crystalline phase and that captures the changes in topology has been lacking. We adopt a mathematical model that represents the viral DNA as a pair of a vector field and a line. The vector field is a minimizer of the total Oseen-Frank energy for nematic liquid crystals under chromonic constraints, while the line is identified with the tangent to the field at selected locations, representing the central axis of the DNA molecule. The fact that the Oseen-Frank functional assigns infinite energy to topological defects (point defects in two dimensions and line defects in three dimensions) precludes the presence of singularities and, in particular, of knot structures. To address this issue, we begin with the optimal vector field and helical line, and propose a new algorithm to introduce knots through stochastic perturbations associated with splay and twist deformations, modeled by means of a Langevin system. We conclude by comparing knot distributions generated by the model and by interpreting them in the context of previously published experimental results. Altogether, this work relies on the synergy of modeling, analysis and computation in the study of viral DNA organization in capsids.
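The Oseen-Frank energy referred to above has the standard form (constants and notation as in the general nematic literature, not specific to this paper's chromonic constraints):

```latex
E_{\mathrm{OF}}[\mathbf{n}] = \int_{\Omega} \Big( K_1\, (\nabla \cdot \mathbf{n})^2
+ K_2\, (\mathbf{n} \cdot \nabla \times \mathbf{n})^2
+ K_3\, \left| \mathbf{n} \times (\nabla \times \mathbf{n}) \right|^2 \Big)\, dx,
```

where $\mathbf{n}$ is the unit director field and $K_1$, $K_2$, $K_3$ are the splay, twist, and bend elastic constants; the splay and twist terms are the ones perturbed stochastically in the knot-generating algorithm.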
12. Gromov hyperbolicity III: improved geometric characterization in Euclidean spaces and beyond
Authors: Chang-Yu Guo, Manzi Huang, Xiantao Wang
Published: 2025-09-12
Source: arXiv
This is the third article of a series of our recent works, addressing an open question of Bonk-Heinonen-Koskela [3], to study the relationship between (inner) uniformality and Gromov hyperbolicity in infinite dimensional spaces. Our main focus in this paper is to establish an improved geometric characterization of Gromov hyperbolicity. More precisely, we develop an elementary measure-independent approach to establish the geometric characterization of Gromov hyperbolicity for general proper Euclidean subdomains, which addresses a conjecture of Bonk-Heinonen-Koskela [Asterisque 2001] for unbounded Euclidean subdomains. Our main results not only improve the corresponding result of Balogh-Buckley [Invent. Math. 2003], but also clarify the relationship between the two geometric conditions used to characterize Gromov hyperbolicity: the ball separation condition and the Gehring-Hayman inequality. We also provide a negative answer to an open problem of Balogh-Buckley by constructing a Euclidean domain that satisfies the ball separation condition but fails the Gehring-Hayman inequality. Furthermore, we prove that the ball separation condition, together with an LLC-2 condition, implies inner uniformality and thus the Gehring-Hayman inequality. As a consequence of our new approach, we are able to prove such a geometric characterization of Gromov hyperbolicity in the fairly general setting of metric spaces (without measures), which substantially improves the main result of Koskela-Lammi-Manojlovi\'c [Ann. Sci. \'Ec. Norm. Sup\'er. 2014]. In particular, we not only provide a new purely metric proof of the main results of Balogh-Buckley and Koskela-Lammi-Manojlovi\'c, but also derive explicit dependence of various involved constants, which improves all the previously known results.
13. Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
Authors: Suzhen Zhong, Ying Zou, Bram Adams
Published: 2025-09-12
Source: arXiv
Large Language Models (LLMs) are becoming integral to modern software development workflows, assisting developers with code generation, API explanation, and iterative problem-solving through natural language conversations. Despite widespread adoption, there is limited understanding of how developers interact with LLMs in practice and how these conversational dynamics influence task outcomes, code quality, and software engineering workflows. To address this, we leverage CodeChat, a large dataset comprising 82,845 real-world developer-LLM conversations, containing 368,506 code snippets generated across over 20 programming languages, derived from the WildChat dataset. We find that LLM responses are substantially longer than developer prompts, with a median token-length ratio of 14:1. Multi-turn conversations account for 68% of the dataset and often evolve due to shifting requirements, incomplete prompts, or clarification requests. Topic analysis identifies web design (9.6% of conversations) and neural network training (8.7% of conversations) as the most frequent LLM-assisted tasks. Evaluation across five languages (i.e., Python, JavaScript, C++, Java, and C#) reveals prevalent and language-specific issues in LLM-generated code: generated Python and JavaScript code often include undefined variables (83.4% and 75.3% of code snippets, respectively); Java code lacks required comments (75.9%); C++ code frequently omits headers (41.1%) and C# code shows unresolved namespaces (49.2%). During a conversation, syntax and import errors persist across turns; however, documentation quality in Java improves by up to 14.7%, and import handling in Python improves by 3.7% over 5 turns. Prompts that point out mistakes in code generated in prior turns and explicitly request a fix are most effective for resolving errors.
14. Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems
Authors: Alva West, Yixuan Weng, Minjun Zhu, Zhen Lin, Yue Zhang
Published: 2025-09-12
Source: arXiv
Failure attribution in multi-agent systems -- pinpointing the exact step where a decisive error occurs -- is a critical yet unsolved challenge. Current methods treat this as a pattern recognition task over long conversation logs, leading to critically low step-level accuracy (below 17\%), which renders them impractical for debugging complex systems. Their core weakness is a fundamental inability to perform robust counterfactual reasoning: to determine if correcting a single action would have actually averted the task failure. To bridge this counterfactual inference gap, we introduce Abduct-Act-Predict (A2P) Scaffolding, a novel agent framework that transforms failure attribution from pattern recognition into a structured causal inference task. A2P explicitly guides a large language model through a formal three-step reasoning process within a single inference pass: (1) Abduction, to infer the hidden root causes behind an agent's actions; (2) Action, to define a minimal corrective intervention; and (3) Prediction, to simulate the subsequent trajectory and verify if the intervention resolves the failure. This structured approach leverages the holistic context of the entire conversation while imposing a rigorous causal logic on the model's analysis. Our extensive experiments on the Who\&When benchmark demonstrate its efficacy. On the Algorithm-Generated dataset, A2P achieves 47.46\% step-level accuracy, a 2.85$\times$ improvement over the 16.67\% of the baseline. On the more complex Hand-Crafted dataset, it achieves 29.31\% step accuracy, a 2.43$\times$ improvement over the baseline's 12.07\%. By reframing the problem through a causal lens, A2P Scaffolding provides a robust, verifiable, and significantly more accurate solution for automated failure attribution.
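Since the abstract states that the three-step reasoning happens within a single inference pass, the scaffold can be pictured as one structured prompt. The template wording below is an illustrative paraphrase of the abstract's three steps, and `call_llm` is a stand-in for any chat-completion API, not the paper's implementation.

```python
# Hedged sketch of an A2P-style scaffold as a single structured prompt.
# Template text and function names are illustrative, not the paper's.

A2P_TEMPLATE = """You are analyzing a failed multi-agent conversation log.
Step 1 (Abduction): infer the hidden root causes behind each suspicious action.
Step 2 (Action): define the minimal corrective intervention at a single step.
Step 3 (Prediction): simulate the trajectory after that intervention and state
whether the original task failure is averted.
Return the decisive step index and your reasoning.

Conversation log:
{log}
"""

def attribute_failure(log, call_llm):
    """Run the three-step causal scaffold in one inference pass.
    call_llm: callable taking a prompt string and returning the model's reply."""
    return call_llm(A2P_TEMPLATE.format(log=log))
```

The key design point mirrored here is that abduction, intervention, and counterfactual prediction are imposed as explicit stages of one pass, rather than asking the model to pattern-match the decisive step directly.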
15. Vendi Information Gain for Active Learning and its Application to Ecology
Authors: Quan Nguyen, Adji Bousso Dieng
Published: 2025-09-12
Source: arXiv
While monitoring biodiversity through camera traps has become an important endeavor for ecological research, identifying species in the captured image data remains a major bottleneck due to limited labeling resources. Active learning -- a machine learning paradigm that selects the most informative data to label and train a predictive model -- offers a promising solution, but typically focuses on uncertainty in the individual predictions without considering uncertainty across the entire dataset. We introduce a new active learning policy, Vendi information gain (VIG), that selects images based on their impact on dataset-wide prediction uncertainty, capturing both informativeness and diversity. Applied to the Snapshot Serengeti dataset, VIG achieves impressive predictive accuracy close to full supervision using less than 10% of the labels. It consistently outperforms standard baselines across metrics and batch sizes, collecting more diverse data in the feature space. VIG has broad applicability beyond ecology, and our results highlight its value for biodiversity monitoring in data-limited environments.
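The diversity measure underlying Vendi information gain is the Vendi Score: the exponential of the von Neumann entropy of a normalized similarity matrix. The sketch below computes that score; VIG itself (ranking candidate images by their expected effect on dataset-wide uncertainty) is not reproduced here.

```python
import numpy as np

def vendi_score(K):
    """Vendi Score of an n x n positive semi-definite similarity matrix K with
    K[i, i] = 1: exp of the entropy of the eigenvalues of K / n. Ranges from 1
    (all items identical) to n (all items mutually dissimilar)."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)   # eigenvalues of the normalized matrix
    lam = lam[lam > 1e-12]            # drop numerical zeros (0 * log 0 := 0)
    return float(np.exp(-np.sum(lam * np.log(lam))))
```

As a sanity check, `n` identical items score 1 and `n` orthogonal items score `n`, so the score behaves like an "effective number" of distinct samples in feature space.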
16. Ordinality of Visible-Thermal Image Intensities for Intrinsic Image Decomposition
Authors: Zeqing Leo Yuan, Mani Ramanagopal, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan
Published: 2025-09-12
Source: arXiv
Decomposing an image into its intrinsic photometric factors--shading and reflectance--is a long-standing challenge due to the lack of extensive ground-truth data for real-world scenes. Recent methods rely on synthetic data or sparse annotations for limited indoor and even fewer outdoor scenes. We introduce a novel training-free approach for intrinsic image decomposition using only a pair of visible and thermal images. We leverage the principle that light not reflected from an opaque surface is absorbed and detected as heat by a thermal camera. This allows us to relate the ordinalities between visible and thermal image intensities to the ordinalities of shading and reflectance, which can densely self-supervise an optimizing neural network to recover shading and reflectance. We perform quantitative evaluations with known reflectance and shading under natural and artificial lighting, and qualitative experiments across diverse outdoor scenes. The results demonstrate superior performance over recent learning-based models and point toward a scalable path to curating real-world ordinal supervision, previously infeasible via manual labeling.
17. The large-scale kinematics of young stars in the Milky Way disc: first results from SDSS-V
Authors: Eleonora Zari, Jaime Villaseñor, Marina Kounkel, Hans-Walter Rix, Neige Frankel, Andrew Tkachenko, Sergey Khoperskov, Elena D'Onghia, Alexandre Roman-Lopes, Carlos Román-Zúñiga, S. Guy Stringfellow, C. Jonathan Tan, Aida Wofford, Dmitry Bizyaev, John Donor, G. José Fernández-Trincado, Sean Morrison, Kaike Pan, F. Sebastian Sanchez, Andrew Saydjari
Published: 2025-09-12
Source: arXiv
We present a first large-scale kinematic map of $\sim$50,000 young OB stars ($T_{\rm eff} \geq 10,000$ K), based on BOSS spectroscopy from the Milky Way Mapper OB program in the ongoing Sloan Digital Sky Survey V (SDSS-V). Using photogeometric distances, line-of-sight velocities and Gaia DR3 proper motions, we map 3D Galactocentric velocities across the Galactic plane to $\sim$5 kpc from the Sun, with a focus on radial motions ($v_R$). Our results reveal mean radial motions with amplitudes of $\pm 30$ km/s that are coherent on kiloparsec scales, alternating between inward and outward motions. These $\bar{v}_R$ amplitudes are considerably higher than those observed for older, red giant populations. These kinematic patterns show only a weak correlation with spiral arm over-densities. Age estimates, derived from MIST isochrones, indicate that 85% of the sample is younger than $\sim300$ Myr and that the youngest stars ($\le 30$ Myr) align well with density enhancements. The age-dependent $\bar{v}_R$ in Auriga makes it plausible that younger stars exhibit different velocity variations than older giants. The origin of the radial velocity features remains uncertain, and may result from a combination of factors, including spiral arm dynamics, the Galactic bar, resonant interactions, or phase mixing following a perturbation. The present analysis is based on approximately one-third of the full target sample. The completed survey will enable a more comprehensive investigation of these features and a detailed dynamical interpretation.
18. Merging Physics-Based Synthetic Data and Machine Learning for Thermal Monitoring of Lithium-ion Batteries: The Role of Data Fidelity
Authors: Yusheng Zheng, Wenxue Liu, Yunhong Che, Ferdinand Grimm, Jingyuan Zhao, Xiaosong Hu, Simona Onori, Remus Teodorescu, Gregory J. Offer
Published: 2025-09-12
Source: arXiv
Since the internal temperature of lithium-ion batteries is less accessible than the surface temperature, there is an urgent need to develop accurate and real-time estimation algorithms for better thermal management and safety. This work presents a novel framework for resource-efficient and scalable development of accurate, robust, and adaptive internal temperature estimation algorithms by blending physics-based modeling with machine learning, in order to address the key challenges in data collection, model parameterization, and estimator design that traditionally hinder both approaches. In this framework, a physics-based model is leveraged to generate simulation data that covers different operating scenarios by sweeping the model parameters and input profiles. Such a cheap simulation dataset can be used to pre-train the machine learning algorithm to capture the underlying mapping relationship. To bridge the simulation-to-reality gap resulting from imperfect modeling, transfer learning with unsupervised domain adaptation is applied to fine-tune the pre-trained machine learning model, using limited operational data (without internal temperature values) from target batteries. The proposed framework is validated under different operating conditions and across multiple cylindrical batteries with convective air cooling, achieving a root mean square error of 0.5 {\deg}C when relying solely on prior knowledge of battery thermal properties, and less than 0.1 {\deg}C when using thermal parameters close to the ground truth. Furthermore, the role of the simulation data quality in the proposed framework has been comprehensively investigated to identify promising ways of synthetic data generation that guarantee the performance of the machine learning model.
19. CMB Constraints on Quantized Spatial Curvature $Ω_K$ in globally CPT-symmetric universes
Authors: Wei-Ning Deng, Will Handley
Published: 2025-09-12
Source: arXiv
The periodic solution of the Friedmann equation in conformal time implies that only cosmological perturbations exhibiting corresponding symmetries are physically permissible, leading to a discrete spectrum of allowed wave vectors. Furthermore, in a spatially closed universe, these wave vectors are independently constrained to be integers. Matching these two distinct quantization conditions provides a novel theoretical constraint on the possible values of spatial curvature. In this work, we numerically solve the cosmological perturbation equations, incorporating radiation anisotropy and higher-order Boltzmann terms, to calculate these discrete wave vectors with improved precision. Subsequently, we generate Cosmic Microwave Background (CMB) power spectra for different characteristic spacings of these quantized wave vectors. Finally, we apply the constraint to Planck 2018 observational data to determine the cosmological parameters. This analysis yields a discrete set of allowed values for the spatial curvature, $\Omega_K$, including $[-0.076, -0.039, -0.024, -0.016, -0.012, \dots]$.
20. Proving symmetry of localized solutions and application to dihedral patterns in the planar Swift-Hohenberg PDE
Authors: Dominic Blanco, Matthieu Cadiot •
Published: 2025-09-12 •
Source: arXiv
In this article, we extend the framework developed previously to allow for rigorous proofs of existence of smooth, localized solutions in semi-linear partial differential equations possessing both space and non-space group symmetries. We demonstrate our approach on the Swift-Hohenberg model. In particular, for a given symmetry group $\mathcal{G}$, we construct a natural Hilbert space $H^l_{\mathcal{G}}$ containing only functions with $\mathcal{G}$-symmetry. In this space, products and differential operators are well-defined, allowing for the study of autonomous semi-linear PDEs. Depending on the properties of $\mathcal{G}$, we derive a Newton-Kantorovich approach based on the construction of an approximate inverse around an approximate solution, $u_0$. More specifically, combining meticulous analysis with computer-assisted techniques, the Newton-Kantorovich approach is validated through the computation of explicit bounds. The strategy for constructing $u_0$ and the approximate inverse, and the computation of these bounds, depend on the properties of $\mathcal{G}$. We demonstrate the methodology on the 2D Swift-Hohenberg PDE by proving the existence of various dihedral localized patterns. The algorithmic details to perform the computer-assisted proofs can be found on GitHub.
21. Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms
Authors: Gul Rukh Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau •
Published: 2025-09-12 •
Source: arXiv
Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We present the Contrasting by Patient Augmented Electrocardiograms (CAPE) foundation model, pretrained on four cohorts (n = 5,203,352) from diverse populations across three continents (North America, South America, and Asia). We systematically assess how cohort demographics, health status, and population diversity influence downstream performance on prediction tasks, additionally evaluating on two cohorts from another continent (Europe). We find that downstream performance depends on the distributional properties of the pretraining cohort, including demographics and health status. Moreover, while pretraining with a multi-centre, demographically diverse cohort improves in-distribution accuracy, it reduces out-of-distribution (OOD) generalisation of our contrastive approach by encoding cohort-specific artifacts. To address this, we propose the In-Distribution Batch (IDB) strategy, which preserves intra-cohort consistency during pretraining and enhances OOD robustness. This work provides important insights for developing clinically fair and generalisable foundation models.
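The In-Distribution Batch (IDB) strategy is described only as preserving intra-cohort consistency during pretraining. One plausible reading, sketched below as an assumption rather than the paper's exact recipe, is to compose each contrastive minibatch from a single cohort so that in-batch negatives share that cohort's acquisition artifacts:

```python
import random

def in_distribution_batches(cohort_ids, batch_size, seed=0):
    """Yield minibatches of sample indices such that every batch is drawn from
    a single cohort (a hypothetical reading of 'In-Distribution Batch';
    the grouping rule is an assumption, not the paper's recipe)."""
    rng = random.Random(seed)
    by_cohort = {}
    for idx, cid in enumerate(cohort_ids):
        by_cohort.setdefault(cid, []).append(idx)
    batches = []
    for indices in by_cohort.values():
        rng.shuffle(indices)
        for i in range(0, len(indices), batch_size):
            batches.append(indices[i:i + batch_size])
    rng.shuffle(batches)  # interleave cohorts across training steps
    return batches
```

Training then iterates over these batches as usual; the cross-cohort mixing happens only across steps, never within a batch.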
22. On the semi-infinite cohomology of graded-unitary vertex algebras
Authors: Christopher Beem, Niklas Garner •
Published: 2025-09-12 •
Source: arXiv
Recently, the first author with A. Ardehali, M. Lemos, and L. Rastelli introduced the notion of graded unitarity for vertex algebras. This generalization of unitarity is motivated by the SCFT/VOA correspondence and introduces a novel Hilbert space structure on the state space of a large class of vertex algebras that are not unitary in the conventional sense. In this paper, we study the relative semi-infinite cohomology of graded-unitary vertex algebras that admit a chiral quantum moment map for an affine current algebra at twice the critical level. We show that the relative semi-infinite chain complex for such a graded-unitary vertex algebra has a structure analogous to that of differential forms on a compact K\"ahler manifold, generalizing a strong form of the classic construction of Banks--Peskin and Frenkel--Garland--Zuckerman. We deduce that the relative semi-infinite cohomology is itself graded-unitary, which establishes graded unitarity for a large class of vertex operator algebras arising from three- and four-dimensional supersymmetric quantum field theories. We further establish an outer USp$(2)$ action on the semi-infinite cohomology (which does not respect cohomological grading), analogous to the Lefschetz $\mathfrak{sl}(2)$ in K\"ahler geometry. We also show that the semi-infinite chain complex is quasi-isomorphic as a differential graded vertex algebra to its cohomology, in analogy to the formality result of Deligne--Griffiths--Morgan--Sullivan for the de Rham cohomology of compact K\"ahler manifolds. We conclude by observing consequences of these results to the associated Poisson vertex algebras and related finite-type derived Poisson reductions.
23. Robot guide with multi-agent control and automatic scenario generation with LLM
Authors: Elizaveta D. Moskovskaya, Anton D. Moscowsky •
Published: 2025-09-12 •
Source: arXiv
The work describes the development of a hybrid control architecture for an anthropomorphic tour guide robot, combining a multi-agent resource management system with automatic behavior scenario generation based on large language models. The proposed approach aims to overcome the limitations of traditional systems, which rely on manual tuning of behavior scenarios and consequently suffer from low flexibility and a lack of naturalness in robot behavior. Tour scenarios are prepared through a two-stage generation process: first, a stylized narrative is created; then, non-verbal action tags are integrated into the text. The multi-agent system ensures coordination and conflict resolution during the execution of parallel actions, as well as maintaining default behavior after the completion of main operations, contributing to more natural robot behavior. The trial results demonstrate the potential of the proposed approach for automating and scaling social robot control systems.
24. OpenCSP: A Deep Learning Framework for Crystal Structure Prediction from Ambient to High Pressure
Authors: Yinan Wang, Xiaoyang Wang, Zhenyu Wang, Jing Wu, Jian Lv, Han Wang •
Published: 2025-09-12 •
Source: arXiv
High-pressure crystal structure prediction (CSP) underpins advances in condensed matter physics, planetary science, and materials discovery. Yet, most large atomistic models are trained on near-ambient, equilibrium data, leading to degraded stress accuracy at tens to hundreds of gigapascals and sparse coverage of pressure-stabilized stoichiometries and dense coordination motifs. Here, we introduce OpenCSP, a machine learning framework for CSP tasks spanning ambient to high-pressure conditions. This framework comprises an open-source pressure-resolved dataset alongside a suite of publicly available atomistic models that are jointly optimized for accuracy in energy, force, and stress predictions. The dataset is constructed via randomized high-pressure sampling and iteratively refined through an uncertainty-guided concurrent learning strategy, which enriches underrepresented compression regimes while suppressing redundant DFT labeling. Despite employing a training corpus one to two orders of magnitude smaller than those of leading large models, OpenCSP achieves comparable or superior performance in high-pressure enthalpy ranking and stability prediction. Across benchmark CSP tasks spanning a wide pressure window, our models match or surpass MACE-MPA-0, MatterSim v1 5M, and GRACE-2L-OAM, with the largest gains observed at elevated pressures. These results demonstrate that targeted, pressure-aware data acquisition coupled with scalable architectures enables data-efficient, high-fidelity CSP, paving the way for autonomous materials discovery under ambient and extreme conditions.
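The uncertainty-guided concurrent learning loop described above needs a rule for deciding which sampled structures receive DFT labels. A standard instantiation, shown here as an illustrative sketch rather than OpenCSP's exact criterion, selects candidates whose ensemble force predictions disagree within a "trustworthy but uncertain" band, which enriches underrepresented regimes while skipping both redundant and unreliable samples:

```python
import numpy as np

def select_for_labeling(ensemble_forces, lo=0.05, hi=0.5):
    """Pick candidate configurations whose ensemble force disagreement falls
    in a 'trustworthy but uncertain' band, a common concurrent-learning rule
    (the thresholds are illustrative, not values from the paper).

    ensemble_forces: array of shape (n_models, n_candidates, n_atoms, 3)."""
    std = ensemble_forces.std(axis=0)        # per-component model disagreement
    per_candidate = std.max(axis=(1, 2))     # worst case over atoms/components
    return np.flatnonzero((per_candidate >= lo) & (per_candidate < hi))
```

Candidates below `lo` are already well described by the models; those above `hi` are so far outside the training distribution that the models' own geometries may be untrustworthy.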
25. A Holistic Architecture for Monitoring and Optimization of Robust Multi-Agent Path Finding Plan Execution
Authors: David Zahrádka, Denisa Mužíková, David Woller, Miroslav Kulich, Jiří Švancara, Roman Barták •
Published: 2025-09-12 •
Source: arXiv
The goal of Multi-Agent Path Finding (MAPF) is to find a set of paths for a fleet of agents moving in a shared environment such that the agents reach their goals without colliding with each other. In practice, some of the robots executing the plan may get delayed, which can introduce collision risk. Although robust execution methods are used to ensure safety even in the presence of delays, the delays may still have a significant impact on the duration of the execution. At some point, the accumulated delays may become significant enough that an alternate plan would finish sooner than continuing with the original plan, even if that plan was initially optimal. The problem, however, is deciding when to search for such an alternate plan, since replanning is a costly procedure. In this paper, we propose a holistic architecture for robust execution of MAPF plans, its monitoring and optimization. We exploit a robust execution method called the Action Dependency Graph to maintain an estimate of the expected execution duration during the plan's execution. This estimate is used to predict whether finding an alternate plan would lead to a shorter execution. We empirically evaluate the architecture in experiments in a real-time simulator which we designed to mimic our real-life demonstrator of an autonomous warehouse robotic fleet.
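The Action Dependency Graph behind the duration estimate can be viewed as a weighted DAG in which each action starts only once its predecessors (the same agent's previous action, and ordering dependencies on other agents) have finished. Under that simplified view, a sketch rather than the paper's full monitoring machinery, the expected execution duration is a longest-path computation:

```python
from collections import defaultdict, deque

def expected_makespan(durations, deps):
    """Estimate execution duration from an action-dependency DAG: each action
    starts when all its dependencies finish, so the makespan is the longest
    weighted path. In practice 'durations' would fold in observed delays.

    durations: {action: time}; deps: {action: [prerequisite actions]}."""
    indeg = {a: 0 for a in durations}
    children = defaultdict(list)
    for a, pres in deps.items():
        for p in pres:
            children[p].append(a)
            indeg[a] += 1
    finish = {}
    queue = deque(a for a, d in indeg.items() if d == 0)
    while queue:  # Kahn's topological order over the DAG
        a = queue.popleft()
        start = max((finish[p] for p in deps.get(a, [])), default=0.0)
        finish[a] = start + durations[a]
        for c in children[a]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return max(finish.values())
```

Re-running this as delays are observed keeps the makespan estimate current, which is the quantity the architecture compares against a candidate replan.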
26. DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning
Authors: Xinhong Zhang, Runqing Wang, Yunfan Ren, Jian Sun, Hao Fang, Jie Chen, Gang Wang •
Published: 2025-09-12 •
Source: arXiv
This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully differentiable simulation framework designed for efficient quadrotor control policy learning. DiffAero supports both environment-level and agent-level parallelism and integrates multiple dynamics models, customizable sensor stacks (IMU, depth camera, and LiDAR), and diverse flight tasks within a unified, GPU-native training interface. By fully parallelizing both physics and rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and delivers orders-of-magnitude improvements in simulation throughput. In contrast to existing simulators, DiffAero not only provides high-performance simulation but also serves as a research platform for exploring differentiable and hybrid learning algorithms. Extensive benchmarks and real-world flight experiments demonstrate that DiffAero and hybrid learning algorithms combined can learn robust flight policies in hours on consumer-grade hardware. The code is available at https://github.com/flyingbitac/diffaero.
27. A Certifiable Machine Learning-Based Pipeline to Predict Fatigue Life of Aircraft Structures
Authors: Ángel Ladrón, Miguel Sánchez-Domínguez, Javier Rozalén, Fernando R. Sánchez, Javier de Vicente, Lucas Lacasa, Eusebio Valero, Gonzalo Rubio •
Published: 2025-09-12 •
Source: arXiv
Fatigue life prediction is essential in both the design and operational phases of any aircraft, and safety in the aerospace industry requires early detection of fatigue cracks to prevent in-flight failures. Robust and precise fatigue life predictors are thus essential to ensure safety. Traditional engineering methods, while reliable, are time-consuming and involve complex workflows, including steps such as conducting several Finite Element Method (FEM) simulations, deriving the expected loading spectrum, and applying cycle counting techniques like peak-valley or rainflow counting. These steps often require collaboration between multiple teams and tools, in addition to the computational time and effort required to achieve fatigue life predictions. Machine learning (ML) offers a promising complement to traditional fatigue life estimation methods, enabling faster iterations and generalization, and providing quick estimates that guide decisions alongside conventional simulations. In this paper, we present an ML-based pipeline that estimates the fatigue life of different aircraft wing locations given the flight parameters of the different missions that the aircraft will operate throughout its operational life. We validate the pipeline in a realistic use case of fatigue life estimation, yielding accurate predictions alongside a thorough statistical validation and uncertainty quantification. Our pipeline complements traditional methodologies by reducing the number of costly simulations and, thereby, lowering the required computational and human resources.
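Two of the traditional workflow steps mentioned above, cycle counting and damage accumulation, are compact enough to sketch. The snippet below performs the peak-valley reduction of a load history and then applies Miner's rule with an assumed Basquin-type S-N curve; the constants `C` and `m` are illustrative placeholders, not aircraft data, and a production workflow would use full rainflow counting instead:

```python
def turning_points(signal):
    """Reduce a load history to its peaks and valleys (the 'peak-valley'
    step of traditional cycle counting)."""
    pts = [signal[0]]
    for x in signal[1:]:
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x          # same direction: extend current excursion
        elif x != pts[-1]:
            pts.append(x)        # direction reversal: new turning point
    return pts

def miner_damage(ranges, C=1e12, m=3.0):
    """Accumulate damage via Miner's rule, D = sum(n_i / N_i), with an
    assumed Basquin S-N curve N(S) = C * S**(-m). Failure when D reaches 1."""
    return sum((s ** m) / C for s in ranges)
```

Given the turning points, stress ranges are the absolute differences between consecutive points, and the predicted life scales as 1/D per repetition of the spectrum.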
28. Virtual Agent Economies
Authors: Nenad Tomasev, Matija Franklin, Joel Z. Leibo, Julian Jacobs, William A. Cunningham, Iason Gabriel, Simon Osindero •
Published: 2025-09-12 •
Source: arXiv
The rapid adoption of autonomous AI agents is giving rise to a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. We propose the "sandbox economy" as a framework for analyzing this emergent system, characterizing it along two key dimensions: its origins (emergent vs. intentional) and its degree of separateness from the established human economy (permeable vs. impermeable). Our current trajectory points toward a spontaneous emergence of a vast and highly permeable AI agent economy, presenting us with opportunities for an unprecedented degree of coordination as well as significant challenges, including systemic economic risk and exacerbated inequality. Here we discuss a number of possible design choices that may lead to safely steerable AI agent markets. In particular, we consider auction mechanisms for fair resource allocation and preference resolution, the design of AI "mission economies" to coordinate around achieving collective goals, and socio-technical infrastructure needed to ensure trust, safety, and accountability. By doing this, we argue for the proactive design of steerable agent markets to ensure the coming technological shift aligns with humanity's long-term collective flourishing.
29. Evolution of Coordination Through Institutional Incentives: An Evolutionary Game Theory Approach
Authors: Ndidi Bianca Ogbo, Zhao Song, The Anh Han •
Published: 2025-09-12 •
Source: arXiv
There is a broad recognition that commitment-based mechanisms can promote coordination and cooperative behaviours in both biological populations and self-organised multi-agent systems by making individuals' intentions explicit prior to engagement. Yet their effectiveness depends on sustained compliance supported by institutions, especially in one-off interactions. Despite advances in quantitative studies of cooperation and commitment, most applied analyses and policy debates remain largely qualitative, with limited attention to the allocation of scarce institutional resources between enhancing participation and ensuring commitment compliance. Herein, we develop an evolutionary game-theoretic model that explicitly examines the strategic distribution of a limited budget for institutional incentives, namely rewards or punishments, aimed at these two critical objectives within pre-commitment frameworks. Our findings reveal that a reward-based incentive approach consistently yields greater coordination success than a punishment-based approach, with optimal outcomes arising when resources are appropriately distributed between participation promotion and compliance assurance. These findings offer novel insights for designing institutional incentives to promote broad, coordinated adoption of new technologies.
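Evolutionary game-theoretic models of this kind are typically built on replicator dynamics, in which a strategy's population share grows in proportion to its payoff advantage over the average. A minimal two-strategy sketch of that machinery is below; the payoff matrix is a generic stand-in supplied by the caller, not the paper's commitment game with institutional budgets:

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of two-strategy replicator dynamics,
    dx/dt = x * (1 - x) * (f_C - f_D), the standard machinery behind models
    like this one (payoffs here are caller-supplied, not the paper's).

    x: fraction playing strategy C; payoff: 2x2 matrix [[CC, CD], [DC, DD]]."""
    f_c = payoff[0][0] * x + payoff[0][1] * (1 - x)   # payoff to C vs mix
    f_d = payoff[1][0] * x + payoff[1][1] * (1 - x)   # payoff to D vs mix
    return x + dt * x * (1 - x) * (f_c - f_d)
```

Sweeping the budget split would then amount to making the payoff entries functions of the reward/punishment allocation and iterating this step to its stable fixed point.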
30. Maximising Energy Efficiency in Large-Scale Open RAN: Hybrid xApps and Digital Twin Integration
Authors: Ahmed Al-Tahmeesschi, Yi Chu, Gurdeep Singh, Charles Turyagyenda, Dritan Kaleshi, David Grace, Hamed Ahmadi •
Published: 2025-09-12 •
Source: arXiv
The growing demand for high-speed, ultra-reliable, and low-latency communications in 5G and beyond networks has significantly driven up power consumption, particularly within the Radio Access Network (RAN). This surge in energy demand poses critical operational and sustainability challenges for mobile network operators, necessitating innovative solutions that enhance energy efficiency without compromising Quality of Service (QoS). Open Radio Access Network (O-RAN), spearheaded by the O-RAN Alliance, offers disaggregated, programmable, and intelligent architectures, promoting flexibility, interoperability, and cost-effectiveness. However, this disaggregated approach adds complexity, particularly in managing power consumption across diverse network components such as Open Radio Units (RUs). In this paper, we propose a hybrid xApp leveraging heuristic methods and unsupervised machine learning, integrated with digital twin technology through the TeraVM AI RAN Scenario Generator (AI-RSG). This approach dynamically manages RU sleep modes to effectively reduce energy consumption. Our experimental evaluation in a realistic, large-scale emulated Open RAN scenario demonstrates that the hybrid xApp achieves approximately 13% energy savings, highlighting its practicality and significant potential for real-world deployments without compromising user QoS.
31. Semantic Rate-Distortion Theory with Applications
Authors: Yi-Qun Zhao, Zhi-Ming Ma, Geoffrey Ye Li, Shuai Yuan, Tong Ye, Chuan Zhou •
Published: 2025-09-12 •
Source: arXiv
Artificial intelligence (AI) is ushering in a new era for communication. As a result, establishing a semantic communication framework is now on the agenda. Based on a realistic semantic communication model, this paper develops a rate-distortion framework for semantic compression. Unlike existing works that focus primarily on decoder-side estimation of intrinsic meaning and ignore its inherent issues, such as ambiguity and polysemy, we exploit a constraint of conditional semantic probability distortion to effectively capture the essential features of practical semantic exchanges in an AI-assisted communication system. With the help of methods from rate-distortion-perception theory, we establish a theorem specifying the minimum achievable rate under this semantic constraint and a traditional symbolic constraint, and obtain its closed-form limit for a particular semantic scenario. Experiments in this paper show that bounding the conditional semantic probability distortion effectively improves both semantic transmission accuracy and bit-rate efficiency. Our framework bridges information theory and AI, enabling potential applications in bandwidth-efficient semantic-aware networks, enhanced transceiver understanding, and optimized semantic transmission for AI-driven systems.
32. Data-driven optimization of sparse sensor placement in thermal hydraulic experiments
Authors: Xicheng Wang, Yun Feng, Dmitry Grishchenko, Pavel Kudinov, Ruifeng Tian, Sichao Tan •
Published: 2025-09-12 •
Source: arXiv
Thermal-Hydraulic (TH) experiments provide valuable insight into the physics of heat and mass transfer and qualified data for code development, calibration and validation. However, measurements are typically collected from sparsely distributed sensors, offering limited coverage of the domain and phenomena of interest. Determining the spatial configuration of these sensors is crucial and challenging during the pre-test design stage. This paper develops a data-driven framework for optimizing sensor placement in TH experiments, including (i) a sensitivity analysis to construct datasets, (ii) Proper Orthogonal Decomposition (POD) for dimensionality reduction, and (iii) QR factorization with column pivoting to determine the optimal sensor configuration under spatial constraints. The framework is demonstrated on a test conducted in the TALL-3D Lead-Bismuth Eutectic (LBE) loop, where optical techniques such as Particle Image Velocimetry (PIV) are impractical, so the quantification of momentum and energy transport relies heavily on readings from thermocouples (TCs). The test section was previously instrumented with many TCs placed through a manual process combining simulation results with expert judgement. The proposed framework provides a systematic and automated approach for sensor placement. The resulting TCs exhibit high sensitivity to the variation of uncertain input parameters and enable accurate full-field reconstruction while maintaining robustness against measurement noise.
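Steps (ii) and (iii) of the framework are standard enough to sketch concretely: compute POD modes from snapshot data, then run column-pivoted QR on the mode matrix to pick measurement points. The greedy pivoting below is a plain QDEIM-style re-implementation, not the authors' code:

```python
import numpy as np

def optimal_sensors(snapshots, r):
    """POD + greedy column-pivoted QR sensor selection: keep the r leading
    POD modes, then pick the r spatial points that keep the sampled mode
    matrix well conditioned (Businger-Golub pivoting, done by hand).

    snapshots: (n_points, n_samples) data matrix. Returns (indices, modes)."""
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    modes = u[:, :r]                    # POD basis (dimensionality reduction)
    residual = modes.T.copy()           # r x n_points; pivot on its columns
    sensors = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(residual, axis=0)))
        sensors.append(j)
        q = residual[:, j] / np.linalg.norm(residual[:, j])
        residual -= np.outer(q, q @ residual)    # deflate chosen direction
    return sensors, modes

def reconstruct(modes, sensors, measurements):
    """Full-field estimate from sparse readings: solve modes[sensors] a = y."""
    a, *_ = np.linalg.lstsq(modes[sensors, :], measurements, rcond=None)
    return modes @ a
```

When the data are exactly low-rank, readings at the selected points reconstruct the full field; with noisy data, the pivoting keeps the sampled mode matrix well conditioned, which is what gives the robustness against measurement noise that the abstract reports.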
33. Space-Time Tradeoffs for Spatial Conjunctive Queries
Authors: Aryan Esmailpour, Xiao Hu, Stavros Sintos •
Published: 2025-09-12 •
Source: arXiv
Given a conjunctive query and a database instance, we aim to develop an index that can efficiently answer spatial queries on the results of a conjunctive query. We are interested in some commonly used spatial queries, such as range emptiness, range count, and nearest neighbor queries. These queries have essential applications in data analytics, such as filtering relational data based on attribute ranges and temporal graph analysis for counting graph structures like stars, paths, and cliques. Furthermore, this line of research can accelerate relational algorithms that incorporate spatial queries in their workflow, such as relational clustering. Known approaches either have to spend $\tilde{O}(N)$ query time or use space as large as the number of query results, which are inefficient or unrealistic to employ in practice. Hence, we aim to construct an index that answers spatial conjunctive queries in both time- and space-efficient ways. In this paper, we establish lower bounds on the tradeoff between answering time and space usage. For $k$-star (resp. $k$-path) queries, we show that any index for range emptiness, range counting or nearest neighbor queries with $T$ answering time requires $\Omega\left(N+\frac{N^k}{T^k}\right)$ (resp. $\Omega\left(N+\frac{N^2}{T^{2/(k-1)}}\right)$) space. Then, we construct optimal indexes for answering range emptiness and range counting problems over $k$-star and $k$-path queries. Extending this result, we build an index for hierarchical queries. By resorting to the generalized hypertree decomposition, we can extend our index to arbitrary conjunctive queries for supporting spatial conjunctive queries. Finally, we show how our new indexes can be used to improve the running time of known algorithms in the relational setting.
34. LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA
Authors: Jing Huang, Zhiya Tan, Shutao Gong, Fanwei Zeng, Jianshu Li •
Published: 2025-09-12 •
Source: arXiv
As large vision language models (VLMs) advance, their capabilities in multilingual visual question answering (mVQA) have significantly improved. Chain-of-thought (CoT) reasoning has been proven to enhance interpretability and complex reasoning. However, most existing approaches rely primarily on textual CoT and provide limited support for multilingual multimodal reasoning, constraining their deployment in real-world applications. To address this gap, we introduce \textbf{LaV-CoT}, the first Language-aware Visual CoT framework with Multi-Aspect Reward Optimization. LaV-CoT incorporates an interpretable multi-stage reasoning pipeline consisting of Text Summary with Bounding Box (BBox), Language Identification, Spatial Object-level Captioning, and Step-by-step Logical Reasoning. Following this reasoning pipeline, we design an automated data curation method that generates multilingual CoT annotations through iterative generation, correction, and refinement, enabling scalable and high-quality training data. To improve reasoning and generalization, LaV-CoT adopts a two-stage training paradigm combining Supervised Fine-Tuning (SFT) with Language-aware Group Relative Policy Optimization (GRPO), guided by verifiable multi-aspect rewards including language consistency, structural accuracy, and semantic alignment. Extensive evaluations on public datasets including MMMB, Multilingual MMBench, and MTVQA show that LaV-CoT achieves up to \(\sim\)9.5\% accuracy improvements over open-source baselines of similar size and even surpasses models with 2$\times$ larger scales by \(\sim\)2.6\%. Moreover, LaV-CoT outperforms advanced proprietary models such as GPT-4o-0513 and Gemini-2.5-flash. We further conducted an online A/B test to validate our method on real-world data, highlighting its effectiveness for industrial deployment. Our code is available at https://github.com/HJNVR/LaV-CoT.
35. A Framework for AI-Supported Mediation in Community-based Online Collaboration
Authors: Soobin Cho, Mark Zachry, David W. McDonald •
Published: 2025-09-12 •
Source: arXiv
Online spaces involve diverse communities engaging in various forms of collaboration, which naturally give rise to discussions, some of which inevitably escalate into conflict or disputes. To address such situations, AI has primarily been used for moderation. While moderation systems are important because they help maintain order, common moderation strategies of removing or suppressing content and users rarely address the underlying disagreements or the substantive content of disputes. Mediation, by contrast, fosters understanding, reduces emotional tension, and facilitates consensus through guided negotiation. Mediation not only enhances the quality of collaborative decisions but also strengthens relationships among group members. For this reason, we argue for shifting focus toward AI-supported mediation. In this work, we propose an information-focused framework for AI-supported mediation designed for community-based collaboration. Within this framework, we hypothesize that AI must acquire and reason over three key types of information: content, culture, and people.