1. Observation of Metal-Insulator and Spectral Phase Transitions in Aubry-André-Harper Models
Authors: Quan Lin, Christopher Cedzich, Qi Zhou, Peng Xue •
Published: 2025-08-11 •
Source: arXiv
Non-Hermitian extensions of the Aubry-André-Harper (AAH) model reveal a rich variety of phase transitions arising from the interplay of quasiperiodicity and non-Hermiticity. Despite their theoretical significance, experimental explorations remain challenging due to complexities in realizing controlled non-Hermiticity. Here, we present the first experimental realization of the unitary almost-Mathieu operator (UAMO), which simulates the AAH model by employing single-photon quantum walks. Through precise control of quasiperiodicity, we systematically explore the phase diagram displaying a phase transition between localized and delocalized regimes in the Hermitian limit. Subsequently, by introducing non-reciprocal hopping, we experimentally probe the parity-time (PT) symmetry-breaking transition that is characterized by the emergence of complex quasienergies. Moreover, we identify a novel spectral transition exclusive to discrete-time settings, where all quasienergies become purely imaginary. Both transitions are connected to changes in the spectral winding number, demonstrating their topological origins. These results clarify the interplay between localization, symmetry breaking, and topology in non-Hermitian quasicrystals, paving the way for future exploration of synthetic quantum matter.
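For orientation, the Hermitian AAH Hamiltonian that the UAMO simulates is conventionally written as below; this is the standard textbook form rather than a formula quoted from the paper, and the non-reciprocal extension makes the left- and right-hopping amplitudes unequal.

```latex
% Standard (Hermitian) Aubry-Andre-Harper model; for irrational alpha the
% localization-delocalization transition occurs at lambda = J (self-duality point).
\[
  H_{\mathrm{AAH}} \;=\; J \sum_{n} \left( |n\rangle\langle n+1| + |n+1\rangle\langle n| \right)
  \;+\; 2\lambda \sum_{n} \cos\!\left( 2\pi\alpha n + \phi \right) |n\rangle\langle n| .
\]
```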
2. Sensitivity toward dark matter annihilation imprints on 21-cm signal with SKA-Low: A convolutional neural network approach
Authors: Pravin Kumar Natwariya, Kenji Kadota, Atsushi J. Nishizawa •
Published: 2025-08-11 •
Source: arXiv
This study investigates the sensitivity of radio interferometers to identify imprints of spatially inhomogeneous dark matter annihilation signatures in the 21-cm signal during the pre-reionization era. We focus on the upcoming low-mode survey of the Square Kilometre Array (SKA-Low) telescope. Using convolutional neural networks (CNNs), we analyze simulated 3D 21-cm differential brightness temperature maps generated via the DM21cm code, which is based on 21cmFAST and DarkHistory, to distinguish between spatially homogeneous and inhomogeneous energy injection/deposition scenarios arising from dark matter annihilation. The inhomogeneous case accounts for local dark matter density contrasts and gas properties, such as thermal and ionization states, while the homogeneous model assumes uniform energy deposition. Our study focuses on two primary annihilation channels, to electron-positron pairs ($e^+e^-$) and photons ($\gamma \gamma$), exploring dark matter masses from 1 MeV to 100 MeV and a range of annihilation cross-sections. For the $\gamma \gamma$ channel, the distinction across dark matter models is less pronounced due to the larger mean free path of the emitted photons, resulting in a more uniform energy deposition. For the $e^+e^-$ channel, the results indicate that the CNNs can effectively differentiate between the inhomogeneous and homogeneous cases. Despite observational challenges, the results demonstrate that these effects remain detectable even after incorporating noise from next-generation radio interferometers, such as the SKA. We find that the inhomogeneous dark matter annihilation models can leave measurable imprints on the 21-cm signal maps, distinguishable from the homogeneous scenarios for a dark matter mass of $m_{\rm DM}=1$ MeV and annihilation cross-sections of $\geq 5 \times 10^{-30}~{\rm cm^3/sec}$ ($\geq 5 \times 10^{-29}~{\rm cm^3/sec}$ for $m_{\rm DM}=100$ MeV) for moderate SKA-Low noise.
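To make the classification setup concrete, a minimal sketch of a 3D CNN that labels brightness-temperature cubes as homogeneous vs. inhomogeneous energy deposition is shown below. The architecture, input shape, and channel counts are generic assumptions, not the network used in the paper.

```python
# Minimal sketch: generic 3D CNN binary classifier for 21-cm cubes.
import torch
import torch.nn as nn

class CubeClassifier(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),   # global pooling -> (B, 32, 1, 1, 1)
        )
        self.head = nn.Linear(32, 2)   # two classes: homogeneous / inhomogeneous

    def forward(self, x):              # x: (B, 1, D, H, W) brightness-temperature cube
        z = self.features(x).flatten(1)
        return self.head(z)

# Usage: logits = CubeClassifier()(torch.randn(4, 1, 32, 32, 32))
```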
3. StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
Authors: Shuyuan Tu, Yueming Pan, Yinming Huang, Xintong Han, Zhen Xing, Qi Dai, Chong Luo, Zuxuan Wu, Yu-Gang Jiang •
Published: 2025-08-11 •
Source: arXiv
Current diffusion models for audio-driven avatar video generation struggle to synthesize long videos with natural audio synchronization and identity consistency. This paper presents StableAvatar, the first end-to-end video diffusion transformer that synthesizes infinite-length high-quality videos without post-processing. Conditioned on a reference image and audio, StableAvatar integrates tailored training and inference modules to enable infinite-length video generation. We observe that the main reason preventing existing models from generating long videos lies in their audio modeling. They typically rely on third-party off-the-shelf extractors to obtain audio embeddings, which are then directly injected into the diffusion model via cross-attention. Since current diffusion backbones lack any audio-related priors, this approach causes severe latent distribution error accumulation across video clips, leading the latent distribution of subsequent segments to drift away from the optimal distribution gradually. To address this, StableAvatar introduces a novel Time-step-aware Audio Adapter that prevents error accumulation via time-step-aware modulation. During inference, we propose a novel Audio Native Guidance Mechanism to further enhance the audio synchronization by leveraging the diffusion's own evolving joint audio-latent prediction as a dynamic guidance signal. To enhance the smoothness of the infinite-length videos, we introduce a Dynamic Weighted Sliding-window Strategy that fuses latent over time. Experiments on benchmarks show the effectiveness of StableAvatar both qualitatively and quantitatively.
4. Identifying nonequilibrium degrees of freedom in high-dimensional stochastic systems
Authors: Catherine Ji, Ravin Raj, Benjamin Eysenbach, Gautam Reddy •
Published: 2025-08-11 •
Source: arXiv
Any coarse-grained description of a nonequilibrium system should faithfully represent its latent irreversible degrees of freedom. However, standard dimensionality reduction methods typically prioritize accurate reconstruction over physical relevance. Here, we introduce a model-free approach to identify irreversible degrees of freedom in stochastic systems that are in a nonequilibrium steady state. Our method leverages the insight that a black-box classifier, trained to differentiate between forward and time-reversed trajectories, implicitly estimates the local entropy production rate. By parameterizing this classifier as a quadratic form of learned state representations, we obtain nonlinear embeddings of high-dimensional state-space dynamics, which we term Latent Embeddings of Nonequilibrium Systems (LENS). LENS effectively identifies low-dimensional irreversible flows and provides a scalable, learning-based strategy for estimating entropy production rates directly from high-dimensional time series data.
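The core idea lends itself to a short sketch: a classifier distinguishing forward from time-reversed transitions whose logit is a quadratic form of learned embeddings. The antisymmetric parameterization below is one plausible reading of the abstract, with layer sizes chosen arbitrarily; it is not the authors' implementation.

```python
# Sketch of the LENS idea: an antisymmetric quadratic-form classifier whose
# trained logit acts as a local entropy-production estimate.
import torch
import torch.nn as nn

class LENSClassifier(nn.Module):
    def __init__(self, x_dim, z_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.W = nn.Parameter(torch.randn(z_dim, z_dim) * 0.01)

    def logit(self, x_t, x_tp1):
        # Antisymmetric quadratic form: flips sign under time reversal of the pair.
        z_t, z_tp1 = self.encoder(x_t), self.encoder(x_tp1)
        return torch.einsum('bi,ij,bj->b', z_t, self.W, z_tp1) \
             - torch.einsum('bi,ij,bj->b', z_tp1, self.W, z_t)

    def loss(self, x_t, x_tp1):
        # Logistic loss on forward transitions; by antisymmetry, time-reversed
        # pairs are automatically pushed toward the opposite label.
        return torch.nn.functional.softplus(-self.logit(x_t, x_tp1)).mean()
```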
5. Symmetry-Enriched Topological Phases and Their Gauging: A String-Net Model Realization
Authors: Nianrui Fu, Yu Zhao, Yidun Wan •
Published: 2025-08-11 •
Source: arXiv
We present a systematic framework for constructing exactly-solvable lattice models of symmetry-enriched topological (SET) phases based on an enlarged version of the string-net model. We also gauge the global symmetries of our SET models to obtain string-net models of pure topological phases. Without invoking externally imposed onsite symmetry actions, our approach promotes the string-net model of a pure topological order, specified by an input unitary fusion category $\mathscr{F}$, to an SET model, specified by a multifusion category together with a set of isomorphisms. Two complementary construction strategies are developed in the main text: (i) promotion via outer automorphisms of $\mathscr{F}$ and (ii) promotion via the Frobenius algebras of $\mathscr{F}$. The global symmetries derived via these two strategies are intrinsic to topological phases and are thus termed blood symmetries, as opposed to adopted symmetries, which can be arbitrarily imposed on topological phases. We propose the concept of symmetry-gauging family of topological phases, which are related by gauging their blood symmetries. With our approach, we construct the first explicit lattice realization of a nonabelian-symmetry-enriched topological phase -- the $S_3$ symmetry-enriched $\mathbb{Z}_2 \times \mathbb{Z}_2$ quantum-double phase. The approach further reveals the role of local excitations in SET phases and establishes their symmetry constraints.
6. Bringing Everyone to the Table: An Experimental Study of LLM-Facilitated Group Decision Making
Authors: Mohammed Alsobay, David M. Rothschild, Jake M. Hofman, Daniel G. Goldstein •
Published: 2025-08-11 •
Source: arXiv
Group decision-making often suffers from uneven information sharing, hindering decision quality. While large language models (LLMs) have been widely studied as aids for individuals, their potential to support groups of users, potentially as facilitators, is relatively underexplored. We present a pre-registered randomized experiment with 1,475 participants assigned to 281 five-person groups completing a hidden profile task--selecting an optimal city for a hypothetical sporting event--under one of four facilitation conditions: no facilitation, a one-time message prompting information sharing, a human facilitator, or an LLM (GPT-4o) facilitator. We find that LLM facilitation increases information shared within a discussion by raising the minimum level of engagement with the task among group members, and that these gains come at limited cost in terms of participants' attitudes towards the task, their group, or their facilitator. Whether by human or AI, there is no significant effect of facilitation on the final decision outcome, suggesting that even substantial but partial increases in information sharing are insufficient to overcome the hidden profile effect studied. To support further research into how LLM-based interfaces can support the future of collaborative decision making, we release our experimental platform, the Group-AI Interaction Laboratory (GRAIL), as an open-source tool.
7. ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks
Authors: Kaijun Wang, Liqin Lu, Mingyu Liu, Jianuo Jiang, Zeju Li, Bolin Zhang, Wancai Zheng, Xinyi Yu, Hao Chen, Chunhua Shen •
Published: 2025-08-11 •
Source: arXiv
Language-guided long-horizon mobile manipulation has long been a grand challenge in embodied semantic reasoning, generalizable manipulation, and adaptive locomotion. Three fundamental limitations hinder progress: First, although large language models have improved spatial reasoning and task planning through semantic priors, existing implementations remain confined to tabletop scenarios, failing to address the constrained perception and limited actuation ranges of mobile platforms. Second, current manipulation strategies exhibit insufficient generalization when confronted with the diverse object configurations encountered in open-world environments. Third, while crucial for practical deployment, the dual requirement of maintaining high platform maneuverability alongside precise end-effector control in unstructured settings remains understudied. In this work, we present ODYSSEY, a unified mobile manipulation framework for agile quadruped robots equipped with manipulators, which seamlessly integrates high-level task planning with low-level whole-body control. To address the challenge of egocentric perception in language-conditioned tasks, we introduce a hierarchical planner powered by a vision-language model, enabling long-horizon instruction decomposition and precise action execution. At the control level, our novel whole-body policy achieves robust coordination across challenging terrains. We further present the first benchmark for long-horizon mobile manipulation, evaluating diverse indoor and outdoor scenarios. Through successful sim-to-real transfer, we demonstrate the system's generalization and robustness in real-world deployments, underscoring the practicality of legged manipulators in unstructured environments. Our work advances the feasibility of generalized robotic assistants capable of complex, dynamic tasks. Our project page: https://kaijwang.github.io/odyssey.github.io/
8. SAGOnline: Segment Any Gaussians Online
Authors: Wentao Sun, Quanyun Wu, Hanqing Xu, Kyle Gao, Zhengsen Xu, Yiping Chen, Dedong Zhang, Lingfei Ma, John S. Zelek, Jonathan Li •
Published: 2025-08-11 •
Source: arXiv
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation, yet achieving efficient and consistent 3D segmentation remains challenging. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes that addresses these limitations through two key innovations: (1) a decoupled strategy that integrates video foundation models (e.g., SAM2) for view-consistent 2D mask propagation across synthesized views; and (2) a GPU-accelerated 3D mask generation and Gaussian-level instance labeling algorithm that assigns unique identifiers to 3D primitives, enabling lossless multi-object tracking and segmentation across views. SAGOnline achieves state-of-the-art performance on NVOS (92.7% mIoU) and Spin-NeRF (95.2% mIoU) benchmarks, outperforming Feature3DGS, OmniSeg3D-gs, and SA3D by 15--1500 times in inference speed (27 ms/frame). Qualitative results demonstrate robust multi-object segmentation and tracking in complex scenes. Our contributions include: (i) a lightweight and zero-shot framework for 3D segmentation in Gaussian scenes, (ii) explicit labeling of Gaussian primitives enabling simultaneous segmentation and tracking, and (iii) the effective adaptation of 2D video foundation models to the 3D domain. This work allows real-time rendering and 3D scene understanding, paving the way for practical AR/VR and robotic applications.
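A generic way to picture Gaussian-level instance labeling is to project each Gaussian center into segmented views and vote on the 2D mask IDs it lands on. The sketch below is this generic back-projection/voting scheme for illustration only, not the paper's GPU-accelerated algorithm.

```python
# Illustrative sketch: assign an instance ID to each 3D Gaussian by majority
# vote over 2D instance masks in multiple views (not SAGOnline's actual code).
import numpy as np

def label_gaussians(centers, cameras, masks, num_ids):
    """centers: (N, 3); cameras: list of 3x4 projection matrices;
    masks: list of (H, W) integer instance-ID maps (0 = background)."""
    votes = np.zeros((centers.shape[0], num_ids + 1), dtype=np.int64)
    homog = np.hstack([centers, np.ones((centers.shape[0], 1))])      # (N, 4)
    for P, mask in zip(cameras, masks):
        uvw = homog @ P.T                                             # (N, 3)
        uv = (uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-8, None)).round().astype(int)
        h, w = mask.shape
        valid = (uvw[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        ids = mask[uv[valid, 1], uv[valid, 0]]
        votes[np.flatnonzero(valid), ids] += 1
    return votes.argmax(axis=1)   # per-Gaussian instance label (0 = unassigned)
```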
9. Autonomous Air-Ground Vehicle Operations Optimization in Hazardous Environments: A Multi-Armed Bandit Approach
Authors: Jimin Choi, Max Z. Li •
Published: 2025-08-11 •
Source: arXiv
Hazardous environments such as chemical spills, radiological zones, and bio-contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings is often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision-making framework for autonomous vehicle dispatch in hazardous environments with uncertain and evolving risk levels. The system integrates a Bayesian Upper Confidence Bound (BUCB) sensing strategy with task-specific vehicle routing problems with profits (VRPP), enabling adaptive coordination of unmanned aerial vehicles (UAVs) for hazard sensing and unmanned ground vehicles (UGVs) for cleaning. Using VRPP allows selective site visits under resource constraints by assigning each site a visit value that reflects sensing or cleaning priorities. Site-level hazard beliefs are maintained through a time-weighted Bayesian update. BUCB scores guide UAV routing to balance exploration and exploitation under uncertainty, while UGV routes are optimized to maximize expected hazard reduction under resource constraints. Simulation results demonstrate that our framework reduces the number of dispatch cycles to resolve hazards by around 30% on average compared to baseline dispatch strategies, underscoring the value of uncertainty-aware vehicle dispatch for reliable hazard mitigation.
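The sensing-side logic can be sketched compactly: a time-weighted Bayesian belief per site plus a Bayesian-UCB score used to prioritize UAV visits. The Gaussian belief model, discount factor, and exploration weight below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of time-weighted Bayesian belief updates and BUCB scoring per site.
import numpy as np

class SiteBelief:
    def __init__(self, prior_mean=0.5, prior_var=1.0, discount=0.9, obs_var=0.05):
        self.mean, self.var = prior_mean, prior_var
        self.discount, self.obs_var = discount, obs_var

    def update(self, observation):
        # Time-weighted update: inflate variance to discount stale information,
        # then apply a standard Gaussian (Kalman-style) measurement update.
        self.var = self.var / self.discount
        k = self.var / (self.var + self.obs_var)
        self.mean += k * (observation - self.mean)
        self.var *= (1.0 - k)

    def bucb_score(self, beta=2.0):
        # Exploration bonus proportional to posterior uncertainty.
        return self.mean + beta * np.sqrt(self.var)

# UAV routing would then favor high-score sites subject to VRPP constraints.
```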
10. Average Contraction Coefficients of Quantum Channels
Authors: Ruben Ibarrondo, Daniel Stilck França •
Published: 2025-08-11 •
Source: arXiv
The data-processing inequality ensures quantum channels reduce state distinguishability, with contraction coefficients quantifying optimal bounds. However, these can be overly optimistic and not representative of the usual behavior. We study how noise contracts the distinguishability of 'typical' states, beyond the worst case. To that end, we introduce and study a family of moments of contraction for quantum divergences, which interpolate between the worst-case contraction coefficient of a channel and its average behavior under a chosen ensemble of input states. We establish general properties of these moments, relate moments for different divergences, and derive bounds in terms of channel parameters like the entropy or purity of its Choi state. Focusing on the trace distance, we obtain upper and lower bounds on its average contraction under tensor-product noise channels, and prove that, depending on the local noise strength, there is a phase transition in the limit of many channel uses: below a critical error rate the average contraction remains near unity, whereas above it decays exponentially with system size. We extend these phase-transition phenomena to random quantum circuits with unital noise, showing that constant-depth noisy circuits do not shrink the trace distance on average, even when given highly entangled states as input. In contrast, even at $\log\log n$ depth, the average trace distance can become superpolynomially small. Finally, we explore moments of contraction for $f$-divergences and discuss applications to local differential privacy, demonstrating that noise regimes ensuring privacy can render outputs essentially indistinguishable on average. Thus, our results provide a fine-grained framework to quantify typical channel noise in quantum information and computation and unveil new phenomena in contraction coefficients, such as phase transitions for average contraction.
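One natural shape for such a moment, consistent with the interpolation described in the abstract but not necessarily the paper's exact definition, is the following.

```latex
% A plausible general form of a k-th moment of contraction for a divergence D,
% channel N, and input ensemble mu (illustrative; the paper's definition may differ):
\[
  \eta_k(\mathcal{N}, \mu) \;=\;
  \left(
    \mathbb{E}_{(\rho,\sigma)\sim\mu}
    \left[
      \left(
        \frac{D\!\left(\mathcal{N}(\rho)\,\middle\|\,\mathcal{N}(\sigma)\right)}
             {D\!\left(\rho\,\middle\|\,\sigma\right)}
      \right)^{\!k}
    \right]
  \right)^{1/k},
\]
% so that k = 1 gives the average contraction under mu, while k -> infinity
% approaches the worst-case contraction coefficient over the support of mu.
```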
11. Color it, Code it, Cancel it: k-local dynamical decoupling from classical additive codes
Authors: Minh T. P. Nguyen, Maximilian Rimbach-Russ, Stefano Bosco •
Published: 2025-08-11 •
Source: arXiv
Dynamical decoupling is a central technique in quantum computing for actively suppressing decoherence and systematic imperfections through sequences of single-qubit operations. Conventional sequences typically aim to completely freeze system dynamics, often resulting in long protocols whose length scales exponentially with system size. In this work, we introduce a general framework for constructing time-optimal, selectively-tailored sequences that remove only specific local interactions. By combining techniques from graph coloring and classical coding theory, our approach enables compact and hardware-tailored sequences across diverse qubit platforms, efficiently canceling undesired Hamiltonian terms while preserving target interactions. This opens up broad applications in quantum computing and simulation. At the core of our method is a mapping between dynamical decoupling sequence design and error-detecting codes, which allows us to leverage powerful coding-theoretic tools to construct customized sequences. To overcome exponential overheads, we exploit symmetries in colored interaction hypergraphs, extending graph-coloring strategies to arbitrary many-body Hamiltonians. We demonstrate the effectiveness of our framework through concrete examples, including compact sequences that suppress residual ZZ and ZZZ interactions in superconducting qubits and Heisenberg exchange coupling in spin qubits. We also show how it enables Hamiltonian engineering by simulating the anisotropic Kitaev honeycomb model using only isotropic Heisenberg interactions.
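A toy example in the spirit of the code-based construction: assign each qubit a distinct binary label (a codeword of a small linear, i.e. additive, code) and, at pulse step r, flip qubit q with an X pulse iff the inner product <r, label_q> is 1 (mod 2). In the toggling frame, a Z-type term on support S picks up sign (-1)^{<r, XOR of labels in S>}, so it averages to zero whenever that XOR is nonzero. The script below verifies this cancellation condition numerically; it is a generic illustration, not one of the paper's specific sequences.

```python
# Verify that distinct nonzero 3-bit labels give an 8-step X-pulse sequence
# cancelling all single-Z and ZZ terms on 7 qubits (toy check of the averaging condition).
import itertools
import numpy as np

def average_sign(labels, support, k):
    """Average toggling-frame sign of a Z-type term on `support`
    over all 2^k pulse configurations r in F_2^k."""
    combined = np.bitwise_xor.reduce([labels[q] for q in support])
    signs = []
    for r in range(2 ** k):
        parity = bin(r & combined).count("1") % 2
        signs.append((-1) ** parity)
    return np.mean(signs)

k = 3
labels = [1, 2, 3, 4, 5, 6, 7]   # distinct nonzero 3-bit labels for 7 qubits
supports = list(itertools.combinations(range(7), 1)) + list(itertools.combinations(range(7), 2))
for support in supports:
    assert abs(average_sign(labels, support, k)) < 1e-12
print("8-step sequence cancels all Z and ZZ terms on 7 qubits")
```

Terms whose label-XOR happens to be zero survive the averaging, which is exactly the handle for preserving chosen target interactions while removing the rest.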
12. Tunable edge and depth sensing via phase-change nonlocal metasurfaces
Authors: Kenan Guo, Yue Jiang, Shuyuan Xiao, Tingting Liu •
Published: 2025-08-11 •
Source: arXiv
Performing simultaneous depth-of-field (DoF) extension and edge enhancement within a single optical element remains a fundamental challenge in advanced imaging. Here, we propose a wavelength-tunable nonlocal Huygens' metasurface capable of simultaneously extracting depth and edge features of images in a single-shot exposure. Using the selective polarization response of the Huygens' metasurfaces, the circularly polarized converted component undergoes geometric phase modulation for wavefront shaping to extend the DoF, while the non-converted component acts as a spatial frequency filter to enhance edge contrast. The integration of a phase-change material, Sb$_{2}$S$_{3}$, enables continuous tuning of the resonance wavelength across a range of 100 nm by modulating its refractive index, granting the system excellent broadband spectral adaptability. This work offers a novel and compact solution for real-time depth sensing and feature extraction in applications such as autonomous navigation and biomedical imaging.
13. Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model
Authors: Peiqi He, Zhenhao Zhang, Yixiang Zhang, Xiongjun Zhao, Shaoliang Peng •
Published: 2025-08-11 •
Source: arXiv
Precise spatial modeling in the operating room (OR) is foundational to many clinical tasks, supporting intraoperative awareness, hazard avoidance, and surgical decision-making. While existing approaches leverage large-scale multimodal datasets for latent-space alignment to implicitly learn spatial relationships, they overlook the 3D capabilities of MLLMs. However, this approach raises two issues: (1) Operating rooms typically lack multiple video and audio sensors, making multimodal 3D data difficult to obtain; (2) Training solely on readily available 2D data fails to capture fine-grained details in complex scenes. To address this gap, we introduce Spatial-ORMLLM, the first large vision-language model for 3D spatial reasoning in operating rooms using only RGB modality to infer volumetric and semantic cues, enabling downstream medical tasks with detailed and holistic spatial context. Spatial-ORMLLM incorporates a Spatial-Enhanced Feature Fusion Block, which integrates 2D modality inputs with rich 3D spatial knowledge extracted by the estimation algorithm and then feeds the combined features into the visual tower. By employing a unified end-to-end MLLM framework, it combines powerful spatial features with textual features to deliver robust 3D scene reasoning without any additional expert annotations or sensor inputs. Experiments on multiple benchmark clinical datasets demonstrate that Spatial-ORMLLM achieves state-of-the-art performance and generalizes robustly to previously unseen surgical scenarios and downstream tasks.
14. Differential rotation of solar α-sunspots and implications for stellar light curves
Authors: Emily Joe Lößnitz, Alexander G. M. Pietrow, Hritam Chakraborty, Meetu Verma, Ioannis Kontogiannis, Horst Balthasar, Carsten Denker, Monika Lendl •
Published: 2025-08-11 •
Source: arXiv
Differential rotation is a key driver of magnetic activity and dynamo processes in the Sun and other stars, especially as the rate differs across the solar layers, but also in active regions. We aim to accurately quantify the velocity at which round α-spots traverse the solar disk as a function of their latitude, and compare these rates to those of the quiet-Sun and other sunspot types. We then extend this work to other stars and investigate how differential rotation affects the modulation of stellar light curves by introducing a generalized stellar differential rotation law. We manually identify and track 105 α-sunspots in the 6173 Å continuum using the Helioseismic and Magnetic Imager (HMI) aboard the Solar Dynamics Observatory (SDO). We measure the angular velocities of each spot through center-of-mass and geometric ellipse-fitting methods to derive a differential rotation law for round α-sunspots. Using over a decade of HMI data, we derive a differential rotation law for α-sunspots. When compared to previous measurements, we find that α-sunspots rotate 1.56% faster than the surrounding quiet-Sun, but 1.35% slower than the average sunspot population. This supports the hypothesis that the depth at which flux tubes are anchored influences sunspot motions across the solar disk. We extend this analysis to other stars by introducing a scaling law based on the rotation rates of these stars. This scaling law is implemented into the Stellar Activity Grid for Exoplanets (SAGE) code to illustrate how differential rotation alters the photometric modulation of active stars. Our findings emphasize the necessity of considering differential rotation effects when modeling stellar activity and exoplanet transit signatures.
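Differential rotation laws of this kind are conventionally fit with the functional form below; the coefficients shown are generic placeholders rather than the paper's fitted values.

```latex
% Standard functional form for solar/stellar differential rotation laws:
\[
  \omega(b) \;=\; A + B \sin^{2} b + C \sin^{4} b ,
\]
% where b is the heliographic latitude and A, B, C are fit coefficients; the
% quoted 1.56% / 1.35% offsets correspond to shifts of such a profile relative
% to the quiet-Sun and average-sunspot rotation laws.
```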
15. PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation
Authors: Sihan Zhao, Zixuan Wang, Tianyu Luan, Jia Jia, Wentao Zhu, Jiebo Luo, Junsong Yuan, Nan Xi •
Published: 2025-08-11 •
Source: arXiv
Human motion generation has found widespread applications in AR/VR, film, sports, and medical rehabilitation, offering a cost-effective alternative to traditional motion capture systems. However, evaluating the fidelity of such generated motions is a crucial, multifaceted task. Although previous approaches have attempted motion fidelity evaluation using human perception or physical constraints, there remains an inherent gap between human-perceived fidelity and physical feasibility. Moreover, the subjective and coarse binary labeling of human perception further undermines the development of a robust data-driven metric. We address these issues by introducing a physical labeling method. This method evaluates motion fidelity by calculating the minimum modifications needed for a motion to align with physical laws. With this approach, we are able to produce fine-grained, continuous physical alignment annotations that serve as objective ground truth. With these annotations, we propose PP-Motion, a novel data-driven metric to evaluate both physical and perceptual fidelity of human motion. To effectively capture underlying physical priors, we employ Pearson's correlation loss for the training of our metric. Additionally, by incorporating a human-based perceptual fidelity loss, our metric can capture fidelity that simultaneously considers both human perception and physical alignment. Experimental results demonstrate that our metric, PP-Motion, not only aligns with physical laws but also aligns better with human perception of motion fidelity than previous work.
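A Pearson's-correlation training loss of the kind mentioned above is straightforward to write down; the minimal sketch below pushes predicted fidelity scores to correlate with the continuous physical-alignment annotations, with batching and weighting details left as assumptions.

```python
# Minimal sketch of a Pearson's-correlation loss for metric training.
import torch

def pearson_correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson r between predicted scores and annotation scores (per batch)."""
    pred = pred - pred.mean()
    target = target - target.mean()
    r = (pred * target).sum() / (pred.norm() * target.norm() + eps)
    return 1.0 - r

# Usage: loss = pearson_correlation_loss(model(motions), physical_labels)
```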
16. MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision
Authors: Zhonghao Yan, Muxi Diao, Yuxuan Yang, Jiayuan Xu, Kaizhou Zhang, Ruoyan Jing, Lele Yang, Yanxi Liu, Kongming Liang, Zhanyu Ma •
Published: 2025-08-11 •
Source: arXiv
Accurately grounding regions of interest (ROIs) is critical for diagnosis and treatment planning in medical imaging. While multimodal large language models (MLLMs) combine visual perception with natural language, current medical-grounding pipelines still rely on supervised fine-tuning with explicit spatial hints, making them ill-equipped to handle the implicit queries common in clinical practice. This work makes three core contributions. We first define Unified Medical Reasoning Grounding (UMRG), a novel vision-language task that demands clinical reasoning and pixel-level grounding. Second, we release U-MRG-14K, a dataset of 14K samples featuring pixel-level masks alongside implicit clinical queries and reasoning traces, spanning 10 modalities, 15 super-categories, and 108 specific categories. Finally, we introduce MedReasoner, a modular framework that distinctly separates reasoning from segmentation: an MLLM reasoner is optimized with reinforcement learning, while a frozen segmentation expert converts spatial prompts into masks, with alignment achieved through format and accuracy rewards. MedReasoner achieves state-of-the-art performance on U-MRG-14K and demonstrates strong generalization to unseen clinical queries, underscoring the significant promise of reinforcement learning for interpretable medical grounding.
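The format-plus-accuracy reward structure can be illustrated with a short sketch. The JSON prompt schema and the Dice-based accuracy term below are assumptions for illustration, not MedReasoner's actual reward definition.

```python
# Sketch of a combined format + accuracy reward for an RL-trained MLLM reasoner
# whose spatial prompts are converted into masks by a frozen segmentation expert.
import json
import numpy as np

def format_reward(llm_output: str) -> float:
    try:
        prompt = json.loads(llm_output)
    except json.JSONDecodeError:
        return 0.0
    # Reward emitting a parseable spatial prompt (hypothetical schema).
    return 1.0 if isinstance(prompt, dict) and ({"points", "box"} & prompt.keys()) else 0.0

def accuracy_reward(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    # Dice coefficient between the expert's mask and the reference mask.
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return 2.0 * inter / (pred_mask.sum() + gt_mask.sum() + 1e-8)

def total_reward(llm_output, pred_mask, gt_mask, w_fmt=0.5, w_acc=1.0):
    return w_fmt * format_reward(llm_output) + w_acc * accuracy_reward(pred_mask, gt_mask)
```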
17. CD-TVD: Contrastive Diffusion for 3D Super-Resolution with Scarce High-Resolution Time-Varying Data
Authors: Chongke Bi, Xin Gao, Jiangkang Deng, Guan •
Published: 2025-08-11 •
Source: arXiv
Large-scale scientific simulations require significant resources to generate high-resolution time-varying data (TVD). While super-resolution is an efficient post-processing strategy to reduce costs, existing methods rely on a large amount of high-resolution (HR) training data, limiting their applicability to diverse simulation scenarios. To address this constraint, we propose CD-TVD, a novel framework that combines contrastive learning and an improved diffusion-based super-resolution model to achieve accurate 3D super-resolution from limited time-step high-resolution data. During pre-training on historical simulation data, the contrastive encoder and diffusion super-resolution modules learn degradation patterns and detailed features of high-resolution and low-resolution samples. In the training phase, the improved diffusion model with a local attention mechanism is fine-tuned using only one newly generated high-resolution timestep, leveraging the degradation knowledge learned by the encoder. This design minimizes the reliance on large-scale high-resolution datasets while maintaining the capability to recover fine-grained details. Experimental results on fluid and atmospheric simulation datasets confirm that CD-TVD delivers accurate and resource-efficient 3D super-resolution, marking a significant advancement in data augmentation for large-scale scientific simulations. The code is available at https://github.com/Xin-Gao-private/CD-TVD.
18. PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
Authors: Pedro Orvalho, Marta Kwiatkowska •
Published: 2025-08-11 •
Source: arXiv
Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification. In contrast, programmers working in languages such as C benefit from mature model checkers, for example CBMC, which enable exhaustive symbolic reasoning and fault localisation. The inherent complexity of Python, coupled with the verbosity and low-level nature of existing transpilers (e.g., Cython), has historically limited the applicability of formal verification to Python programs. In this paper, we propose PyVeritas, a novel framework that leverages Large Language Models (LLMs) for high-level transpilation from Python to C, followed by bounded model checking and MaxSAT-based fault localisation in the generated C code. PyVeritas enables verification and bug localisation for Python code using existing model checking tools for C. Our empirical evaluation on two Python benchmarks demonstrates that LLM-based transpilation can achieve a high degree of accuracy, up to 80--90% for some LLMs, enabling an effective development environment that supports assertion-based verification and interpretable fault diagnosis for small yet non-trivial Python programs.
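For a sense of the intended inputs, the snippet below is our own example of the kind of small Python program plus assertion that such a pipeline would transpile to C and discharge with a bounded model checker like CBMC; it is not taken from the paper's benchmarks.

```python
# Example input: a pure function plus an assertion expressing the property to verify.
def clamp(x: int, lo: int, hi: int) -> int:
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def check_clamp(x: int) -> None:
    y = clamp(x, 0, 10)
    # Property the bounded model checker would verify on the transpiled C code,
    # for all x within the explored bound:
    assert 0 <= y <= 10
```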
19. Quantum Circuit Complexity of Matrix-Product Unitaries
Authors: Georgios Styliaris, Rahul Trivedi, J. Ignacio Cirac •
Published: 2025-08-11 •
Source: arXiv
Matrix-product unitaries (MPUs) are many-body unitary operators that, as a consequence of their tensor-network structure, preserve the entanglement area law in 1D systems. However, it is unknown how to implement an MPU as a quantum circuit since the individual tensors describing the MPU are not unitary. In this paper, we show that a large class of MPUs can be implemented with a polynomial-depth quantum circuit. For an $N$-site MPU built from a repeated bulk tensor with open boundary, we explicitly construct a quantum circuit of polynomial depth $T = O(N^{\alpha})$ realizing the MPU, where the constant $\alpha$ depends only on the bulk and boundary tensor and not the system size $N$. We show that this class includes nontrivial unitaries that generate long-range entanglement and, in particular, contains a large class of unitaries constructed from representations of $C^*$-weak Hopf algebras. Furthermore, we also adapt our construction to nonuniform translationally-varying MPUs and show that they can be implemented by a circuit of depth $O(N^{\beta} \, \mathrm{poly}\, D)$ where $\beta \le 1 + \log_2 \sqrt{D}/ s_{\min}$, with $D$ being the bond dimension and $s_{\min}$ is the smallest nonzero Schmidt value of the normalized Choi state corresponding to the MPU.
20. Federated Learning for Epileptic Seizure Prediction Across Heterogeneous EEG Datasets
Authors: Cem Ata Baykara, Saurav Raj Pandey, Ali Burak Ünal, Harlin Lee, Mete Akgün •
Published: 2025-08-11 •
Source: arXiv
Developing accurate and generalizable epileptic seizure prediction models from electroencephalography (EEG) data across multiple clinical sites is hindered by patient privacy regulations and significant data heterogeneity (non-IID characteristics). Federated Learning (FL) offers a privacy-preserving framework for collaborative training, but standard aggregation methods like Federated Averaging (FedAvg) can be biased by dominant datasets in heterogeneous settings. This paper investigates FL for seizure prediction using a single EEG channel across four diverse public datasets (Siena, CHB-MIT, Helsinki, NCH), representing distinct patient populations (adult, pediatric, neonate) and recording conditions. We implement privacy-preserving global normalization and propose a Random Subset Aggregation strategy, where each client trains on a fixed-size random subset of its data per round, ensuring equal contribution during aggregation. Our results show that locally trained models fail to generalize across sites, and standard weighted FedAvg yields highly skewed performance (e.g., 89.0% accuracy on CHB-MIT but only 50.8% on Helsinki and 50.6% on NCH). In contrast, Random Subset Aggregation significantly improves performance on under-represented clients (accuracy increases to 81.7% on Helsinki and 68.7% on NCH) and achieves a superior macro-average accuracy of 77.1% and pooled accuracy of 80.0% across all sites, demonstrating a more robust and fair global model. This work highlights the potential of balanced FL approaches for building effective and generalizable seizure prediction systems in realistic, heterogeneous multi-hospital environments while respecting data privacy.
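The Random Subset Aggregation idea reduces to a small change in the federated round: each client trains on a fixed-size random subset of its data and the server averages updates with equal weights rather than dataset-size weights. The sketch below uses simplified placeholder callables for local training and averaging; it is not the paper's implementation.

```python
# Sketch of one federated round with Random Subset Aggregation.
import random

def run_round(global_weights, clients, subset_size, local_train, average):
    updates = []
    for client in clients:
        subset = random.sample(client["data"], min(subset_size, len(client["data"])))
        updates.append(local_train(global_weights, subset))
    # Equal-weight aggregation: every site contributes the same amount,
    # regardless of how large its local dataset is.
    return average(updates)
```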
21. From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework
Authors: Yunkai Hu, Tianqiao Zhao, Meng Yue •
Published: 2025-08-11 •
Source: arXiv
This paper introduces a novel Large Language Model (LLM)-assisted agent that automatically converts natural-language descriptions of power system optimization scenarios into compact, solver-ready formulations and generates corresponding solutions. In contrast to approaches that rely solely on LLMs to produce solutions directly, the proposed method focuses on discovering a mathematically compatible formulation that can be efficiently solved by off-the-shelf optimization solvers. Directly using LLMs to produce solutions often leads to infeasible or suboptimal results, as these models lack the numerical precision and constraint-handling capabilities of established optimization solvers. The pipeline integrates a domain-aware prompt and schema with an LLM, enforces feasibility through systematic validation and iterative repair, and returns both solver-ready models and user-facing results. Using the unit commitment problem as a representative case study, the agent produces optimal or near-optimal schedules along with the associated objective costs. Results demonstrate that coupling the solver with task-specific validation significantly enhances solution reliability. This work shows that combining AI with established optimization frameworks bridges high-level problem descriptions and executable mathematical models, enabling more efficient decision-making in energy systems.
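The validation-in-the-loop pattern described above can be sketched as a simple control loop. All helper callables in the sketch (call_llm, build_model_from_spec, solve, validate, repair_prompt) are hypothetical placeholders standing in for the agent's components.

```python
# Sketch of an LLM-assisted formulate/solve/validate/repair loop.
def formulate_and_solve(description, call_llm, build_model_from_spec, solve,
                        validate, repair_prompt, max_iters=5):
    spec = call_llm(description)                      # natural language -> schema-guided model spec
    for _ in range(max_iters):
        model = build_model_from_spec(spec)           # spec -> solver-ready formulation
        solution = solve(model)                       # off-the-shelf optimization solver
        issues = validate(model, solution)            # feasibility / consistency checks
        if not issues:
            return model, solution
        spec = call_llm(repair_prompt(spec, issues))  # iterative repair of the formulation
    raise RuntimeError("No feasible formulation found within the iteration budget")
```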
22. Capsizing-Guided Trajectory Optimization for Autonomous Navigation with Rough Terrain
Authors: Wei Zhang, Yinchuan Wang, Wangtao Lu, Pengyu Zhang, Xiang Zhang, Yue Wang, Chaoqun Wang •
Published: 2025-08-11 •
Source: arXiv
It is a challenging task for ground robots to autonomously navigate in harsh environments due to the presence of non-trivial obstacles and uneven terrain. This requires trajectory planning that balances safety and efficiency. The primary challenge is to generate a feasible trajectory that prevents the robot from tipping over while ensuring effective navigation. In this paper, we propose a capsizing-aware trajectory planner (CAP) to achieve trajectory planning on uneven terrain. The tip-over stability of the robot on rough terrain is analyzed. Based on the tip-over stability, we define the traversable orientation, which indicates the safe range of robot orientations. This orientation is then incorporated into a capsizing-safety constraint for trajectory optimization. We employ a graph-based solver to compute a robust and feasible trajectory while adhering to the capsizing-safety constraint. Extensive simulation and real-world experiments validate the effectiveness and robustness of the proposed method. The results demonstrate that CAP outperforms existing state-of-the-art approaches, providing enhanced navigation performance on uneven terrain.
23. $100,000 or the Robot Gets it! Tech Workers' Resistance Guide: Tech Worker Actions, History, Risks, Impacts, and the Case for a Radical Flank
Authors: Mohamed Abdalla •
Published: 2025-08-11 •
Source: arXiv
Over the past decade, Big Tech has faced increasing levels of worker activism. While worker actions have resulted in positive outcomes (e.g., cancellation of Google's Project Dragonfly), such successes have become increasingly infrequent. This is, in part, because corporations have adjusted their strategies for dealing with increased worker activism (e.g., increased retaliation against workers and contract clauses that prevent cancellation due to worker pressure). This change in company strategy prompts urgent questions about updating worker strategies for influencing corporate behavior in an industry with vast societal impact. Current discourse on tech worker activism often lacks empirical grounding regarding its scope, history, and strategic calculus. Our work seeks to bridge this gap by first conducting a systematic analysis of worker actions at Google and Microsoft reported in U.S. newspapers to delineate their characteristics. We then situate these actions within the long history of labour movements and demonstrate that, despite perceptions of radicalism, contemporary tech activism is comparatively moderate. Finally, we engage directly with current and former tech activists to provide a novel catalogue of potential worker actions, evaluating their perceived risks, impacts, and effectiveness (concurrently publishing "Tech Workers' Guide to Resistance"). Our findings highlight considerable variation in strategic thinking among activists themselves. We conclude by arguing that the establishment of a radical flank could increase the effectiveness of current movements. "Tech Workers' Guide to Resistance" can be found at https://www.cs.toronto.edu/~msa/TechWorkersResistanceGuide.pdf or https://doi.org/10.5281/zenodo.16779082
24. Heterogeneity in Entity Matching: A Survey and Experimental Analysis
Authors: Mohammad Hossein Moslemi, Amir Mousavi, Behshid Behkamal, Mostafa Milani •
Published: 2025-08-11 •
Source: arXiv
Entity matching (EM) is a fundamental task in data integration and analytics, essential for identifying records that refer to the same real-world entity across diverse sources. In practice, datasets often differ widely in structure, format, schema, and semantics, creating substantial challenges for EM. We refer to this setting as Heterogeneous EM (HEM). This survey offers a unified perspective on HEM by introducing a taxonomy, grounded in prior work, that distinguishes two primary categories -- representation and semantic heterogeneity -- and their subtypes. The taxonomy provides a systematic lens for understanding how variations in data form and meaning shape the complexity of matching tasks. We then connect this framework to the FAIR principles -- Findability, Accessibility, Interoperability, and Reusability -- demonstrating how they both reveal the challenges of HEM and suggest strategies for mitigating them. Building on this foundation, we critically review recent EM methods, examining their ability to address different heterogeneity types, and conduct targeted experiments on state-of-the-art models to evaluate their robustness and adaptability under semantic heterogeneity. Our analysis uncovers persistent limitations in current approaches and points to promising directions for future research, including multimodal matching, human-in-the-loop workflows, deeper integration with large language models and knowledge graphs, and fairness-aware evaluation in heterogeneous settings.
25. AdaptFlow: Adaptive Workflow Optimization via Meta-Learning
Authors: Runchuan Zhu, Bowen Jiang, Lingrui Mei, Fangkai Yang, Lu Wang, Haoxiang Gao, Fengshuo Bai, Pu Zhao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang •
Published: 2025-08-11 •
Source: arXiv
Recent advances in large language models (LLMs) have sparked growing interest in agentic workflows, which are structured sequences of LLM invocations intended to solve complex tasks. However, existing approaches often rely on static templates or manually designed workflows, which limit adaptability to diverse tasks and hinder scalability. We propose AdaptFlow, a natural language-based meta-learning framework inspired by model-agnostic meta-learning (MAML). AdaptFlow learns a generalizable workflow initialization that enables rapid subtask-level adaptation. It employs a bi-level optimization scheme: the inner loop refines the workflow for a specific subtask using LLM-generated feedback, while the outer loop updates the shared initialization to perform well across tasks. This setup allows AdaptFlow to generalize effectively to unseen tasks by adapting the initialized workflow through language-guided modifications. Evaluated across question answering, code generation, and mathematical reasoning benchmarks, AdaptFlow consistently outperforms both manually crafted and automatically searched baselines, achieving state-of-the-art results with strong generalization across tasks and models. The source code and data are available at https://github.com/microsoft/DKI_LLM/tree/AdaptFlow/AdaptFlow.
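The bi-level scheme can be pictured as a MAML-style loop with LLM-generated textual updates in place of gradients. The helper callables below are hypothetical placeholders, not AdaptFlow's actual interfaces.

```python
# Sketch of a bi-level, language-guided meta-learning loop over workflows.
def adaptflow_meta_train(init_workflow, tasks, adapt, evaluate, merge, inner_steps=3):
    for task in tasks:
        workflow = init_workflow
        for _ in range(inner_steps):
            feedback = evaluate(workflow, task)      # LLM-generated critique on the subtask
            workflow = adapt(workflow, feedback)     # inner loop: subtask-level refinement
        # Outer loop: fold what was learned on this task back into the shared initialization.
        init_workflow = merge(init_workflow, workflow, task)
    return init_workflow
```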
26. Mitigating Biases in Surgical Operating Rooms with Geometry
Authors: Tony Danjun Wang, Tobias Czempiel, Nassir Navab, Lennart Bastian •
Published: 2025-08-11 •
Source: arXiv
Deep neural networks are prone to learning spurious correlations, exploiting dataset-specific artifacts rather than meaningful features for prediction. In surgical operating rooms (OR), these manifest through the standardization of smocks and gowns that obscure robust identifying landmarks, introducing model bias for tasks related to modeling OR personnel. Through gradient-based saliency analysis on two public OR datasets, we reveal that CNN models succumb to such shortcuts, fixating on incidental visual cues such as footwear beneath surgical gowns, distinctive eyewear, or other role-specific identifiers. Avoiding such biases is essential for the next generation of intelligent assistance systems in the OR, which should accurately recognize personalized workflow traits, such as surgical skill level or coordination with other staff members. We address this problem by encoding personnel as 3D point cloud sequences, disentangling identity-relevant shape and motion patterns from appearance-based confounders. Our experiments demonstrate that while RGB and geometric methods achieve comparable performance on datasets with apparent simulation artifacts, RGB models suffer a 12% accuracy drop in realistic clinical settings with decreased visual diversity due to standardizations. This performance gap confirms that geometric representations capture more meaningful biometric features, providing an avenue to developing robust methods of modeling humans in the OR.
27. WideSearch: Benchmarking Agentic Broad Info-Seeking
Authors: Ryan Wong, Jiawei Wang, Junjie Zhao, Li Chen, Yan Gao, Long Zhang, Xuan Zhou, Zuo Wang, Kai Xiang, Ge Zhang, Wenhao Huang, Yang Wang, Ke Wang •
Published: 2025-08-11 •
Source: arXiv
From professional research to everyday planning, many tasks are bottlenecked by wide-scale information seeking, which is more repetitive than cognitively complex. With the rapid development of Large Language Models (LLMs), automated search agents powered by LLMs offer a promising solution to liberate humans from this tedious work. However, the capability of these agents to perform such "wide-context" collection reliably and completely remains largely unevaluated due to a lack of suitable benchmarks. To bridge this gap, we introduce WideSearch, a new benchmark engineered to evaluate agent reliability on these large-scale collection tasks. The benchmark features 200 manually curated questions (100 in English, 100 in Chinese) from over 15 diverse domains, grounded in real user queries. Each task requires agents to collect large-scale atomic information, which can be verified objectively item by item, and arrange it into a well-organized output. A rigorous five-stage quality control pipeline ensures the difficulty, completeness, and verifiability of the dataset. We benchmark over 10 state-of-the-art agentic search systems, including single-agent and multi-agent frameworks, and end-to-end commercial systems. Most systems achieve overall success rates near 0%, with the best performer reaching just 5%. However, given sufficient time, cross-validation by multiple human testers can achieve a near 100% success rate. These results demonstrate that present search agents have critical deficiencies in large-scale information seeking, underscoring urgent areas for future research and development in agentic search. Our dataset, evaluation pipeline, and benchmark results have been publicly released at https://widesearch-seed.github.io/
28. The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility
Authors: Xiantao Zhang •
Published: 2025-08-11 •
Source: arXiv
Multimodal Large Language Models (MLLMs) hold immense promise as assistive technologies for the blind and visually impaired (BVI) community. However, we identify a critical failure mode that undermines their trustworthiness in real-world applications. We introduce the Escalator Problem -- the inability of state-of-the-art models to perceive an escalator's direction of travel -- as a canonical example of a deeper limitation we term Implicit Motion Blindness. This blindness stems from the dominant frame-sampling paradigm in video understanding, which, by treating videos as discrete sequences of static images, fundamentally struggles to perceive continuous, low-signal motion. As a position paper, our contribution is not a new model but rather to: (I) formally articulate this blind spot, (II) analyze its implications for user trust, and (III) issue a call to action. We advocate for a paradigm shift from purely semantic recognition towards robust physical perception and urge the development of new, human-centered benchmarks that prioritize safety, reliability, and the genuine needs of users in dynamic environments.
29. Exploring Procedural Data Generation for Automatic Acoustic Guitar Fingerpicking Transcription
Authors: Sebastian Murgul, Michael Heizmann •
Published: 2025-08-11 •
Source: arXiv
Automatic transcription of acoustic guitar fingerpicking performances remains a challenging task due to the scarcity of labeled training data and legal constraints connected with musical recordings. This work investigates a procedural data generation pipeline as an alternative to real audio recordings for training transcription models. Our approach synthesizes training data through four stages: knowledge-based fingerpicking tablature composition, MIDI performance rendering, physical modeling using an extended Karplus-Strong algorithm, and audio augmentation including reverb and distortion. We train and evaluate a CRNN-based note-tracking model on both real and synthetic datasets, demonstrating that procedural data can be used to achieve reasonable note-tracking results. Finetuning with a small amount of real data further enhances transcription accuracy, improving over models trained exclusively on real recordings. These results highlight the potential of procedurally generated audio for data-scarce music information retrieval tasks.
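For reference, the physical-modeling core named above is the Karplus-Strong algorithm; the sketch below is the textbook form (the paper uses an extended variant with additional augmentation stages).

```python
# Minimal Karplus-Strong plucked-string synthesizer.
import numpy as np

def karplus_strong(frequency, duration, sample_rate=44100, decay=0.996):
    n = int(sample_rate / frequency)              # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, n)         # burst of noise excites the "string"
    out = np.empty(int(duration * sample_rate))
    for i in range(out.size):
        out[i] = buf[i % n]
        # Averaging adjacent samples acts as a low-pass filter; decay damps the string.
        buf[i % n] = decay * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out

# Usage: samples = karplus_strong(196.0, 2.0)  # roughly a G3 pluck, 2 seconds
```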
30. SHIELDA: Structured Handling of Exceptions in LLM-Driven Agentic Workflows
Authors: Jingwen Zhou, Jieshan Chen, Qinghua Lu, Dehai Zhao, Liming Zhu •
Published: 2025-08-11 •
Source: arXiv
Large Language Model (LLM) agentic systems are software systems powered by LLMs that autonomously reason, plan, and execute multi-step workflows to achieve human goals, rather than merely executing predefined steps. During execution, these workflows frequently encounter exceptions. Existing exception handling solutions often treat exceptions superficially, failing to trace execution-phase exceptions to their reasoning-phase root causes. Furthermore, their recovery logic is brittle, lacking structured escalation pathways when initial attempts fail. To tackle these challenges, we first present a comprehensive taxonomy of 36 exception types across 12 agent artifacts. Building on this, we propose SHIELDA (Structured Handling of Exceptions in LLM-Driven Agentic Workflows), a modular runtime exception handling framework for LLM agentic workflows. SHIELDA uses an exception classifier to select a predefined exception handling pattern from a handling pattern registry. These patterns are then executed via a structured handling executor, comprising local handling, flow control, and state recovery, to enable phase-aware recovery by linking exceptions to their root causes and facilitating composable strategies. We validate SHIELDA's effectiveness through a case study on the AutoPR agent, demonstrating effective, cross-phase recovery from a reasoning-induced exception.
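The classifier/registry/executor structure can be summarized with a short sketch. The exception categories and handling patterns below are illustrative placeholders, not SHIELDA's actual taxonomy or APIs.

```python
# Sketch of a handling-pattern registry with structured escalation pathways.
HANDLING_PATTERNS = {
    "tool_call_failure":  ["retry_with_backoff", "switch_tool", "escalate_to_replan"],
    "reasoning_error":    ["re_prompt_with_feedback", "rollback_state", "escalate_to_human"],
    "malformed_artifact": ["repair_format", "regenerate_step", "escalate_to_replan"],
}

def handle_exception(exc_type, context, actions):
    """Try each handler in the registered escalation pathway until one succeeds."""
    for pattern in HANDLING_PATTERNS.get(exc_type, ["escalate_to_human"]):
        handler = actions.get(pattern)
        if handler and handler(context):   # local handling / flow control / state recovery
            return pattern                 # record which strategy resolved the exception
    raise RuntimeError(f"Unrecoverable agent exception: {exc_type}")
```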
31. RIS-Assisted NOMA with Partial CSI and Mutual Coupling: A Machine Learning Approach
Authors: Bile Peng, Karl-Ludwig Besser, Shanpu Shen, Finn Siegismund-Poschmann, Ramprasad Raghunath, Daniel M. Mittleman, Vahid Jamali, Eduard A. Jorswieck •
Published: 2025-08-11 •
Source: arXiv
Non-orthogonal multiple access (NOMA) is a promising multiple access technique. Its performance depends strongly on the wireless channel properties, which can be enhanced by reconfigurable intelligent surfaces (RISs). In this paper, we jointly optimize base station (BS) precoding and RIS configuration with unsupervised machine learning (ML), which searches for the optimal solution autonomously. In particular, we propose a dedicated neural network (NN) architecture, RISnet, inspired by domain knowledge in communication. Compared to the state of the art, the proposed approach combines analytical optimal BS precoding with ML-enabled RIS configuration, scales to controlling more than 1000 RIS elements, has a low requirement for channel state information (CSI) at its input, and addresses the mutual coupling between RIS elements. Beyond the considered problem, this work is an early contribution to domain-knowledge-enabled ML, which exploits the domain expertise of communication systems to design better approaches than general ML methods.
32. Diffusing the Blind Spot: Uterine MRI Synthesis with Diffusion Models
Authors: Johanna P. Müller, Anika Knupfer, Pedro Blöss, Edoardo Berardi Vittur, Bernhard Kainz, Jana Hutter •
Published: 2025-08-11 •
Source: arXiv
Despite significant progress in generative modelling, existing diffusion models often struggle to produce anatomically precise female pelvic images, limiting their application in gynaecological imaging, where data scarcity and patient privacy concerns are critical. To overcome these barriers, we introduce a novel diffusion-based framework for uterine MRI synthesis, integrating both unconditional and conditioned Denoising Diffusion Probabilistic Models (DDPMs) and Latent Diffusion Models (LDMs) in 2D and 3D. Our approach generates anatomically coherent, high fidelity synthetic images that closely mimic real scans and provide valuable resources for training robust diagnostic models. We evaluate generative quality using advanced perceptual and distributional metrics, benchmarking against standard reconstruction methods, and demonstrate substantial gains in diagnostic accuracy on a key classification task. A blinded expert evaluation further validates the clinical realism of our synthetic images. We release our models with privacy safeguards and a comprehensive synthetic uterine MRI dataset to support reproducible research and advance equitable AI in gynaecology.
33. Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
Authors: Bowen Xue, Qixin Yan, Wenjing Wang, Hao Liu, Chen Li •
Published: 2025-08-11 •
Source: arXiv
Generating high-fidelity human videos that match user-specified identities is important yet challenging in the field of generative AI. Existing methods often rely on an excessive number of training parameters and lack compatibility with other AIGC tools. In this paper, we propose Stand-In, a lightweight and plug-and-play framework for identity preservation in video generation. Specifically, we introduce a conditional image branch into the pre-trained video generation model. Identity control is achieved through restricted self-attentions with conditional position mapping, and can be learned quickly with only 2000 pairs. Despite incorporating and training just $\sim$1% additional parameters, our framework achieves excellent results in video quality and identity preservation, outperforming other full-parameter training methods. Moreover, our framework can be seamlessly integrated for other tasks, such as subject-driven video generation, pose-referenced video generation, stylization, and face swapping.
34. Towards Human-AI Collaboration System for the Detection of Invasive Ductal Carcinoma in Histopathology Images
Authors: Shuo Han, Ahmed Karam Eldaly, Solomon Sunday Oyelere •
Published: 2025-08-11 •
Source: arXiv
Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer, and early, accurate diagnosis is critical to improving patient survival rates by guiding treatment decisions. Combining medical expertise with artificial intelligence (AI) holds significant promise for enhancing the precision and efficiency of IDC detection. In this work, we propose a human-in-the-loop (HITL) deep learning system designed to detect IDC in histopathology images. The system begins with an initial diagnosis provided by a high-performance EfficientNetV2S model, offering feedback from AI to the human expert. Medical professionals then review the AI-generated results, correct any misclassified images, and integrate the revised labels into the training dataset, forming a feedback loop from the human back to the AI. This iterative process refines the model's performance over time. The EfficientNetV2S model itself achieves state-of-the-art performance compared to existing methods in the literature, with an overall accuracy of 93.65%. Incorporating the human-in-the-loop system further improves the model's accuracy using four experimental groups with misclassified images. These results demonstrate the potential of this collaborative approach to enhance AI performance in diagnostic systems. This work contributes to advancing automated, efficient, and highly accurate methods for IDC detection through human-AI collaboration, offering a promising direction for future AI-assisted medical diagnostics.
35. SwarmVLM: VLM-Guided Impedance Control for Autonomous Navigation of Heterogeneous Robots in Dynamic Warehousing
Authors: Malaika Zafar, Roohan Ahmed Khan, Faryal Batool, Yasheerah Yaqoot, Ziang Guo, Mikhail Litvinov, Aleksey Fedoseev, Dzmitry Tsetserukou •
Published: 2025-08-11 •
Source: arXiv
With the growing demand for efficient logistics, unmanned aerial vehicles (UAVs) are increasingly being paired with automated guided vehicles (AGVs). While UAVs offer the ability to navigate through dense environments and varying altitudes, they are limited by battery life, payload capacity, and flight duration, necessitating coordinated ground support. Focusing on heterogeneous navigation, SwarmVLM addresses these limitations by enabling semantic collaboration between UAVs and ground robots through impedance control. The system leverages the Vision Language Model (VLM) and the Retrieval-Augmented Generation (RAG) to adjust impedance control parameters in response to environmental changes. In this framework, the UAV acts as a leader using Artificial Potential Field (APF) planning for real-time navigation, while the ground robot follows via virtual impedance links with adaptive link topology to avoid collisions with short obstacles. The system demonstrated a 92% success rate across 12 real-world trials. Under optimal lighting conditions, the VLM-RAG framework achieved 8% accuracy in object detection and selection of impedance parameters. The mobile robot prioritized short obstacle avoidance, occasionally resulting in a lateral deviation of up to 50 cm from the UAV path, which showcases safe navigation in a cluttered setting.