1. Towards Generalizable Safety in Crowd Navigation via Conformal Uncertainty Handling
Authors: Jianpeng Yao, Xiaopan Zhang, Yu Xia, Zejin Wang, Amit K. Roy-Chowdhury, Jiachen Li
Published: 2025-08-07
Source: arXiv
Mobile robots trained with reinforcement learning to navigate in crowds are known to suffer performance degradation when faced with out-of-distribution scenarios. We propose that by properly accounting for the uncertainties of pedestrians, a robot can learn safe navigation policies that are robust to distribution shifts. Our method augments agent observations with prediction uncertainty estimates generated by adaptive conformal inference, and it uses these estimates to guide the agent's behavior through constrained reinforcement learning. The system helps regulate the agent's actions and enables it to adapt to distribution shifts. In the in-distribution setting, our approach achieves a 96.93% success rate, over 8.80% higher than the previous state-of-the-art baselines, with over 3.72 times fewer collisions and 2.43 times fewer intrusions into ground-truth human future trajectories. In three out-of-distribution scenarios, our method shows much stronger robustness when facing distribution shifts in velocity variations, policy changes, and transitions from individual to group dynamics. We deploy our method on a real robot, and experiments show that the robot makes safe and robust decisions when interacting with both sparse and dense crowds. Our code and videos are available at https://gen-safe-nav.github.io/.
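The adaptive conformal inference step that produces the uncertainty estimates can be sketched as an online quantile update in the style of Gibbs and Candès; this is an illustrative sketch with a hypothetical function name and parameters, not the paper's actual implementation:

```python
import numpy as np

def adaptive_conformal_radius(scores, alpha=0.1, gamma=0.05):
    """Online adaptive conformal inference (illustrative sketch).

    scores: list of nonconformity scores observed over time
            (e.g. pedestrian prediction errors).
    Returns one uncertainty radius per step, computed from the
    (1 - alpha_t) quantile of past scores, where alpha_t is adapted
    online based on observed coverage.
    """
    alpha_t = alpha
    radii = []
    for t, s in enumerate(scores):
        past = scores[:t]
        if len(past) == 0:
            radius = float("inf")  # no calibration data yet
        else:
            q = min(1.0, max(0.0, 1.0 - alpha_t))
            radius = float(np.quantile(past, q))
        radii.append(radius)
        err = 1.0 if s > radius else 0.0   # 1 = prediction set missed
        alpha_t += gamma * (alpha - err)   # widen after misses, tighten otherwise
    return radii
```

In a navigation setting, the per-step radius could then be appended to the agent's observation vector; the constrained-RL component is beyond this sketch.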
2. Simulating Human-Like Learning Dynamics with LLM-Empowered Agents
Authors: Yu Yuan, Lili Zhao, Wei Chen, Guangting Zheng, Kai Zhang, Mengdi Zhang, Qi Liu
Published: 2025-08-07
Source: arXiv
Capturing human learning behavior based on deep learning methods has become a major research focus in both psychology and intelligent systems. Recent approaches rely on controlled experiments or rule-based models to explore cognitive processes. However, they struggle to capture learning dynamics, track progress over time, or provide explainability. To address these challenges, we introduce LearnerAgent, a novel multi-agent framework based on Large Language Models (LLMs) to simulate a realistic teaching environment. To explore human-like learning dynamics, we construct learners with psychologically grounded profiles, such as Deep, Surface, and Lazy, as well as a persona-free General Learner to inspect the base LLM's default behavior. Through weekly knowledge acquisition, monthly strategic choices, periodic tests, and peer interaction, we can track the dynamic learning progress of individual learners over a full-year journey. Our findings are fourfold: 1) Longitudinal analysis reveals that only the Deep Learner achieves sustained cognitive growth. Our specially designed "trap questions" effectively diagnose the Surface Learner's shallow knowledge. 2) The behavioral and cognitive patterns of distinct learners align closely with their psychological profiles. 3) Learners' self-concept scores evolve realistically, with the General Learner developing surprisingly high self-efficacy despite its cognitive limitations. 4) Critically, the default profile of the base LLM is a "diligent but brittle Surface Learner": an agent that mimics the behaviors of a good student but lacks true, generalizable understanding. Extensive simulation experiments demonstrate that LearnerAgent aligns well with real scenarios, yielding more insightful findings about LLMs' behavior.
3. The Missing Reward: Active Inference in the Era of Experience
Authors: Bo Wen
Published: 2025-08-07
Source: arXiv
This paper argues that Active Inference (AIF) provides a crucial foundation for developing autonomous AI agents capable of learning from experience without continuous human reward engineering. As AI systems begin to exhaust high-quality training data and rely on increasingly large human workforces for reward design, the current paradigm faces significant scalability challenges that could impede progress toward genuinely autonomous intelligence. The proposal for an "Era of Experience," where agents learn from self-generated data, is a promising step forward. However, this vision still depends on extensive human engineering of reward functions, effectively shifting the bottleneck from data curation to reward curation. This highlights what we identify as the "grounded-agency gap": the inability of contemporary AI systems to autonomously formulate, adapt, and pursue objectives in response to changing circumstances. We propose that AIF can bridge this gap by replacing external reward signals with an intrinsic drive to minimize free energy, allowing agents to naturally balance exploration and exploitation through a unified Bayesian objective. By integrating Large Language Models as generative world models with AIF's principled decision-making framework, we can create agents that learn efficiently from experience while remaining aligned with human values. This synthesis offers a compelling path toward AI systems that can develop autonomously while adhering to both computational and physical constraints.
4. Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
Authors: Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai
Published: 2025-08-07
Source: arXiv
Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, current RL pipelines often suffer from training inefficiencies caused by two underexplored issues: Advantage Collapsing, where most advantages in a batch concentrate near zero, and Rollout Silencing, where the proportion of rollouts contributing non-zero gradients diminishes over time. These issues lead to suboptimal gradient updates and hinder long-term learning efficiency. To address them, we propose Shuffle-R1, a simple yet principled framework that improves RL fine-tuning efficiency by dynamically restructuring trajectory sampling and batch composition. It introduces (1) Pairwise Trajectory Sampling, which selects high-contrast trajectories with large advantages to improve gradient signal quality, and (2) Advantage-based Trajectory Shuffle, which increases exposure of valuable rollouts through informed batch reshuffling. Experiments across multiple reasoning benchmarks show that our framework consistently outperforms strong RL baselines with minimal overhead. These results highlight the importance of data-centric adaptations for more efficient RL training in MLLMs.
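The two data-centric components can be sketched in a few lines; the function names and the exact selection rule here are illustrative assumptions, not Shuffle-R1's actual implementation:

```python
import numpy as np

def pairwise_trajectory_sampling(advantages, k=1):
    """Select k high-contrast trajectory pairs: each pair couples one of
    the most negative advantages with one of the largest positive ones
    (illustrative selection rule, not the paper's exact criterion)."""
    order = np.argsort(advantages)
    return [(int(order[i]), int(order[-(i + 1)])) for i in range(k)]

def advantage_based_shuffle(batch_indices, advantages):
    """Reorder a batch so rollouts with large |advantage| (i.e. strong
    gradient signal) are exposed first."""
    return sorted(batch_indices, key=lambda i: -abs(advantages[i]))
```

A training loop could apply the pair selection per prompt group and the shuffle per batch; both operations are O(n log n) and add negligible overhead.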
5. The discrete periodic Pitman transform: invariances, braid relations, and Burke properties
Authors: Eva R. Engel, Benjamin Jasper Kra-Caskey, Oleksandr Lazorenko, Caio Hermano Maia de Oliveira, Evan Sorensen, Ivan Wong, Ryan Xu, Xinyi Zhang
Published: 2025-08-07
Source: arXiv
We develop the theory of the discrete periodic Pitman transform, first introduced by Corwin, Gu, and the fifth author. We prove that, for polymers in a periodic environment, single-path and multi-path partition functions are preserved under the action of this transform on the weights in the polymer model. As a corollary, we prove that the discrete periodic Pitman transform satisfies the same braid relations that are satisfied for the full-line Pitman transform, shown by Biane, Bougerol, and O'Connell. Combined with a new inhomogeneous Burke property for the periodic Pitman transform, we prove a multi-path invariance result for the periodic inverse-gamma polymer under permutations of the column parameters. In the limit to the full-line case, we obtain a multi-path extension of a recent invariance result of Bates, Emrah, Martin, Seppäläinen, and the fifth author, at both positive and zero temperature. Additionally, we give a combinatorial description of the distribution of the $2$-component jointly invariant measures for a discrete-time Markov chain.
6. Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification
Authors: Thorsten Peinemann, Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Esfandiar Mohammadi
Published: 2025-08-07
Source: arXiv
Backdoor injection attacks are a threat to machine learning models that are trained on large data collected from untrusted sources; these attacks enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task; however, an open question is how much poison data is needed for a successful backdoor attack. Typical attacks either use few samples but require detailed information about the data points, or must poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring-error and without significantly impacting the benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression and linear classification. For adversaries that utilize a direction that is unused by the benign data distribution for the poison sample, we show that the resulting model is functionally equivalent to a model where the poison was excluded from training. We build on prior work on statistical backdoor learning to show that in all other cases, the impact on the benign learning task is still limited. We also validate our theoretical results experimentally with realistic benchmark data sets.
7. WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Authors: Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Ying Zhang, Chen Li, Yali Wang
Published: 2025-08-07
Source: arXiv
The visual tokenizer is a critical component of vision generation. However, existing tokenizers often face an unsatisfactory trade-off between compression ratios and reconstruction fidelity. To fill this gap, we introduce WeTok, a powerful and concise tokenizer which surpasses the previous leading tokenizers via two core innovations. (1) Group-wise lookup-free Quantization (GQ). We partition the latent features into groups, and perform lookup-free quantization for each group. As a result, GQ can efficiently overcome memory and computation limitations of prior tokenizers, while achieving a reconstruction breakthrough with more scalable codebooks. (2) Generative Decoding (GD). Different from prior tokenizers, we introduce a generative decoder with a prior over an extra noise variable. In this case, GD can probabilistically model the distribution of visual data conditioned on discrete tokens, allowing WeTok to reconstruct visual details, especially at high compression ratios. Extensive experiments on mainstream benchmarks show superior performance of our WeTok. On the ImageNet 50k validation set, WeTok achieves a record-low zero-shot rFID (WeTok: 0.12 vs. FLUX-VAE: 0.18 vs. SD-VAE 3.5: 0.19). Furthermore, our highest compression model achieves a zero-shot rFID of 3.49 at a compression ratio of 768, outperforming Cosmos (rFID 4.57 at a compression ratio of 384, only half of ours). Code and models are available: https://github.com/zhuangshaobin/WeTok.
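The group-wise lookup-free quantization idea can be sketched as a sign-based binarization per group, in the spirit of lookup-free quantizers; the function name and details are illustrative assumptions, not WeTok's actual implementation:

```python
import numpy as np

def group_lookup_free_quantize(z, num_groups):
    """Split a latent vector into groups and binarize each dimension to
    +/-1 by sign. Each group's code index is read directly from the bit
    pattern, so no codebook lookup (or codebook storage) is needed."""
    z = np.asarray(z, dtype=float)
    quantized, indices = [], []
    for g in np.split(z, num_groups):
        bits = (g > 0).astype(int)
        quantized.append(np.where(bits, 1.0, -1.0))
        indices.append(int("".join(map(str, bits)), 2))  # bits -> integer code
    return np.concatenate(quantized), indices
```

With d dimensions per group, each group has an implicit codebook of size 2^d, which is why such schemes scale to very large effective codebooks without lookup cost.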
8. MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
Authors: Shaoxiong Zhan, Yanlin Lai, Ziyu Lu, Dahua Lin, Ziqing Yang, Fei Tang
Published: 2025-08-07
Source: arXiv
Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies as soft constraints during rationale generation. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated under autoregressive prompting is used to reflect cognitive complexity, encouraging the creation of more demanding problems aligned with long-chain-of-thought reasoning. Experiments across five benchmarks, categorized as easy & medium (GSM8K, MATH-500) and hard (AIME2024, AIME2025, OlympiadBench), show that MathSmith consistently outperforms existing baselines under both short and long CoT settings. Additionally, a weakness-focused variant generation module enables targeted improvement on specific concepts. Overall, MathSmith exhibits strong scalability, generalization, and transferability, highlighting the promise of high-difficulty synthetic data in advancing LLM reasoning capabilities.
9. Quench dynamics of entanglement entropy under projective charge measurements: the free fermion case
Authors: Riccardo Travaglino, Colin Rylands, Pasquale Calabrese
Published: 2025-08-07
Source: arXiv
We consider the effect of projective measurements on the quench dynamics of the bipartite entanglement entropy in one-dimensional free fermionic systems. In our protocol, we consider projective measurements of a $U(1)$ conserved charge, the particle number, on some large subsystem, and study the entanglement entropies between the same subsystem and its complement. We compare the dynamics emanating from two classes of initial states, one which is an eigenstate of the charge and another which is not. Moreover, we consider the effects of a single measurement as well as multiple measurements performed periodically. Using the quasiparticle picture, we obtain analytic expressions for the behaviour of the entanglement which admit a transparent physical interpretation. In general, we find that measurements introduce two distinct types of corrections to the entanglement, which can be interpreted separately as classical and quantum contributions. The classical contribution is independent of the measurement outcome and scales logarithmically with the variance of the charge distribution. In contrast, the quantum contribution depends on the specific measurement outcome and can be significant for individual realizations; however, it becomes negligible when averaged over all possible outcomes. Our expressions reduce to previously known results for symmetry-resolved entanglement and full counting statistics in some relevant limits, and are confirmed by an exact calculation performed on the Néel initial state.
10. Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models
Authors: Claudia d'Amato, Ivan Diliso, Nicola Fanizzi, Zafar Saeed
Published: 2025-08-07
Source: arXiv
Embedding methods have become popular due to their scalability on link prediction and/or triple classification tasks on Knowledge Graphs. Embedding models are trained relying on both positive and negative samples of triples. However, in the absence of negative assertions, these must usually be artificially generated using various negative sampling strategies, ranging from random corruption to more sophisticated techniques, which have an impact on overall performance. Most of the popular libraries for knowledge graph embedding support only such basic strategies and lack advanced solutions. To address this gap, we deliver an extension for the popular KGE framework PyKEEN that integrates a suite of advanced negative samplers (including both static and dynamic corruption strategies) within a consistent modular architecture, to generate meaningful negative samples while remaining compatible with existing PyKEEN-based workflows and pipelines. The developed extension not only enhances PyKEEN itself but also allows for easier and more comprehensive development of embedding methods and/or their customization. As a proof of concept, we present a comprehensive empirical study of the developed extensions and their impact on the performance (link prediction tasks) of different embedding methods, which also provides useful insights for the design of more effective strategies.
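The baseline that such samplers generalize is random corruption of a triple's head or tail. A minimal sketch of that strategy (illustrative only; this is not PyKEEN's API):

```python
import random

def corrupt_triple(triple, entities, mode="tail"):
    """Basic random-corruption negative sampling for a KG triple (h, r, t):
    replace the head or tail with a different random entity, keeping the
    relation fixed, to produce a presumed-negative triple."""
    h, r, t = triple
    replaced = t if mode == "tail" else h
    candidates = [e for e in entities if e != replaced]
    e_new = random.choice(candidates)
    return (h, r, e_new) if mode == "tail" else (e_new, r, t)
```

Static strategies of this kind ignore the model; the dynamic strategies mentioned in the abstract instead use the current embedding scores to pick harder negatives.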
11. DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition
Authors: Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin
Published: 2025-08-07
Source: arXiv
Open-Vocabulary Multi-Label Recognition (OV-MLR) aims to identify multiple seen and unseen object categories within an image, requiring both precise intra-class localization to pinpoint objects and effective inter-class reasoning to model complex category dependencies. While Vision-Language Pre-training (VLP) models offer a strong open-vocabulary foundation, they often struggle with fine-grained localization under weak supervision and typically fail to explicitly leverage structured relational knowledge beyond basic semantics, limiting performance especially for unseen classes. To overcome these limitations, we propose the Dual Adaptive Refinement Transfer (DART) framework. DART enhances a frozen VLP backbone via two synergistic adaptive modules. For intra-class refinement, an Adaptive Refinement Module (ARM) refines patch features adaptively, coupled with a novel Weakly Supervised Patch Selecting (WPS) loss that enables discriminative localization using only image-level labels. Concurrently, for inter-class transfer, an Adaptive Transfer Module (ATM) leverages a Class Relationship Graph (CRG), constructed using structured knowledge mined from a Large Language Model (LLM), and employs a graph attention network to adaptively transfer relational information between class representations. DART is the first framework, to our knowledge, to explicitly integrate external LLM-derived relational knowledge for adaptive inter-class transfer while simultaneously performing adaptive intra-class refinement under weak supervision for OV-MLR. Extensive experiments on challenging benchmarks demonstrate that our DART achieves new state-of-the-art performance, validating its effectiveness.
12. Research on integrated intelligent energy management system based on big data analysis and machine learning
Authors: Jinzhou Xu, Yadan Zhang, Paola Tapia
Published: 2025-08-07
Source: arXiv
The application of big data is one of the defining features of integrated smart energy. Applying it to the file management of integrated smart energy projects is of great significance for improving the efficiency of project management and control. This article first discusses the benefits and challenges of implementing big data analysis in the document management and control of integrated smart energy projects. In addition, an implementation framework for big data analysis in integrated smart energy project document management is developed, and a method for optimizing the efficiency of such document management through machine learning is proposed. Using the various types of data and information generated during the document management process, we optimized the efficiency of end-to-end project document control with three different machine learning methods. Fitting a penalized linear regression model shows that, given enough training data, the model's accuracy can exceed 95%. By using big data analysis and machine learning to analyze the efficiency of integrated smart energy project document management, it is possible to track project documents across the entire process and optimize business processes, thereby strengthening project construction control and improving project construction efficiency.
13. The Anisotropic Interface Continuum Solvation Model and the Finite-Element Anisotropic Poisson Solver
Authors: Ziwei Chai, Sandra Luber
Published: 2025-08-07
Source: arXiv
We propose an anisotropic interfacial continuum solvation (AICS) model to simulate the distinct in-plane and out-of-plane dielectric constants of liquids near solid-liquid interfaces and their spatial variations along the surface normal direction. In low-electron-density regions, each dielectric function in the diagonal components of a dielectric tensor varies monotonically with distance from the solid surface along the surface normal; in high-electron-density regions near the surface, each dielectric function adopts the electron-density-based formulation proposed by Andreussi et al. (J. Chem. Phys. 136, 064102 (2012)). The resulting dielectric tensor is continuously differentiable with respect to both electron density and spatial coordinates. We derived analytical expressions for electrostatic contributions to the KS potential and forces, and implemented AICS, including these analytical derivatives, into CP2K. To solve the anisotropic Poisson equations, we developed a parallel finite-element anisotropic Poisson solver (FEAPS) based on the FEniCSx platform and its interface with CP2K. Analytical forces were validated against finite-difference calculations, while electrostatic potentials computed under vacuum and isotropic solvent conditions using AICS and FEAPS were benchmarked against standard vacuum DFT and SCCS results, respectively. In the anisotropic solvent environment characterized by the enhanced in-plane and reduced out-of-plane dielectric functions near the Ag(111) surface, we calculated the resulting work functions and electrostatic potentials, and optimized the adsorption geometry for OH. Compared to the isotropic case, we observed more pronounced work function shifts and spatially modulated electrostatic profiles across different charge states. Our results also showed that OH tilted more towards the plane parallel to the surface under the anisotropic dielectric conditions.
14. Negative differential conductance in triangular molecular assemblies
Authors: Chao Li, Vladislav Pokorný, Prokop Hapala, Martin Žonda, Ping Zhou, Silvio Decurtins, Shi-Xia Liu, Fengqi Song, Rémy Pawlak, Ernst Meyer
Published: 2025-08-07
Source: arXiv
We report the creation and characterization of a molecular-scale negative differential conductance (NDC) device by assembling a triangular trimer of 4,5,9,10-tetrabromo-1,3,6,8-tetraazapyrene (TBTAP) molecules on a superconducting Pb(111) substrate. Using low-temperature scanning tunneling spectroscopy, we observe robust NDC behavior, manifesting as a decrease in current with increasing voltage between 0.7 and 0.9 V, arising from the interplay of Coulomb blockade and strong inter-molecular capacitive coupling within the molecular cluster. Gate-controlled charging and discharging processes are directly visualized via two-dimensional differential conductance mapping, which reveals the emergence of Coulomb rings and spatial regions of NDC. Theoretical modeling using a three-impurity Anderson model and master equation approach quantitatively reproduces the experimental observations and demonstrates that the NDC emerges purely from electron correlations, independent of the underlying superconductivity. By tuning the geometry to a hexamer structure, we further show that cluster topology provides versatile control over electronic properties at the molecular scale. These results establish a functional platform for implementing multifunctional molecular devices and highlight a strategy toward programmable and scalable nanoelectronics.
15. Latency Minimization for Multi-AAV-Enabled ISCC Systems with Movable Antenna
Authors: Yiyang Chen, Wenchao Liu, Chunjie Wang, Yinyu Wu, Xuhui Zhang, Yanyan Shen
Published: 2025-08-07
Source: arXiv
This paper investigates an autonomous aerial vehicle (AAV)-enabled integrated sensing, communication, and computation system, with a particular focus on integrating movable antennas (MAs) into the system for enhancing overall system performance. Specifically, multiple MA-enabled AAVs perform sensing tasks and simultaneously transmit the generated computational tasks to the base station for processing. To minimize the maximum latency under the sensing and resource constraints, we formulate an optimization problem that jointly coordinates the positions of the MAs, the computation resource allocation, and the transmit beamforming. Due to the non-convexity of the objective function and strong coupling among variables, we propose a two-layer iterative algorithm leveraging particle swarm optimization and convex optimization to address it. The simulation results demonstrate that the proposed scheme achieves significant latency improvements compared to the baseline schemes.
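The outer layer of such a two-layer scheme can be sketched with a minimal particle swarm optimizer; here the inner convex-optimization step is replaced by a direct objective evaluation, and all names and hyperparameters are illustrative assumptions rather than the paper's algorithm:

```python
import numpy as np

def particle_swarm_minimize(f, dim, n_particles=20, iters=100, seed=0):
    """Minimal PSO for the outer layer (e.g. movable-antenna positions).
    f maps a position vector to the objective (e.g. the max latency
    returned by an inner convex subproblem)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                         # particle velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()         # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + cognitive (personal best) + social (global best) terms
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())
```

In the full scheme, each call to `f` would itself solve the convex resource-allocation and beamforming subproblem for the given antenna positions.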
16. Fairy$\pm i$: the First 2-bit Complex LLM with All Parameters in $\{\pm1, \pm i\}$
Authors: Feiyu Wang, Guoan Wang, Yihao Zhang, Shengfan Wang, Weitao Li, Bokai Huang, Shimao Chen, Zihan Jiang, Rui Xu, Tong Yang
Published: 2025-08-07
Source: arXiv
Quantization-Aware Training (QAT) integrates quantization into the training loop, enabling LLMs to learn robust low-bit representations, and is widely recognized as one of the most promising research directions. All current QAT research focuses on minimizing quantization error relative to full-precision models, where the full-precision accuracy acts as an upper bound (the accuracy ceiling). No existing method has even attempted to surpass this ceiling. To break it, we propose a new paradigm: raising the ceiling (the full-precision model), and then still quantizing it efficiently into 2 bits. We propose Fairy$\pm i$, the first 2-bit quantization framework for complex-valued LLMs. Specifically, our method leverages the representational advantages of the complex domain to boost full-precision accuracy. We map weights to the fourth roots of unity $\{\pm1, \pm i\}$, forming a perfectly symmetric and information-theoretically optimal 2-bit representation. Importantly, each quantized weight has either a zero real or imaginary part, enabling multiplication-free inference using only additions and element swaps. Experimental results show that Fairy$\pm i$ outperforms the ceiling of existing 2-bit quantization approaches in terms of both PPL and downstream tasks, while maintaining strict storage and compute efficiency. This work opens a new direction for building highly accurate and practical LLMs under extremely low-bit constraints.
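The multiplication-free property follows from complex arithmetic: multiplying $a+bi$ by a fourth root of unity only negates and/or swaps the real and imaginary parts. A minimal sketch with a hypothetical helper (not the paper's inference kernel):

```python
def mul_by_root_of_unity(x, w):
    """Multiply complex activation x by a weight in {1, -1, 1j, -1j}
    using only negation and real/imag swaps -- no multiplications."""
    a, b = x.real, x.imag
    if w == 1:
        return complex(a, b)
    if w == -1:
        return complex(-a, -b)
    if w == 1j:
        return complex(-b, a)    # (a + bi) * i = -b + ai
    if w == -1j:
        return complex(b, -a)    # (a + bi) * (-i) = b - ai
    raise ValueError("weight must be a fourth root of unity")
```

A dot product against such 2-bit weights therefore reduces to sign flips, swaps, and additions, which is the source of the claimed compute efficiency.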
17. Numerical analysis of the stochastic Navier-Stokes equations
Authors: Dominic Breit, Andreas Prohl, Jörn Wichman
Published: 2025-08-07
Source: arXiv
The developments over the last five decades concerning numerical discretisations of the incompressible Navier--Stokes equations have led to reliable tools for their approximation: these include stable methods to properly address the incompressibility constraint, stable discretisations to account for convection-dominated problems, efficient time (splitting) methods, and methods to tackle their nonlinear character. While these tools may successfully be applied to reliably simulate even more complex fluid flow PDE models, their understanding requires a fundamental revision in the case of stochastic fluid models, which are gaining increased importance nowadays. This work motivates and surveys optimally convergent numerical methods for the stochastic Stokes and Navier--Stokes equations that were obtained in the last decades. Furthermore, we computationally illustrate the failure of some of those methods from the deterministic setting if they are straightforwardly applied to the stochastic case. In fact, we explain why some of these deterministic methods perform sub-optimally by highlighting crucial analytical differences between the deterministic and stochastic equations, and how modifications of the deterministic methods restore their optimal performance if they properly address the probabilistic nature of the stochastic problem. Alongside the numerical analysis of schemes, we propose a general benchmark of prototypic fluid flow problems driven by different types of noise to also compare new algorithms by simulations in terms of complexities, efficiencies, and possible limitations. The driving motivation is to reach a better comparison of simulations for new schemes in terms of accuracy and complexities, and to also complement theoretical performance studies for restricted settings of data by more realistic ones.
18. Joint parameter estimation and multidimensional reconciliation for CV-QKD
Authors: Jisheng Dai, Xue-Qin Jiang, Peng Huang, Tao Wang, Guihua Zeng
Published: 2025-08-07
Source: arXiv
Accurate quantum channel parameter estimation is essential for effective information reconciliation in continuous-variable quantum key distribution (CV-QKD). However, conventional maximum likelihood (ML) estimators rely on a large amount of discarded data (or pilot symbols), leading to a significant loss in symbol efficiency. Moreover, the separation between the estimation and reconciliation phases can introduce error propagation. In this paper, we propose a novel joint message-passing scheme that unifies channel parameter estimation and information reconciliation within a Bayesian framework. By leveraging the expectation-maximization (EM) algorithm, the proposed method simultaneously estimates unknown parameters during decoding, eliminating the need for separate ML estimation. Furthermore, we introduce a hybrid multidimensional rotation scheme that removes the requirement for norm feedback, significantly reducing classical channel overhead. To the best of our knowledge, this is the first work to unify multidimensional reconciliation and channel parameter estimation in CV-QKD, providing a practical solution for high-efficiency reconciliation with minimal pilots.
19. Do Political Opinions Transfer Between Western Languages? An Analysis of Unaligned and Aligned Multilingual LLMs
Authors: Franziska Weeber, Tanise Ceron, Sebastian Padó
Published: 2025-08-07
Source: arXiv
Public opinion surveys show cross-cultural differences in political opinions between socio-cultural contexts. However, there is no clear evidence whether these differences translate to cross-lingual differences in multilingual large language models (MLLMs). We analyze whether opinions transfer between languages or whether there are separate opinions for each language in MLLMs of various sizes across five Western languages. We evaluate MLLMs' opinions by prompting them to report their (dis)agreement with political statements from voting advice applications. To better understand the interaction between languages in the models, we evaluate them both before and after aligning them with more left or right views using direct preference optimization and English alignment data only. Our findings reveal that unaligned models show only very few significant cross-lingual differences in the political opinions they reflect. The political alignment shifts opinions almost uniformly across all five languages. We conclude that in Western language contexts, political opinions transfer between languages, demonstrating the challenges in achieving explicit socio-linguistic, cultural, and political alignment of MLLMs.
20. Development of PANOSETI Telescopes for Ultra-High-Energy Gamma-Ray Astronomy
Authors: Nikolas Korzoun
Published: 2025-08-07
Source: arXiv
Ultra-High-Energy (UHE, E $>100$ TeV) gamma rays are one of the few channels to search for and study Galactic PeVatrons. Among the most promising PeVatron candidates are the many UHE gamma-ray sources that have recently been identified on the Galactic Plane. Ground-based particle detectors see these sources as extended rather than point-like, and current-generation Imaging Atmospheric Cherenkov Telescopes (IACTs) struggle to study them, as their effective areas and background rejection are suboptimal at UHE. A cost-efficient way of constructing an array of IACTs explicitly designed for UHE sensitivity is to sparsely separate many small telescopes. We have simulated, prototyped, and twice deployed a pathfinder array that is instrumented with telescopes designed by the Panoramic Search for Extraterrestrial Intelligence (PANOSETI) team. These 0.5-meter Fresnel lens telescopes are purpose-built for imaging optical transients on nanosecond timescales and are equipped with a $10^\circ\times10^\circ$ silicon photomultiplier camera. Three PANOSETI telescopes were deployed twice in the same temporary configuration at Lick Observatory in March and October 2024. Here we give a brief description of the instrument and present a comparison of simulations with the data collected, including an analysis of the Crab Nebula. We also report on the ongoing deployment of PANOSETI telescopes for the Dark100 array that is planned to operate for five years at Palomar Observatory.
21. CleanUpBench: Embodied Sweeping and Grasping Benchmark
Authors: Wenbo Li, Guanting Chen, Tao Zhao, Jiyao Wang, Tianxin Hu, Yuwen Liao, Weixiang Guo, Shenghai Yuan •
Published: 2025-08-07 •
Source: arXiv
Embodied AI benchmarks have advanced navigation, manipulation, and reasoning, but most target complex humanoid agents or large-scale simulations that are far from real-world deployment. In contrast, mobile cleaning robots with dual-mode capabilities, such as sweeping and grasping, are rapidly emerging as realistic and commercially viable platforms. However, no benchmark currently exists that systematically evaluates these agents in structured, multi-target cleaning tasks, revealing a critical gap between academic research and real-world applications. We introduce CleanUpBench, a reproducible and extensible benchmark for evaluating embodied agents in realistic indoor cleaning scenarios. Built on NVIDIA Isaac Sim, CleanUpBench simulates a mobile service robot equipped with a sweeping mechanism and a six-degree-of-freedom robotic arm, enabling interaction with heterogeneous objects. The benchmark includes manually designed environments and one procedurally generated layout to assess generalization, along with a comprehensive evaluation suite covering task completion, spatial efficiency, motion quality, and control performance. To support comparative studies, we provide baseline agents based on heuristic strategies and map-based planning. CleanUpBench bridges the gap between low-level skill evaluation and full-scene testing, offering a scalable testbed for grounded, embodied intelligence in everyday settings.
22. AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety
Authors: Adi Levi, Or Levi, Sardhendu Mishra, Jonathan Morra •
Published: 2025-08-07 •
Source: arXiv
As the volume of video content online grows exponentially, the demand for moderation of unsafe videos has surpassed human capabilities, posing both operational and mental health challenges. While recent studies demonstrated the merits of Multimodal Large Language Models (MLLMs) in various video understanding tasks, their application to multimodal content moderation, a domain that requires nuanced understanding of both visual and textual cues, remains relatively underexplored. In this work, we benchmark the capabilities of MLLMs in brand safety classification, a critical subset of content moderation for safeguarding advertising integrity. To this end, we introduce a novel, multimodal and multilingual dataset, meticulously labeled by professional reviewers in a multitude of risk categories. Through a detailed comparative analysis, we demonstrate the effectiveness of MLLMs such as Gemini, GPT, and Llama in multimodal brand safety, and evaluate their accuracy and cost efficiency compared to professional human reviewers. Furthermore, we present an in-depth discussion shedding light on limitations of MLLMs and failure cases. We are releasing our dataset alongside this paper to facilitate future research on effective and responsible brand safety and content moderation.
23. SMOL-MapSeg: Show Me One Label
Authors: Yunshuang Yuan, Frank Thiemann, Thorsten Dahms, Monika Sester •
Published: 2025-08-07 •
Source: arXiv
Historical maps are valuable for studying changes to the Earth's surface. With the rise of deep learning, models like UNet have been used to extract information from these maps through semantic segmentation. Recently, pre-trained foundation models have shown strong performance across domains such as autonomous driving, medical imaging, and industrial inspection. However, they struggle with historical maps. These models are trained on modern or domain-specific images, where patterns can be tied to predefined concepts through common sense or expert knowledge. Historical maps lack such consistency -- similar concepts can appear in vastly different shapes and styles. To address this, we propose On-Need Declarative (OND) knowledge-based prompting, which introduces explicit prompts to guide the model on what patterns correspond to which concepts. This allows users to specify the target concept and pattern during inference (on-need inference). We implement this by replacing the prompt encoder of the foundation model SAM with our OND prompting mechanism and fine-tune it on historical maps. The resulting model is called SMOL-MapSeg (Show Me One Label). Experiments show that SMOL-MapSeg can accurately segment classes defined by OND knowledge. It can also adapt to unseen classes through few-shot fine-tuning. Additionally, it outperforms a UNet-based baseline in average segmentation performance.
24. MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling
Authors: Jifan Gao, Mahmudur Rahman, John Caskey, Madeline Oguss, Ann O'Rourke, Randy Brown, Anne Stey, Anoop Mayampurath, Matthew M. Churpek, Guanhua Chen, Majid Afshar •
Published: 2025-08-07 •
Source: arXiv
Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluating MoMA on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.
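The specialist-aggregator-predictor chain this abstract describes can be sketched as a plain function pipeline. All agent functions below are toy stand-ins for the paper's LLM agents, and the example record and labels are invented for illustration:

```python
# Hedged sketch of a MoMA-style pipeline: specialist agents turn non-text
# modalities into textual summaries, an aggregator fuses them with the
# clinical notes, and a predictor emits a label from the fused summary.

def run_moma(record, specialists, aggregator, predictor):
    """record: dict mapping modality name -> raw data, plus 'notes'."""
    summaries = [
        specialists[m](data) for m, data in record.items()
        if m != "notes" and m in specialists
    ]
    fused = aggregator(record.get("notes", ""), summaries)
    return predictor(fused)

# Toy agents (assumptions, not the paper's actual prompts or models):
specialists = {
    "image": lambda x: f"imaging summary: {x}",
    "labs": lambda x: f"lab summary: {x}",
}
aggregator = lambda notes, s: notes + " | " + "; ".join(s)
predictor = lambda fused: "high-risk" if "abnormal" in fused else "low-risk"

out = run_moma(
    {"notes": "post-op day 2", "image": "abnormal opacity", "labs": "WBC 14"},
    specialists, aggregator, predictor,
)
print(out)
```

The design point is that each stage consumes and produces text, so modalities can be added or swapped by registering another specialist without retraining the downstream agents.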
25. DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Authors: Rui Yu, Xianghang Zhang, Runkai Zhao, Huaicheng Yan, Meng Wang •
Published: 2025-08-07 •
Source: arXiv
End-to-end autonomous driving has recently seen rapid development, exerting a profound influence on both industry and academia. However, existing work places excessive focus on ego-vehicle status as its sole learning objective and lacks planning-oriented understanding, which limits the robustness of the overall decision-making process. In this work, we introduce DistillDrive, an end-to-end knowledge distillation-based autonomous driving model that leverages diversified instance imitation to enhance multi-mode motion feature learning. Specifically, we employ a planning model based on structured scene representations as the teacher model, leveraging its diversified planning instances as multi-objective learning targets for the end-to-end model. Moreover, we incorporate reinforcement learning to enhance the optimization of state-to-decision mappings, while utilizing generative modeling to construct planning-oriented instances, fostering intricate interactions within the latent space. We validate our model on the nuScenes and NAVSIM datasets, achieving a 50% reduction in collision rate and a 3-point improvement in closed-loop performance compared to the baseline model. Code and model are publicly available at https://github.com/YuruiAI/DistillDrive
26. UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation
Authors: Wonjun Kang, Byeongkeun Ahn, Minjae Lee, Kevin Galim, Seunghyuk Oh, Hyung Il Koo, Nam Ik Cho •
Published: 2025-08-07 •
Source: arXiv
Text-to-image (T2I) generation has been actively studied using Diffusion Models and Autoregressive Models. Recently, Masked Generative Transformers have gained attention as an alternative to Autoregressive Models to overcome the inherent limitations of causal attention and autoregressive decoding through bidirectional attention and parallel decoding, enabling efficient and high-quality image generation. However, compositional T2I generation remains challenging, as even state-of-the-art Diffusion Models often fail to accurately bind attributes and achieve proper text-image alignment. While Diffusion Models have been extensively studied for this issue, Masked Generative Transformers exhibit similar limitations but have not been explored in this context. To address this, we propose Unmasking with Contrastive Attention Guidance (UNCAGE), a novel training-free method that improves compositional fidelity by leveraging attention maps to prioritize the unmasking of tokens that clearly represent individual objects. UNCAGE consistently improves performance in both quantitative and qualitative evaluations across multiple benchmarks and metrics, with negligible inference overhead. Our code is available at https://github.com/furiosa-ai/uncage.
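The core scheduling idea here, unmasking first the tokens that most clearly represent individual objects, reduces to a ranking over masked positions. The scores below are made-up stand-ins; in the actual method they would be derived from contrastive attention maps:

```python
# Minimal sketch of attention-guided unmasking order: among still-masked
# token positions, reveal first the ones with the highest object-attention
# score. Scores are illustrative placeholders, not real attention maps.

def unmask_order(attn_scores, masked):
    """Return masked positions sorted by descending object-attention score."""
    return sorted(masked, key=lambda i: attn_scores[i], reverse=True)

# Position 1 attends most clearly to a single object, so it is revealed first.
attn = {0: 0.1, 1: 0.9, 2: 0.4, 3: 0.7}
print(unmask_order(attn, masked=[0, 1, 2, 3]))
```

Because this only reorders the parallel decoding schedule, it is training-free and adds negligible inference overhead, consistent with the claim in the abstract.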
27. Voltage Support Procurement in Transmission Grids: Incentive Design via Online Bilevel Games
Authors: Zhisen Jiang, Saverio Bolognani, Giuseppe Belgioioso •
Published: 2025-08-07 •
Source: arXiv
The integration of distributed energy resources into transmission grid operations presents a complex challenge, particularly in the context of reactive power procurement for voltage support. This paper addresses this challenge by formulating the voltage regulation problem as a Stackelberg game, where the Transmission System Operator (TSO) designs incentives to guide the reactive power responses of Distribution System Operators (DSOs). We utilize a gradient-based iterative algorithm that updates the incentives to ensure that DSOs adjust their reactive power injections to maintain voltage stability. We incorporate principles from online feedback optimization to enable real-time implementation, utilizing voltage measurements in both TSO's and DSOs' policies. This approach not only enhances the robustness against model uncertainties and changing operating conditions but also facilitates the co-design of incentives and automation. Numerical experiments on a 5-bus transmission grid demonstrate the effectiveness of our approach in achieving voltage regulation while accommodating the strategic interactions of self-interested DSOs.
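The feedback structure of the incentive loop can be illustrated with a heavily stylized scalar model: the TSO measures a voltage deviation and nudges an incentive, and each self-interested DSO best-responds to it. The quadratic DSO cost, the single-bus linearized voltage model, and every constant below are illustrative assumptions, not the paper's formulation:

```python
# Stylized sketch of the online bilevel incentive loop: each DSO i
# minimizes effort cost c_i*q_i^2 minus payment p*q_i, giving the best
# response q_i = p / (2*c_i); the TSO updates the incentive p from
# measured voltage error, in the spirit of online feedback optimization.

def dso_response(p, costs):
    return [p / (2.0 * c) for c in costs]

def voltage(q, v0=0.94, gains=(0.02, 0.03)):
    """Linearized bus voltage as a function of reactive injections."""
    return v0 + sum(g * qi for g, qi in zip(gains, q))

def tso_loop(v_ref=1.0, costs=(1.0, 2.0), alpha=5.0, steps=200):
    p = 0.0
    for _ in range(steps):
        q = dso_response(p, costs)   # DSOs react to the current incentive
        err = voltage(q) - v_ref     # measured deviation (feedback signal)
        p -= alpha * err             # gradient-style incentive update
    return p, voltage(dso_response(p, costs))

p_star, v_final = tso_loop()
print(round(v_final, 4))  # converges to the 1.0 p.u. reference
```

Using the measured voltage rather than a model prediction in the update is what gives the feedback scheme its robustness to model uncertainty and drifting operating conditions.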
28. NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making
Authors: Asutosh Hota, Jussi P. P. Jokinen •
Published: 2025-08-07 •
Source: arXiv
Recent advancements in large language models (LLMs) have extended their capabilities from basic text processing to complex reasoning tasks, including legal interpretation, argumentation, and strategic interaction. However, empirical understanding of LLM behavior in open-ended, multi-agent settings, especially those involving deliberation over legal and ethical dilemmas, remains limited. We introduce NomicLaw, a structured multi-agent simulation where LLMs engage in collaborative law-making, responding to complex legal vignettes by proposing rules, justifying them, and voting on peer proposals. We quantitatively measure trust and reciprocity via voting patterns and qualitatively assess how agents use strategic language to justify proposals and influence outcomes. Experiments involving homogeneous and heterogeneous LLM groups demonstrate how agents spontaneously form alliances, betray trust, and adapt their rhetoric to shape collective decisions. Our results highlight the latent social reasoning and persuasive capabilities of ten open-source LLMs and provide insights into the design of future AI systems capable of autonomous negotiation, coordination, and legislative drafting in legal settings.
29. The Term 'Agent' Has Been Diluted Beyond Utility and Requires Redefinition
Authors: Brinnae Bent •
Published: 2025-08-07 •
Source: arXiv
The term 'agent' in artificial intelligence has long carried multiple interpretations across different subfields. Recent developments in AI capabilities, particularly in large language model systems, have amplified this ambiguity, creating significant challenges in research communication, system evaluation and reproducibility, and policy development. This paper argues that the term 'agent' requires redefinition. Drawing from historical analysis and contemporary usage patterns, we propose a framework that defines clear minimum requirements for a system to be considered an agent while characterizing systems along a multidimensional spectrum of environmental interaction, learning and adaptation, autonomy, goal complexity, and temporal coherence. This approach provides precise vocabulary for system description while preserving the term's historically multifaceted nature. After examining potential counterarguments and implementation challenges, we provide specific recommendations for moving forward as a field, including suggestions for terminology standardization and framework adoption. The proposed approach offers practical tools for improving research clarity and reproducibility while supporting more effective policy development.
30. mKG-RAG: Multimodal Knowledge Graph-Enhanced RAG for Visual Question Answering
Authors: Xu Yuan, Liangbo Ning, Wenqi Fan, Qing Li •
Published: 2025-08-07 •
Source: arXiv
Recently, Retrieval-Augmented Generation (RAG) has been proposed to expand internal knowledge of Multimodal Large Language Models (MLLMs) by incorporating external knowledge databases into the generation process, which is widely used for knowledge-based Visual Question Answering (VQA) tasks. Despite impressive advancements, vanilla RAG-based VQA methods that rely on unstructured documents and overlook the structural relationships among knowledge elements frequently introduce irrelevant or misleading content, reducing answer accuracy and reliability. To overcome these challenges, a promising solution is to integrate multimodal knowledge graphs (KGs) into RAG-based VQA frameworks to enhance the generation by introducing structured multimodal knowledge. Therefore, in this paper, we propose a novel multimodal knowledge-augmented generation framework (mKG-RAG) based on multimodal KGs for knowledge-intensive VQA tasks. Specifically, our approach leverages MLLM-powered keyword extraction and vision-text matching to distill semantically consistent and modality-aligned entities/relationships from multimodal documents, constructing high-quality multimodal KGs as structured knowledge representations. In addition, a dual-stage retrieval strategy equipped with a question-aware multimodal retriever is introduced to improve retrieval efficiency while refining precision. Comprehensive experiments demonstrate that our approach significantly outperforms existing methods, setting a new state-of-the-art for knowledge-based VQA.
31. Flow-driven magnetic microcatheter for superselective arterial embolization
Authors: Lucio Pancaldi, Ece Özelçi, Mehdi Ali Gadiri, Julian Raub, Pascal John Mosimann, Mahmut Selman Sakar •
Published: 2025-08-07 •
Source: arXiv
Minimally invasive interventions performed inside brain vessels with the synergistic use of microcatheters pushed over guidewires have revolutionized the way aneurysms, stroke, arteriovenous malformations, brain tumors and other cerebrovascular conditions are being treated. However, a significant portion of the brain vasculature remains inaccessible from within because the conventional catheterization technique based on transmitting forces from the proximal to the distal end of the instruments imposes stringent constraints on their diameter and stiffness. Here we overcome this mechanical barrier by microengineering a new class of ultraminiaturized magnetic microcatheters in the form of an inflatable flat tube, making them extremely flexible and capable of harnessing the kinetic energy of blood flow for endovascular navigation. We introduce a compact and versatile magnetic steering platform that is compatible with conventional bi-plane fluoroscope imaging, and demonstrate for the first time safe and effortless navigation and tracking of hard-to-reach, distal, tortuous arteries that are as small as 180 µm in diameter with a curvature radius as small as 0.69 mm. Furthermore, we demonstrate the superselective infusion of contrast and embolic liquid agents, all in a porcine model. These results pave the way to reach, diagnose, and treat currently inaccessible distal arteries that may be at risk of bleeding or feeding a tumor. Our endovascular technology can also be used to selectively target tissues for drug or gene delivery from within the arteries, not only in the central and peripheral nervous system but almost any other organ system, with improved accuracy, speed and safety.
32. A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents
Authors: Andrew Kiruluta •
Published: 2025-08-07 •
Source: arXiv
We propose a hybrid architecture that integrates decision tree-based symbolic reasoning with the generative capabilities of large language models (LLMs) within a coordinated multi-agent framework. Unlike prior approaches that loosely couple symbolic and neural modules, our design embeds decision trees and random forests as callable oracles within a unified reasoning system. Tree-based modules enable interpretable rule inference and causal logic, while LLM agents handle abductive reasoning, generalization, and interactive planning. A central orchestrator maintains belief state consistency and mediates communication across agents and external tools, enabling reasoning over both structured and unstructured inputs. The system achieves strong performance on reasoning benchmarks. On ProofWriter, it improves entailment consistency by +7.2% through logic-grounded tree validation. On GSM8k, it achieves +5.3% accuracy gains in multistep mathematical problems via symbolic augmentation. On ARC, it boosts abstraction accuracy by +6.0% through integration of symbolic oracles. Applications in clinical decision support and scientific discovery show how the system encodes domain rules symbolically while leveraging LLMs for contextual inference and hypothesis generation. This architecture offers a robust, interpretable, and extensible solution for general-purpose neuro-symbolic reasoning.
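The "trees as callable oracles" idea amounts to an orchestrator that dispatches structured queries to an interpretable rule module and unstructured ones to a generative agent. The hand-written rule tree, the `llm_agent` stub, and the medical toy labels below are all illustrative assumptions:

```python
# Sketch of tree-as-oracle dispatch: structured feature dicts go to a
# symbolic decision tree, free text goes to a (stubbed) LLM agent.

def rule_tree(features):
    """A tiny hand-written decision tree over structured inputs."""
    if features["fever"]:
        return "flu" if features["cough"] else "infection"
    return "healthy"

def llm_agent(text):
    """Placeholder for the generative/abductive reasoning step."""
    return f"[LLM reasoning over: {text!r}]"

def orchestrator(query):
    """Route each query to the symbolic oracle or the neural agent."""
    if isinstance(query, dict):
        return rule_tree(query)
    return llm_agent(query)

print(orchestrator({"fever": True, "cough": True}))   # symbolic path
print(orchestrator("patient reports mild fatigue"))   # neural path
```

The payoff of this split is that every decision taken on the symbolic path can be traced to an explicit rule, while the neural path remains available for inputs the rules do not cover.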
33. A Conceptual Model and Methodology for Sustainability-aware, IoT-enhanced Business Processes
Authors: Victoria Torres Bosch, Ronny Seiger, Manuela Albert Albiol, Antoni Mestre Gascon, Pedro Jose Valderas Aranda •
Published: 2025-08-07 •
Source: arXiv
The real-time data collection and automation capabilities offered by the Internet of Things (IoT) are revolutionizing and transforming Business Processes (BPs) into IoT-enhanced BPs, showing high potential for improving sustainability. Although already studied in Business Process Management (BPM), sustainability research has primarily focused on environmental concerns. However, achieving a holistic and lasting impact requires a systematic approach to address sustainability beyond the environmental dimension. This work proposes a conceptual model and a structured methodology with the goal of analyzing the potential of IoT to measure and improve the sustainability of BPs. The conceptual model formally represents key sustainability concepts, linking BPM and IoT by highlighting how IoT devices support and contribute to sustainability. The methodology guides the systematic analysis of existing BPs, identifies opportunities, and implements sustainability-aware, IoT-enhanced BPs. The approach is illustrated through a running example from the tourism domain and a case study in healthcare.
34. Resistance Technologies: Moving Beyond Alternative Designs
Authors: Iness Ben Guirat, Jan Tobias Mühlberg •
Published: 2025-08-07 •
Source: arXiv
The discourse about sustainable technology has emerged from the acknowledgment of the environmental collapse we are facing. In this paper, we argue that addressing this crisis requires more than the development of sustainable alternatives to current online services or the optimization of resources using various dashboards and AI. Rather, the focus must shift toward designing technologies that protect us from the consequences of environmental damage. Among these consequences, wars, genocide and new forms of colonialism are perhaps the most significant. We identify "protection" not in terms of military defense as Western States like to argue, but as part of sovereignty. We seek to define the term "Resistance Technologies" for such technologies, arguing further that anti-surveillance technologies are a foundational component of sovereignty and must be part of future conversations around sustainability. Finally, our paper seeks to open a discourse with the Computing-within-Limits community and beyond, towards defining other essential aspects or concepts of technologies that we see as core values of "Resistance Technology".
35. FAITH: A Framework for Assessing Intrinsic Tabular Hallucinations in finance
Authors: Mengao Zhang, Jiayu Fu, Tanya Warrier, Yuwen Wang, Tianhui Tan, Ke-wei Huang •
Published: 2025-08-07 •
Source: arXiv
Hallucination remains a critical challenge for deploying Large Language Models (LLMs) in finance. Accurate extraction and precise calculation from tabular data are essential for reliable financial analysis, since even minor numerical errors can undermine decision-making and regulatory compliance. Financial applications have unique requirements, often relying on context-dependent, numerical, and proprietary tabular data that existing hallucination benchmarks rarely capture. In this study, we develop a rigorous and scalable framework for evaluating intrinsic hallucinations in financial LLMs, conceptualized as a context-aware masked span prediction task over real-world financial documents. Our main contributions are: (1) a novel, automated dataset creation paradigm using a masking strategy; (2) a new hallucination evaluation dataset derived from S&P 500 annual reports; and (3) a comprehensive evaluation of intrinsic hallucination patterns in state-of-the-art LLMs on financial tabular data. Our work provides a robust methodology for in-house LLM evaluation and serves as a critical step toward building more trustworthy and reliable financial Generative AI systems.
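The masked span prediction task over tabular data can be illustrated with a toy harness: hide one numeric cell in a serialized table row, ask a model to fill it, and flag any non-exact reproduction as an intrinsic hallucination. The row serialization format and the number regex below are illustrative assumptions, not the paper's pipeline:

```python
# Toy sketch of context-aware masked span prediction for tabular text:
# mask one number in a row, then score a model's fill-in by exact match.
import re

MASK = "[MASK]"

def mask_number(row: str, index: int = 0):
    """Replace the index-th number in the row with [MASK]; return both."""
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", row)
    target = numbers[index]
    masked = row.replace(target, MASK, 1)
    return masked, target

def intrinsic_hallucination(prediction: str, target: str) -> bool:
    """Flag any answer that does not exactly reproduce the masked value."""
    return prediction.strip() != target

row = "Revenue 2023: 4,231.7 | Revenue 2022: 3,980.2"
masked, truth = mask_number(row, index=1)  # index 0 would mask the year
print(masked)
print(intrinsic_hallucination("4,231.7", truth))
```

Because the ground truth is recovered mechanically from the source document, datasets built this way need no manual labeling, which is what makes the paradigm automated and scalable.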