1. LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models
Authors: Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu •
Published: 2025-09-18 •
Source: arXiv
Data contamination is now almost inevitable in the development of large language models (LLMs): training corpora often absorb evaluation benchmarks, even unintentionally, which makes it hard to benchmark LLMs fairly. Instead of constructing contamination-free datasets (which is quite hard), we propose a novel framework, \textbf{LNE-Blocking}, to restore a model's pre-contamination performance on potentially leaked datasets. Our framework consists of two components: contamination detection and a disruption operation. For a given prompt, the framework first uses the contamination detection method, \textbf{LNE}, to assess the extent of contamination in the model; based on this, it adjusts the intensity of the disruption operation, \textbf{Blocking}, to elicit non-memorized responses. Our framework is the first to efficiently restore a model's greedy-decoding performance. It performs strongly on multiple datasets with potential leakage risks and consistently achieves stable recovery across different models and varying levels of data contamination. We release the code at https://github.com/RuijieH/LNE-Blocking to facilitate research.
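Conceptually, the detect-then-disrupt loop might look like the toy sketch below, where a contamination score in [0, 1] (the output of a detector such as LNE, assumed given) sets how often the greedy token is suppressed. This is an illustration of the idea only, not the paper's actual LNE or Blocking rule:

```python
import numpy as np

def blocked_decode_step(logits, contamination_score, rng):
    """Illustrative sketch (not the paper's exact LNE or Blocking rule):
    with probability tied to the detected contamination level, block the
    greedy token so the model must produce a non-memorized continuation."""
    logits = np.asarray(logits, dtype=float).copy()
    if rng.uniform() < contamination_score:   # higher score => more disruption
        logits[np.argmax(logits)] = -np.inf   # forbid the memorized choice
    return int(np.argmax(logits))
```

With score 0 the step reduces to plain greedy decoding; with score 1 it always falls back to the runner-up token.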
2. Generalizable Geometric Image Caption Synthesis
Authors: Yue Xin, Wenyuan Wang, Rui Pan, Ruida Wang, Howard Meng, Renjie Pi, Shizhe Diao, Tong Zhang •
Published: 2025-09-18 •
Source: arXiv
Multimodal large language models have various practical applications that demand strong reasoning abilities. Despite recent advancements, these models still struggle to solve complex geometric problems. A key challenge stems from the lack of high-quality image-text pair datasets for understanding geometric images. Furthermore, most template-based data synthesis pipelines typically fail to generalize to questions beyond their predefined templates. In this paper, we bridge this gap by introducing a complementary process of Reinforcement Learning with Verifiable Rewards (RLVR) into the data generation pipeline. By adopting RLVR to refine captions for geometric images synthesized from 50 basic geometric relations and using reward signals derived from mathematical problem-solving tasks, our pipeline successfully captures the key features of geometry problem-solving. This enables better task generalization and yields non-trivial improvements. Furthermore, even in out-of-distribution scenarios, the generated dataset enhances the general reasoning capabilities of multimodal large language models, yielding accuracy improvements of $2.8\%\text{-}4.8\%$ in statistics, arithmetic, algebraic, and numerical tasks with non-geometric input images of MathVista and MathVerse, along with $2.4\%\text{-}3.9\%$ improvements in Art, Design, Tech, and Engineering tasks in MMMU.
3. FlowRL: Matching Reward Distributions for LLM Reasoning
Authors: Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin •
Published: 2025-09-18 •
Source: arXiv
We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (\eg, PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of $10.0\%$ over GRPO and $5.1\%$ over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward distribution-matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
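The distribution-matching objective can be illustrated in a few lines. Here the learnable partition function is replaced by a plain softmax over a sampled group, and `beta` is an assumed temperature; this is a sketch of the idea, not the paper's trainable implementation:

```python
import numpy as np

def reverse_kl_loss(policy_logprobs, rewards, beta=1.0):
    """Monte-Carlo sketch of reward-distribution matching: turn scalar
    rewards into a normalized target via a softmax (a stand-in for the
    learnable partition function), then estimate the reverse KL
    divergence KL(pi || target) over a group of sampled responses."""
    rewards = np.asarray(rewards, dtype=float)
    # target distribution over the sampled group (softmax of rewards)
    target_logprobs = rewards / beta - np.log(np.sum(np.exp(rewards / beta)))
    # normalize policy probabilities over the same group for comparison
    pi = np.exp(np.asarray(policy_logprobs, dtype=float))
    pi = pi / pi.sum()
    # reverse KL(pi || target), estimated on the group
    return float(np.sum(pi * (np.log(pi) - target_logprobs)))
```

Unlike reward maximization, this loss is already zero when the policy spreads mass in proportion to the rewards, which is how the objective preserves less frequent but valid reasoning paths.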
4. Localized In-Plane Cavity Optomechanics in MEMS
Authors: Sasan Rahmanian •
Published: 2025-09-18 •
Source: arXiv
This study demonstrates the realization of localized in-plane optomechanical microcavities embedded within an electrostatic MEMS architecture. The system consists of a curved, clamped-clamped microbeam fabricated on a silicon-on-insulator (SOI) wafer. A green laser emitted from a Laser Doppler Vibrometer (LDV) is directed perpendicularly onto the device under a vacuum pressure of 7 mTorr, with the beam aligned to fill the gap between the movable microbeam and its adjacent fixed side mirror. This configuration forms localized cavity optomechanical resonators that enable the generation of optomechanical soliton frequency combs through phonon lasing without electrical excitation. The optomechanical resonators' dynamics are examined through experiments and numerical simulations. First, the experimental findings reveal that in electrostatic MEMS structures, the two reflective electrodes positioned to form a capacitive gap can inadvertently form localized cavities. These cavities significantly affect optical readouts, as the photodetected signal encodes contributions from both Doppler-shifted electromagnetic waves and light scattered from the intracavity optical field. These dual contributions can distort the interpretation of the mechanical response unless appropriately filtered. Second, experiments show that optical pumping at various positions along the microbeam induces periodic pulse trains with distinct free spectral ranges (FSRs), each corresponding to a different mechanical mode. Our results demonstrate the generation of solitary optical wavepackets using in-plane localized Fabry-P\'erot microcavities formed within a MEMS device. They suggest a path toward chip-scale soliton frequency comb generators featuring frequency spacing on the order of kilohertz, without relying on integrated fiber optics.
5. Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction
Authors: Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu •
Published: 2025-09-18 •
Source: arXiv
Jailbreak attacks pose persistent threats to large language models (LLMs). Current safety alignment methods have attempted to address these issues, but they suffer from two significant limitations: insufficient safety alignment depth and non-robust internal defense mechanisms. These limitations make them vulnerable to adversarial attacks such as prefilling and refusal direction manipulation. We introduce DeepRefusal, a robust safety alignment framework that overcomes these issues. DeepRefusal forces the model to dynamically rebuild its refusal mechanisms from jailbreak states. This is achieved by probabilistically ablating the refusal direction across layers and token depths during fine-tuning. Our method not only defends against prefilling and refusal direction attacks but also demonstrates strong resilience against other unseen jailbreak strategies. Extensive evaluations on four open-source LLM families and six representative attacks show that DeepRefusal reduces attack success rates by approximately 95%, while maintaining model capabilities with minimal performance degradation.
6. Positive maps and extendibility hierarchies from copositive matrices
Authors: Aabhas Gulati, Ion Nechita, Sang-Jun Park •
Published: 2025-09-18 •
Source: arXiv
The characterization of positive, non-CP linear maps is a central problem in operator algebras and quantum information theory, where such maps serve as entanglement witnesses. This work introduces and systematically studies a new convex cone, PCOP (pairwise copositive matrices). We establish that this cone is dual to the cone of PCP (pairwise completely positive) matrices and, critically, provides a complete characterization of positivity for the broad class of covariant maps. We provide a way to lift matrices from the classical cone of COP to PCOP, thereby creating a powerful bridge between the well-studied theory of copositive forms and the structure of positive maps. We develop an analogous framework for decomposable maps, introducing the cone PDEC. As a primary application of this framework, we define a novel family of linear maps $\Phi_t^G$ parameterized by a graph $G$ and a real parameter $t$. We derive exact thresholds on $t$ that determine when these maps are positive or decomposable, linking these properties to fundamental graph-theoretic parameters. This construction yields vast new families of positive indecomposable maps, for which we provide explicit examples derived from infinite classes of graphs, most notably rank 3 strongly regular graphs such as Paley graphs. On the dual side, we investigate the entanglement properties of large classes of symmetric states, such as the Dicke states. We prove that the sum-of-squares (SOS) hierarchies used in polynomial optimization to approximate the cone of copositive matrices correspond precisely to dual cones of witnesses for different levels of the PPT bosonic extendibility hierarchy. Leveraging this duality, we provide an explicit construction of bipartite (mixtures of) Dicke states that are simultaneously entangled and $\mathcal{K}_r$-PPT bosonic extendible for any desired hierarchy level $r \geq 2$ and local dimension $n \geq 5$.
7. Measuring the Two-Dimensional Thermal Structures of Protoplanetary Disks
Authors: Anna J. Fehr, Sean M. Andrews •
Published: 2025-09-18 •
Source: arXiv
We present a flexible, annulus-by-annulus method to constrain the 2-D thermal structure of a protoplanetary disk from optically thick spectral line emission. Using synthetic disk models with a known temperature and density structure, we extracted the vertical emission surfaces and brightness temperatures in radial annuli for multiple CO isotopologue transitions and used them to infer the vertical temperature profiles. This approach reliably recovers the injected temperature structure despite noise and finite resolution. We demonstrated that even a modest set of emission lines can constrain the temperature across a wide range of radii and elevations. Nevertheless, biases in the extracted emission surfaces constitute a major source of systematic error. Finally, we applied this method to archival ALMA observations of the HD 163296 disk, revealing that simple parametric radial temperature models may obscure the complexity of real disks and that additional observations are necessary to distinguish between different models of the vertical structure. This flexible framework can be readily applied to other systems, helping to characterize the thermal environments that shape planet formation.
8. Orion: Fuzzing Workflow Automation
Authors: Max Bazalii, Marius Fleischer •
Published: 2025-09-18 •
Source: arXiv
Fuzz testing is one of the most effective techniques for finding software vulnerabilities. While modern fuzzers can generate inputs and monitor executions automatically, the overall workflow, from analyzing a codebase, to configuring harnesses, to triaging results, still requires substantial manual effort. Prior attempts focused on single stages such as harness synthesis or input minimization, leaving researchers to manually connect the pieces into a complete fuzzing campaign. We introduce Orion, a framework that automates the manual bottlenecks of fuzzing by integrating LLM reasoning with traditional tools, allowing campaigns to scale to settings where human effort alone was impractical. Orion uses LLMs for code reasoning and semantic guidance, while relying on deterministic tools for verification, iterative refinement, and tasks that require precision. Across our benchmark suite, Orion reduces human effort by 46-204x depending on the workflow stage, and we demonstrate its effectiveness through the discovery of two previously unknown vulnerabilities in the widely used open-source clib library.
9. Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Authors: Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu •
Published: 2025-09-18 •
Source: arXiv
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing label-free methods (confidence minimization, self-consistency, or majority-vote objectives) stabilize learning but steadily shrink exploration, causing an entropy collapse: generations become shorter, less diverse, and brittle. Unlike prior approaches such as Test-Time Reinforcement Learning (TTRL), which primarily adapt models to the immediate unlabeled dataset at hand, our goal is broader: to enable general improvements without sacrificing the model's inherent exploration capacity and generalization ability, i.e., evolving. We formalize this issue and propose EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL), a simple rule that couples stability with variation in a label-free setting. EVOL-RL keeps the majority-voted answer as a stable anchor (selection) while adding a novelty-aware reward that favors responses whose reasoning differs from what has already been produced (variation), measured in semantic space. Implemented with GRPO, EVOL-RL also uses asymmetric clipping to preserve strong signals and an entropy regularizer to sustain search. This majority-for-selection + novelty-for-variation design prevents collapse, maintains longer and more informative chains of thought, and improves both pass@1 and pass@n. EVOL-RL consistently outperforms the majority-only TTRL baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from TTRL's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents diversity collapse but also unlocks stronger generalization across domains (e.g., GPQA). Furthermore, we demonstrate that EVOL-RL also boosts performance in the RLVR setting, highlighting its broad applicability.
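The selection-plus-variation reward can be sketched as follows. The novelty term (one minus the maximum cosine similarity to the other responses in the group) and the `novelty_weight` mixing are illustrative choices, not the paper's exact formulation:

```python
import numpy as np
from collections import Counter

def evol_rl_rewards(answers, embeddings, novelty_weight=0.5):
    """Sketch of majority-for-selection + novelty-for-variation.
    `answers`: final answer per sampled response; `embeddings`: semantic
    vectors for each response. Weights and names are illustrative."""
    majority = Counter(answers).most_common(1)[0][0]
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T                      # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)        # ignore self-similarity
    novelty = 1.0 - sims.max(axis=1)       # far from the nearest other response
    anchor = np.array([1.0 if a == majority else 0.0 for a in answers])
    return anchor + novelty_weight * novelty
```

A response that disagrees with the majority but reasons in a semantically distinct way still earns a partial reward, which is the mechanism that keeps exploration alive.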
10. Channel Prediction under Network Distribution Shift Using Continual Learning-based Loss Regularization
Authors: Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Muhammad Ibtsaam Qadir, Muhammad Ali Jamshed, Dean F. Hougen, John M. Cioffi •
Published: 2025-09-18 •
Source: arXiv
Modern wireless networks face critical challenges when mobile users traverse heterogeneous network configurations with varying antenna layouts, carrier frequencies, and scattering statistics. Traditional predictors degrade under distribution shift, with NMSE rising by 37.5\% during cross-configuration handovers. This work addresses catastrophic forgetting in channel prediction by proposing a continual learning framework based on loss regularization. The approach augments standard training objectives with penalty terms that selectively preserve network parameters essential for previous configurations while enabling adaptation to new environments. Two prominent regularization strategies are investigated: Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI). Across 3GPP scenarios and multiple architectures, SI lowers the high-SNR NMSE floor by up to 1.8 dB ($\approx$32--34\%), while EWC achieves up to 1.4 dB ($\approx$17--28\%). Notably, standard EWC incurs $\mathcal{O}(MK)$ complexity (storing $M$ Fisher diagonal entries and corresponding parameter snapshots across $K$ tasks) unless consolidated, whereas SI maintains $\mathcal{O}(M)$ memory complexity (storing $M$ model parameters), independent of task sequence length, making it suitable for resource-constrained wireless infrastructure.
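The loss-regularization idea is easy to make concrete. EWC stores a Fisher diagonal plus a parameter snapshot for each previous task, while SI accumulates a single online importance measure of the same $\mathcal{O}(M)$ shape; a minimal numpy sketch of the EWC penalty term (the strength `lam` is an assumed hyperparameter) is:

```python
import numpy as np

def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """Elastic Weight Consolidation penalty (sketch): a quadratic pull
    toward parameters learned on a previous network configuration,
    weighted by the diagonal Fisher information of that task.
    Total loss = task_loss + ewc_penalty(...)."""
    diff = np.asarray(params) - np.asarray(anchor_params)
    return 0.5 * lam * float(np.sum(np.asarray(fisher_diag) * diff ** 2))
```

Parameters with large Fisher values (important for the old configuration) are held near their snapshot, while low-importance parameters remain free to adapt to the new channel statistics.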
11. MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration
Authors: Giorgos Armeniakos, Alexis Maras, Sotirios Xydis, Dimitrios Soudris •
Published: 2025-09-18 •
Source: arXiv
The evolution of quantization and mixed-precision techniques has unlocked new possibilities for enhancing the speed and energy efficiency of NNs. Several recent studies indicate that adapting precision levels across different parameters can maintain accuracy comparable to full-precision models while significantly reducing computational demands. However, existing embedded microprocessors lack sufficient architectural support for efficiently executing mixed-precision NNs, both in terms of ISA extensions and hardware design, resulting in inefficiencies such as excessive data packing/unpacking and underutilized arithmetic units. In this work, we propose novel ISA extensions and a micro-architecture implementation specifically designed to optimize mixed-precision execution, enabling energy-efficient deep learning inference on RISC-V architectures. We introduce MaRVIn, a cross-layer hardware-software co-design framework that enhances power efficiency and performance through a combination of hardware improvements, mixed-precision quantization, ISA-level optimizations, and cycle-accurate emulation. At the hardware level, we enhance the ALU with configurable mixed-precision arithmetic (2, 4, 8 bits) for weights/activations and employ multi-pumping to reduce execution latency while implementing soft SIMD for efficient 2-bit ops. At the software level, we integrate a pruning-aware fine-tuning method to optimize model compression and a greedy-based DSE approach to efficiently search for Pareto-optimal mixed-quantized models. Additionally, we incorporate voltage scaling to boost the power efficiency of our system. Our experimental evaluation over widely used DNNs and datasets, such as CIFAR10 and ImageNet, demonstrates that our framework can achieve, on average, 17.6x speedup for less than 1% accuracy loss and outperforms the ISA-agnostic state-of-the-art RISC-V cores, delivering up to 1.8 TOPs/W.
12. Conditional Prior-based Non-stationary Channel Estimation Using Accelerated Diffusion Models
Authors: Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Asad Aali, Muhammad Ali Jamshed, Dean F. Hougen, John M. Cioffi •
Published: 2025-09-18 •
Source: arXiv
Wireless channels in motion-rich urban microcell (UMi) settings are non-stationary; mobility and scatterer dynamics shift the distribution over time, degrading classical and deep estimators. This work proposes conditional prior diffusion for channel estimation, which learns a history-conditioned score to denoise noisy channel snapshots. A temporal encoder with cross-time attention compresses a short observation window into a context vector, which captures the channel's instantaneous coherence and steers the denoiser via feature-wise modulation. In inference, an SNR-matched initialization selects the diffusion step whose marginal aligns with the measured input SNR, and the process follows a shortened, geometrically spaced schedule, preserving the signal-to-noise trajectory with far fewer iterations. Temporal self-conditioning with the previous channel estimate and a training-only smoothness penalty further stabilizes evolution without biasing the test-time estimator. Evaluations on a 3GPP benchmark show lower NMSE across all SNRs than LMMSE, GMM, LSTM, and LDAMP baselines, demonstrating stable performance and strong high SNR fidelity.
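The SNR-matched initialization can be sketched directly from a noise schedule. The linear beta schedule and step count below are illustrative assumptions; the idea is to begin denoising at the step whose forward-process marginal SNR matches the measured input SNR, rather than running the full chain:

```python
import numpy as np

def snr_matched_step(input_snr_db, num_steps=1000):
    """Pick the diffusion step whose forward-process marginal SNR,
    alpha_bar_t / (1 - alpha_bar_t), is closest to the measured input
    SNR. The linear beta schedule is an illustrative assumption."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    snr_db = 10.0 * np.log10(alpha_bar / (1.0 - alpha_bar))
    # a shortened, geometrically spaced schedule would then run from the
    # returned step down toward 0 instead of using every step
    return int(np.argmin(np.abs(snr_db - input_snr_db)))
```

Cleaner inputs (higher SNR) map to earlier steps, so fewer denoising iterations are needed, which is the source of the acceleration.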
13. Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
Authors: Zaiquan Yang, Yuhao Liu, Gerhard Hancke, Rynson W. H. Lau •
Published: 2025-09-18 •
Source: arXiv
Spatio-temporal video grounding (STVG) aims at localizing the spatio-temporal tube of a video, as specified by the input text query. In this paper, we utilize multimodal large language models (MLLMs) to explore a zero-shot solution for STVG. We reveal two key insights about MLLMs: (1) MLLMs tend to dynamically assign special tokens, referred to as \textit{grounding tokens}, for grounding the text query; and (2) MLLMs often suffer from suboptimal grounding due to the inability to fully integrate the cues in the text query (\textit{e.g.}, attributes, actions) for inference. Based on these insights, we propose an MLLM-based zero-shot framework for STVG, which includes novel decomposed spatio-temporal highlighting (DSTH) and temporal-augmented assembling (TAS) strategies to unleash the reasoning ability of MLLMs. The DSTH strategy first decouples the original query into attribute and action sub-queries to query the presence of the target both spatially and temporally. It then uses a novel logit-guided re-attention (LRA) module to learn latent variables as spatial and temporal prompts, by regularizing token predictions for each sub-query. These prompts highlight attribute and action cues, respectively, directing the model's attention to reliable spatially and temporally relevant visual regions. In addition, as the spatial grounding by the attribute sub-query should be temporally consistent, we introduce the TAS strategy to assemble the predictions using the original video frames and the temporal-augmented frames as inputs to help improve temporal consistency. We evaluate our method on various MLLMs, and show that it outperforms SOTA methods on three common STVG benchmarks. The code will be available at https://github.com/zaiquanyang/LLaVA_Next_STVG.
14. Monetary Policy and Exchange Rate Fluctuations
Authors: Yongheng Hu •
Published: 2025-09-18 •
Source: arXiv
In this paper, we model USD-CNY bilateral exchange rate fluctuations as a general stochastic process and incorporate a monetary policy shock to examine how bilateral exchange rate fluctuations affect the Revealed Comparative Advantage (RCA) index. Numerical simulations indicate that as the mean of bilateral exchange rate fluctuations increases, i.e., as the currency devalues, the RCA index rises. Moreover, smaller bilateral exchange rate fluctuations after the policy shock cause the RCA index to gradually converge toward its mean level. For the empirical analysis, we select the USD-CNY bilateral exchange rate and provincial manufacturing export competitiveness data in China from 2008 to 2021. We find that in the short term, when exchange rate fluctuations stabilize within a range of less than 0.2, RMB depreciation effectively boosts export competitiveness. The 8.11 exchange rate policy then reversed the previous linear trend of the CNY, stabilizing it within a narrow fluctuation range over the long term. This policy leads to a gradual convergence of provincial RCA indices toward a relatively high level, which is consistent with our numerical simulations, and indirectly enhances provincial export competitiveness.
15. Unveiling TeV halos among unidentified extended TeV sources
Authors: Michela Rigoselli, Sarah Recchia, Alberto Bonollo, Silvia Crestan, Giada Peron, Andrea Giuliani, Sandro Mereghetti •
Published: 2025-09-18 •
Source: arXiv
In recent years, the number of known sources emitting very- and ultra-high-energy gamma-rays has increased significantly thanks to facilities such as LHAASO and HAWC. Many of the observed sources are still unidentified or poorly constrained due to the limited angular resolution of these instruments; however, it is now ascertained that approximately half of them have a pulsar in coincidence. Some of these unidentified extended sources may be the result of the diffusion of leptons accelerated by the pulsar itself or in its nebula to energies exceeding 50 TeV. This new class of sources, called TeV halos, is characterized by a peculiar radial profile that, if properly resolved, is key to distinguishing them from other TeV sources that are associated with a pulsar, such as supernova remnants and pulsar wind nebulae. In this contribution, we consider all the pulsars which are spatially coincident with an unidentified extended TeV source, in order to quantify whether its spin-down power, age and distance allow the pulsar to produce a TeV halo with the observed flux and extension. We also investigate how the next generation of Imaging Atmospheric Cherenkov Telescopes (IACTs), namely the Cherenkov Telescope Array Observatory (CTAO) and the ASTRI Mini-Array, will observe and characterize these TeV halos. We present a set of simulated sources with the expected morphology and spectrum, and we show for which of them we can distinguish between TeV halos and other classes of extended sources.
16. Bayesian inference for spatio-temporal hidden Markov models using the exchange algorithm
Authors: Daniele Tancini, Riccardo Rastelli, Francesco Bartolucci •
Published: 2025-09-18 •
Source: arXiv
Spatio-temporal hidden Markov models are extremely difficult to estimate because their latent joint distributions are available only in trivial cases. In the estimation phase, these latent distributions are usually substituted with pseudo-distributions, which could affect the estimation results, in particular in the presence of strong dependence between the latent variables. In this work, we propose a spatio-temporal hidden Markov model where the latent process is an extension of the autologistic model. We show how inference can be carried out in a Bayesian framework using an approximate exchange algorithm, which circumvents the impractical calculations of the normalizing constants that arise in the model. Our proposed method leads to a Markov chain Monte Carlo sampler that targets the correct posterior distribution of the model and not a pseudo-posterior. In addition, we develop a new initialization approach for the approximate exchange method, reducing the computational time of the algorithm. An extensive simulation study shows that the approximate exchange algorithm generally outperforms the pseudo-distribution approach, yielding more accurate parameter estimates. Finally, the proposed methodology is applied to a real-world case study analyzing rainfall levels across Italian regions over time.
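The exchange move can be sketched on a toy model where the auxiliary draw is exact. Here the unnormalized likelihood is a Gaussian with unknown mean, so the targeted posterior is known in closed form and can be checked; the model, prior, and proposal scale are all illustrative stand-ins for the paper's autologistic latent process:

```python
import numpy as np

def exchange_algorithm(y, n_iter=20000, prop_sd=1.0, seed=0):
    """Exchange algorithm sketch for a doubly intractable posterior.
    Toy model: unnormalized likelihood f(y|theta) = exp(theta*y - y^2/2),
    i.e. N(theta, 1) up to Z(theta); prior theta ~ N(0, 1). An exact
    auxiliary draw y' ~ p(.|theta') stands in for the approximate
    sampler used in practice."""
    rng = np.random.default_rng(seed)
    log_f = lambda y_, th: th * y_ - 0.5 * y_ ** 2   # unnormalized log-lik
    log_prior = lambda th: -0.5 * th ** 2
    theta, draws = 0.0, np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + prop_sd * rng.normal()        # symmetric proposal
        y_aux = prop + rng.normal()                  # y' ~ p(.|theta')
        # the normalizing constants Z(theta), Z(theta') cancel here:
        log_a = (log_prior(prop) - log_prior(theta)
                 + log_f(y, prop) - log_f(y, theta)
                 + log_f(y_aux, theta) - log_f(y_aux, prop))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        draws[i] = theta
    return draws
```

Because the auxiliary variable swap cancels the intractable normalizing constants, the chain targets the exact posterior, not a pseudo-posterior; for the toy model with $y = 2$ the posterior is $N(1, 1/2)$.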
17. Self-Improving Embodied Foundation Models
Authors: Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson, Pannag Sanketi, Igor Mordatch •
Published: 2025-09-18 •
Source: arXiv
Foundation models trained on web-scale data have revolutionized robotics, but their application to low-level control remains largely limited to behavioral cloning. Drawing inspiration from the success of the reinforcement learning stage in fine-tuning large language models, we propose a two-stage post-training approach for robotics. The first stage, Supervised Fine-Tuning (SFT), fine-tunes pretrained foundation models using both: a) behavioral cloning, and b) steps-to-go prediction objectives. In the second stage, Self-Improvement, steps-to-go prediction enables the extraction of a well-shaped reward function and a robust success detector, enabling a fleet of robots to autonomously practice downstream tasks with minimal human supervision. Through extensive experiments on real-world and simulated robot embodiments, our novel post-training recipe unveils significant results on Embodied Foundation Models. First, we demonstrate that the combination of SFT and Self-Improvement is significantly more sample-efficient than scaling imitation data collection for supervised learning, and that it leads to policies with significantly higher success rates. Further ablations highlight that the combination of web-scale pretraining and Self-Improvement is the key to this sample-efficiency. Next, we demonstrate that our proposed combination uniquely unlocks a capability that current methods cannot achieve: autonomously practicing and acquiring novel skills that generalize far beyond the behaviors observed in the imitation learning datasets used during training. These findings highlight the transformative potential of combining pretrained foundation models with online Self-Improvement to enable autonomous skill acquisition in robotics. Our project website can be found at https://self-improving-efms.github.io .
18. Code Less to Code More: Streamlining Language Server Protocol and Type System Development for Language Families
Authors: Federico Bruzzone, Walter Cazzola, Luca Favalli •
Published: 2025-09-18 •
Source: arXiv
Developing editing support for $L$ languages in $E$ editors is complex and time-consuming. Some languages do not provide dedicated editors, while others offer a single native editor. The $\textit{language server protocol}$ (LSP) reduces the language-editor combinations $L \times E$ to $L + E$, where a single language server communicates with editors via LSP plugins. However, overlapping implementations of linguistic components remain an issue. Existing language workbenches struggle with modularity, reusability, and leveraging type systems for language server generation. In this work, we propose: (i) Typelang, a family of domain-specific languages for modular, composable, and reusable type system implementation, (ii) a modular language server generation process, producing servers for languages built in a modular workbench, (iii) the variant-oriented programming paradigm and a cross-artifact coordination layer to manage interdependent software variants, and (iv) an LSP plugin generator, reducing $E$ to $1$ by automating plugin creation for multiple editors. To simplify editing support for language families, each language artifact integrates its own Typelang variant, used to generate language servers. This reduces combinations to $T \times 1$, where $T = L$ represents the number of type systems. Further reuse of language artifacts across languages lowers this to $N \times 1$, where $N \ll T$, representing unique type systems. We implement Typelang in Neverlang, generating language servers for each artifact and LSP plugins for three editors. Empirical evaluation shows a 93.48% reduction in characters needed for type system implementation and 100% automation of LSP plugin generation, significantly lowering effort for editing support in language families, especially when artifacts are reused.
19. Parameterizing quasi-quintessence and quasi-phantom fields without the nearly flat potential approximation
Authors: Anna Chiara Alfano, Youri Carloni •
Published: 2025-09-18 •
Source: arXiv
An alternative dark energy description based on a generalized K-essence scenario is here explored. In particular, we consider a \emph{quasi-quintessence} and/or \emph{quasi-phantom} field, whose pressure does not depend on the kinetic energy, first discussed in the context of the cosmological constant problem. In so doing, we fix the background evolution and investigate the main observational signatures of its corresponding fluid-like representation. The corresponding scalar field can be parameterized independently of the potential form and without imposing the condition $\omega \sim -1$ used for quintessence and phantom fields. Additionally, we constrain the model parameters by performing Monte-Carlo Markov chain simulations through the Metropolis-Hastings algorithm and carry out separate analyses employing different data catalogs. More precisely, as data sets we employ observational Hubble data, type Ia supernovae, and the second data release from the DESI Collaboration, namely DESI DR2. We define a hierarchy among analyses: the first adopts all three samples, while the second excludes the DESI data points, with the aim of assessing their effect on the corresponding bounds. Our findings suggest that the \emph{quasi-quintessence} scenario prefers Planck's value of the Hubble constant $H_0$, while indicating that, when the DESI sample is excluded from our computations, $\omega_0$ enters the phantom regime, although it remains compatible at the $1$-$\sigma$ confidence level with a cosmological constant. Remarkably, these results appear in tension with those found for a standard quintessence, explored in the context of the recent DESI release, likely indicating that the DESI data may furnish inconclusive results depending on the kind of scalar field involved in the computation.
20. Optimal Learning from Label Proportions with General Loss Functions
Authors: Lorne Applebaum, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren •
Published: 2025-09-18 •
Source: arXiv
Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially-supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
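As background for the de-biasing idea: for losses that are linear in the label (the logistic loss among them), substituting the observed bag proportion for each hidden label already yields an unbiased gradient. A sketch of this classical baseline, which the paper's estimators improve on, using synthetic bags and illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic LLP data: individual labels are hidden; only bag-level
# label proportions are observed.
n_bags, bag_size, dim = 200, 16, 5
w_true = rng.normal(size=dim)
X = rng.normal(size=(n_bags, bag_size, dim))
y = (X @ w_true + 0.1 * rng.normal(size=(n_bags, bag_size)) > 0).astype(float)
props = y.mean(axis=1)  # the only supervision available

# Because the logistic loss is linear in the label, replacing each hidden
# label with its bag proportion keeps the expected gradient unchanged.
w = np.zeros(dim)
lr = 0.1
for _ in range(300):
    scores = X @ w                       # (n_bags, bag_size)
    p = 1.0 / (1.0 + np.exp(-scores))    # predicted probabilities
    grad = ((p - props[:, None])[..., None] * X).mean(axis=(0, 1))
    w -= lr * grad

# Evaluate instance-level accuracy against the hidden labels.
acc = (((X @ w) > 0) == y.astype(bool)).mean()
alignment = float(np.dot(w, w_true))
```

Even though supervision is aggregate, the learned weight vector recovers the direction of `w_true`, so individual predictions remain accurate; the paper's contribution is lower-variance estimators covering a much broader family of losses.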
21. Gaia DR3 Variable White Dwarfs vetted by ZTF
Authors: Timour Jestin, Thinh Nguyen, Laurent Eyer, Lorenzo Rimoldini, Ashish Mahabal, Marc Audard, Pedro Garcia-Lario, Panagiotis Gavras, Krzysztof Nienartowicz •
Published: 2025-09-18 •
Source: arXiv
The publications of Gaia DR2 and DR3 have brought major improvements in stellar astrometry and photometry, particularly regarding the description of the white dwarf sequence. Notably, Gaia DR2 enabled the detection of variability in white dwarfs based solely on averaged astrometric and photometric quantities, i.e. the five astrometric parameters (positions, proper motion, and parallax) and general photometric properties in the G, BP and RP bands (mean, standard deviation, and number of measurements). We identify and classify variable white dwarfs using Gaia DR3 data and Zwicky Transient Facility (ZTF) DR23 observations. The objective is to construct a catalogue of pulsating white dwarf candidates with robust selection criteria. We define a new sample of candidate variable white dwarfs using Gaia DR3 astrometric and photometric data. We cross-match this sample with the ZTF DR23 catalogue and apply a multiband Lomb-Scargle periodogram analysis to detect periodic variability. We then use the OPTICS unsupervised clustering algorithm to group and classify the confirmed periodic stars. We identify 1423 variable white dwarf candidates from Gaia DR3, 864 of which have ZTF time series; 141 present significant periodicity. We classify these objects into known categories, including ZZ Ceti stars, GW Vir, V777 Her, and white dwarf-main sequence (WD-MS) binaries. Our analysis yields several periodic stars, including three ZZ Ceti, 15 GW Vir, one V777 Her, and 24 WD-MS binaries. Furthermore, it reveals a significant population of potentially variable stars, though without confirmed periodicity. Finally, we publish our catalogue of candidate variable white dwarfs, including variability status, periodicity, and classification information for the 864 sources with ZTF time series, 519 of them newly identified (including 83 new periodic stars).
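The period search at the heart of this pipeline can be illustrated with a single-band Lomb-Scargle periodogram on a synthetic light curve (the actual analysis uses a multiband variant on real ZTF photometry; in practice one would reach for astropy's implementation rather than hand-rolling it):

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle periodogram for unevenly sampled data."""
    y = y - y.mean()
    power = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c, s = np.cos(w * (t - tau)), np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Irregularly sampled sinusoid with a known 0.8-day period.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 30, 300))
y = np.sin(2 * np.pi * t / 0.8) + 0.2 * rng.normal(size=300)

freqs = np.linspace(0.1, 3.0, 2000)  # trial frequencies, cycles per day
best_period = 1.0 / freqs[np.argmax(lomb_scargle(t, y, freqs))]
```

The periodogram peaks at the injected frequency despite the irregular sampling, which is exactly why Lomb-Scargle is the standard tool for sparse survey light curves like ZTF's.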
22. Doppler Radiance Field-Guided Antenna Selection for Improved Generalization in Multi-Antenna Wi-Fi-based Human Activity Recognition
Authors: Navid Hasanzadeh, Shahrokh Valaee •
Published: 2025-09-18 •
Source: arXiv
With the IEEE 802.11bf Task Group introducing amendments to the WLAN standard for advanced sensing, interest in using Wi-Fi Channel State Information (CSI) for remote sensing has surged. Recent findings indicate that learning a unified three-dimensional motion representation through Doppler Radiance Fields (DoRFs) derived from CSI significantly improves the generalization capabilities of Wi-Fi-based human activity recognition (HAR). Despite this progress, CSI signals remain affected by asynchronous access point (AP) clocks and additive noise from environmental and hardware sources. Consequently, even with existing preprocessing techniques, both the CSI data and Doppler velocity projections used in DoRFs are still susceptible to noise and outliers, limiting HAR performance. To address this challenge, we propose a novel framework for multi-antenna APs to suppress noise and identify the most informative antennas based on DoRF fitting errors, which capture inconsistencies among Doppler velocity projections. Experimental results on a challenging small-scale hand gesture recognition dataset demonstrate that the proposed DoRF-guided Wi-Fi-based HAR approach significantly improves generalization capability, paving the way for robust real-world sensing deployments.
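The antenna-selection idea, scoring each antenna by how inconsistent its Doppler-velocity projections are with the fitted DoRF, can be sketched on synthetic residuals (the array shapes and noise levels are hypothetical, and a constant stands in for the actual DoRF fit):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-antenna Doppler-velocity projections versus the values
# a fitted DoRF would predict; a noisy antenna shows large inconsistency.
predicted = rng.normal(size=(4, 100))          # 4 antennas, 100 time steps
observed = predicted + 0.05 * rng.normal(size=(4, 100))
observed[2] += 1.5 * rng.normal(size=100)      # antenna 2 is corrupted

# DoRF fitting error per antenna: mean squared inconsistency between the
# antenna's projections and the unified motion representation.
fit_error = np.mean((observed - predicted) ** 2, axis=1)

# Keep the most informative (lowest-error) antennas for HAR.
keep = np.argsort(fit_error)[:3]
```

The corrupted antenna's residual is orders of magnitude larger, so a simple ranking suffices to exclude it before the recognition stage.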
23. Prestige over merit: An adapted audit of LLM bias in peer review
Authors: Anthony Howell, Jieshu Wang, Luyu Du, Julia Melkers, Varshil Shah •
Published: 2025-09-18 •
Source: arXiv
Large language models (LLMs) are playing an increasingly integral, though largely informal, role in scholarly peer review. Yet it remains unclear whether LLMs reproduce the biases observed in human decision-making. We adapt a resume-style audit to scientific publishing, developing a multi-role LLM simulation (editor/reviewer) that evaluates a representative set of high-quality manuscripts across the physical, biological, and social sciences under randomized author identities (institutional prestige, gender, race). The audit reveals a strong and consistent institutional-prestige bias: identical papers attributed to low-prestige affiliations face a significantly higher risk of rejection, despite only modest differences in LLM-assessed quality. To probe mechanisms, we generate synthetic CVs for the same author profiles; these encode large prestige-linked disparities and an inverted prestige-tenure gradient relative to national benchmarks. The results suggest that both domain norms and prestige-linked priors embedded in training data shape paper-level outcomes once identity is visible, converting affiliation into a decisive status cue.
24. Limitations of Public Chest Radiography Datasets for Artificial Intelligence: Label Quality, Domain Shift, Bias and Evaluation Challenges
Authors: Amy Rafferty, Rishi Ramaesh, Ajitha Rajan •
Published: 2025-09-18 •
Source: arXiv
Artificial intelligence has shown significant promise in chest radiography, where deep learning models can approach radiologist-level diagnostic performance. Progress has been accelerated by large public datasets such as MIMIC-CXR, ChestX-ray14, PadChest, and CheXpert, which provide hundreds of thousands of labelled images with pathology annotations. However, these datasets also present important limitations. Automated label extraction from radiology reports introduces errors, particularly in handling uncertainty and negation, and radiologist review frequently disagrees with assigned labels. In addition, domain shift and population bias restrict model generalisability, while evaluation practices often overlook clinically meaningful measures. We conduct a systematic analysis of these challenges, focusing on label quality, dataset bias, and domain shift. Our cross-dataset domain shift evaluation across multiple model architectures revealed substantial external performance degradation, with pronounced reductions in AUPRC and F1 scores relative to internal testing. To assess dataset bias, we trained a source-classification model that distinguished datasets with near-perfect accuracy, and performed subgroup analyses showing reduced performance for minority age and sex groups. Finally, expert review by two board-certified radiologists identified significant disagreement with public dataset labels. Our findings highlight important clinical weaknesses of current benchmarks and emphasise the need for clinician-validated datasets and fairer evaluation frameworks.
25. Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Authors: Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu •
Published: 2025-09-18 •
Source: arXiv
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose compromise would most severely degrade overall performance. In this paper, we study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem, where the upper level involves an NP-hard combinatorial task of selecting the most vulnerable agents, and the lower level learns worst-case adversarial policies for these agents using mean-field MARL. The two problems are coupled together, making HAD-MFC difficult to solve. To address this, we first decouple the hierarchical process via the Fenchel-Rockafellar transform, resulting in a regularized mean-field Bellman operator for the upper level that enables independent learning at each level, thus reducing computational complexity. We then reformulate the upper-level combinatorial problem as an MDP with dense rewards derived from our regularized mean-field Bellman operator, enabling us to sequentially identify the most vulnerable agents with greedy and RL algorithms. This decomposition provably preserves the optimal solution of the original HAD-MFC. Experiments show that our method effectively identifies more vulnerable agents in both large-scale MARL and rule-based systems, fooling the system into worse failures, and learns a value function that reveals the vulnerability of each agent.
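The sequential greedy selection at the upper level can be illustrated with a toy stand-in for the evaluated team return (the real method measures degradation by rolling out learned adversarial policies; the additive performance model here is purely illustrative):

```python
def team_performance(compromised, weights):
    """Toy stand-in for the evaluated team return: each agent contributes
    its weight unless compromised. In VAI proper, this would be a rollout
    with worst-case adversarial policies for the compromised set."""
    return sum(w for i, w in enumerate(weights) if i not in compromised)

def greedy_vulnerable(weights, k):
    """Sequentially add the agent whose compromise degrades team
    performance the most, mirroring the greedy upper-level of HAD-MFC."""
    chosen = set()
    for _ in range(k):
        candidates = [i for i in range(len(weights)) if i not in chosen]
        best = min(candidates,
                   key=lambda i: team_performance(chosen | {i}, weights))
        chosen.add(best)
    return chosen

# Agents 1 and 4 carry the most weight, so compromising them hurts most.
vulnerable = greedy_vulnerable([0.5, 3.0, 1.2, 0.1, 2.4], k=2)
```

The dense rewards obtained from the regularized mean-field Bellman operator are what make this one-agent-at-a-time selection tractable in the paper; without them, each marginal evaluation would require solving the coupled lower-level problem from scratch.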
26. Sample Efficient Experience Replay in Non-stationary Environments
Authors: Tianyang Duan, Zongyuan Zhang, Songxiao Guo, Yuanye Zhao, Zheng Lin, Zihan Fang, Yi Liu, Dianxin Luan, Dong Huang, Heming Cui, Yong Cui •
Published: 2025-09-18 •
Source: arXiv
Reinforcement learning (RL) in non-stationary environments is challenging, as changing dynamics and rewards quickly make past experiences outdated. Traditional experience replay (ER) methods, especially those using TD-error prioritization, struggle to distinguish between changes caused by the agent's policy and those from the environment, resulting in inefficient learning under dynamic conditions. To address this challenge, we propose the Discrepancy of Environment Dynamics (DoE), a metric that isolates the effects of environment shifts on value functions. Building on this, we introduce Discrepancy of Environment Prioritized Experience Replay (DEER), an adaptive ER framework that prioritizes transitions based on both policy updates and environmental changes. DEER uses a binary classifier to detect environment changes and applies distinct prioritization strategies before and after each shift, enabling more sample-efficient learning. Experiments on four non-stationary benchmarks demonstrate that DEER further improves the performance of off-policy algorithms by 11.54 percent compared to the best-performing state-of-the-art ER methods.
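A minimal sketch of shift-aware prioritization, assuming a per-transition flag from a change detector and a simple additive priority bonus (this is an illustration of the idea, not the paper's exact prioritization scheme):

```python
import random

class ShiftAwareReplay:
    """Toy prioritized replay: priority mixes TD error with a bonus from a
    change detector, so transitions gathered after an environment shift
    are sampled more often."""

    def __init__(self, shift_bonus=2.0, seed=0):
        self.data, self.prio = [], []
        self.shift_bonus = shift_bonus
        self.rng = random.Random(seed)

    def add(self, transition, td_error, after_shift):
        # Priority = |TD error| + bonus if the detector flagged a shift.
        self.data.append(transition)
        self.prio.append(abs(td_error) +
                         (self.shift_bonus if after_shift else 0.0))

    def sample(self, k):
        # Sample transitions proportionally to their priority.
        return self.rng.choices(self.data, weights=self.prio, k=k)

buf = ShiftAwareReplay()
for i in range(100):
    buf.add("s%d" % i, td_error=0.1, after_shift=(i >= 50))

batch = buf.sample(1000)
post_shift = sum(1 for s in batch if int(s[1:]) >= 50)
```

With equal TD errors everywhere, the post-shift half of the buffer dominates the sampled batch, which is the behavior DEER relies on to refresh the value function quickly after the environment changes.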
27. A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making
Authors: Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie •
Published: 2025-09-18 •
Source: arXiv
Medical decision-making often involves integrating knowledge from multiple clinical specialties, typically achieved through multidisciplinary teams. Inspired by this collaborative process, recent work has leveraged large language models (LLMs) in multi-agent collaboration frameworks to emulate expert teamwork. While these approaches improve reasoning through agent interaction, they are limited by static, pre-assigned roles, which hinder adaptability and dynamic knowledge integration. To address these limitations, we propose KAMAC, a Knowledge-driven Adaptive Multi-Agent Collaboration framework that enables LLM agents to dynamically form and expand expert teams based on the evolving diagnostic context. KAMAC begins with one or more expert agents and then conducts a knowledge-driven discussion to identify and fill knowledge gaps by recruiting additional specialists as needed. This supports flexible, scalable collaboration in complex clinical scenarios, with decisions finalized through reviewing updated agent comments. Experiments on two real-world medical benchmarks demonstrate that KAMAC significantly outperforms both single-agent and advanced multi-agent methods, particularly in complex clinical scenarios (i.e., cancer prognosis) requiring dynamic, cross-specialty expertise. Our code is publicly available at: https://github.com/XiaoXiao-Woo/KAMAC.
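The knowledge-driven recruiting loop can be sketched with stub functions standing in for LLM-backed specialists (the role names, comments, and gap heuristic below are all hypothetical):

```python
def kamac_style_discussion(case, agents, knowledge_gap, recruit, max_rounds=3):
    """Knowledge-driven loop: agents comment on the case; if a knowledge
    gap is detected, a matching specialist is recruited and the
    discussion continues with the expanded team."""
    comments = []
    for _ in range(max_rounds):
        comments = [agent(case) for agent in agents]
        gap = knowledge_gap(comments)
        if gap is None:   # no missing expertise: finalize the decision
            break
        agents.append(recruit(gap))
    return comments

# Stub agents standing in for LLM calls.
def radiologist(case):
    return "imaging suggests a mass"

def knowledge_gap(comments):
    # Recruit oncology once a mass is reported but no staging opinion exists.
    if any("mass" in c for c in comments) and not any("staging" in c for c in comments):
        return "oncology"
    return None

def recruit(specialty):
    return lambda case: "staging workup recommended"

final_comments = kamac_style_discussion("case-001", [radiologist],
                                        knowledge_gap, recruit)
```

The point of the structure is that team composition is an output of the discussion rather than a fixed input, which is what distinguishes KAMAC from static-role multi-agent frameworks.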
28. SPATIALGEN: Layout-guided 3D Indoor Scene Generation
Authors: Chuan Fang, Heng Li, Yixun Liang, Jia Zheng, Yongsen Mao, Yuan Liu, Rui Tang, Zihan Zhou, Ping Tan •
Published: 2025-09-18 •
Source: arXiv
Creating high-fidelity 3D models of indoor environments is essential for applications in design, virtual reality, and robotics. However, manual 3D modeling remains time-consuming and labor-intensive. While recent advances in generative AI have enabled automated scene synthesis, existing methods often face challenges in balancing visual quality, diversity, semantic consistency, and user control. A major bottleneck is the lack of a large-scale, high-quality dataset tailored to this task. To address this gap, we introduce a comprehensive synthetic dataset, featuring 12,328 structured annotated scenes with 57,440 rooms, and 4.7M photorealistic 2D renderings. Leveraging this dataset, we present SpatialGen, a novel multi-view multi-modal diffusion model that generates realistic and semantically consistent 3D indoor scenes. Given a 3D layout and a reference image (derived from a text prompt), our model synthesizes appearance (color image), geometry (scene coordinate map), and semantic (semantic segmentation map) from arbitrary viewpoints, while preserving spatial consistency across modalities. SpatialGen consistently generates superior results to previous methods in our experiments. We are open-sourcing our data and models to empower the community and advance the field of indoor scene understanding and generation.
29. RoboEye: Enhancing 2D Robotic Object Identification with Selective 3D Geometric Keypoint Matching
Authors: Xingwu Zhang, Guanxuan Li, Zhuocheng Zhang, Zijun Long •
Published: 2025-09-18 •
Source: arXiv
The rapidly growing number of product categories in large-scale e-commerce makes accurate object identification for automated packing in warehouses substantially more difficult. As the catalog grows, intra-class variability and a long tail of rare or visually similar items increase; combined with diverse packaging, cluttered containers, frequent occlusion, and large viewpoint changes, these factors amplify discrepancies between query and reference images, causing sharp performance drops for methods that rely solely on 2D appearance features. We therefore propose RoboEye, a two-stage identification framework that dynamically augments 2D semantic features with domain-adapted 3D reasoning and lightweight adapters to bridge training-deployment gaps. In the first stage, we train a large vision model to extract 2D features for generating candidate rankings. A lightweight 3D-feature-awareness module then estimates 3D feature quality and predicts whether 3D re-ranking is necessary, preventing performance degradation and avoiding unnecessary computation. When invoked, the second stage uses our robot 3D retrieval transformer, comprising a 3D feature extractor that produces geometry-aware dense features and a keypoint-based matcher that computes keypoint-correspondence confidences between query and reference images instead of conventional cosine-similarity scoring. Experiments show that RoboEye improves Recall@1 by 7.1% over the prior state of the art (RoboLLM). Moreover, RoboEye operates using only RGB images, avoiding reliance on explicit 3D inputs and reducing deployment costs. The code used in this paper is publicly available at: https://github.com/longkukuhi/RoboEye.
30. PERAL: Perception-Aware Motion Control for Passive LiDAR Excitation in Spherical Robots
Authors: Shenghai Yuan, Jason Wai Hao Yee, Weixiang Guo, Zhongyuan Liu, Thien-Minh Nguyen, Lihua Xie •
Published: 2025-09-18 •
Source: arXiv
Autonomous mobile robots increasingly rely on LiDAR-IMU odometry for navigation and mapping, yet horizontally mounted LiDARs such as the MID360 capture few near-ground returns, limiting terrain awareness and degrading performance in feature-scarce environments. Prior solutions - static tilt, active rotation, or high-density sensors - either sacrifice horizontal perception or incur added actuators, cost, and power. We introduce PERAL, a perception-aware motion control framework for spherical robots that achieves passive LiDAR excitation without dedicated hardware. By modeling the coupling between internal differential-drive actuation and sensor attitude, PERAL superimposes bounded, non-periodic oscillations onto nominal goal- or trajectory-tracking commands, enriching vertical scan diversity while preserving navigation accuracy. Implemented on a compact spherical robot, PERAL is validated across laboratory, corridor, and tactical environments. Experiments demonstrate up to 96 percent map completeness, a 27 percent reduction in trajectory tracking error, and robust near-ground human detection, all at lower weight, power, and cost compared with static tilt, active rotation, and fixed horizontal baselines. The design and code will be open-sourced upon acceptance.
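The bounded, non-periodic excitation can be sketched as a sum of two sinusoids with an irrational frequency ratio superimposed on the nominal command (the gains and frequencies here are illustrative placeholders, not the paper's tuned values):

```python
import math

def excitation(t, amplitude=0.3):
    """Bounded, non-periodic perturbation: two sinusoids whose frequency
    ratio is sqrt(2) never repeat exactly, yet their scaled sum always
    stays within +/- amplitude."""
    return 0.5 * amplitude * (math.sin(2.0 * t) +
                              math.sin(2.0 * math.sqrt(2.0) * t))

def command_with_excitation(nominal_cmd, t):
    """Superimpose the perturbation on the nominal tracking command, so
    navigation accuracy is preserved while the sensor attitude sweeps
    a richer set of angles."""
    return nominal_cmd + excitation(t)

# The perturbation remains bounded over a long horizon.
peak = max(abs(excitation(0.01 * k)) for k in range(10000))
```

Because the two frequencies are incommensurate, the scan pattern keeps visiting new attitudes instead of retracing a fixed cycle, which is what enriches the vertical LiDAR coverage without dedicated actuation.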
31. Constraining gamma-ray burst parameters with the first ultra-high energy neutrino event KM3-230213A
Authors: KM3NeT Collaboration, O. Adriani, A. Albert, A. R. Alhebsi, S. Alshalloudi, M. Alshamsi, S. Alves Garre, A. Ambrosone, F. Ameli, M. Andre, L. Aphecetche, M. Ardid, S. Ardid, J. Aublin, F. Badaracco, L. Bailly-Salins, B. Baret, A. Bariego-Quintana, Y. Becherini, M. Bendahman, F. Benfenati Gualandi, M. Benhassi, D. M. Benoit, Beňušová, E. Berbee, E. Berti, V. Bertin, P. Betti, S. Biagi, M. Boettcher, D. Bonanno, S. Bottai, A. B. Bouasla, J. Boumaaza, M. Bouta, M. Bouwhuis, C. Bozza, R. M. Bozza, H. Brânzaš, F. Bretaudeau, M. Breuhaus, R. Bruijn, J. Brunner, R. Bruno, E. Buis, R. Buompane, J. Busto, B. Caiffi, D. Calvo, A. Capone, F. Carenini, V. Carretero, T. Cartraud, P. Castaldi, V. Cecchini, S. Celli, L. Cerisy, M. Chabab, A. Chen, S. Cherubini, T. Chiarusi, M. Circella, R. Clark, R. Cocimano, J. A. B. Coelho, A. Coleiro, A. Condorelli, R. Coniglione, P. Coyle, A. Creusot, G. Cuttone, R. Dallier, A. De Benedittis, G. De Wasseige, V. Decoene, P. Deguire, I. Del Rosso, L. S. Di Mauro, I. Di Palma, A. F. Díaz, D. Diego-Tortosa, C. Distefano, A. Domi, C. Donzaud, D. Dornic, E. Drakopoulou, D. Drouhin, J. -G. Ducoin, P. Duverne, R. Dvornický, T. Eberl, E. Eckerová, A. Eddymaoui, T. van Eeden, M. Eff, D. van Eijk, I. El Bojaddaini, S. El Hedri, S. El Mentawi, A. Enzenhöfer, G. Ferrara, M. D. Filipović, F. Filippini, D. Franciotti, L. A. Fusco, S. Gagliardini, T. Gal, J. García Méndez, A. Garcia Soto, C. Gatius Oliver, N. Geißelbrecht, E. Genton, H. Ghaddari, L. Gialanella, B. K. Gibson, E. Giorgio, I. Goos, P. Goswami, S. R. Gozzini, R. Gracia, B. Guillon, C. Haack, H. van Haren, A. Heijboer, L. Hennig, J. J. Hernández-Rey, A. Idrissi, W. Idrissi Ibnsalih, G. Illuminati, R. Jaimes, O. Janik, D. Joly, M. de Jong, P. de Jong, B. J. Jung, P. Kalaczyński, J. Keegans, V. Kikvadze, G. Kistauri, C. Kopper, A. Kouchner, Y. Y. Kovalev, L. Krupa, V. Kueviakoe, V. Kulikovskiy, R. Kvatadze, M. Labalme, R. Lahmann, M. Lamoureux, G. Larosa, C. Lastoria, J. Lazar, A. Lazo, G. Lehaut, V. Lemaître, E. Leonora, N. Lessing, G. Levi, M. Lindsey Clark, F. Longhitano, S. Madarapu, F. Magnani, L. Malerba, F. Mamedov, A. Manfreda, A. Manousakis, M. Marconi, A. Margiotta, A. Marinelli, C. Markou, L. Martin, M. Mastrodicasa, S. Mastroianni, J. Mauro, K. C. K. Mehta, G. Miele, P. Migliozzi, E. Migneco, M. L. Mitsou, C. M. Mollo, L. Morales-Gallegos, N. Mori, A. Moussa, I. Mozun Mateo, R. Muller, M. R. Musone, M. Musumeci, S. Navas, A. Nayerhoda, C. A. Nicolau, B. Nkosi, B. Ó Fearraigh, V. Oliviero, A. Orlando, E. Oukacha, L. Pacini, D. Paesani, J. Palacios González, G. Papalashvili, P. Papini, V. Parisi, A. Parmar, C. Pastore, A. M. Păun, G. E. Păvălaš, S. Peña Martínez, M. Perrin-Terrin, V. Pestel, M. Petropavlova, P. Piattelli, A. Plavin, C. Poirè, V. Popa, T. Pradier, J. Prado, S. Pulvirenti, C. A. Quiroz-Rangel, N. Randazzo, A. Ratnani, S. Razzaque, I. C. Rea, D. Real, G. Riccobene, J. Robinson, A. Romanov, E. Ros, A. Šaina, F. Salesa Greus, D. F. E. Samtleben, A. Sánchez Losa, S. Sanfilippo, M. Sanguineti, D. Santonocito, P. Sapienza, M. Scaringella, M. Scarnera, J. Schnabel, J. Schumann, J. Seneca, P. A. Sevle Myhr, I. Sgura, R. Shanidze, Chengyu Shao, A. Sharma, Y. Shitov, F. Šimkovic, A. Simonelli, A. Sinopoulou, B. Spisso, M. Spurio, O. Starodubtsev, D. Stavropoulos, I. Štekl, D. Stocco, M. Taiuti, G. Takadze, Y. Tayalati, H. Thiersen, S. Thoudam, I. Tosta e Melo, B. Trocmé, V. Tsourapis, E. Tzamariudaki, A. Ukleja, A. Vacheret, V. Valsecchi, V. Van Elewyck, G. Vannoye, E. Vannuccini, G. Vasileiadis, F. Vazquez de Sola, A. Veutro, S. Viola, D. Vivolo, A. van Vliet, E. de Wolf, I. Lhenry-Yvon, S. Zavatarelli, D. Zito, J. D. Zornoza, J. Zúñiga •
Published: 2025-09-18 •
Source: arXiv
Context: The detection of the highest energy neutrino observed to date by KM3NeT, with an estimated energy of 220 PeV, opens up new possibilities for the study and identification of the astrophysical sources responsible for a diffuse flux of such ultra-high-energy neutrinos, among which gamma-ray bursts are longstanding candidates. Aims: Based on the event KM3-230213A, we derive constraints on the baryon loading and density of the surrounding environment in models of blastwaves in long-duration gamma-ray bursts. Methods: We compute the diffuse flux from gamma-ray burst blastwaves, either expanding in a constant density interstellar medium or developing in a radially decreasing density of a wind-like environment surrounding the gamma-ray burst progenitor star, by taking into account the expected neutrino spectra and luminosity function. We use a Poisson likelihood method to constrain the blastwave model parameters by calculating the expected number of neutrino events within the 90% confidence level energy range of KM3-230213A and by using the joint exposure of KM3NeT/ARCA, IceCube and Pierre Auger. Results: We constrain the baryon loading to be $\leq \{392, 131, 39, 13\}$ at 90% confidence level, which is inversely proportional to a varying interstellar medium particle density of $\{1, 3, 10, 30\}$ cm$^{-3}$. In the wind-like environment case, the baryon loading is $\leq \{20, 50, 100\}$ at 90% confidence level, which is proportional to the sixth power of a varying density parameter of $\{0.05, 0.06, 0.07\}$.
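The classical one-sided Poisson upper limit underlying such constraints can be computed by bisection; a minimal sketch follows (the paper's full analysis additionally folds in neutrino spectra, the luminosity function, and the joint KM3NeT/ARCA, IceCube, and Pierre Auger exposures):

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(n + 1))

def upper_limit(n_obs, cl=0.90, hi=100.0):
    """Classical one-sided Poisson upper limit: the largest mu with
    P(N <= n_obs; mu) >= 1 - cl, found by bisection (the CDF is
    monotonically decreasing in mu)."""
    lo, up = 0.0, hi
    for _ in range(80):
        mid = 0.5 * (lo + up)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            up = mid
    return 0.5 * (lo + up)

# One observed event (as for KM3-230213A) caps the expected count at
# roughly 3.89 events at 90% CL; dividing by the exposure turns this
# into a flux bound, and hence a bound on the baryon loading.
mu90 = upper_limit(1)
```

For zero observed events the same routine returns ln(10) ≈ 2.30, the familiar 90% CL zero-count limit, which is a useful sanity check on the bisection.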
32. MARIC: Multi-Agent Reasoning for Image Classification
Authors: Wonduk Seo, Minhyeong Yu, Hyunjin An, Seunghyun Lee •
Published: 2025-09-18 •
Source: arXiv
Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine-tuning to achieve competitive performance. While recent vision language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single-pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi-agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine-grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through an integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on four diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.
33. Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Authors: Xianrong Yao, Dong She, Chenxu Zhang, Yimeng Zhang, Yueru Sun, Noman Ahmed, Yang Gao, Zhanpeng Jin •
Published: 2025-09-18 •
Source: arXiv
Empathy is critical for effective mental health support, especially when addressing Long Counseling Texts (LCTs). However, existing Large Language Models (LLMs) often generate replies that are semantically fluent but lack the structured reasoning necessary for genuine psychological support, particularly in a Chinese context. To bridge this gap, we introduce Empathy-R1, a novel framework that integrates a Chain-of-Empathy (CoE) reasoning process with Reinforcement Learning (RL) to enhance response quality for LCTs. Inspired by cognitive-behavioral therapy, our CoE paradigm guides the model to sequentially reason about a help-seeker's emotions, causes, and intentions, making its thinking process both transparent and interpretable. Our framework is empowered by a new large-scale Chinese dataset, Empathy-QA, and a two-stage training process. First, Supervised Fine-Tuning instills the CoE's reasoning structure. Subsequently, RL, guided by a dedicated reward model, refines the therapeutic relevance and contextual appropriateness of the final responses. Experiments show that Empathy-R1 achieves strong performance on key automatic metrics. More importantly, human evaluations confirm its superiority, showing a clear preference over strong baselines and achieving a Win@1 rate of 44.30% on our new benchmark. By enabling interpretable and contextually nuanced responses, Empathy-R1 represents a significant advancement in developing responsible and genuinely beneficial AI for mental health support.
34. OpenLens AI: Fully Autonomous Research Agent for Health Informatics
Authors: Yuxiao Cheng, Jinli Suo •
Published: 2025-09-18 •
Source: arXiv
Health informatics research is characterized by diverse data modalities, rapid knowledge expansion, and the need to integrate insights across biomedical science, data analytics, and clinical practice. These characteristics make it particularly well-suited for agent-based approaches that can automate knowledge exploration, manage complex workflows, and generate clinically meaningful outputs. Recent progress in large language model (LLM)-based agents has demonstrated promising capabilities in literature synthesis, data analysis, and even end-to-end research execution. However, existing systems remain limited for health informatics because they lack mechanisms to interpret medical visualizations and often overlook domain-specific quality requirements. To address these gaps, we introduce OpenLens AI, a fully automated framework tailored to health informatics. OpenLens AI integrates specialized agents for literature review, data analysis, code generation, and manuscript preparation, enhanced by vision-language feedback for medical visualization and quality control for reproducibility. The framework automates the entire research pipeline, producing publication-ready LaTeX manuscripts with transparent and traceable workflows, thereby offering a domain-adapted solution for advancing health informatics research.
35. Enterprise AI Must Enforce Participant-Aware Access Control
Authors: Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, Teijia Zhao •
Published: 2025-09-18 •
Source: arXiv
Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for \emph{all users involved in the interaction}. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.
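At retrieval time, the participant-aware principle reduces to a deterministic check that every participant in the interaction is authorized for each document before it can enter the model's context. A minimal sketch (the document names and ACL representation are hypothetical):

```python
def authorized_for_all(doc_acl, participants):
    """A document may be used only if every participant in the interaction
    is on its access-control list: a deterministic check, unlike
    probabilistic defenses such as output filtering."""
    return set(participants) <= set(doc_acl)

def filter_context(docs, participants):
    """Drop any retrieved document that is not readable by all participants,
    before retrieval results ever reach the model."""
    return [d for d in docs if authorized_for_all(d["acl"], participants)]

docs = [
    {"id": "roadmap.docx", "acl": {"alice", "bob", "carol"}},
    {"id": "salaries.xlsx", "acl": {"alice"}},  # alice-only: must be excluded
]
visible = filter_context(docs, participants=["alice", "bob"])
```

Because the check is a set containment on explicit ACLs rather than a heuristic over model outputs, it fails closed: a document missing any participant simply never reaches generation, which is the guarantee the position paper argues probabilistic defenses cannot provide.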