1. Axion-photon conversion in transient compact stars: Systematics, constraints, and opportunities
Authors: Damiano F. G. Fiorillo, Ángel Gil Muyor, Hans-Thomas Janka, Georg G. Raffelt, Edoardo Vitagliano •
Published: 2025-09-16 •
Source: arXiv
We study magnetic conversion of ultra-relativistic axion-like particles (ALPs) into photons in compact-star environments, focusing on the hot, transient conditions of core-collapse supernova (SN) remnants and neutron-star mergers (NSMs). We address previously overlooked uncertainties, particularly the suppression caused by ejected matter near the stellar surface, a region crucial to the conversion process. We derive analytical expressions for the transition rate; they reveal the influence of key parameters and their uncertainties. We update constraints using historical gamma-ray data from SN 1987A and find $g_{a\gamma}<5\times10^{-12}~{\rm GeV}^{-1}$ for $m_a\lesssim10^{-9}$ meV. We also forecast sensitivities for a future Galactic SN and for NSMs, assuming observations with Fermi-LAT or similar gamma-ray instruments. We distinguish ALPs -- defined as coupling only to photons and produced via Primakoff scattering -- from axions, which also couple to nucleons and emerge through nuclear bremsstrahlung. We omit pionic axion production due to its large uncertainties and inconsistencies, though it could contribute comparably to bremsstrahlung under optimistic assumptions. For the compact sources, we adopt time-averaged one-zone models, guided by numerical simulations, to enable clear and reproducible parametric studies.
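For orientation, the benchmark behind such bounds is the textbook small-mixing conversion probability (a standard sketch, not the paper's refined transient-source rate), which makes explicit how ambient matter suppresses conversion through the plasma frequency:

    $$P_{a\to\gamma} \simeq \left(\frac{g_{a\gamma} B_T L}{2}\right)^2 \mathrm{sinc}^2\!\left(\frac{qL}{2}\right), \qquad q \simeq \frac{|m_a^2-\omega_{\rm pl}^2|}{2\omega},$$

where $B_T$ is the transverse magnetic field, $L$ the size of the field region, $\omega$ the ALP energy, and $\omega_{\rm pl}$ the plasma frequency of the ambient medium; dense ejecta raise $\omega_{\rm pl}$, drive $qL\gg1$, and damp the conversion, which is the suppression discussed above.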
2. Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Authors: Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace •
Published: 2025-09-16 •
Source: arXiv
Recent interpretability methods have proposed to translate LLM internal representations into natural language descriptions using a second verbalizer LLM. This is intended to illuminate how the target model represents and operates on inputs. But do such activation verbalization approaches actually provide privileged knowledge about the internal workings of the target model, or do they merely convey information about its inputs? We critically evaluate popular verbalization methods across datasets used in prior work and find that they succeed at benchmarks without any access to target model internals, suggesting that these datasets are not ideal for evaluating verbalization methods. We then run controlled experiments which reveal that verbalizations often reflect the parametric knowledge of the verbalizer LLM which generated them, rather than the activations of the target LLM being decoded. Taken together, our results indicate a need for targeted benchmarks and experimental controls to rigorously assess whether verbalization methods provide meaningful insights into the operations of LLMs.
3. VAR-PZ: Constraining the Photometric Redshifts of Quasars using Variability
Authors: S. Satheesh Sheeba, R. J. Assef, T. Anguita, P. Sánchez-Sáez, R. Shirley, T. T. Ananna, F. E. Bauer, A. Bobrick, C. G. Bornancini, S. E. I. Bosman, W. N. Brandt, D. De Cicco, B. Czerny, M. Fatović, K. Ichikawa, D. Ilić, A. B. Kovačević, G. Li, M. Liao, A. Rojas-Lilayú, M. Marculewicz, D. Marsango, C. Mazzucchelli, T. Mkrtchyan, S. Panda, A. Peca, B. Rani, C. Ricci, G. T. Richards, M. Salvato, D. P. Schneider, M. J. Temple, F. Tombesi, W. Yu, I. Yoon, F. Zou •
Published: 2025-09-16 •
Source: arXiv
The Vera C. Rubin Observatory LSST is expected to discover tens of millions of new Active Galactic Nuclei (AGNs). The survey's exceptional cadence and sensitivity will enable UV/optical/NIR monitoring of a significant fraction of these objects. The unprecedented number of sources makes spectroscopic follow-up for the vast majority of them unfeasible in the near future, so most studies will have to rely on photometric redshift estimates, which are traditionally much less reliable for AGN than for inactive galaxies. This work presents a novel methodology to constrain the photometric redshift of AGNs that leverages the effects of cosmological time dilation and of the luminosity and wavelength dependence of AGN variability. Specifically, we assume that the variability can be modeled as a damped random walk (DRW) process, and adopt a parametric model to characterize the DRW timescale ($\tau$) and asymptotic amplitude of the variability (SF$_\infty$) based on the redshift, the rest-frame wavelength, and the AGN luminosity. We construct variability-based photo-$z$ priors by modeling the observed variability using the expected DRW parameters at a given redshift. These variability-based photometric redshift (VAR-PZ) priors are then combined with traditional SED fitting to improve the redshift estimates from SED fitting. Validation is performed using observational data from the SDSS, demonstrating a reduction in catastrophic outliers of more than 10% compared with SED-fitting techniques, along with improvements in redshift precision. Simulated light curves with both SDSS and LSST-like cadences and baselines confirm that VAR-PZ will be able to constrain the photometric redshifts of SDSS-like AGNs, bringing the outlier fraction down from 32% (SED alone) to below 7% by the end of the survey.
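As a sketch of how a variability prior of this kind can be combined with SED fitting, the snippet below builds a toy VAR-PZ-style prior from a measured DRW timescale. The power-law scaling and its coefficients are placeholders, not the paper's calibrated model.

    import numpy as np

    def drw_tau_expected(lam_rest_A, log_L, a=2.4, b=0.17, c=0.03):
        """Expected rest-frame DRW timescale in days; a hypothetical
        power-law scaling in wavelength and luminosity with placeholder
        coefficients."""
        return 10 ** (a + b * np.log10(lam_rest_A / 4000.0) + c * (log_L - 46.0))

    def var_pz_prior(z_grid, tau_obs_days, lam_obs_A, log_L, sigma_dex=0.3):
        """Prior P(z): at each trial redshift, undo time dilation and the
        wavelength shift, then compare observed vs. expected timescale
        with a Gaussian in log-space."""
        prior = np.empty_like(z_grid)
        for i, z in enumerate(z_grid):
            tau_rest = tau_obs_days / (1.0 + z)   # cosmological time dilation
            lam_rest = lam_obs_A / (1.0 + z)      # rest-frame wavelength
            tau_exp = drw_tau_expected(lam_rest, log_L)
            d = np.log10(tau_rest) - np.log10(tau_exp)
            prior[i] = np.exp(-0.5 * (d / sigma_dex) ** 2)
        return prior / np.trapz(prior, z_grid)

    # Combine with an SED-fitting posterior on the same redshift grid:
    # post = sed_pdf * var_pz_prior(z_grid, 210.0, 6200.0, 45.8)
    # post /= np.trapz(post, z_grid)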
4. WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Authors: Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Yida Zhao, Liwen Zhang, Litu Ou, Dingchu Zhang, Xixi Wu, Jialong Wu, Xinyu Wang, Zile Qiao, Zhen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou •
Published: 2025-09-16 •
Source: arXiv
Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents in complex information-seeking tasks, matching proprietary agents' performance and closing the capability gap.
5. Mixed Triplet-Singlet Order Parameter in Decoupled Superconducting 1H Monolayers of Transition-Metal Dichalcogenides
Authors: Avior Almoalem, Sajilesh Kunhiparambath, Roni Anna Gofman, Yuval Nitzav, Ilay Mangel, Nitzan Ragoler, Jun Fujii, Ivana Vobornik, Francois Bertran, Amit Kanigel, Jonathan Ruhman, Vidya Madhavan •
Published: 2025-09-16 •
Source: arXiv
Understanding the emergence of unconventional superconductivity, where the order parameter deviates from simple isotropic s-wave pairing, is a central puzzle in condensed matter physics. Transition-metal dichalcogenides (TMDCs), though generally regarded as conventional superconductors, display signatures of this unusual behavior and thus provide a particularly intriguing platform to explore how exotic states arise. Here we investigate the misfit compound (SnS)$_{1.15}$(TaS$_2$), a heterostructure composed of alternating SnS and 1H-TaS$_2$ layers. Using transport, photoemission, and scanning tunneling spectroscopy, we demonstrate that the SnS layers effectively decouple the TaS$_2$ into electronically isolated 1H sheets. In this limit, the tunneling density of states reveals a clear two-gap superconducting spectrum with T$_c \sim$ 3.1 K. A theoretical model based on lack of inversion symmetry and finite-range attraction reproduces the observed multi-gap structure as a mixed singlet-triplet state. These results establish misfit compounds as a powerful platform for studying unconventional superconductivity in isolated 1H layers and for realizing multiple uncoupled superconductors within a single crystal.
6. Investigating Seamless Transitions Between Immersive Computational Notebooks and Embodied Data Interactions
Authors: Sungwon In, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang •
Published: 2025-09-16 •
Source: arXiv
A growing interest in Immersive Analytics (IA) has led to the extension of computational notebooks (e.g., Jupyter Notebook) into an immersive environment to enhance analytical workflows. However, existing solutions rely on the WIMP (windows, icons, menus, pointer) metaphor, which remains impractical for complex data exploration. Although embodied interaction offers a more intuitive alternative, immersive computational notebooks and embodied data exploration systems are implemented as standalone tools. This separation requires analysts to invest considerable effort to transition from one environment to an entirely different one during analytical workflows. To address this, we introduce ICoN, a prototype that facilitates a seamless transition between computational notebooks and embodied data explorations within a unified, fully immersive environment. Our findings reveal that unification improves transition efficiency and intuitiveness during analytical workflows, highlighting its potential for seamless data analysis.
7. Inferring Soil Drydown Behaviour with Adaptive Bayesian Online Changepoint Analysis
Authors: Mengyi Gong, Christopher Nemeth, Rebecca Killick, Peter Strauss, John Quinton •
Published: 2025-09-16 •
Source: arXiv
Continuous soil-moisture measurements provide a direct lens on subsurface hydrological processes, notably the post-rainfall "drydown" phase. Because these records consist of distinct, segment-specific behaviours whose forms and scales vary over time, realistic inference demands a model that captures piecewise dynamics while accommodating parameters that are unknown a priori. Building on Bayesian Online Changepoint Detection (BOCPD), we introduce two complementary extensions: a particle-filter variant that substitutes exact marginalisation with sequential Monte Carlo to enable real-time inference when critical parameters cannot be integrated out analytically, and an online-gradient variant that embeds stochastic gradient updates within BOCPD to learn application-relevant parameters on the fly without prohibitive computational cost. After validating both algorithms on synthetic data that replicate the temporal structure of field observations (detailing hyperparameter choices, priors, and cost-saving strategies), we apply them to soil-moisture series from experimental sites in Austria and the United States, quantifying site-specific drydown rates and demonstrating the advantages of our adaptive framework over static models.
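For reference, the exact-marginalisation recursion that both extensions build on is compact. Below is a minimal sketch of vanilla BOCPD (Adams & MacKay 2007) with a constant hazard; the paper's variants replace the conjugate predictive with sequential Monte Carlo, or wrap stochastic-gradient hyperparameter updates around this loop.

    import numpy as np

    def bocpd(data, hazard, predictive):
        """Run-length posterior recursion. `predictive(x, r)` must return
        the posterior-predictive density of x for each run length in r
        (e.g., from a conjugate Normal-Inverse-Gamma model updated online)."""
        T = len(data)
        R = np.zeros((T + 1, T + 1))
        R[0, 0] = 1.0
        for t, x in enumerate(data, start=1):
            pred = predictive(x, np.arange(t))           # p(x_t | run length)
            growth = R[t - 1, :t] * pred * (1 - hazard)  # run continues
            cp = (R[t - 1, :t] * pred * hazard).sum()    # changepoint occurs
            R[t, 1:t + 1] = growth
            R[t, 0] = cp
            R[t, :t + 1] /= R[t, :t + 1].sum()           # normalise
        return R  # R[t, r] = P(run length = r | x_1..t)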
8. Contrastive timbre representations for musical instrument and synthesizer retrieval
Authors: Gwendal Le Vaillant, Yannick Molle •
Published: 2025-09-16 •
Source: arXiv
Efficiently retrieving specific instrument timbres from audio mixtures remains a challenge in digital music production. This paper introduces a contrastive learning framework for musical instrument retrieval, enabling direct querying of instrument databases using a single model for both single- and multi-instrument sounds. We propose techniques to generate realistic positive/negative pairs of sounds for virtual musical instruments, such as samplers and synthesizers, addressing limitations in common audio data augmentation methods. The first experiment focuses on instrument retrieval from a dataset of 3,884 instruments, using single-instrument audio as input. Contrastive approaches are competitive with previous works based on classification pre-training. The second experiment considers multi-instrument retrieval with a mixture of instruments as audio input. In this case, the proposed contrastive framework outperforms related works, achieving 81.7% top-1 and 95.7% top-5 accuracies for three-instrument mixtures.
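A minimal sketch of the kind of objective such a framework typically optimises, namely InfoNCE with in-batch negatives; the paper's exact loss and pair-generation pipeline may differ.

    import torch
    import torch.nn.functional as F

    def info_nce(anchor_emb, positive_emb, temperature=0.1):
        """Contrastive loss on a batch of (anchor, positive) embedding
        pairs; every non-matching pair in the batch acts as a negative."""
        a = F.normalize(anchor_emb, dim=1)
        p = F.normalize(positive_emb, dim=1)
        logits = a @ p.T / temperature                # [B, B] similarities
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)        # diagonal = positives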
9. RepIt: Representing Isolated Targets to Steer Language Models
Authors: Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang •
Published: 2025-09-16 •
Source: arXiv
While activation steering in large language models (LLMs) is a growing area of research, methods can often incur broader effects than desired. This motivates isolation of purer concept vectors to enable targeted interventions and understand LLM behavior at a more granular level. We present RepIt, a simple and data-efficient framework for isolating concept-specific representations. Across five frontier LLMs, RepIt enables precise interventions: it selectively suppresses refusal on targeted concepts while preserving refusal elsewhere, producing models that answer WMD-related questions while still scoring as safe on standard benchmarks. We further show that the corrective signal localizes to just 100-200 neurons and that robust target representations can be extracted from as few as a dozen examples on a single A6000. This efficiency raises a dual concern: manipulations can be performed with modest compute and data, extending to underrepresented, data-scarce topics while evading existing benchmarks. By disentangling refusal vectors with RepIt, this work demonstrates that targeted interventions can counteract overgeneralization, laying the foundation for more granular control of model behavior.
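To illustrate the general idea of isolating a concept-specific direction, here is a generic difference-of-means sketch with orthogonalisation against related concepts; it captures the spirit of the approach, not RepIt's published algorithm.

    import torch

    def isolate_concept(target_acts, generic_acts, related_dirs):
        """Difference-of-means direction for the target concept, with
        components along related-concept directions projected out so the
        vector is specific to the target."""
        v = target_acts.mean(0) - generic_acts.mean(0)
        for u in related_dirs:            # directions of sibling concepts
            u = u / u.norm()
            v = v - (v @ u) * u           # remove the shared component
        return v / v.norm()

    def steer(hidden, v, alpha):
        """Shift hidden states along the isolated direction
        (alpha < 0 ablates the concept)."""
        return hidden + alpha * (hidden @ v).unsqueeze(-1) * v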
10. Bar Evolution in Edge-on Galaxies: A Demographic Study of Boxy/Peanut Bulges
Authors: Atul A. Samanta, Ankit Kumar, Mousumi Das, M. Celeste Artale •
Published: 2025-09-16 •
Source: arXiv
Boxy/peanut and X-shaped (BP/X) bulges are prominent features in edge-on disk galaxies and are believed to be vertically thickened bars. Despite their relevance in bar evolution, a statistically robust census of these structures in large surveys has been lacking. We aim to provide the largest catalog of BP/X structures in edge-on galaxies to date, and to investigate their properties and role in shaping galaxy scaling relations. We selected a sample of 6684 edge-on galaxies from SDSS DR8 using Galaxy Zoo classifications, requiring a high edge-on probability ($> 0.9$) and a minimum of 10 independent votes. Two-dimensional image decomposition is performed using GALFIT to obtain structural parameters. Residual images are visually inspected to classify BP/X features into four categories: strong both-sided, both-sided, one-sided, and control (no BP/X). We also estimated stellar mass, distance, and physical size for each galaxy. Out of 6653 classified galaxies, we identified 1675 ($\sim$25%) with both-sided BP/X features, of which 545 ($\sim$8%) are strong and 1130 ($\sim$17%) faint, as well as 1108 ($\sim$17%) one-sided structures, making up a total of 2783 BP/X-hosting galaxies ($\sim$42%). One-sided structures, likely signatures of ongoing buckling, are more frequent than strong both-sided bulges across all stellar masses. The fraction of BP/X bulges increases with stellar surface mass density, indicating a connection with bar formation in dense disks. We also find that galaxies with strong BP/X bulges contribute to increased scatter in the stellar mass-size and stellar mass-surface density relations, particularly at higher masses.
11. Bioluminescence in turbulence: intermittent straining lights up dinoflagellates
Authors: Praphul Kumar, Jason R. Picardo •
Published: 2025-09-16 •
Source: arXiv
Dinoflagellates are marine phytoplankton that emit flashes of light in response to flow-induced deformation; they are responsible for illuminating breaking waves, wakes of ships, and other intensely turbulent spots of the upper ocean. Here, we ask how bioluminescence is affected by the fluctuating nature of turbulence -- a question motivated by the dependence of emitted flashes on both the extent and rate of deformation. Introducing a light-emitting dumbbell as a minimal model, we study the Lagrangian dynamics of flashing in a homogeneous isotropic turbulent flow, and contrast it with that in an extensional flow and a Gaussian random flow. We show that turbulent fluctuations strongly enhance bioluminescence, while introducing a Poisson-like stochasticity in the flashing dynamics. Furthermore, the intermittent fluctuations of the velocity gradient subject the dinoflagellate to bursts of extreme straining and produce bright flashes -- more intense, though less frequent, than what would result from Gaussian fluctuations. Our results suggest that radiant displays of marine bioluminescence are strongly promoted by turbulence and its dissipation-scale intermittency.
12. Runaway electron interactions with whistler waves in tokamak plasmas: energy-dependent transport scaling
Authors: Yashika Ghai, D. Del-Castillo-Negrete, D. A. Spong, M. T. Beidler •
Published: 2025-09-16 •
Source: arXiv
Resonant interactions between high energy runaway electrons (REs) and whistler waves are a promising mechanism for RE mitigation in tokamak plasmas. While prior studies have largely relied on quasi-linear diffusion models in simplified geometries, we present a first-principles-informed framework that models RE-whistler interactions in a 3D tokamak equilibrium. This is achieved by coupling AORSA, which computes whistler eigenmodes for a given tokamak plasma equilibrium, and KORC, a kinetic orbit code that tracks full-orbit RE trajectories in prescribed wave fields. Our results demonstrate that REs undergo scattering to large pitch angles and exhibit anomalous diffusion in both pitch-angle and kinetic energy space. Crucially, we observe a transition between diffusive, sub-diffusive, and super-diffusive transport regimes as a function of initial RE energy, an effect not captured by existing quasi-linear models. This anomalous transport behavior represents a significant advancement in understanding RE dynamics in the presence of wave-particle interactions. By identifying the conditions under which anomalous diffusion arises, this work lays the theoretical foundation for designing targeted, wave-based mitigation strategies in future tokamak experiments.
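The diffusive/sub-/super-diffusive classification is conventionally read off the growth exponent of the ensemble spread; a minimal sketch of that diagnostic (ours, not the authors' analysis code):

    import numpy as np

    def transport_exponent(energies, t):
        """Fit <(E - E_0)^2> ~ t^alpha over an ensemble of trajectories
        (shape [n_particles, n_times]).  alpha < 1: sub-diffusive,
        alpha ~ 1: diffusive, alpha > 1: super-diffusive."""
        msd = ((energies - energies[:, :1]) ** 2).mean(axis=0)
        alpha, _ = np.polyfit(np.log(t[1:]), np.log(msd[1:]), 1)
        return alpha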
13. JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks
Authors: Jiahao Zhang, Xiaobing Pei, Zhaokun Zhong, Wenqiang Hao, Zhenghao Tang •
Published: 2025-09-16 •
Source: arXiv
Graph Neural Networks (GNNs) have demonstrated remarkable performance across various applications, yet they are vulnerable to sophisticated adversarial attacks, particularly node injection attacks. The success of such attacks heavily relies on their stealthiness, the ability to blend in with the original graph and evade detection. However, existing methods often achieve stealthiness by relying on indirect proxy metrics, lacking consideration for the fundamental characteristics of the injected content, or focusing only on imitating local structures, which leads to the problem of local myopia. To overcome these limitations, we propose a dual-constraint stealthy node injection framework, called Joint Alignment of Nodal and Universal Structures (JANUS). At the local level, we introduce a local feature manifold alignment strategy to achieve geometric consistency in the feature space. At the global level, we incorporate structured latent variables and maximize the mutual information with the generated structures, ensuring the injected structures are consistent with the semantic patterns of the original graph. We model the injection attack as a sequential decision process, which is optimized by a reinforcement learning agent. Experiments on multiple standard datasets demonstrate that the JANUS framework significantly outperforms existing methods in terms of both attack effectiveness and stealthiness.
14. Beyond Private or Public: Large Language Models as Quasi-Public Goods in the AI Economy
Authors: Yukun Zhang, TianYang Zhang •
Published: 2025-09-16 •
Source: arXiv
This paper conceptualizes Large Language Models (LLMs) as a form of mixed public goods within digital infrastructure, analyzing their economic properties through a comprehensive theoretical framework. We develop mathematical models to quantify the non-rivalry characteristics, partial excludability, and positive externalities of LLMs. Through comparative analysis of open-source and closed-source development paths, we identify systematic differences in resource allocation efficiency, innovation trajectories, and access equity. Our empirical research evaluates the spillover effects and network externalities of LLMs across different domains, including knowledge diffusion, innovation acceleration, and industry transformation. Based on these findings, we propose policy recommendations for balancing innovation incentives with equitable access, including public-private partnership mechanisms, computational resource democratization, and governance structures that optimize social welfare. This interdisciplinary approach contributes to understanding the economic nature of foundation AI models and provides policy guidance for their development as critical digital infrastructure.
15. Post-Hoc Split-Point Self-Consistency Verification for Efficient, Unified Quantification of Aleatoric and Epistemic Uncertainty in Deep Learning
Authors: Zhizhong Zhao, Ke Chen •
Published: 2025-09-16 •
Source: arXiv
Uncertainty quantification (UQ) is vital for trustworthy deep learning, yet existing methods are either computationally intensive, such as Bayesian or ensemble methods, or provide only partial, task-specific estimates, such as single-forward-pass techniques. In this paper, we propose a post-hoc single-forward-pass framework that jointly captures aleatoric and epistemic uncertainty without modifying or retraining pretrained models. Our method applies \emph{Split-Point Analysis} (SPA) to decompose predictive residuals into upper and lower subsets, computing \emph{Mean Absolute Residuals} (MARs) on each side. We prove that, under ideal conditions, the total MAR equals the harmonic mean of subset MARs; deviations define a novel \emph{Self-consistency Discrepancy Score} (SDS) for fine-grained epistemic estimation across regression and classification. For regression, side-specific quantile regression yields prediction intervals with improved empirical coverage, which are further calibrated via SDS. For classification, when calibration data are available, we apply SPA-based calibration identities to adjust the softmax outputs and then compute predictive entropy on these calibrated probabilities. Extensive experiments on diverse regression and classification benchmarks demonstrate that our framework matches or exceeds several state-of-the-art UQ methods while incurring minimal overhead. Our source code is available at https://github.com/zzz0527/SPC-UQ.
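The split-point identity is easy to state in code. The sketch below computes the side MARs and a normalised discrepancy; the normalisation is our choice for illustration, and the paper's SDS may be defined differently.

    import numpy as np

    def self_consistency_discrepancy(residuals, split=0.0):
        """Mean absolute residuals on each side of the split point; under
        ideal conditions the total MAR equals their harmonic mean, so the
        relative deviation serves as an epistemic-uncertainty signal."""
        lo = np.abs(residuals[residuals < split])
        hi = np.abs(residuals[residuals >= split])
        mar_lo, mar_hi = lo.mean(), hi.mean()
        mar_tot = np.abs(residuals).mean()
        hmean = 2 * mar_lo * mar_hi / (mar_lo + mar_hi)
        return abs(mar_tot - hmean) / mar_tot   # ~0 when self-consistent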
16. A generalized reduction scheme for the Stochastic Weighted Particle Method
Authors: Matthew Goeckner, Donovan Harcey, Rainier Q Pederson, Axel Niyonzima, John Zweck •
Published: 2025-09-16 •
Source: arXiv
The Stochastic Weighted Particle Method (SWPM) of Rjasanow and Wagner is a generalization of the Direct Simulation Monte Carlo method for computing the probability density function of the velocities of a system of interacting particles for applications that include rarefied gas dynamics and plasma processing systems. Key components of a SWPM simulation are a particle grouping technique and a particle reduction scheme. These are periodically applied to reduce the computational cost of simulations due to the gradual increase in the number of stochastic particles. A general framework for designing particle reduction schemes is introduced that enforces the preservation of a prescribed set of moments of the distribution through the construction and explicit solution of a system of linear equations for particle weights in terms of particle velocities and the moments to be preserved. This framework is applied to preserve all moments of the distribution up to order three. Numerical simulations are performed to verify the scheme and quantify the degree to which even higher-order moments and tail functionals are preserved. These results reveal an unexpected trade-off between the preservation of these higher-order moments and tail functionals.
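The core of such a reduction scheme is a small linear solve for the new weights. Here is a 1-D least-squares illustration under simplifying assumptions; the paper works with 3-D velocities and an explicit solution rather than a generic solver.

    import numpy as np

    def reduced_weights(v_new, moments, orders):
        """Choose weights of a reduced particle set so the prescribed
        moments are preserved: sum_j w_j * v_j**k = M_k for each order k."""
        A = np.vstack([v_new ** k for k in orders])   # moment matrix
        w, *_ = np.linalg.lstsq(A, np.asarray(moments), rcond=None)
        return w

    # Preserve moments up to order three of a large weighted ensemble:
    rng = np.random.default_rng(0)
    v_old, w_old = rng.normal(size=1000), np.full(1000, 1e-3)
    M = [np.sum(w_old * v_old ** k) for k in (0, 1, 2, 3)]
    v_new = rng.choice(v_old, size=8, replace=False)
    w_new = reduced_weights(v_new, M, orders=(0, 1, 2, 3))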
17. Large Language Model-assisted Meta-optimizer for Automated Design of Constrained Evolutionary Algorithm
Authors: Xu Yang, Rui Wang, Kaiwen Li, Wenhua Li, Weixiong Huang •
Published: 2025-09-16 •
Source: arXiv
Meta-black-box optimization has been significantly advanced through the use of large language models (LLMs), yet its application to constrained evolutionary optimization remains in its infancy. In this work, we propose AwesomeDE, which leverages an LLM as the meta-optimizer strategy to generate update rules for constrained evolutionary algorithms without human intervention. In parallel, the $RTO^2H$ framework is introduced to standardize the prompt design for LLMs. The meta-optimizer is trained on a diverse set of constrained optimization problems. Key components, including prompt design and iterative refinement, are systematically analyzed to determine their impact on design quality. Experimental results demonstrate that the proposed approach outperforms existing methods in terms of computational efficiency and solution accuracy. Furthermore, AwesomeDE is shown to generalize well across distinct problem domains, suggesting its potential for broad applicability. This research contributes to the field by providing a scalable and data-driven methodology for automated constrained algorithm design, while also highlighting limitations and directions for future work.
18. Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy
Authors: Nadim Barakat, William Lotter •
Published: 2025-09-16 •
Source: arXiv
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, and AI systems can expand access to fundus photography screening. Current FDA-cleared systems primarily provide binary referral outputs, and this minimal output may limit clinical trust and utility. Yet, determining the most effective output format to enhance clinician-AI performance is an empirical challenge that is difficult to assess at scale. We evaluated multimodal large language models (MLLMs) for DR detection and their ability to simulate clinical AI assistance across different output types. Two models were tested on IDRiD and Messidor-2: GPT-4o, a general-purpose MLLM, and MedGemma, an open-source medical model. Experiments included: (1) baseline evaluation, (2) simulated AI assistance with synthetic predictions, and (3) actual AI-to-AI collaboration where GPT-4o incorporated MedGemma outputs. MedGemma outperformed GPT-4o at baseline, achieving higher sensitivity and AUROC, while GPT-4o showed near-perfect specificity but low sensitivity. Both models adjusted predictions based on simulated AI inputs, but GPT-4o's performance collapsed with incorrect ones, whereas MedGemma remained more stable. In actual collaboration, GPT-4o achieved strong results when guided by MedGemma's descriptive outputs, even without direct image access (AUROC up to 0.96). These findings suggest MLLMs may improve DR screening pipelines and serve as scalable simulators for studying clinical AI assistance across varying output configurations. Open, lightweight models such as MedGemma may be especially valuable in low-resource settings, while descriptive outputs could enhance explainability and clinician trust in clinical workflows.
19. Ultrafast non-adiabatic molecular energy conversion into photons induced by quantized electromagnetic fields
Authors: Arley Flórez López, Johan F. Triana, José Luis Sanz-Vicario •
Published: 2025-09-16 •
Source: arXiv
Molecular polaritons within the mid-infrared regime have emerged as a source for modifying and manipulating molecular and photonic properties. However, the development of new methodologies for photon generation is still a challenge in nanophotonics. We propose a molecular model based on the Holstein-quantum-Rabi Hamiltonian, which also incorporates realistic dipole moments and non-adiabatic couplings among electronic excited states, to study the ultrafast photodynamics of diatomic molecules in confined electromagnetic fields within quantized cavities. In addition to vibronic transitions due to intrinsic non-adiabatic couplings, two types of light-induced crossings emerge: one type is located at molecular nuclear geometries where the rotating wave approximation is fulfilled, and another type appears at different geometries where counter-rotating transitions may occur. We make a comprehensive study of polariton photodynamics within a time window of a few tens of femtoseconds, where dissipative mechanisms do not influence the polariton photodynamics. We stress the dramatic change of the polariton energy spectrum as a function of the Huang-Rhys factor when non-adiabatic couplings are included in the model. We conclude that both the molecular non-adiabatic couplings and, more specifically, the counter-rotating couplings in the cavity-molecule interaction play a crucial role in converting vibronic energy into photons through excited dressed states. We also show that the sign of the Huang-Rhys factor has a significant impact on this photon conversion. Our work paves the way for the development of many-photon generation powered by strong light-matter interaction, along with potential applications using alkaline earth monohydride molecules.
20. Single-stream Policy Optimization
Authors: Zhongwen Xu, Zihan Ding •
Published: 2025-09-16 •
Source: arXiv
We revisit policy-gradient optimization for Large Language Models (LLMs) from a single-stream perspective. Prevailing group-based methods like GRPO reduce variance with on-the-fly baselines but suffer from critical flaws: frequent degenerate groups erase learning signals, and synchronization barriers hinder scalability. We introduce Single-stream Policy Optimization (SPO), which eliminates these issues by design. SPO replaces per-group baselines with a persistent, KL-adaptive value tracker and normalizes advantages globally across the batch, providing a stable, low-variance learning signal for every sample. Being group-free, SPO enables higher throughput and scales effectively in long-horizon or tool-integrated settings where generation times vary. Furthermore, the persistent value tracker naturally enables an adaptive curriculum via prioritized sampling. Experiments using Qwen3-8B show that SPO converges more smoothly and attains higher accuracy than GRPO, while eliminating computation wasted on degenerate groups. Ablation studies confirm that SPO's gains stem from its principled approach to baseline estimation and advantage normalization, offering a more robust and efficient path for LLM reasoning. Across five hard math benchmarks with Qwen3-8B, SPO improves the average maj@32 by +3.4 percentage points (pp) over GRPO, driven by substantial absolute point gains on challenging datasets, including +7.3 pp on BRUMO 25, +4.4 pp on AIME 25, +3.3 pp on HMMT 25, and achieves consistent relative gains in pass@$k$ across the evaluated $k$ values. SPO's success challenges the prevailing trend of adding incidental complexity to RL algorithms, highlighting a path where fundamental principles, not architectural workarounds, drive the next wave of progress in LLM reasoning.
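A sketch of the two ingredients named here; the tracker update below is a plain exponential moving average standing in for SPO's KL-adaptive rule, whose exact form is not reproduced.

    import torch

    class ValueTracker:
        """Persistent per-prompt baseline (assumed EMA update)."""
        def __init__(self, beta=0.9):
            self.v, self.beta = {}, beta
        def update(self, prompt_id, reward):
            v = self.v.get(prompt_id, reward)
            self.v[prompt_id] = self.beta * v + (1 - self.beta) * reward
            return self.v[prompt_id]

    def spo_advantages(rewards, baselines):
        """Advantage = reward - persistent baseline, normalised globally
        across the whole batch rather than within small groups."""
        adv = torch.as_tensor(rewards) - torch.as_tensor(baselines)
        return (adv - adv.mean()) / (adv.std() + 1e-8)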
21. Numerical Investigations of Jet A Hexane Binary Fuel Droplet Impact on a Heated Solid Surface
Authors: Arghya Paul, Kanak Raj, Pratim Kumar •
Published: 2025-09-16 •
Source: arXiv
In the present work, Jet A-Hexane binary fuel droplet impact dynamics on heated solid surfaces were studied numerically. This study is crucial for practical applications such as fuel injection in combustors and thermal management of engine components. The volume-of-fluid (VOF) method was used to analyse the impact dynamics, spreading behaviour, vaporisation, and heat transfer of n-hexane and Jet-A blended fuel droplets on heated stainless-steel surfaces. Droplet impact dynamics were investigated for two Weber numbers, i.e., 25 and 50, and surface temperatures ranging from 50 °C to 227 °C to capture transitions from gentle spreading to nucleate boiling and rebound phenomena. This work examines how fuel blending influences inertia, lamella formation, vapour recoil, and film boiling regimes. The results show that higher inertia in blended fuels enhances spreading but also triggers stronger vapour recoil at elevated temperatures, leading to droplet rebound. In contrast, pure hexane transitions to a stable film boiling regime at high surface temperatures, resulting in a smoother decline in heat flux. New correlations were developed linking Weber number, spreading ratio, and wall heat flux, offering predictive insights for real-world combustion scenarios. These findings advance the understanding of bi-component fuel droplet impacts on heated surfaces and provide a framework for designing efficient spray systems in combustors and thermal management in propulsion and power generation applications.
22. Efficiency, Envy, and Incentives in Combinatorial Assignment
Authors: Thành Nguyen, Alexander Teytelboym, Shai Vardi •
Published: 2025-09-16 •
Source: arXiv
Ensuring efficiency and envy-freeness in allocating indivisible goods without money often requires randomization. However, existing combinatorial assignment mechanisms (for applications such as course allocation, food banks, and refugee resettlement) guarantee these properties either ex ante or ex post, but not both. We propose a new class of mechanisms based on Competitive Equilibrium from Random Incomes (CERI): Agents receive random token budgets and select optimal lotteries at competitive prices that clear markets in expectation. Our main insight is to let the CERI price vector guide all ex-post allocations. We show that all ordinally efficient allocations are CERI allocations, which can be implemented as lotteries over near-feasible Pareto-efficient outcomes. With identical budget distributions, CERI allocations are ordinally envy-free; with budget distributions on small supports, ex-post allocations are envy-free up to one good. Moreover, we design an asymptotically efficient implementation of CERI that satisfies a strong new non-manipulability property in large markets.
23. TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving
Authors: Jiawei Wang, Haowei Sun, Xintao Yan, Shuo Feng, Jun Gao, Henry X. Liu •
Published: 2025-09-16 •
Source: arXiv
Safe and scalable deployment of end-to-end (E2E) autonomous driving requires extensive and diverse data, particularly safety-critical events. Existing data are mostly generated from simulators with a significant sim-to-real gap or collected from on-road testing that is costly and unsafe. This paper presents TeraSim-World, an automated pipeline that synthesizes realistic and geographically diverse safety-critical data for E2E autonomous driving anywhere in the world. Starting from an arbitrary location, TeraSim-World retrieves real-world maps and traffic demand from geospatial data sources. Then, it simulates agent behaviors from naturalistic driving datasets, and orchestrates diverse adversities to create corner cases. Informed by street views of the same location, it achieves photorealistic, geographically grounded sensor rendering via the frontier video generation model Cosmos-Drive. By bridging agent and sensor simulations, TeraSim-World provides a scalable and critical data synthesis framework for training and evaluation of E2E autonomous driving systems.
24. Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving
Authors: Ruibo Li, Hanyu Shi, Zhe Wang, Guosheng Lin •
Published: 2025-09-16 •
Source: arXiv
Understanding motion in dynamic environments is critical for autonomous driving, thereby motivating research on class-agnostic motion prediction. In this work, we investigate weakly and self-supervised class-agnostic motion prediction from LiDAR point clouds. Outdoor scenes typically consist of mobile foregrounds and static backgrounds, allowing motion understanding to be associated with scene parsing. Based on this observation, we propose a novel weakly supervised paradigm that replaces motion annotations with fully or partially annotated (1%, 0.1%) foreground/background masks for supervision. To this end, we develop a weakly supervised approach utilizing foreground/background cues to guide the self-supervised learning of motion prediction models. Since foreground motion generally occurs in non-ground regions, non-ground/ground masks can serve as an alternative to foreground/background masks, further reducing annotation effort. Leveraging non-ground/ground cues, we propose two additional approaches: a weakly supervised method requiring fewer (0.01%) foreground/background annotations, and a self-supervised method without annotations. Furthermore, we design a Robust Consistency-aware Chamfer Distance loss that incorporates multi-frame information and robust penalty functions to suppress outliers in self-supervised learning. Experiments show that our weakly and self-supervised models outperform existing self-supervised counterparts, and our weakly supervised models even rival some supervised ones. This demonstrates that our approaches effectively balance annotation effort and performance.
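As a sketch of the loss family involved, here is a Chamfer distance with a Huber-style robust penalty that damps outlier correspondences; the paper's Robust Consistency-aware variant additionally folds in multi-frame information.

    import torch

    def robust_chamfer(p, q, delta=0.5):
        """Bidirectional nearest-neighbour distance between point clouds
        p [N, 3] and q [M, 3], with a Huber penalty on each match."""
        d = torch.cdist(p, q)                  # [N, M] pairwise distances
        d_pq = d.min(dim=1).values             # p -> nearest q
        d_qp = d.min(dim=0).values             # q -> nearest p
        def huber(x):
            return torch.where(x < delta, 0.5 * x ** 2,
                               delta * (x - 0.5 * delta))
        return huber(d_pq).mean() + huber(d_qp).mean()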
25. Accelerating Discovery: Rapid Literature Screening with LLMs
Authors: Santiago Matalonga, Domenico Amalfitano, Jean Carlo Rossa Hauck, Martín Solari, Guilherme H. Travassos •
Published: 2025-09-16 •
Source: arXiv
Background: Conducting Multi Vocal Literature Reviews (MVLRs) is often time- and effort-intensive. Researchers must review and filter a large number of unstructured sources, which frequently contain sparse information and are unlikely to be included in the final study. Our experience conducting an MVLR on Context-Aware Software Systems (CASS) Testing in the avionics domain exemplified this challenge, with over 8,000 highly heterogeneous documents requiring review. Therefore, we developed a Large Language Model (LLM) assistant to support the search and filtering of documents. Aims: To develop and validate an LLM-based tool that can support researchers in performing the search and filtering of documents for an MVLR without compromising the rigor of the research protocol. Method: We applied sound engineering practices to develop an on-premises LLM-based tool incorporating Retrieval Augmented Generation (RAG) to process candidate sources. Progress towards the aim was quantified using the Positive Percent Agreement (PPA) as the primary metric to ensure the performance of the LLM-based tool. Convenience sampling, supported by human judgment and statistical sampling, was used to verify and validate the tool's quality-in-use. Results: The tool currently demonstrates a PPA of 90% with human researchers for sources that are not relevant to the study. Development details are shared to support domain-specific adaptation of the tool. Conclusions: Using LLM-based tools to support academic researchers in rigorous MVLRs is feasible. These tools can free valuable time for higher-level, abstract tasks. However, researcher participation remains essential to ensure that the tool supports thorough research.
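The primary metric is straightforward to compute; a minimal sketch (label values are illustrative):

    def positive_percent_agreement(tool_labels, human_labels, positive="exclude"):
        """Of the sources the human rater marked `positive`, the fraction the
        tool also marked `positive`.  Here `positive` is 'exclude' because the
        reported 90% figure concerns sources not relevant to the study."""
        on_pos = [t for t, h in zip(tool_labels, human_labels) if h == positive]
        return sum(t == positive for t in on_pos) / len(on_pos)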
26. Cyclic Variational Quantum Eigensolver: Escaping Barren Plateaus through Staircase Descent
Authors: Hao Zhang, Ayush Asthana •
Published: 2025-09-16 •
Source: arXiv
We introduce the Cyclic Variational Quantum Eigensolver (CVQE), a hardware-efficient framework for accurate ground-state quantum simulation on noisy intermediate-scale quantum (NISQ) devices. CVQE departs from conventional VQE by incorporating a measurement-driven feedback cycle: Slater determinants with significant sampling probability are iteratively added to the reference superposition, while a fixed entangler (e.g., single-layer UCCSD) is reused throughout. This adaptive reference growth systematically enlarges the variational space in the most promising directions, avoiding manual ansatz and operator-pool design as well as costly searches, while preserving compile-once circuits. The strategy parallels multi-reference methods in quantum chemistry, while remaining fully automated on quantum hardware. Remarkably, CVQE exhibits a distinctive staircase-like descent pattern, where sharp successive energy drops signal efficient escape from barren plateaus. Benchmarks show that CVQE consistently maintains chemical precision across correlation regimes, outperforms fixed UCCSD by several orders of magnitude, and achieves favorable accuracy-cost trade-offs compared to Selected Configuration Interaction. These results position CVQE as a scalable, interpretable, and resource-efficient paradigm for near-term quantum simulation.
27. A Synthetic Data Pipeline for Supporting Manufacturing SMEs in Visual Assembly Control
Authors: Jonas Werheid, Shengjie He, Aymen Gannouni, Anas Abdelrazeq, Robert H. Schmitt •
Published: 2025-09-16 •
Source: arXiv
Quality control of assembly processes is essential in manufacturing to ensure not only the quality of individual components but also their proper integration into the final product. To assist in this matter, automated assembly control using computer vision methods has been widely implemented. However, the costs associated with image acquisition, annotation, and training of computer vision algorithms pose challenges for integration, especially for small- and medium-sized enterprises (SMEs), which often lack the resources for extensive training, data collection, and manual image annotation. Synthetic data offers the potential to reduce manual data collection and labeling. Nevertheless, its practical application in the context of assembly quality remains limited. In this work, we present a novel approach for easily integrable and data-efficient visual assembly control. Our approach leverages simulated scene generation based on computer-aided design (CAD) data and object detection algorithms. The results demonstrate a time-saving pipeline for generating image data in manufacturing environments, achieving a mean Average Precision (mAP@0.5:0.95) of up to 99.5% for correctly identifying instances of synthetic planetary gear system components within our simulated training data, and up to 93% when transferred to real-world camera-captured testing data. This research highlights the effectiveness of synthetic data generation within an adaptable pipeline and underscores its potential to support SMEs in implementing resource-efficient visual assembly control solutions.
28. A Design Co-Pilot for Task-Tailored Manipulators
Authors: Jonathan Külz, Sehoon Ha, Matthias Althoff •
Published: 2025-09-16 •
Source: arXiv
Although robotic manipulators are used in an ever-growing range of applications, robot manufacturers typically follow a "one-fits-all" philosophy, employing identical manipulators in various settings. This often leads to suboptimal performance, as general-purpose designs fail to exploit particularities of tasks. The development of custom, task-tailored robots is hindered by long, cost-intensive development cycles and the high cost of customized hardware. Recently, various computational design methods have been devised to overcome the bottleneck of human engineering. In addition, a surge of modular robots allows quick and economical adaptation to changing industrial settings. This work proposes an approach to automatically designing and optimizing robot morphologies tailored to a specific environment. To this end, we learn the inverse kinematics for a wide range of different manipulators. A fully differentiable framework realizes gradient-based fine-tuning of designed robots and inverse kinematics solutions. Our generative approach accelerates the generation of specialized designs from hours with optimization-based methods to seconds, serving as a design co-pilot that enables instant adaptation and effective human-AI collaboration. Numerical experiments show that our approach finds robots that can navigate cluttered environments, manipulators that perform well across a specified workspace, and can be adapted to different hardware constraints. Finally, we demonstrate the real-world applicability of our method by setting up a modular robot designed in simulation that successfully moves through an obstacle course.
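A toy analogue of the differentiable inverse kinematics described here, for a planar serial arm: forward kinematics is written in an autodiff framework and joint angles are refined by gradient descent. This is our simplification; the paper learns IK across whole families of manipulator designs.

    import torch

    def fit_ik(link_lengths, target, iters=200, lr=0.1):
        """Gradient-based IK: descend on the end-effector distance to
        `target` through differentiable forward kinematics."""
        theta = torch.zeros(len(link_lengths), requires_grad=True)
        opt = torch.optim.Adam([theta], lr=lr)
        L = torch.as_tensor(link_lengths, dtype=torch.float32)
        tgt = torch.as_tensor(target, dtype=torch.float32)
        for _ in range(iters):
            phi = torch.cumsum(theta, 0)            # absolute joint angles
            x = (L * torch.cos(phi)).sum()
            y = (L * torch.sin(phi)).sum()
            loss = (x - tgt[0]) ** 2 + (y - tgt[1]) ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
        return theta.detach()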
29. Automating Code Generation for Semiconductor Equipment Control from Developer Utterances with LLMs
Authors: Youngkyoung Kim, Sanghyeok Park, Misoo Kim, Gangho Yoon, Eunseok Lee, Simon S. Woo •
Published: 2025-09-16 •
Source: arXiv
Semiconductors form the backbone of modern electronics, with their manufacturing and testing relying on highly specialized equipment and domain-specific programming languages. Equipment languages such as the Algorithmic Pattern Generator (ALPG) are critical for precise hardware control but are challenging to program due to their low-level syntax and steep learning curve. While large language models (LLMs) have shown promise in generating high-level code from natural language, their effectiveness on low-level equipment languages remains limited. To address this, we propose Progressive Knowledge Enhancement (PKE), a novel multi-stage prompting framework that progressively extracts and activates the latent knowledge within LLMs, guiding them from simple to complex examples without extensive fine-tuning. Empirical evaluation on an industrial ALPG dataset shows that PKE significantly outperforms standard prompting and surpasses state-of-the-art methods in generating correct ALPG code, achieving 11.1% and 15.2% higher exact match scores compared to the second-best technique. Further analysis of individual components confirms that progressive knowledge extraction based on difficulty enhances accuracy. Our study offers a practical approach to boosting LLM capabilities for specialized low-level programming, supporting greater productivity in semiconductor software development.
30. ORCA: A Comprehensive AI-Driven Platform for Digital Pathology Analysis and Biomarker Discovery
Authors: Noor Shaker, Mohamed AbouZleikha, Nuha Shaker •
Published: 2025-09-16 •
Source: arXiv
Digital pathology has emerged as a transformative approach to tissue analysis, offering unprecedented opportunities for objective, quantitative assessment of histopathological features. However, the complexity of implementing artificial intelligence (AI) solutions in pathology workflows has limited widespread adoption. Here we present ORCA (Optimized Research and Clinical Analytics), a comprehensive no-code AI platform specifically designed for digital pathology applications. ORCA addresses critical barriers to AI adoption by providing an intuitive interface that enables pathologists and researchers to train, deploy, and validate custom AI models without programming expertise. The platform integrates advanced deep learning architectures with clinical workflow management, supporting applications from tissue classification and cell segmentation to spatial distribution scoring and novel biomarker discovery. We demonstrate ORCA's capabilities through validation studies across multiple cancer types, showing significant improvements in analytical speed, reproducibility, and clinical correlation compared to traditional manual assessment methods. Our results indicate that ORCA successfully democratizes access to state-of-the-art AI tools in pathology, potentially accelerating biomarker discovery and enhancing precision medicine initiatives.
31. Bridging Threat Models and Detections: Formal Verification via CADP
Authors: Dumitru-Bogdan Prelipcean, Cătălin Dima •
Published: 2025-09-16 •
Source: arXiv
Threat detection systems rely on rule-based logic to identify adversarial behaviors, yet the conformance of these rules to high-level threat models is rarely verified formally. We present a formal verification framework that models both detection logic and attack trees as labeled transition systems (LTSs), enabling automated conformance checking via bisimulation and weak trace inclusion. Detection rules specified in the Generic Threat Detection Language (GTDL, a general-purpose detection language we formalize in this work) are assigned a compositional operational semantics, and threat models expressed as attack trees are interpreted as LTSs through a structural trace semantics. Both representations are translated to LNT, a modeling language supported by the CADP toolbox. This common semantic domain enables systematic and automated verification of detection coverage. We evaluate our approach on real-world malware scenarios such as LokiBot and Emotet and provide scalability analysis through parametric synthetic models. Results confirm that our methodology identifies semantic mismatches between threat models and detection rules, supports iterative refinement, and scales to realistic threat landscapes.
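To make the conformance check concrete, here is a toy weak-trace-inclusion check for small finite LTSs given as adjacency dicts (state -> list of (label, next_state)); CADP performs this kind of check, plus bisimulation, at scale, and this sketch is ours rather than the paper's tooling.

    from collections import deque

    def closure(states, lts, tau="tau"):
        """Set of states reachable via internal (tau) moves."""
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for (a, t) in lts.get(s, []):
                if a == tau and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return frozenset(seen)

    def weak_trace_included(A, B, a0, b0, tau="tau"):
        """Every visible trace of A must be a weak trace of B."""
        start = (closure({a0}, A), closure({b0}, B))
        seen, queue = {start}, deque([start])
        while queue:
            sa, sb = queue.popleft()
            for s in sa:
                for (act, t) in A.get(s, []):
                    if act == tau:
                        continue
                    # states B can reach by matching the visible action
                    nb = {u for b in sb for (x, u) in B.get(b, []) if x == act}
                    if not nb:
                        return False       # counterexample trace found
                    nxt = (closure({t}, A), closure(nb, B))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return True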
32. Drone Detection Using a Low-Power Neuromorphic Virtual Tripwire
Authors: Anton Eldeborg Lundin, Rasmus Winzell, Hanna Hamrell, David Gustafsson, Hannes Ovrén •
Published: 2025-09-16 •
Source: arXiv
Small drones are an increasing threat to both military personnel and civilian infrastructure, making early and automated detection crucial. In this work we develop a system that uses spiking neural networks and neuromorphic cameras (event cameras) to detect drones. The detection model is deployed on a neuromorphic chip making this a fully neuromorphic system. Multiple detection units can be deployed to create a virtual tripwire which detects when and where drones enter a restricted zone. We show that our neuromorphic solution is several orders of magnitude more energy efficient than a reference solution deployed on an edge GPU, allowing the system to run for over a year on battery power. We investigate how synthetically generated data can be used for training, and show that our model most likely relies on the shape of the drone rather than the temporal characteristics of its propellers. The small size and low power consumption allows easy deployment in contested areas or locations that lack power infrastructure.
33. Toward PDDL Planning Copilot
Authors: Yarin Benyamin, Argaman Mordoch, Shahaf S. Shperberg, Roni Stern •
Published: 2025-09-16 •
Source: arXiv
Large Language Models (LLMs) are increasingly being used as autonomous agents capable of performing complicated tasks. However, they lack the ability to perform reliable long-horizon planning on their own. This paper bridges this gap by introducing the Planning Copilot, a chatbot that integrates multiple planning tools and allows users to invoke them through instructions in natural language. The Planning Copilot leverages the Model Context Protocol (MCP), a recently developed standard for connecting LLMs with external tools and systems. This approach allows using any LLM that supports MCP without domain-specific fine-tuning. Our Planning Copilot supports common planning tasks such as checking the syntax of planning problems, selecting an appropriate planner, calling it, validating the plan it generates, and simulating its execution. We empirically evaluate the ability of our Planning Copilot to perform these tasks using three open-source LLMs. The results show that the Planning Copilot substantially outperforms the same LLMs used without the planning tools. We also conducted a limited qualitative comparison of our tool against GPT-5, a very recent commercial LLM. Our results show that our Planning Copilot significantly outperforms GPT-5 despite relying on a much smaller LLM. This suggests that dedicated planning tools may be an effective way to enable LLMs to perform planning tasks.
34. Jailbreaking Large Language Models Through Content Concretization
Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos •
Published: 2025-09-16 •
Source: arXiv
Large Language Models (LLMs) are increasingly deployed for task automation and content generation, yet their safety mechanisms remain vulnerable to circumvention through different jailbreaking techniques. In this paper, we introduce \textit{Content Concretization} (CC), a novel jailbreaking technique that iteratively transforms abstract malicious requests into concrete, executable implementations. CC is a two-stage process: first, generating initial LLM responses using lower-tier models with less constrained safety filters, then refining them through higher-tier models that process both the preliminary output and the original prompt. We evaluate our technique using 350 cybersecurity-specific prompts, demonstrating substantial improvements in jailbreak Success Rates (SRs), increasing from 7% (no refinements) to 62% after three refinement iterations, while maintaining a cost of 7.5¢ per prompt. Comparative A/B testing across nine different LLM evaluators confirms that outputs from additional refinement steps are consistently rated as more malicious and technically superior. Moreover, manual code analysis reveals that generated outputs execute with minimal modification, although optimal deployment typically requires target-specific fine-tuning. As harmful code generation continues to improve, these results highlight critical vulnerabilities in current LLM safety frameworks.
35. InfoGain-RAG: Boosting Retrieval-Augmented Generation via Document Information Gain-based Reranking and Filtering
Authors: Zihan Wang, Zihan Liang, Zhou Shao, Yufei Ma, Huangyu Dai, Ben Chen, Lingtao Mao, Chenyi Lei, Yuqing Ding, Han Li •
Published: 2025-09-16 •
Source: arXiv
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to address key limitations of Large Language Models (LLMs), such as hallucination, outdated knowledge, and lack of references. However, current RAG frameworks often struggle with identifying whether retrieved documents meaningfully contribute to answer generation. This shortcoming makes it difficult to filter out irrelevant or even misleading content, which notably impacts the final performance. In this paper, we propose Document Information Gain (DIG), a novel metric designed to quantify the contribution of retrieved documents to correct answer generation. DIG measures a document's value by computing the difference in the LLM's generation confidence with and without the document in context. Further, we introduce InfoGain-RAG, a framework that leverages DIG scores to train a specialized reranker, which learns to distinguish and rank retrieved documents accurately. This approach can effectively filter out irrelevant documents and select the most valuable ones for better answer generation. Extensive experiments across various models and benchmarks demonstrate that InfoGain-RAG significantly outperforms existing approaches under both single- and multiple-retriever paradigms. Specifically, on NaturalQA it achieves improvements of 17.9%, 4.5%, and 12.5% in exact-match accuracy over naive RAG, self-reflective RAG, and modern ranking-based RAG, respectively, and even an average 15.3% increment on the advanced proprietary model GPT-4o across all datasets. These results demonstrate the feasibility of InfoGain-RAG, offering a reliable solution for RAG in a range of applications.
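A sketch of how DIG can be computed with a Hugging Face causal LM, assuming a simple prompt template (ours) and taking the gold answer's log-likelihood as the confidence measure:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def answer_logprob(model, tok, prompt, answer):
        """Sum of token log-probs of `answer` given `prompt`."""
        ids_p = tok(prompt, return_tensors="pt").input_ids
        ids_a = tok(answer, add_special_tokens=False,
                    return_tensors="pt").input_ids
        ids = torch.cat([ids_p, ids_a], dim=1)
        logp = F.log_softmax(model(ids).logits[:, :-1], dim=-1)
        token_lp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        return token_lp[:, -ids_a.size(1):].sum().item()  # answer tokens only

    def document_information_gain(model, tok, question, doc, answer):
        """DIG: confidence in the gold answer with the document in context
        minus confidence without it."""
        with_doc = f"Context: {doc}\nQuestion: {question}\nAnswer:"
        without = f"Question: {question}\nAnswer:"
        return (answer_logprob(model, tok, with_doc, " " + answer)
                - answer_logprob(model, tok, without, " " + answer))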