1. RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
Authors: Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Ellie Evans, Daniel Egert, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev •
Published: 2025-09-25 •
Source: arXiv
Reinforcement Learning with Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) are the main RL paradigms used in LLM post-training, each offering distinct advantages. However, RLHF struggles with interpretability and reward hacking because it relies on human judgments that usually lack explicit criteria, whereas RLVR is limited in scope by its focus on correctness-based verifiers. We propose Reinforcement Learning with Binary Flexible Feedback (RLBFF), which combines the versatility of human-driven preferences with the precision of rule-based verification, enabling reward models to capture nuanced aspects of response quality beyond mere correctness. RLBFF extracts principles that can be answered in a binary fashion (e.g. accuracy of information: yes, or code readability: no) from natural language feedback. Such principles can then be used to ground Reward Model training as an entailment task (response satisfies or does not satisfy an arbitrary principle). We show that Reward Models trained in this manner can outperform Bradley-Terry models when matched for data and achieve top performance on RM-Bench (86.2%) and JudgeBench (81.4%, #1 on leaderboard as of September 24, 2025). Additionally, users can specify principles of interest at inference time to customize the focus of our reward models, in contrast to Bradley-Terry models. Finally, we present a fully open source recipe (including data) to align Qwen3-32B using RLBFF and our Reward Model, to match or exceed the performance of o3-mini and DeepSeek R1 on general alignment benchmarks of MT-Bench, WildBench, and Arena Hard v2 (at <5% of the inference cost).
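The core scoring idea can be sketched in a few lines: reduce feedback to binary principles and reward a response by the fraction of user-specified principles it satisfies. This is a minimal illustration only; the paper trains a reward model to make the satisfies/does-not-satisfy judgment as an entailment task, whereas the `toy_judge` below is a trivial keyword stub standing in for that model.

```python
# Minimal sketch of RLBFF-style principle-grounded scoring (illustrative).
# `satisfies` stands in for the trained entailment reward model.

def rlbff_reward(response: str, principles: list[str], satisfies) -> float:
    """Score a response as the fraction of binary principles it satisfies."""
    votes = [satisfies(response, p) for p in principles]
    return sum(votes) / len(votes)

# Toy stand-in judge (an assumption for illustration, not the paper's model):
def toy_judge(response: str, principle: str) -> bool:
    if principle == "concise":
        return len(response.split()) <= 20
    if principle == "mentions units":
        return "km" in response or "kg" in response
    return True

score = rlbff_reward("The distance is 42 km.", ["concise", "mentions units"], toy_judge)
print(score)  # 1.0
```

Because the principles are passed in at scoring time, users can change the reward's focus at inference without retraining, which is the contrast the authors draw with Bradley-Terry models.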
2. Air Quality and Greenhouse Gas Emissions Assessment of Data Centers in Texas: Quantifying Impacts and Environmental Tradeoffs
Authors: Ebrahim Eslami •
Published: 2025-09-25 •
Source: arXiv
This study assesses air quality (AQ) and greenhouse gas (GHG) emissions from the rapid expansion of data centers in Texas, a major hub due to infrastructure, electricity markets, and business conditions. AQ impacts were separated from GHG emissions to clarify sources, regulations, and mitigation strategies. Electricity consumption and cooling systems dominate GHG emissions, with a 10 megawatt data center generating about 37,668 metric tons CO2 annually, while construction materials and IT equipment add substantial embodied emissions. Local AQ impacts, often overlooked, arise from diesel backup generators, construction equipment, and commuting. Generator testing alone can emit about 12 metric tons of NOx annually per facility, worsening ozone issues in regions such as Houston and Dallas-Fort Worth. Mitigation strategies include advanced cooling, renewable energy procurement, cleaner backup power (fuel cells, batteries), sustainable construction, and standardized reporting. ERCOT forecasts project 39 to 78 gigawatts of new data center load by 2030, potentially leading to 170 to 205 million metric tons of annual CO2 emissions. Aggressive adoption of renewables and advanced technologies could cut emissions by 50 to 80 percent, avoiding 85 to 165 million metric tons of CO2. The study identifies research and policy gaps, including the need for cumulative air dispersion modeling, AQ-specific regulations, and mandatory efficiency standards. Findings underscore the importance of aligning Texas digital infrastructure growth with environmental and community health protections.
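The quoted 10 MW figure is consistent with simple back-of-envelope arithmetic: continuous full-load operation times a grid carbon intensity of about 0.43 t CO2/MWh reproduces the 37,668 t total exactly. Both the full-load assumption and the intensity value are inferred here to match the number, not stated in the abstract.

```python
# Back-of-envelope check of the abstract's 10 MW data-center figure.
# Assumes continuous full-load operation and an ERCOT-like grid intensity
# of ~0.43 t CO2/MWh; both are assumptions inferred to match the total.

capacity_mw = 10
hours_per_year = 8760
grid_intensity_t_per_mwh = 0.43  # assumed average, t CO2 per MWh

annual_t_co2 = capacity_mw * hours_per_year * grid_intensity_t_per_mwh
print(round(annual_t_co2))  # 37668 metric tons per year
```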
3. NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Authors: Yu Yuan, Xijun Wang, Tharindu Wickremasinghe, Zeeshan Nadir, Bole Ma, Stanley H. Chan •
Published: 2025-09-25 •
Source: arXiv
A primary bottleneck in large-scale text-to-video generation today is physical consistency and controllability. Despite recent advances, state-of-the-art models often produce unrealistic motions, such as objects falling upward, or abrupt changes in velocity and direction. Moreover, these models lack precise parameter control, struggling to generate physically consistent dynamics under different initial conditions. We argue that this fundamental limitation stems from current models learning motion distributions solely from appearance, while lacking an understanding of the underlying dynamics. In this work, we propose NewtonGen, a framework that integrates data-driven synthesis with learnable physical principles. At its core lies trainable Neural Newtonian Dynamics (NND), which can model and predict a variety of Newtonian motions, thereby injecting latent dynamical constraints into the video generation process. By jointly leveraging data priors and dynamical guidance, NewtonGen enables physically consistent video synthesis with precise parameter control.
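The kind of dynamical constraint NND injects can be illustrated with the simplest case: propagating an object's state (position, velocity) under Newtonian kinematics so that per-frame positions are mutually consistent. This toy integrator is an illustration of the constraint idea under stated assumptions, not the paper's learned neural dynamics.

```python
import numpy as np

# Toy Newtonian dynamics prior (illustrative, not the paper's learned NND):
# roll out object state under a fixed acceleration so that the per-frame
# positions obey consistent kinematics (no "falling upward", no velocity jumps).

def rollout(p0, v0, a, dt, n_steps):
    """Semi-implicit Euler rollout: returns an (n_steps+1, dim) position track."""
    p, v = np.array(p0, float), np.array(v0, float)
    track = [p.copy()]
    for _ in range(n_steps):
        v = v + a * dt      # update velocity first (symplectic flavor)
        p = p + v * dt      # then position
        track.append(p.copy())
    return np.stack(track)

# Ball thrown upward under gravity: it decelerates, peaks, then falls.
track = rollout(p0=[0.0, 0.0], v0=[1.0, 5.0], a=np.array([0.0, -9.8]), dt=0.1, n_steps=10)
print(track[-1])  # final position after 1 s of simulated motion
```

Changing `v0` or `a` here is the analogue of the "precise parameter control" the framework targets: the same dynamics model produces consistent motion under different initial conditions.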
4. Einstein@Home Searches for Gamma-ray Pulsars in the Inner Galaxy
Authors: C. J. Clark, M. Di Mauro, J. Wu, B. Allen, O. Behnke, H. B. Eggenstein, B. Machenschalk, L. Nieder, P. M. Saz Parkinson, A. Ashok, P. Bruel, B. McGloughlin, M. A. Papa, F. Camilo, M. Kerr, P. Voraganti Padmanabh, S. M. Ransom •
Published: 2025-09-25 •
Source: arXiv
The Fermi Large Area Telescope (LAT) has revealed a mysterious extended excess of GeV gamma-ray emission around the Galactic Center, which can potentially be explained by unresolved emission from a population of pulsars, particularly millisecond pulsars (MSPs), in the Galactic bulge. We used the distributed volunteer computing system Einstein@Home to search the Fermi-LAT data for gamma-ray pulsations from sources in the inner Galaxy, to try to identify the brightest members of this putative population. We discovered four new pulsars, including one new MSP and one young pulsar whose angular separation of 0.93° from the Galactic Center is the smallest of any known gamma-ray pulsar. We demonstrate a phase-resolved difference imaging technique that allows the flux from this pulsar to be disentangled from the diffuse Galactic Center emission. No radio pulsations were detected from the four new pulsars in archival radio observations or during the MPIfR-MeerKAT Galactic Plane Survey. While the distances to these pulsars remain uncertain, we find that it is more likely that they are all foreground sources from the Galactic disk, rather than pulsars originating from the predicted bulge population. Nevertheless, our results are not incompatible with an MSP explanation for the Galactic Center excess, as only one or two members of this population would have been detectable in our searches.
5. J-PLUS: Understanding outlier white dwarfs in the third data release via dimensionality reduction
Authors: C. López-Sanjuan, P.-E. Tremblay, A. del Pino, H. Domínguez Sánchez, H. Vázquez Ramió, A. Ederoclite, A. J. Cenarro, A. Marín-Franch, B. Anguiano, T. Civera, P. Cruz, J. A. Fernández-Ontiveros, F. M. Jiménez-Esteban, A. Rebassa-Mansergas, J. Vega-Ferrero, J. Alcaniz, R. E. Angulo, D. Cristóbal-Hornillos, R. A. Dupke, C. Hernández-Monteagudo, M. Moles, L. Sodré Jr., J. Varela •
Published: 2025-09-25 •
Source: arXiv
We present the white dwarf catalog derived from the third data release of the Javalambre Photometric Local Universe Survey (J-PLUS DR3), which covers 3284 deg2 using 12 optical filters. A particular focus is given to the classification of outlier sources. We applied a Bayesian fitting process to the 12-band J-PLUS photometry of white dwarf candidates from Gaia EDR3. The derived parameters were effective temperature, surface gravity, and parallax. We used theoretical models from H- and He-dominated atmospheres, with priors applied to parallax and spectral type. From the posteriors, we derived the probability of an H-dominated atmosphere and of calcium absorption for each source. Outliers were identified as sources with chi2 > 23.2, indicating significant deviations from the best-fitting model. We analyzed the residuals from the fits using the UMAP technique, which enables the classification of outliers into distinct categories. The catalog includes 14844 white dwarfs with r < 20 mag and 1 < parallax < 100 mas, with 72% of the sources lacking spectroscopic (R > 500) classification. The application of UMAP identified three main types of outliers: random measurement fluctuations (391 sources), metal-polluted white dwarfs (98 sources), and two-component systems (282 sources). The last category also includes white dwarfs with strong carbon absorption lines. We validated the J-PLUS classifications by comparison with spectroscopy from SDSS and DESI, and with Gaia BP/RP spectra, confirming a one-to-one correspondence between J-PLUS photometric and spectroscopic classifications. The J-PLUS DR3 white dwarf catalog provides a robust dataset for statistical studies. The use of dimensionality reduction techniques enhances the identification of peculiar objects, making this catalog a valuable resource for the selection of interesting targets such as metal-polluted white dwarfs or binary systems.
6. Quantized Visual Geometry Grounded Transformer
Authors: Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu •
Published: 2025-09-25 •
Source: arXiv
Learning-based 3D reconstruction models, represented by Visual Geometry Grounded Transformers (VGGTs), have made remarkable progress with the use of large-scale transformers. However, their prohibitive computational and memory costs severely hinder real-world deployment. Post-Training Quantization (PTQ) has become a common practice for compressing and accelerating models. However, we empirically observe that PTQ faces unique obstacles when compressing billion-scale VGGTs: the data-independent special tokens induce heavy-tailed activation distributions, while the multi-view nature of 3D data makes calibration sample selection highly unstable. This paper proposes the first quantization framework for VGGTs, namely QuantVGGT. This mainly relies on two technical contributions: First, we introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing to mitigate heavy-tailed distributions and inter-channel variance robustly. Second, we design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics and constructs frame-aware diverse calibration clusters to ensure stable quantization ranges. Comprehensive experiments demonstrate that QuantVGGT achieves state-of-the-art results across different benchmarks and bit-widths, surpassing the previous state-of-the-art generic quantization method by a large margin. We highlight that our 4-bit QuantVGGT can deliver a 3.7$\times$ memory reduction and 2.5$\times$ acceleration in real-hardware inference, while maintaining reconstruction accuracy above 98\% of its full-precision counterpart. This demonstrates the vast advantages and practicality of QuantVGGT in resource-constrained scenarios. Our code is released at https://github.com/wlfeng0509/QuantVGGT.
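Why a Hadamard rotation helps with heavy-tailed activations can be shown with a toy example: an orthonormal rotation spreads one outlier channel's energy across all channels, shrinking the dynamic range a shared int4 scale must cover. This sketch illustrates only that general principle; QuantVGGT's actual dual-smoothed scheme additionally applies per-channel smoothing and fine-grained grouping.

```python
import numpy as np

# Illustrative sketch: rotate, quantize to int4, rotate back, compare error.
# All values are synthetic; this is the general rotation trick, not the
# paper's full Dual-Smoothed Fine-Grained Quantization.

def hadamard(n):
    """Sylvester-construction Hadamard matrix, orthonormalized (n = power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_dequant_int4(x):
    """Symmetric int4 fake-quantization with one shared scale (levels -7..7)."""
    scale = np.abs(x).max() / 7
    return np.round(x / scale).clip(-7, 7) * scale

x = np.full(16, 5.0)
x[0] = 100.0  # one outlier channel -> heavy-tailed distribution

err_plain = np.linalg.norm(quant_dequant_int4(x) - x)

H = hadamard(16)
err_rot = np.linalg.norm(H.T @ quant_dequant_int4(H @ x) - x)

print(err_plain, err_rot)  # the rotated path has much smaller error here
```

Without rotation the shared scale is dominated by the outlier and the small channels quantize to zero; after rotation all channels carry comparable magnitudes, so the same 4-bit budget resolves them far better.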
7. Outflow-cloud interaction as the possible origin of the peculiar radio emission in the tidal disruption event AT2018cqh
Authors: Lei Yang, Xinwen Shu, Goubin Mou, Yongquan Xue, Luming Sun, Fabao Zhang, Zhumao Zhang, Yibo Wang, Tao Wu, Ning Jiang, Hucheng Ding, Tinggui Wang •
Published: 2025-09-25 •
Source: arXiv
AT2018cqh is a unique optical tidal disruption event (TDE) discovered in a dwarf galaxy exhibiting delayed X-ray and radio flares. We present the results from high-resolution VLBA and e-MERLIN radio observations of AT2018cqh extending to $\delta t \sim 2250$ days post discovery, which reveal compact radio emission, unresolved at a scale of $\lesssim 0.13$ pc at 7.6 GHz, with a high brightness temperature of $T_b \gtrsim 4.03 \times 10^{9}$ K. The radio spectral energy distribution (SED) is found to gradually shift towards a higher peak flux density and frequency over a period of $\sim$1000 days. An equipartition analysis suggests that there is little change in the size of the radio-emitting region over this period, while the electron density increases by a factor of 3. The radio light curve at 0.89 GHz continues to rise, with a bump feature lasting for 240 days. These properties are in contrast to the predictions of the standard shockwave model for a diffuse circumnuclear medium, but could be explained if dense clouds exist in the circumnuclear environment. The latter scenario is supported by our hydrodynamic simulations of the interaction of a TDE outflow with a cloud, which can reproduce the temporal evolution of the radio SED. This work highlights the importance of outflow-cloud interaction in explaining the delayed, fast-rising radio emission observed in some TDEs, especially those occurring in galaxies with pre-existing AGN activity.
8. Emission line tracers of galactic outflows driven by stellar feedback in simulations of isolated disk galaxies
Authors: Elliot L. Howatson, Alexander J. Richings, Elke Roediger, Claude-Andre Faucher-Giguere, Tom Theuns, Yuankang Liu, Tsang Keung Chan, Oliver Thompson, Cody Carr, Daniel Angles-Alcazar •
Published: 2025-09-25 •
Source: arXiv
Hydrodynamic simulations can connect outflow observables to the physical conditions of outflowing gas. Here, we use simulations of isolated disk galaxies ranging from dwarf mass ($M_{200} = 10^{10}\mathrm{M}_{\odot}$) to Milky Way mass ($M_{200} = 10^{12}\mathrm{M}_{\odot}$), based on the FIRE-2 subgrid models, to investigate multiphase galactic outflows. We use the CHIMES non-equilibrium chemistry module to create synthetic spectra of common outflow tracers ([CII]$_{158\rm{\mu m}}$, $\mathrm{CO}_{J(1-0)}$, H$\alpha$ and $[\mathrm{OIII}]_{5007\text{Å}}$). Using our synthetic spectra, we measure the mass outflow rate, kinetic power and momentum flux using observational techniques. In [CII]$_{158\rm{\mu m}}$ we measure outflow rates of $10^{-4}$ to $1$ $\mathrm{M_{\odot}yr^{-1}}$ across an SFR range of $10^{-3}$ to $1$ $\text{M}_{\odot}\text{yr}^{-1}$, in reasonable agreement with observations. The most significant discrepancy is in $\mathrm{CO}_{J(1-0)}$, with the simulations lying $\approx1$ dex below the observational sample. We test observational assumptions used to derive outflow properties from synthetic spectra. We find the greatest uncertainty lies in measurements of electron density, as estimates using the [SII] doublet can overestimate the actual electron density by up to 2 dex, which changes mass outflow rates by up to 4 dex. We also find that molecular outflows are especially sensitive to the conversion factor between CO luminosity and H2 mass, with outflow rates changing by up to 4 dex in our least massive galaxy. Comparing the outflow properties derived from the synthetic spectra to those derived directly from the simulations, we find that [CII]$_{158\rm{\mu m}}$ probes outflows at greater distances from the disk, whilst molecular gas does not survive at large distances within outflows in the modestly star-forming disk galaxies simulated in this work.
9. The role of synthetic data in Multilingual, Multi-cultural AI systems: Lessons from Indic Languages
Authors: Pranjal A. Chitale, Varun Gumma, Sanchit Ahuja, Prashant Kodali, Manan Uppadhyay, Deepthi Sudharsan, Sunayana Sitaram •
Published: 2025-09-25 •
Source: arXiv
Developing AI systems that operate effectively across languages while remaining culturally grounded is a long-standing challenge, particularly in low-resource settings. Synthetic data provides a promising avenue, yet its effectiveness in multilingual and multicultural contexts remains underexplored. We investigate the creation and impact of synthetic, culturally contextualized datasets for Indian languages through a bottom-up generation strategy that prompts large open-source LLMs (>= 235B parameters) to ground data generation in language-specific Wikipedia content. This approach complements the dominant top-down paradigm of translating synthetic datasets from high-resource languages such as English. We introduce Updesh, a high-quality, large-scale synthetic instruction-following dataset comprising 9.5M data points across 13 Indian languages, encompassing diverse reasoning and generative tasks with an emphasis on long-context, multi-turn capabilities, and alignment with Indian cultural contexts. A comprehensive evaluation incorporating both automated metrics and human annotation across 10k assessments indicates that the generated data is of high quality, though human evaluation highlights areas for further improvement. Additionally, we perform downstream evaluations by fine-tuning models on our dataset and assessing the performance across 15 diverse multilingual datasets. Models trained on Updesh consistently achieve significant gains on generative tasks and remain competitive on multiple-choice style NLU tasks. Notably, relative improvements are most pronounced in low- and medium-resource languages, narrowing their gap with high-resource languages. These findings provide empirical evidence that effective multilingual AI requires multi-faceted data curation and generation strategies that incorporate context-aware, culturally grounded methodologies.
10. Modeling Ferrimagnets in MuMax3: Temperature-Dependent Skyrmion Dynamics
Authors: Valerii Antonov, Mikhail Letushev, Michail Bazrov, Zhimba Namsaraev, Ekaterina Steblii, Aleksey Kozlov, Aleksandr Davydenko, Maksim Stebliy •
Published: 2025-09-25 •
Source: arXiv
In this work, we propose an approach to modeling ferrimagnets in MuMax3. We show that by specifying two interacting magnetic sublattices as separate layers, it is possible to reproduce a sperimagnetic-like ordering of magnetization. In such a system, magnetic and angular momentum compensation states can be achieved only by varying the temperature while keeping other parameters fixed. This behavior arises from the different temperature dependencies of the magnetization projections in the sublattices, as determined by their thermal functions. We also investigated the motion of a skyrmion under the action of a spin-polarized current. By changing the temperature, we observed both the disappearance of the skyrmion Hall effect at the angular compensation point and the maximum velocity of translational motion. The latter effect requires a modified version of MuMax3 that allows the g-factor to be specified for different regions. The proposed approach can also be applied to study other phenomena in ferrimagnets, including the influence of composition on the magnetic and angular compensation temperatures, the tilted phase, domain-wall motion, and effects arising from non-uniform current or temperature distributions.
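The key mechanism, compensation arising purely from different temperature dependences of the two sublattices, can be illustrated with a two-sublattice toy model: each sublattice follows its own Bloch-like law, so their antiparallel moments can cancel at one temperature with all other parameters fixed. The functional forms and numbers below are made up for illustration; this is not a MuMax3 script.

```python
# Two-sublattice ferrimagnet toy model (illustrative only, not MuMax3 input).
# Different temperature exponents let the net moment cross zero at a
# magnetic compensation temperature; all parameters here are invented.

def m_sublattice(T, M0, Tc, beta):
    """Bloch-like sublattice magnetization, vanishing at the Curie point."""
    return M0 * (1 - T / Tc) ** beta if T < Tc else 0.0

def net_moment(T):
    # Antiparallel sublattices with distinct thermal exponents:
    return m_sublattice(T, M0=1.0, Tc=600.0, beta=0.5) - \
           m_sublattice(T, M0=1.4, Tc=600.0, beta=0.8)

# Bisection for the temperature where the net moment vanishes:
lo, hi = 1.0, 599.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if net_moment(lo) * net_moment(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(round(0.5 * (lo + hi)))  # magnetic compensation temperature, in kelvin
```

The angular momentum compensation point would be found the same way after dividing each sublattice moment by its gyromagnetic ratio, which is why the skyrmion Hall angle vanishes at a different temperature than the net magnetization does.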
11. DisCoCLIP: A Distributional Compositional Tensor Network Encoder for Vision-Language Understanding
Authors: Kin Ian Lo, Hala Hawashin, Mina Abbaszadeh, Tilen Limback-Stokin, Hadi Wazni, Mehrnoosh Sadrzadeh •
Published: 2025-09-25 •
Source: arXiv
Recent vision-language models excel at large-scale image-text alignment but often neglect the compositional structure of language, leading to failures on tasks that hinge on word order and predicate-argument structure. We introduce DisCoCLIP, a multimodal encoder that combines a frozen CLIP vision transformer with a novel tensor network text encoder that explicitly encodes syntactic structure. Sentences are parsed with a Combinatory Categorial Grammar parser to yield distributional word tensors whose contractions mirror the sentence's grammatical derivation. To keep the model efficient, high-order tensors are factorized with tensor decompositions, reducing parameter count from tens of millions to under one million. Trained end-to-end with a self-supervised contrastive loss, DisCoCLIP markedly improves sensitivity to verb semantics and word order: it raises CLIP's SVO-Probes verb accuracy from 77.6% to 82.4%, boosts ARO attribution and relation scores by over 9% and 4%, and achieves 93.7% on a newly introduced SVO-Swap benchmark. These results demonstrate that embedding explicit linguistic structure via tensor networks yields interpretable, parameter-efficient representations that substantially improve compositional reasoning in vision-language tasks.
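The tensor-contraction idea behind the text encoder can be shown in miniature: in DisCoCat-style semantics, a transitive verb is an order-3 tensor contracted with subject and object vectors, so swapping word order changes the sentence vector, and a low-rank CP factorization keeps the parameter count small. This is a generic toy of that framework, not DisCoCLIP's trained encoder or its specific decomposition.

```python
import numpy as np

# Toy DisCoCat-style contraction (illustrative): subject and object vectors
# contract against a verb tensor, mirroring the grammatical derivation.

d, R = 8, 3
rng = np.random.default_rng(0)
dog, cat = rng.normal(size=d), rng.normal(size=d)

# Full transitive-verb tensor: d^3 parameters.
chases = rng.normal(size=(d, d, d))
s1 = np.einsum('s,svo,o->v', dog, chases, cat)   # "dog chases cat"
s2 = np.einsum('s,svo,o->v', cat, chases, dog)   # "cat chases dog"

# Rank-R CP factorization: three d x R factors, 3*d*R parameters vs d^3.
A, B, C = (rng.normal(size=(d, R)) for _ in range(3))
chases_cp = np.einsum('sr,vr,or->svo', A, B, C)
s1_cp = np.einsum('s,svo,o->v', dog, chases_cp, cat)

print(np.allclose(s1, s2))  # False: word order changes the sentence vector
```

The parameter saving scales the same way as in the paper: factorized verb tensors grow linearly in the embedding dimension rather than cubically, which is how tens of millions of parameters shrink to under one million.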
12. Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
Authors: Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng •
Published: 2025-09-25 •
Source: arXiv
Existing research indicates that the output of Chain-of-Thought (CoT) is significantly affected by input perturbations. Although many methods aim to mitigate such impact by optimizing prompts, a theoretical explanation of how these perturbations influence CoT outputs remains an open area of research. This gap limits our in-depth understanding of how input perturbations propagate during the reasoning process and hinders further improvements in prompt optimization methods. Therefore, in this paper, we theoretically analyze the effect of input perturbations on the fluctuation of CoT outputs. We first derive an upper bound for input perturbations under the condition that the output fluctuation is within an acceptable range, based on which we prove that: (i) This upper bound is positively correlated with the number of reasoning steps in the CoT; (ii) Even an infinitely long reasoning process cannot eliminate the impact of input perturbations. We then apply these conclusions to the Linear Self-Attention (LSA) model, which can be viewed as a simplified version of the Transformer. For the LSA model, we prove that the upper bound for input perturbation is negatively correlated with the norms of the input embedding and hidden state vectors. To validate this theoretical analysis, we conduct experiments on three mainstream datasets and four mainstream models. The experimental results align with our theoretical analysis, empirically demonstrating the correctness of our findings.
13. Taxonomy-aware Dynamic Motion Generation on Hyperbolic Manifolds
Authors: Luis Augenstein, Noémie Jaquier, Tamim Asfour, Leonel Rozo •
Published: 2025-09-25 •
Source: arXiv
Human-like motion generation for robots often draws inspiration from biomechanical studies, which typically categorize complex human motions into hierarchical taxonomies. While these taxonomies provide rich structural information about how movements relate to one another, this information is frequently overlooked in motion generation models, leading to a disconnect between the generated motions and their underlying hierarchical structure. This paper introduces the Gaussian Process Hyperbolic Dynamical Model (GPHDM), a novel approach that learns latent representations preserving both the hierarchical structure of motions and their temporal dynamics to ensure physical consistency. Our model achieves this by extending the dynamics prior of the Gaussian Process Dynamical Model (GPDM) to the hyperbolic manifold and integrating it with taxonomy-aware inductive biases. Building on this geometry- and taxonomy-aware framework, we propose three novel mechanisms for generating motions that are both taxonomically structured and physically consistent: two probabilistic recursive approaches and a method based on pullback-metric geodesics. Experiments on generating realistic motion sequences on the hand grasping taxonomy show that the proposed GPHDM faithfully encodes the underlying taxonomy and temporal dynamics, and generates novel physically consistent trajectories.
14. Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Authors: Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, Adams Wai-Kin Kong •
Published: 2025-09-25 •
Source: arXiv
Image composition aims to seamlessly insert a user-specified object into a new scene, but existing models struggle with complex lighting (e.g., accurate shadows, water reflections) and diverse, high-resolution inputs. Modern text-to-image diffusion models (e.g., SD3.5, FLUX) already encode essential physical and resolution priors, yet lack a framework to unleash them without resorting to latent inversion, which often locks object poses into contextually inappropriate orientations, or brittle attention surgery. We propose SHINE, a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. SHINE introduces manifold-steered anchor loss, leveraging pretrained customization adapters (e.g., IP-Adapter) to guide latents for faithful subject representation while preserving background integrity. Degradation-suppression guidance and adaptive background blending are proposed to further eliminate low-quality outputs and visible seams. To address the lack of rigorous benchmarks, we introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces. Experiments on ComplexCompo and DreamEditBench show state-of-the-art performance on standard metrics (e.g., DINOv2) and human-aligned scores (e.g., DreamSim, ImageReward, VisionReward). Code and benchmark will be publicly available upon publication.
15. Modelling the effect of stellar metallicity on the XUV evolution of low-mass stars and its impact on exoplanet atmospheres/habitability
Authors: Victor See, Charlotte Fairman, Louis Amard, Oliver Hall •
Published: 2025-09-25 •
Source: arXiv
Understanding how exoplanet atmospheres evolve is a key question in the context of habitability. One important process governing this evolution is atmospheric evaporation by stellar X-ray and EUV emission (collectively, XUV). As such, the evolution of exoplanet atmospheres is closely tied to the evolution of the host star's magnetic activity. Many studies have modelled the combined evolution of exoplanet atmospheres and their host stars. However, to date, the impact of the host star's metallicity on stellar activity and exoplanet atmosphere evolution has not been explored. In this work, we investigate how stellar metallicity affects the rotation and activity evolution of solar-like stars as well as the corresponding exoplanet atmospheric evolution. We reconfirm previous results that metal-rich stars spin down more rapidly than metal-poor stars. We also find that the XUV flux that an exoplanet in the habitable zone of its host star receives is larger when the host star is more metal-rich. As such, the atmospheres of exoplanets in the habitable zones of metal-rich stars are evaporated more rapidly than those of exoplanets in the habitable zones of metal-poor stars. Lastly, we find that the atmospheric evolution is most sensitive to the host star's metallicity when the host star has a higher mass. For the highest-mass solar-like stars, the metallicity can have a larger influence on the atmospheric evolution than the star's initial rotation period.
16. Asymptotic instability for the forced Navier--Stokes equations in critical Besov spaces
Authors: Mikihiro Fujii, Hiroyuki Tsurumi •
Published: 2025-09-25 •
Source: arXiv
Asymptotic stability is one of the classical problems in the field of mathematical analysis of fluid mechanics. In $\mathbb{R}^n$ with $n \geq 3$, it is easily proved by the standard argument that if the given small external force decays at temporal infinity, then the small forced Navier--Stokes flow also strongly converges to zero as time tends to infinity in the framework of the critical Besov spaces $\dot{B}_{p,q}^{n/p-1}(\mathbb{R}^n)$ with $1 \leq p < n$ and $1 \leq q < \infty$. In the present paper, we show that this asymptotic stability fails for $p \geq n$ with $n \geq 3$ in the sense that there exist arbitrarily small external forces whose critical Besov norm decays in large time, whereas the corresponding Navier--Stokes flows oscillate and do not strongly converge as $t \to \infty$ in the framework of the critical Besov spaces $\dot{B}_{p,q}^{n/p-1}(\mathbb{R}^n)$. Moreover, we find that the situation is different in the two-dimensional case $n=2$ and show that the forced Navier--Stokes flow is asymptotically unstable in $\dot{B}_{p,1}^{2/p-1}(\mathbb{R}^2)$ for all $1 \leq p \leq \infty$. Our instability does not appear at the linear level but is caused by the nonlinear interaction from external forces.
17. SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
Authors: Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang •
Published: 2025-09-25 •
Source: arXiv
The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the same package, which offers unprecedented computational power. However, there has been scant research investigating how LLM training benefits from this new architecture. In this work, for the first time, we study LLM training solutions based on offloading for Superchips. We observe important differences between Superchips and traditional loosely coupled GPU-CPU architectures, which necessitate revisiting prevailing assumptions about offloading. Based on these observations, we present SuperOffload, a Superchip-centric offloading system that simultaneously uses the Hopper GPU, Grace CPU, and NVLink-C2C interconnect more efficiently. SuperOffload accomplishes this via a combination of techniques, such as adaptive weight offloading, bucketization repartitioning, Superchip-aware casting, speculative execution, and a highly optimized Adam optimizer for Grace CPUs. Our evaluation of SuperOffload on NVIDIA GH200 demonstrates up to 2.5x throughput improvement compared to state-of-the-art offloading-based systems, enabling training of models up to 25B parameters on a single Superchip while achieving high training throughput. We also extend SuperOffload with ZeRO-style data parallelism and DeepSpeed-Ulysses sequence parallelism, enabling training of a 13B model with sequence lengths up to 1 million tokens on 8 GH200 while achieving 55% MFU.
18. LLM Output Homogenization is Task Dependent
Authors: Shomik Jain, Jack Lanchantin, Maximilian Nickel, Karen Ullrich, Ashia Wilson, Jamelle Watson-Daniels •
Published: 2025-09-25 •
Source: arXiv
A large language model can be less helpful if it exhibits output response homogenization. But whether two responses are considered homogeneous, and whether such homogenization is problematic, both depend on the task category. For instance, in objective math tasks, we often expect no variation in the final answer but anticipate variation in the problem-solving strategy. For creative writing tasks, by contrast, we may expect variation in key narrative components (e.g. plot, genre, setting, etc.), beyond the vocabulary or embedding diversity produced by temperature sampling. Previous work addressing output homogenization often fails to conceptualize diversity in a task-dependent way. We address this gap in the literature directly by making the following contributions. (1) We present a task taxonomy comprising eight task categories that each have distinct conceptualizations of output homogenization. (2) We introduce task-anchored functional diversity to better evaluate output homogenization. (3) We propose a task-anchored sampling technique that increases functional diversity for task categories where homogenization is undesired, while preserving homogenization where it is desired. (4) We challenge the perceived existence of a diversity-quality trade-off by increasing functional diversity while maintaining response quality. Overall, we demonstrate how task dependence improves the evaluation and mitigation of output homogenization.
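One simple way to make the task-dependence concrete is to count responses as distinct only up to a task-specific equivalence: for math, collapse responses that share a final answer; for creative writing, collapse responses that share the axis that matters there. The sketch below is an illustration of this framing under invented canonicalization rules, not the paper's actual diversity metric.

```python
# Sketch of "task-anchored" functional diversity (illustrative framing only):
# responses count as distinct only if they differ along the axis that matters
# for the task category.

def functional_diversity(responses, canonicalize):
    """Number of distinct equivalence classes under a task-specific map."""
    return len({canonicalize(r) for r in responses})

# Math task: only the final answer matters, not the solution narrative.
math_key = lambda r: r.strip().split()[-1]
# Creative-ish task: crudely, the opening words stand in for the "approach".
story_key = lambda r: " ".join(r.split()[:3]).lower()

math_responses = [
    "By factoring, the answer is 42",
    "Using the quadratic formula, the answer is 42",
]
print(functional_diversity(math_responses, math_key))   # 1: same final answer
print(functional_diversity(math_responses, story_key))  # 2: different strategies
```

Under the math anchor these two responses are homogeneous (desired); under the strategy anchor they are diverse, which matches the abstract's point that the same pair of outputs can be judged differently depending on the task category.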
19. Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks
Authors: Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy •
Published: 2025-09-25 •
Source: arXiv
Real-time urban traffic surveillance is vital for Intelligent Transportation Systems (ITS) to ensure road safety, optimize traffic flow, track vehicle trajectories, and prevent collisions in smart cities. Deploying edge cameras across urban environments is a standard practice for monitoring road conditions. However, integrating these with intelligent models requires a robust understanding of dynamic traffic scenarios and a responsive interface for user interaction. Although multimodal Large Language Models (LLMs) can interpret traffic images and generate informative responses, their deployment on edge devices is infeasible due to high computational demands. Therefore, LLM inference must occur on the cloud, necessitating visual data transmission from edge to cloud, a process hindered by limited bandwidth, leading to potential delays that compromise real-time performance. To address this challenge, we propose a semantic communication framework that significantly reduces transmission overhead. Our method involves detecting Regions of Interest (RoIs) using YOLOv11, cropping relevant image segments, and converting them into compact embedding vectors using a Vision Transformer (ViT). These embeddings are then transmitted to the cloud, where an image decoder reconstructs the cropped images. The reconstructed images are processed by a multimodal LLM to generate traffic condition descriptions. This approach achieves a 99.9% reduction in data transmission size while maintaining an LLM response accuracy of 89% for reconstructed cropped images, compared to 93% accuracy with original cropped images. Our results demonstrate the efficiency and practicality of ViT and LLM-assisted edge-cloud semantic communication for real-time traffic surveillance.
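A back-of-envelope sketch of why transmitting embeddings saves bandwidth. All sizes below are illustrative assumptions (uncompressed 1080p frame, a ViT-Base-like token grid in fp16), not figures from the system; the reported 99.9% reduction also comes from cropping to small RoIs before encoding.

```python
# Compare the payload of a raw frame against a ViT embedding sequence.
def payload_bytes_raw(width, height, channels=3, bytes_per_px=1):
    """Size of an uncompressed image in bytes."""
    return width * height * channels * bytes_per_px

def payload_bytes_embedding(num_tokens, dim, bytes_per_val=2):
    """Size of a patch-embedding sequence stored as fp16 values."""
    return num_tokens * dim * bytes_per_val

raw = payload_bytes_raw(1920, 1080)        # full frame from the edge camera
emb = payload_bytes_embedding(197, 768)    # ViT-Base-like: 196 patches + [CLS]
reduction = 1 - emb / raw
print(f"embedding payload: {emb} bytes, reduction vs raw frame: {reduction:.1%}")
```

Even this crude comparison (no cropping, no compression) already shows an order-of-magnitude saving; cropping to RoIs first is what pushes the system toward the 99.9% figure.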
20. Instruction-tuned Self-Questioning Framework for Multimodal Reasoning
Authors: You-Won Jang, Yu-Jung Heo, Jaeseok Kim, Minsu Lee, Du-Seong Chang, Byoung-Tak Zhang •
Published: 2025-09-25 •
Source: arXiv
The field of vision-language understanding has been actively researched in recent years, thanks to the development of Large Language Models (LLMs). However, it still struggles with problems requiring multi-step reasoning, even for very simple questions. Recent studies adopt LLMs to tackle this problem by iteratively generating sub-questions and answers, but this approach has disadvantages: 1) the fine-grained visual contents of images are unavailable to LLMs that cannot read visual information, and 2) the internal mechanisms of black-box LLMs are inaccessible and difficult to reproduce. To solve these problems, we propose SQ (Self-Questioning)-InstructBLIP, which improves inference performance by iteratively generating image-aware, informative sub-questions and sub-answers. SQ-InstructBLIP consists of a Questioner, an Answerer, and a Reasoner that share the same architecture. The Questioner and Answerer generate sub-questions and sub-answers to help infer the main question, and the Reasoner performs reasoning on the main question considering the generated sub-question information. Our experiments show that the proposed SQ-InstructBLIP, which uses the generated sub-questions as additional information when solving the VQA task, performs more accurate reasoning than previous works.
21. Explaining Fine Tuned LLMs via Counterfactuals: A Knowledge Graph Driven Framework
Authors: Yucheng Wang, Ziyang Chen, Md Faisal Kabir •
Published: 2025-09-25 •
Source: arXiv
The widespread adoption of Low-Rank Adaptation (LoRA) has enabled large language models (LLMs) to acquire domain-specific knowledge with remarkable efficiency. However, understanding how such a fine-tuning mechanism alters a model's structural reasoning and semantic behavior remains an open challenge. This work introduces a novel framework that explains fine-tuned LLMs via counterfactuals grounded in knowledge graphs. Specifically, we construct BioToolKG, a domain-specific heterogeneous knowledge graph of bioinformatics tools, and design a counterfactual-based fine-tuned LLM explainer (CFFTLLMExplainer) that learns soft masks over graph nodes and edges to generate minimal structural perturbations that induce maximum semantic divergence. Our method jointly optimizes structural sparsity and semantic divergence while enforcing interpretability-preserving constraints such as entropy regularization and edge smoothness. We apply this framework to a fine-tuned LLaMA-based LLM and reveal that counterfactual masking exposes the model's structural dependencies and aligns with LoRA-induced parameter shifts. This work provides new insights into the internal mechanisms of fine-tuned LLMs and highlights counterfactual graphs as a potential tool for interpretable AI.
22. A Converse For the Capacity of the Shotgun Sequencing Channel with Erasures
Authors: Mohammed Ihsan Ali, Hrishi Narayanan, Prasad Krishnan •
Published: 2025-09-25 •
Source: arXiv
The shotgun sequencing process involves fragmenting a long DNA sequence (input string) into numerous shorter, unordered, and overlapping segments (referred to as "reads"). The reads are sequenced, and later aligned to reconstruct the original string. Viewing the sequencing process as the read-phase of a DNA storage system, the information-theoretic capacity of noise-free shotgun sequencing has been characterized in the literature. Motivated by the base-wise quality scores available in practical sequencers, a recent work considered the shotgun sequencing channel with erasures, in which the symbols in the reads are assumed to contain random erasures. Achievable rates for this channel were identified. In the present work, we obtain a converse for this channel. The arguments for the proof involve a careful analysis of a genie-aided decoder, which knows the correct locations of the reads. The converse is not tight in general; however, it meets the achievability result asymptotically in some channel parameters.
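The channel model can be simulated in a few lines (an illustrative sketch with arbitrary parameters, not the paper's formal setup): reads are unordered substrings of the input string, and each symbol in a read is independently erased with some probability.

```python
import random

def shotgun_reads(seq, read_len, num_reads, erasure_p, rng):
    """Sample unordered, overlapping reads; erase each base w.p. erasure_p."""
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(seq) - read_len + 1)
        read = ["?" if rng.random() < erasure_p else b
                for b in seq[start:start + read_len]]
        reads.append("".join(read))
    rng.shuffle(reads)  # reads arrive unordered, as in the channel model
    return reads

rng = random.Random(0)
reads = shotgun_reads("ACGTTGCAACGT" * 8, read_len=10, num_reads=20,
                      erasure_p=0.1, rng=rng)
print(len(reads), reads[0])
```

The capacity question is then how long the string can be, relative to the read length, number of reads, and erasure rate, while still permitting reliable reconstruction; the paper's converse bounds this from above via a genie-aided decoder.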
23. Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning
Authors: Xiangru Tang, Wanghan Xu, Yujie Wang, Zijie Guo, Daniel Shao, Jiapeng Chen, Cixuan Zhang, Ziyi Wang, Lixin Zhang, Guancheng Wan, Wenlong Zhang, Lei Bai, Zhenfei Yin, Philip Torr, Hanrui Wang, Di Jin •
Published: 2025-09-25 •
Source: arXiv
Large language models (LLMs) have recently shown strong progress on scientific reasoning, yet two major bottlenecks remain. First, explicit retrieval fragments reasoning, imposing a hidden "tool tax" of extra tokens and steps. Second, multi-agent pipelines often dilute strong solutions by averaging across all candidates. We address these challenges with a unified framework that combines implicit retrieval and structured collaboration. At its foundation, a Monitor-based retrieval module operates at the token level, integrating external knowledge with minimal disruption to reasoning. On top of this substrate, Hierarchical Solution Refinement (HSR) iteratively designates each candidate as an anchor to be repaired by its peers, while Quality-Aware Iterative Reasoning (QAIR) adapts refinement to solution quality. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy, the highest reported to date, surpassing the strongest agent baseline by 13.4 points and leading frontier LLMs by up to 18.1 points, while simultaneously reducing token usage by 53.5% and agent steps by 43.7%. Results on SuperGPQA and TRQA confirm robustness across domains. Error analysis shows that reasoning failures and knowledge gaps co-occur in over 85% of cases, while diversity analysis reveals a clear dichotomy: retrieval tasks benefit from solution variety, whereas reasoning tasks favor consensus. Together, these findings demonstrate how implicit augmentation and structured refinement overcome the inefficiencies of explicit tool use and uniform aggregation. Code is available at: https://github.com/tangxiangru/Eigen-1.
24. Adoption, usability and perceived clinical value of a UK AI clinical reference platform (iatroX): a mixed-methods formative evaluation of real-world usage and a 1,223-respondent user survey
Authors: Kolawole Tytler •
Published: 2025-09-25 •
Source: arXiv
Clinicians face growing information overload from biomedical literature and guidelines, hindering evidence-based care. Retrieval-augmented generation (RAG) with large language models may provide fast, provenance-linked answers, but requires real-world evaluation. We describe iatroX, a UK-centred RAG-based clinical reference platform, and report early adoption, usability, and perceived clinical value from a formative implementation evaluation. Methods comprised a retrospective analysis of usage across web, iOS, and Android over 16 weeks (8 April-31 July 2025) and an in-product intercept survey. Usage metrics were drawn from web and app analytics with bot filtering. A client-side script randomized single-item prompts to approx. 10% of web sessions from a predefined battery assessing usefulness, reliability, and adoption intent. Proportions were summarized with Wilson 95% confidence intervals; free-text comments underwent thematic content analysis. iatroX reached 19,269 unique web users, 202,660 engagement events, and approx. 40,000 clinical queries. Mobile uptake included 1,960 iOS downloads and Android growth (peak >750 daily active users). The survey yielded 1,223 item-level responses: perceived usefulness 86.2% (95% CI 74.8-93.9%; 50/58); would use again 93.3% (95% CI 68.1-99.8%; 14/15); recommend to a colleague 88.4% (95% CI 75.1-95.9%; 38/43); perceived accuracy 75.0% (95% CI 58.8-87.3%; 30/40); reliability 79.4% (95% CI 62.1-91.3%; 27/34). Themes highlighted speed, guideline-linked answers, and UK specificity. Early real-world use suggests iatroX can mitigate information overload and support timely answers for UK clinicians. Limitations include small per-item samples and early-adopter bias; future work will include accuracy audits and prospective studies on workflow and care quality.
25. Detecting disease progression from animal movement using hidden Markov models
Authors: Dongmin Kim, Théo Michelot, Katherine Mertes, Jared A. Stabach, John Fieberg •
Published: 2025-09-25 •
Source: arXiv
Understanding disease dynamics is crucial for managing wildlife populations and assessing spillover risk to domestic animals and humans, but infection data on free-ranging animals are difficult to obtain. Because pathogen and parasite infections can alter host movement, infection status may be inferred from animal trajectories. We present a hidden Markov model (HMM) framework that links observed movement behaviors to unobserved infection states, consistent with epidemiological compartmental models (e.g., susceptible, infected, recovered, dead). Using movement data from 84 reintroduced scimitar-horned oryx (Oryx dammah), 38 confirmed dead in the field and 6 sampled for disease testing, we demonstrate how HMMs can incorporate epidemiological structure through (1) constrained transition probabilities (e.g., to preclude or allow recovery), (2) covariate effects on transmission, and (3) hierarchically structured HMMs (HHMMs) for multi-scale transitions. Comparing veterinary diagnostic reports with model outputs, we found that HMMs with epidemiological constraints successfully identified infection-associated reductions in movement, whereas unconstrained models failed to capture disease progression. Simulations further showed that constrained HMMs accurately classified susceptible, infected, and recovered states. By illustrating flexible formulations and a workflow for model selection, we provide a transferable approach for detecting infection from movement data. This framework can enhance wildlife disease surveillance, guide population management, and improve understanding of disease dynamics.
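The epidemiologically constrained transition structure can be sketched as a row-stochastic matrix over susceptible/infected/recovered/dead states, where precluding recovery simply zeroes the infected-to-recovered entry. The rates below are illustrative placeholders, not fitted values from the oryx data.

```python
# States, in order: susceptible (S), infected (I), recovered (R), dead (D).
def sir_transition_matrix(beta, gamma, mu, allow_recovery=True):
    """Row-stochastic transition matrix; allow_recovery=False zeroes I -> R."""
    g = gamma if allow_recovery else 0.0
    return [
        [1 - beta, beta,       0.0, 0.0],  # S stays S or becomes infected
        [0.0,      1 - g - mu, g,   mu ],  # I stays I, recovers, or dies
        [0.0,      0.0,        1.0, 0.0],  # R is absorbing
        [0.0,      0.0,        0.0, 1.0],  # D is absorbing
    ]

P = sir_transition_matrix(beta=0.05, gamma=0.1, mu=0.02, allow_recovery=False)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # rows are distributions
print(P[1])  # infected row: no probability mass on the recovered state
```

In the paper's HMM, each hidden state additionally emits movement metrics (e.g., step lengths) whose distributions differ between states; the constraints above are what let the model respect the compartmental structure during fitting.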
26. Acoustic-based Gender Differentiation in Speech-aware Language Models
Authors: Junhyuk Choi, Jihwan Seol, Nayeon Kim, Chanhee Cho, EunBin Cho, Bugeun Kim •
Published: 2025-09-25 •
Source: arXiv
Speech-aware Language Models (SpeechLMs) have fundamentally transformed human-AI interaction by enabling voice-based communication, yet they may exhibit acoustic-based gender differentiation, where identical questions lead to different responses based on the speaker's gender. This paper proposes a new dataset that enables systematic analysis of this phenomenon, containing 9,208 speech samples across three categories: Gender-Independent, Gender-Stereotypical, and Gender-Dependent. We further evaluated the LLaMA-Omni series and discovered a paradoxical pattern: while overall responses seem identical regardless of gender, the pattern is far from unbiased. Specifically, on Gender-Stereotypical questions, all models consistently exhibited male-oriented responses; meanwhile, on Gender-Dependent questions, where gender differentiation would be contextually appropriate, models instead responded independently of gender. We also confirm that this pattern results neither from the availability of neutral options nor from the perceived gender of a voice: when we allow neutral responses, models tend to respond neutrally even on Gender-Dependent questions, and the paradoxical pattern persists when we apply gender-neutralization methods to the speech. Through comparison between SpeechLMs and their corresponding backbone LLMs, we confirmed that these paradoxical patterns primarily stem from the Whisper speech encoders, which generate male-oriented acoustic tokens. These findings reveal that current SpeechLMs do not successfully remove gender bias; rather, they prioritize general fairness principles over contextual appropriateness, highlighting the need for more sophisticated techniques to use gender information properly in speech technology.
27. SoM-1K: A Thousand-Problem Benchmark Dataset for Strength of Materials
Authors: Qixin Wan, Zilong Wang, Jingwen Zhou, Wanting Wang, Ziheng Geng, Jiachen Liu, Ran Cao, Minghui Cheng, Lu Cheng •
Published: 2025-09-25 •
Source: arXiv
Foundation models have shown remarkable capabilities in various domains, but their performance on complex, multimodal engineering problems remains largely unexplored. We introduce SoM-1K, the first large-scale multimodal benchmark dataset dedicated to evaluating foundation models on problems in the strength of materials (SoM). The dataset, which contains 1,065 annotated SoM problems, mirrors real-world engineering tasks by including both textual problem statements and schematic diagrams. Due to the limited capabilities of current foundation models in understanding complicated visual information, we propose a novel prompting strategy called Descriptions of Images (DoI), which provides rigorous expert-generated text descriptions of the visual diagrams as the context. We evaluate eight representative foundation models, including both large language models (LLMs) and vision language models (VLMs). Our results show that current foundation models struggle significantly with these engineering problems, with the best-performing model achieving only 56.6% accuracy. Interestingly, we found that LLMs, when provided with DoI, often outperform VLMs provided with visual diagrams. A detailed error analysis reveals that DoI plays a crucial role in mitigating visual misinterpretation errors, suggesting that accurate text-based descriptions can be more effective than direct image input for current foundation models. This work establishes a rigorous benchmark for engineering AI and highlights a critical need for developing more robust multimodal reasoning capabilities in foundation models, particularly in scientific and engineering contexts.
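The DoI strategy amounts to substituting an expert-written text description for the schematic diagram in the prompt. A minimal sketch follows; the template and the example problem are hypothetical illustrations, not the paper's actual prompt.

```python
def build_doi_prompt(problem_text, image_description):
    """Assemble a text-only prompt where a Description of Images (DoI)
    stands in for the schematic diagram."""
    return (
        "You are solving a strength-of-materials problem.\n"
        f"Diagram description: {image_description}\n"
        f"Problem: {problem_text}\n"
        "Answer with the final numeric value and units."
    )

prompt = build_doi_prompt(
    "Find the maximum bending stress in the beam.",
    "Simply supported beam, span 4 m, point load 10 kN at midspan, "
    "rectangular cross-section 100 mm wide by 200 mm deep.",
)
print(prompt)
```

The paper's finding is that an LLM given such a prompt often beats a VLM given the raw diagram, because a rigorous description sidesteps visual misinterpretation errors.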
28. RePro: Leveraging Large Language Models for Semi-Automated Reproduction of Networking Research Results
Authors: Yining Jiang, Wenyun Xu, Qingyu Song, Yuling Lin, Xuanhao Liu, Xiaoqiang Zheng, Qiang Su, Lizhao You, Lu Tang, Wangjian Feng, Linghe Kong, Qiao Xiang, Jiwu Shu •
Published: 2025-09-25 •
Source: arXiv
Reproducing networking research is a critical but challenging task due to the scarcity of open-source code. While Large Language Models (LLMs) can automate code generation, current approaches lack the generalizability required for the diverse networking field. To address this, we propose RePro, a semi-automated reproduction framework that leverages advanced prompt engineering to reproduce network systems from their research papers. RePro combines few-shot in-context learning with Structured and Semantic Chain of Thought (SCoT/SeCoT) techniques to systematically translate a paper's description into an optimized, executable implementation. The framework operates through a three-stage pipeline: system description extraction, structural code generation, and code optimization. Our evaluation with five state-of-the-art LLMs across diverse network sub-domains demonstrates that RePro significantly reduces reproduction time compared to manual efforts while achieving comparable system performance, validating its effectiveness and efficiency.
29. Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
Authors: Kaiwen He, Zhiwei Wang, Chenyi Zhuang, Jinjie Gu •
Published: 2025-09-25 •
Source: arXiv
In recent years, multimodal models have made remarkable strides, paving the way for intelligent browser-use agents. However, when solving tasks on real-world webpages over multi-turn, long-horizon trajectories, current agents still suffer from disordered action sequencing and excessive trial and error during execution. This paper introduces Recon-Act, a self-evolving multi-agent framework grounded in a Reconnaissance-Action behavioral paradigm. The system comprises a Reconnaissance Team and an Action Team: the former conducts comparative analysis and tool generation, while the latter handles intent decomposition, tool orchestration, and execution. By contrasting erroneous trajectories with successful ones, the Reconnaissance Team infers remedies and abstracts them into a unified notion of generalized tools, expressed either as hints or as rule-based code, which are registered to the tool archive in real time. The Action Team then re-runs inference empowered with these targeted tools, establishing a closed-loop training pipeline of data-tools-action-feedback. Following the six-level implementation roadmap proposed in this work, we have currently reached Level 3 (with limited human-in-the-loop intervention). Leveraging generalized tools obtained through reconnaissance, Recon-Act substantially improves adaptability to unseen websites and solvability on long-horizon tasks, and achieves state-of-the-art performance on the challenging VisualWebArena dataset.
30. Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems
Authors: Haodong Zhao, Jidong Li, Zhaomin Wu, Tianjie Ju, Zhuosheng Zhang, Bingsheng He, Gongshen Liu •
Published: 2025-09-25 •
Source: arXiv
The rapid proliferation of Multi-Agent Systems (MAS), in which Large Language Models (LLMs) and Large Reasoning Models (LRMs) collaborate to solve complex problems, necessitates a deep understanding of the persuasion dynamics that govern their interactions. This paper challenges the prevailing hypothesis that persuasive efficacy is primarily a function of model scale. We propose instead that these dynamics are fundamentally dictated by a model's underlying cognitive process, especially its capacity for explicit reasoning. Through a series of multi-agent persuasion experiments, we uncover a fundamental trade-off we term the Persuasion Duality. Our findings reveal that the reasoning process in LRMs exhibits significantly greater resistance to persuasion, maintaining initial beliefs more robustly. Conversely, making this reasoning process transparent by sharing the "thinking content" dramatically increases a model's ability to persuade others. We further consider more complex persuasion-transmission scenarios and reveal the dynamics of influence propagation and decay in multi-hop persuasion across networks of multiple agents. This research provides systematic evidence linking a model's internal processing architecture to its external persuasive behavior, offering a novel explanation for the susceptibility of advanced models and highlighting critical implications for the safety, robustness, and design of future MAS.
31. MPC-based Deep Reinforcement Learning Method for Space Robotic Control with Fuel Sloshing Mitigation
Authors: Mahya Ramezani, M. Amin Alandihallaj, Barış Can Yalçın, Miguel Angel Olivares Mendez, Holger Voos •
Published: 2025-09-25 •
Source: arXiv
This paper presents an integrated Reinforcement Learning (RL) and Model Predictive Control (MPC) framework for autonomous satellite docking with a partially filled fuel tank. Traditional docking control faces challenges due to fuel sloshing in microgravity, which induces unpredictable forces affecting stability. To address this, we integrate the Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) RL algorithms with MPC, leveraging MPC's predictive capabilities to accelerate RL training and improve control robustness. The proposed approach is validated through planar-stabilization experiments in SnT's Zero-G Lab and through high-fidelity numerical simulations of 6-DOF docking with fuel-sloshing dynamics. Simulation results demonstrate that SAC-MPC achieves superior docking accuracy, higher success rates, and lower control effort, outperforming standalone RL and PPO-MPC methods. This study advances fuel-efficient and disturbance-resilient satellite docking, enhancing the feasibility of on-orbit refueling and servicing missions.
32. Combinatorial Creativity: A New Frontier in Generalization Abilities
Authors: Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney •
Published: 2025-09-25 •
Source: arXiv
Artificial intelligence (AI) systems, and large language models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Though in many ways similar to forms of compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff remains persistent even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, marking a new frontier in generalization abilities.
33. Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem
Authors: William F. Godoy, Tatiana Melnichenko, Pedro Valero-Lara, Wael Elwasif, Philip Fackler, Rafael Ferreira Da Silva, Keita Teranishi, Jeffrey S. Vetter •
Published: 2025-09-25 •
Source: arXiv
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on LLVM's Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python interoperability with CUDA-like syntax for compile-time portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations), and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo's performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps remain on AMD GPUs for atomic operations and on both AMD and NVIDIA GPUs for fast-math compute-bound kernels. Although the programming model remains fairly low-level and carries a learning curve, Mojo can close significant gaps in the fragmented Python ecosystem at the convergence of scientific computing and AI.
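For reference, the seven-point stencil used as the memory-bound workload can be sketched in plain Python (the paper's kernels are written in Mojo and compared against CUDA/HIP; this is only an illustrative CPU version with arbitrary coefficients): each interior point is updated from itself and its six face neighbors on a 3-D grid.

```python
def stencil7(u, c0=0.5, c1=1.0 / 12.0):
    """One Jacobi sweep of a seven-point stencil on an n x n x n grid.
    Interior points get c0*self + c1*(sum of 6 face neighbors);
    boundary points are left unchanged."""
    n = len(u)
    out = [[[u[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = c0 * u[i][j][k] + c1 * (
                    u[i - 1][j][k] + u[i + 1][j][k] +
                    u[i][j - 1][k] + u[i][j + 1][k] +
                    u[i][j][k - 1] + u[i][j][k + 1])
    return out

n = 4
u = [[[1.0] * n for _ in range(n)] for _ in range(n)]
v = stencil7(u)
print(v[1][1][1])  # 0.5*1 + 6*(1/12)*1 = 1.0 for a constant field
```

The kernel is memory-bound because each update reads seven values and writes one while doing only a handful of flops, which is why it is a good probe of GPU memory-bandwidth portability.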
34. A Novel Integrated Architecture for Intent Based Approach and Zero Touch Networks
Authors: Neelam Gupta, Dibakar Das, Tamizhelakkiya K, Uma Maheswari Natarajan, Sharvari Ravindran, Komal Sharma, Jyotsna Bapat, Debabrata Das •
Published: 2025-09-25 •
Source: arXiv
The transition to Sixth Generation (6G) networks presents challenges in managing the quality of service (QoS) of diverse applications and achieving Service Level Agreements (SLAs) under varying network conditions. Hence, network management must be automated with the help of Machine Learning (ML) and Artificial Intelligence (AI) to meet real-time requirements. Zero-touch network (ZTN) is one framework for automating network management, with mechanisms such as closed-loop control to ensure that goals are met perpetually. Intent-Based Networking (IBN) specifies user intents with diverse network requirements or goals, which are then translated into specific network configurations and actions. This paper presents a novel architecture integrating IBN and ZTN to serve intent goals. Users provide the intent in natural language, e.g., English, which is then translated using natural language processing (NLP) techniques (e.g., retrieval-augmented generation (RAG)) into the Network Intent LanguagE (Nile). The Nile intent is then passed as a goal to the BiLSTM- and Q-learning-based ZTN closed-loop framework, which maintains the intent under varying network conditions. Thus, the proposed architecture can work autonomously to ensure the network performance goal is met simply by specifying the user intent in English. The integrated architecture is also implemented on a testbed using OpenAirInterface (OAI). Additionally, to evaluate the architecture, an optimization problem is formulated and evaluated with Monte Carlo simulations. Results demonstrate how ZTN can help autonomously achieve the bandwidth goals set by user intent. The simulation and testbed results are compared and show similar trends. Mean Opinion Score (MOS) for Quality of Experience (QoE) is also measured to indicate user satisfaction with the served intent.
35. A Real-Time On-Device Defect Detection Framework for Laser Power-Meter Sensors via Unsupervised Learning
Authors: Dongqi Zheng, Wenjin Fu, Guangzong Chen •
Published: 2025-09-25 •
Source: arXiv
We present an automated vision-based system for defect detection and classification of laser power-meter sensor coatings. Our approach addresses the critical challenge of identifying coating defects, such as thermal damage and scratches, that can compromise laser energy measurement accuracy in medical and industrial applications. The system employs an unsupervised anomaly detection framework that trains exclusively on "good" sensor images to learn the normal coating distribution, enabling detection of both known and novel defect types without requiring extensive labeled defect datasets. Our methodology consists of three key components: (1) a robust preprocessing pipeline using Laplacian edge detection and K-means clustering to segment the area of interest, (2) synthetic data augmentation via StyleGAN2, and (3) a UFlow-based neural network architecture for multi-scale feature extraction and anomaly map generation. Experimental evaluation on 366 real sensor images demonstrates 93.8% accuracy on defective samples and 89.3% accuracy on good samples, with an image-level AUROC of 0.957 and a pixel-level AUROC of 0.961. The system offers potential annual cost savings through automated quality control, with processing times of 0.5 seconds per image in an on-device implementation.
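The K-means step of the preprocessing pipeline can be sketched in one dimension over grayscale intensities (a minimal illustration of the clustering idea, not the system's implementation, which operates on full images): with k=2, the two cluster centers separate the bright coating region from the dark background.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Minimal 1-D k-means via Lloyd's algorithm, returning sorted centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            clusters[min(range(k), key=lambda c: abs(v - centers[c]))].append(v)
        # recompute centers; keep old center if a cluster emptied out
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Toy grayscale intensities: dark background (~10) vs bright coating (~200).
pixels = [10, 12, 11, 9, 200, 205, 198, 202, 13, 199]
centers = kmeans_1d(pixels, k=2)
print(centers)
```

Thresholding at the midpoint of the two centers then yields the area-of-interest mask that the edge-detection stage refines.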