1. Data-driven analyses and model-independent fits for present $b\to s \ell \ell$ results
Authors: T. Hurth, F. Mahmoudi, Y. Monceaux, S. Neshatpour
Published: 2025-08-13
Source: arXiv
We present a critical assessment of the present $B$ anomalies in the exclusive $b \to s \ell\ell$ mode based on the QCD factorisation approach (QCDf). In particular, we analyse the impact of different local form factor calculations and of the largest bin in the low-$q^2$ region. We also present a model-independent analysis of the new results of the CMS experiment on the $B \to K^* \mu^+\mu^-$ angular observables and compare them with the corresponding LHCb data. In addition, we update the global fit by including all $b \to s$ observables, incorporating the new data from CMS. In these analyses, we use 10% or higher guesstimates of the non-factorisable power corrections as additional uncertainties, serving as a placeholder for robust estimates of these contributions. Updating earlier results, we also analyse the combined LHCb and CMS data on the $B \to K^* \mu^+\mu^-$ angular observables using data-driven approaches to find indications of whether the tensions between the QCDf predictions and the present data are due to underestimated subleading hadronic contributions or to new physics effects.
2. 2D bilayer electron-hole superfluidity with unequal and anisotropic masses
Authors: Jihang Zhu, Sankar Das Sarma
Published: 2025-08-13
Source: arXiv
We investigate the stability of electron-hole superfluidity in two-dimensional bilayers with unequal and anisotropic effective masses. Using a zero-temperature, self-consistent Hartree-Fock approach, we study two experimentally relevant deviations from the ideal equal-mass isotropic case: (i) isotropic but unequal conduction and valence band masses ($m_c^* \neq m_v^*$), and (ii) equal average masses with orthogonal in-plane anisotropies $(m_{c,x}^*, m^*_{c,y}) = (m_1^*, m_2^*)$ and $(m^*_{v,x}, m^*_{v,y}) = (m_2^*, m_1^*)$. For both scenarios, we compute the order parameter and analyze the BEC-BCS crossover as a function of layer separation and mass ratio. We find that both mass imbalance and mass anisotropy reduce the pairing strength and suppress the inferred critical temperature $T_c$ by breaking perfect Fermi surface nesting, and shift the BEC-BCS crossover. Despite these effects, superfluidity remains robust across the full range of densities and interlayer separations considered, with no transition to an unpaired plasma state in the absence of screening. Our results provide a baseline for understanding the interplay of mass mismatch and anisotropy in current and emerging bilayer platforms, including van der Waals heterostructures and anisotropic two-dimensional semiconductors. Our work also establishes that Fermi surface nesting is not a key ingredient for bilayer superfluidity, which is always the ground state for all electron-hole bilayers, although the resultant $T_c$ depends on the parameter details and may very well be unmeasurably low for large interlayer separations.
3. PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Authors: Geonhee Sim, Gyeongsik Moon
Published: 2025-08-13
Source: arXiv
Two major approaches exist for creating animatable human avatars. The first, a 3D-based approach, optimizes a NeRF- or 3DGS-based avatar from videos of a single person, achieving personalization through a disentangled identity representation. However, modeling pose-driven deformations, such as non-rigid cloth deformations, requires numerous pose-rich videos, which are costly and impractical to capture in daily life. The second, a diffusion-based approach, learns pose-driven deformations from large-scale in-the-wild videos but struggles with identity preservation and pose-dependent identity entanglement. We present PERSONA, a framework that combines the strengths of both approaches to obtain a personalized 3D human avatar with pose-driven deformations from a single image. PERSONA leverages a diffusion-based approach to generate pose-rich videos from the input image and optimizes a 3D avatar based on them. To ensure high authenticity and sharp renderings across diverse poses, we introduce balanced sampling and geometry-weighted optimization. Balanced sampling oversamples the input image to mitigate identity shifts in diffusion-generated training videos. Geometry-weighted optimization prioritizes geometry constraints over image loss, preserving rendering quality in diverse poses.
4. Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model
Authors: Zihan Wang, Nina Mahmoudian
Published: 2025-08-13
Source: arXiv
Vision-driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring, particularly in dense riverine environments where GPS signals are unreliable. We formalize river following as a coverage control problem in which the reward function is submodular, yielding diminishing returns as more unique river segments are visited, thereby framing the task as a Submodular Markov Decision Process. First, we introduce Marginal Gain Advantage Estimation (MGAE), which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, thus aligning the advantage estimation with the agent's evolving recognition of action value in non-Markovian settings. Second, we develop a Semantic Dynamics Model (SDM) based on patchified water semantic masks that provides more interpretable and data-efficient short-term prediction of future observations than latent vision dynamics models. Third, we present the Constrained Actor Dynamics Estimator (CADE) architecture, which integrates the actor, the cost estimator, and the SDM for cost advantage estimation to form a model-based SafeRL framework capable of solving partially observable Constrained Submodular Markov Decision Processes. Simulation results demonstrate that MGAE achieves faster convergence and superior performance over traditional critic-based methods like Generalized Advantage Estimation. The SDM provides more accurate short-term state predictions that enable the cost estimator to better predict potential violations. Overall, CADE effectively integrates safety regulation into model-based RL: the Lagrangian approach achieves a soft balance of reward and safety during training, while the safety layer enhances performance during inference through hard action overlays.
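The sliding-window baseline behind MGAE can be illustrated in a few lines. A minimal sketch, assuming plain episodic returns and a fixed window size (both our choices, not necessarily the paper's exact formulation):

```python
# Minimal sketch of a sliding-window baseline for advantage estimation, in
# the spirit of MGAE as described in the abstract. Window size and the use
# of raw episodic returns are assumptions.
from collections import deque
import numpy as np

class SlidingWindowBaseline:
    def __init__(self, window: int = 50):
        self.returns = deque(maxlen=window)  # historical episodic returns

    def advantage(self, episodic_return: float) -> float:
        # Baseline is the mean of recent episodic returns; before any
        # history exists, fall back to the current return (zero advantage).
        baseline = np.mean(self.returns) if self.returns else episodic_return
        self.returns.append(episodic_return)
        return episodic_return - baseline

# Usage: after each episode, the advantage scales the policy-gradient update.
est = SlidingWindowBaseline(window=50)
for ret in [1.0, 2.0, 1.5, 3.0]:
    print(est.advantage(ret))
```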
5. MOC: Meta-Optimized Classifier for Few-Shot Whole Slide Image Classification
Authors: Tianqi Xiang, Yi Li, Qixiang Zhang, Xiaomeng Li
Published: 2025-08-13
Source: arXiv
Recent advances in histopathology vision-language foundation models (VLFMs) have shown promise in addressing data scarcity for whole slide image (WSI) classification via zero-shot adaptation. However, these methods remain outperformed by conventional multiple instance learning (MIL) approaches trained on large datasets, motivating recent efforts to enhance VLFM-based WSI classification through few-shot learning paradigms. While existing few-shot methods improve diagnostic accuracy with limited annotations, their reliance on conventional classifier designs introduces critical vulnerabilities to data scarcity. To address this problem, we propose a Meta-Optimized Classifier (MOC) comprising two core components: (1) a meta-learner that automatically optimizes a classifier configuration from a mixture of candidate classifiers and (2) a classifier bank housing diverse candidate classifiers to enable a holistic pathological interpretation. Extensive experiments demonstrate that MOC outperforms prior art across multiple few-shot benchmarks. Notably, on the TCGA-NSCLC benchmark, MOC improves AUC by 10.4% over the state-of-the-art few-shot VLFM-based methods, with gains up to 26.25% under 1-shot conditions, offering a critical advancement for clinical deployments where diagnostic training data is severely limited. Code is available at https://github.com/xmed-lab/MOC.
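The mixture-of-classifiers idea can be sketched as a bank of candidate heads whose outputs are combined through meta-learned weights. The candidate heads, the softmax weighting, and all dimensions below are illustrative assumptions, not the authors' implementation:

```python
# Sketch of combining a bank of candidate classifiers via learnable
# mixture weights, loosely following the MOC idea in the abstract.
import torch
import torch.nn as nn

class ClassifierBank(nn.Module):
    def __init__(self, dim: int, n_classes: int, n_candidates: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(dim, n_classes) for _ in range(n_candidates)]
        )
        # Meta-learned mixture logits, one per candidate classifier.
        self.mix = nn.Parameter(torch.zeros(n_candidates))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.mix, dim=0)                # mixture weights
        logits = torch.stack([h(x) for h in self.heads])  # (K, B, C)
        return torch.einsum("k,kbc->bc", w, logits)       # weighted combination

bank = ClassifierBank(dim=512, n_classes=2)
print(bank(torch.randn(8, 512)).shape)  # torch.Size([8, 2])
```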
6. January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis
Authors: Amir Hosseinian, Ashkan Dehghani Zahedani, Umer Mansoor, Noosheen Hashemi, Mark Woodward
Published: 2025-08-13
Source: arXiv
Progress in AI for automated nutritional analysis is critically hampered by the lack of standardized evaluation methodologies and high-quality, real-world benchmark datasets. To address this, we introduce three primary contributions. First, we present the January Food Benchmark (JFB), a publicly available collection of 1,000 food images with human-validated annotations. Second, we detail a comprehensive benchmarking framework, including robust metrics and a novel, application-oriented overall score designed to assess model performance holistically. Third, we provide baseline results from both general-purpose Vision-Language Models (VLMs) and our own specialized model, january/food-vision-v1. Our evaluation demonstrates that the specialized model achieves an Overall Score of 86.2, a 12.1-point improvement over the best-performing general-purpose configuration. This work offers the research community a valuable new evaluation dataset and a rigorous framework to guide and benchmark future developments in automated nutritional analysis.
7. Deep and diverse population synthesis for multi-person households using generative models
Authors: Hai Yang, Hongying Wu, Linfei Yuan, Xiyuan Ren, Joseph Y. J. Chow, Jinqin Gao, Kaan Ozbay
Published: 2025-08-13
Source: arXiv
Synthetic populations are an increasingly important resource in numerous areas such as urban and transportation analysis. Traditional methods such as iterative proportional fitting (IPF) cannot generate high-quality data for high-dimensional datasets. Recent population synthesis methods based on deep learning can resolve this curse of dimensionality. However, these methods offer few controls, and few of them generate synthetic populations that capture associations among members of the same household. In this study, we propose a framework that tackles these issues. The framework uses a novel population synthesis model, called conditional input directed acyclic tabular generative adversarial network (ciDATGAN), as its core, and employs a set of methods to enhance population synthesis performance. We apply the model to generate a synthetic population for the whole New York State as a public resource for researchers and policymakers. The synthetic population includes nearly 20 million individuals and 7.5 million households. The marginals obtained from the synthetic population match the census marginals well while maintaining associations among household members similar to those in the sample. Compared to the PUMS data, the synthetic population is 17% more diverse; compared against a benchmark approach based on Popgen, the proposed method is 13% more diverse. This study provides an approach that encompasses multiple methods to enhance the population synthesis procedure with greater equity- and diversity-awareness.
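For context, the IPF baseline the abstract contrasts against scales a seed contingency table to match target marginals. A minimal two-dimensional sketch:

```python
# Classical iterative proportional fitting (IPF): alternately rescale a
# seed table to match target row and column marginals.
import numpy as np

def ipf(seed: np.ndarray, row_targets: np.ndarray,
        col_targets: np.ndarray, iters: int = 100) -> np.ndarray:
    table = seed.astype(float).copy()
    for _ in range(iters):
        table *= (row_targets / table.sum(axis=1))[:, None]  # fit rows
        table *= (col_targets / table.sum(axis=0))[None, :]  # fit cols
    return table

seed = np.ones((2, 3))
fitted = ipf(seed, row_targets=np.array([40., 60.]),
             col_targets=np.array([20., 30., 50.]))
print(fitted.sum(axis=1), fitted.sum(axis=0))  # matches both marginals
```

Each added attribute multiplies the number of table cells, which is exactly the curse of dimensionality the deep generative approach is meant to resolve.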
8. LIA-X: Interpretable Latent Portrait Animator
Authors: Yaohui Wang, Di Yang, Xinyuan Chen, Francois Bremond, Yu Qiao, Antitza Dantcheva
Published: 2025-08-13
Source: arXiv
We introduce LIA-X, a novel interpretable portrait animator designed to transfer facial dynamics from a driving video to a source portrait with fine-grained control. LIA-X is an autoencoder that models motion transfer as a linear navigation of motion codes in latent space. Crucially, it incorporates a novel Sparse Motion Dictionary that enables the model to disentangle facial dynamics into interpretable factors. Deviating from previous 'warp-render' approaches, the interpretability of the Sparse Motion Dictionary allows LIA-X to support a highly controllable 'edit-warp-render' strategy, enabling precise manipulation of fine-grained facial semantics in the source portrait. This helps to narrow initial differences with the driving video in terms of pose and expression. Moreover, we demonstrate the scalability of LIA-X by successfully training a large-scale model with approximately 1 billion parameters on extensive datasets. Experimental results show that our proposed method outperforms previous approaches in both self-reenactment and cross-reenactment tasks across several benchmarks. Additionally, the interpretable and controllable nature of LIA-X supports practical applications such as fine-grained, user-guided image and video editing, as well as 3D-aware portrait video manipulation.
9. Which one Performs Better? Wav2Vec or Whisper? Applying both in Badini Kurdish Speech to Text (BKSTT)
Authors: Renas Adnan, Hossein Hassani
Published: 2025-08-13
Source: arXiv
Speech-to-text (STT) systems have a wide range of applications. They are available in many languages, albeit at different quality levels. Although Kurdish is considered a less-resourced language from a processing perspective, STT is available for some Kurdish dialects, for instance, Sorani (Central Kurdish). However, this does not extend to other Kurdish dialects, such as Badini and Hawrami. This research is an attempt to address that gap. Badini has approximately two million speakers, and STT systems can help their community use mobile and computer-based technologies while giving their dialect more global visibility. We aim to create a language model based on Badini speech and evaluate its performance. To cover conversational language, ensure a reasonable level of grammatical accuracy, and have transcriptions readily available, we chose Badini children's stories, eight books comprising 78 stories, as the textual input. Six narrators narrated the books, resulting in approximately 17 hours of recordings. We cleaned, segmented, and tokenized the input. The preprocessing produced nearly 15 hours of speech, comprising 19193 segments and 25221 words. We used Wav2Vec2-Large-XLSR-53 and Whisper-small to develop the language models. The experiments indicate that the transcription process based on the Wav2Vec2-Large-XLSR-53 model provides significantly more accurate and readable output than the Whisper-small model, with 90.38% versus 65.45% readability and 82.67% versus 53.17% accuracy, respectively.
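Accuracy figures for STT systems are usually derived from the word error rate (WER); whether this paper's accuracy score is exactly 1 - WER is an assumption, but a minimal WER computation looks like this:

```python
# Word error rate via Levenshtein distance over words (dynamic programming).
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words = 0.33
```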
10. Performance of GPT-5 Frontier Models in Ophthalmology Question Answering
Authors: Fares Antaki, David Mikhail, Daniel Milad, Danny A Mammo, Sumit Sharma, Sunil K Srivastava, Bing Yu Chen, Samir Touma, Mertcan Sevgi, Jonathan El-Khoury, Pearse A Keane, Qingyu Chen, Yih Chung Tham, Renaud Duval
Published: 2025-08-13
Source: arXiv
Large language models (LLMs) such as GPT-5 integrate advanced reasoning capabilities that may improve performance on complex medical question-answering tasks. For this latest generation of reasoning models, the configurations that maximize both accuracy and cost-efficiency have yet to be established. We evaluated 12 configurations of OpenAI's GPT-5 series (three model tiers across four reasoning effort settings) alongside o1-high, o3-high, and GPT-4o, using 260 closed-access multiple-choice questions from the American Academy of Ophthalmology Basic Clinical Science Course (BCSC) dataset. The primary outcome was multiple-choice accuracy; secondary outcomes included head-to-head ranking via a Bradley-Terry model, rationale quality assessment using a reference-anchored, pairwise LLM-as-a-judge framework, and analysis of accuracy-cost trade-offs using token-based cost estimates. GPT-5-high achieved the highest accuracy (0.965; 95% CI, 0.942-0.985), outperforming all GPT-5-nano variants (P < .001), o1-high (P = .04), and GPT-4o (P < .001), but not o3-high (0.958; 95% CI, 0.931-0.981). GPT-5-high ranked first in both accuracy (1.66x stronger than o3-high) and rationale quality (1.11x stronger than o3-high). Cost-accuracy analysis identified several GPT-5 configurations on the Pareto frontier, with GPT-5-mini-low offering the most favorable low-cost, high-performance balance. These results benchmark GPT-5 on a high-quality ophthalmology dataset, demonstrate the influence of reasoning effort on accuracy, and introduce an autograder framework for scalable evaluation of LLM-generated answers against reference standards in ophthalmology.
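The Bradley-Terry ranking used for the head-to-head comparison can be fit with the classical minorization-maximization (MM) update; the toy win matrix below is illustrative, not the paper's data:

```python
# Fit Bradley-Terry strengths from pairwise win counts with the MM update.
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    # wins[i, j] = number of times model i beat model j.
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(iters):
        total = wins + wins.T  # games played between each pair
        denom = (total / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins.sum(axis=1) / denom
        p /= p.sum()  # fix the overall scale
    return p

wins = np.array([[0, 8, 9], [2, 0, 6], [1, 4, 0]])  # toy data
strengths = bradley_terry(wins)
print(strengths / strengths.min())  # relative strengths
```

Ratios of fitted strengths are the kind of quantity behind statements such as "1.66x stronger than o3-high".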
11. PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces
Authors: Bida Ma, Nuo Xu, Chenkun Qi, Xin Liu, Yule Mo, Jinkai Wang, Chunpeng Lu
Published: 2025-08-13
Source: arXiv
Legged locomotion in spatially constrained structures (crawl spaces) is challenging. In crawl spaces, current exteroceptive locomotion learning methods are limited by the large noise and errors of sensors under possible low-visibility conditions, while current proprioceptive locomotion learning methods struggle to traverse crawl spaces because only ground features are inferred. In this study, a point cloud supervised proprioceptive locomotion reinforcement learning method for legged robots in crawl spaces is proposed. A state estimation network is designed to estimate the robot's surrounding ground and spatial features as well as its collision states from historical proprioceptive sensor data. The point cloud is represented in a polar coordinate frame, and a point cloud processing method is proposed to efficiently extract the ground and spatial features used to supervise the state estimation network. Comprehensive reward functions are designed to guide the robot through crawl spaces after collisions. Experiments demonstrate that, compared to existing methods, our method exhibits more agile locomotion in crawl spaces. This study enhances the ability of legged robots to traverse spatially constrained environments without requiring exteroceptive sensors.
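A minimal sketch of pooling a robot-centric point cloud into polar (azimuth, elevation) bins, in the spirit of the feature extraction described; the bin counts and nearest-range pooling are assumptions:

```python
# Pool a point cloud into polar bins, keeping the nearest range per bin
# as a crude obstacle feature (empty bins stay at infinity).
import numpy as np

def polar_features(points: np.ndarray, n_az: int = 36, n_el: int = 8):
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(y, x)                                # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(z / (r + 1e-9), -1, 1))       # elevation
    feats = np.full((n_az, n_el), np.inf)
    ai = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    ei = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    for a, e, rr in zip(ai, ei, r):
        feats[a, e] = min(feats[a, e], rr)               # nearest return per bin
    return feats

print(polar_features(np.random.rand(1000, 3) * 2 - 1).shape)  # (36, 8)
```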
12. Multi-TeV Gamma Rays from GRB 221009A: Challenges for Emission Mechanisms, EBL Opacity, and Fundamental Physics
Authors: Hassan Abdalla
Published: 2025-08-13
Source: arXiv
The detection of gamma-ray burst GRB 221009A has attracted significant attention due to its record brightness and the first-ever detection of multi-TeV $\gamma$-rays from a GRB. Located at redshift $z=0.151$, this event is relatively nearby by GRB standards yet remains cosmologically distant, making the survival of multi-TeV photons surprising. The Large High Altitude Air Shower Observatory detected photons with energies up to $\sim 13$ TeV during the early afterglow phase, challenging standard EBL models. We investigate whether several theoretical frameworks can explain this anomalous emission: reduced EBL opacity due to cosmic voids along the line of sight, novel emission mechanisms within the GRB environment, secondary $\gamma$-ray production through cosmic-ray cascades, and new-physics scenarios involving Lorentz-invariance violation or axion-like particles. Our analysis highlights the exceptional nature of this event and ongoing theoretical tensions about the dominant physical processes. We discuss the limitations of current models and identify specific observational signatures that future multiwavelength and multi-messenger observations could provide to discriminate between competing explanations. Continued study of similar events with next-generation facilities will be crucial for resolving these challenges and advancing our understanding of extreme particle acceleration in astrophysical environments.
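The tension comes down to exponential attenuation: the observed flux is the intrinsic flux times $e^{-\tau(E,z)}$, and standard EBL models give a large optical depth $\tau$ at 13 TeV for $z = 0.151$. A toy sketch with placeholder optical depths (not taken from any specific EBL model):

```python
# Survival probability of a TeV photon against EBL pair production.
# The tau values below are placeholders for illustration only.
import numpy as np

def survival_fraction(tau: float) -> float:
    # Observed flux = intrinsic flux * exp(-tau(E, z)).
    return np.exp(-tau)

for energy_tev, tau in [(1, 0.4), (5, 3.0), (13, 8.0)]:  # placeholder taus
    print(f"{energy_tev:>2} TeV: survival ~ {survival_fraction(tau):.1e}")
```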
13. AST-n: A Fast Sampling Approach for Low-Dose CT Reconstruction using Diffusion Models
Authors: Tomás de la Sotta, José M. Saavedra, Héctor Henríquez, Violeta Chang, Aline Xavier
Published: 2025-08-13
Source: arXiv
Low-dose CT (LDCT) protocols reduce radiation exposure but increase image noise, compromising diagnostic confidence. Diffusion-based generative models have shown promise for LDCT denoising by learning image priors and performing iterative refinement. In this work, we introduce AST-n, an accelerated inference framework that initiates reverse diffusion from intermediate noise levels, and integrate high-order ODE solvers within conditioned models to further reduce sampling steps. We evaluate two acceleration paradigms -- AST-n sampling and standard scheduling with high-order solvers -- on the Low Dose CT Grand Challenge dataset, covering head, abdominal, and chest scans at 10-25% of standard dose. Conditioned models using only 25 steps (AST-25) achieve peak signal-to-noise ratio (PSNR) above 38 dB and structural similarity index (SSIM) above 0.95, closely matching standard baselines while cutting inference time from roughly 16 s to under 1 s per slice. Unconditional sampling suffers substantial quality loss, underscoring the necessity of conditioning. We also assess DDIM inversion, which yields marginal PSNR gains at the cost of doubling inference time, limiting its clinical practicality. Our results demonstrate that AST-n with high-order samplers enables rapid LDCT reconstruction without significant loss of image fidelity, advancing the feasibility of diffusion-based methods in clinical workflows.
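The jump-start idea can be sketched as a DDIM-style loop that first diffuses the LDCT input to an intermediate noise level $n$ and then denoises from there; the denoiser interface and the deterministic ($\eta = 0$) update are assumptions, not the paper's exact sampler:

```python
# Sketch of AST-n-style truncated reverse diffusion: noise the LDCT image
# to level n (instead of starting from pure noise at level T) and denoise.
import torch

@torch.no_grad()
def ast_n_sample(model, ldct, alphas_cumprod, n: int = 25):
    # Jump-start: diffuse the LDCT input to noise level n.
    a_n = alphas_cumprod[n]
    x = a_n.sqrt() * ldct + (1 - a_n).sqrt() * torch.randn_like(ldct)
    for t in range(n, 0, -1):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        eps = model(x, t, cond=ldct)                        # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # DDIM step (eta = 0)
    return x
```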
14. Beam Cross Sections Create Mixtures: Improving Feature Localization in Secondary Electron Imaging
Authors: Vaibhav Choudhary, Akshay Agarwal, Vivek K Goyal
Published: 2025-08-13
Source: arXiv
Secondary electron (SE) imaging techniques, such as scanning electron microscopy and helium ion microscopy (HIM), use electrons emitted by a sample in response to a focused beam of charged particles incident at a grid of raster scan positions. Spot size -- the diameter of the incident beam's spatial profile -- is one of the limiting factors for resolution, along with various sources of noise in the SE signal. The effect of the beam spatial profile is commonly understood as convolutional. We show that under a simple and plausible physical abstraction for the beam, though convolution describes the mean of the SE counts, the full distribution of SE counts is a mixture. We demonstrate that this more detailed modeling can enable resolution improvements over conventional estimators through a stylized application in semiconductor inspection of localizing the edge in a two-valued sample. We derive Fisher information about edge location in conventional and time-resolved measurements (TRM) and also derive the maximum likelihood estimate (MLE) from the latter. Empirically, the MLE computed from TRM is approximately efficient except at very low beam diameter, so Fisher information comparisons are predictive of performance and can be used to optimize the beam diameter relative to the raster scan spacing. Monte Carlo simulations show that the MLE gives a 5-fold reduction in root mean-squared error (RMSE) of edge localization as compared to conventional interpolation-based estimation. Applied to three real HIM datasets, the average RMSE reduction factor is 5.4.
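A minimal sketch of grid-search maximum-likelihood edge localization under the simpler mean-convolution model (Poisson counts with an erf-shaped mean); the paper's point is that the true count distribution is a mixture, so this is the baseline model rather than the authors' estimator:

```python
# Grid-search MLE for the edge location of a two-valued sample scanned by
# a Gaussian beam, assuming Poisson SE counts with an erf-shaped mean.
import numpy as np
from scipy.special import erf
from scipy.stats import poisson

def mean_profile(x, edge, sigma, lam_lo, lam_hi):
    # Step sample convolved with a Gaussian beam -> erf transition.
    return lam_lo + (lam_hi - lam_lo) * 0.5 * (1 + erf((edge - x) / (np.sqrt(2) * sigma)))

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 101)                      # raster scan positions
true_edge, sigma = 0.3, 1.0
counts = rng.poisson(mean_profile(x, true_edge, sigma, 2.0, 10.0))

edges = np.linspace(-2, 2, 401)                  # candidate edge locations
loglik = [poisson.logpmf(counts, mean_profile(x, e, sigma, 2.0, 10.0)).sum()
          for e in edges]
print("MLE edge:", edges[int(np.argmax(loglik))])
```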
15. A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Authors: Muneeza Azmat, Momin Abbas, Maysa Malfiza Garcia de Macedo, Marcelo Carpinette Grave, Luan Soares de Souza, Tiago Machado, Rogerio A de Paula, Raya Horesh, Yixin Chen, Heloisa Caroline de Souza Pereira Candello, Rebecka Nordenlow, Aminat Adebiyi
Published: 2025-08-13
Source: arXiv
As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring their outputs align with human values and safety standards has become critical. The field has developed diverse alignment approaches including traditional fine-tuning methods (RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of unified evaluation frameworks makes it difficult to systematically compare these paradigms and guide deployment decisions. This paper introduces a multi-dimensional evaluation of alignment techniques for LLMs, a comprehensive evaluation framework that provides a systematic comparison across all major alignment paradigms. Our framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. Through experiments across diverse base models and alignment strategies, we demonstrate the utility of our framework in identifying strengths and limitations of current state-of-the-art models, providing valuable insights for future research directions.
16. Quantum recurrences and the arithmetic of Floquet dynamics
Authors: Amit Anand, Dinesh Valluri, Jack Davis, Shohini Ghose
Published: 2025-08-13
Source: arXiv
The Poincaré recurrence theorem shows that conservative systems in a bounded region of phase space eventually return arbitrarily close to their initial state after a finite amount of time. An analogous behavior occurs in certain quantum systems, where quantum states can recur after sufficiently long unitary evolution, a phenomenon known as quantum recurrence. Periodically driven (i.e., Floquet) quantum systems in particular exhibit complex dynamics even in small dimensions, motivating the study of how interactions and Hamiltonian structure affect recurrence behavior. While most existing studies treat recurrence in an approximate, distance-based sense, here we address the problem of exact, state-independent recurrences in a broad class of finite-dimensional Floquet systems, spanning both integrable and non-integrable models. Leveraging techniques from algebraic field theory, we construct an arithmetic framework that identifies all possible recurrence times by analyzing the cyclotomic structure of the Floquet unitary's spectrum. This computationally efficient approach yields both positive results, enumerating all candidate recurrence times, and definitive negative results, rigorously ruling out exact recurrences for given Hamiltonian parameters. We further prove that rational Hamiltonian parameters do not, in general, guarantee exact recurrence, revealing a subtle interplay between system parameters and long-time dynamics. Our findings sharpen the theoretical understanding of quantum recurrences, clarify their relationship to quantum chaos, and highlight parameter regimes of special interest for quantum metrology and control.
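The arithmetic core can be illustrated directly: if every Floquet eigenphase is a rational multiple of $2\pi$, say $\theta_k = 2\pi p_k/q_k$, then $U^N = \mathbb{1}$ exactly when $N$ is a multiple of $\mathrm{lcm}(q_1,\dots,q_d)$. A toy sketch that takes the rationals as given (identifying them from Hamiltonian parameters is the paper's contribution):

```python
# Exact recurrence time from rational Floquet eigenphases. Recurrences up
# to a global phase are ignored here for simplicity.
from fractions import Fraction
from math import lcm

def exact_recurrence_time(phases: list[Fraction]) -> int:
    # phases[k] = theta_k / (2*pi), assumed rational; U^N = identity iff
    # N is a multiple of the lcm of the denominators.
    return lcm(*(f.denominator for f in phases))

phases = [Fraction(1, 3), Fraction(1, 4), Fraction(5, 6)]
print(exact_recurrence_time(phases))  # 12 driving periods
```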
17. Fault tolerant Operations in Majorana-based Quantum Codes: Gates, Measurements and High Rate Constructions
Authors: Maryam Mudassar, Alexander Schuckert, Daniel Gottesman
Published: 2025-08-13
Source: arXiv
Majorana-based quantum computation in nanowires and neutral atoms has gained prominence as a promising platform to encode qubits and protect them against noise. In order to run computations reliably on such devices, a fully fault-tolerant scheme is needed for state preparation, gates, and measurements. However, current fault-tolerant schemes have either been limited to specific code families or have not been fully developed. In this work, we develop a general framework for fault-tolerant computation with logical degrees of freedom encoded into Majorana hardware. We emphasize the division between even and odd Majorana codes and how it manifests when constructing fault-tolerant gadgets for these families. We provide transversal constructions and supplement them with measurements to obtain several examples of fault-tolerant Clifford gadgets. For the case of odd codes, we give a novel construction for gadgets using quantum reference frames that allows the implementation of operations otherwise forbidden by parity superselection. We also provide a fault-tolerant measurement scheme for Majorana codes inspired by Steane error correction, enabling state preparation, measurement of logical operators, and error correction. We also point out a construction for odd Majorana codes with transversal T gates. Finally, we construct an asymptotically good quantum LDPC Majorana code with qubit degrees of freedom. Our work shows that all necessary elements of fault-tolerant quantum computation can be consistently implemented in fermionic hardware such as Majorana nanowires and fermionic neutral atoms.
18. Nonlinear periodic orbit solutions and their bifurcation structure at the origin of soliton hopping in coupled microresonators
Authors: Savyaraj Deshmukh, Aleksandr Tusnin, Alexey Tikan, Tobias J. Kippenberg, Tobias M. Schneider
Published: 2025-08-13
Source: arXiv
Microresonator frequency combs, essential for future integrated optical systems, rely on dissipative Kerr solitons generated in a single microresonator to achieve coherent frequency comb generation. Recent advances in the nanofabrication of low-loss integrated nonlinear microresonators have paved the way for the exploration of coupled-resonator systems. These systems provide significant technological advantages, including higher conversion efficiency and the generation of dual dispersive waves. Beyond their practical benefits, coupled-resonator systems also reveal novel emergent nonlinear phenomena, such as soliton hopping, a dynamic process in which solitons periodically transfer between coupled resonators. In this study, we employ a dynamical system approach and the corresponding well-established numerical techniques, extensively developed within the context of hydrodynamics and transitional turbulence, to investigate the bifurcation structure of periodic orbit solutions of the coupled Lugiato-Lefever equations that underlie soliton hopping in photonic dimers and trimers. Our main finding uncovers a fundamental difference in the origin of the hopping process in dimers and trimers. We demonstrate that in dimers, hopping emerges from a branch of stable soliton solutions, whereas in trimers, it originates from an unstable branch. This distinction leads to a significant difference in pump power requirements. We relate the bifurcation structure of the periodic orbits, including their stability, to the observed dynamics in simulated laser scans mimicking typical experimental investigations. Subcritical Hopf bifurcations of unstable equilibrium branches specifically explain observed hysteresis, the coexistence of multiple attractors at the same parameter values, and the importance of choosing a specific path in parameter space to reliably achieve a desired dynamical regime.
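The model underlying these dynamics is a pair of coupled Lugiato-Lefever equations. A sketch of the right-hand side in one common normalization (detuning $\zeta$, inter-resonator coupling $J$, pump $F$ on one ring); signs and scalings vary across the literature, so treat this as illustrative rather than the paper's exact model:

```python
# Right-hand side of coupled Lugiato-Lefever equations for a photonic
# dimer, with spectral (FFT) evaluation of the dispersion term on a
# periodic angular grid theta.
import numpy as np

def coupled_lle_rhs(psi_a, psi_b, theta, zeta, J, F):
    k = np.fft.fftfreq(theta.size, d=theta[1] - theta[0]) * 2 * np.pi

    def d2(psi):  # second derivative via Fourier multiplier -k^2
        return np.fft.ifft(-(k ** 2) * np.fft.fft(psi))

    da = (-(1 + 1j * zeta) * psi_a + 1j * d2(psi_a)
          + 1j * np.abs(psi_a) ** 2 * psi_a + 1j * J * psi_b + F)
    db = (-(1 + 1j * zeta) * psi_b + 1j * d2(psi_b)
          + 1j * np.abs(psi_b) ** 2 * psi_b + 1j * J * psi_a)
    return da, db
```

Time stepping (e.g. with an exponential integrator) and continuation in pump power are then the ingredients on which the bifurcation analysis is built.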
19. An integrated photonics platform for high-speed, ultrahigh-extinction, many-channel quantum control
Authors: Mengdi Zhao, Manuj Singh, Anshuman Singh, Henry Thoreen, Robert J. DeAngelo, Daniel Dominguez, Andrew Leenheer, Frédéric Peyskens, Alexander Lukin, Dirk Englund, Matt Eichenfield, Nathan Gemelke, Noel H. Wan
Published: 2025-08-13
Source: arXiv
High-fidelity control of the thousands to millions of programmable qubits needed for utility-scale quantum computers presents a formidable challenge for control systems. In leading atomic systems, control is optical: UV-NIR beams must be fanned out over numerous spatial channels and modulated to implement gates. While photonic integrated circuits (PICs) offer a potentially scalable solution, they also need to simultaneously feature high-speed and high-extinction modulation, strong inter-channel isolation, and broad wavelength compatibility. Here, we introduce and experimentally validate a foundry-fabricated PIC platform that overcomes these limitations. Designed for Rubidium-87 neutral atom quantum computers, our 8-channel PICs, fabricated on a 200-mm wafer process, demonstrate an advanced combination of performance metrics. At the 795 nm single-qubit gate wavelength, we achieve a mean extinction ratio (ER) of 71.4 $\pm$ 1.1 dB, nearest-neighbor on-chip crosstalk of -68.0 $\pm$ 1.0 dB, and -50.8 $\pm$ 0.2 dB after parallel beam delivery in free-space. This high-performance operation extends to the 420 nm and 1013 nm wavelengths for two-qubit Rydberg gates, showing ERs of 42.4 dB (detector-limited) and 61.5 dB, respectively. The devices exhibit 10-90% rise times of 26 $\pm$ 7 ns, achieve dynamic switching to -60 dB levels within microsecond timescales, and show pulse stability errors at the $10^{-3}$ level. This work establishes a scalable platform for developing advanced large-scale optical control required in fault-tolerant quantum computers and other precision technologies.
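For reference, the extinction ratio quoted throughout is the on/off optical power ratio expressed in decibels:

```python
# Extinction ratio in dB from on/off optical powers.
import math

def extinction_ratio_db(p_on: float, p_off: float) -> float:
    return 10 * math.log10(p_on / p_off)

# Reproducing the scale of the reported figure: 71.4 dB corresponds to an
# on/off power ratio of about 1.4e7.
print(extinction_ratio_db(1.0, 10 ** (-71.4 / 10)))  # ~71.4
```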
20. Wisdom of the Crowd, Without the Crowd: A Socratic LLM for Asynchronous Deliberation on Perspectivist Data
Authors: Malik Khadar, Daniel Runningen, Julia Tang, Stevie Chancellor, Harmanpreet Kaur
Published: 2025-08-13
Source: arXiv
Data annotation underpins the success of modern AI, but the aggregation of crowd-collected datasets can harm the preservation of diverse perspectives in data. Difficult and ambiguous tasks cannot easily be collapsed into unitary labels. Prior work has shown that deliberation and discussion improve data quality and preserve diverse perspectives -- however, synchronous deliberation through crowdsourcing platforms is time-intensive and costly. In this work, we create a Socratic dialog system using Large Language Models (LLMs) to act as a deliberation partner in place of other crowdworkers. Against a benchmark of synchronous deliberation on two tasks (Sarcasm and Relation detection), our Socratic LLM encouraged participants to consider alternate annotation perspectives and update their labels as needed (with higher confidence), and led to higher annotation accuracy (for the Relation task, where ground truth is available). Qualitative findings show that our agent's Socratic approach was effective at encouraging reasoned arguments from our participants, and that the intervention was well-received. Our methodology lays the groundwork for building scalable systems that preserve individual perspectives in generating more representative datasets.
21. RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA
Authors: Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, Viktoria Rojkova
Published: 2025-08-13
Source: arXiv
Regulatory compliance question answering (QA) requires precise, verifiable information and domain-specific expertise, posing challenges for Large Language Models (LLMs). In this work, we present a novel multi-agent framework that integrates a Knowledge Graph (KG) of regulatory triplets with Retrieval-Augmented Generation (RAG) to address these demands. First, agents build and maintain an ontology-free KG by extracting subject--predicate--object (SPO) triplets from regulatory documents and systematically cleaning, normalizing, deduplicating, and updating them. Second, these triplets are embedded and stored along with their corresponding textual sections and metadata in a single enriched vector database, allowing for both graph-based reasoning and efficient information retrieval. Third, an orchestrated agent pipeline leverages triplet-level retrieval for question answering, ensuring high semantic alignment between user queries and the factual "who-did-what-to-whom" core captured by the graph. Our hybrid system outperforms conventional methods in complex regulatory queries, ensuring factual correctness with embedded triplets, enabling traceability through a unified vector database, and enhancing understanding through subgraph visualization, providing a robust foundation for compliance-driven and broader audit-focused applications.
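Triplet-level retrieval can be sketched as embedding each SPO triplet next to its source text and matching queries by cosine similarity; the hashing embedder and the sample triplets below are stand-ins, not the paper's model or data:

```python
# Toy triplet-level retrieval: embed SPO triplets alongside their source
# text and match a query by cosine similarity.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Placeholder bag-of-words hashing embedder (an assumption, not the
    # paper's model) -- swap in a real sentence encoder in practice.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

store = [  # (SPO triplet, source text) pairs, illustrative only
    (("broker-dealer", "must file", "Form BD"), "Section 15(b) ..."),
    (("issuer", "must disclose", "material risks"), "Reg S-K ..."),
]
vecs = np.stack([embed(" ".join(spo)) for spo, _ in store])

query = "who is required to file Form BD?"
best = int(np.argmax(vecs @ embed(query)))
print(store[best][0], "->", store[best][1])
```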
22. Human-Aligned Procedural Level Generation Reinforcement Learning via Text-Level-Sketch Shared Representation
Authors: In-Chang Baek, Seoyoung Lee, Sung-Hyun Kim, Geumhwan Hwang, KyungJoong Kim
Published: 2025-08-13
Source: arXiv
Human-aligned AI is a critical component of co-creativity, as it enables models to accurately interpret human intent and generate controllable outputs that align with design goals in collaborative content creation. This direction is especially relevant in procedural content generation via reinforcement learning (PCGRL), which is intended to serve as a tool for human designers. However, existing systems often fall short of exhibiting human-centered behavior, limiting the practical utility of AI-driven generation tools in real-world design workflows. In this paper, we propose VIPCGRL (Vision-Instruction PCGRL), a novel deep reinforcement learning framework that incorporates three modalities (text, level, and sketches) to extend control modalities and enhance human-likeness. We introduce a shared embedding space trained via quadruple contrastive learning across modalities and human-AI styles, and align the policy using an auxiliary reward based on embedding similarity. Experimental results show that VIPCGRL outperforms existing baselines in human-likeness, as validated by both quantitative metrics and human evaluations. The code and dataset will be available upon publication.
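The auxiliary alignment reward can be sketched as an embedding-similarity bonus added to the task reward; the encoders and the weighting below are assumptions:

```python
# Auxiliary reward from cosine similarity between the embedding of the
# generated level and the embedding of the designer's instruction.
import torch
import torch.nn.functional as F

def auxiliary_reward(level_emb: torch.Tensor,
                     instruction_emb: torch.Tensor,
                     weight: float = 0.1) -> torch.Tensor:
    # Added to the environment reward during policy optimization.
    return weight * F.cosine_similarity(level_emb, instruction_emb, dim=-1)

print(auxiliary_reward(torch.randn(1, 128), torch.randn(1, 128)))
```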
23. PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Authors: Mo Yu, Tsz Ting Chung, Chulun Zhou, Tong Li, Rui Lu, Jiangnan Li, Liyan Xu, Haoshu Lu, Ning Zhang, Jing Li, Jie Zhou
Published: 2025-08-13
Source: arXiv
We introduce PRELUDE, a benchmark for evaluating long-context understanding through the task of determining whether a character's prequel story is consistent with the canonical narrative of the original book. Our task poses a stronger demand for global comprehension and deep reasoning than existing benchmarks -- as the prequels are not part of the original story, assessing their plausibility typically requires searching and integrating information that is only indirectly related. Empirically, 88% of instances require evidence from multiple parts of the narrative. Experimental results highlight the challenge of our task: in-context learning, RAG, in-domain training with state-of-the-art LLMs, and commercial DeepResearch services lag behind humans by >15%. A further human study reveals that models often produce correct answers with flawed reasoning, leading to an over 30% gap in reasoning accuracy compared to humans. These findings underscore the substantial room for improvement in long-context understanding and reasoning.
24. Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng
Published: 2025-08-13
Source: arXiv
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional transformer architecture requires substantial computation and poses significant obstacles for large-scale training and practical deployment. In this survey, we offer a systematic examination of innovative LLM architectures that address the inherent limitations of transformers and boost efficiency. Starting from language modeling, this survey covers the background and technical details of linear and sparse sequence modeling methods, efficient full attention variants, sparse mixture-of-experts, hybrid model architectures incorporating the above techniques, and emerging diffusion LLMs. Additionally, we discuss applications of these techniques to other modalities and consider their wider implications for developing scalable, resource-aware foundation models. By grouping recent studies into the above categories, this survey presents a blueprint of modern efficient LLM architectures, and we hope it will help motivate future research toward more efficient, versatile AI systems.
25. Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Authors: Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam
Published: 2025-08-13
Source: arXiv
Code review is a crucial practice in software development. As modern code review is lightweight, it can surface a wide variety of issues, some of which are trivial. Research has investigated automated approaches to classify review comments to gauge the effectiveness of code reviews. However, previous studies have primarily relied on supervised machine learning, which requires extensive manual annotation to train the models effectively. To address this limitation, we explore the potential of using Large Language Models (LLMs) to classify code review comments. We assess the performance of LLMs in classifying code review comments into 17 categories. Our results show that LLMs can classify code review comments, outperforming the state-of-the-art approach that uses a trained deep learning model. In particular, LLMs achieve better accuracy in classifying the five most useful categories, which the state-of-the-art approach struggles with due to scarce training examples. Rather than relying on a specific small training data distribution, LLMs provide balanced performance across high- and low-frequency categories. These results suggest that LLMs could offer a scalable solution for code review analytics to improve the effectiveness of the code review process.
26. From Self-Crafted to Engineered Prompts: Student Evaluations of AI-Generated Feedback in Introductory Physics
Authors: Amogh Sirnoorkar, N. Sanjay Rebello
Published: 2025-08-13
Source: arXiv
The ability of Generative Artificial Intelligence (AI) to produce sophisticated, real-time responses across diverse contexts promises huge potential in physics education, particularly for providing customized feedback. In this study, we investigate around 1200 introductory students' preferences regarding AI feedback generated from three distinct prompt types: (a) self-crafted, (b) employing foundational prompt-engineering techniques, and (c) employing foundational prompt-engineering techniques along with principles of effective feedback. The results show that an overwhelming fraction of students prefer feedback generated using structured prompts, with those combining prompt engineering and effective-feedback principles favored most. However, the popular choice also elicited stronger reactions, with students either liking or disliking the feedback. Students ranked the feedback generated using their self-crafted prompts as the least preferred choice. We also discuss students' second preferences given their first choice, as well as implications of the results, such as the need to incorporate prompt engineering in introductory courses.
27. LibRec: Benchmarking Retrieval-Augmented LLMs for Library Migration Recommendations
Authors: Junxiao Han, Yarong Wang, Xiaodong Gu, Cuiyun Gao, Yao Wan, Song Han, David Lo, Shuiguang Deng
Published: 2025-08-13
Source: arXiv
In this paper, we propose LibRec, a novel framework that integrates the capabilities of LLMs with retrieval-augmented generation (RAG) techniques to automate the recommendation of alternative libraries. The framework further employs in-context learning to extract migration intents from commit messages to enhance the accuracy of its recommendations. To evaluate the effectiveness of LibRec, we introduce LibEval, a benchmark designed to assess performance on the library migration recommendation task. LibEval comprises 2,888 migration records associated with 2,368 libraries extracted from 2,324 Python repositories. Each migration record captures source-target library pairs, along with their corresponding migration intents and intent types. Based on LibEval, we evaluated the effectiveness of ten popular LLMs within our framework, conducted an ablation study to examine the contributions of key components within our framework, explored the impact of various prompt strategies on the framework's performance, assessed its effectiveness across various intent types, and performed detailed failure case analyses.
28. Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
Authors: Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci
Published: 2025-08-13
Source: arXiv
In the rapidly evolving field of Explainable Natural Language Processing (NLP), textual explanations, i.e., human-like rationales, are pivotal for explaining model predictions and enriching datasets with interpretable labels. Traditional approaches rely on human annotation, which is costly, labor-intensive, and impedes scalability. In this work, we present an automated framework that leverages multiple state-of-the-art large language models (LLMs) to generate high-quality textual explanations. We rigorously assess the quality of these LLM-generated explanations using a comprehensive suite of Natural Language Generation (NLG) metrics. Furthermore, we investigate the downstream impact of these explanations on the performance of pre-trained language models (PLMs) and LLMs across natural language inference tasks on two diverse benchmark datasets. Our experiments demonstrate that automated explanations exhibit highly competitive effectiveness compared to human-annotated explanations in improving model performance. Our findings underscore a promising avenue for scalable, automated LLM-based textual explanation generation for extending NLP datasets and enhancing model performance.
29. Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Authors: Lin Long, Yichen He, Wentao Ye, Yiyuan Pan, Yuan Lin, Hang Li, Junbo Zhao, Wei Li
Published: 2025-08-13
Source: arXiv
We introduce M3-Agent, a novel multimodal agent framework equipped with long-term memory. Like humans, M3-Agent can process real-time visual and auditory inputs to build and update its long-term memory. Beyond episodic memory, it also develops semantic memory, enabling it to accumulate world knowledge over time. Its memory is organized in an entity-centric, multimodal format, allowing deeper and more consistent understanding of the environment. Given an instruction, M3-Agent autonomously performs multi-turn, iterative reasoning and retrieves relevant information from memory to accomplish the task. To evaluate memory effectiveness and memory-based reasoning in multimodal agents, we develop M3-Bench, a new long-video question answering benchmark. M3-Bench comprises 100 newly recorded real-world videos captured from a robot's perspective (M3-Bench-robot) and 929 web-sourced videos across diverse scenarios (M3-Bench-web). We annotate question-answer pairs designed to test key capabilities essential for agent applications, such as human understanding, general knowledge extraction, and cross-modal reasoning. Experimental results show that M3-Agent, trained via reinforcement learning, outperforms the strongest baseline, a prompting agent using Gemini-1.5-pro and GPT-4o, achieving 6.7%, 7.7%, and 5.3% higher accuracy on M3-Bench-robot, M3-Bench-web and VideoMME-long, respectively. Our work advances multimodal agents toward more human-like long-term memory and provides insights into their practical design. Model, code and data are available at https://github.com/bytedance-seed/m3-agent
30. Special Requirements of Automated Driving on System Design (Besondere Anforderungen des automatisierten Fahrens an den Entwurf)
Authors: Robert Graubohm, Markus Maurer
Published: 2025-08-13
Source: arXiv
The development of automated vehicles and automated driving functions is an exceptionally complex task that requires the integration of numerous, sometimes conflicting interests and various constraints already in the early stages of system design. This chapter explains important challenges in concept specifications for automated driving and presents a systematic process model that contributes to overcoming the special requirements in this field. In addition, it describes the successful implementation of a structured concept specification for an automated vehicle guidance system.
31. Closing the HPC-Cloud Convergence Gap: Multi-Tenant Slingshot RDMA for Kubernetes
Authors: Philipp A. Friese, Ahmed Eleliemy, Utz-Uwe Haus, Martin Schulz
Published: 2025-08-13
Source: arXiv
Converged HPC-Cloud computing is an emerging computing paradigm that aims to support increasingly complex and multi-tenant scientific workflows. These systems require reconciliation of the isolation requirements of native cloud workloads and the performance demands of HPC applications. In this context, networking hardware is a critical boundary component: it is the conduit for high-throughput, low-latency communication and enables isolation across tenants. HPE Slingshot is a high-speed network interconnect that provides up to 200 Gbps of throughput per port and targets high-performance computing (HPC) systems. The Slingshot host software, including hardware drivers and network middleware libraries, is designed to meet HPC deployments, which predominantly use single-tenant access modes. Hence, the Slingshot stack is not suited for secure use in multi-tenant deployments, such as converged HPC-Cloud deployments. In this paper, we design and implement an extension to the Slingshot stack targeting converged deployments on the basis of Kubernetes. Our integration provides secure, container-granular, and multi-tenant access to Slingshot RDMA networking capabilities at minimal overhead.
32. ReqInOne: A Large Language Model-Based Agent for Software Requirements Specification Generation
Authors: Taohong Zhu, Lucas C. Cordeiro, Youcheng Sun
Published: 2025-08-13
Source: arXiv
Software Requirements Specification (SRS) is one of the most important documents in software projects, but writing it manually is time-consuming and often leads to ambiguity. Existing automated methods rely heavily on manual analysis, while recent Large Language Model (LLM)-based approaches suffer from hallucinations and limited controllability. In this paper, we propose ReqInOne, an LLM-based agent that follows the common steps taken by human requirements engineers when writing an SRS to convert natural language into a structured SRS. ReqInOne adopts a modular architecture by decomposing SRS generation into three tasks: summary, requirement extraction, and requirement classification, each supported by tailored prompt templates to improve the quality and consistency of LLM outputs. We evaluate ReqInOne using GPT-4o, LLaMA 3, and DeepSeek-R1, and compare the generated SRSs against those produced by the holistic GPT-4-based method from prior work as well as by entry-level requirements engineers. Expert evaluations show that ReqInOne produces more accurate and well-structured SRS documents. The performance advantage of ReqInOne benefits from its modular design, and experimental results further demonstrate that its requirement classification component achieves comparable or even better results than the state-of-the-art requirement classification model.
33. VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding
Authors: Zhaowei Liu, Xin Guo, Haotian Xia, Lingfeng Zeng, Fangqi Lou, Jinyi Niu, Mengping Li, Qi Qi, Jiahuan Li, Wei Zhang, Yinglong Wang, Weige Cai, Weining Shen, Liwen Zhang
Published: 2025-08-13
Source: arXiv
Multimodal large language models (MLLMs) hold great promise for automating complex financial analysis. To comprehensively evaluate their capabilities, we introduce VisFinEval, the first large-scale Chinese benchmark that spans the full front-middle-back office lifecycle of financial tasks. VisFinEval comprises 15,848 annotated question-answer pairs drawn from eight common financial image modalities (e.g., K-line charts, financial statements, official seals), organized into three hierarchical scenario depths: Financial Knowledge & Data Analysis, Financial Analysis & Decision Support, and Financial Risk Control & Asset Optimization. We evaluate 21 state-of-the-art MLLMs in a zero-shot setting. The top model, Qwen-VL-max, achieves an overall accuracy of 76.3%, outperforming non-expert humans but trailing financial experts by over 14 percentage points. Our error analysis uncovers six recurring failure modes, including cross-modal misalignment, hallucinations, and lapses in business-process reasoning, that highlight critical avenues for future research. VisFinEval aims to accelerate the development of robust, domain-tailored MLLMs capable of seamlessly integrating textual and visual financial information. The data and the code are available at https://github.com/SUFE-AIFLM-Lab/VisFinEval.
34. Artificial Intelligence, Domain AI Readiness, and Firm Productivity
Authors: Sipeng Zeng, Xiaoning Wang, Tianshu Sun
Published: 2025-08-13
Source: arXiv
Although Artificial Intelligence (AI) holds great promise for enhancing innovation and productivity, many firms struggle to realize its benefits. We investigate why some firms and industries succeed with AI while others do not, focusing on the degree to which an industrial domain is technologically integrated with AI, which we term "domain AI readiness". Using panel data on Chinese listed firms from 2016 to 2022, we examine how the interaction between firm-level AI capabilities and domain AI readiness affects firm performance. We create novel constructs from patent data and measure the domain AI readiness of a specific domain by analyzing the co-occurrence of four-digit International Patent Classification (IPC4) codes related to AI with the specific domain across all patents in that domain. Our findings reveal a strong complementarity: AI capabilities yield greater productivity and innovation gains when deployed in domains with higher AI readiness, whereas benefits are limited in domains that are technologically unprepared or already obsolete. These results remain robust when using local AI policy initiatives as instrumental variables. Further analysis shows that this complementarity is driven by external advances in domain-AI integration, rather than firms' own strategic pivots. Time-series analysis of IPC4 co-occurrence patterns further suggests that improvements in domain AI readiness stem primarily from the academic advancements of AI in specific domains.
35. AINL-Eval 2025 Shared Task: Detection of AI-Generated Scientific Abstracts in Russian
Authors: Tatiana Batura, Elena Bruches, Milana Shvenk, Valentin Malykh
Published: 2025-08-13
Source: arXiv
The rapid advancement of large language models (LLMs) has revolutionized text generation, making it increasingly difficult to distinguish between human- and AI-generated content. This poses a significant challenge to academic integrity, particularly in scientific publishing and multilingual contexts where detection resources are often limited. To address this critical gap, we introduce the AINL-Eval 2025 Shared Task, specifically focused on the detection of AI-generated scientific abstracts in Russian. We present a novel, large-scale dataset comprising 52,305 samples, including human-written abstracts across 12 diverse scientific domains and AI-generated counterparts from five state-of-the-art LLMs (GPT-4-Turbo, Gemma2-27B, Llama3.3-70B, Deepseek-V3, and GigaChat-Lite). A core objective of the task is to challenge participants to develop robust solutions capable of generalizing to both (i) previously unseen scientific domains and (ii) models not included in the training data. The task was organized in two phases, attracting 10 teams and 159 submissions, with top systems demonstrating strong performance in identifying AI-generated content. We also establish a continuous shared task platform to foster ongoing research and long-term progress in this important area. The dataset and platform are publicly available at https://github.com/iis-research-team/AINL-Eval-2025.