1. Sound-Horizon-Agnostic Inference of the Hubble Constant and Neutrino Mass from BAO, CMB Lensing, and Galaxy Weak Lensing and Clustering
Authors: Helena García Escudero, Seyed Hamidreza Mirpoorian, Levon Pogosian •
Published: 2025-09-19 •
Source: arXiv
We present a sound-horizon-agnostic determination of the Hubble constant, $H_0$, by combining DESI DR2 baryon acoustic oscillation (BAO) data with the latest cosmic microwave background (CMB) lensing measurements from Planck, ACT, and SPT-3G, the angular size of the CMB acoustic scale, Dark Energy Survey Year-3 ($3\times2$-pt) galaxy weak lensing and clustering correlations, and the Pantheon+ supernova sample. In this analysis, the sound horizon at the drag epoch, $r_d$, is treated as a free parameter, avoiding assumptions about early-Universe physics. By combining uncalibrated comoving distances from BAO and supernovae with constraints on the matter density $\Omega_m h^2$ from CMB and galaxy lensing/clustering, we break the $r_d$-$H_0$ degeneracy and obtain $H_0 = 70.0 \pm 1.7$ km/s/Mpc when the sum of the neutrino masses is fixed at $\Sigma m_\nu = 0.06$ eV. With a conservative prior on the amplitude of primordial fluctuations, $A_s$, we find $H_0 = 70.03 \pm 0.97$ km/s/Mpc and $r_d = 144.8 \pm 1.6$ Mpc. Allowing $\Sigma m_\nu$ to vary yields $H_0 = 75.3^{+3.3}_{-4.0}$ km/s/Mpc and $\Sigma m_\nu = 0.55^{+0.23}_{-0.37}$ eV ($<1.11$ eV) at 68% (95%) CL, and $H_0 = 73.9 \pm 2.2$ km/s/Mpc with $\Sigma m_\nu = 0.46^{+0.21}_{-0.25}$ eV ($0.46^{+0.40}_{-0.45}$ eV) at 68% (95%) CL when a prior on $A_s$ is applied. Forecasts for the completed DESI BAO program, combined with Simons-Observatory-like CMB lensing, next-generation $3\times2$-pt data, and expanded supernova samples, predict $\sigma(H_0) \simeq 0.67$ km/s/Mpc with fixed $\Sigma m_\nu$, and $\sigma(H_0) \simeq 1.1$ km/s/Mpc with $\Sigma m_\nu < 0.133$ ($<0.263$) eV at 68% (95%) CL when the neutrino mass is varied. As the precision of BAO, CMB lensing, and galaxy lensing/clustering measurements improves, this $r_d$-agnostic framework will provide an independent test of the need for new physics at recombination.
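As a schematic of the degeneracy-breaking argument (our illustration, not the paper's notation): uncalibrated BAO and supernova distances constrain the combination $r_d h$ together with the shape parameter $\Omega_m$, while CMB lensing and the $3\times2$-pt data constrain the physical density $\Omega_m h^2$, so the two can be solved for $h$ without ever assuming a value of $r_d$:

$$ \text{BAO + SNe} \;\Rightarrow\; (r_d h,\ \Omega_m), \qquad \text{CMB lensing} + 3\times2\text{-pt} \;\Rightarrow\; \Omega_m h^2, $$
$$ h = \sqrt{\frac{\Omega_m h^2}{\Omega_m}}, \qquad r_d = \frac{r_d h}{h}, \qquad H_0 = 100\, h \ \mathrm{km\,s^{-1}\,Mpc^{-1}}. $$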
2. Kinematical analysis of PNe with high ADF: Hf 2-2 and M 1-42
Authors: Lesly Castañeda-Carlos, Michael G. Richer, Silvia Torres-Peimbert, Anabel Arrieta, Lorena Arias •
Published: 2025-09-19 •
Source: arXiv
We use deep Echelle spectroscopy of the planetary nebulae Hf 2-2 and M 1-42 to study the characteristics of the plasma that gives rise to their high abundance discrepancy factors (70 and 20, respectively). We analyze position-velocity diagrams for forbidden and permitted lines (92 and 93 lines in Hf 2-2 and M 1-42, respectively), to compare their kinematic behaviour and to determine the physical characteristics of the emitting plasma. We confirm that there are two plasma components in both nebulae: a normal nebular plasma that emits both forbidden and permitted lines and an additional plasma component that emits the permitted lines of O I, C II, N II, O II, and Ne II. These plasma components have different spatial distributions, with the additional plasma component being the more centrally concentrated. Their physical conditions are also different, with the additional plasma component being denser and cooler. We find that, in these objects, the additional plasma component contains masses of N$^{2+}$ and O$^{2+}$ ions that are at least as large as those in the normal nebular plasma. In both objects, we find strong gradients in the electron temperature in small volumes near the central star. Compared to NGC 6153, we find that the larger ADFs in Hf 2-2 and M 1-42 are due to larger masses of ions that emit only in the permitted lines, and not due to the physical conditions.
3. FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli •
Published: 2025-09-19 •
Source: arXiv
Neural audio codecs are a fundamental component of modern generative audio pipelines. Although recent codecs achieve strong low-bitrate reconstruction and provide powerful representations for downstream tasks, most are non-streamable, limiting their use in real-time applications. We present FocalCodec-Stream, a hybrid codec based on focal modulation that compresses speech into a single binary codebook at 0.55 - 0.80 kbps with a theoretical latency of 80 ms. Our approach combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module that enhances quality under latency constraints. Experiments show that FocalCodec-Stream outperforms existing streamable codecs at comparable bitrates, while preserving both semantic and acoustic information. The result is a favorable trade-off between reconstruction quality, downstream task performance, latency, and efficiency. Code and checkpoints will be released at https://github.com/lucadellalib/focalcodec.
4. Binary-lens Microlensing Degeneracy: Impact on Planetary Sensitivity and Mass-ratio Function
Authors: Yuxin Shang, Hongjing Yang, Jiyuan Zhang, Shude Mao, Andrew Gould, Weicheng Zang, Qiyue Qian, Jennifer C. Yee •
Published: 2025-09-19 •
Source: arXiv
Gravitational microlensing is a unique method for discovering cold planets across a broad mass range. Reliable statistics of microlensing planets require accurate sensitivity estimates. However, the impact of the degeneracies in binary-lens single-source (2L1S) models that affect many actual planet detections is often omitted from sensitivity estimates, leading to potential self-inconsistency in the resulting statistical studies. In this work, we evaluate the effect of the 2L1S degeneracies on planetary sensitivity by simulating a series of typical microlensing events and comprehensively replicating a realistic planet detection pipeline, including anomaly identification, a global 2L1S model search, and degenerate model comparison. We find that for a pure-survey statistical sample, the 2L1S degeneracies reduce the overall planetary sensitivity by $5{-}10\%$, with the effect increasing at higher planet-host mass ratios. This bias leads to an underestimation of planet occurrence rates and a flattening of the inferred mass-ratio function slope. This effect will be critical for upcoming space-based microlensing surveys like the Roman or Earth 2.0 missions, which are expected to discover $\mathcal{O}(10^3)$ planets. We also discuss the computational challenges and propose potential approaches for future applications.
5. Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
Authors: Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, James L. McClelland •
Published: 2025-09-19 •
Source: arXiv
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help to understand how retrieval methods can complement parametric learning to improve generalization.
6. MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair
Authors: Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening •
Published: 2025-09-19 •
Source: arXiv
Code translation transforms source code from one programming language (PL) to another. Validating the functional equivalence of a translation and repairing it, if necessary, are critical steps in code translation. Existing automated validation and repair approaches struggle to generalize to many PLs due to high engineering overhead, and they rely on existing and often inadequate test suites, which results in false claims of equivalence and ineffective translation repair. We develop MatchFixAgent, a large language model (LLM)-based, PL-agnostic framework for equivalence validation and repair of translations. MatchFixAgent features a multi-agent architecture that divides equivalence validation into several sub-tasks to ensure thorough and consistent semantic analysis of the translation. It then feeds this analysis to a test agent, which writes and executes tests. Upon observing a test failure, the repair agent attempts to fix the translation bug. The final (in)equivalence decision is made by the verdict agent, considering semantic analyses and test execution results. We compare MatchFixAgent's validation and repair results with four repository-level code translation techniques. We use 2,219 translation pairs from their artifacts, which cover 6 PL pairs and are collected from 24 GitHub projects totaling over 900K lines of code. Our results demonstrate that MatchFixAgent produces (in)equivalence verdicts for 99.2% of translation pairs, with the same equivalence validation result as prior work on 72.8% of them. When MatchFixAgent's result disagrees with prior work, we find that 60.7% of the time MatchFixAgent's result is actually correct. In addition, we show that MatchFixAgent can repair 50.6% of inequivalent translations, compared to prior work's 18.5%. This demonstrates that MatchFixAgent is far more adaptable to many PL pairs than prior work, while producing highly accurate validation results.
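A minimal sketch of the validate-then-repair loop as we read it from the abstract, assuming hypothetical helpers llm (a text-in, text-out model call) and run_tests (a test harness); the actual MatchFixAgent prompts, sub-tasks, and orchestration are not reproduced here:

    # Sketch of a MatchFixAgent-style validate-then-repair loop (our reading of the abstract;
    # llm and run_tests are hypothetical stand-ins, not the paper's interfaces).
    def validate_and_repair(source, translation, llm, run_tests, max_repairs=3):
        analysis = llm("Analyze semantic equivalence of:\n" + source + "\n---\n" + translation)
        tests = llm("Write executable tests for the translation, guided by:\n" + analysis)
        report = run_tests(translation, tests)
        for _ in range(max_repairs):
            if report.all_passed:
                break
            # Repair agent: attempt to fix the translation bug exposed by the failing tests.
            translation = llm("Fix this translation given the failure:\n" + translation + "\n" + report.failures)
            report = run_tests(translation, tests)
        # Verdict agent: weigh the semantic analysis and test outcomes for the final decision.
        verdict = llm("Equivalent or inequivalent?\n" + analysis + "\n" + report.summary)
        return verdict, translation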
7. Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories
Authors: Yifan Lin, Sophie Ziyu Liu, Ran Qi, George Z. Xue, Xinping Song, Chao Qin, Hugh H. -T. Liu •
Published: 2025-09-19 •
Source: arXiv
We present Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories (ACDC), an autonomous drone cinematography system driven by natural language communication between human directors and drones. The main limitation of previous drone cinematography workflows is that they require manual selection of waypoints and view angles based on predefined human intent, which is labor-intensive and yields inconsistent performance. In this paper, we propose employing large language models (LLMs) and vision foundation models (VFMs) to convert free-form natural language prompts directly into executable indoor UAV video tours. Specifically, our method comprises a vision-language retrieval pipeline for initial waypoint selection, a preference-based Bayesian optimization framework that refines poses using aesthetic feedback, and a motion planner that generates safe quadrotor trajectories. We validate ACDC through both simulation and hardware-in-the-loop experiments, demonstrating that it robustly produces professional-quality footage across diverse indoor scenes without requiring expertise in robotics or cinematography. These results highlight the potential of embodied AI agents to close the loop from open-vocabulary dialogue to real-world autonomous aerial cinematography.
8. DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Authors: Yuen Chen, Yian Wang, Hari Sundaram •
Published: 2025-09-19 •
Source: arXiv
The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants are widely used to train deep neural networks. In contrast to traditional approaches that focus on tuning the learning rate, we propose a novel adaptive batch size SGD algorithm, DiveBatch, that dynamically adjusts the batch size. Adapting the batch size is challenging: using large batch sizes is more efficient due to parallel computation, but small-batch training often converges in fewer epochs and generalizes better. To address this challenge, we introduce a data-driven adaptation based on gradient diversity, enabling DiveBatch to maintain the generalization performance of small-batch training while improving convergence speed and computational efficiency. Gradient diversity has a strong theoretical justification: it emerges from the convergence analysis of SGD. Evaluations of DiveBatch on synthetic data and on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that DiveBatch converges significantly faster than standard SGD and AdaBatch (1.06-5.0x), with a slight trade-off in performance.
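A common formalization of gradient diversity is the ratio of the summed squared per-example gradient norms to the squared norm of their sum; the sketch below measures it and uses it to grow the batch size (our toy heuristic, not DiveBatch's actual schedule or thresholds):

    import numpy as np

    def gradient_diversity(per_example_grads):
        # Ratio of summed squared per-example gradient norms to the squared norm of the sum;
        # near 1/B when gradients are redundant, larger when per-example gradients disagree.
        g = np.asarray(per_example_grads)                     # shape: (batch, n_params)
        return float((g ** 2).sum() / (np.linalg.norm(g.sum(axis=0)) ** 2 + 1e-12))

    def adapt_batch_size(batch_size, diversity, prev_diversity, factor=2, max_batch=4096):
        # Toy rule: grow the batch only while measured diversity is not collapsing, so larger
        # batches are used when they waste little gradient information.
        if diversity >= prev_diversity:
            return min(batch_size * factor, max_batch)
        return batch_size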
9. UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation
Authors: Xiaoqi Zhao, Youwei Pang, Chenyang Yu, Lihe Zhang, Huchuan Lu, Shijian Lu, Georges El Fakhri, Xiaofeng Liu •
Published: 2025-09-19 •
Source: arXiv
Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance. While existing methods address training-inference modality gaps via specialized per-combination models, they introduce high deployment costs by requiring exhaustive model subsets and model-modality matching. In this work, we propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC). Our approach hierarchically bridges representation gaps between complete and incomplete modalities across input, feature, and output levels. First, we adopt modality reconstruction with the hybrid shuffled-masking augmentation, encouraging the model to learn the intrinsic modality characteristics and generate meaningful representations for missing modalities through cross-modal fusion. Next, modality-invariant contrastive learning implicitly compensates the feature space distance among incomplete-complete modality pairs. Furthermore, the proposed lightweight reverse attention adapter explicitly compensates for the weak perceptual semantics in the frozen encoder. Last, UniMRSeg is fine-tuned under the hybrid consistency constraint to ensure stable prediction under all modality combinations without large performance fluctuations. Without bells and whistles, UniMRSeg significantly outperforms the state-of-the-art methods under diverse missing modality scenarios on MRI-based brain tumor segmentation, RGB-D semantic segmentation, and RGB-D/T salient object segmentation. The code will be released at https://github.com/Xiaoqi-Zhao-DLUT/UniMRSeg.
10. Symmetry extension by condensation defects in five-dimensional gauge theories
Authors: Matteo Bertolini, Lorenzo Di Pietro, Stefano C. Lanza, Pierluigi Niro, Antonio Santaniello •
Published: 2025-09-19 •
Source: arXiv
We investigate the symmetry structure of five-dimensional Yang-Mills theories with $\mathfrak{su}(N)$ gauge algebra. These theories feature intertwined 0-, 1-, and 2-form symmetries, depending on the global variant one is considering. In the $SU(N)$ theory, there is a mixed 't Hooft anomaly between the instantonic 0-form symmetry and the electric 1-form symmetry. We show that in the $PSU(N)$ theory this translates into a $\mathbb{Z}_N$ extension of the instantonic symmetry, generated by an invertible condensation defect of the magnetic 2-form symmetry. We identify the charged configurations as linked 't Hooft surfaces, while pointlike instanton operators remain insensitive to the extension. We generalize our analysis to the $SU(N)/\mathbb{Z}_k$ global form and show that similar results hold, embedded now in a 3-group structure for generic $k$. We then apply our findings to $SO(3)$ supersymmetric Yang-Mills theory. We determine the global form of the enhanced instantonic symmetry of its superconformal UV completion, showing that it arises through a similar symmetry extension mechanism from the parent $E_1$ theory, which is the UV completion of $SU(2)$ supersymmetric Yang-Mills theory. Finally, we recast our results in the language of the symmetry topological field theory. As a warm-up, we also analyze Maxwell theory, highlighting analogous features involving continuous symmetries and composite currents.
11. Who Pays, Who Benefits? Producer-Insurer Games in Life-Saving Medicines
Authors: Delia Coculescu, Maximilian Janisch, Thomas Lehéricy •
Published: 2025-09-19 •
Source: arXiv
Pharmaceutical markets for life-saving therapies combine monopoly power with insurance coverage. We build a tractable sequential game in which a patent-holder chooses the drug price, a profit-maximising insurer sets its premium, and a population of heterogeneous agents decide whether to insure and, conditional on diagnosis, whether to purchase treatment. Two sufficient statistics - subjective illness probability and reservation price - capture heterogeneity and nest risk-aversion and liquidity-constraint motives within a unified framework. We prove existence of subgame-perfect Nash equilibria and show that entry of an insurer strictly raises producer profits but may raise or lower both drug prices and treatment uptake, depending on the joint distribution of the population statistics. Numerical experiments calibrated to flexible parametric families illustrate non-monotone comparative statics and quantify conditions under which insurance reduces access. Our results provide benchmarks for evaluating price negotiations, price caps, and subsidy schemes in high-cost drug markets.
12. Equitably Coloring Planar and Outerplanar Graphs
Authors: Daniel W. Cranston, Reem Mahmoud •
Published: 2025-09-19 •
Source: arXiv
A proper $s$-coloring of an $n$-vertex graph is \emph{equitable} if every color class has size $\lfloor{n/s}\rfloor$ or $\lceil{n/s}\rceil$. A necessary condition for an equitable $s$-coloring is that every vertex $v$ appears in an independent set of size at least $\lfloor{n/s}\rfloor$; that is, $\min_{v\in V(G)}\alpha_v\ge \lfloor{n/s}\rfloor$, where $\alpha_v$ is the maximum size of an independent set containing $v$. Various authors showed that when $G$ is a tree and $s\ge 3$, this obvious necessary condition is also sufficient. Kierstead, Kostochka, and Xiang asked whether this result holds more generally for all outerplanar graphs. We show that the answer is No when $s=3$, but that the answer is Yes when $s\ge 6$. The case $s\in\{4,5\}$ remains open. We also prove an analogous result for planar graphs, with a necessary and sufficient hypothesis. Fix $s\ge 40$. Let $G$ be a planar graph, and let $w_0,w_1$ be its two vertices of largest degree. If there exist disjoint independent sets $I_0, I_1$ such that $|I_0|=\lfloor{n/s}\rfloor$ and $|I_1| = \lfloor{(n+1)/s}\rfloor$ and $w_0,w_1\in I_0\cup I_1$, then $G$ has an equitable $s$-coloring.
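A brute-force check of the necessary condition on a toy example (our helper, exponential time, small graphs only): the star $K_{1,5}$ with $n=6$ and $s=3$ fails it, and indeed it has no equitable 3-coloring because its center cannot share a color class with any leaf:

    from itertools import combinations

    def alpha_v(adj, v, n):
        # Size of a largest independent set containing v (brute force).
        others = [u for u in range(n) if u != v]
        for k in range(n, 0, -1):
            for S in combinations(others, k - 1):
                T = set(S) | {v}
                if all(u not in adj[w] for u in T for w in T if u != w):
                    return k
        return 1

    def necessary_condition(adj, n, s):
        # Every vertex must lie in an independent set of size at least floor(n/s).
        return all(alpha_v(adj, v, n) >= n // s for v in range(n))

    star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}   # K_{1,5}
    print(necessary_condition(star, 6, 3))   # False: the center lies in no independent set of size 2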
13. RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars
Authors: Weiyi Xiong, Bing Zhu, Tao Huang, Zewei Zheng •
Published: 2025-09-19 •
Source: arXiv
4D automotive radars have gained increasing attention for autonomous driving due to their low cost, robustness, and inherent velocity measurement capability. However, existing 4D radar-based 3D detectors rely heavily on pillar encoders for BEV feature extraction, where each point contributes to only a single BEV grid, resulting in sparse feature maps and degraded representation quality. In addition, they optimize bounding box attributes independently, leading to sub-optimal detection accuracy. Moreover, their inference speed, while sufficient for high-end GPUs, may fail to meet the real-time requirement on vehicle-mounted embedded devices. To overcome these limitations, an efficient and effective Gaussian-based 3D detector, namely RadarGaussianDet3D, is introduced, leveraging Gaussian primitives and distributions as intermediate representations for radar points and bounding boxes. In RadarGaussianDet3D, a novel Point Gaussian Encoder (PGE) is designed to transform each point into a Gaussian primitive after feature aggregation and employs the 3D Gaussian Splatting (3DGS) technique for BEV rasterization, yielding denser feature maps. PGE exhibits exceptionally low latency, owing to the optimized algorithm for point feature aggregation and the fast rendering of 3DGS. In addition, a new Box Gaussian Loss (BGL) is proposed, which converts bounding boxes into 3D Gaussian distributions and measures their distance to enable more comprehensive and consistent optimization. Extensive experiments on TJ4DRadSet and View-of-Delft demonstrate that RadarGaussianDet3D achieves state-of-the-art detection accuracy while delivering substantially faster inference, highlighting its potential for real-time deployment in autonomous driving.
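To make the "boxes as Gaussians" idea concrete, here is a sketch that maps a box (center, size, yaw) to a 3D Gaussian and compares prediction and ground truth with the closed-form squared 2-Wasserstein distance; the paper's actual BGL parameterization and distance may differ, so treat this as an assumed stand-in:

    import numpy as np
    from scipy.linalg import sqrtm

    def box_to_gaussian(center, size, yaw):
        # Mean at the box center; covariance from the half-extents rotated by the yaw angle.
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        S = np.diag((np.asarray(size, dtype=float) / 2.0) ** 2)
        return np.asarray(center, dtype=float), R @ S @ R.T

    def gaussian_w2_squared(mu1, S1, mu2, S2):
        # Closed-form squared 2-Wasserstein distance between two Gaussians.
        root = sqrtm(sqrtm(S2) @ S1 @ sqrtm(S2))
        return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * np.real(root)))

    mu_p, S_p = box_to_gaussian([0.0, 0.0, 0.0], [4.0, 2.0, 1.5], 0.10)   # predicted box (toy)
    mu_g, S_g = box_to_gaussian([0.2, -0.1, 0.0], [4.2, 1.9, 1.5], 0.00)  # ground-truth box (toy)
    loss = gaussian_w2_squared(mu_p, S_p, mu_g, S_g)   # one scalar objective coupling all box attributes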
14. A generalized canonical metric for optimization on the indefinite Stiefel manifold
Authors: Dinh Van Tiep, Duong Thi Viet An, Nguyen Thi Ngoc Oanh, Nguyen Thanh Son •
Published: 2025-09-19 •
Source: arXiv
Various tasks in scientific computing can be modeled as an optimization problem on the indefinite Stiefel manifold. We address this using the Riemannian approach, which basically consists of equipping the feasible set with a Riemannian metric, preparing geometric tools such as orthogonal projections, a formula for the Riemannian gradient, and a retraction, and then extending an unconstrained optimization algorithm on the Euclidean space to the established manifold. The choice of metric undoubtedly has a great influence on the method. In previous work [D.V. Tiep and N.T. Son, A Riemannian gradient descent method for optimization on the indefinite Stiefel manifold, arXiv:2410.22068v2 [math.OC]], a tractable metric, which is in fact a family of Riemannian metrics defined by a symmetric positive-definite matrix depending on the contact point, was used. In general, it requires solving a Lyapunov matrix equation every time the gradient of the cost function is needed, which might contribute significantly to the computational cost. To address this issue, we propose a new Riemannian metric for the indefinite Stiefel manifold. Furthermore, we construct the associated geometric structure, including a so-called quasi-geodesic, and propose a retraction based on this curve. We then numerically verify the performance of the Riemannian gradient descent method associated with the new geometry and compare it with the previous work.
15. Geodesic clustering of zeros of Eisenstein series for congruence groups
Authors: Sebastián Carrillo Santana, Gunther Cornelissen, Berend Ringeling •
Published: 2025-09-19 •
Source: arXiv
We consider a set of generators for the space of Eisenstein series of even weight $k$ for any congruence group $\Gamma$ and study the set of all of their zeros taken for $\Gamma(1)$-conjugates of $\Gamma$ in the standard fundamental domain for $\Gamma(1)$. We describe (a) an upper bound $\kappa_\Gamma + O(1/k)$ for their imaginary part; (b) a finite configuration of geodesic segments to which all zeros converge in Hausdorff distance as $k \rightarrow \infty$; (c) a finite set containing all algebraic zeros for all weights. The bound in (a) depends on the (non-)vanishing of a new generalization of Ramanujan sums. The proof of (b) originates in a method used to study phase transitions in statistical physics. The proof of (c) relies on the theory of complex multiplication. The results can be made quantitative for specific groups. For $\Gamma=\Gamma(N)$ with $4 \nmid N$, $\kappa_\Gamma=1$ and the zeros tend to the unit circle, whereas if $4 \mid N$, $\kappa_\Gamma=2$ and the limit configuration includes parts of vertical geodesics and circles of radius $2$. In both cases, the only algebraic zeros are at $\mathrm{i}$ and $\exp(2\pi \mathrm{i}/3)$ for sufficiently large $k$. For $\Gamma(N)$ with $N$ odd, we use finer estimates to prove a trichotomy for the exact `convergence speed' of the zeros to the unit circle, as well as angular equidistribution of the zeros as $k \rightarrow \infty$.
16. SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features
Authors: Jinyuan Qu, Hongyang Li, Xingyu Chen, Shilong Liu, Yukai Shi, Tianhe Ren, Ruitao Jing, Lei Zhang •
Published: 2025-09-19 •
Source: arXiv
In this paper, we present SegDINO3D, a novel Transformer encoder-decoder framework for 3D instance segmentation. As 3D training data is generally not as plentiful as 2D training images, SegDINO3D is designed to fully leverage 2D representations from a pre-trained 2D detection model, including both image-level and object-level features, to improve the 3D representation. SegDINO3D takes both a point cloud and its associated 2D images as input. In the encoder stage, it first enriches each 3D point by retrieving 2D image features from its corresponding image views and then leverages a 3D encoder for 3D context fusion. In the decoder stage, it formulates 3D object queries as 3D anchor boxes and performs cross-attention from 3D queries to 2D object queries obtained from the 2D images using the 2D detection model. These 2D object queries serve as a compact object-level representation of the 2D images, effectively avoiding the challenge of keeping thousands of image feature maps in memory while faithfully preserving the knowledge of the pre-trained 2D model. The introduction of 3D box queries also enables the model to modulate cross-attention using the predicted boxes for more precise querying. SegDINO3D achieves state-of-the-art performance on the ScanNetV2 and ScanNet200 3D instance segmentation benchmarks. Notably, on the challenging ScanNet200 dataset, SegDINO3D significantly outperforms prior methods by +8.7 and +6.8 mAP on the validation and hidden test sets, respectively, demonstrating its superiority.
17. AdaSports-Traj: Role- and Domain-Aware Adaptation for Multi-Agent Trajectory Modeling in Sports
Authors: Yi Xu, Yun Fu •
Published: 2025-09-19 •
Source: arXiv
Trajectory prediction in multi-agent sports scenarios is inherently challenging due to the structural heterogeneity across agent roles (e.g., players vs. ball) and dynamic distribution gaps across different sports domains. Existing unified frameworks often fail to capture these structured distributional shifts, resulting in suboptimal generalization across roles and domains. We propose AdaSports-Traj, an adaptive trajectory modeling framework that explicitly addresses both intra-domain and inter-domain distribution discrepancies in sports. At its core, AdaSports-Traj incorporates a Role- and Domain-Aware Adapter to conditionally adjust latent representations based on agent identity and domain context. Additionally, we introduce a Hierarchical Contrastive Learning objective, which separately supervises role-sensitive and domain-aware representations to encourage disentangled latent structures without introducing optimization conflict. Experiments on three diverse sports datasets, Basketball-U, Football-U, and Soccer-U, demonstrate the effectiveness of our adaptive design, achieving strong performance in both unified and cross-domain trajectory prediction settings.
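A minimal sketch of a role- and domain-aware adapter, assuming FiLM-style conditioning on role and domain embeddings; the layer sizes and conditioning form are our illustrative choices, not the paper's implementation:

    import torch
    import torch.nn as nn

    class RoleDomainAdapter(nn.Module):
        # FiLM-style conditioning: scale and shift the latent trajectory representation
        # using embeddings of the agent's role and the sport domain.
        def __init__(self, latent_dim, num_roles, num_domains):
            super().__init__()
            self.role_emb = nn.Embedding(num_roles, latent_dim)
            self.domain_emb = nn.Embedding(num_domains, latent_dim)
            self.to_scale_shift = nn.Linear(2 * latent_dim, 2 * latent_dim)

        def forward(self, z, role_id, domain_id):
            cond = torch.cat([self.role_emb(role_id), self.domain_emb(domain_id)], dim=-1)
            scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
            return z * (1 + scale) + shift

    z = torch.randn(8, 128)                                   # latent features for 8 agents (toy)
    adapter = RoleDomainAdapter(128, num_roles=2, num_domains=3)
    z_adapted = adapter(z, torch.zeros(8, dtype=torch.long), torch.ones(8, dtype=torch.long))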
18. Blind-Spot Guided Diffusion for Self-supervised Real-World Denoising
Authors: Shen Cheng, Haipeng Li, Haibin Huang, Xiaohong Liu, Shuaicheng Liu •
Published: 2025-09-19 •
Source: arXiv
In this work, we present Blind-Spot Guided Diffusion, a novel self-supervised framework for real-world image denoising. Our approach addresses two major challenges: the limitations of blind-spot networks (BSNs), which often sacrifice local detail and introduce pixel discontinuities due to spatial independence assumptions, and the difficulty of adapting diffusion models to self-supervised denoising. We propose a dual-branch diffusion framework that combines a BSN-based diffusion branch, generating semi-clean images, with a conventional diffusion branch that captures underlying noise distributions. To enable effective training without paired data, we use the BSN-based branch to guide the sampling process, capturing noise structure while preserving local details. Extensive experiments on the SIDD and DND datasets demonstrate state-of-the-art performance, establishing our method as a highly effective self-supervised solution for real-world denoising. Code and pre-trained models are released at: https://github.com/Sumching/BSGD.
19. Misspecified learning and evolutionary stability
Authors: Kevin He, Jonathan Libgober •
Published: 2025-09-19 •
Source: arXiv
We extend the indirect evolutionary approach to the selection of (possibly misspecified) models. Agents with different models match in pairs to play a stage game, where models define feasible beliefs about game parameters and about others' strategies. In equilibrium, each agent adopts the feasible belief that best fits their data and plays optimally given their beliefs. We define the stability of the resident model by comparing its equilibrium payoff with that of the entrant model, and provide conditions under which the correctly specified resident model can only be destabilized by misspecified entrant models that contain multiple feasible beliefs (that is, entrant models that permit inference). We also show that entrants may do well in their matches against the residents only when the entrant population is large, due to the endogeneity of misspecified beliefs. Applications include the selection of demand-elasticity misperception in Cournot duopoly and the emergence of analogy-based reasoning in centipede games.
20. Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems
Authors: Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork •
Published: 2025-09-19 •
Source: arXiv
Learning representations for solutions of constrained optimization problems (COPs) with unknown cost functions is challenging, as models like (Variational) Autoencoders struggle to enforce constraints when decoding structured outputs. We propose an Inverse Optimization Latent Variable Model (IO-LVM) that learns a latent space of COP cost functions from observed solutions and reconstructs feasible outputs by solving a COP with a solver in the loop. Our approach leverages estimated gradients of a Fenchel-Young loss through a non-differentiable deterministic solver to shape the latent space. Unlike standard Inverse Optimization or Inverse Reinforcement Learning methods, which typically recover a single or context-specific cost function, IO-LVM captures a distribution over cost functions, enabling the identification of diverse solution behaviors arising from different agents or conditions not available during the training process. We validate our method on real-world datasets of ship and taxi routes, as well as paths in synthetic graphs, demonstrating its ability to reconstruct paths and cycles, predict their distributions, and yield interpretable latent representations.
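The solver-in-the-loop gradient can be sketched as follows for the unregularized Fenchel-Young loss of a minimum-cost problem, where the gradient with respect to the costs is the observed solution minus the solver's solution; IO-LVM's latent encoder and its gradient estimator are not shown, and the tiny "solver" below is a hypothetical stand-in:

    import numpy as np

    def fy_gradient(costs, y_obs, solver):
        # For L(c) = <c, y_obs> - min_y <c, y>, Danskin's theorem gives dL/dc = y_obs - y_hat(c).
        y_hat = solver(costs)              # non-differentiable combinatorial solver in the loop
        return y_obs - y_hat

    def toy_solver(costs):
        # Toy feasible set: choose one of two parallel edges at each of three steps.
        y = np.zeros_like(costs)
        y[np.argmin(costs, axis=0), np.arange(costs.shape[1])] = 1.0
        return y

    costs = np.array([[1.0, 3.0, 0.5], [2.0, 1.0, 2.0]])   # toy edge costs
    y_obs = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]])   # observed path in indicator form
    costs -= 0.1 * fy_gradient(costs, y_obs, toy_solver)   # step that makes the observed path cheaper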
21. Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
Authors: Yujie Zhu, Charles A. Hepburn, Matthew Thorpe, Giovanni Montana •
Published: 2025-09-19 •
Source: arXiv
In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. We propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy? SPReD uses ensemble methods to explicitly model Q-value distributions for both demonstration and policy actions, quantifying uncertainty for comparisons. We develop two complementary uncertainty-aware methods: a probabilistic approach estimating the likelihood of demonstration superiority, and an advantage-based approach scaling imitation by statistical significance. Unlike prevailing methods (e.g. Q-filter) that make binary imitation decisions, SPReD applies continuous, uncertainty-proportional regularisation weights, reducing gradient variance during training. Despite its computational simplicity, SPReD achieves remarkable gains in experiments across eight robotics tasks, outperforming existing approaches by up to a factor of 14 in complex tasks while maintaining robustness to demonstration quality and quantity. Our code is available at https://github.com/YujieZhu7/SPReD.
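A sketch of the probabilistic variant as we read it: ensembles give a mean and variance for the Q-values of the demonstration and policy actions, and the probability that the demonstration is better becomes a continuous imitation weight (the advantage-based variant and the full actor update are omitted):

    import numpy as np
    from scipy.stats import norm

    def imitation_weight(q_demo_ensemble, q_policy_ensemble):
        # Probability that the demonstration action has the higher Q-value, treating the two
        # ensemble estimates as independent Gaussians; used as a continuous imitation weight.
        mu_d, mu_p = np.mean(q_demo_ensemble), np.mean(q_policy_ensemble)
        var = np.var(q_demo_ensemble) + np.var(q_policy_ensemble) + 1e-8
        return float(norm.cdf((mu_d - mu_p) / np.sqrt(var)))

    q_demo = np.array([1.2, 1.5, 1.1, 1.4, 1.3])   # ensemble Q-estimates for the demo action (toy)
    q_pol = np.array([1.0, 1.6, 0.9, 1.2, 1.1])    # ensemble Q-estimates for the policy action (toy)
    w = imitation_weight(q_demo, q_pol)            # weight in [0, 1], unlike a binary Q-filter decision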
22. Simultaneous real-time multispectral fluorescence and reflectance imaging for enhanced intraoperative guidance
Authors: Kyriakos Pentarakis, George Kakavelakis, Constantinos Petridis, George Themelis •
Published: 2025-09-19 •
Source: arXiv
Intraoperative optical imaging is essential for surgical precision and patient safety, but current systems present anatomical and fluorescence information separately, causing delays and increasing cognitive load. A unified system for simultaneous visualization is critically needed to enhance guidance and efficiency. Here, we develop and validate a multispectral imaging system that overcomes fragmented intraoperative imaging by simultaneously capturing and integrating white-light reflectance and multiple fluorescence signals in real time, enhancing surgical precision and safety. The system integrates two synchronized color cameras with optimized optical components, including custom filters and a beam splitter, tailored for the clinically relevant fluorophores protoporphyrin IX (PpIX), fluorescein sodium, and indocyanine green (ICG). Advanced image processing techniques enhance visualization accuracy. Performance was evaluated using phantoms simulating clinical conditions. The system provided real-time, simultaneous visualization of white light and multiple fluorescence signals without mode switching. Fluorescence emissions from PpIX, fluorescein sodium, and ICG were accurately separated using linear unmixing algorithms. Reflectance images showed high color fidelity, with Delta E values indicating imperceptible to barely perceptible differences compared to a conventional camera. Latency was minimal, ensuring immediate visual feedback suitable for intraoperative use. This multispectral imaging system addresses the limitations of fragmented intraoperative imaging by enabling continuous, real-time, integrated visualization of anatomical and fluorescence information. It streamlines workflow, reduces cognitive load, and supports more precise interventions. By leveraging components similar to those of conventional surgical microscopes, it offers a practical, cost-effective solution aligned with clinical needs.
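Linear unmixing here amounts to solving a small least-squares problem per pixel against reference emission signatures; the channel responses and pixel values below are hypothetical, purely to illustrate the computation:

    import numpy as np

    # Hypothetical reference emission signatures (columns: PpIX, fluorescein sodium, ICG)
    # across four detection channels (rows); values are illustrative only.
    A = np.array([[0.70, 0.05, 0.10],
                  [0.20, 0.80, 0.05],
                  [0.05, 0.10, 0.15],
                  [0.05, 0.05, 0.70]])
    pixel = np.array([0.45, 0.52, 0.10, 0.30])       # measured intensities at one pixel (toy)

    # Least-squares unmixing: per-fluorophore abundances whose mix best explains the pixel.
    abundances, *_ = np.linalg.lstsq(A, pixel, rcond=None)
    abundances = np.clip(abundances, 0.0, None)      # crude non-negativity constraint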
23. CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
Authors: Shiyu Fang, Yiming Cui, Haoyang Liang, Chen Lv, Peng Hang, Jian Sun •
Published: 2025-09-19 •
Source: arXiv
Autonomous Driving (AD) systems have made notable progress, but their performance in long-tail, safety-critical scenarios remains limited. These rare cases contribute a disproportionate number of accidents. Vision-Language-Action (VLA) models have strong reasoning abilities and offer a potential solution, but their effectiveness is limited by the lack of high-quality data and inefficient learning in such conditions. To address these challenges, we propose CoReVLA, a continual learning end-to-end autonomous driving framework that improves the performance in long-tail scenarios through a dual-stage process of data Collection and behavior Refinement. First, the model is jointly fine-tuned on a mixture of open-source driving QA datasets, allowing it to acquire a foundational understanding of driving scenarios. Next, CoReVLA is deployed within the Cave Automatic Virtual Environment (CAVE) simulation platform, where driver takeover data is collected from real-time interactions. Each takeover indicates a long-tail scenario that CoReVLA fails to handle reliably. Finally, the model is refined via Direct Preference Optimization (DPO), allowing it to learn directly from human preferences and thereby avoid reward hacking caused by manually designed rewards. Extensive open-loop and closed-loop experiments demonstrate that the proposed CoReVLA model can accurately perceive driving scenarios and make appropriate decisions. On the Bench2Drive benchmark, CoReVLA achieves a Driving Score (DS) of 72.18 and a Success Rate (SR) of 50%, outperforming state-of-the-art methods by 7.96 DS and 15% SR under long-tail, safety-critical scenarios. Furthermore, case studies demonstrate the model's ability to continually improve its performance in similar failure-prone scenarios by leveraging past takeover experiences. All code and preprocessed datasets are available at: https://github.com/FanGShiYuu/CoReVLA
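The refinement stage uses the standard DPO objective; a minimal sketch follows (the log-probabilities would come from CoReVLA and a frozen reference model evaluated on takeover-preferred versus rejected behaviors, and the tensors below are toy values):

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        # Standard DPO: prefer the human-preferred behavior relative to a frozen reference
        # model, with beta controlling the strength of the preference constraint.
        chosen_ratio = logp_chosen - ref_logp_chosen
        rejected_ratio = logp_rejected - ref_logp_rejected
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

    # Toy sequence log-probabilities standing in for preferred (takeover-corrected) and
    # rejected (failed) driving behaviors under the policy and the frozen reference.
    loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-3.5]),
                    torch.tensor([-4.2]), torch.tensor([-3.6]))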
24. RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Authors: Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Xiangyuan Wang, Xu Fu, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang •
Published: 2025-09-19 •
Source: arXiv
Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to low hardware utilization and slow training on existing systems. In this paper, we present RLinf, a high-performance RL training system based on our key observation that the major roadblock to efficient RL training lies in system flexibility. To maximize flexibility and efficiency, RLinf is built atop a novel RL system design paradigm called macro-to-micro flow transformation (M2Flow), which automatically breaks down high-level, easy-to-compose RL workflows at both the temporal and spatial dimensions, and recomposes them into optimized execution flows. Supported by RLinf worker's adaptive communication capability, we devise context switching and elastic pipelining to realize M2Flow transformation, and a profiling-guided scheduling policy to generate optimal execution plans. Extensive evaluations on both reasoning RL and embodied RL tasks demonstrate that RLinf consistently outperforms state-of-the-art systems, achieving 1.1x-2.13x speedup in end-to-end training throughput.
25. Explainable AI for Maritime Autonomous Surface Ships (MASS): Adaptive Interfaces and Trustworthy Human-AI Collaboration
Authors: Zhuoyue Zhang, Haitong Xu •
Published: 2025-09-19 •
Source: arXiv
Autonomous navigation in maritime domains is accelerating alongside advances in artificial intelligence, sensing, and connectivity. Opaque decision-making and poorly calibrated human-automation interaction remain key barriers to safe adoption. This article synthesizes 100 studies on automation transparency for Maritime Autonomous Surface Ships (MASS) spanning situation awareness (SA), human factors, interface design, and regulation. We (i) map the Guidance-Navigation-Control stack to shore-based operational modes -- remote supervision (RSM) and remote control (RCM) -- and identify where human unsafe control actions (Human-UCAs) concentrate in handover and emergency loops; (ii) summarize evidence that transparency features (decision rationales, alternatives, confidence/uncertainty, and rule-compliance indicators) improve understanding and support trust calibration, though reliability and predictability often dominate trust; (iii) distill design strategies for transparency at three layers: sensor/SA acquisition and fusion, HMI/eHMI presentation (textual/graphical overlays, color coding, conversational and immersive UIs), and engineer-facing processes (resilient interaction design, validation, and standardization). We integrate methods for Human-UCA identification (STPA-Cog + IDAC), quantitative trust/SA assessment, and operator workload monitoring, and outline regulatory and rule-based implications including COLREGs formalization and route exchange. We conclude with an adaptive transparency framework that couples operator state estimation with explainable decision support to reduce cognitive overload and improve takeover timeliness. The review highlights actionable figure-of-merit displays (e.g., CPA/TCPA risk bars, robustness heatmaps), transparent model outputs (rule traceability, confidence), and training pipelines (HIL/MIL, simulation) as near-term levers for safer MASS operations.
26. EHR-MCP: Real-world Evaluation of Clinical Information Retrieval by Large Language Models via Model Context Protocol
Authors: Kanato Masayoshi, Masahiro Hashimoto, Ryoichi Yokoyama, Naoki Toda, Yoshifumi Uwamino, Shogo Fukuda, Ho Namkoong, Masahiro Jinzaki •
Published: 2025-09-19 •
Source: arXiv
Background: Large language models (LLMs) show promise in medicine, but their deployment in hospitals is limited by restricted access to electronic health record (EHR) systems. The Model Context Protocol (MCP) enables integration between LLMs and external tools. Objective: To evaluate whether an LLM connected to an EHR database via MCP can autonomously retrieve clinically relevant information in a real hospital setting. Methods: We developed EHR-MCP, a framework of custom MCP tools integrated with the hospital EHR database, and used GPT-4.1 through a LangGraph ReAct agent to interact with it. Six tasks were tested, derived from use cases of the infection control team (ICT). Eight patients discussed at ICT conferences were retrospectively analyzed. Agreement with physician-generated gold standards was measured. Results: The LLM consistently selected and executed the correct MCP tools. Except for two tasks, all tasks achieved near-perfect accuracy. Performance was lower in the complex task requiring time-dependent calculations. Most errors arose from incorrect arguments or misinterpretation of tool results. Responses from EHR-MCP were reliable, though long and repetitive data risked exceeding the context window. Conclusions: LLMs can retrieve clinical data from an EHR via MCP tools in a real hospital setting, achieving near-perfect performance in simple tasks while highlighting challenges in complex ones. EHR-MCP provides an infrastructure for secure, consistent data access and may serve as a foundation for hospital AI agents. Future work should extend beyond retrieval to reasoning, generation, and clinical impact assessment, paving the way for effective integration of generative AI into clinical practice.
27. Swarm Oracle: Trustless Blockchain Agreements through Robot Swarms
Authors: Alexandre Pacheco, Hanqing Zhao, Volker Strobel, Tarik Roukny, Gregory Dudek, Andreagiovanni Reina, Marco Dorigo •
Published: 2025-09-19 •
Source: arXiv
Blockchain consensus, rooted in the principle ``don't trust, verify'', limits access to real-world data, which may be ambiguous or inaccessible to some participants. Oracles address this limitation by supplying data to blockchains, but existing solutions may reduce autonomy, transparency, or reintroduce the need for trust. We propose Swarm Oracle: a decentralized network of autonomous robots -- that is, a robot swarm -- that use onboard sensors and peer-to-peer communication to collectively verify real-world data and provide it to smart contracts on public blockchains. Swarm Oracle leverages the built-in decentralization, fault tolerance and mobility of robot swarms, which can flexibly adapt to meet information requests on-demand, even in remote locations. Unlike typical cooperative robot swarms, Swarm Oracle integrates robots from multiple stakeholders, protecting the system from single-party biases but also introducing potential adversarial behavior. To ensure the secure, trustless and global consensus required by blockchains, we employ a Byzantine fault-tolerant protocol that enables robots from different stakeholders to operate together, reaching social agreements of higher quality than the estimates of individual robots. Through extensive experiments using both real and simulated robots, we showcase how consensus on uncertain environmental information can be achieved, despite several types of attacks orchestrated by large proportions of the robots, and how a reputation system based on blockchain tokens lets Swarm Oracle autonomously recover from faults and attacks, a requirement for long-term operation.
28. Right-Side-Out: Learning Zero-Shot Sim-to-Real Garment Reversal
Authors: Chang Yu, Siyu Ma, Wenxin Du, Zeshun Zong, Han Xue, Wendi Chen, Cewu Lu, Yin Yang, Xuchen Han, Joseph Masterjohn, Alejandro Castro, Chenfanfu Jiang •
Published: 2025-09-19 •
Source: arXiv
Turning garments right-side out is a challenging manipulation task: it is highly dynamic, entails rapid contact changes, and is subject to severe visual occlusion. We introduce Right-Side-Out, a zero-shot sim-to-real framework that effectively solves this challenge by exploiting task structures. We decompose the task into Drag/Fling to create and stabilize an access opening, followed by Insert&Pull to invert the garment. Each step uses a depth-inferred, keypoint-parameterized bimanual primitive that sharply reduces the action space while preserving robustness. Efficient data generation is enabled by our custom-built, high-fidelity, GPU-parallel Material Point Method (MPM) simulator that models thin-shell deformation and provides robust and efficient contact handling for batched rollouts. Built on the simulator, our fully automated pipeline scales data generation by randomizing garment geometry, material parameters, and viewpoints, producing depth, masks, and per-primitive keypoint labels without any human annotations. With a single depth camera, policies trained entirely in simulation deploy zero-shot on real hardware, achieving up to 81.3% success rate. By employing task decomposition and high fidelity simulation, our framework enables tackling highly dynamic, severely occluded tasks without laborious human demonstrations.
29. Quantum Metric Spaces: Replacing Fuzzy Metrics with the Hilbert Space Structure of Quantum States
Authors: Nicola Fabiano •
Published: 2025-09-19 •
Source: arXiv
Fuzzy metric spaces, grounded in t-norms and membership functions, have been widely proposed to model uncertainty in machine learning, decision systems, and artificial intelligence. Yet these frameworks treat uncertainty as an external layer of imprecision imposed upon classical, point-like entities - a conceptual mismatch for domains where indeterminacy is intrinsic, such as quantum systems or cognitive representations. We argue that fuzzy metrics are unnecessary for modeling such uncertainty: instead, the well-established structure of complex Hilbert spaces - the foundational language of quantum mechanics for over a century - provides a natural, rigorous, and non-contradictory metric space where the ``points'' are quantum states themselves. The distance between states is given by the Hilbert norm, which directly encodes state distinguishability via the Born rule. This framework inherently captures the non-classical nature of uncertainty without requiring fuzzy logic, t-norms, or membership degrees. We demonstrate its power by modeling AI concepts as Gaussian wavefunctions and classifying ambiguous inputs via quantum overlap integrals. Unlike fuzzy methods, our approach naturally handles interference, distributional shape, and concept compositionality through the geometry of state vectors. We conclude that fuzzy metric spaces, while historically useful, are obsolete for representing intrinsic uncertainty - superseded by the more robust, predictive, and ontologically coherent framework of quantum state geometry.
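A small numerical illustration of the proposal: concepts as (discretized) Gaussian wavefunctions compared through the Hilbert-space norm and the Born-rule overlap; the grid, widths, and centers are our choices, not values from the paper:

    import numpy as np

    x = np.linspace(-10.0, 10.0, 2001)
    dx = x[1] - x[0]

    def gaussian_state(x0, sigma):
        # Discretized Gaussian wavefunction, normalized so that <psi|psi> = 1 on the grid.
        psi = np.exp(-((x - x0) ** 2) / (4.0 * sigma ** 2)).astype(complex)
        return psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

    def hilbert_distance(p, q):
        return np.sqrt(np.sum(np.abs(p - q) ** 2) * dx)       # ||psi - phi|| in L2

    def overlap_prob(p, q):
        return np.abs(np.sum(np.conj(p) * q) * dx) ** 2       # Born-rule |<psi|phi>|^2

    concept_a = gaussian_state(0.0, 1.0)
    concept_b = gaussian_state(5.0, 1.0)
    ambiguous = gaussian_state(1.5, 1.0)                      # input to classify
    # Classify by overlap: the ambiguous input is far more consistent with concept A.
    print(overlap_prob(concept_a, ambiguous), overlap_prob(concept_b, ambiguous),
          hilbert_distance(concept_a, ambiguous))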
30. Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Authors: Zhiyu Mou, Yiqin Lv, Miao Xu, Cheems Wang, Yixiu Mao, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng •
Published: 2025-09-19 •
Source: arXiv
Auto-bidding is an essential tool for advertisers to enhance their advertising performance. Recent progress has shown that AI-Generated Bidding (AIGB), which formulates the auto-bidding as a trajectory generation task and trains a conditional diffusion-based planner on offline data, achieves superior and stable performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still encounter a performance bottleneck due to their neglect of fine-grained generation quality evaluation and inability to explore beyond static datasets. To address this, we propose AIGB-Pearl (\emph{Planning with EvAluator via RL}), a novel method that integrates generative planning and policy optimization. The key to AIGB-Pearl is to construct a non-bootstrapped \emph{trajectory evaluator} to assign rewards and guide policy search, enabling the planner to optimize its generation quality iteratively through interaction. Furthermore, to enhance trajectory evaluator accuracy in offline settings, we incorporate three key techniques: (i) a Large Language Model (LLM)-based architecture for better representational capacity, (ii) hybrid point-wise and pair-wise losses for better score learning, and (iii) adaptive integration of expert feedback for better generalization ability. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.
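A sketch of what a hybrid point-wise plus pair-wise score loss for the trajectory evaluator could look like (our illustration of the general idea; the paper's LLM-based evaluator, weighting, and expert-feedback integration are not reproduced):

    import torch
    import torch.nn.functional as F

    def hybrid_evaluator_loss(pred_scores, target_rewards, better_idx, worse_idx, alpha=0.5):
        pointwise = F.mse_loss(pred_scores, target_rewards)        # regress toward reward labels
        margin = pred_scores[better_idx] - pred_scores[worse_idx]
        pairwise = -F.logsigmoid(margin).mean()                    # rank better trajectories higher
        return alpha * pointwise + (1.0 - alpha) * pairwise

    scores = torch.tensor([0.8, 0.3, 0.6])                         # evaluator outputs (toy)
    rewards = torch.tensor([0.9, 0.2, 0.5])                        # observed returns (toy)
    loss = hybrid_evaluator_loss(scores, rewards,
                                 better_idx=torch.tensor([0, 2]), worse_idx=torch.tensor([1, 1]))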
31. Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment
Authors: Ahmed Karim, Qiao Wang, Zheng Yuan •
Published: 2025-09-19 •
Source: arXiv
Automated Essay Scoring (AES) systems now reach near human agreement on some public benchmarks, yet real-world adoption, especially in high-stakes examinations, remains limited. A principal obstacle is that most models output a single score without any accompanying measure of confidence or explanation. We address this gap with conformal prediction, a distribution-free wrapper that equips any classifier with set-valued outputs and formal coverage guarantees. Two open-source large language models (Llama-3 8B and Qwen-2.5 3B) are fine-tuned on three diverse corpora (ASAP, TOEFL11, Cambridge-FCE) and calibrated at a 90 percent risk level. Reliability is assessed with UAcc, an uncertainty-aware accuracy that rewards models for being both correct and concise. To our knowledge, this is the first work to combine conformal prediction and UAcc for essay scoring. The calibrated models consistently meet the coverage target while keeping prediction sets compact, indicating that open-source, mid-sized LLMs can already support teacher-in-the-loop AES; we discuss scaling and broader user studies as future work.
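The conformal wrapper follows the standard split-conformal recipe; a minimal sketch with alpha = 0.1 (a 90 percent coverage target, our reading of the paper's calibration level) and hypothetical model probabilities:

    import numpy as np

    def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
        # Split conformal: nonconformity of each calibration essay is 1 - P(true score band).
        n = len(cal_labels)
        scores = 1.0 - cal_probs[np.arange(n), cal_labels]
        q = np.ceil((n + 1) * (1 - alpha)) / n
        return np.quantile(scores, min(q, 1.0), method="higher")

    def prediction_set(probs, qhat):
        return np.where(1.0 - probs <= qhat)[0]          # every score band that "conforms"

    rng = np.random.default_rng(0)
    cal_probs = rng.dirichlet(np.ones(6), size=500)      # toy calibration softmax outputs
    cal_labels = rng.integers(0, 6, size=500)            # toy true score bands
    qhat = conformal_threshold(cal_probs, cal_labels)
    print(prediction_set(np.array([0.05, 0.10, 0.55, 0.20, 0.07, 0.03]), qhat))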
32. From Data to Diagnosis: A Large, Comprehensive Bone Marrow Dataset and AI Methods for Childhood Leukemia Prediction
Authors: Henning Höfener, Farina Kock, Martina Pontones, Tabita Ghete, David Pfrang, Nicholas Dickel, Meik Kunz, Daniela P. Schacherer, David A. Clunie, Andrey Fedorov, Max Westphal, Markus Metzler •
Published: 2025-09-19 •
Source: arXiv
Leukemia diagnosis primarily relies on manual microscopic analysis of bone marrow morphology supported by additional laboratory parameters, making it complex and time-consuming. While artificial intelligence (AI) solutions have been proposed, most utilize private datasets and only cover parts of the diagnostic pipeline. Therefore, we present a large, high-quality, publicly available leukemia bone marrow dataset spanning the entire diagnostic process, from cell detection to diagnosis. Using this dataset, we further propose methods for cell detection, cell classification, and diagnosis prediction. The dataset comprises 246 pediatric patients with diagnostic, clinical and laboratory information, over 40 000 cells with bounding box annotations and more than 28 000 of these with high-quality class labels, making it the most comprehensive dataset publicly available. Evaluation of the AI models yielded an average precision of 0.96 for cell detection, an area under the curve of 0.98 and an F1-score of 0.61 for the 33-class cell classification, and a mean F1-score of 0.90 for diagnosis prediction using predicted cell counts. While the proposed approaches demonstrate their usefulness for AI-assisted diagnostics, the dataset will foster further research and development in the field, ultimately contributing to more precise diagnoses and improved patient outcomes.
33. Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration
Authors: Xingmei Wang, Xiaoyu Hu, Chengkai Huang, Ziyan Zeng, Guohao Nie, Quan Z. Sheng, Lina Yao •
Published: 2025-09-19 •
Source: arXiv
Bridging 2D and 3D sensor modalities is critical for robust perception in autonomous systems. However, image-to-point cloud (I2P) registration remains challenging due to the semantic-geometric gap between texture-rich but depth-ambiguous images and sparse yet metrically precise point clouds, as well as the tendency of existing methods to converge to local optima. To overcome these limitations, we introduce CrossI2P, a self-supervised framework that unifies cross-modal learning and two-stage registration in a single end-to-end pipeline. First, we learn a geometric-semantic fused embedding space via dual-path contrastive learning, enabling annotation-free, bidirectional alignment of 2D textures and 3D structures. Second, we adopt a coarse-to-fine registration paradigm: a global stage establishes superpoint-superpixel correspondences through joint intra-modal context and cross-modal interaction modeling, followed by a geometry-constrained point-level refinement for precise registration. Third, we employ a dynamic training mechanism with gradient normalization to balance losses for feature alignment, correspondence refinement, and pose estimation. Extensive experiments demonstrate that CrossI2P outperforms state-of-the-art methods by 23.7% on the KITTI Odometry benchmark and by 37.9% on nuScenes, significantly improving both accuracy and robustness.
34. EvoBrain: Dynamic Multi-channel EEG Graph Modeling for Time-evolving Brain Network
Authors: Rikuto Kotoge, Zheng Chen, Tasuku Kimura, Yasuko Matsubara, Takufumi Yanagisawa, Haruhiko Kishima, Yasushi Sakurai •
Published: 2025-09-19 •
Source: arXiv
Dynamic GNNs, which integrate temporal and spatial features in Electroencephalography (EEG) data, have shown great potential in automating seizure detection. However, fully capturing the underlying dynamics necessary to represent brain states, such as seizure and non-seizure, remains a non-trivial task and presents two fundamental challenges. First, most existing dynamic GNN methods are built on temporally fixed static graphs, which fail to reflect the evolving nature of brain connectivity during seizure progression. Second, current efforts to jointly model temporal signals and graph structures and, more importantly, their interactions remain nascent, often resulting in inconsistent performance. To address these challenges, we present the first theoretical analysis of these two problems, demonstrating the effectiveness and necessity of explicit dynamic modeling and of the time-then-graph dynamic GNN approach. Building on these insights, we propose EvoBrain, a novel seizure detection model that integrates a two-stream Mamba architecture with a GCN enhanced by Laplacian Positional Encoding, following neurological insights. Moreover, EvoBrain incorporates explicitly dynamic graph structures, allowing both nodes and edges to evolve over time. Our contributions include (a) a theoretical analysis proving the expressivity advantage of explicit dynamic modeling and of time-then-graph over other approaches, (b) a novel and efficient model that significantly improves AUROC by 23% and F1 score by 30% compared with the dynamic GNN baseline, and (c) broad evaluations of our method on challenging early seizure prediction tasks.
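Laplacian positional encoding is the standard eigenvector construction on the channel graph; a minimal sketch on a toy connectivity matrix (EvoBrain's learned, time-evolving graphs are not reproduced here):

    import numpy as np

    def laplacian_pe(adj, k=4):
        # Eigenvectors of the symmetric normalized graph Laplacian, dropping the first
        # (trivial) one; appended to node features as positional encodings.
        deg = adj.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
        _, eigvecs = np.linalg.eigh(lap)
        return eigvecs[:, 1:k + 1]

    rng = np.random.default_rng(0)
    corr = np.abs(rng.standard_normal((19, 19)))              # toy 19-channel connectivity
    adj = (corr + corr.T) / 2.0
    np.fill_diagonal(adj, 0.0)
    pe = laplacian_pe(adj, k=4)                               # shape (19, 4), one row per EEG channel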
35. A Nascent Taxonomy of Machine Learning in Intelligent Robotic Process Automation
Authors: Lukas Laakmann, Seyyid A. Ciftci, Christian Janiesch •
Published: 2025-09-19 •
Source: arXiv
Robotic process automation (RPA) is a lightweight approach to automating business processes using software robots that emulate user actions at the graphical user interface level. While RPA has gained popularity for its cost-effective and timely automation of rule-based, well-structured tasks, its symbolic nature has inherent limitations when approaching more complex tasks currently performed by human agents. Machine learning concepts enabling intelligent RPA provide an opportunity to broaden the range of automatable tasks. In this paper, we conduct a literature review to explore the connections between RPA and machine learning and organize the joint concept of intelligent RPA into a taxonomy. Our taxonomy comprises two meta-characteristics: RPA-ML integration and RPA-ML interaction. Together, they span eight dimensions: architecture and ecosystem, capabilities, data basis, intelligence level, and technical depth of integration, as well as deployment environment, lifecycle phase, and user-robot relation.