1. Train Once, Deploy Anywhere: Realize Data-Efficient Dynamic Object Manipulation
Authors: Zhuoling Li, Xiaoyang Wu, Zhenhua Xu, Hengshuang Zhao
Published: 2025-08-19
Source: arXiv
Realizing generalizable dynamic object manipulation is important for enhancing manufacturing efficiency, as it eliminates specialized engineering for various scenarios. To this end, imitation learning emerges as a promising paradigm, leveraging expert demonstrations to teach a policy manipulation skills. Although the generalization of an imitation learning policy can be improved by increasing demonstrations, demonstration collection is labor-intensive. To address this problem, this paper investigates whether strong generalization in dynamic object manipulation is achievable with only a few demonstrations. Specifically, we develop an entropy-based theoretical framework to quantify the optimization of imitation learning. Based on this framework, we propose a system named Generalizable Entropy-based Manipulation (GEM). Extensive experiments in simulated and real tasks demonstrate that GEM can generalize across diverse environment backgrounds, robot embodiments, motion dynamics, and object geometries. Notably, GEM has been deployed in a real canteen for tableware collection. Without any in-scene demonstration, it achieves a success rate of over 97% across more than 10,000 operations.
2. ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Authors: Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, Jie Tang
Published: 2025-08-19
Source: arXiv
We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments. Scaling end-to-end RL training is crucial for improvement and generalization across diverse desktop tasks, yet it remains challenging due to environmental inefficiency and instability in extended training. To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale online RL. Furthermore, we propose Entropulse, a training strategy that alternates reinforcement learning with supervised fine-tuning, effectively mitigating entropy collapse during extended training runs. We apply ComputerRL to the open models GLM-4-9B-0414 and Qwen2.5-14B and evaluate them on the OSWorld benchmark. AutoGLM-OS-9B, based on GLM-4-9B-0414, achieves a new state-of-the-art accuracy of 48.1%, demonstrating significant improvements for general agents in desktop automation. The algorithm and framework are adopted in building AutoGLM (Liu et al., 2024a).
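The abstract describes Entropulse only at the level of "alternate RL with SFT." A minimal sketch of such an alternating schedule, with placeholder function names that are illustrative rather than the paper's API, might look like:

```python
# Sketch of an Entropulse-style schedule: alternate reinforcement-learning
# phases with supervised fine-tuning (SFT) phases to counteract entropy
# collapse during extended training. Names here are illustrative placeholders.

def entropulse_schedule(num_cycles, rl_steps, sft_steps):
    """Yield (phase, cycle, step) tuples alternating RL and SFT phases."""
    for cycle in range(num_cycles):
        for step in range(rl_steps):
            yield ("rl", cycle, step)    # policy-gradient updates
        for step in range(sft_steps):
            yield ("sft", cycle, step)   # supervised updates on successful rollouts

phases = [p for (p, _, _) in entropulse_schedule(num_cycles=2, rl_steps=3, sft_steps=1)]
# phases == ["rl", "rl", "rl", "sft", "rl", "rl", "rl", "sft"]
```

In practice each SFT phase would train on trajectories collected during the preceding RL phase; the schedule above only fixes the phase ordering.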
3. Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Authors: Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, Rao Muhammad Anwer, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan
Published: 2025-08-19
Source: arXiv
Composed video retrieval is a challenging task that strives to retrieve a target video based on a query video and a textual description detailing specific modifications. Standard retrieval frameworks typically struggle to handle the complexity of fine-grained compositional queries and variations in temporal understanding, limiting their retrieval ability in the fine-grained setting. To address this issue, we introduce a novel dataset that captures both fine-grained and composed actions across diverse video segments, enabling more detailed compositional changes in retrieved video content. The proposed dataset, named Dense-WebVid-CoVR, consists of 1.6 million samples with dense modification text around seven times longer than in its existing counterpart. We further develop a new model that integrates visual and textual information through Cross-Attention (CA) fusion using a grounded text encoder, enabling precise alignment between dense query modifications and target videos. The proposed model achieves state-of-the-art results, surpassing existing methods on all metrics. Notably, it achieves 71.3\% Recall@1 in the visual+text setting, outperforming the state-of-the-art by 3.4\% and highlighting its efficacy in leveraging detailed video descriptions and dense modification texts. Our proposed dataset, code, and model are available at: https://github.com/OmkarThawakar/BSE-CoVR
4. Distilled-3DGS: Distilled 3D Gaussian Splatting
Authors: Lintao Xiang, Xinkai Chen, Jianhuang Lai, Guangcong Wang
Published: 2025-08-19
Source: arXiv
3D Gaussian Splatting (3DGS) has exhibited remarkable efficacy in novel view synthesis (NVS). However, it suffers from a significant drawback: achieving high-fidelity rendering typically necessitates a large number of 3D Gaussians, resulting in substantial memory consumption and storage requirements. To address this challenge, we propose the first knowledge distillation framework for 3DGS, featuring various teacher models, including vanilla 3DGS, noise-augmented variants, and dropout-regularized versions. The outputs of these teachers are aggregated to guide the optimization of a lightweight student model. To distill the hidden geometric structure, we propose a structural similarity loss to boost the consistency of spatial geometric distributions between the student and teacher models. Through comprehensive quantitative and qualitative evaluations across diverse datasets, the proposed Distilled-3DGS, a simple yet effective framework without bells and whistles, achieves promising results in both rendering quality and storage efficiency compared to state-of-the-art methods. Project page: https://distilled3dgs.github.io. Code: https://github.com/lt-xiang/Distilled-3DGS.
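The core distillation target described above — aggregate several teachers' outputs and penalize the student's deviation — can be sketched in a few lines. Plain arrays stand in for rendered images here; the function name and the L2 form of the loss are illustrative assumptions, not the paper's exact objective (which also includes a structural similarity term).

```python
import numpy as np

# Minimal sketch of multi-teacher distillation: average the renderings of
# several teacher models and penalize the student's deviation from that mean.
# Arrays stand in for rendered pixels; the L2 form is an illustrative choice.

def distillation_loss(student, teachers):
    """L2 loss between the student render and the mean teacher render."""
    target = np.mean(teachers, axis=0)        # aggregate teacher outputs
    return float(np.mean((student - target) ** 2))

teachers = [np.full(4, 1.0), np.full(4, 3.0)]  # e.g. vanilla and noise-augmented
student = np.full(4, 2.0)
loss = distillation_loss(student, teachers)    # mean teacher render is 2.0 everywhere
# loss == 0.0
```

A student matching the teacher mean incurs zero loss, which is the sense in which the ensemble "guides" the lightweight model.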
5. GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation
Authors: Ken Deng, Yunhan Yang, Jingxiang Sun, Xihui Liu, Yebin Liu, Ding Liang, Yan-Pei Cao
Published: 2025-08-19
Source: arXiv
Modern 3D generation methods can rapidly create shapes from sparse or single views, but their outputs often lack geometric detail due to computational constraints. We present DetailGen3D, a generative approach specifically designed to enhance these generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space, avoiding the computational overhead of large-scale 3D generative models. We introduce a token matching strategy that ensures accurate spatial correspondence during refinement, enabling local detail synthesis while preserving global structure. By carefully designing our training data to match the characteristics of synthesized coarse shapes, our method can effectively enhance shapes produced by various 3D generation and reconstruction approaches, from single-view to sparse multi-view inputs. Extensive experiments demonstrate that DetailGen3D achieves high-fidelity geometric detail synthesis while maintaining efficiency in training.
6. InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
Authors: Shaoshu Yang, Zhe Kong, Feng Gao, Meng Cheng, Xiangyu Liu, Yong Zhang, Zhuoliang Kang, Wenhan Luo, Xunliang Cai, Ran He, Xiaoming Wei
Published: 2025-08-19
Source: arXiv
Recent breakthroughs in video AIGC have ushered in a transformative era for audio-driven human animation. However, conventional video dubbing techniques remain constrained to mouth region editing, resulting in discordant facial expressions and body gestures that compromise viewer immersion. To overcome this limitation, we introduce sparse-frame video dubbing, a novel paradigm that strategically preserves reference keyframes to maintain identity, iconic gestures, and camera trajectories while enabling holistic, audio-synchronized full-body motion editing. Through critical analysis, we identify why naive image-to-video models fail in this task, particularly their inability to achieve adaptive conditioning. Addressing this, we propose InfiniteTalk, a streaming audio-driven generator designed for infinite-length long sequence dubbing. This architecture leverages temporal context frames for seamless inter-chunk transitions and incorporates a simple yet effective sampling strategy that optimizes control strength via fine-grained reference frame positioning. Comprehensive evaluations on HDTF, CelebV-HQ, and EMTD datasets demonstrate state-of-the-art performance. Quantitative metrics confirm superior visual realism, emotional coherence, and full-body motion synchronization.
7. The Promise of Large Language Models in Digital Health: Evidence from Sentiment Analysis in Online Health Communities
Authors: Xiancheng Li, Georgios D. Karampatakis, Helen E. Wood, Chris J. Griffiths, Borislava Mihaylova, Neil S. Coulson, Alessio Pasinato, Pietro Panzarasa, Marco Viviani, Anna De Simoni
Published: 2025-08-19
Source: arXiv
Digital health analytics faces critical challenges. The sophisticated analysis of patient-generated health content, which contains complex emotional and medical contexts, requires scarce domain expertise, while traditional ML approaches are constrained by data shortages and privacy limitations in healthcare settings. Online Health Communities (OHCs) exemplify these challenges with mixed-sentiment posts, clinical terminology, and implicit emotional expressions that demand specialised knowledge for accurate Sentiment Analysis (SA). To address these challenges, this study explores how Large Language Models (LLMs) can integrate expert knowledge through in-context learning for SA, providing a scalable solution for sophisticated health data analysis. Specifically, we develop a structured codebook that systematically encodes expert interpretation guidelines, enabling LLMs to apply domain-specific knowledge through targeted prompting rather than extensive training. Six GPT models, validated alongside DeepSeek and LLaMA 3.1, are compared with pre-trained language models (BioBERT variants) and lexicon-based methods, using 400 expert-annotated posts from two OHCs. LLMs achieve superior performance while demonstrating expert-level agreement. This high agreement, with no statistically significant difference from inter-expert agreement levels, suggests knowledge integration beyond surface-level pattern recognition. The consistent performance across diverse LLM models, supported by in-context learning, offers a promising solution for digital health analytics. This approach addresses the critical challenge of expert knowledge shortage in digital health research, enabling real-time, expert-quality analysis for patient monitoring, intervention assessment, and evidence-based health strategies.
8. Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Authors: Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
Published: 2025-08-19
Source: arXiv
Beyond simple text generation, Large Language Models (LLMs) have evolved into agentic systems capable of planning and interacting with external tools to solve complex tasks. This evolution involves fine-tuning LLMs on agent-specific tasks to enhance their proficiency. However, safety concerns are frequently overlooked during this fine-tuning process. In this work, we show that aligned LLMs can become unintentionally misaligned, leading to a higher likelihood of executing harmful tasks and a reduced tendency to refuse them when fine-tuned to execute agentic tasks. To address these safety challenges, we propose Prefix INjection Guard (PING), a simple yet effective method that prepends automatically generated natural language prefixes to agent responses, guiding them to refuse harmful requests while preserving performance on benign tasks. Specifically, we introduce an iterative approach that alternates between (1) generating candidate prefixes and (2) selecting those that optimize both task performance and refusal behavior. Experimental results demonstrate that PING significantly enhances the safety of fine-tuned LLM agents without sacrificing their effectiveness. PING consistently outperforms existing prompting approaches across diverse benchmarks in both web navigation and code generation tasks. Our analysis of internal hidden states via linear probes reveals that prefix tokens are crucial for behavior modification, explaining the performance gains. WARNING: This paper contains contents that are unethical or offensive in nature.
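Mechanically, the prefix-injection step described above is simple: a fixed natural-language prefix is prepended to the agent's response so that the earliest tokens steer refusal behavior. A toy sketch, where the prefix text is purely illustrative rather than one selected by the paper's iterative search:

```python
# Toy sketch of the Prefix INjection Guard (PING) mechanism: prepend a
# guard prefix to the agent's response before it continues generating.
# The prefix text below is an illustrative stand-in, not an optimized one.

GUARD_PREFIX = "Before acting, I must check whether this request is harmful. "

def guard_response(agent_reply: str, prefix: str = GUARD_PREFIX) -> str:
    """Prepend the guard prefix so early tokens steer refusal behavior."""
    return prefix + agent_reply

out = guard_response("Navigating to the requested page...")
# out starts with GUARD_PREFIX
```

The substance of PING lies in the search loop that generates and selects prefixes jointly optimizing task performance and refusal rate; the prepending step itself is the whole runtime intervention.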
9. Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Authors: Xiao Liang, Zhongzhi Li, Yeyun Gong, Yelong Shen, Ying Nian Wu, Zhijiang Guo, Weizhu Chen
Published: 2025-08-19
Source: arXiv
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a key paradigm for post-training Large Language Models (LLMs), particularly for complex reasoning tasks. However, vanilla RLVR training has been shown to improve Pass@1 performance at the expense of policy entropy, leading to reduced generation diversity and limiting the Pass@k performance, which typically represents the upper bound of LLM reasoning capability. In this paper, we systematically analyze the policy's generation diversity from the perspective of training problems and find that augmenting and updating training problems helps mitigate entropy collapse during training. Based on these observations, we propose an online Self-play with Variational problem Synthesis (SvS) strategy for RLVR training, which uses the policy's correct solutions to synthesize variational problems while ensuring their reference answers remain identical to the originals. This self-improving strategy effectively maintains policy entropy during training and substantially improves Pass@k compared with standard RLVR, sustaining prolonged improvements and achieving absolute gains of 18.3% and 22.8% in Pass@32 performance on the competition-level AIME24 and AIME25 benchmarks. Experiments on 12 reasoning benchmarks across varying model sizes from 3B to 32B consistently demonstrate the generalizability and robustness of SvS.
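The key invariant of SvS — synthesize a variant of a solved problem while keeping the reference answer identical — can be illustrated with a toy arithmetic domain. The variation rule below is a deliberately trivial stand-in for the policy-driven synthesis in the paper:

```python
import random

# Toy sketch of the SvS invariant: when a problem is solved correctly,
# synthesize a variational problem whose reference answer stays identical.
# The addition domain and the perturbation rule are illustrative stand-ins.

def synthesize_variant(a, b, rng):
    """Return a perturbed addition problem with the same reference answer."""
    d = rng.randint(1, 5)
    return (a - d, b + d)            # (a - d) + (b + d) == a + b

rng = random.Random(0)
pool = [(3, 4), (10, 7)]             # problems the policy answered correctly
new_pool = []
for a, b in pool:
    answer = a + b                   # reference answer of the original problem
    va, vb = synthesize_variant(a, b, rng)
    assert va + vb == answer         # the variant preserves the reference answer
    new_pool.append((va, vb))
```

In the actual method the policy itself generates the variational problems from its correct solutions; the invariant checked by the `assert` is what makes the synthesized data verifiable without new labels.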
10. Trust and Reputation in Data Sharing: A Survey
Authors: Wenbo Wu, George Konstantinidis
Published: 2025-08-19
Source: arXiv
Data sharing is the fuel of the galloping artificial intelligence economy, providing diverse datasets for training robust models. Trust between data providers and data consumers is widely considered one of the most important factors for enabling data sharing initiatives. Concerns about data sensitivity, privacy breaches, and misuse contribute to reluctance in sharing data across various domains. In recent years, there has been a rise in technological and algorithmic solutions to measure, capture, and manage trust, trustworthiness, and reputation in what we collectively refer to as Trust and Reputation Management Systems (TRMSs). Such approaches have been developed and applied to different domains of computer science, such as autonomous vehicles or IoT networks, but there have not been dedicated approaches to data sharing and its unique characteristics. In this survey, we examine TRMSs from a data-sharing perspective, analyzing how they assess the trustworthiness of both data and entities across different environments. We develop novel taxonomies for system designs, trust evaluation frameworks, and evaluation metrics for both data and entities, and we systematically analyze the applicability of existing TRMSs in data sharing. Finally, we identify open challenges and propose future research directions to enhance the explainability, comprehensiveness, and accuracy of TRMSs in large-scale data-sharing ecosystems.
11. Learning from Preferences and Mixed Demonstrations in General Settings
Authors: Jason R Brown, Carl Henrik Ek, Robert D Mullins
Published: 2025-08-19
Source: arXiv
Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be used instead. However, existing approaches that utilise both together are often ad hoc, rely on domain-specific properties, or do not scale. We develop a new framing for learning from human data, \emph{reward-rational partial orderings over observations}, designed to be flexible and scalable. Based on this, we introduce a practical algorithm, LEOPARD: Learning Estimated Objectives from Preferences And Ranked Demonstrations. LEOPARD can learn from a broad range of data, including negative demonstrations, to efficiently learn reward functions across a wide range of domains. We find that when only a limited amount of preference and demonstration feedback is available, LEOPARD outperforms existing baselines by a significant margin. Furthermore, we use LEOPARD to investigate learning from many types of feedback, compared to just a single one, and find that combining feedback types is often beneficial.
12. UNICON: UNIfied CONtinual Learning for Medical Foundational Models
Authors: Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed
Published: 2025-08-19
Source: arXiv
Foundational models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Continual learning offers a solution by fine-tuning a model sequentially on different domains or tasks, enabling it to integrate new knowledge without requiring large datasets for each training phase. In this paper, we propose UNIfied CONtinual Learning for Medical Foundational Models (UNICON), a framework that enables the seamless adaptation of foundation models to diverse domains, tasks, and modalities. Unlike conventional adaptation methods that treat these changes in isolation, UNICON provides a unified, perpetually expandable framework. Through careful integration, we show that foundation models can dynamically expand across imaging modalities, anatomical regions, and clinical objectives without catastrophic forgetting or task interference. Empirically, we validate our approach by adapting a chest CT foundation model, initially trained for classification, to prognosis and segmentation tasks. Our results show improved performance across both additional tasks. Furthermore, we continually incorporated PET scans and achieved a 5\% improvement in Dice score compared to the respective baselines. These findings establish that foundation models are not inherently constrained to their initial training scope but can evolve, paving the way toward generalist AI models for medical imaging.
13. BLIPs: Bayesian Learned Interatomic Potentials
Authors: Dario Coscia, Pim de Haan, Max Welling
Published: 2025-08-19
Source: arXiv
Machine Learning Interatomic Potentials (MLIPs) are becoming a central tool in simulation-based chemistry. However, like most deep learning models, MLIPs struggle to make accurate predictions on out-of-distribution data or when trained in a data-scarce regime, both common scenarios in simulation-based chemistry. Moreover, MLIPs do not provide uncertainty estimates by construction, which are fundamental to guiding active learning pipelines and to ensuring the accuracy of simulation results compared to quantum calculations. To address these shortcomings, we propose BLIPs: Bayesian Learned Interatomic Potentials. BLIP is a scalable, architecture-agnostic variational Bayesian framework for training or fine-tuning MLIPs, built on an adaptive version of Variational Dropout. BLIP delivers well-calibrated uncertainty estimates and minimal computational overhead for energy and force prediction at inference time, while integrating seamlessly with (equivariant) message-passing architectures. Empirical results on simulation-based computational chemistry tasks demonstrate improved predictive accuracy with respect to standard MLIPs, and trustworthy uncertainty estimates, especially in data-scarce or heavily out-of-distribution regimes. Moreover, fine-tuning pretrained MLIPs with BLIP yields consistent performance gains and calibrated uncertainties.
14. Data Compression with Noise Suppression for Inference under Noisy Covariance
Authors: Sunao Sugiyama, Minsu Park
Published: 2025-08-19
Source: arXiv
In many fields including cosmology, statistical inference often relies on Gaussian likelihoods whose covariance matrices are estimated from a finite number of simulations. This finite-sample estimation introduces noise into the covariance, which propagates to parameter estimates, a phenomenon known as the Dodelson-Schneider (DS) effect, leading to inflated uncertainties. While the Massively Optimized Parameter Estimation and Data compression (MOPED) algorithm offers lossless Fisher information-preserving compression, it does not mitigate the DS effect when the compression matrix itself is derived from noisy covariances. In this paper, we propose a modified compression scheme, powered MOPED ($p$-MOPED), which suppresses noise propagation by balancing information retention and covariance estimate noise reduction through a tunable power-law transformation of the sample correlation matrix. We test $p$-MOPED against standard and diagonal MOPED on toy models and on cosmological data from the Subaru Hyper Suprime-Cam Year 3 weak lensing survey. Our results demonstrate that $p$-MOPED consistently outperforms other approaches, especially in regimes with limited simulations, offering a robust compression strategy for high-dimensional data analyses under practical constraints.
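A small numerical sketch of the compression step may help. Standard MOPED builds compression weights from the (noisy) covariance; $p$-MOPED transforms the sample correlation matrix with a tunable power before inverting. The elementwise power used below is one plausible reading of that transform, labeled as an assumption — the exact definition is in the paper — and the single-parameter weight formula is the textbook MOPED form.

```python
import numpy as np

def powered_corr(C, p):
    """Apply an elementwise power-law transform to the correlation part of C.

    ASSUMPTION: we read "power-law transformation of the sample correlation
    matrix" as an elementwise power sign(r) * |r|**p, which suppresses small
    (noise-dominated) correlations; the paper's exact definition may differ.
    """
    sigma = np.sqrt(np.diag(C))
    R = C / np.outer(sigma, sigma)          # sample correlation matrix
    Rp = np.sign(R) * np.abs(R) ** p        # damp weak off-diagonal entries
    np.fill_diagonal(Rp, 1.0)
    return Rp * np.outer(sigma, sigma)      # back to covariance units

def moped_weights(C, dmu):
    """Single-parameter MOPED compression vector b = C^{-1} dmu."""
    return np.linalg.solve(C, dmu)

rng = np.random.default_rng(0)
C = np.cov(rng.standard_normal((50, 5)), rowvar=False)  # noisy 5x5 covariance
dmu = np.ones(5)                                        # model derivative (toy)
b_std = moped_weights(C, dmu)                 # standard MOPED
b_p = moped_weights(powered_corr(C, 2.0), dmu)  # p-MOPED-style, p = 2
```

With $p=1$ the transform is the identity and standard MOPED is recovered; larger $p$ trades a little retained information for reduced propagation of covariance noise.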
15. A Biased Random Key Genetic Algorithm for Solving the Longest Run Subsequence Problem
Authors: Christian Blum, Pedro Pinacho-Davidson
Published: 2025-08-19
Source: arXiv
The longest run subsequence (LRS) problem is an NP-hard combinatorial optimization problem belonging to the class of subsequence problems from bioinformatics. In particular, the problem plays a role in genome reassembly. In this paper, we present a solution to the LRS problem using a Biased Random Key Genetic Algorithm (BRKGA). Our approach places particular focus on the computational efficiency of evaluating individuals, which involves converting vectors of gray values into valid solutions to the problem. For comparison purposes, a Max-Min Ant System is developed and implemented, in addition to applying the integer linear programming solver CPLEX to all considered problem instances. The computational results show that the proposed BRKGA is currently a state-of-the-art technique for the LRS problem. Nevertheless, the results also show that there is room for improvement, especially in the context of input strings based on large alphabet sizes.
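The two generic BRKGA ingredients mentioned above — decoding a chromosome of keys into a solution, and biased crossover favoring the elite parent — can be sketched independently of the LRS-specific decoder (which the paper designs for efficiency; the sort-based decoding below is the generic textbook step, not the paper's exact routine):

```python
import random

# Generic BRKGA building blocks (not the paper's LRS-specific decoder):
# a chromosome is a vector of keys in [0, 1); decoding orders the items by
# key value, and a problem-specific routine then builds a feasible solution.

def decode_permutation(keys):
    """Map a random-key chromosome to a permutation of item indices."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def biased_crossover(elite, non_elite, rho, rng):
    """Each child gene is copied from the elite parent with probability rho."""
    return [e if rng.random() < rho else n for e, n in zip(elite, non_elite)]

rng = random.Random(42)
keys = [rng.random() for _ in range(6)]          # one random-key chromosome
order = decode_permutation(keys)                 # item priority order
child = biased_crossover([0.9] * 6, [0.1] * 6, rho=0.7, rng=rng)
```

The decoder is where evaluation cost concentrates, which is why the paper focuses its design effort there.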
16. Dark Energy Survey Year 3 Results: Cosmological constraints from second and third-order shear statistics
Authors: R. C. H. Gomes, S. Sugiyama, B. Jain, M. Jarvis, D. Anbajagane, A. Halder, G. A. Marques, S. Pandey, J. Marshall, A. Alarcon, A. Amon, K. Bechtol, M. Becker, G. Bernstein, A. Campos, R. Cawthon, C. Chang, R. Chen, A. Choi, J. Cordero, C. Davis, J. Derose, S. Dodelson, C. Doux, K. Eckert, F. Elsner, J. Elvin-Poole, S. Everett, A. FertΓ©, M. Gatti, G. Giannini, D. Gruen, I. Harrison, K. Herner, E. M. Huff, D. Huterer, N. Kuropatkin, P. F. Leget, N. Maccrann, J. Mccullough, J. Muir, J. Myles, A. Navarro Alsina, J. Prat, M. Raveri, R. P. Rollins, A. Roodman, A. J. Ross, E. S. Rykoff, C. SΓ‘nchez, L. F. Secco, E. Sheldon, T. Shin, M. Troxel, I. Tutusaus, T. N. Varga, B. Yanny, B. Yin, Y. Zhang, J. Zuntz, M. Aguena, F. Andrade-Oliveira, D. Bacon, J. Blazek, S. Bocquet, D. Brooks, A. Carnero Rosell, J. Carretero, M. Costanzi, L. da Costa, M. E. da Silva Pereira, T. M. Davis, J. De Vicente, H. T. Diehl, B. Flaugher, J. Frieman, G. Gutierrez, S. R. Hinton, D. L. Hollowood, K. Honscheid, D. J. James, N. Jeffrey, S. Lee, J. Mena-FernΓ‘ndez, R. Miquel, R. L. C. Ogando, A. A. Plazas MalagΓ³n, A. Porredon, E. Sanchez, D. Sanchez Cid, S. Samuroff, M. Smith, E. Suchyta, M. E. C. Swanson, D. Thomas, V. Vikram, J. Weller, M. Yamamoto
Published: 2025-08-19
Source: arXiv
We present a cosmological analysis of the third-order aperture mass statistic using Dark Energy Survey Year 3 (DES Y3) data. We perform a complete tomographic measurement of the three-point correlation function of the Y3 weak lensing shape catalog with the four fiducial source redshift bins. Building upon our companion methodology paper, we apply a pipeline that combines the two-point function $\xi_{\pm}$ with the mass aperture skewness statistic $\langle M_{\rm ap}^3\rangle$, which is an efficient compression of the full shear three-point function. We use a suite of simulated shear maps to obtain a joint covariance matrix. By jointly analyzing $\xi_\pm$ and $\langle M_{\rm ap}^3\rangle$ measured from DES Y3 data with a $\Lambda$CDM model, we find $S_8=0.780\pm0.015$ and $\Omega_{\rm m}=0.266^{+0.039}_{-0.040}$, yielding a 111% improvement in the figure of merit in the $\Omega_m$-$S_8$ plane relative to $\xi_{\pm}$ alone, consistent with expectations from simulated likelihood analyses. With a $w$CDM model, we find $S_8=0.749^{+0.027}_{-0.026}$ and $w_0=-1.39\pm 0.31$, which gives an improvement of $22\%$ on the joint $S_8$-$w_0$ constraint. Our results are consistent with $w_0=-1$. Our new constraints are compared to CMB data from the Planck satellite, and we find that with the inclusion of $\langle M_{\rm ap}^3\rangle$ the existing tension between the data sets is at the level of $2.3\sigma$. We show that the third-order statistic enables us to self-calibrate the mean photometric redshift uncertainty parameter of the highest redshift bin with little degradation in the figure of merit. Our results demonstrate the constraining power of higher-order lensing statistics and establish $\langle M_{\rm ap}^3\rangle$ as a practical observable for joint analyses in current and future surveys.
17. Cosmology from a joint analysis of second and third order shear statistics with Subaru Hyper Suprime-Cam Year 3 data
Authors: Sunao Sugiyama, Rafael C. H. Gomes, Bhuvnesh Jain
Published: 2025-08-19
Source: arXiv
We present a joint cosmological analysis of the two-point correlation function and the aperture-mass skewness measured from the Year 3 data of the Hyper Suprime-Cam Subaru Strategic Program (HSC-Y3). The aperture-mass skewness is a compressed representation of three-point shear information, designed to capture non-Gaussian features while keeping the data vector computationally tractable. We find that including the aperture-mass skewness improves the $S_8$-$\Omega_m$ figure of merit by 80% compared to the 2PCF-only case, primarily due to the breaking of degeneracies. Our joint analysis yields a constraint of $S_8=0.736\pm0.020$, which is slightly lower than the two-point-only result and increases the tension with Planck 2018 to 3.2$\sigma$ in the $S_8$-$\Omega_m$ plane. The two- and three-point statistics are found to be internally consistent across redshift bins and angular scales, and we detect no significant intrinsic alignment signal. We also explore extensions to the $w$CDM model and find no evidence for deviations from a cosmological constant. This work demonstrates the feasibility and scientific value of incorporating third-order shear statistics into weak lensing cosmology and provides a practical pathway for similar analyses in future Stage-IV surveys such as LSST, Euclid, and Roman.
18. Analog computation with transcriptional networks
Authors: David Doty, Mina Latifi, David Soloveichick
Published: 2025-08-19
Source: arXiv
Transcriptional networks represent one of the most extensively studied types of systems in synthetic biology. Although the completeness of transcriptional networks for digital logic is well-established, *analog* computation plays a crucial role in biological systems and offers significant potential for synthetic biology applications. While transcriptional circuits typically rely on cooperativity and highly non-linear behavior of transcription factors to regulate *production* of proteins, they are often modeled with simple linear *degradation* terms. In contrast, general analog dynamics require both non-linear positive as well as negative terms, seemingly necessitating control over not just transcriptional (i.e., production) regulation but also the degradation rates of transcription factors. Surprisingly, we prove that controlling transcription factor production (i.e., transcription rate) without explicitly controlling degradation is mathematically complete for analog computation, achieving equivalent capabilities to systems where both production and degradation are programmable. We demonstrate our approach on several examples including oscillatory and chaotic dynamics, analog sorting, memory, PID controller, and analog extremum seeking. Our result provides a systematic methodology for engineering novel analog dynamics using synthetic transcriptional networks without the added complexity of degradation control and informs our understanding of the capabilities of natural transcriptional circuits. We provide a compiler, in the form of a Python package that can take any system of polynomial ODEs and convert it to an equivalent transcriptional network implementing the system *exactly*, under appropriate conditions.
19. Online 3D Gaussian Splatting Modeling with Novel View Selection
Authors: Byeonggwon Lee, Junkyu Park, Khang Truong Giang, Soohwan Song
Published: 2025-08-19
Source: arXiv
This study addresses the challenge of generating online 3D Gaussian Splatting (3DGS) models from RGB-only frames. Previous studies have employed dense SLAM techniques to estimate 3D scenes from keyframes for 3DGS model construction. However, these methods are limited by their reliance solely on keyframes, which are insufficient to capture an entire scene, resulting in incomplete reconstructions. Moreover, building a generalizable model requires incorporating frames from diverse viewpoints to achieve broader scene coverage. However, online processing restricts the use of many frames or extensive training iterations. Therefore, we propose a novel method for high-quality 3DGS modeling that improves model completeness through adaptive view selection. By analyzing reconstruction quality online, our approach selects optimal non-keyframes for additional training. By integrating both keyframes and selected non-keyframes, the method refines incomplete regions from diverse viewpoints, significantly enhancing completeness. We also present a framework that incorporates an online multi-view stereo approach, ensuring consistency in 3D information throughout the 3DGS modeling process. Experimental results demonstrate that our method outperforms state-of-the-art methods, delivering exceptional performance in complex outdoor scenes.
20. Efficient Knowledge Graph Unlearning with Zeroth-order Information
Authors: Yang Xiao, Ruimeng Ye, Bohan Liu, Xiaolong Ma, Bo Hui
Published: 2025-08-19
Source: arXiv
Due to regulations like the Right to be Forgotten, there is growing demand for removing training data and its influence from models. Since full retraining is costly, various machine unlearning methods have been proposed. In this paper, we first present an efficient knowledge graph (KG) unlearning algorithm. We note that KG unlearning is nontrivial due to the distinctive structure of KGs and the semantic relations between entities. Also, unlearning by estimating the influence of removed components incurs significant computational overhead when applied to large-scale knowledge graphs. To this end, we define an influence function for KG unlearning and propose to approximate the model's sensitivity without the expensive computation of first-order and second-order derivatives for parameter updates. Specifically, we use a Taylor expansion to estimate the parameter changes caused by data removal. Given that the first-order gradients and second-order derivatives dominate the computational load, we use Fisher matrices and zeroth-order optimization to approximate the inverse-Hessian vector product without constructing the computational graphs. Our experimental results demonstrate that the proposed method significantly outperforms other state-of-the-art graph unlearning baselines in terms of unlearning efficiency and unlearning quality. Our code is released at https://github.com/NKUShaw/ZOWFKGIF.
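The building block of such zeroth-order approximations is estimating derivatives from loss evaluations alone, with no computational graph. A minimal sketch of a coordinate-wise central-difference gradient estimate (the generic primitive, not the paper's full Fisher-based inverse-Hessian-vector-product scheme):

```python
import numpy as np

# Zeroth-order gradient estimation via central differences: approximate each
# partial derivative from function evaluations only, without autograd. This
# is the generic primitive behind derivative-free unlearning updates.

def zo_gradient(f, theta, eps=1e-4):
    """Coordinate-wise central-difference estimate of grad f at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

A = np.diag([1.0, 2.0, 3.0])
f = lambda x: 0.5 * x @ A @ x        # quadratic loss; exact gradient is A @ x
theta = np.array([1.0, -1.0, 2.0])
g = zo_gradient(f, theta)            # close to A @ theta = [1, -2, 6]
```

Each estimate costs two function evaluations per coordinate; the appeal in the KG setting is that no backward pass or second-order graph ever needs to be built.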
21. Brace for impact: ECDLP challenges for quantum cryptanalysis
Authors: Pierre-Luc Dallaire-Demers, William Doyle, Timothy Foo β’
Published: 2025-08-19 β’
Source: arXiv
Precise suites of benchmarks are required to assess the progress of early fault-tolerant quantum computers at economically impactful applications such as cryptanalysis. Appropriate challenges exist for factoring but those for elliptic curve cryptography are either too sparse or inadequate for standard applications of Shor's algorithm. We introduce a difficulty-graded suite of elliptic curve discrete logarithm (ECDLP) challenges that use Bitcoin's curve $y^{2}=x^{3}+7 \pmod p$ while incrementally lowering the prime field from 256 down to 6 bits. For each bit-length, we provide the prime, the base point and an example public key. All challenges are generated by a deterministic, reproducible procedure. We calibrate classical cost against Pollard's rho records and quantum cost against resource estimation results for Shor's algorithm. We compile Shor's ECDLP circuit to logical counts and map them to physical resources for various parameters of the surface code, the repetition cat code and the LDPC cat codes. Under explicit and testable assumptions on physical error rates, code distances, and non-Clifford supply, our scenarios place the full 256-bit instance within a 2027--2033 window. The challenge ladder thus offers a transparent ruler to track fault-tolerant progress on a cryptanalytic target of immediate relevance, and it motivates proactive migration of digital assets to post-quantum signatures.
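The challenge structure can be illustrated at a toy bit-length: on $y^{2}=x^{3}+7 \pmod p$ with a tiny illustrative prime (p = 11 here, which is not one of the suite's primes), the discrete logarithm is still trivially brute-forceable classically:

```python
P = 11  # toy prime; the suite's primes range from 256 down to 6 bits

def add(p1, p2):
    """Affine point addition on y^2 = x^3 + 7 over GF(P); None = infinity."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                   # inverse points
    if p1 == p2:
        m = (3 * x1 * x1) * pow(2 * y1, -1, P) % P    # tangent slope (a = 0)
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, P) % P       # chord slope
    x3 = (m * m - x1 - x2) % P
    return (x3, (m * (x1 - x3) - y1) % P)

def brute_dlog(base, target):
    """Solve k*base = target by exhaustive search (feasible only for toys)."""
    q, k = base, 1
    while q != target:
        q, k = add(q, base), k + 1
    return k

G = (2, 2)               # on the curve: 2^2 = 4 and 2^3 + 7 = 15 = 4 (mod 11)
pub = add(add(G, G), G)  # the "public key" 3*G
```

The graded suite replaces this exhaustive search with Pollard's rho classically and Shor's algorithm quantumly as the prime field grows.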
22. Typed Topological Structures Of Datasets
Authors: Wanjun Hu β’
Published: 2025-08-19 β’
Source: arXiv
A dataset $X$ on $R^2$ is a finite topological space. Current research on datasets focuses on statistical methods and the algebraic topological method \cite{carlsson}. In \cite{hu}, the concept of a typed topological space was introduced and shown to have potential for studying finite topological spaces, such as a dataset. It is a new method from the general topology perspective. A typed topological space is a topological space whose open sets are assigned types. Topological concepts and methods can be redefined using open sets of certain types. In this article, we develop a special set of types and its related typed topology on a dataset $X$. Using it, we can investigate the inner structure of $X$. In particular, $R^2$ has a natural quotient space, in which $X$ is organized into tracks, and each track is split into components. These components are ordered and can be represented by an integer sequence. Components crossing tracks form branches, and this relationship can be well represented by a type of pseudotree (called a typed-II pseudotree). Such structures provide a platform for new algorithms for problems such as computing convex hulls, detecting holes, clustering, and anomaly detection.
23. ResPlan: A Large-Scale Vector-Graph Dataset of 17,000 Residential Floor Plans
Authors: Mohamed Abouagour, Eleftherios Garyfallidis β’
Published: 2025-08-19 β’
Source: arXiv
We introduce ResPlan, a large-scale dataset of 17,000 detailed, structurally rich, and realistic residential floor plans, created to advance spatial AI research. Each plan includes precise annotations of architectural elements (walls, doors, windows, balconies) and functional spaces (such as kitchens, bedrooms, and bathrooms). ResPlan addresses key limitations of existing datasets such as RPLAN (Wu et al., 2019) and MSD (van Engelenburg et al., 2024) by offering enhanced visual fidelity and greater structural diversity, reflecting realistic and non-idealized residential layouts. Designed as a versatile, general-purpose resource, ResPlan supports a wide range of applications including robotics, reinforcement learning, generative AI, virtual and augmented reality, simulations, and game development. Plans are provided in both geometric and graph-based formats, enabling direct integration into simulation engines and fast 3D conversion. A key contribution is an open-source pipeline for geometry cleaning, alignment, and annotation refinement. Additionally, ResPlan includes structured representations of room connectivity, supporting graph-based spatial reasoning tasks. Finally, we present comparative analyses with existing benchmarks and outline several open benchmark tasks enabled by ResPlan. Ultimately, ResPlan offers a significant advance in scale, realism, and usability, providing a robust foundation for developing and benchmarking next-generation spatial intelligence systems.
24. ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery
Authors: Mohammad Izadi, Mehran Safayani β’
Published: 2025-08-19 β’
Source: arXiv
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition marked by disruptions in brain connectivity. Functional MRI (fMRI) offers a non-invasive window into large-scale neural dynamics by measuring blood-oxygen-level-dependent (BOLD) signals across the brain. These signals can be modeled as interactions among Regions of Interest (ROIs), which are grouped into functional communities based on their underlying roles in brain function. Emerging evidence suggests that connectivity patterns within and between these communities are particularly sensitive to ASD-related alterations. Effectively capturing these patterns and identifying interactions that deviate from typical development is essential for improving ASD diagnosis and enabling biomarker discovery. In this work, we introduce ASDFormer, a Transformer-based architecture that incorporates a Mixture of Pooling-Classifier Experts (MoE) to capture neural signatures associated with ASD. By integrating multiple specialized expert branches with attention mechanisms, ASDFormer adaptively emphasizes different brain regions and connectivity patterns relevant to autism. This enables both improved classification performance and more interpretable identification of disorder-related biomarkers. Applied to the ABIDE dataset, ASDFormer achieves state-of-the-art diagnostic accuracy and reveals robust insights into functional connectivity disruptions linked to ASD, highlighting its potential as a tool for biomarker discovery.
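The mixture-of-experts head described above follows a general pattern that can be sketched as a softmax gate weighting several expert classifiers over pooled ROI features; the shapes, linear experts, and mean-pooling choice below are simplifying assumptions, not ASDFormer's actual architecture:

```python
import numpy as np

def moe_head(roi_feats, gate_w, expert_ws):
    """roi_feats: (n_roi, d) ROI embeddings; gate_w: (d, n_experts);
    expert_ws: list of (d, n_classes) linear experts (toy stand-ins)."""
    pooled = roi_feats.mean(axis=0)                 # simple mean pooling
    logits_g = pooled @ gate_w
    gate = np.exp(logits_g - logits_g.max())
    gate /= gate.sum()                              # softmax over experts
    expert_out = np.stack([pooled @ w for w in expert_ws])  # (n_experts, c)
    return gate @ expert_out                        # gate-weighted class logits

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 16))              # 100 ROIs, 16-dim features
out = moe_head(feats,
               rng.standard_normal((16, 3)),
               [rng.standard_normal((16, 2)) for _ in range(3)])
```

In the interpretability direction the abstract describes, the gate weights themselves indicate which expert (and hence which pooling/connectivity pattern) dominates a given subject's prediction.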
25. Chiral effective potential in $4D$, $\mathcal{N}=1$ supersymmetric gauge theories
Authors: I. L. Buchbinder, R. M. Iakhibbaev, A. I. Mukhaeva, D. I. Kazakov, D. M. Tolkachev β’
Published: 2025-08-19 β’
Source: arXiv
We calculate the chiral effective superpotential in $4D$ $\mathcal{N}=1$, $SU(N)$ super Yang-Mills theory coupled to chiral matter in one- and two-loop approximations. It is found that the one-loop contribution to the chiral effective potential is always finite and is expressed in terms of a specific triangle integral. The two-loop contributions generated by purely chiral vertices turned out to be finite as well. The chiral effective potential stipulated by supergraphs with gauge superfield subgraphs is finite for the supergraphs with no divergent subgraphs. In the case of the finite $\mathcal{N}=2$ SYM theory, the two-loop chiral contributions to the effective action are significantly simplified. The leading large $N$ behavior of the chiral effective superpotential in finite $\mathcal{N}=2$ super-Yang-Mills models with $SU(N)$ gauge symmetry is studied and it is shown that the exact form in the coupling constant of the chiral effective superpotential can be found.
26. Electrochemical response of biological membranes to localized currents and external electric fields
Authors: Joshua B. Fernandes, Hyeongjoo Row, Kranthi K. Mandadapu, Karthik Shekhar β’
Published: 2025-08-19 β’
Source: arXiv
Electrochemical phenomena in biology often unfold in confined geometries where micrometer- to millimeter-scale domains coexist with nanometer-scale interfacial diffuse charge layers. We analyze a model lipid membrane-electrolyte system where an ion channel-like current flows across the membrane while parallel electrodes simultaneously apply a step voltage, emulating an extrinsic electric field. Matched asymptotic expansions of the Poisson-Nernst-Planck equations show that, under physiological conditions, the diffuse charge layers rapidly reach a quasi-steady state, and the bulk electrolyte remains electroneutral. As a result, all free charge is confined to the nanometer-scale screening layers at the membrane and electrode interfaces. The bulk electric potential satisfies Laplace's equation, and is dynamically coupled to the interfacial layers through time-dependent boundary conditions. This multiscale coupling partitions the space-time response into distinct regimes. At sufficiently long times, we show that the system can be represented by an equivalent circuit analogous to those used in classical cable theory. We derive closed-form expressions of the transmembrane potential within each regime, and verify them against nonlinear numerical simulations. Our results show how electrode-induced screening and confinement effects influence the electrochemical response over multiple length and time scales in biological systems.
27. Formal Algorithms for Model Efficiency
Authors: Naman Tyagi, Srishti Das, Kunal, Vatsal Gupta β’
Published: 2025-08-19 β’
Source: arXiv
We introduce the Knob-Meter-Rule (KMR) framework, a unified formalism for representing and reasoning about model efficiency techniques in deep learning. By abstracting diverse methods, including pruning, quantization, knowledge distillation, and parameter-efficient architectures, into a consistent set of controllable knobs, deterministic rules, and measurable meters, KMR provides a mathematically precise and modular perspective on efficiency optimization. The framework enables systematic composition of multiple techniques, flexible policy-driven application, and iterative budgeted optimization through the Budgeted-KMR algorithm. We demonstrate how well-known efficiency methods can be instantiated as KMR triples and present concise algorithmic templates for each. The framework highlights underlying relationships between methods, facilitates hybrid pipelines, and lays the foundation for future research in automated policy learning, dynamic adaptation, and theoretical analysis of cost-quality trade-offs. Overall, KMR offers both a conceptual and practical tool for unifying and advancing model efficiency research.
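A minimal sketch of a Knob-Meter-Rule triple driving a budgeted search loop, using illustrative names and a toy pruning rule rather than the paper's formal definitions:

```python
def budgeted_kmr(model, knob_values, rule, meter, budget):
    """Turn the knob through increasingly aggressive settings until the
    measured cost fits the budget; returns (model, knob) or (model, None)."""
    for k in knob_values:
        candidate = rule(model, k)     # deterministic rule: apply the knob
        if meter(candidate) <= budget: # meter: measurable cost of the result
            return candidate, k
    return model, None

# Toy instantiation: the "model" is just a parameter count, the rule prunes
# a fraction of parameters, and the meter reads the remaining count.
model = 1000
rule = lambda m, frac: int(m * (1 - frac))
meter = lambda m: m
compressed, knob = budgeted_kmr(model, [0.1, 0.3, 0.5, 0.7],
                                rule, meter, budget=600)
```

In a real pipeline the meter would measure latency, memory, or accuracy, and several (knob, rule, meter) triples, e.g. pruning plus quantization, would be composed in sequence.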
28. Multiwavelength Observations of the Apparently Non-repeating FRB 20250316A
Authors: Ye Li, Hui Sun, Lei Qian, Dong-Yue Li, Yan-Long Hua, Li-Ping Xin, Cheng-Kui Li, Yi-Han Wang, Jia-Rui Niu, Tian-Rui Sun, Zhu-Heng Yao, Jin-Jun Geng, Chi-Chuan Jin, Nanda Rea, Yuan Liu, Zhi-Chen Pan, Tao An, Vadim Burwitz, Zhi-Ming Cai, Jin-Huang Cao, Yong Chen, Hua-Qing Cheng, Wei-Wei Cui, Hua Feng, Peter Friedrich, Da-Wei Han, Jing-Wei Hu, Lei Hu, Yu-Xiang Huang, Shu-Mei Jia, Ji-An Jiang, Bin Li, Feng Li, Ming Liang, Yi-Fang Liang, Hao Liu, He-Yang Liu, Hua-Qiu Liu, Norbert Meidinger, Hai-Wu Pan, Arne Rau, Xin-Wen Shu, Chun Sun, Lian Tao, Jin-Long Tang, Zhen Wan, Hai-Ren Wang, Jian Wang, Jing Wang, Yun-Fei Xu, Yong-Quan Xue, Xuan Yang, Da-Zhi Yao, Yu-Han Yao, Wen Zhao, Xiao-Fan Zhao, Hong-Fei Zhang, Jia-Heng Zhang, Juan Zhang, Mo Zhang, Song-Bo Zhang, Wen-Da Zhang, Xiao-Ling Zhang, Yong-He Zhang, Yong-Kun Zhang, Xian-Zhong Zheng, Yu-Hao Zhu, Ying-Xi Zuo, Sheng-Li Sun, Jian-Yan Wei, Wei-Wei Zhu, Peng Jiang, Weimin Yuan, Xue-Feng Wu, Bing Zhang β’
Published: 2025-08-19 β’
Source: arXiv
The physical origin of fast radio bursts (FRBs) remains uncertain. Although multiwavelength observations offer critical diagnostics and have been widely conducted, only Galactic FRB~20200428D is associated with an X-ray burst from the magnetar SGR J1935+2154. Here, we present multiwavelength follow-up observations of the nearby bright FRB~20250316A, including the Five-hundred-meter Aperture Spherical radio Telescope (FAST), Einstein Probe (EP) X-ray mission, Chandra X-ray Observatory, Wide Field Survey Telescope (WFST) and Space Variable Object Monitor/Visible Telescope (SVOM/VT). A 13.08-hour FAST follow-up observational campaign suggests that this burst is likely a one-off event. A prompt EP follow-up and multi-epoch observational campaign totaling $>$ 100 ks led to the detection of an X-ray source within the angular resolution of its Follow-up X-ray Telescope (FXT, $10^{\prime\prime}$). A subsequent Chandra observation revealed this source to be offset by $7^{\prime\prime}$ from the FRB position, and established a 0.5-10 keV flux upper limit of $7.6\times 10^{-15}$ $\rm erg\,cm^{-2}\,s^{-1}$ at the FRB position, corresponding to $\sim 10^{39}$ $\rm erg\,s^{-1}$ at the 40 Mpc distance of the host galaxy NGC~4141. These results set one of the most stringent limits on X-ray emission from a non-repeating FRB, disfavoring ultra-luminous X-ray sources (ULXs) as counterparts of apparently one-off FRBs and offering critical insights into afterglow models. Our study suggests that an arcsecond localization of both the FRB and its potential X-ray counterpart is essential for exploring the X-ray counterpart of an FRB.
29. Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Authors: Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Jianye Hao β’
Published: 2025-08-19 β’
Source: arXiv
Generalization in embodied AI is hindered by the "seeing-to-doing gap," which stems from data scarcity and embodiment heterogeneity. To address this, we pioneer "pointing" as a unified, embodiment-agnostic intermediate representation, defining four core embodied pointing abilities that bridge high-level vision-language comprehension with low-level action primitives. We introduce Embodied-R1, a 3B Vision-Language Model (VLM) specifically designed for embodied reasoning and pointing. We use a wide range of embodied and general visual reasoning datasets as sources to construct a large-scale dataset, Embodied-Points-200K, which supports key embodied pointing capabilities. We then train Embodied-R1 using a two-stage Reinforced Fine-tuning (RFT) curriculum with a specialized multi-task reward design. Embodied-R1 achieves state-of-the-art performance on 11 embodied spatial and pointing benchmarks. Critically, it demonstrates robust zero-shot generalization by achieving a 56.2% success rate in SIMPLEREnv and 87.5% across 8 real-world XArm tasks without any task-specific fine-tuning, representing a 62% improvement over strong baselines. Furthermore, the model exhibits high robustness against diverse visual disturbances. Our work shows that a pointing-centric representation, combined with an RFT training paradigm, offers an effective and generalizable pathway to closing the perception-action gap in robotics.
30. EUV Late Phase Flares as Observed by EVE and AIA Onboard the Solar Dynamics Observatory
Authors: Sascha Ornig, Astrid M. Veronig, Karin Dissauer β’
Published: 2025-08-19 β’
Source: arXiv
Context. EUV late phase (ELP) flares exhibit a second peak in warm coronal emissions minutes to hours after the main peak of the flare. This phase is far from negligible, yet it is still poorly understood what role it plays across the solar cycle and what governs it. Aims. We present a statistical analysis of ELP flares over four years between May 2010 and May 2014 based on properties like eruptivity, magnetic configuration, and late-phase duration, delay, and strength, in order to understand what influences the likelihood of this class of flares and their behavior on a general scale. Methods. We primarily make use of data from the Solar Dynamics Observatory's (SDO) Extreme-ultraviolet Variability Experiment (EVE), as well as complementary spatial information provided by the Atmospheric Imaging Assembly (AIA), to assess relationships between the various parameters and to examine whether ELP flares differ from the general flare population. We quantify the criteria for the ELP flare definition and determine its characteristics. Results. Our analysis shows that about 10% of all flares with a GOES class greater than or equal to C3.0 experience an EUV late phase (179 out of 1803). This percentage decreases from solar minimum to solar maximum. C-class flares are considerably less likely to be identified as ELP flares than their more energetic counterparts, which is in line with previous investigations. The majority of these flares are confined (67%), more so than in the general flare population (greater than or equal to C5.0). There appears to be a (linear) relationship between the late-phase delay and its duration. The ratio of the emission peak of the late to the main flare phase lies between 0.3 and 5.9, and exceeds 1 in 71.5% of cases, which is considerably higher than previously reported.
31. Self-Supervised Sparse Sensor Fusion for Long Range Perception
Authors: Edoardo Palladin, Samuel Brucker, Filippo Ghilotti, Praveen Narayanan, Mario Bijelic, Felix Heide β’
Published: 2025-08-19 β’
Source: arXiv
Outside of urban hubs, autonomous cars and trucks have to master driving on intercity highways. Safe, long-distance highway travel at speeds exceeding 100 km/h demands perception distances of at least 250 m, which is about five times the 50-100 m typically addressed in city driving, to allow sufficient planning and braking margins. Increasing the perception range also makes it possible to extend autonomy from light two-ton passenger vehicles to large-scale forty-ton trucks, which need a longer planning horizon due to their high inertia. However, most existing perception approaches focus on shorter ranges and rely on Bird's Eye View (BEV) representations, which incur quadratic increases in memory and compute costs as distance grows. To overcome this limitation, we build on a sparse representation and introduce an efficient 3D encoding of multi-modal and temporal features, along with a novel self-supervised pre-training scheme that enables large-scale learning from unlabeled camera-LiDAR data. Our approach extends perception distances to 250 meters, achieving a 26.6% improvement in object detection mAP and a 30.5% decrease in Chamfer Distance in LiDAR forecasting compared to existing methods. Project Page: https://light.princeton.edu/lrs4fusion/
32. Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun β’
Published: 2025-08-19 β’
Source: arXiv
Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning. Recent studies have explored fine-tuning Large Language Models (LLMs) with synthetic data to enhance their long-context capabilities. However, the effectiveness of such approaches is often limited by the low diversity and factual inconsistencies in the generated data. To address these challenges, we propose LongMab-PO, a novel framework that leverages a Multi-Armed Bandit (MAB) rollout strategy to identify the most informative chunks from the given long context for sampling high-quality and diverse responses and constructing preference data pairs for Direct Preference Optimization (DPO) training. Specifically, we treat context chunks as arms of the MAB, select chunks based on their expected reward scores to input into LLMs to generate responses, and iteratively update these scores based on reward feedback. This exploration and exploitation process enables the model to focus on the most relevant context segments, thereby generating and collecting high-quality and diverse responses. Finally, we collect these generated responses from the rollout process and apply the DPO method to further optimize the LLM. Experimental results show that LongMab-PO significantly improves the diversity and quality of preference data pairs, achieving state-of-the-art performance on long-context reasoning benchmarks. All code and data will be released on https://github.com/NEUIR/LongMab-PO.
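The chunks-as-arms idea can be sketched with a standard UCB selection rule over context chunks; in the actual framework the reward feedback comes from scoring generated responses, which is stubbed out below with hypothetical per-chunk utilities:

```python
import math, random

def ucb_select(counts, rewards, t, c=1.0):
    """Pick the chunk (arm) with the highest UCB score; unseen chunks first."""
    best, best_score = None, -float("inf")
    for arm in counts:
        if counts[arm] == 0:
            return arm                         # explore every chunk once
        score = (rewards[arm] / counts[arm]    # empirical mean reward
                 + c * math.sqrt(math.log(t) / counts[arm]))  # exploration bonus
        if score > best_score:
            best, best_score = arm, score
    return best

random.seed(0)
chunks = ["c0", "c1", "c2"]
true_reward = {"c0": 0.2, "c1": 0.8, "c2": 0.4}  # hypothetical chunk utility
counts = {ch: 0 for ch in chunks}
rewards = {ch: 0.0 for ch in chunks}
for t in range(1, 201):
    arm = ucb_select(counts, rewards, t)
    counts[arm] += 1
    # Stand-in for "generate a response from this chunk and score it":
    rewards[arm] += true_reward[arm] + random.gauss(0, 0.05)
```

After enough rounds the sampler concentrates on the most informative chunk ("c1" here), which is the behavior the rollout strategy relies on to collect high-quality responses.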
33. MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
Authors: Sonal Kumar, Ε imon SedlΓ‘Δek, Vaibhavi Lokegaonkar, Fernando LΓ³pez, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen, Maxim PliΔka, Miroslav HlavΓ‘Δek, William Fineas Ellingwood, Sathvik Udupa, Siyuan Hou, Allison Ferner, Sara Barahona, Cecilia BolaΓ±os, Satish Rahi, Laura Herrera-AlarcΓ³n, Satvik Dixit, Siddhi Patil, Soham Deshmukh, Lasha Koroshinadze, Yao Liu, Leibny Paola Garcia Perera, Eleni Zanou, Themos Stafylakis, Joon Son Chung, David Harwath, Chao Zhang, Dinesh Manocha, Alicia Lozano-Diez, Santosh Kesiraju, Sreyan Ghosh, Ramani Duraiswami β’
Published: 2025-08-19 β’
Source: arXiv
Audio comprehension, including speech, non-speech sounds, and music, is essential for achieving human-level intelligence. Consequently, AI agents must demonstrate holistic audio understanding to qualify as generally intelligent. However, evaluating auditory intelligence comprehensively remains challenging. To address this gap, we introduce MMAU-Pro, the most comprehensive and rigorously curated benchmark for assessing audio intelligence in AI systems. MMAU-Pro contains 5,305 instances, where each instance has one or more audios paired with human expert-generated question-answer pairs, spanning speech, sound, music, and their combinations. Unlike existing benchmarks, MMAU-Pro evaluates auditory intelligence across 49 unique skills and multiple complex dimensions, including long-form audio comprehension, spatial audio reasoning, and multi-audio understanding, among others. All questions are meticulously designed to require deliberate multi-hop reasoning, including both multiple-choice and open-ended response formats. Importantly, audio data is sourced directly "from the wild" rather than from existing datasets with known distributions. We evaluate 22 leading open-source and proprietary multimodal AI models, revealing significant limitations: even state-of-the-art models such as Gemini 2.5 Flash and Audio Flamingo 3 achieve only 59.2% and 51.7% accuracy, respectively, approaching random performance in multiple categories. Our extensive analysis highlights specific shortcomings and provides novel insights, offering actionable perspectives for the community to enhance future AI systems' progression toward audio general intelligence. The benchmark and code are available at https://sonalkum.github.io/mmau-pro.
34. Uncertainty-Aware PCA for Arbitrarily Distributed Data Modeled by Gaussian Mixture Models
Authors: Daniel KlΓΆtzl, Ozan Tastekin, David HΓ€gele, Marina Evers, Daniel Weiskopf β’
Published: 2025-08-19 β’
Source: arXiv
Multidimensional data is often associated with uncertainties that are not well-described by normal distributions. In this work, we describe how such distributions can be projected to a low-dimensional space using uncertainty-aware principal component analysis (UAPCA). We propose to model multidimensional distributions using Gaussian mixture models (GMMs) and derive the projection from a general formulation that allows projecting arbitrary probability density functions. The low-dimensional projections of the densities exhibit more details about the distributions and represent them more faithfully compared to UAPCA mappings. Further, we support including user-defined weights between the different distributions, which allows for varying the importance of the multidimensional distributions. We evaluate our approach by comparing the distributions in low-dimensional space obtained by our method and UAPCA to those obtained by sample-based projections.
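One way to sketch the GMM-based projection, assuming (as a simplification) that the projection directions come from the mixture's total covariance via the law of total covariance; the paper's general density formulation and user-defined weighting are not reproduced here:

```python
import numpy as np

def gmm_pca_project(weights, means, covs, k=2):
    """Project each GMM component's mean and covariance onto the top-k
    eigenvectors of the mixture's total covariance (a PCA-style sketch)."""
    w = np.asarray(weights)
    mu = np.asarray(means, dtype=float)
    m = w @ mu                                        # mixture mean
    # Law of total covariance: E[component cov] + cov of component means.
    total = sum(wi * (C + np.outer(mi - m, mi - m))
                for wi, mi, C in zip(w, mu, covs))
    vals, vecs = np.linalg.eigh(total)                # ascending eigenvalues
    U = vecs[:, ::-1][:, :k]                          # top-k directions
    return [((mi - m) @ U, U.T @ C @ U) for mi, C in zip(mu, covs)]

# Toy mixture: two unit-covariance Gaussians separated along the first axis.
proj = gmm_pca_project([0.5, 0.5],
                       [[0, 0, 0], [4, 0, 0]],
                       [np.eye(3), np.eye(3)], k=2)
```

Because each component keeps its own projected mean and covariance, the low-dimensional view preserves the multimodal structure that a single-Gaussian (UAPCA-style) summary would blur together.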
35. A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler
Authors: Wenxuan Zhang, Shuai Li, Xinyi Wang, Yu Sun, Hongyu Kang, Pui Yuk Chryste Wan, Yong-Ping Zheng, Sai-Kit Lam β’
Published: 2025-08-19 β’
Source: arXiv
The Circle of Willis (CoW), vital for ensuring consistent blood flow to the brain, is closely linked to ischemic stroke. Accurate assessment of the CoW is important for identifying individuals at risk and guiding appropriate clinical management. Among existing imaging methods, Transcranial Color-coded Doppler (TCCD) offers unique advantages due to its radiation-free nature, affordability, and accessibility. However, reliable TCCD assessments depend heavily on operator expertise for identifying anatomical landmarks and performing accurate angle correction, which limits its widespread adoption. To address this challenge, we propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using TCCD. In this work, we introduce a novel Attention-Augmented Wavelet YOLO (AAW-YOLO) network tailored for TCCD data, designed to provide real-time guidance for brain vessel segmentation in the CoW. We prospectively collected TCCD data comprising 738 annotated frames and 3,419 labeled artery instances to establish a high-quality dataset for model training and evaluation. The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels, achieving an average Dice score of 0.901, IoU of 0.823, precision of 0.882, recall of 0.926, and mAP of 0.953, with a per-frame inference speed of 14.199 ms. This system offers a practical solution to reduce reliance on operator experience in TCCD-based cerebrovascular screening, with potential applications in routine clinical workflows and resource-constrained settings. Future research will explore bilateral modeling and larger-scale validation.
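For reference, the two overlap metrics reported above (Dice and IoU) can be computed on binary segmentation masks as follows; this is a generic illustration of the metrics, not the paper's evaluation code:

```python
def dice_iou(pred, gt):
    """Dice = 2|A∩B| / (|A|+|B|); IoU = |A∩B| / |A∪B| for binary masks,
    given as flat 0/1 sequences of equal length."""
    inter = sum(p and g for p, g in zip(pred, gt))
    ps, gs = sum(pred), sum(gt)
    dice = 2 * inter / (ps + gs) if ps + gs else 1.0
    iou = inter / (ps + gs - inter) if ps + gs - inter else 1.0
    return dice, iou

# Toy masks: 2 overlapping pixels out of 3 predicted and 3 ground-truth.
pred = [1, 1, 0, 1, 0, 0]
gt   = [1, 0, 0, 1, 1, 0]
d, i = dice_iou(pred, gt)   # Dice = 2/3, IoU = 0.5
```

Note that Dice is always at least as large as IoU for the same masks (Dice = 2·IoU / (1 + IoU)), which is why the reported Dice of 0.901 exceeds the IoU of 0.823.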