1. Scaling Group Inference for Diverse and High-Quality Generation
Authors: Gaurav Parmar, Or Patashnik, Daniil Ostashev, Kuan-Chieh Wang, Kfir Aberman, Srinivasa Narasimhan, Jun-Yan Zhu
Published: 2025-08-21
Source: arXiv
Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a set of multiple images (e.g., 4-8) for each prompt, where independent sampling tends to lead to redundant results, limiting user choices and hindering idea exploration. In this work, we introduce a scalable group inference method that improves both the diversity and quality of a group of samples. We formulate group inference as a quadratic integer assignment problem: candidate outputs are modeled as graph nodes, and a subset is selected to optimize sample quality (unary term) while maximizing group diversity (binary term). To substantially improve runtime efficiency, we progressively prune the candidate set using intermediate predictions, allowing our method to scale up to large candidate sets. Extensive experiments show that our method significantly improves group diversity and quality compared to independent sampling baselines and recent inference algorithms. Our framework generalizes across a wide range of tasks, including text-to-image, image-to-image, image prompting, and video generation, enabling generative models to treat multiple outputs as cohesive groups rather than independent samples.
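The unary-plus-pairwise objective described above can be illustrated with a small, self-contained sketch. Note this is not the authors' solver: the exact quadratic integer assignment formulation and the progressive pruning schedule are not specified in the abstract, so the code below uses a simple greedy heuristic over hypothetical per-sample quality scores and cosine-based diversity terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N candidate images, each with a scalar quality score
# (unary term) and a feature embedding used to measure pairwise diversity
# (binary term). All values here are synthetic.
N, K, D = 12, 4, 8                      # candidates, group size, embedding dim
quality = rng.random(N)
feats = rng.normal(size=(N, D))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
diversity = 1.0 - feats @ feats.T       # 1 - cosine similarity

def greedy_group_select(quality, diversity, k, lam=1.0):
    """Greedily build a group of k candidates, approximately maximizing
    the sum of quality plus lam times the sum of pairwise diversity."""
    selected = [int(np.argmax(quality))]        # seed with best single sample
    while len(selected) < k:
        best, best_gain = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            gain = quality[i] + lam * diversity[i, selected].sum()
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

group = greedy_group_select(quality, diversity, K)
```

A greedy heuristic like this trades optimality for speed; the paper's pruning of the candidate set using intermediate predictions addresses the same scaling concern at the level of the diffusion process itself.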
2. CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
Authors: Haonan Qiu, Ning Yu, Ziqi Huang, Paul Debevec, Ziwei Liu
Published: 2025-08-21
Source: arXiv
Visual diffusion models have achieved remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution data and constrained computation resources, hampering their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns. The key obstacle lies in the inevitable increase in high-frequency information when the model generates visual content exceeding its training resolution, leading to undesirable repetitive patterns arising from accumulated errors. In this work, we propose CineScale, a novel inference paradigm that enables higher-resolution visual generation. To tackle the distinct issues introduced by the two types of video generation architectures, we propose dedicated variants tailored to each. Unlike existing baseline methods that are confined to high-resolution T2I and T2V generation, CineScale broadens the scope by enabling high-resolution I2V and V2V synthesis, built atop state-of-the-art open-source video generation frameworks. Extensive experiments validate the superiority of our paradigm in extending the capabilities of higher-resolution visual generation for both image and video models. Remarkably, our approach enables 8k image generation without any fine-tuning, and achieves 4k video generation with only minimal LoRA fine-tuning. Generated video samples are available at our website: https://eyeline-labs.github.io/CineScale/.
3. Visual Autoregressive Modeling for Instruction-Guided Image Editing
Authors: Qingyang Mao, Qi Cai, Yehao Li, Yingwei Pan, Mingyue Cheng, Ting Yao, Qi Liu, Tao Mei
Published: 2025-08-21
Source: arXiv
Recent advances in diffusion models have brought remarkable visual fidelity to instruction-guided image editing. However, their global denoising process inherently entangles the edited region with the entire image context, leading to unintended spurious modifications and compromised adherence to editing instructions. In contrast, autoregressive models offer a distinct paradigm by formulating image synthesis as a sequential process over discrete visual tokens. Their causal and compositional mechanism naturally circumvents the adherence challenges of diffusion-based methods. In this paper, we present VAREdit, a visual autoregressive (VAR) framework that reframes image editing as a next-scale prediction problem. Conditioned on source image features and text instructions, VAREdit generates multi-scale target features to achieve precise edits. A core challenge in this paradigm is how to effectively condition on the source image tokens. We observe that finest-scale source features cannot effectively guide the prediction of coarser target features. To bridge this gap, we introduce a Scale-Aligned Reference (SAR) module, which injects scale-matched conditioning information into the first self-attention layer. VAREdit demonstrates significant advancements in both editing adherence and efficiency. On standard benchmarks, it outperforms leading diffusion-based methods by a 30%+ higher GPT-Balance score. Moreover, it completes a 512×512 edit in 1.2 seconds, 2.2× faster than the similarly sized UltraEdit. The models are available at https://github.com/HiDream-ai/VAREdit.
4. SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Authors: Yanxu Meng, Haoning Wu, Ya Zhang, Weidi Xie
Published: 2025-08-21
Source: arXiv
3D content generation has recently attracted significant research interest due to its applications in VR/AR and embodied AI. In this work, we address the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architectural design enables improved generation performance with multi-image inputs; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robust generation abilities of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.
5. ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Authors: Jinhyung Park, Javier Romero, Shunsuke Saito, Fabian Prada, Takaaki Shiratori, Yichen Xu, Federica Bogo, Shoou-I Yu, Kris Kitani, Rawal Khirodkar
Published: 2025-08-21
Source: arXiv
Parametric body models offer expressive 3D representation of humans across a wide range of poses, shapes, and facial expressions, typically derived by learning a basis over registered 3D meshes. However, existing human mesh modeling approaches struggle to capture detailed variations across diverse body poses and shapes, largely due to limited training data diversity and restrictive modeling assumptions. Moreover, the common paradigm first optimizes the external body surface using a linear basis, then regresses internal skeletal joints from surface vertices. This approach introduces problematic dependencies between internal skeleton and outer soft tissue, limiting direct control over body height and bone lengths. To address these issues, we present ATLAS, a high-fidelity body model learned from 600k high-resolution scans captured using 240 synchronized cameras. Unlike previous methods, we explicitly decouple the shape and skeleton bases by grounding our mesh representation in the human skeleton. This decoupling enables enhanced shape expressivity, fine-grained customization of body attributes, and keypoint fitting independent of external soft-tissue characteristics. ATLAS outperforms existing methods by fitting unseen subjects in diverse poses more accurately, and quantitative evaluations show that our non-linear pose correctives more effectively capture complex poses compared to linear models.
6. Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Authors: Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU
Published: 2025-08-21
Source: arXiv
Recent efforts have extended the capabilities of transformers in logical reasoning and symbolic computations. In this work, we investigate their capacity for non-linear latent pattern discovery in the context of functional decomposition, focusing on the challenging algebraic task of multivariate polynomial decomposition. This problem, with widespread applications in science and engineering, is provably NP-hard and demands both precision and insight. Our contributions are threefold: First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems. Fine-tuning with BGRPO improves accuracy while reducing beam width by up to half, resulting in approximately 75% lower inference compute. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
7. Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space
Authors: Kiarash Kazari, Ezzeldin Shereen, György Dán
Published: 2025-08-21
Source: arXiv
We address the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning with continuous action space. We propose a decentralized detector that relies solely on the local observations of the agents and makes use of a statistical characterization of the normal behavior of observable agents. The proposed detector utilizes deep neural networks to approximate the normal behavior of agents as parametric multivariate Gaussian distributions. Based on the predicted density functions, we define a normality score and provide a characterization of its mean and variance. This characterization allows us to employ a two-sided CUSUM procedure for detecting deviations of the normality score from its mean, serving as a detector of anomalous behavior in real-time. We evaluate our scheme on various multi-agent PettingZoo benchmarks against different state-of-the-art attack methods, and our results demonstrate the effectiveness of our method in detecting impactful adversarial attacks. Particularly, it outperforms the discrete counterpart by achieving AUC-ROC scores of over 0.95 against the most impactful attacks in all evaluated environments.
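The two-sided CUSUM step described above follows a standard construction: accumulate standardized deviations of the normality score from its mean in both directions and raise an alarm when either statistic crosses a threshold. The sketch below is a generic implementation under assumed parameters (the slack and threshold values are illustrative, and the Gaussian score stream is simulated rather than produced by the paper's neural density model).

```python
import numpy as np

def two_sided_cusum(scores, mean, std, slack=0.5, threshold=10.0):
    """Two-sided CUSUM on standardized scores. Returns the first index
    at which either one-sided statistic crosses the threshold, or -1
    if no change is detected. Parameter values here are illustrative."""
    g_pos = g_neg = 0.0
    for t, s in enumerate(scores):
        z = (s - mean) / std
        g_pos = max(0.0, g_pos + z - slack)   # accumulates upward drift
        g_neg = max(0.0, g_neg - z - slack)   # accumulates downward drift
        if g_pos > threshold or g_neg > threshold:
            return t
    return -1

rng = np.random.default_rng(1)
nominal = rng.normal(0.0, 1.0, 200)                             # normal behavior
attacked = np.concatenate([nominal, rng.normal(3.0, 1.0, 50)])  # mean shift at t=200

alarm_nominal = two_sided_cusum(nominal, 0.0, 1.0)
alarm_attacked = two_sided_cusum(attacked, 0.0, 1.0)
```

The slack term suppresses alarms under nominal fluctuations, while a sustained shift in the score's mean accumulates quickly past the threshold, which is what makes CUSUM suitable for real-time detection.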
8. Intern-S1: A Scientific Multimodal Foundation Model
Authors: Lei Bai, Zhongrui Cai, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqin Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, Caihua Fan, Ben Gao, Changjiang Gao, Jianfei Gao, Songyang Gao, Yang Gao, Zhangwei Gao, Jiaye Ge, Qiming Ge, Lixin Gu, Yuzhe Gu, Aijia Guo, Qipeng Guo, Xu Guo, Conghui He, Junjun He, Yili Hong, Siyuan Hou, Caiyu Hu, Hanglei Hu, Jucheng Hu, Ming Hu, Zhouqi Hua, Haian Huang, Junhao Huang, Xu Huang, Zixian Huang, Zhe Jiang, Lingkai Kong, Linyang Li, Peiji Li, Pengze Li, Shuaibin Li, Tianbin Li, Wei Li, Yuqiang Li, Dahua Lin, Junyao Lin, Tianyi Lin, Zhishan Lin, Hongwei Liu, Jiangning Liu, Jiyao Liu, Junnan Liu, Kai Liu, Kaiwen Liu, Kuikun Liu, Shichun Liu, Shudong Liu, Wei Liu, Xinyao Liu, Yuhong Liu, Zhan Liu, Yinquan Lu, Haijun Lv, Hongxia Lv, Huijie Lv, Qidang Lv, Ying Lv, Chengqi Lyu, Chenglong Ma, Jianpeng Ma, Ren Ma, Runmin Ma, Runyuan Ma, Xinzhu Ma, Yichuan Ma, Zihan Ma, Sixuan Mi, Junzhi Ning, Wenchang Ning, Xinle Pang, Jiahui Peng, Runyu Peng, Yu Qiao, Jiantao Qiu, Xiaoye Qu, Yuan Qu, Yuchen Ren, Fukai Shang, Wenqi Shao, Junhao Shen, Shuaike Shen, Chunfeng Song, Demin Song, Diping Song, Chenlin Su, Weijie Su, Weigao Sun, Yu Sun, Qian Tan, Cheng Tang, Huanze Tang, Kexian Tang, Shixiang Tang, Jian Tong, Aoran Wang, Bin Wang, Dong Wang, Lintao Wang, Rui Wang, Weiyun Wang, Wenhai Wang, Yi Wang, Ziyi Wang, Ling-I Wu, Wen Wu, Yue Wu, Zijian Wu, Linchen Xiao, Shuhao Xing, Chao Xu, Huihui Xu, Jun Xu, Ruiliang Xu, Wanghan Xu, GanLin Yang, Yuming Yang, Haochen Ye, Jin Ye, Shenglong Ye, Jia Yu, Jiashuo Yu, Jing Yu, Fei Yuan, Bo Zhang, Chao Zhang, Chen Zhang, Hongjie Zhang, Jin Zhang, Qiaosheng Zhang, Qiuyinzhe Zhang, Songyang Zhang, Taolin Zhang, Wenlong Zhang, Wenwei Zhang, Yechen Zhang, Ziyang Zhang, Haiteng Zhao, Qian Zhao, Xiangyu Zhao, Xiangyu Zhao, Bowen Zhou, 
Dongzhan Zhou, Peiheng Zhou, Yuhao Zhou, Yunhua Zhou, Dongsheng Zhu, Lin Zhu, Yicheng Zou
Published: 2025-08-21
Source: arXiv
In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance quite close to that of closed-source models. However, in high-value but more challenging scientific fields, research either still relies on expert models, or the progress of general foundation models lags significantly behind that in popular areas, far from sufficient for transforming scientific research and leaving a substantial gap between open-source and closed-source models in these scientific domains. To mitigate this gap and explore a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist equipped with general understanding and reasoning capabilities and the expertise to analyze data from multiple scientific modalities. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize RL training on more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training. On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models on professional tasks such as molecular synthesis planning, reaction condition prediction, and predicting the thermodynamic stability of crystals. Our models are available at https://huggingface.co/internlm/Intern-S1.
9. Bayesian Hierarchical Methods for Surveillance of Cervical Dystonia Treatments
Authors: D. Baidoo, E. Kubuafor, S. F. Osarfo, F. A. Agyei-Owusu, J. A. Frimpong, R. Amevor, A. Duah, F. Aboagye
Published: 2025-08-21
Source: arXiv
Cervical dystonia, a debilitating neurological disorder marked by involuntary muscle contractions and chronic pain, presents significant treatment challenges despite advances in botulinum toxin therapy. While botulinum toxin type B has emerged as one of the leading treatments, comparative efficacy across doses and the influence of demographic factors for personalized medicine remain understudied. This study aimed to: (1) compare the efficacy of different botulinum toxin type B doses using Bayesian methods, (2) evaluate demographic and clinical factors affecting treatment response, and (3) establish a probabilistic framework for personalized cervical dystonia management. We analyzed data from a multicenter randomized controlled trial involving 109 patients assigned to placebo, 5,000-unit, or 10,000-unit botulinum toxin type B groups. The primary outcome was the Toronto Western Spasmodic Torticollis Rating Scale measured over 16 weeks. Bayesian hierarchical modeling assessed treatment effects while accounting for patient heterogeneity. Lower botulinum toxin type B doses (5,000 units) showed greater overall Toronto Western Spasmodic Torticollis Rating Scale score reductions (treatment effect: -2.39, 95% Probability Interval: -4.10 to -0.70). Male patients demonstrated better responses (5.2% greater improvement) than female patients. Substantial between-patient variability and site-specific effects were observed, highlighting the need for personalized protocols. The study confirms botulinum toxin type B's dose-dependent efficacy while identifying key modifiable factors in treatment response. Bayesian methods provided nuanced insights into uncertainty and heterogeneity, paving the way for personalized medicine in cervical dystonia management.
10. Waver: Wave Your Way to Lifelike Video Generation
Authors: Yifu Zhang, Hao Yang, Yuqi Zhang, Yifei Hu, Fengda Zhu, Chuang Lin, Xiaofeng Mei, Yi Jiang, Zehuan Yuan, Bingyue Peng
Published: 2025-08-21
Source: arXiv
We present Waver, a high-performance foundation model for unified image and video generation. Waver can directly generate videos with durations ranging from 5 to 10 seconds at a native resolution of 720p, which are subsequently upscaled to 1080p. The model simultaneously supports text-to-video (T2V), image-to-video (I2V), and text-to-image (T2I) generation within a single, integrated framework. We introduce a Hybrid Stream DiT architecture to enhance modality alignment and accelerate training convergence. To ensure training data quality, we establish a comprehensive data curation pipeline and manually annotate and train an MLLM-based video quality model to filter for the highest-quality samples. Furthermore, we provide detailed training and inference recipes to facilitate the generation of high-quality videos. Building on these contributions, Waver excels at capturing complex motion, achieving superior motion amplitude and temporal consistency in video synthesis. Notably, it ranks among the Top 3 on both the T2V and I2V leaderboards at Artificial Analysis (data as of 2025-07-30 10:00 GMT+8), consistently outperforming existing open-source models and matching or surpassing state-of-the-art commercial solutions. We hope this technical report will help the community more efficiently train high-quality video generation models and accelerate progress in video generation technologies. Official page: https://github.com/FoundationVision/Waver.
11. Skyrmion Lattice Order Controlled by Confinement Geometry
Authors: Raphael Gruber, Jan Rothörl, Simon M. Fröhlich, Maarten A. Brems, Fabian Kammerbauer, Maria-Andromachi Syskaki, Elizabeth M. Jefremovas, Sachin Krishnia, Asle Sudbø, Peter Virnau, Mathias Kläui
Published: 2025-08-21
Source: arXiv
Magnetic skyrmions forming two-dimensional (2D) lattices provide a versatile platform for investigating phase transitions predicted by Kosterlitz-Thouless-Halperin-Nelson-Young (KTHNY) theory. While 2D melting in skyrmion systems has been demonstrated, achieving controlled ordering in skyrmion lattices remains challenging due to pinning effects from a non-uniform energy landscape, which often results in polycrystalline structures. Skyrmions in thin films, however, offer thermal diffusion with high tunability and can be directly imaged via Kerr microscopy, enabling real-time observation of their dynamics. To regulate lattice order in such flexible systems, we introduce geometric confinements of varying shapes. Combining Kerr microscopy experiments with Thiele model simulations, we demonstrate that confinement geometry critically influences lattice order. Specifically, hexagonal confinements commensurate with the skyrmion lattice stabilize monodomain hexagonal ordering, while incommensurate geometries induce domain formation and reduce overall order. Understanding these boundary-driven effects is essential for advancing the study of 2D phase behavior and for the design of skyrmion-based spintronic applications, ranging from memory devices to unconventional computing architectures.
12. Language-Guided Tuning: Enhancing Numeric Optimization with Textual Feedback
Authors: Yuxing Lu, Yucheng Hu, Nan Sun, Xukai Zhao
Published: 2025-08-21
Source: arXiv
Configuration optimization remains a critical bottleneck in machine learning, requiring coordinated tuning across model architecture, training strategy, feature engineering, and hyperparameters. Traditional approaches treat these dimensions independently and lack interpretability, while recent automated methods struggle with dynamic adaptability and semantic reasoning about optimization decisions. We introduce Language-Guided Tuning (LGT), a novel framework that employs multi-agent Large Language Models to intelligently optimize configurations through natural language reasoning. We apply textual gradients: qualitative feedback signals that complement numerical optimization by providing semantic understanding of training dynamics and configuration interdependencies. LGT coordinates three specialized agents: an Advisor that proposes configuration changes, an Evaluator that assesses progress, and an Optimizer that refines the decision-making process, creating a self-improving feedback loop. Through comprehensive evaluation on six diverse datasets, LGT demonstrates substantial improvements over traditional optimization methods, achieving performance gains while maintaining high interpretability.
13. Robust Data Interpretation for Perturbed Nulling Interferometers via Proper Handling of Correlated Errors
Authors: Philipp A. Huber, Felix A. Dannert, Romain Laugier, Taro Matsuo, Loes W. Rutten, Adrian M. Glauser, Sascha P. Quanz
Published: 2025-08-21
Source: arXiv
The detection and atmospheric characterization of potentially habitable, temperate terrestrial exoplanets using a space-based mid-infrared nulling interferometer is a major goal of contemporary astrophysics. A central part of the analysis of such an instrument is the treatment of correlated errors arising from perturbations in the system. While previous studies have often treated their effects in a limited manner, we aim to treat them comprehensively here and argue that data whitening based on the covariance of these errors is a suitable method to mitigate their impact. We present a framework that quantitatively connects instrumental perturbations to performance metrics and develop two computational tools to support our analysis: PHRINGE, for the generation of synthetic nulling data, and LIFEsimMC, a new Monte Carlo-based end-to-end simulator for the Large Interferometer For Exoplanets (LIFE). Applying our framework to a reference observation of an Earth twin orbiting a Sun twin at 10 pc, we find that whitening is not only essential for a correct interpretation of the detection metric used in hypothesis testing, but also improves the estimates of the planetary properties. Moreover, our approach enables an estimation of the spectral covariance of the extracted planetary spectra, providing valuable additional input for future atmospheric retrievals. We therefore recommend incorporating the framework into performance assessments and requirement derivations for future nulling interferometers.
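The whitening step the authors advocate is a standard linear-algebra operation: given the covariance of the correlated errors, transform the data so its covariance becomes the identity. A minimal sketch using a Cholesky factorization follows, with a synthetic covariance standing in for anything PHRINGE or LIFEsimMC would produce.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example: n spectral channels with correlated noise.
n = 6
A = rng.normal(size=(n, n))
cov = A @ A.T + n * np.eye(n)        # a valid (positive-definite) covariance

# Whitening: with cov = L L^T (Cholesky), x_w = L^{-1} x has identity covariance,
# so standard uncorrelated-noise statistics apply to the whitened data.
L = np.linalg.cholesky(cov)
samples = rng.multivariate_normal(np.zeros(n), cov, size=50_000)
whitened = np.linalg.solve(L, samples.T).T
emp_cov = np.cov(whitened, rowvar=False)   # should be close to the identity
```

Because the whitened errors are uncorrelated with unit variance, detection statistics computed on them follow their nominal distributions, which is why whitening matters for correctly interpreting the hypothesis-testing metric.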
14. Neural Robot Dynamics
Authors: Jie Xu, Eric Heiden, Iretiayo Akinola, Dieter Fox, Miles Macklin, Yashraj Narang
Published: 2025-08-21
Source: arXiv
Accurate and efficient simulation of modern robots remains challenging due to their high degrees of freedom and intricate mechanisms. Neural simulators have emerged as a promising alternative to traditional analytical simulators, capable of efficiently predicting complex dynamics and adapting to real-world data; however, existing neural simulators typically require application-specific training and fail to generalize to novel tasks and/or environments, primarily due to inadequate representations of the global state. In this work, we address the problem of learning generalizable neural simulators for robots that are structured as articulated rigid bodies. We propose NeRD (Neural Robot Dynamics), learned robot-specific dynamics models for predicting future states for articulated rigid bodies under contact constraints. NeRD uniquely replaces the low-level dynamics and contact solvers in an analytical simulator and employs a robot-centric and spatially-invariant simulation state representation. We integrate the learned NeRD models as an interchangeable backend solver within a state-of-the-art robotics simulator. We conduct extensive experiments to show that the NeRD simulators are stable and accurate over a thousand simulation steps; generalize across tasks and environment configurations; enable policy learning exclusively in a neural engine; and, unlike most classical simulators, can be fine-tuned from real-world data to bridge the gap between simulation and reality.
15. Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
Authors: Yufeng Zhao, Junnan Liu, Hongwei Liu, Dongsheng Zhu, Yuan Shen, Songyang Zhang, Kai Chen
Published: 2025-08-21
Source: arXiv
Large Language Models (LLMs) have made significant strides in reasoning tasks through methods like chain-of-thought (CoT) reasoning. However, they often fall short in tasks requiring precise computations. Tool-Integrated Reasoning (TIR) has emerged as a solution by incorporating external tools into the reasoning process. Nevertheless, the generalization of TIR in improving the reasoning ability of LLMs is still unclear. Additionally, whether TIR improves the model's reasoning behavior and helps the model think remains to be studied. We introduce ReasonZoo, a comprehensive benchmark encompassing nine diverse reasoning categories, to evaluate the effectiveness of TIR across various domains. Additionally, we propose two novel metrics, Performance-Aware Cost (PAC) and Area Under the Performance-Cost Curve (AUC-PCC), to assess reasoning efficiency. Our empirical evaluation demonstrates that TIR-enabled models consistently outperform their non-TIR counterparts in both mathematical and non-mathematical tasks. Furthermore, TIR enhances reasoning efficiency, as evidenced by improved PAC and AUC-PCC, indicating reduced overthinking and more streamlined reasoning. These findings underscore the domain-general benefits of TIR and its potential to advance LLM capabilities in complex reasoning tasks.
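The abstract does not define PAC or AUC-PCC precisely, so the following is only an assumed reading of a performance-cost curve area: trapezoidal integration of performance over inference cost normalized to [0, 1], where reaching high performance at low cost yields a larger area.

```python
import numpy as np

def auc_pcc(costs, performances):
    """Area under a performance-cost curve via trapezoidal integration
    over cost normalized to [0, 1]. Higher is better: strong performance
    reached at low cost yields a larger area. Illustrative definition
    only; the paper's exact formulation may differ."""
    order = np.argsort(costs)
    c = np.asarray(costs, float)[order]
    p = np.asarray(performances, float)[order]
    c = (c - c[0]) / (c[-1] - c[0])      # normalize cost to [0, 1]
    return float(np.sum((p[1:] + p[:-1]) / 2.0 * np.diff(c)))

# A model that reaches high accuracy cheaply vs. one that only gets
# there at the maximum token budget (costs in arbitrary units).
efficient = auc_pcc([1, 2, 3, 4], [0.6, 0.8, 0.85, 0.9])
wasteful = auc_pcc([1, 2, 3, 4], [0.1, 0.2, 0.4, 0.9])
```

Under this construction, the "efficient" profile scores higher even though both runs end at the same final accuracy, which captures the intended notion of reduced overthinking.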
16. Fine-grained Multi-class Nuclei Segmentation with Molecular-empowered All-in-SAM Model
Authors: Xueyuan Li, Can Cui, Ruining Deng, Yucheng Tang, Quan Liu, Tianyuan Yao, Shunxing Bao, Naweed Chowdhury, Haichun Yang, Yuankai Huo
Published: 2025-08-21
Source: arXiv
Purpose: Recent developments in computational pathology have been driven by advances in Vision Foundation Models, particularly the Segment Anything Model (SAM). This model facilitates nuclei segmentation through two primary methods: prompt-based zero-shot segmentation and the use of cell-specific SAM models for direct segmentation. These approaches enable effective segmentation across a range of nuclei and cells. However, general vision foundation models often face challenges with fine-grained semantic segmentation, such as identifying specific nuclei subtypes or particular cells. Approach: In this paper, we propose the molecular-empowered All-in-SAM Model to advance computational pathology by leveraging the capabilities of vision foundation models. This model adopts a full-stack approach comprising: (1) annotation: engaging lay annotators through molecular-empowered learning to reduce the need for detailed pixel-level annotations; (2) learning: adapting the SAM model to emphasize specific semantics, leveraging its strong generalizability via a SAM adapter; and (3) refinement: enhancing segmentation accuracy by integrating Molecular-Oriented Corrective Learning (MOCL). Results: Experimental results from both in-house and public datasets show that the All-in-SAM model significantly improves cell classification performance, even when faced with varying annotation quality. Conclusions: Our approach not only reduces the workload for annotators but also extends the accessibility of precise biomedical image analysis to resource-limited settings, thereby advancing medical diagnostics and automating pathology image analysis.
17. Active Learning for Neurosymbolic Program Synthesis
Authors: Celeste Barnaby, Qiaochu Chen, Ramya Ramalingam, Osbert Bastani, Isil Dillig
Published: 2025-08-21
Source: arXiv
The goal of active learning for program synthesis is to synthesize the desired program by asking targeted questions that minimize user interaction. While prior work has explored active learning in the purely symbolic setting, such techniques are inadequate for the increasingly popular paradigm of neurosymbolic program synthesis, where the synthesized program incorporates neural components. When applied to the neurosymbolic setting, such techniques can -- and, in practice, do -- return an unintended program due to mispredictions of neural components. This paper proposes a new active learning technique that can handle the unique challenges posed by neural network mispredictions. Our approach is based upon a new evaluation strategy called constrained conformal evaluation (CCE), which accounts for neural mispredictions while taking into account user-provided feedback. Our proposed method iteratively makes CCE more precise until all remaining programs are guaranteed to be observationally equivalent. We have implemented this method in a tool called SmartLabel and experimentally evaluated it on three neurosymbolic domains. Our results demonstrate that SmartLabel identifies the ground truth program for 98% of the benchmarks, requiring under 5 rounds of user interaction on average. In contrast, prior techniques for active learning are only able to converge to the ground truth program for at most 65% of the benchmarks.
18. Response and Prompt Evaluation to Prevent Parasocial Relationships with Chatbots
Authors: Emma Rath, Stuart Armstrong, Rebecca Gorman
Published: 2025-08-21
Source: arXiv
The development of parasocial relationships with AI agents can have severe, and in some cases tragic, effects on human well-being. Yet preventing such dynamics is challenging: parasocial cues often emerge gradually in private conversations, and not all forms of emotional engagement are inherently harmful. We address this challenge by introducing a simple response evaluation framework, created by repurposing a state-of-the-art language model, that evaluates ongoing conversations for parasocial cues in real time. To test the feasibility of this approach, we constructed a small synthetic dataset of thirty dialogues spanning parasocial, sycophantic, and neutral conversations. Iterative evaluation with five-stage testing successfully identified all parasocial conversations while avoiding false positives under a tolerant unanimity rule, with detection typically occurring within the first few exchanges. These findings provide preliminary evidence that evaluation agents can offer a viable solution for the prevention of parasocial relationships.
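The unanimity-rule aggregation can be sketched directly. Everything below is illustrative, since the abstract does not specify the evaluator, the staging, or exactly what "tolerant" qualifies; a toy keyword judge stands in for the repurposed language model.

```python
def unanimous_flag(stage_verdicts):
    """Flag a conversation only if every evaluation stage flags it.
    One plausible reading of the paper's unanimity rule; requiring
    agreement across stages suppresses false positives."""
    return all(stage_verdicts)

def evaluate_dialogue(turns, judge, n_stages=5):
    """Run the judge n_stages times over the transcript and aggregate
    with the unanimity rule. `judge` is a stand-in for a repurposed
    language-model classifier returning True on parasocial cues."""
    verdicts = [judge(turns) for _ in range(n_stages)]
    return unanimous_flag(verdicts)

# Toy deterministic keyword judge as a placeholder for the LLM evaluator.
def toy_judge(turns):
    cues = ("only friend", "love you", "can't live without")
    text = " ".join(turns).lower()
    return any(c in text for c in cues)

parasocial = evaluate_dialogue(["You're my only friend.", "I love you, bot."], toy_judge)
neutral = evaluate_dialogue(["What's the weather?", "It is sunny."], toy_judge)
```

With a stochastic LLM judge, unanimity trades sensitivity for precision: a single dissenting stage clears the conversation, which matches the paper's emphasis on avoiding false positives.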
19. End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Authors: Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
Published: 2025-08-21
Source: arXiv
Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor feedback-reasoning traceability. To address these challenges, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL) that enables steerable, traceable retrieval-augmented reasoning for medical diagnosis. In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources to support retrieval-aware reasoning across diagnostic scenarios. More crucially, we frame the LLM as the core agent and the retrieval corpus as its environment, using tailored rewards on format, retrieval, reasoning structure, and diagnostic accuracy, thereby evolving the agentic RAG policy from large-scale data through RL. Experiments demonstrate that our end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch achieves substantial gains in diagnostic accuracy, surpassing strong diagnostic baselines such as GPT-4o, DeepSeek-R1, and other medical-specific frameworks for both common and rare disease diagnosis under in-distribution and out-of-distribution settings. Moreover, ablation studies on reward design and retrieval corpus components confirm their critical roles, underscoring the uniqueness and effectiveness of our approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy, providing deeper insight into its performance gains and supporting clinicians in delivering more reliable and precise preliminary diagnoses. See https://github.com/MAGIC-AI4Med/Deep-DxSearch.
20. Energy conditions for regular black holes in EFT of gravity
Authors: Ziyue Zhu, Alexey S. Koshelev, Yang Liu, Anna Tokareva β’
Published: 2025-08-21 β’
Source: arXiv
As Einstein's gravity is a non-renormalizable theory, it can be a good description of physics only at scales of energy or spacetime curvature below the Planck mass. Moreover, the framework of effective field theory (EFT) requires the presence of an infinite tower of higher-derivative corrections. Black holes, known to be vacuum solutions in Einstein's gravity, necessarily have singularities at the center, where both Einstein's gravity and low-energy EFT expansions break down. In this work, we address the question of whether, in the presence of matter, regular solutions that look like black holes from outside do exist. We show that the matter distribution supporting the regular black hole solution in the presence of Riemann tensor cube and Riemann tensor to the fourth power EFT corrections satisfies positivity of energy (also called the weak energy condition, WEC) and the null energy condition (NEC) everywhere outside the horizon. Unlike the case of singular solutions, the EFT description is also valid in the interior of such an object, given that the maximal curvature is bounded and does not exceed the cut-off scale. We find that in a wide range of parameters, the WEC is satisfied inside the horizon, but the NEC is violated inside the horizon in all cases.
21. Colour Codes Reach Surface Code Performance using Vibe Decoding
Authors: Stergios Koutsioumpas, Tamas Noszko, Hasan Sayginel, Mark Webster, Joschka Roffe β’
Published: 2025-08-21 β’
Source: arXiv
Two-dimensional quantum colour codes hold significant promise for quantum error correction, offering advantages such as planar connectivity and low overhead logical gates. Despite their theoretical appeal, the practical deployment of these codes faces challenges due to complex decoding requirements compared to surface codes. This paper introduces vibe decoding which, for the first time, brings colour code performance on par with the surface code under practical decoding. Our approach leverages an ensemble of belief propagation decoders - each executing a distinct serial message passing schedule - combined with localised statistics post-processing. We refer to this combined protocol as VibeLSD. The VibeLSD decoder is highly versatile: our numerical results show it outperforms all practical existing colour code decoders across various syndrome extraction schemes, noise models, and error rates. By estimating qubit footprints through quantum memory simulations, we show that colour codes can operate with overhead that is comparable to, and in some cases lower than, that of the surface code. This, combined with the fact that localised statistics decoding is a parallel algorithm, makes VibeLSD suitable for implementation on specialised hardware for real-time decoding. Our results establish the colour code as a practical architecture for near-term quantum hardware, providing improved compilation efficiency for both Clifford and non-Clifford gates without incurring additional qubit overhead relative to the surface code.
22. Effective programming of a photonic processor with complex interferometric structure
Authors: Ilya V. Kondratyev, Kseniia N. Urusova, Artem S. Argenchiev, Nikita S. Klushnikov, Sergei S. Kuzmin, Nikolay N. Skryabin, Alexander D. Golikov, Vadim V. Kovalyuk, Gregory N. Goltsman, Ivan V. Dyakonov, Stanislav S. Straupe, Sergei P. Kulik β’
Published: 2025-08-21 β’
Source: arXiv
Reconfigurable photonics have rapidly become an invaluable tool for information processing. Light-based computing accelerators are promising for boosting neural network learning and inference, and optical interconnects are foreseen as a solution to the information transfer bottleneck in high-performance computing. In this study, we demonstrate the successful programming of a transformation implemented using a reconfigurable photonic circuit with a non-conventional architecture. The core of most photonic processors is an MZI-based architecture that establishes an analytical connection between the controllable parameters and the circuit transformation. However, several architectures that are substantially more difficult to program have improved robustness to fabrication defects. We use two algorithms that rely on different initial datasets to reconstruct the circuit model of a complex interferometer, and then program the required unitary transformation. Both methods performed accurate circuit programming with an average fidelity above 98\%. Our results provide a strong foundation for the introduction of non-conventional interferometric architectures for photonic information processing.
23. Probability Density from Latent Diffusion Models for Out-of-Distribution Detection
Authors: Joonas JΓ€rve, Karl Kaspar Haavel, Meelis Kull β’
Published: 2025-08-21 β’
Source: arXiv
Despite rapid advances in AI, safety remains the main bottleneck to deploying machine-learning systems. A critical safety component is out-of-distribution (OOD) detection: given an input, decide whether it comes from the same distribution as the training data. In generative models, the most natural OOD score is the data likelihood. Indeed, under the assumption of uniformly distributed OOD data, the likelihood is the optimal OOD detector, as we show in this work. However, earlier work reported that likelihood often fails in practice, raising doubts about its usefulness. We explore whether, in practice, the representation space also suffers from the inability to learn good density estimation for OOD detection, or whether it is merely a problem of the pixel space typically used in generative models. To test this, we trained a Variational Diffusion Model not on images, but on the representation space of a pre-trained ResNet-18, to assess the performance of our likelihood-based detector in comparison to state-of-the-art methods from the OpenOOD suite.
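The likelihood-as-OOD-score idea can be sketched with a much simpler density model than the paper's Variational Diffusion Model: fit any density to in-distribution features and flag inputs with low log-likelihood. A minimal sketch using a multivariate Gaussian as a stand-in estimator (random vectors stand in for ResNet-18 features):

```python
import numpy as np

def fit_gaussian(feats):
    """Fit a multivariate Gaussian density to in-distribution features."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return mu, inv, logdet

def log_likelihood(x, mu, inv, logdet):
    """Gaussian log-density; higher means more in-distribution."""
    d = x - mu
    k = mu.shape[0]
    return -0.5 * (d @ inv @ d + logdet + k * np.log(2 * np.pi))

rng = np.random.default_rng(0)
in_feats = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in for ResNet-18 features
mu, inv, logdet = fit_gaussian(in_feats)
in_score = log_likelihood(rng.normal(0.0, 1.0, 8), mu, inv, logdet)
ood_score = log_likelihood(rng.normal(6.0, 1.0, 8), mu, inv, logdet)  # shifted input
# the in-distribution point scores higher; thresholding the score gives a detector
```

The paper's point is that this recipe works far better in a learned representation space than on raw pixels, where likelihood-based detection is known to fail.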
24. Measuring the environmental impact of delivering AI at Google Scale
Authors: Cooper Elsworth, Keguo Huang, David Patterson, Ian Schneider, Robert Sedivy, Savannah Goodman, Ben Townsend, Parthasarathy Ranganathan, Jeff Dean, Amin Vahdat, Ben Gomes, James Manyika β’
Published: 2025-08-21 β’
Source: arXiv
The transformative power of AI is undeniable - but as user adoption accelerates, so does the need to understand and mitigate the environmental impact of AI serving. However, no studies have measured AI serving environmental metrics in a production environment. This paper addresses this gap by proposing and executing a comprehensive methodology for measuring the energy usage, carbon emissions, and water consumption of AI inference workloads in a large-scale, AI production environment. Our approach accounts for the full stack of AI serving infrastructure - including active AI accelerator power, host system energy, idle machine capacity, and data center energy overhead. Through detailed instrumentation of Google's AI infrastructure for serving the Gemini AI assistant, we find the median Gemini Apps text prompt consumes 0.24 Wh of energy - a figure substantially lower than many public estimates. We also show that Google's software efficiency efforts and clean energy procurement have driven a 33x reduction in energy consumption and a 44x reduction in carbon footprint for the median Gemini Apps text prompt over one year. We identify that the median Gemini Apps text prompt uses less energy than watching nine seconds of television (0.24 Wh) and consumes the equivalent of five drops of water (0.26 mL). While these impacts are low compared to other daily activities, reducing the environmental impact of AI serving warrants continued attention. Towards this objective, we propose that a comprehensive measurement of AI serving environmental metrics is critical for accurately comparing models and for properly incentivizing efficiency gains across the full AI serving stack.
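The full-stack accounting the abstract describes reduces to a simple sum: active accelerator energy plus host energy plus a prorated share of idle capacity, scaled by data-center overhead (PUE). A sketch with illustrative placeholder figures (not Google's published breakdown):

```python
def per_prompt_energy_wh(accel_wh, host_wh, idle_share_wh, pue):
    """Full-stack energy per prompt, in Wh.

    Mirrors the accounting described in the abstract: active AI accelerator
    energy, host system energy, and an allocated share of idle machine
    capacity, all multiplied by data-center power usage effectiveness (PUE).
    """
    return (accel_wh + host_wh + idle_share_wh) * pue

# Hypothetical component split that lands near the reported 0.24 Wh median
total = per_prompt_energy_wh(accel_wh=0.14, host_wh=0.05,
                             idle_share_wh=0.03, pue=1.09)
```

The point of measuring every term is that a methodology counting only accelerator power would understate the true footprint by the host, idle, and overhead terms.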
25. Numerical models outperform AI weather forecasts of record-breaking extremes
Authors: Zhongwei Zhang, Erich Fischer, Jakob Zscheischler, Sebastian Engelke β’
Published: 2025-08-21 β’
Source: arXiv
Artificial intelligence (AI)-based models are revolutionizing weather forecasting and have surpassed leading numerical weather prediction systems on various benchmark tasks. However, their ability to extrapolate and reliably forecast unprecedented extreme events remains unclear. Here, we show that for record-breaking weather extremes, the numerical model High RESolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts still consistently outperforms state-of-the-art AI models GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and Fuxi. We demonstrate that forecast errors in AI models are consistently larger for record-breaking heat, cold, and wind than in HRES across nearly all lead times. We further find that the examined AI models tend to underestimate both the frequency and intensity of record-breaking events, and they underpredict hot records and overestimate cold records with growing errors for larger record exceedance. Our findings underscore the current limitations of AI weather models in extrapolating beyond their training domain and in forecasting the potentially most impactful record-breaking weather events that are particularly frequent in a rapidly warming climate. Further rigorous verification and model development is needed before these models can be solely relied upon for high-stakes applications such as early warning systems and disaster management.
26. Orientation dependent anomalous Hall and spin Hall currents at the junctions of altermagnets with $p$-wave magnets
Authors: Sachchidanand Das, Abhiram Soori β’
Published: 2025-08-21 β’
Source: arXiv
We study charge and spin transport across a junction between an altermagnet (AM) and a $p$-wave magnet (PM) using a continuum model with boundary conditions tailored to the spin-split band structures of the two materials. Remarkably, although neither AM nor PM is spin-polarized, we find that the junction supports finite spin currents both longitudinally and transversely. We compute the longitudinal and transverse charge and spin conductivities as functions of the crystallographic orientations and the relative angle between the N\'eel vectors of AM and PM. Our results reveal that transverse charge and spin conductivities can be finite even when the longitudinal charge conductivity vanishes. For suitable parameter choices and orientation angles, the transverse conductivities are more prominent than the longitudinal ones. The origin of these effects lies in the matching and mismatching of transverse momentum modes ($k_y$) across the junction combined with the spin-dependent band splitting in AM and PM. Furthermore, while the transverse charge conductivity may cancel for certain orientations, the transverse spin conductivity remains finite due to unequal contributions of opposite $k_y$ channels. These findings highlight altermagnet/$p$-wave magnet junctions as a promising platform for tunable generation and control of transverse charge and spin currents driven purely by crystallographic orientation and spin structure.
27. WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Authors: Zhiheng Liu, Xueqing Deng, Shoufa Chen, Angtian Wang, Qiushan Guo, Mingfei Han, Zeyue Xue, Mengzhao Chen, Ping Luo, Linjie Yang β’
Published: 2025-08-21 β’
Source: arXiv
Generative video modeling has made significant strides, yet ensuring structural and temporal consistency over long sequences remains a challenge. Current methods predominantly rely on RGB signals, leading to accumulated errors in object structure and motion over extended durations. To address these issues, we introduce WorldWeaver, a robust framework for long video generation that jointly models RGB frames and perceptual conditions within a unified long-horizon modeling scheme. Our training framework offers three key advantages. First, by jointly predicting perceptual conditions and color information from a unified representation, it significantly enhances temporal consistency and motion dynamics. Second, by leveraging depth cues, which we observe to be more resistant to drift than RGB, we construct a memory bank that preserves clearer contextual information, improving quality in long-horizon video generation. Third, we employ segmented noise scheduling for training prediction groups, which further mitigates drift and reduces computational cost. Extensive experiments on both diffusion- and rectified flow-based models demonstrate the effectiveness of WorldWeaver in reducing temporal drift and improving the fidelity of generated videos.
28. Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI
Authors: Mohammed Elmusrati β’
Published: 2025-08-21 β’
Source: arXiv
Extracting meaning from uncertain, noisy data is a fundamental problem across time series analysis, pattern recognition, and language modeling. This survey presents a unified mathematical framework that connects classical estimation theory, statistical inference, and modern machine learning, including deep learning and large language models. By analyzing how techniques such as maximum likelihood estimation, Bayesian inference, and attention mechanisms address uncertainty, the paper illustrates that many AI methods are rooted in shared probabilistic principles. Through illustrative scenarios including system identification, image classification, and language generation, we show how increasingly complex models build upon these foundations to tackle practical challenges like overfitting, data sparsity, and interpretability. In other words, the work demonstrates that maximum likelihood, MAP estimation, Bayesian classification, and deep learning all represent different facets of a shared goal: inferring hidden causes from noisy and/or biased observations. It serves as both a theoretical synthesis and a practical guide for students and researchers navigating the evolving landscape of machine learning.
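The survey's central claim, that MLE, MAP, and Bayesian methods are facets of one probabilistic goal, is easy to make concrete: for a Gaussian mean with a Gaussian prior, MAP is just MLE shrunk toward the prior, i.e. regularization. A minimal worked sketch (all parameter values are illustrative):

```python
import numpy as np

# Estimating a Gaussian mean from noisy data: MLE vs. MAP.
# Model: x_i ~ N(theta, sigma^2) with known sigma; prior theta ~ N(0, tau^2).
rng = np.random.default_rng(1)
sigma, tau, theta_true = 1.0, 0.5, 2.0
x = rng.normal(theta_true, sigma, size=20)

mle = x.mean()  # maximum likelihood estimate: the sample mean

# MAP with a conjugate Gaussian prior: a precision-weighted average of the
# prior mean (0) and the sample mean.
n = len(x)
map_est = (n / sigma**2) / (n / sigma**2 + 1 / tau**2) * mle

# The prior pulls the MAP estimate toward 0 -- MAP = MLE + regularization.
assert abs(map_est) < abs(mle)
```

As n grows, the data precision n/sigma^2 dominates the prior precision 1/tau^2 and MAP converges to MLE, which is exactly the "shared probabilistic principles" story the tutorial tells.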
29. StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
Authors: Yanlai Yang, Zhuokai Zhao, Satya Narayan Shukla, Aashu Singh, Shlok Kumar Mishra, Lizhu Zhang, Mengye Ren β’
Published: 2025-08-21 β’
Source: arXiv
Multimodal large language models (MLLMs) have made significant progress in visual-language reasoning, but their ability to efficiently handle long videos remains limited. Despite recent advances in long-context MLLMs, storing and attending to the key-value (KV) cache for long visual contexts incurs substantial memory and computational overhead. Existing visual compression methods require either encoding the entire visual context before compression or having access to the questions in advance, which is impractical for long video understanding and multi-turn conversational settings. In this work, we propose StreamMem, a query-agnostic KV cache memory mechanism for streaming video understanding. Specifically, StreamMem encodes new video frames in a streaming manner, compressing the KV cache using attention scores between visual tokens and generic query tokens, while maintaining a fixed-size KV memory to enable efficient question answering (QA) in memory-constrained, long-video scenarios. Evaluation on three long video understanding and two streaming video question answering benchmarks shows that StreamMem achieves state-of-the-art performance in query-agnostic KV cache compression and is competitive with query-aware compression approaches.
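The core mechanism, scoring cached visual tokens with attention from generic (question-independent) queries and keeping only a fixed budget, can be sketched in a few lines. This is a simplified stand-in, not StreamMem's implementation: a single query vector replaces the model's generic query tokens, and one attention head stands in for the full MLLM.

```python
import numpy as np

def compress_kv(keys, values, generic_query, budget):
    """Keep the `budget` KV pairs with the highest attention from a generic query.

    Query-agnostic compression: because the scoring query does not depend on
    the eventual question, the cache can be pruned while streaming frames,
    before any question is known.
    """
    scores = keys @ generic_query / np.sqrt(keys.shape[1])  # scaled dot-product
    keep = np.sort(np.argsort(scores)[-budget:])            # top-k, temporal order
    return keys[keep], values[keep]

rng = np.random.default_rng(0)
keys = rng.normal(size=(64, 16))    # 64 cached visual tokens, head dim 16
values = rng.normal(size=(64, 16))
q = rng.normal(size=16)             # generic, question-independent query
k2, v2 = compress_kv(keys, values, q, budget=8)
# the cache shrinks to the fixed budget regardless of the incoming question
```

Repeating this after every encoded frame keeps the KV memory at a constant size, which is what makes memory-constrained long-video QA feasible.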
30. Foundation Models for Cross-Domain EEG Analysis Application: A Survey
Authors: Hongqi Li, Yitong Chen, Yujuan Wang, Weihang Ni, Haodong Zhang β’
Published: 2025-08-21 β’
Source: arXiv
Electroencephalography (EEG) analysis stands at the forefront of neuroscience and artificial intelligence research, where foundation models are reshaping the traditional EEG analysis paradigm by leveraging their powerful representational capacity and cross-modal generalization. However, the rapid proliferation of these techniques has led to a fragmented research landscape, characterized by diverse model roles, inconsistent architectures, and a lack of systematic categorization. To bridge this gap, this study presents the first comprehensive modality-oriented taxonomy for foundation models in EEG analysis, systematically organizing research advances by output modality: native EEG decoding, EEG-text, EEG-vision, EEG-audio, and broader multimodal frameworks. We rigorously analyze each category's research ideas, theoretical foundations, and architectural innovations, while highlighting open challenges such as model interpretability, cross-domain generalization, and real-world applicability in EEG-based systems. By unifying this dispersed field, our work not only provides a reference framework for future methodology development but also accelerates the translation of EEG foundation models into scalable, interpretable, and online actionable solutions.
31. Stemming -- The Evolution and Current State with a Focus on Bangla
Authors: Abhijit Paul, Mashiat Amin Farin, Sharif Md. Abdullah, Ahmedul Kabir, Zarif Masud, Shebuti Rayana β’
Published: 2025-08-21 β’
Source: arXiv
Bangla, the seventh most widely spoken language worldwide with 300 million native speakers, faces digital under-representation due to limited resources and lack of annotated datasets. Stemming, a critical preprocessing step in language analysis, is essential for low-resource, highly-inflectional languages like Bangla, because it can reduce the complexity of algorithms and models by significantly reducing the number of words the algorithm needs to consider. This paper conducts a comprehensive survey of stemming approaches, emphasizing the importance of handling morphological variants effectively. While exploring the landscape of Bangla stemming, it becomes evident that there is a significant gap in the existing literature. The paper highlights the discontinuity from previous research and the scarcity of accessible implementations for replication. Furthermore, it critiques the evaluation methodologies, stressing the need for more relevant metrics. In the context of Bangla's rich morphology and diverse dialects, the paper acknowledges the challenges it poses. To address these challenges, the paper suggests directions for Bangla stemmer development. It concludes by advocating for robust Bangla stemmers and continued research in the field to enhance language analysis and processing.
32. Position Bias Mitigates Position Bias: Mitigate Position Bias Through Inter-Position Knowledge Distillation
Authors: Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng β’
Published: 2025-08-21 β’
Source: arXiv
Positional bias (PB), manifesting as non-uniform sensitivity across different contextual locations, significantly impairs long-context comprehension and processing capabilities. While prior work seeks to mitigate PB through modifying the architectures causing its emergence, significant PB still persists. To address PB effectively, we introduce Pos2Distill, a position-to-position knowledge distillation framework. Pos2Distill transfers the superior capabilities from advantageous positions to less favorable ones, thereby reducing the huge performance gaps. The conceptual principle is to leverage the inherent, position-induced disparity to counteract the PB itself. We identify distinct manifestations of PB under retrieval and reasoning paradigms, thereby designing two specialized instantiations: Pos2Distill-R$^{1}$ and Pos2Distill-R$^{2}$ respectively, both grounded in this core principle. By employing the Pos2Distill approach, we achieve enhanced uniformity and significant performance gains across all contextual positions in long-context retrieval and reasoning tasks. Crucially, both specialized systems exhibit strong mutual cross-task generalization, while achieving superior performance on their respective tasks.
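The position-to-position transfer can be sketched as a standard distillation loss: the model's output when the relevant evidence sits at a favorable position acts as the teacher for the same query with evidence at an unfavorable position. A minimal sketch of that KL objective (logit values are illustrative, not from the paper):

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def inter_position_kl(teacher_logits, student_logits):
    """KL(teacher || student): the distillation target for a weak position.

    teacher_logits: prediction with evidence at an advantageous position.
    student_logits: prediction with evidence at a disadvantaged position.
    Minimizing this pulls the weak position's behavior toward the strong one.
    """
    lp_t = log_softmax(teacher_logits)
    lp_s = log_softmax(student_logits)
    return float((np.exp(lp_t) * (lp_t - lp_s)).sum())

t = np.array([4.0, 1.0, 0.5])   # confident prediction from a good position
s = np.array([1.2, 1.0, 0.9])   # flatter prediction from a bad position
loss = inter_position_kl(t, s)  # positive; zero when the student matches
```

The appeal of the scheme is that the teacher signal is free: it comes from the same model, just with the context rearranged, so no external supervision is needed.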
33. Existence of hyperbolic blow-up to the generalized quasi-geostrophic equation
Authors: Lucas C. F. Ferreira, Ricardo M. M. GuimarΓ£es β’
Published: 2025-08-21 β’
Source: arXiv
In this work, we investigate the blow-up of solutions to the generalized surface quasi-geostrophic (gSQG) equation in $\mathbb{R}^{2}$, within the more singular range $\beta\in(1,2)$ for the coupling of the velocity field. This behavior is studied under a hyperbolic setting based on the framework originally introduced by C\'{o}rdoba (1998, Annals of Math. 148, 1135--52) for the classical SQG equation. Assuming that the level sets of the solution contain a hyperbolic saddle, and under suitable conditions on the solution at the origin, we obtain the existence of a time $T^{\ast}\in\mathbb{R}^{+}\cup\{\infty\}$ at which the opening angle of the saddle collapses. Moreover, we derive a lower bound for the blow-up time $T^\ast$. This geometric degeneration leads to the blow-up of the H\"{o}lder norm $\Vert\theta(t)\Vert_{C^{\sigma}}$ as $t\rightarrow T^{\ast}$, for $\sigma\in(0, \beta -1)$, showing the formation of a singularity in the H\"{o}lder space at time $T^{\ast}$. To the best of our knowledge, these are the first results in the literature to rigorously prove the formation of a singularity, whether in finite or infinite time, for a class of smooth solutions to the gSQG equation.
34. Classification of Magnetism and Altermagnetism in Quasicrystals
Authors: Zhi-Yan Shao, Chen Lu, Zhiming Pan, Yu-Bo Liu, Fan Yang β’
Published: 2025-08-21 β’
Source: arXiv
Altermagnetism, an unconventional magnetic phase characterized by zero net magnetism and a spin-split electronic band, has been studied exclusively in conventional crystalline materials. In this work, we extend the theoretical framework of altermagnetism to quasicrystals (QCs), which lack translational symmetry. We classify magnetic phases in 2D QCs with $n$-fold rotational symmetry without spin-orbit coupling, by using the irreducible representations (IRRPs) of the $D_n$ point group. Based on symmetry analysis, we propose the conjecture that magnetic phases corresponding to 1D non-identity IRRPs are generally altermagnetic, with the exception of those possessing parity-time symmetry. To verify our conjecture, we take the Hubbard model as an example and develop a systematic approach to determine the magnetic pattern in the QC, which effectively avoids getting trapped at local energy minima. Consequently, our tests for the Hubbard model on various QCs with different symmetries are, without exception, consistent with our proposal. Our work highlights the QC as a platform where altermagnetism is common among magnetic phases.
35. Detection of non-absolute separability in quantum states and channels through moments
Authors: Bivas Mallick, Saheli Mukherjee, Nirman Ganguly, A. S. Majumdar β’
Published: 2025-08-21 β’
Source: arXiv
In quantum information and computation, generation of entanglement through unitary gates remains a significant and active area of research. However, there are states termed as absolutely separable, from which entanglement cannot be created through any non-local unitary action. Thus, from a resource-theoretic perspective, non-absolutely separable states are useful as they can be turned into entangled states using some appropriate unitary gates. In this work, we propose an efficient method to detect non-absolutely separable states. Our approach relies on evaluating moments that can bypass the need for full state tomography, thereby enhancing its practical applicability. We then present several examples in support of our detection scheme. We also address a closely related problem concerning states whose partial transpose remains positive under any arbitrary non-local unitary action. Furthermore, we examine the effectiveness of our moment-based approach in the detection of quantum channels that are not absolutely separating, which entails the detection of resource preserving channels. Finally, we demonstrate the operational significance of non-absolutely separable states by proving that every such state can provide an advantage in a quantum-channel discrimination task.