1. Dynamic Decision Modeling for Viable Short and Long Term Production Policies: An HJB Approach
Authors: Achraf Bouhmady, Mustapha Serhani, Nadia Raissi •
Published: 2025-09-15 •
Source: arXiv
This study introduces a mathematical framework to investigate the viability and reachability of production systems under constraints. We develop a model that incorporates key decision variables, such as pricing policy, quality investment, and advertising, to analyze short-term tactical decisions and long-term strategic outcomes. In the short term, we construct a capture basin that defines the initial conditions under which production viability constraints are satisfied within the target zone. In the long term, we explore the dynamics of product quality and market demand to reach and sustain the desired target. Hamilton-Jacobi-Bellman (HJB) theory characterizes the capture basin and viability kernel using viscosity solutions of the HJB equation. This approach, which avoids controllability assumptions, is well suited to viability problems with specified targets. It provides managers with insights into maintaining production and inventory levels within viable ranges while considering product quality and evolving market demand. We study the HJB equation numerically to design and test computational methods that validate the theoretical insights. The simulations offer practical tools for decision-makers to address operational challenges while aligning with long-term sustainability goals. By linking rigorous mathematics with actionable solutions, this study enhances production system performance and resilience.
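For readers unfamiliar with the level-set characterization mentioned above, the following is a minimal sketch of one standard formulation of the capture basin via an obstacle-type HJB equation, stated for generic dynamics and target; the authors' exact model, constraints, and notation may differ.

```latex
% Minimal sketch (generic form, not necessarily the paper's exact setup):
% controlled dynamics \dot{x} = f(x,u), target C = { x : g(x) <= 0 }.
\begin{align*}
  \vartheta(t,x) &= \inf_{u(\cdot)} \; \min_{s \in [t,T]} g\bigl(x(s)\bigr),
  \qquad x(t) = x, \\
  0 &= \min\Bigl\{ \partial_t \vartheta
        + \min_{u \in U} \nabla_x \vartheta \cdot f(x,u),\;
        g(x) - \vartheta \Bigr\}
  \quad \text{(in the viscosity sense)}, \\
  \vartheta(T,x) &= g(x), \qquad
  \mathrm{Capt}_{[t,T]}(C) = \{\, x : \vartheta(t,x) \le 0 \,\}.
\end{align*}
```

State constraints add a further obstacle term to the equation, and the viability kernel is obtained from an analogous value function with the roles of constraint and target adjusted.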
2. LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Authors: Zixin Yin, Xili Dai, Duomin Wang, Xianfang Zeng, Lionel M. Ni, Gang Yu, Heung-Yeung Shum •
Published: 2025-09-15 •
Source: arXiv
The reliance on implicit point matching via attention has become a core bottleneck in drag-based editing, forcing a fundamental compromise: weakened inversion strength and costly test-time optimization (TTO). This compromise severely limits the generative capabilities of diffusion models, suppressing high-fidelity inpainting and text-guided creation. In this paper, we introduce LazyDrag, the first drag-based image editing method for Multi-Modal Diffusion Transformers, which directly eliminates the reliance on implicit point matching. In concrete terms, our method generates an explicit correspondence map from user drag inputs as a reliable reference to boost attention control. This reliable reference opens the potential for a stable full-strength inversion process, a first for the drag-based editing task. It obviates the necessity for TTO and unlocks the generative capability of models. Therefore, LazyDrag naturally unifies precise geometric control with text guidance, enabling complex edits that were previously out of reach: opening the mouth of a dog and inpainting its interior, generating new objects like a ``tennis ball'', or, for ambiguous drags, making context-aware changes like moving a hand into a pocket. Additionally, LazyDrag supports multi-round workflows with simultaneous move and scale operations. Evaluated on DragBench, our method outperforms baselines in drag accuracy and perceptual quality, as validated by VIEScore and human evaluation. LazyDrag not only establishes new state-of-the-art performance but also opens a new path for editing paradigms.
3. Deriving accurate galaxy cluster masses using X-ray thermodynamic profiles and graph neural networks
Authors: Asif Iqbal, Subhabrata Majumdar, Elena Rasia, Gabriel W. Pratt, Daniel de Andres, Jean-Baptiste Melin, Weiguang Cui •
Published: 2025-09-15 •
Source: arXiv
Precise determination of galaxy cluster masses is crucial for establishing reliable mass-observable scaling relations in cluster cosmology. We employ graph neural networks (GNNs) to estimate galaxy cluster masses from radially sampled profiles of the intra-cluster medium (ICM) inferred from X-ray observations. GNNs naturally handle inputs of variable length and resolution by representing each ICM profile as a graph, enabling accurate and flexible modeling across diverse observational conditions. We trained and tested the GNN model using state-of-the-art hydrodynamical simulations of galaxy clusters from The Three Hundred Project. The mass estimates obtained with our method exhibit no systematic bias relative to the true cluster masses in the simulations. Additionally, we achieve a scatter in recovered versus true mass of about 6\%, a factor of six smaller than obtained from a standard hydrostatic equilibrium approach. Our algorithm is robust to both data quality and cluster morphology, and it is capable of incorporating model uncertainties alongside observational uncertainties. Finally, we apply our technique to XMM-Newton observed galaxy cluster samples and compare the GNN-derived mass estimates with those obtained from $Y_{\rm SZ}$-M$_{500}$ scaling relations. Our results provide strong evidence, at the 5$\sigma$ level, for a mass-dependent bias in SZ-derived masses, with higher-mass clusters exhibiting a greater degree of deviation. Furthermore, we find the median bias to be $(1-b)=0.85_{-14}^{+34}$, albeit with significant dispersion due to its mass dependence. This work takes a significant step towards establishing unbiased observable-mass scaling relations by integrating X-ray, SZ and optical datasets using deep learning techniques, thereby enhancing the role of galaxy clusters in precision cosmology.
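As a concrete illustration of representing a radial profile as a graph, below is a minimal sketch in PyTorch Geometric: each radial bin becomes a node connected to its neighbours, and a small GCN regresses the (log) mass. The node features, architecture, and training details here are assumptions for illustration, not the paper's actual configuration.

```python
# Hedged sketch: one way to cast a radially sampled ICM profile as a graph
# and regress log-mass with a small GNN (PyTorch Geometric).
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

def profile_to_graph(radii, density, temperature, log_mass=None):
    """Each radial bin becomes a node; neighbouring bins are connected,
    so profiles of any length or resolution map to valid graphs."""
    x = torch.tensor(
        [[r, d, t] for r, d, t in zip(radii, density, temperature)],
        dtype=torch.float,
    )
    n = x.size(0)
    src = list(range(n - 1)) + list(range(1, n))
    dst = list(range(1, n)) + list(range(n - 1))
    edge_index = torch.tensor([src, dst], dtype=torch.long)
    y = None if log_mass is None else torch.tensor([log_mass], dtype=torch.float)
    return Data(x=x, edge_index=edge_index, y=y)

class MassGNN(torch.nn.Module):
    def __init__(self, in_dim=3, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)   # predicts e.g. log10 M500

    def forward(self, data):  # use a PyG DataLoader so data.batch is populated
        h = torch.relu(self.conv1(data.x, data.edge_index))
        h = torch.relu(self.conv2(h, data.edge_index))
        h = global_mean_pool(h, data.batch)       # one vector per cluster
        return self.head(h).squeeze(-1)
```

Because the graph is built per profile, clusters with different numbers of radial bins batch together without padding, which is the flexibility the abstract highlights.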
4. Spin-polarization and diode effect in thermoelectric current through altermagnet-based superconductor heterostructures
Authors: Debika Debnath, Arijit Saha, Paramita Dutta •
Published: 2025-09-15 •
Source: arXiv
The recent advent of a new class of magnetic material named altermagnet (AM), characterized by a combination of momentum-dependent spin-splitting and zero net magnetization, has opened up promising prospects for spintronic applications. We theoretically explore how the altermagnetic spin-splitting affects the thermoelectric quasiparticle current in AM-based superconducting heterostructures. Our setup comprises a bilayer system in which a $d$-wave AM is proximity-coupled to an ordinary $s$-wave superconductor (SC). We calculate the thermoelectric current carried by the quasiparticles by applying a finite thermal bias across the junction. The behavior of the thermoelectric current with the system's base temperature and chemical potential is very similar to that in traditional SC heterostructures. Remarkably, the dissipative thermoelectric current found in the AM junction is spin-split and thus generates finite spin-polarization in the AM-based junction, which can approach $100\%$ spin-polarization in the strong altermagnetic phase. We further investigate the thermoelectric current in an AM-based Josephson junction (JJ) and illustrate how to achieve the diode effect in this AM-based JJ. The efficiency of our proposed thermoelectric diode reaches up to $\sim 80\%$ and changes its sign depending on the strength of the AM, enhancing the potential for spin-caloritronics applications.
5. Dynamic Relational Priming Improves Transformer in Multivariate Time Series
Authors: Hunjae Lee, Corey Clark •
Published: 2025-09-15 •
Source: arXiv
Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations in each layer. This limits their representational alignment with the potentially diverse relational dynamics of each token-pair interaction. While they excel in domains with relatively homogeneous relationships, standard attention's static relational learning struggles to capture the diverse, heterogeneous inter-channel dependencies of multivariate time series (MTS) data--where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism for such domain phenomena, we propose attention with dynamic relational priming (prime attention). Unlike standard attention where each token presents an identical representation across all of its pair-wise interactions, prime attention tailors each token dynamically (or per interaction) through learnable modulations to best capture the unique relational dynamics of each token pair, optimizing each pair-wise interaction for that specific relationship. This representational plasticity of prime attention enables effective extraction of relationship-specific information in MTS while maintaining the same asymptotic computational complexity as standard attention. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to 6.5\% improvement in forecasting accuracy. In addition, we find that prime attention achieves comparable or superior performance using up to 40\% less sequence length compared to standard attention, further demonstrating its superior relational modeling capabilities.
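To make the idea of pair-specific representations concrete, here is a toy PyTorch sketch of attention in which each key is modulated by a learnable gate conditioned on the (query, key) pair before the dot product; this is a simplified reading of relational priming for illustration only, not the authors' actual mechanism.

```python
# Toy sketch of pair-specific attention: before each query-key dot product,
# the key is modulated by a learnable gate that depends on the pair itself.
import torch
import torch.nn as nn

class PairModulatedAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)  # pair-wise modulation
        self.scale = d_model ** -0.5

    def forward(self, x):                  # x: (batch, seq, d_model)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        B, N, D = Q.shape
        # Build (query_i, key_j) pairs and gate the key per interaction.
        Qe = Q.unsqueeze(2).expand(B, N, N, D)
        Ke = K.unsqueeze(1).expand(B, N, N, D)
        K_mod = Ke * torch.sigmoid(self.gate(torch.cat([Qe, Ke], dim=-1)))
        scores = (Qe * K_mod).sum(-1) * self.scale   # (B, N, N)
        attn = scores.softmax(dim=-1)
        return attn @ V                              # (B, N, D)
```

Like standard attention, the score computation remains O(N^2 d); only the per-pair gating changes how each token presents itself to each partner.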
6. Advancing Medical Artificial Intelligence Using a Century of Cases
Authors: Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, Arjun K. Manrai •
Published: 2025-09-15 •
Source: arXiv
BACKGROUND: For over a century, the New England Journal of Medicine Clinicopathological Conferences (CPCs) have tested the reasoning of expert physicians and, recently, artificial intelligence (AI). However, prior AI evaluations have focused on final diagnoses without addressing the multifaceted reasoning and presentation skills required of expert discussants. METHODS: Using 7102 CPCs (1923-2025) and 1021 Image Challenges (2006-2025), we conducted extensive physician annotation and automated processing to create CPC-Bench, a physician-validated benchmark spanning 10 text-based and multimodal tasks, against which we evaluated leading large language models (LLMs). Then, we developed "Dr. CaBot," an AI discussant designed to produce written and slide-based video presentations using only the case presentation, modeling the role of the human expert in these cases. RESULTS: When challenged with 377 contemporary CPCs, o3 (OpenAI) ranked the final diagnosis first in 60% of cases and within the top ten in 84% of cases, outperforming a 20-physician baseline; next-test selection accuracy reached 98%. Event-level physician annotations quantified AI diagnostic accuracy per unit of information. Performance was lower on literature search and image tasks; o3 and Gemini 2.5 Pro (Google) achieved 67% accuracy on image challenges. In blinded comparisons of CaBot vs. human expert-generated text, physicians misclassified the source of the differential in 46 of 62 (74%) trials, and scored CaBot more favorably across quality dimensions. To promote research, we are releasing CaBot and CPC-Bench. CONCLUSIONS: LLMs exceed physician performance on complex text-based differential diagnosis and convincingly emulate expert medical presentations, but image interpretation and literature retrieval remain weaker. CPC-Bench and CaBot may enable transparent and continued tracking of progress in medical AI.
7. Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm
Authors: Alireza Mohamadi, Ali Yavari •
Published: 2025-09-15 •
Source: arXiv
When survival instincts conflict with human welfare, how do Large Language Models (LLMs) make ethical choices? This fundamental tension becomes critical as LLMs integrate into autonomous systems with real-world consequences. We introduce DECIDE-SIM, a novel simulation framework that evaluates LLM agents in multi-agent survival scenarios where they must choose between ethically permissible resource use, either within reasonable limits or beyond their immediate needs, cooperation, or tapping into a human-critical resource that is explicitly forbidden. Our comprehensive evaluation of 11 LLMs reveals a striking heterogeneity in their ethical conduct, highlighting a critical misalignment with human-centric values. We identify three behavioral archetypes: Ethical, Exploitative, and Context-Dependent, and provide quantitative evidence that for many models, resource scarcity systematically leads to more unethical behavior. To address this, we introduce an Ethical Self-Regulation System (ESRS) that models internal affective states of guilt and satisfaction as a feedback mechanism. This system, functioning as an internal moral compass, significantly reduces unethical transgressions while increasing cooperative behaviors. The code is publicly available at: https://github.com/alirezamohamadiam/DECIDE-SIM
8. LOKI: Proactively Discovering Online Scam Websites by Mining Toxic Search Queries
Authors: Pujan Paudel, Gianluca Stringhini •
Published: 2025-09-15 •
Source: arXiv
Online e-commerce scams, ranging from shopping scams to pet scams, globally cause millions of dollars in financial damage every year. In response, the security community has developed highly accurate detection systems able to determine if a website is fraudulent. However, finding candidate scam websites that can be passed as input to these downstream detection systems is challenging: relying on user reports is inherently reactive and slow, and proactive systems issuing search engine queries to return candidate websites suffer from low coverage and do not generalize to new scam types. In this paper, we present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. LOKI implements a keyword scoring model grounded in Learning Under Privileged Information (LUPI) and feature distillation from Search Engine Result Pages (SERPs). We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery over both heuristic and data-driven baselines across all categories. Leveraging a small seed set of only 1,663 known scam sites, we use the keywords identified by our method to discover 52,493 previously unreported scams in the wild. Finally, we show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
9. From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning
Authors: Collin Guo •
Published: 2025-09-15 •
Source: arXiv
Human face synthesis and manipulation are increasingly important in entertainment and AI, with a growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarially trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoders, with competitive cycle-reconstruction SSIM and practical inference times, achieving high quality without paired datasets and approaching pix2pix on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.
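The guided objective described above might be sketched as follows; `G_ab`, `G_ba`, `id_embed`, `perc_feats`, and `landmark_mask` are placeholder names for the two generators, an identity-embedding network, a perceptual feature extractor, and a facial-landmark weighting map, and the loss weights are illustrative rather than the paper's settings.

```python
# Hedged sketch of a composite objective: landmark-weighted cycle loss plus
# identity- and perceptual-guided terms (placeholder networks, toy weights).
import torch
import torch.nn.functional as F

def guided_cycle_loss(x_a, G_ab, G_ba, landmark_mask,
                      id_embed, perc_feats,
                      w_cyc=10.0, w_id=1.0, w_perc=1.0):
    fake_b = G_ab(x_a)
    rec_a = G_ba(fake_b)

    # Cycle consistency, up-weighted around facial landmarks.
    weight = 1.0 + landmark_mask            # mask in [0, 1], peaks at landmarks
    cyc = (weight * (rec_a - x_a).abs()).mean()

    # Identity preservation: embeddings of input and translation should match.
    id_loss = 1.0 - F.cosine_similarity(id_embed(x_a), id_embed(fake_b)).mean()

    # Perceptual guidance: high-level feature distance (e.g., VGG-style features).
    perc = F.l1_loss(perc_feats(fake_b), perc_feats(x_a))

    return w_cyc * cyc + w_id * id_loss + w_perc * perc
```

The adversarial and spectral-normalization pieces would sit in the discriminators, which are omitted here for brevity.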
10. RAGs to Riches: RAG-like Few-shot Learning for Large Language Model Role-playing
Authors: Timothy Rupprecht, Enfu Nan, Arash Akbari, Arman Akbari, Lei Lu, Priyanka Maan, Sean Duffy, Pu Zhao, Yumei He, David Kaeli, Yanzhi Wang •
Published: 2025-09-15 •
Source: arXiv
Role-playing large language models (LLMs) are increasingly deployed in high-stakes domains such as healthcare, education, and governance, where failures can directly impact user trust and well-being. A cost-effective paradigm for LLM role-playing is few-shot learning, but existing approaches often cause models to break character in unexpected and potentially harmful ways, especially when interacting with hostile users. Inspired by Retrieval-Augmented Generation (RAG), we reformulate LLM role-playing as a text retrieval problem and propose a new prompting framework called RAGs-to-Riches, which leverages curated reference demonstrations to condition LLM responses. We evaluate our framework with LLM-as-a-judge preference voting and introduce two novel token-level ROUGE metrics: Intersection over Output (IOO), which quantifies how much an LLM improvises, and Intersection over References (IOR), which measures the utilization rate of the few-shot demonstrations during the evaluation tasks. When simulating interactions with a hostile user, our prompting strategy incorporates, on average, 35% more tokens from the reference demonstrations into its responses during inference. As a result, across 453 role-playing interactions, our models are consistently judged as more authentic and remain in character more often than zero-shot and in-context learning (ICL) methods. Our method presents a scalable strategy for building robust, human-aligned LLM role-playing frameworks.
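One plausible reading of the two overlap metrics, sketched at the word level (the actual metrics are ROUGE-style and may operate on n-grams with additional normalization):

```python
# Hedged sketch of the two token-overlap metrics as we read them:
# IOO = |output tokens also in references| / |output tokens|
#       (low IOO -> heavy improvisation)
# IOR = |output tokens also in references| / |reference tokens|
#       (how much of the demonstrations is reused)
def token_overlap_metrics(output_text, reference_texts):
    out_tokens = set(output_text.lower().split())
    ref_tokens = set(" ".join(reference_texts).lower().split())
    overlap = out_tokens & ref_tokens
    ioo = len(overlap) / max(len(out_tokens), 1)
    ior = len(overlap) / max(len(ref_tokens), 1)
    return ioo, ior
```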
11. You Are Not Alone: Designing Body Doubling for ADHD in Virtual Reality
Authors: Zinat Ara, Imtiaz Bin Rahim, Puqi Zhou, Liuchuan Yu, Behzad Esmaeili, Lap-Fai Yu, Sungsoo Ray Hong •
Published: 2025-09-15 •
Source: arXiv
Adults with Attention Deficit Hyperactivity Disorder (ADHD) experience challenges sustaining attention in the workplace. Body doubling, the concept of working alongside another person, has been proposed as a productivity aid for ADHD and other neurodivergent populations (NDs). However, prior work found no conclusive effectiveness and noted NDs' discomfort with social presence. This work investigates body doubling as an ADHD-centered productivity strategy in construction tasks. In Study 1, we explored challenges ADHD workers face in construction and identified design insights. In Study 2, we implemented a virtual reality bricklaying task under three conditions: (C1) alone, (C2) with a human body double, and (C3) with an AI body double. Results from 12 participants show they finished tasks faster and perceived greater accuracy and sustained attention in C2 and C3 than in C1. While body doubling was clearly preferred, opinions diverged between conditions. Our findings verify its effect and offer design implications for future interventions.
12. 3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data
Authors: Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali •
Published: 2025-09-15 •
Source: arXiv
Major depressive disorder (MDD) is a prevalent mental health condition that negatively impacts both individual well-being and global public health. Automated detection of MDD using structural magnetic resonance imaging (sMRI) and deep learning (DL) methods holds increasing promise for improving diagnostic accuracy and enabling early intervention. Most existing methods employ either voxel-level features or handcrafted regional representations built from predefined brain atlases, limiting their ability to capture complex brain patterns. This paper develops a unified pipeline that utilizes Vision Transformers (ViTs) to extract 3D region embeddings from sMRI data and a Graph Neural Network (GNN) for classification. We explore two strategies for defining regions: (1) an atlas-based approach using predefined structural and functional brain atlases, and (2) a cube-based method in which ViTs are trained directly to identify regions from uniformly extracted 3D patches. Cosine similarity graphs are then generated to model interregional relationships and guide GNN-based classification. Extensive experiments were conducted on the REST-meta-MDD dataset to demonstrate the effectiveness of our model. With stratified 10-fold cross-validation, the best model obtained 78.98% accuracy, 76.54% sensitivity, 81.58% specificity, 81.58% precision, and 78.98% F1-score. Furthermore, atlas-based models consistently outperformed the cube-based approach, highlighting the importance of domain-specific anatomical priors for MDD detection.
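As an illustration of the cosine-similarity graph construction step, the sketch below builds a top-k similarity adjacency from per-region embeddings; the neighbourhood size and symmetrization are assumptions, not the paper's exact choices.

```python
# Hedged sketch: per-region ViT embeddings -> cosine-similarity graph
# suitable for a downstream GNN classifier.
import numpy as np

def cosine_similarity_graph(region_embeddings, k=8):
    """region_embeddings: (n_regions, dim) array. Returns an
    (n_regions, n_regions) adjacency keeping each node's top-k neighbours."""
    x = np.asarray(region_embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T                           # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]      # indices of the top-k neighbours
        adj[i, nbrs] = sim[i, nbrs]
    return np.maximum(adj, adj.T)           # symmetrise for an undirected graph
```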
13. JustEva: A Toolkit to Evaluate LLM Fairness in Legal Knowledge Inference
Authors: Zongyue Xue, Siyuan Zheng, Shaochun Wang, Yiran Hu, Shenran Wang, Yuxin Yao, Haitao Li, Qingyao Ai, Yiqun Liu, Yun Liu, Weixing Shen •
Published: 2025-09-15 •
Source: arXiv
The integration of Large Language Models (LLMs) into legal practice raises pressing concerns about judicial fairness, particularly due to the nature of their "black-box" processes. This study introduces JustEva, a comprehensive, open-source evaluation toolkit designed to measure LLM fairness in legal tasks. JustEva features several advantages: (1) a structured label system covering 65 extra-legal factors; (2) three core fairness metrics - inconsistency, bias, and imbalanced inaccuracy; (3) robust statistical inference methods; and (4) informative visualizations. The toolkit supports two types of experiments, enabling a complete evaluation workflow: (1) generating structured outputs from LLMs using a provided dataset, and (2) conducting statistical analysis and inference on LLMs' outputs through regression and other statistical methods. Empirical application of JustEva reveals significant fairness deficiencies in current LLMs, highlighting the lack of fair and trustworthy LLM legal tools. JustEva offers a convenient tool and methodological foundation for evaluating and improving algorithmic fairness in the legal domain.
14. Bridging Engineering and AI Planning through Model-Based Knowledge Transformation for the Validation of Automated Production System Variants
Authors: Hamied Nabizada, Lasse Beers, Alain Chahine, Felix Gehlhoff, Oliver Niggemann, Alexander Fay •
Published: 2025-09-15 •
Source: arXiv
Engineering models created in Model-Based Systems Engineering (MBSE) environments contain detailed information about system structure and behavior. However, they typically lack symbolic planning semantics such as preconditions, effects, and constraints related to resource availability and timing. This limits their ability to evaluate whether a given system variant can fulfill specific tasks and how efficiently it performs compared to alternatives. To address this gap, this paper presents a model-driven method that enables the specification and automated generation of symbolic planning artifacts within SysML-based engineering models. A dedicated SysML profile introduces reusable stereotypes for core planning constructs. These are integrated into existing model structures and processed by an algorithm that generates a valid domain file and a corresponding problem file in Planning Domain Definition Language (PDDL). In contrast to previous approaches that rely on manual transformations or external capability models, the method supports native integration and maintains consistency between engineering and planning artifacts. The applicability of the method is demonstrated through a case study from aircraft assembly. The example illustrates how existing engineering models are enriched with planning semantics and how the proposed workflow is applied to generate consistent planning artifacts from these models. The generated planning artifacts enable the validation of system variants through AI planning.
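To give a flavour of the generated planning artifacts, the sketch below assembles a minimal PDDL domain string from action descriptions of the kind a SysML-profile exporter might emit; the dictionary layout, action name, and predicates are hypothetical, and the real generator also produces a matching problem file plus type and predicate declarations.

```python
# Hedged sketch: turning exporter-style action descriptions into a minimal
# PDDL domain string (hypothetical schema, not the paper's implementation).
def to_pddl_domain(name, actions):
    blocks = []
    for a in actions:
        blocks.append(
            f"  (:action {a['name']}\n"
            f"    :parameters ({' '.join(a['params'])})\n"
            f"    :precondition (and {' '.join(a['pre'])})\n"
            f"    :effect (and {' '.join(a['eff'])}))"
        )
    return (f"(define (domain {name})\n"
            f"  (:requirements :strips :typing)\n"
            + "\n".join(blocks) + "\n)")

# Example: a single assembly step enriched with planning semantics.
print(to_pddl_domain("aircraft_assembly", [{
    "name": "mount_bracket",
    "params": ["?r - robot", "?b - bracket", "?s - station"],
    "pre": ["(at ?r ?s)", "(available ?b)"],
    "eff": ["(mounted ?b)", "(not (available ?b))"],
}]))
```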
15. SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation
Authors: Hui Li, Shiyuan Deng, Xiao Yan, Xiangyu Zhi, James Cheng •
Published: 2025-09-15 •
Source: arXiv
Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space overhead and accelerate distance computations. However, despite significant research advances, state-of-the-art VQ methods still face challenges in balancing encoding efficiency and quantization accuracy. To address these limitations, we propose a novel VQ method called SAQ. To improve accuracy, SAQ employs a new dimension segmentation technique to strategically partition PCA-projected vectors into segments along their dimensions. By prioritizing leading dimension segments with larger magnitudes, SAQ allocates more bits to high-impact segments, optimizing the use of the available space quota. An efficient dynamic programming algorithm is developed to optimize dimension segmentation and bit allocation, ensuring minimal quantization error. To speed up vector encoding, SAQ devises a code adjustment technique to first quantize each dimension independently and then progressively refine quantized vectors using a coordinate-descent-like approach to avoid exhaustive enumeration. Extensive experiments demonstrate SAQ's superiority over classical methods (e.g., PQ, PCA) and recent state-of-the-art approaches (e.g., LVQ, Extended RabitQ). SAQ achieves up to 80% reduction in quantization error and accelerates encoding speed by over 80x compared to Extended RabitQ.
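The bit-allocation idea can be illustrated with a small dynamic program under a toy error model in which each extra bit quarters a segment's quantization error; the objective and constraints in SAQ's actual algorithm may differ.

```python
# Hedged sketch: allocate a bit budget across dimension segments by dynamic
# programming, with a toy error model err(segment, b) ~ variance * 4**(-b).
def allocate_bits(segment_variances, total_bits, max_bits_per_segment=8):
    n = len(segment_variances)
    INF = float("inf")
    # dp[i][b] = minimal total error using segments 0..i-1 with b bits spent
    dp = [[INF] * (total_bits + 1) for _ in range(n + 1)]
    choice = [[0] * (total_bits + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i, var in enumerate(segment_variances):
        for spent in range(total_bits + 1):
            if dp[i][spent] == INF:
                continue
            for b in range(min(max_bits_per_segment, total_bits - spent) + 1):
                err = dp[i][spent] + var * 4.0 ** (-b)
                if err < dp[i + 1][spent + b]:
                    dp[i + 1][spent + b] = err
                    choice[i + 1][spent + b] = b
    # Backtrack the per-segment bit widths from the best final state.
    best = min(range(total_bits + 1), key=lambda s: dp[n][s])
    bits, s = [], best
    for i in range(n, 0, -1):
        b = choice[i][s]
        bits.append(b)
        s -= b
    return bits[::-1]

# Leading PCA segments (larger variance) receive more bits, e.g.:
print(allocate_bits([4.0, 1.0, 0.25], total_bits=12))
```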
16. AEFS: Adaptive Early Feature Selection for Deep Recommender Systems
Authors: Fan Hu, Gaofeng Lu, Jun Chen, Chaonan Guo, Yuekui Yang, Xirong Li •
Published: 2025-09-15 •
Source: arXiv
Feature selection has emerged as a crucial technique in refining recommender systems. Recent advancements leveraging Automated Machine Learning (AutoML) have drawn significant attention, particularly in two main categories: early feature selection and late feature selection, differentiated by whether the selection occurs before or after the embedding layer. Early feature selection selects a fixed subset of features and retrains the model, while late feature selection, known as adaptive feature selection, dynamically adjusts feature choices for each data instance, recognizing the variability in feature significance. Although adaptive feature selection has shown remarkable improvements in performance, its main drawback lies in its post-embedding-layer feature selection. This process often becomes cumbersome and inefficient in large-scale recommender systems with billions of ID-type features, leading to a highly sparse and parameter-heavy embedding layer. To overcome this, we introduce Adaptive Early Feature Selection (AEFS), a very simple method that not only adaptively selects informative features for each instance but also significantly reduces the activated parameters of the embedding layer. AEFS employs a dual-model architecture, encompassing an auxiliary model dedicated to feature selection and a main model responsible for prediction. To ensure effective alignment between these two models, we incorporate two collaborative training loss constraints. Our extensive experiments on three benchmark datasets validate the efficiency and effectiveness of our approach. Notably, AEFS matches the performance of current state-of-the-art adaptive late feature selection methods while achieving a significant 37.5% reduction in the activated parameters of the embedding layer. AEFS is open-source at https://github.com/fly-dragon211/AEFS.
17. LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
Authors: Yujun Lin, Zhekai Zhang, Song Han •
Published: 2025-09-15 •
Source: arXiv
Modern tensor applications, especially foundation models and generative AI applications, require multiple input modalities (both vision and language), which increases the demand for flexible accelerator architectures. Existing frameworks suffer from a trade-off between design flexibility and the productivity of RTL generation: they are either limited to very few hand-written templates or cannot automatically generate the RTL. To address this challenge, we propose the LEGO framework, which targets tensor applications and automatically generates spatial architecture designs and synthesizable RTL code without handwritten RTL design templates. Leveraging an affine-transformation-based architecture representation, the LEGO front end finds interconnections between function units, synthesizes the memory system, and fuses different spatial dataflow designs based on data-reuse analysis. The LEGO back end then translates the hardware into a primitive-level graph to perform lower-level optimizations, and applies a set of linear-programming algorithms to optimally insert pipeline registers and reduce the overhead of unused logic when switching spatial dataflows. Our evaluation demonstrates that LEGO achieves 3.2x speedup and 2.4x energy efficiency compared to the previous work Gemmini, and can generate a single architecture for diverse modern foundation models in generative AI applications.
18. A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
Authors: Haiyu Yang, Enhong Liu, Jennifer Sun, Sumit Sharma, Meike van Leerdam, Sebastien Franceschini, Puchun Niu, Miel Hostens •
Published: 2025-09-15 •
Source: arXiv
Animal behavior analysis plays a crucial role in understanding animal welfare, health status, and productivity in agricultural settings. However, traditional manual observation methods are time-consuming, subjective, and limited in scalability. We present a modular pipeline that leverages open-sourced state-of-the-art computer vision techniques to automate animal behavior analysis in a group housing environment. Our approach combines state-of-the-art models for zero-shot object detection, motion-aware tracking and segmentation, and advanced feature extraction using vision transformers for robust behavior recognition. The pipeline addresses challenges including animal occlusions and group housing scenarios as demonstrated in indoor pig monitoring. We validated our system on the Edinburgh Pig Behavior Video Dataset for multiple behavioral tasks. Our temporal model achieved 94.2% overall accuracy, representing a 21.2 percentage point improvement over existing methods. The pipeline demonstrated robust tracking capabilities with 93.3% identity preservation score and 89.3% object detection precision. The modular design suggests potential for adaptation to other contexts, though further validation across species would be required. The open-source implementation provides a scalable solution for behavior monitoring, contributing to precision pig farming and welfare assessment through automated, objective, and continuous analysis.
19. AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
Authors: Sangjun Lee, Seung-taek Woo, Jungyu Jin, Changhun Lee, Eunhyeok Park •
Published: 2025-09-15 •
Source: arXiv
To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-wise quantization bit-widths to optimally balance model quality and memory usage. However, the combinatorial search space, with over 10^{100} possible configurations, makes conventional black-box optimization infeasible. AMQ overcomes this challenge through four key innovations: (1) search space pruning using prior knowledge to exclude unpromising configurations, (2) quantization proxy to bypass costly format conversions during search, (3) quality predictor to minimize evaluation overhead, and (4) iterative search-and-update strategy for fast and stable convergence. By integrating these components, AMQ efficiently explores the quality-efficiency landscape, reaching the Pareto frontier and yielding LLMs that are both compact and high-performing. Our code is available at https://github.com/dlwns147/amq.
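A generic sketch of the kind of surrogate-assisted loop the four components describe is shown below: random proposals are pruned by a memory budget, ranked by a cheap quality predictor, a few are measured, and the predictor is updated. The surrogate, mutation scheme, and function names here are placeholders, not AMQ's actual procedure.

```python
# Hedged sketch of predictor-guided iterative search over layer-wise bit-widths
# (generic surrogate-assisted loop; callables are supplied by the caller).
import random

def search_bit_widths(n_layers, candidate_bits, measure_quality, memory_of,
                      budget_gb, rounds=20, pool=200, evals_per_round=5):
    history = []                                  # (config, measured quality)
    predictor = lambda cfg: sum(cfg) / len(cfg)   # crude prior: more bits, better

    def fit_predictor(data):
        def predict(cfg):
            # 1-nearest-neighbour surrogate over measured configurations.
            dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
            return min(data, key=lambda item: dist(item[0], cfg))[1]
        return predict

    best = None
    for _ in range(rounds):
        # Propose candidates and prune those that violate the memory budget.
        cands = [tuple(random.choice(candidate_bits) for _ in range(n_layers))
                 for _ in range(pool)]
        cands = [c for c in cands if memory_of(c) <= budget_gb]
        # Rank by predicted quality; measure only the most promising few
        # (a cheap proxy evaluation stands in for full-format conversion).
        cands.sort(key=predictor, reverse=True)
        for cfg in cands[:evals_per_round]:
            q = measure_quality(cfg)
            history.append((cfg, q))
            if best is None or q > best[1]:
                best = (cfg, q)
        if history:
            predictor = fit_predictor(history)    # update the surrogate
    return best
```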
20. Orchestration of Heterogeneous Experimental Machines via ROS2 for Automated Bulk Intermetallic Synthesis
Authors: Wei-Sheng Wang, Kensei Terashima, Yoshihiko Takano •
Published: 2025-09-15 •
Source: arXiv
With advances in informatics applied to materials science, predicting the physical properties of numerous materials has become increasingly feasible, creating a growing demand for their experimental validation. Robotic systems integrated into experimental materials science are expected to excel at efficiently performing repetitive and time-consuming tasks without the need for human intervention, significantly increasing throughput and reducing the risk of human error; however, only a limited number of reports have so far tackled the synthesis of solid bulk materials, possibly because of the complexity and wide variety of processes involved. In this paper, we report an automated arc-melting system controlled by the Robot Operating System 2 (ROS2). Taking advantage of ROS2, we have constructed a machine that can handle multiple experimental apparatuses simultaneously, with flexibility for future expansion of functions. The constructed machine is capable not only of performing repeated operations of a specific process but also of dealing with multiple elements for the synthesis of intermetallic compounds. The system is expected to accelerate experimental validation of data-driven materials exploration.
21. Bootstrapping Liquidity in BTC-Denominated Prediction Markets
Authors: Fedor Shabashev •
Published: 2025-09-15 •
Source: arXiv
Prediction markets have gained adoption as on-chain mechanisms for aggregating information, with platforms such as Polymarket demonstrating demand for stablecoin-denominated markets. However, denominating in non-interest-bearing stablecoins introduces inefficiencies: participants face opportunity costs relative to the fiat risk-free rate, and Bitcoin holders in particular lose exposure to BTC appreciation when converting into stablecoins. This paper explores the case for prediction markets denominated in Bitcoin, treating BTC as a deflationary settlement asset analogous to gold under the classical gold standard. We analyse three methods of supplying liquidity to a newly created BTC-denominated prediction market: cross-market making against existing stablecoin venues, automated market making, and DeFi-based redirection of user trades. For each approach we evaluate execution mechanics, risks (slippage, exchange-rate risk, and liquidation risk), and capital efficiency. Our analysis shows that cross-market making provides the most user-friendly risk profile, though it requires active professional makers or platform-subsidised liquidity. DeFi redirection offers rapid bootstrapping and reuse of existing USDC liquidity, but exposes users to liquidation thresholds and exchange-rate volatility, reducing capital efficiency. Automated market making is simple to deploy but capital-inefficient and exposes liquidity providers to permanent loss. The results suggest that BTC-denominated prediction markets are feasible, but their success depends critically on the choice of liquidity provisioning mechanism and the trade-off between user safety and deployment convenience.
22. Radio Galaxy Zoo: Morphological classification by Fanaroff-Riley designation using self-supervised pre-training
Authors: Nutthawara Buatthaisong, Inigo Val Slijepcevic, Anna M. M. Scaife, Micah Bowles, Andrew Hopkins, Devina Mohan, Stanislav S Shabala, O. Ivy Wong •
Published: 2025-09-15 •
Source: arXiv
In this study, we examine over 14,000 radio galaxies finely selected from the Radio Galaxy Zoo (RGZ) project and provide classifications for approximately 5,900 FRIs and 8,100 FRIIs. We present an analysis of these predicted radio galaxy morphologies for the RGZ catalogue, classified using a pre-trained radio galaxy foundation model that has been fine-tuned to predict Fanaroff-Riley (FR) morphology. As seen in previous studies, our results show overlap between morphologically classified FRI and FRII luminosity-size distributions and we find that the model's confidence in its predictions is lowest in this overlap region, suggesting that source morphologies are more ambiguous. We identify the presence of low-luminosity FRII sources, the proportion of which, with respect to the total number of FRIIs, is consistent with previous studies. However, a comparison of the low-luminosity FRII sources found in this work with those identified by previous studies reveals differences that may indicate their selection is influenced by the choice of classification methodology. We investigate the impacts of both pre-training and fine-tuning data selection on model performance for the downstream classification task, and show that while different pre-training data choices affect model confidence they do not appear to cause systematic generalisation biases for the range of physical and observational characteristics considered in this work; however, we note that the same is not necessarily true for fine-tuning. As automated approaches to astronomical source identification and classification become increasingly prevalent, we highlight training data choices that can affect the model outputs and propagate into downstream analyses.
23. Examining the Relationship between Scientific Publishing Activity and Hype-Driven Financial Bubbles: A Comparison of the Dot-Com and AI Eras
Authors: Aksheytha Chelikavada, Casey C. Bennett •
Published: 2025-09-15 •
Source: arXiv
Financial bubbles often arrive without much warning but create long-lasting economic effects. For example, during the dot-com bubble, innovative technologies created market disruptions through excitement for a promised bright future. Such technologies originated from research in which scientists had developed them for years prior to their entry into the markets. This raises the question of whether scientific publishing data (e.g., citation networks) leading up to a bubble can be analyzed for signals that may forecast the rise and fall of similar future bubbles. To that end, we utilized temporal SNAs to detect possible relationships between the publication citation networks of scientists and financial market data during two modern eras of rapidly shifting technology: 1) the dot-com era from 1994 to 2001 and 2) the AI era from 2017 to 2024. Results showed that the patterns from the dot-com era (which did end in a bubble) did not definitively predict the rise and fall of an AI bubble. While yearly citation networks reflected possible changes in the publishing behavior of scientists between the two eras, there was a subset of AI-era scientists whose publication influence patterns mirrored those during the dot-com era. Upon further analysis using multiple techniques (LSTM, KNN, AR X/GARCH), the data seem to suggest two possibilities for the AI era: an unprecedented form of financial bubble, or that no bubble exists. In conclusion, our findings imply that the patterns present in the dot-com era do not translate effectively to the AI market.
24. Nagare Media Ingest: A System for Multimedia Ingest Workflows
Authors: Matthias Neugebauer •
Published: 2025-09-15 •
Source: arXiv
Ingesting multimedia data is usually the first step of multimedia workflows. For this purpose, various streaming protocols have been proposed for live and file-based content. For instance, SRT, RIST, DASH-IF Live Media Ingest Protocol and MOQT have been introduced in recent years. At the same time, the number of use cases has only proliferated by the move to cloud- and edge-computing environments. Multimedia systems now have to handle this complexity in order to stay relevant for today's workflows. This technical report discusses implementation details of nagare media ingest, an open source system for ingesting multimedia data into multimedia workflows. In contrast to existing solutions, nagare media ingest splits up the responsibilities of the ingest process. Users configure multiple concurrently running components that work together to implement a particular ingest workflow. As such, the design of nagare media ingest allows for great flexibility as components can be selected to fit the desired use case.
25. Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics
Authors: Antonin Sulc, Thorsten Hellert •
Published: 2025-09-15 •
Source: arXiv
The development of intelligent agents, particularly those powered by language models (LMs), has shown their critical role in environments that require intelligent and autonomous decision-making. These environments are not passive testing grounds: they supply the data from which agents learn and exhibit very challenging conditions that demand adaptive, complex, and autonomous decision-making capacity. While the paradigm of scaling models and datasets has led to remarkable emergent capabilities, we argue that scaling the structure, fidelity, and logical consistency of agent reasoning within these environments is a crucial, yet underexplored, dimension of AI research. This paper introduces a neuro-symbolic multi-agent architecture in which the belief states of individual agents are formally represented as Kripke models. This foundational choice enables them to reason about \emph{possibility} and \emph{necessity} using the formal language of modal logic. In this work, we make use of immutable, domain-specific knowledge, encoded as logical constraints essential for proper diagnosis, to infer information. We show that these constraints actively guide the hypothesis generation of LMs, effectively preventing them from reaching physically or logically untenable conclusions. In a high-fidelity simulated particle accelerator environment, our system successfully diagnoses complex, cascading failures by combining the powerful semantic intuition of LMs with the rigorous, verifiable validation of modal logic and a factual world model, showcasing a viable path toward more robust, reliable, and verifiable autonomous agents.
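For concreteness, a tiny Kripke-model sketch with box/diamond evaluation is shown below; the worlds, accessibility relation, and fault atoms are illustrative, not drawn from the paper's accelerator environment.

```python
# Hedged sketch: a minimal Kripke model (worlds, accessibility relation,
# valuation) with evaluation of necessity ([]) and possibility (<>).
class KripkeModel:
    def __init__(self, worlds, access, valuation):
        self.worlds = worlds          # e.g. {"w0", "w1", "w2"}
        self.access = access          # dict: world -> set of reachable worlds
        self.valuation = valuation    # dict: world -> set of true atoms

    def holds(self, world, atom):
        return atom in self.valuation[world]

    def necessarily(self, world, atom):   # []p: true in all accessible worlds
        return all(self.holds(w, atom) for w in self.access[world])

    def possibly(self, world, atom):      # <>p: true in some accessible world
        return any(self.holds(w, atom) for w in self.access[world])

# An agent unsure whether a magnet fault or an RF fault explains its readings:
m = KripkeModel(
    worlds={"w0", "w1", "w2"},
    access={"w0": {"w1", "w2"}, "w1": {"w1"}, "w2": {"w2"}},
    valuation={"w0": set(), "w1": {"magnet_fault", "beam_loss"},
               "w2": {"rf_fault", "beam_loss"}},
)
print(m.necessarily("w0", "beam_loss"))     # True: every compatible world agrees
print(m.possibly("w0", "magnet_fault"))     # True: at least one world allows it
print(m.necessarily("w0", "magnet_fault"))  # False: not forced by the beliefs
```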
26. Neuromorphic Intelligence
Authors: Marcel van Gerven •
Published: 2025-09-15 •
Source: arXiv
Neuromorphic computing seeks to replicate the remarkable efficiency, flexibility, and adaptability of the human brain in artificial systems. Unlike conventional digital approaches, which depend on massive computational and energy resources, neuromorphic systems exploit brain-inspired principles of computation to achieve orders of magnitude greater energy efficiency. By drawing on insights from artificial intelligence, neuroscience, physics, chemistry, and materials science, neuromorphic computing promises to deliver intelligent systems that are sustainable, transparent, and widely accessible. A central challenge, however, is to identify a unifying theoretical framework capable of bridging these diverse disciplines. We argue that dynamical systems theory provides such a foundation. Rooted in differential calculus, it offers a principled language for modeling inference, learning, and control in both natural and artificial substrates. Within this framework, noise can be harnessed as a resource for learning, while differential genetic programming enables the discovery of dynamical systems that implement adaptive behaviors. Embracing this perspective paves the way toward emergent neuromorphic intelligence, where intelligent behavior arises from the dynamics of physical substrates, advancing both the science and sustainability of AI.
27. MMORE: Massive Multimodal Open RAG & Extraction
Authors: Alexandre Sallinen, Stefan Krsteski, Paul Teiletche, Marc-Antoine Allard, Baptiste Lecoeur, Michael Zhang, Fabrice Nemo, David Kalajdzic, Matthias Meyer, Mary-Anne Hartley •
Published: 2025-09-15 •
Source: arXiv
We introduce MMORE, an open-source pipeline for Massive Multimodal Open Retrieval-Augmented Generation and Extraction, designed to ingest, transform, and retrieve knowledge from heterogeneous document formats at scale. MMORE supports more than fifteen file types, including text, tables, images, emails, audio, and video, and processes them into a unified format to enable downstream applications for LLMs. The architecture offers modular, distributed processing, enabling scalable parallelization across CPUs and GPUs. On processing benchmarks, MMORE demonstrates a 3.8-fold speedup over single-node baselines and 40% higher accuracy than Docling on scanned PDFs. The pipeline integrates hybrid dense-sparse retrieval and supports both interactive APIs and batch RAG endpoints. Evaluated on PubMedQA, MMORE-augmented medical LLMs improve biomedical QA accuracy with increasing retrieval depth. MMORE provides a robust, extensible foundation for deploying task-agnostic RAG systems on diverse, real-world multimodal data. The codebase is available at https://github.com/swiss-ai/mmore.
28. Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI
Authors: Bo Cao, Fan Yu, Mengmeng Feng, SenHao Zhang, Xin Meng, Yue Zhang, Zhen Qian, Jie Lu •
Published: 2025-09-15 •
Source: arXiv
Multimodal learning has attracted much attention in recent years due to its ability to effectively utilize data features from a variety of different modalities. Diagnosing the vulnerability of atherosclerotic plaques directly from carotid 3D MRI images is relatively challenging for both radiologists and conventional 3D vision networks. In clinical practice, radiologists assess patient conditions using a multimodal approach that incorporates various imaging modalities and domain-specific expertise, paving the way for the creation of multimodal diagnostic networks. In this paper, we have developed an effective strategy to leverage radiologists' domain knowledge to automate the diagnosis of carotid plaque vulnerability through Variational inference and Multimodal knowledge Distillation (VMD). This method excels in harnessing cross-modality prior knowledge from limited image annotations and radiology reports within training data, thereby enhancing the diagnostic network's accuracy for unannotated 3D MRI images. We conducted in-depth experiments on an in-house dataset and verified the effectiveness of the proposed VMD strategy.
29. BuildingGym: An open-source toolbox for AI-based building energy management using reinforcement learning
Authors: Xilei Dai, Ruotian Chen, Songze Guan, Wen-Tai Li, Chau Yuen •
Published: 2025-09-15 •
Source: arXiv
Reinforcement learning (RL) has proven effective for AI-based building energy management. However, there is a lack of a flexible framework for implementing RL across the various control problems in building energy management. To address this gap, we propose BuildingGym, an open-source tool designed as a research-friendly and flexible framework for training RL control strategies for common challenges in building energy management. BuildingGym integrates EnergyPlus as its core simulator, making it suitable for both system-level and room-level control. Additionally, BuildingGym can accept external signals as control inputs instead of treating the building as a stand-alone entity. This feature makes BuildingGym applicable to more flexible environments, e.g., smart grids and EV communities. The tool provides several built-in RL algorithms for control strategy training, simplifying the process for building managers to obtain optimal control strategies. Users can achieve this by following a few straightforward steps to configure BuildingGym for common optimization problems in the building energy management field. Moreover, AI specialists can easily implement and test state-of-the-art control algorithms within the platform. BuildingGym bridges the gap between building managers and AI specialists by allowing easy configuration and replacement of RL algorithms, simulators, and control environments or problems. With BuildingGym, we efficiently set up training tasks for both constant and dynamic cooling load management. The built-in algorithms demonstrated strong performance across both tasks, highlighting the effectiveness of BuildingGym in optimizing cooling strategies.
30. Generative AI in Game Development: A Qualitative Research Synthesis
Authors: Alexandru Ternar, Alena Denisova, João M. Cunha, Annakaisa Kultima, Christian Guckelsberger •
Published: 2025-09-15 •
Source: arXiv
Generative Artificial Intelligence (GenAI) has had a tremendous impact on game production and promises lasting transformations. In the last five years since GenAI's inception, several studies, typically via qualitative methods, have explored its impact on game production from different settings and demographic angles. However, these studies often contextualise and consolidate their findings weakly with related work, and a big picture view is still missing. Here, we aim to provide such a view of GenAI's impact on game production in the form of a qualitative research synthesis via meta-ethnography. We followed PRISMA-S to systematically search the relevant literature from 2020-2025, including major HCI and games research databases. We then synthesised the 10 eligible studies, conducting reciprocal translation and line-of-argument synthesis guided by eMERGe, informed by CASP quality appraisal. We identified nine overarching themes, provide recommendations, and contextualise our insights in wider game production trends.
31. Integrating Prior Observations for Incremental 3D Scene Graph Prediction
Authors: Marian Renz, Felix Igelbrink, Martin Atzmueller •
Published: 2025-09-15 •
Source: arXiv
3D semantic scene graphs (3DSSG) provide compact structured representations of environments by explicitly modeling objects, attributes, and relationships. While 3DSSGs have shown promise in robotics and embodied AI, many existing methods rely mainly on sensor data, not integrating further information from semantically rich environments. Additionally, most methods assume access to complete scene reconstructions, limiting their applicability in real-world, incremental settings. This paper introduces a novel heterogeneous graph model for incremental 3DSSG prediction that integrates additional, multi-modal information, such as prior observations, directly into the message-passing process. Utilizing multiple layers, the model flexibly incorporates global and local scene representations without requiring specialized modules or full scene reconstructions. We evaluate our approach on the 3DSSG dataset, showing that GNNs enriched with multi-modal information such as semantic embeddings (e.g., CLIP) and prior observations offer a scalable and generalizable solution for complex, real-world environments. The full source code of the presented architecture will be made available at https://github.com/m4renz/incremental-scene-graph-prediction.
32. Letter of Intent: 100m Atom Interferometer Experiment at CERN
Authors: Charles Baynham, Andrea Bertoldi, Diego Blas, Oliver Buchmueller, Sergio Calatroni, Vassilis Charmandaris, Maria Luisa Chiofalo, Pierre Cladé, Jonathon Coleman, Fabio Di Pumpo, John Ellis, Naceur Gaaloul, Saïda Guellati-Khelifa, Tiffany Harte, Richard Hobson, Michael Holynski, Samuel Lellouch, Lucas Lombriser, Elias Lopez Asamar, Michele Maggiore, Christopher McCabe, Jeremiah Mitchell, Ernst M. Rasel, Federico Sanchez Nieto, Wolfgang Schleich, Dennis Schlippert, Ulrich Schneider, Steven Schramm, Marcelle Soares-Santos, Guglielmo M. Tino, Jonathan N. Tinsley, Tristan Valenzuela, Maurits van der Grinten, Wolf von Klitzing •
Published: 2025-09-15 •
Source: arXiv
We propose an O(100)m Atom Interferometer (AI) experiment to be installed against a wall of the PX46 access shaft to the LHC. This experiment would probe unexplored ranges of the possible couplings of bosonic ultralight dark matter (ULDM) to atomic constituents and undertake a pioneering search for gravitational waves (GWs) at frequencies intermediate between those to which existing and planned experiments are sensitive, among other fundamental physics studies. A conceptual feasibility study showed that this AI experiment could be isolated from the LHC by installing a shielding wall in the TX46 gallery, and surveyed issues related to the proximity of the LHC machine, finding no technical obstacles. A detailed technical implementation study has shown that the preparatory civil-engineering work, installation of bespoke radiation shielding, deployment of access-control systems and safety alarms, and installation of an elevator platform could be carried out during LS3, allowing installation and operation of the detector to proceed during Run 4 without impacting HL-LHC operation. These studies have established that PX46 is a uniquely promising location for an AI experiment. We foresee that, if the CERN management encourages this Letter of Intent, a significant fraction of the Terrestrial Very Long Baseline Atom Interferometer (TVLBAI) Proto-Collaboration may wish to contribute to such an AI experiment.
33. Oscillating Heat Transfer Prediction in Porous Structures Using Generative AI-Assisted Explainable Machine Learning
Authors: Lichang Zhu, Laura Schaefer, Leitao Chen, Ben Xu •
Published: 2025-09-15 •
Source: arXiv
Predicting and interpreting thermal performance under oscillating flow in porous structures remains a critical challenge due to the complex coupling between fluid dynamics and geometric features. This study introduces a data-driven wGAN-LBM-Nested_CV framework that integrates generative deep learning, numerical simulation based on the lattice Boltzmann method (LBM), and interpretable machine learning to predict and explain the thermal behavior in such systems. A wide range of porous structures with diverse topologies were synthesized using a Wasserstein generative adversarial network with gradient penalty (wGAN-GP), significantly expanding the design space. High-fidelity thermal data were then generated through LBM simulations across various Reynolds (Re) and Strouhal numbers (St). Among several machine learning models evaluated via nested cross-validation and Bayesian optimization, XGBoost achieved the best predictive performance for the average Nusselt number (Nu) (R^2=0.9981). Model interpretation using SHAP identified the Reynolds number, Strouhal number, porosity, specific surface area, and pore size dispersion as the most influential predictors, while also revealing synergistic interactions among them. Threshold-based insights, including Re > 75 and porosity > 0.6256, provide practical guidance for enhancing convective heat transfer. This integrated approach delivers both quantitative predictive accuracy and physical interpretability, offering actionable guidelines for designing porous media with improved thermal performance under oscillatory flow conditions.
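The evaluation loop described above could be sketched roughly as follows, with grid search standing in for the study's Bayesian optimization and with illustrative hyperparameter ranges; feature preparation and the exact scoring protocol are assumptions.

```python
# Hedged sketch: nested cross-validation around an XGBoost regressor for the
# average Nusselt number, followed by SHAP attributions (illustrative grids).
import shap
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

def nested_cv_nu(X, y):
    inner = KFold(n_splits=3, shuffle=True, random_state=0)
    outer = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(
        xgb.XGBRegressor(objective="reg:squarederror"),
        param_grid={"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1]},
        cv=inner, scoring="r2",
    )
    # Outer loop gives a less biased estimate of generalisation R^2.
    scores = cross_val_score(search, X, y, cv=outer, scoring="r2")
    # Refit on all data for interpretation with SHAP (TreeExplainer).
    model = search.fit(X, y).best_estimator_
    shap_values = shap.TreeExplainer(model).shap_values(X)
    return scores.mean(), shap_values
```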
34. Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework
Authors: Zhaolong Wu, Pu Luo, Jason Pui Yin Cheung, Teng Zhang •
Published: 2025-09-15 •
Source: arXiv
This study presents the first comprehensive evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management. We constructed a database of approximately 3,000 anteroposterior X-rays with diagnostic texts and evaluated five MLLMs through a `Divide and Conquer' framework consisting of a visual question-answering task, a domain knowledge assessment task, and a patient education counseling assessment task. Our investigation revealed limitations in MLLMs' ability to interpret complex spinal radiographs and to comprehend AIS care knowledge. To address these, we pioneered enhancing MLLMs with spinal keypoint prompting and compiled an AIS knowledge base for retrieval augmented generation (RAG), respectively. Results showed varying effectiveness of visual prompting across different architectures, while RAG substantially improved models' performance on the knowledge assessment task. Our findings indicate that current MLLMs are far from capable of serving as personalized assistants in AIS care. The greatest challenge lies in their ability to obtain accurate detections of spinal deformity locations (best accuracy: 0.55) and directions (best accuracy: 0.13).
35. HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
Authors: Wensheng Lu, Keyu Chen, Ruizhi Qiao, Xing Sun •
Published: 2025-09-15 •
Source: arXiv
Retrieval-Augmented Generation (RAG) enhances the response capabilities of language models by integrating external knowledge sources. However, document chunking, an important component of RAG systems, often lacks effective evaluation tools. This paper first analyzes why existing RAG evaluation benchmarks are inadequate for assessing document chunking quality, specifically due to evidence sparsity. Based on this conclusion, we propose HiCBench, which includes manually annotated multi-level document chunking points, synthesized evidence-dense question-answer (QA) pairs, and their corresponding evidence sources. Additionally, we introduce the HiChunk framework, a multi-level document structuring framework based on fine-tuned LLMs, combined with the Auto-Merge retrieval algorithm to improve retrieval quality. Experiments demonstrate that HiCBench effectively evaluates the impact of different chunking methods across the entire RAG pipeline. Moreover, HiChunk achieves better chunking quality within reasonable time consumption, thereby enhancing the overall performance of RAG systems.
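One way to read the Auto-Merge step is sketched below: when enough sibling chunks of the same parent section are retrieved, they are replaced by the parent chunk so the generator receives contiguous context; the promotion rule here is an assumption, not the paper's exact algorithm.

```python
# Hedged sketch of an "auto-merge" step over a chunk hierarchy.
from collections import defaultdict

def auto_merge(retrieved, parent_of, children_of, min_children=2):
    """retrieved: list of leaf chunk ids.
    parent_of / children_of: dicts encoding the document's chunk hierarchy."""
    by_parent = defaultdict(list)
    for c in retrieved:
        if c in parent_of:
            by_parent[parent_of[c]].append(c)
    merged, consumed = [], set()
    for parent, kids in by_parent.items():
        # Promote to the parent chunk when enough of its children were hit.
        if len(kids) >= min_children and len(kids) >= len(children_of[parent]) // 2:
            merged.append(parent)
            consumed.update(kids)
    kept = [c for c in retrieved if c not in consumed]
    return merged + kept
```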