🤖 AI Research Papers

August 06, 2025

🤖 AI-Generated Research Summary

Comprehensive Summary of 20 Recent AI, LLM, Agent, and Workflow Research Papers


1. Key Research Trends

a. Privacy, Explainability, and Trust in AI
- Growing focus on privacy-preserving methods for explainable AI (e.g., private counterfactual retrieval).
- Emphasis on transparency and interpretability in black-box models, especially in high-stakes domains.

b. Representation Learning and Model Alignment
- Investigations into cross-model semantic alignment and transferability of internal representations.
- New approaches to prompt optimization and embedding-based adaptation for LLMs.

c. Generative Models and Diffusion Techniques
- Rapid advances in diffusion models for image restoration, anomaly detection, and likelihood estimation.
- Expansion of generative modeling to 4D world modeling and panoramic data synthesis for robotics and autonomous systems.

d. Integration of LLMs with Multimodal and Real-World Systems
- LLMs are being combined with vision models for AR guidance and used in self-questioning frameworks for self-improvement.
- Use of LLMs in automated code and artifact generation with a focus on intent preservation and diversity.

e. Domain-Specific AI Applications
- Application of AI to medical imaging, climate-robust agriculture, solar forecasting, food recommendation, and autonomous mobility.
- Creation of specialized datasets (e.g., SlideAudit) and taxonomies for evaluation and benchmarking.


2. Breakthrough Findings


3. Methodological Approaches


4. Applications and Use Cases


5. Future Directions


Conclusion

This collection of papers highlights a vibrant and rapidly evolving AI research landscape, with strong trends toward privacy, explainability, generative modeling, and real-world integration. Breakthroughs in self-improving LLMs, diffusion models, and multimodal systems are paving the way for more robust, interpretable, and impactful AI applications across diverse domains. The methodological innovations and practical use cases presented here point to a future where AI systems are not only more capable and autonomous but also more aligned with human values, needs, and societal challenges.

📚 arXiv (20 papers)
1. LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
Authors: Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi • Published: 2025-08-05 • Source: arXiv
Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique LiDAR properties. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are released to the community.
2. Veila: Panoramic LiDAR Generation from a Monocular RGB Image
Authors: Youquan Liu, Lingdong Kong, Weidong Yang, Ao Liang, Jianxiong Gao, Yang Wu, Xiang Xu, Xin Li, Linfeng Li, Runnan Chen, Ben Fei • Published: 2025-08-05 • Source: arXiv
Realistic and controllable panoramic LiDAR data generation is critical for scalable 3D perception in autonomous driving and robotics. Existing methods either perform unconditional generation with poor controllability or adopt text-guided synthesis, which lacks fine-grained spatial control. Leveraging a monocular RGB image as a spatial control signal offers a scalable and low-cost alternative, which remains an open problem. However, it faces three core challenges: (i) semantic and depth cues from RGB vary spatially, complicating reliable conditioning generation; (ii) modality gaps between RGB appearance and LiDAR geometry amplify alignment errors under noisy diffusion; and (iii) maintaining structural coherence between monocular RGB and panoramic LiDAR is challenging, particularly in non-overlap regions between images and LiDAR. To address these challenges, we propose Veila, a novel conditional diffusion framework that integrates: a Confidence-Aware Conditioning Mechanism (CACM) that strengthens RGB conditioning by adaptively balancing semantic and depth cues according to their local reliability; a Geometric Cross-Modal Alignment (GCMA) for robust RGB-LiDAR alignment under noisy diffusion; and a Panoramic Feature Coherence (PFC) for enforcing global structural consistency across monocular RGB and panoramic LiDAR. Additionally, we introduce two metrics, Cross-Modal Semantic Consistency and Cross-Modal Depth Consistency, to evaluate alignment quality across modalities. Experiments on nuScenes, SemanticKITTI, and our proposed KITTI-Weather benchmark demonstrate that Veila achieves state-of-the-art generation fidelity and cross-modal consistency, while enabling generative data augmentation that improves downstream LiDAR semantic segmentation.
3. Self-Questioning Language Models
Authors: Lili Chen, Mihir Prabhudesai, Katerina Fragkiadaki, Hao Liu, Deepak Pathak • Published: 2025-08-05 • Source: arXiv
Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.
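The two reward signals described in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names and the agreement thresholds `low` and `high` are assumptions, since the abstract only says the proposer is rewarded when a problem is "not too easy or too difficult".

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def solver_rewards(answers):
    """Give reward 1.0 to each sampled answer that matches the majority vote, else 0.0."""
    majority, _ = majority_answer(answers)
    return [1.0 if a == majority else 0.0 for a in answers]


def proposer_reward(answers, low=0.3, high=0.9):
    """Reward the proposer when solver agreement is neither too low nor too high.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    _, agreement = majority_answer(answers)
    return 1.0 if low <= agreement <= high else 0.0


# Four hypothetical solver samples for one proposed question.
samples = ["42", "42", "41", "42"]
print(solver_rewards(samples))   # [1.0, 1.0, 0.0, 1.0]
print(proposer_reward(samples))  # 1.0 (agreement is 0.75)
```

Majority voting only approximates correctness: if most samples share the same wrong answer, the solver is still rewarded, which is why the abstract calls it a proxy.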
4. What If, But Privately: Private Counterfactual Retrieval
Authors: Shreya Meel, Mohamed Nomeir, Pasan Dissanayake, Sanghamitra Dutta, Sennur Ulukus • Published: 2025-08-05 • Source: arXiv
Transparency and explainability are two important aspects to be considered when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of catering to this requirement. However, this also poses a threat to the privacy of the institution that is providing the explanation, as well as the user who is requesting it. In this work, we are primarily concerned with the privacy of a user who wants to retrieve a counterfactual instance without revealing their feature vector to the institution. Our framework retrieves the exact nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect, information-theoretic privacy for the user. First, we introduce the problem of private counterfactual retrieval (PCR) and propose a baseline PCR scheme that keeps the user's feature vector information-theoretically private from the institution. Building on this, we propose two other schemes that reduce the amount of information leaked about the institution database to the user, compared to the baseline scheme. Second, we relax the assumption of mutability of all features, and consider the setting of immutable PCR (I-PCR). Here, the user retrieves the nearest counterfactual without altering a private subset of their features, which constitutes the immutable set, while keeping their feature vector and immutable set private from the institution. For this, we propose two schemes that preserve the user's privacy information-theoretically, but ensure varying degrees of database privacy. Third, we extend our PCR and I-PCR schemes to incorporate the user's preferences on transforming their attributes, so that a more actionable explanation can be received. Finally, we present numerical results to support our theoretical findings, and compare the database leakage of the proposed schemes.
5. Personalized Recommendation of Dish and Restaurant Collections on iFood
Authors: Fernando F. Granado, Davi A. Bezerra, Iuri Queiroz, Nathan Oliveira, Pedro Fernandes, Bruno Schock • Published: 2025-08-05 • Source: arXiv
Food delivery platforms face the challenge of helping users navigate vast catalogs of restaurants and dishes to find meals they truly enjoy. This paper presents RED, an automated recommendation system designed for iFood, Latin America's largest on-demand food delivery platform, to personalize the selection of curated food collections displayed to millions of users. Our approach employs a LightGBM classifier that scores collections based on three feature groups: collection characteristics, user-collection similarity, and contextual information. To address the cold-start problem of recommending newly created collections, we develop content-based representations using item embeddings and implement monotonicity constraints to improve generalization. We tackle data scarcity by bootstrapping from category carousel interactions and address visibility bias through unbiased sampling of impressions and purchases in production. The system demonstrates significant real-world impact through extensive A/B testing with 5-10% of iFood's user base. Online results of our A/B tests add up to 97% improvement in Card Conversion Rate and 1.4% increase in overall App Conversion Rate compared to popularity-based baselines. Notably, our offline accuracy metrics strongly correlate with online performance, enabling reliable impact prediction before deployment. To our knowledge, this is the first work to detail large-scale recommendation of curated food collections in a dynamic commercial environment.
6. Rigidity for graph product von Neumann algebras
Authors: Camille Horbez, Adrian Ioana • Published: 2025-08-05 • Source: arXiv
We establish rigidity theorems for graph product von Neumann algebras $M_\Gamma=*_{v,\Gamma}M_v$ associated to finite simple graphs $\Gamma$ and families of tracial von Neumann algebras $(M_v)_{v\in\Gamma}$. We consider the following three broad classes of vertex algebras: diffuse, diffuse amenable, and II$_1$ factors. In each of these three regimes, we exhibit a large class of graphs $\Gamma,\Lambda$ for which the following holds: any isomorphism $\theta$ between $M_\Gamma$ and $N_\Lambda$ ensures the existence of a graph isomorphism $\alpha:\Gamma\to\Lambda$, and tight relations between $\theta(M_v)$ and $N_{\alpha(v)}$ for every vertex $v\in\Gamma$, ranging from strong intertwining in both directions (in the sense of Popa), to unitary conjugacy in some cases. Our results lead to a wide range of applications to the classification of graph product von Neumann algebras and the calculation of their symmetry groups. First, we obtain general classification theorems for von Neumann algebras of right-angled Artin groups and of graph products of ICC groups. We also provide a new family of II$_1$ factors with trivial fundamental group, including all graph products of II$_1$ factors over graphs with girth at least $5$ and no vertices of degree $0$ or $1$. Finally, we compute the outer automorphism group of certain graph products of II$_1$ factors.
7. Cross-Model Semantics in Representation Learning
Authors: Saleh Nikooroo, Thomas Engel • Published: 2025-08-05 • Source: arXiv
The internal representations learned by deep networks are often sensitive to architecture-specific choices, raising questions about the stability, alignment, and transferability of learned structure across models. In this paper, we investigate how structural constraints--such as linear shaping operators and corrective paths--affect the compatibility of internal representations across different architectures. Building on the insights from prior studies on structured transformations and convergence, we develop a framework for measuring and analyzing representational alignment across networks with distinct but related architectural priors. Through a combination of theoretical insights, empirical probes, and controlled transfer experiments, we demonstrate that structural regularities induce representational geometry that is more stable under architectural variation. This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models. We conclude with a discussion on the implications of representational transferability for model distillation, modular learning, and the principled design of robust learning systems.
8. Intent Preserving Generation of Diverse and Idiomatic (Code-)Artifacts
Authors: Oliver Westphal • Published: 2025-08-05 • Source: arXiv
When automatically generating programming exercise tasks, one often also needs to automatically generate programs, at the very least when providing sample solutions is part of automated feedback. But programs can also be used as part of the exercise task description to communicate a task's requirements. Writing good program generators that produce varied yet idiomatic code while being easily adaptable for new tasks is challenging. The challenges are intensified if task generation requires additional artifacts, like a more general behavior specification for testing or additional textual descriptions. Manually writing generators for multiple different but strongly related artifacts gets complicated quickly. We present an approach where, instead of writing monolithic generators for multiple connected artifacts, one specifies a small set of abstract building blocks and, for each such building block, defines sets of concrete realizations for various kinds of artifacts. Then the intended structure of the resulting artifacts is specified as a composition of the small abstract building blocks. This abstract description then serves as the common source from which related artifacts can be derived automatically. The approach is generic in the kind of artifacts it can produce and is therefore adaptable to a wide range of contexts.
9. Likelihood Matching for Diffusion Models
Authors: Lei Qian, Wu Su, Yanqi Huang, Song Xi Chen • Published: 2025-08-05 • Source: arXiv
We propose a Likelihood Matching approach for training diffusion models by first establishing an equivalence between the likelihood of the target data distribution and a likelihood along the sample path of the reverse diffusion. To efficiently compute the reverse sample likelihood, a quasi-likelihood is considered to approximate each reverse transition density by a Gaussian distribution with matched conditional mean and covariance, respectively. The score and Hessian functions for the diffusion generation are estimated by maximizing the quasi-likelihood, ensuring a consistent matching of both the first two transitional moments between every two time points. A stochastic sampler is introduced to facilitate computation that leverages on both the estimated score and Hessian information. We establish consistency of the quasi-maximum likelihood estimation, and provide non-asymptotic convergence guarantees for the proposed sampler, quantifying the rates of the approximation errors due to the score and Hessian estimation, dimensionality, and the number of diffusion steps. Empirical and simulation evaluations demonstrate the effectiveness of the proposed Likelihood Matching and validate the theoretical results.
10. SlideAudit: A Dataset and Taxonomy for Automated Evaluation of Presentation Slides
Authors: Zhuohao Jerry Zhang, Ruiqi Chen, Mingyuan Zhong, Jacob O. Wobbrock • Published: 2025-08-05 • Source: arXiv
Automated evaluation of specific graphic designs like presentation slides is an open problem. We present SlideAudit, a dataset for automated slide evaluation. We collaborated with design experts to develop a thorough taxonomy of slide design flaws. Our dataset comprises 2400 slides collected and synthesized from multiple sources, including a subset intentionally modified with specific design problems. We then fully annotated them using our taxonomy through strictly trained crowdsourcing from Prolific. To evaluate whether AI is capable of identifying design flaws, we compared multiple large language models under different prompting strategies, and with an existing design critique pipeline. We show that AI models struggle to accurately identify slide design flaws, with F1 scores ranging from 0.331 to 0.655. Notably, prompting techniques leveraging our taxonomy achieved the highest performance. We further conducted a remediation study to assess AI's potential for improving slides. Among 82.0% of slides that showed significant improvement, 87.8% of them were improved more with our taxonomy, further demonstrating its utility.
11. Towards a classification of topological defects in $K3$ sigma models
Authors: Roberta Angius, Stefano Giaccari • Published: 2025-08-05 • Source: arXiv
Given a $K3$ surface, a supersymmetric non-linear K3 sigma model is the internal superconformal field theory (SCFT) in a six dimensional compactification of type IIA superstring on $\mathbb{R}^{1,5} \times K3$. These models have attracted attention due to the discovery of Mathieu moonshine phenomena for the elliptic genera of K3 surfaces, and have played a pivotal role in extending Mukai's theorem on classification of symplectic automorphisms of $K3$ surfaces. We report on recent progress (arXiv:2402.08719 [hep-th]) in characterizing topological defects in $K3$ models, generalizing the notion of symmetries to categories of topological operators supported on arbitrary codimension submanifolds with possibly non-invertible fusion rules. Taking advantage of the interpretation of Mukai lattice as the D-brane charge lattice, we present a number of general results for the category of topological defect lines preserving the superconformal algebra and spectral flow, obtained by studying their fusion with boundary states. While for certain K3 models infinitely many simple defects, and even a continuum, can occur, at generic points in the moduli space the category is actually trivial, i.e. it is generated by the identity defect. Furthermore, if a K3 model is at the attractor point for some BPS configuration of D-branes, then all topological defects have integral quantum dimension. We also introduce a conjecture that a continuum of topological defects arises if and only if the K3 model is a (possibly generalized) orbifold of a torus model. These general results are confirmed by the analysis of significant examples. We also point out the connection to recent studies of topological defects in the Conway moonshine module theory (arXiv:2412.21141 [hep-th],arXiv:2504.18619 [hep-th]).
12. Suppressing secondary shock waves in jam-absorption driving via string-stable support vehicles
Authors: Atsushi Suzuki, Akihiro Tokumitsu, Ryosuke Nishi • Published: 2025-08-05 • Source: arXiv
As a freeway-driving strategy, jam-absorption driving (JAD) clears a shock wave (i.e., an upstream-moving congestion wave) by modulating the velocity of a single vehicle (called the absorbing vehicle) upstream of the wave, and improves traffic performances such as fuel consumption and collision risk. However, the deceleration of the absorbing vehicle causes trailing vehicles to slow down, generating secondary shock waves that undermine these benefits. This study proposes a method to suppress secondary shock waves by controlling the behavior of connected and automated vehicles (CAVs) upstream of the absorbing vehicle, which are called support vehicles (SVs). A string-stability-based control method is applied in which SVs dynamically extend their time gaps to provide support driving (SD) for JAD. Numerical simulations using a traffic stream composed of 2000 vehicles, including five SVs, revealed that SD damped perturbations caused by the absorbing vehicle and prevented secondary shock waves, consistent with the head-to-tail string stability criterion. The combination of JAD and SD reduced total fuel consumption by 109 kg and collision risk (inverse time-to-collision) by 1940 compared with the JAD-only method, but increased total travel time by 49.3 hours. Reverting the extended time gap to its initial value reduced total travel time by 27.1 hours while maintaining low collision risk compared with the non-reverting method, albeit with increased total fuel consumption by 113 kg. SD alone could not eliminate the target shock wave. The results indicate that combining JAD and SD effectively eliminates the target shock wave while suppressing the secondary shock waves with guaranteed string stability even with only six CAVs (an absorbing vehicle and five SVs) out of 2000 vehicles, which is consistent with low CAV penetration rates anticipated in early implementation stages.
13. Testing Gauss-Bonnet Gravity with DESI BAO Data
Authors: Praveen Kumar Dhankar, Dalale Mhamdi, Albert Munyeshyaka, Darshan Kumar, Joseph Ntahompagaze, Taoufik Oualib • Published: 2025-08-05 • Source: arXiv
In the present paper, we observationally constrain $f(G)$ gravity at the background level using Type Ia supernovae from the Pantheon Plus (PP) sample, cosmic chronometer (CC) data, and the recent Baryon Acoustic Oscillation (BAO) measurements released by DESI. For the analysis, we consider two combinations of datasets: (i) PP + CC, and (ii) PP + CC + DESI BAO. In both cases, we determine the best-fit parameters by numerically solving the modified Friedmann equations for two distinct $f(G)$ models, namely the power-law and exponential forms. This is achieved through Markov Chain Monte Carlo (MCMC) simulations. To assess the statistical significance of the $f(G)$ models, we employ both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Our results show that both $f(G)$ models are statistically favored over the standard $\Lambda$CDM model. Notably, the exponential model exhibits an additional future transition at a redshift close to $-0.1$, indicating a possible return to a decelerating phase. This distinctive behavior sets it apart from both the power-law model and the $\Lambda$CDM scenario, which predict continued acceleration into the future.
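For readers unfamiliar with the two information criteria named in this abstract, the sketch below shows the standard definitions in the common Gaussian-likelihood form, where $-2\ln L_{\max}$ is replaced by the minimum chi-square. The function names and all numbers are placeholders chosen for illustration; none of them come from the paper.

```python
import math


def aic(chi2_min, k):
    """Akaike Information Criterion: chi2_min + 2k, with k free parameters."""
    return chi2_min + 2 * k


def bic(chi2_min, k, n):
    """Bayesian Information Criterion: chi2_min + k * ln(n), with n data points."""
    return chi2_min + k * math.log(n)


def delta(model_value, reference_value):
    """Difference relative to a reference model; negative values favor the model."""
    return model_value - reference_value


# Placeholder values only (NOT results from the paper).
n_points = 100
reference = {"chi2_min": 100.0, "k": 2}   # stands in for a baseline model
candidate = {"chi2_min": 95.0, "k": 3}    # stands in for an extended model

d_aic = delta(aic(candidate["chi2_min"], candidate["k"]),
              aic(reference["chi2_min"], reference["k"]))
d_bic = delta(bic(candidate["chi2_min"], candidate["k"], n_points),
              bic(reference["chi2_min"], reference["k"], n_points))
print(d_aic, d_bic)
```

Both criteria penalize extra parameters, but BIC's penalty grows with the number of data points, so a model can be favored by AIC and disfavored by BIC on the same fit.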
14. CADD: Context aware disease deviations via restoration of brain images using normative conditional diffusion models
Authors: Ana Lawry Aguila, Ayodeji Ijishakin, Juan Eugenio Iglesias, Tomomi Takenaga, Yukihiro Nomura, Takeharu Yoshikawa, Osamu Abe, Shouhei Hanaoka • Published: 2025-08-05 • Source: arXiv
Applying machine learning to real-world medical data, e.g. from hospital archives, has the potential to revolutionize disease detection in brain images. However, detecting pathology in such heterogeneous cohorts is a difficult challenge. Normative modeling, a form of unsupervised anomaly detection, offers a promising approach to studying such cohorts where the "normal" behavior is modeled and can be used at subject level to detect deviations relating to disease pathology. Diffusion models have emerged as powerful tools for anomaly detection due to their ability to capture complex data distributions and generate high-quality images. Their performance relies on image restoration; differences between the original and restored images highlight potential abnormalities. However, unlike normative models, these diffusion model approaches do not incorporate clinical information which provides important context to guide the disease detection process. Furthermore, standard approaches often poorly restore healthy regions, resulting in poor reconstructions and suboptimal detection performance. We present CADD, the first conditional diffusion model for normative modeling in 3D images. To guide the healthy restoration process, we propose a novel inference inpainting strategy which balances anomaly removal with retention of subject-specific features. Evaluated on three challenging datasets, including clinical scans, which may have lower contrast, thicker slices, and motion artifacts, CADD achieves state-of-the-art performance in detecting neurological abnormalities in heterogeneous cohorts.
15. SolarSeer: Ultrafast and accurate 24-hour solar irradiance forecasts outperforming numerical weather prediction across the USA
Authors: Mingliang Bai, Zuliang Fang, Shengyu Tao, Siqi Xiang, Jiang Bian, Yanfei Xiang, Pengcheng Zhao, Weixin Jin, Jonathan A. Weyn, Haiyu Dong, Bin Zhang, Hongyu Sun, Kit Thambiratnam, Qi Zhang, Hongbin Sun, Xuan Zhang, Qiuwei Wu • Published: 2025-08-05 • Source: arXiv
Accurate 24-hour solar irradiance forecasting is essential for the safe and economic operation of solar photovoltaic systems. Traditional numerical weather prediction (NWP) models represent the state-of-the-art in forecasting performance but rely on computationally costly data assimilation and solving complicated partial differential equations (PDEs) that simulate atmospheric physics. Here, we introduce SolarSeer, an end-to-end large artificial intelligence (AI) model for solar irradiance forecasting across the Contiguous United States (CONUS). SolarSeer is designed to directly map the historical satellite observations to future forecasts, eliminating the computational overhead of data assimilation and PDEs solving. This efficiency allows SolarSeer to operate over 1,500 times faster than traditional NWP, generating 24-hour cloud cover and solar irradiance forecasts for the CONUS at 5-kilometer resolution in under 3 seconds. Compared with the state-of-the-art NWP in the CONUS, i.e., High-Resolution Rapid Refresh (HRRR), SolarSeer significantly reduces the root mean squared error of solar irradiance forecasting by 27.28% in reanalysis data and 15.35% across 1,800 stations. SolarSeer also effectively captures solar irradiance fluctuations and significantly enhances the first-order irradiance difference forecasting accuracy. SolarSeer's ultrafast, accurate 24-hour solar irradiance forecasts provide strong support for the transition to sustainable, net-zero energy systems.
16. VITA: Variational Pretraining of Transformers for Climate-Robust Crop Yield Forecasting
Authors: Adib Hasan, Mardavij Roozbehani, Munther Dahleh • Published: 2025-08-05 • Source: arXiv
Accurate crop yield forecasting is essential for global food security. However, current AI models systematically underperform when yields deviate from historical trends. This issue arises from key data challenges, including a major asymmetry between rich pretraining weather datasets and the limited data available for fine-tuning. We introduce VITA (Variational Inference Transformer for Asymmetric data), a variational pretraining framework that addresses this asymmetry. Instead of relying on input reconstruction, VITA uses detailed weather variables as proxy targets during pretraining and learns to predict rich atmospheric states through self-supervised feature masking. This allows the model to be fine-tuned using only basic weather statistics during deployment. Applied to 763 counties in the U.S. Corn Belt, VITA achieves state-of-the-art performance in predicting corn and soybean yields across all evaluation scenarios. While it consistently delivers superior performance under normal conditions, its advantages are particularly pronounced during extreme weather years, with statistically significant improvements (paired t-test, $p \approx 0.01$). Importantly, VITA outperforms prior frameworks like GNN-RNN using less data, making it more practical for real-world use--particularly in data-scarce regions. This work highlights how domain-aware AI design can overcome data limitations and support resilient agricultural forecasting in a changing climate.
17. PyLate: Flexible Training and Retrieval for Late Interaction Models
Authors: Antoine Chaffin, Raphaël Sourty • Published: 2025-08-05 • Source: arXiv
Neural ranking has become a cornerstone of modern information retrieval. While single vector search remains the dominant paradigm, it suffers from the shortcoming of compressing all the information into a single vector. This compression leads to notable performance degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving individual token embeddings and computing similarity via the MaxSim operator. This architecture has demonstrated superior empirical advantages, including enhanced out-of-domain generalization, long-context handling, and performance in complex retrieval scenarios. Despite these compelling empirical results and clear theoretical advantages, the practical adoption and public availability of late interaction models remain low compared to their single-vector counterparts, primarily due to a lack of accessible and modular tools for training and experimenting with such models. To bridge this gap, we introduce PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector architectures natively, inheriting its efficient training, advanced logging, and automated model card generation while requiring minimal code changes to code templates users are already familiar with. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models, thereby unlocking their full potential in modern IR systems. Finally, PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.
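The MaxSim operator mentioned in this abstract can be illustrated with a few lines of NumPy. This is a generic sketch of late-interaction scoring, not the PyLate API: the function name `maxsim_score` and the toy 2-dimensional embeddings are assumptions made for the example.

```python
import numpy as np


def maxsim_score(query_emb, doc_emb):
    """Late-interaction score between one query and one document.

    query_emb has shape (n_query_tokens, dim); doc_emb has shape (n_doc_tokens, dim).
    For each query token, take the maximum dot-product similarity over all
    document tokens, then sum these maxima over the query tokens.
    """
    sim = query_emb @ doc_emb.T          # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())


# Toy unit-norm token embeddings (hypothetical values).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])  # matches both query tokens
doc_b = np.array([[0.6, 0.8]])                          # one token, partial match

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)  # True
```

Because every query token keeps its own best match, a document is not forced to compress all of its content into one vector, which is the limitation of single-vector search that the abstract describes.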
18. Guided Reality: Generating Visually-Enriched AR Task Guidance with LLMs and Vision Models
Authors: Ada Yi Zhao, Aditya Gunturu, Ellen Yi-Luen Do, Ryo Suzuki • Published: 2025-08-05 • Source: arXiv
Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks rich visual augmentations to effectively embed instructions into spatial context for a better user understanding. We present Guided Reality, a fully automated AR system that generates embedded and dynamic visual guidance based on step-by-step instructions. Our system integrates LLMs and vision models to: 1) generate multi-step instructions from user queries, 2) identify appropriate types of visual guidance, 3) extract spatial information about key interaction points in the real world, and 4) embed visual guidance in physical space to support task execution. Drawing from a corpus of user manuals, we define five categories of visual guidance and propose an identification strategy based on the current step. We evaluate the system through a user study (N=16), completing real-world tasks and exploring the system in the wild. Additionally, four instructors shared insights on how Guided Reality could be integrated into their training workflows.
19. EmbedGrad: Gradient-Based Prompt Optimization in Embedding Space for Large Language Models
Authors: Xiaoming Hou, Jiquan Zhang, Zibin Lin, DaCheng Tao, Shengli Zhang • Published: 2025-08-05 • Source: arXiv
Effectively adapting powerful pretrained foundation models to diverse tasks remains a key challenge in AI deployment. Current approaches primarily follow two paradigms: discrete optimization of text prompts through prompt engineering, or continuous adaptation via additional trainable parameters. Both exhibit limitations: discrete methods lack refinement precision while parameter-based techniques increase complexity and reduce interpretability. To address these constraints, we propose EmbedGrad, a novel framework that optimizes text prompt embeddings through gradient-based refinement. Our approach uniquely decouples training from deployment: during optimization, labeled examples guide precise embedding adjustments while preserving semantic meaning; during inference, only optimized embeddings integrate with user queries. This enables fine-grained calibration impossible in text space, such as enhancing the reasoning capability of prompts like "please reason step by step". Comprehensive evaluations across mathematical reasoning, sentiment analysis, and causal judgment tasks demonstrate EmbedGrad's effectiveness: optimizing this reasoning prompt for Qwen2.5-Math-1.5B increased accuracy from 14.74% to 58.96% on mathematical problems. Consistent improvements were observed across model scales (0.5B-14B) and all tasks, with particularly significant gains for smaller models on complex problems like causal judgment. By bridging prompt engineering and parameter efficiency without architectural changes, our work establishes embedding refinement as a powerful new paradigm for task adaptation.
20. Understanding Demand for Shared Autonomous Micro-Mobility
Authors: Naroa Coretti Sanchez, Kent Larson • Published: 2025-08-05 • Source: arXiv
This study examines the behavioral and environmental implications of shared autonomous micro-mobility systems, focusing on autonomous bicycles and their integration with transit in the U.S. While prior research has addressed operational and lifecycle aspects, a critical gap remains in understanding which modes these services are likely to substitute, who is most inclined to adopt them, and how service attributes influence user decisions. We design a context-aware stated preference survey grounded in real-world trips and estimate discrete choice models, including a hybrid model incorporating latent attitudes. Findings indicate that adoption, mode shift, and environmental impacts are highly sensitive to service design. Scenarios with minimal wait and cost yield high adoption but increase emissions, while moderate waits are more likely to reduce impacts. Adoption likelihood varies with demographic characteristics, and outcomes depend on city type, context, and infrastructure assumptions. These insights can inform the development of more sustainable and equitable mobility systems.