🤖 AI Research Papers

August 06, 2025

🤖 AI-Generated Research Summary

Comprehensive Summary of 20 Recent Research Papers in AI, LLMs, Agents, and Workflows


1. Key Research Trends

a. Large Language Models (LLMs) and Distillation
- Several papers focus on improving LLMs for specialized tasks (e.g., theorem proving, code generation, ad recommendations).
- Distillation and reinforcement learning are prominent for enhancing LLM efficiency and alignment.

b. AI Agents and Workflow Automation
- There is a clear trend toward developing frameworks and tools for training and deploying AI agents, with an emphasis on flexibility, decoupling, and integration with existing systems.

c. Explainability and Societal Impact
- Explainable AI (XAI) and frameworks for assessing societal impact are gaining traction, reflecting a shift from pure performance to trust, transparency, and responsible deployment.

d. Domain-Specific AI Applications
- AI is being applied to diverse domains: video and image protection, speech-to-LaTeX conversion, histopathology, hardware design, and agricultural control, demonstrating the field’s broadening scope.

e. Methodological Rigor and Reproducibility
- Papers address reproducibility in ML evaluation and propose new methodologies for more robust, reliable, and interpretable results.


Conclusion

This collection of papers highlights a vibrant and rapidly evolving AI landscape, with significant advances in LLMs, agent frameworks, explainability, and domain-specific applications. The field is moving toward more robust, transparent, and generalizable AI systems, with a growing emphasis on societal impact, reproducibility, and practical deployment. Researchers and practitioners should focus on integrating these trends and methodologies to build trustworthy, efficient, and impactful AI solutions.

📚 arXiv (20 papers)
1. Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Authors: Pulkit Kumar, Shuaiyi Huang, Matthew Walmer, Sai Saketh Rambhatla, Abhinav Shrivastava • Published: 2025-08-05 • Source: arXiv
Video understanding requires effective modeling of both motion and appearance information, particularly for few-shot action recognition. While recent advances in point tracking have been shown to improve few-shot action recognition, two fundamental challenges persist: selecting informative points to track and effectively modeling their motion patterns. We present Trokens, a novel approach that transforms trajectory points into semantic-aware relational tokens for action recognition. First, we introduce a semantic-aware sampling strategy to adaptively distribute tracking points based on object scale and semantic relevance. Second, we develop a motion modeling framework that captures both intra-trajectory dynamics through the Histogram of Oriented Displacements (HoD) and inter-trajectory relationships to model complex action patterns. Our approach effectively combines these trajectory tokens with semantic features to enhance appearance features with motion information, achieving state-of-the-art performance across six diverse few-shot action recognition benchmarks: Something-Something-V2 (both full and small splits), Kinetics, UCF101, HMDB51, and FineGym. For the project page, see https://trokens-iccv25.github.io
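As a rough illustration of the HoD descriptor named in the abstract, the sketch below bins the orientations of a single tracked point's frame-to-frame displacements; the bin count, magnitude weighting, and normalization are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def histogram_of_oriented_displacements(traj, n_bins=8):
    # traj: (T, 2) coordinates of one tracked point over T frames
    disp = np.diff(traj, axis=0)                 # (T-1, 2) frame-to-frame displacements
    angles = np.arctan2(disp[:, 1], disp[:, 0])  # orientation of each displacement
    mags = np.linalg.norm(disp, axis=1)          # magnitude used as a weight (assumption)
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi), weights=mags)
    return hist / (hist.sum() + 1e-8)            # normalized per-trajectory descriptor

traj = np.cumsum(np.random.randn(16, 2), axis=0)  # toy 16-frame trajectory
print(histogram_of_oriented_displacements(traj))
```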
2. On the Efficiency of Producing Gamma-Ray Bursts from Isolated Population III Stars
Authors: Gibran Morales-Rivera, Ramandeep Gill, S. Jane Arthur, Paz Beniamini, Jonathan Granot • Published: 2025-08-05 • Source: arXiv
The rate of long-duration gamma-ray bursts (GRBs) from isolated Pop III stars is not well known, as it depends on our poor understanding of their initial mass function (IMF), rotation rates, stellar evolution, and mass loss. A sub-population of massive ($M_{\rm ZAMS}\gtrsim20M_\odot$) Pop III stars is expected to suffer core-collapse and launch a relativistic jet that would power a GRB. In the collapsar scenario, a key requirement is that the pre-supernova star imparts sufficient angular momentum to the remnant black hole to form an accretion disc and launch a relativistic jet, which demands rapid initial rotation of the progenitor star and suppression of line-driven mass loss during its chemically homogeneous evolution. Here we explore a grid of stellar evolution models of Pop III stars with masses $20\leq M_{\rm ZAMS}/M_\odot \leq 100$, which are initially rotating with surface angular velocities $0.6\leq \Omega_0/\Omega_{\rm crit}\leq 0.9$, where centrifugally-driven mass loss ensues for $\Omega>\Omega_{\rm crit}$. Realistic accretion and jet propagation models are used to derive the initial black hole masses and spins, and jet breakout times for these stars. The GRB production efficiency is obtained over a phase space comprising progenitor initial mass, rotation, and wind efficiency. For modest wind efficiency of $\eta_{\rm wind}=0.45-0.35$, the Pop III GRB production efficiency is $\eta_{\rm GRB}\sim10^{-5}-3\times10^{-4}\,M_\odot^{-1}$, respectively, for a top-heavy IMF. This yields an observable all-sky equivalent rate of $\sim2-40\,{\rm yr}^{-1}$ by Swift, with 75% of the GRBs located at $z\lesssim8$. If the actual observed rate is much lower, then this would imply $\eta_{\rm wind}>0.45$, which leads to significant loss of mass and angular momentum that renders isolated Pop III stars incapable of producing GRBs and favours a binary scenario instead.
3. Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
Authors: Gérard Ben Arous, Murat A. Erdogdu, N. Mert Vural, Denny Wu • Published: 2025-08-05 • Source: arXiv
We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $y \propto \sum_{j=1}^{r}\lambda_j \sigma\left(\langle \boldsymbol{\theta_j}, \boldsymbol{x}\rangle\right), \boldsymbol{x} \sim N(0,\boldsymbol{I}_d)$, $\sigma$ is the 2nd Hermite polynomial, and $\lbrace\boldsymbol{\theta}_j \rbrace_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^\beta$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $\lambda_j\asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.
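For readers who want to experiment with this setting, here is a minimal sampler of the stated data model, assuming the probabilist's second Hermite polynomial $\sigma(z) = z^2 - 1$ and a unit proportionality constant.

```python
import numpy as np

def sample_teacher_data(n, d=256, r=16, alpha=1.0, seed=0):
    # y = sum_j lambda_j * sigma(<theta_j, x>), with sigma(z) = z^2 - 1
    rng = np.random.default_rng(seed)
    theta, _ = np.linalg.qr(rng.standard_normal((d, r)))   # orthonormal signal directions
    lam = np.arange(1, r + 1, dtype=float) ** (-alpha)     # power-law second-layer coefficients
    x = rng.standard_normal((n, d))                        # x ~ N(0, I_d)
    z = x @ theta                                          # projections <theta_j, x>
    y = (z ** 2 - 1) @ lam                                 # second Hermite polynomial readout
    return x, y

x, y = sample_teacher_data(1024)
print(x.shape, y.shape)  # (1024, 256) (1024,)
```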
4. Prospects of a New $L_5$ Trojan Flyby Target for the Lucy Mission
Authors: Luis E. Salazar Manzano, David W. Gerdes, Kevin J. Napier, Hsing Wen Lin, Fred C. Adams, Tessa Frincke, Simone Marchi, Keith S. Noll, John Spencer • Published: 2025-08-05 • Source: arXiv
NASA's Lucy spacecraft is en route to conduct the first close encounter with Jupiter's Trojans. While most scheduled flybys lie in the $L_4$ cloud, the only $L_5$ target is the Patroclus-Menoetius binary. Since each flyby offers unique insights into target and population properties unattainable from Earth, we examine the feasibility of including an additional, yet unknown, $L_5$ target while minimizing the impact on Lucy's primary mission. We use the background $L_5$ Trojans brighter than the completeness limit to model their absolute magnitude, spatial, and orbital distributions. A semi-analytical approach estimates the number of Trojans accessible to Lucy for a given $\Delta v$ budget in both pre- and post-Patroclus scenarios. Our results indicate that, while it is unlikely that any suitable Trojan lies on Lucy's nominal path, a moderate $\Delta v$ investment ($35-50\,\mathrm{m/s}$) could enable a sub-kilometer ($500-700\,\mathrm{m}$) flyby prior to the Patroclus encounter. Post-Patroclus, the likelihood of a similar flyby is $\sim60\%$ for $\Delta v\sim$ 50 m/s. Simulations with synthetic Trojans reveal that potential targets cluster near the node opposite to the encounter window, producing an optimal search period in late 2026 for both scenarios. Surveying the densest $10\%$ of this region would require under 5 nights with Subaru/HSC or under 2 nights with Rubin, using shift-and-stack techniques. A successful sub-kilometric flyby would expand Lucy's Trojan target size range and provide new constraints on collisional evolution and the long-standing asymmetry in the $L_4/L_5$ clouds. This nodal-clustering strategy could guide target searches in future Lucy extensions or other planetary flyby missions.
5. Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang • Published: 2025-08-05 • Source: arXiv
We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with the agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed in diverse ways (e.g., using frameworks like LangChain, OpenAI Agents SDK, AutoGen, or building from scratch) with almost ZERO code modifications. By formulating agent execution as a Markov decision process, we define a unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module, allowing us to decompose trajectories generated by ANY agents into training transitions. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. For the system design, we introduce a Training-Agent Disaggregation architecture and bring agent observability frameworks into the agent runtime, providing a standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.
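A minimal sketch of the trajectory-to-transition decomposition the abstract describes; the naive last-step credit assignment below is an assumption for illustration, since the abstract does not specify LightningRL's credit assignment module.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    observation: str   # prompt/context the LLM saw at this step
    action: str        # the LLM's generated output at this step
    reward: float      # per-step credit derived from the episode outcome

def decompose_trajectory(steps, final_reward):
    # steps: list of (observation, action) pairs from ANY agent runtime.
    # Toy credit assignment: attribute the episode return to the final step.
    transitions = []
    for i, (obs, act) in enumerate(steps):
        r = final_reward if i == len(steps) - 1 else 0.0
        transitions.append(Transition(obs, act, r))
    return transitions

steps = [("plan the SQL query", "SELECT ..."), ("check the result", "fix the JOIN")]
print(decompose_trajectory(steps, final_reward=1.0))
```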
6. Fast Computation of Path Integrals of Killed Processes Using Confined Stochastic Bridges
Authors: Henrique B. N. Monteiro, Daniel M. Tartakovsky • Published: 2025-08-05 • Source: arXiv
Expectations of path integrals of killed stochastic processes play a central role in several applications across physics, chemistry, and finance. Simulation-based evaluation of these functionals is often biased and numerically expensive due to the need to explicitly approximate stochastic paths and the challenge of correctly modeling them in the neighborhood of the killing boundary. We consider Itô processes killed at the boundary of some set in the $n$-dimensional space and introduce a novel stochastic method with negligible bias and lower computational cost to evaluate path integrals without simulated paths. Our approach draws a connection between stochastic bridges and killed processes to sample only exit times and locations instead of the full path. We apply it to a Wiener process killed in the $n$-ball and explicitly derive the density of the Brownian bridge confined to the $n$-ball for $n = 1, 2, 3$. Finally, we present two numerical examples that demonstrate the efficiency and negligible bias of the novel procedure compared to an evaluation using the standard Euler-Maruyama method.
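For context, here is the standard simulated-path baseline that the proposed method avoids: an Euler-Maruyama estimate of a path integral of a Wiener process killed at the unit-ball boundary. The step size, horizon, and test functional are illustrative assumptions; the bias comes from checking the killing condition only on the time grid.

```python
import numpy as np

def em_killed_path_integral(f, x0, T, dt=1e-3, n_paths=10_000, seed=0):
    # Monte Carlo estimate of E[ integral_0^tau f(X_t) dt ] for a Wiener
    # process X killed at the boundary of the unit ball (tau = exit time).
    rng = np.random.default_rng(seed)
    x = np.tile(np.asarray(x0, float), (n_paths, 1))
    alive = np.ones(n_paths, dtype=bool)
    acc = np.zeros(n_paths)
    for _ in range(int(T / dt)):
        acc[alive] += f(x[alive]) * dt
        x[alive] += np.sqrt(dt) * rng.standard_normal(x[alive].shape)
        alive &= np.linalg.norm(x, axis=1) < 1.0     # kill on the grid (source of bias)
    return acc.mean()

# f = 1 gives E[tau]; for 2-d Brownian motion started at the center, E[tau] = 1/2.
print(em_killed_path_integral(lambda x: np.ones(len(x)), x0=[0.0, 0.0], T=2.0))
```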
7. Beyond risk: A proto-framework for assessing the societal impact of AI systems
Authors: Willem Fourie • Published: 2025-08-05 • Source: arXiv
In the discourse on AI regulation, 'responsible AI' is the dominant paradigm, with the focus on mitigating the risks related to AI systems. While this focus is important and necessary, it has limited use for a systematic consideration of AI's societal impact. This paper proposes a proto-framework for assessing the societal impact of AI systems by operationalising the concept of freedom. This proto-framework is intended as a step towards a fully operationalised framework to be used in policymaking contexts. By drawing on Kantian philosophy and related contemporary interpretations, freedom is developed as the counterpart to the concept of responsibility. Two dimensions of freedom are developed in further detail: freedom as capability and freedom as opportunity. These two dimensions of freedom are then applied in a proto-framework that systematically considers AI's impact on society using the Sustainable Development Goals. This proto-framework aims to complement current risk-based approaches and thereby offers a first step towards operationalising the concept of freedom in AI regulation.
8. Forest vs Tree: The $(N, K)$ Trade-off in Reproducible ML Evaluation
Authors: Deepak Pandita, Flip Korn, Chris Welty, Christopher M. Homan • Published: 2025-08-05 • Source: arXiv
Reproducibility is a cornerstone of scientific validation and of the authority it confers on its results. Reproducibility in machine learning evaluations leads to greater trust, confidence, and value. However, the ground truth responses used in machine learning often necessarily come from humans, among whom disagreement is prevalent, and surprisingly little research has studied the impact of effectively ignoring disagreement in these responses, as is typically the case. One reason for the lack of research is that budgets for collecting human-annotated evaluation data are limited, and obtaining more samples from multiple annotators for each example greatly increases the per-item annotation costs. We investigate the trade-off between the number of items ($N$) and the number of responses per item ($K$) needed for reliable machine learning evaluation. We analyze a diverse collection of categorical datasets for which multiple annotations per item exist, and simulated distributions fit to these datasets, to determine the optimal $(N, K)$ configuration, given a fixed budget ($N \times K$), for collecting evaluation data and reliably comparing the performance of machine learning models. Our findings show, first, that accounting for human disagreement is achievable with $N \times K$ of no more than 1000 (and often much lower) for every dataset tested, on at least one metric. Moreover, this minimal $N \times K$ almost always occurred for $K > 10$. Furthermore, the nature of the trade-off between $K$ and $N$, or whether one exists at all, depends on the evaluation metric, with metrics that are more sensitive to the full distribution of responses performing better at higher levels of $K$. Our methods can be used to help ML practitioners get more effective test data by finding the optimal metrics and number of items and annotations per item to collect to get the most reliability for their budget.
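A toy version of the budget experiment, assuming a crude two-population agreement model (most items are easy, a fraction are "hard" with high annotator disagreement) and majority-vote reliability as the metric; the paper's datasets, metrics, and fitted distributions are far richer.

```python
import numpy as np

def reliability(n_items, k_resp, trials=200, p_hard=0.3, seed=0):
    # Probability that the K-response majority vote recovers the majority label,
    # averaged over simulated datasets (purely illustrative assumptions).
    rng = np.random.default_rng(seed)
    hits = []
    for _ in range(trials):
        hard = rng.random(n_items) < p_hard
        p = np.where(hard, rng.uniform(0.5, 0.7, n_items), 0.95)  # P(response = majority label)
        votes = rng.binomial(k_resp, p)                           # agreeing responses per item
        hits.append(np.mean(votes > k_resp / 2))
    return float(np.mean(hits))

budget = 1000
for k in (1, 5, 10, 20):   # compare (N, K) splits at a fixed budget N * K
    print(f"K={k:2d} N={budget // k:4d} reliability={reliability(budget // k, k):.3f}")
```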
9. Improving Q-Learning for Real-World Control: A Case Study in Series Hybrid Agricultural Tractors
Authors: Hend Abououf, Sidra Ghayour Bhatti, Qadeer Ahmed • Published: 2025-08-05 • Source: arXiv
The variable and unpredictable load demands in hybrid agricultural tractors make it difficult to design optimal rule-based energy management strategies, motivating the use of adaptive, learning-based control. However, existing approaches often rely on basic fuel-based rewards and do not leverage expert demonstrations to accelerate training. In this paper, first, the performance of Q-value-based reinforcement learning algorithms is evaluated for powertrain control in a hybrid agricultural tractor. Three algorithms, Double Q-Learning (DQL), Deep Q-Networks (DQN), and Double DQN (DDQN), are compared in terms of convergence speed and policy optimality. Second, a piecewise domain-specific reward-shaping strategy is introduced to improve learning efficiency and steer agent behavior toward engine fuel-efficient operating regions. Third, the design of the experience replay buffer is examined, with a focus on the effects of seeding the buffer with expert demonstrations and analyzing how different types of expert policies influence convergence dynamics and final performance. Experimental results demonstrate that (1) DDQN achieves 70% faster convergence than DQN in this application domain, (2) the proposed reward shaping method effectively biases the learned policy toward fuel-efficient outcomes, and (3) initializing the replay buffer with structured expert data leads to a 33% improvement in convergence speed.
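For reference, the Double DQN target that separates DDQN from vanilla DQN in this comparison: the online network selects the next action and the target network evaluates it, which reduces overestimation bias. This is the standard formulation in a PyTorch sketch; the tractor-specific networks and state encoding are assumed.

```python
import torch

def ddqn_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    # done is a float mask (1.0 where the episode ended).
    with torch.no_grad():
        next_a = online_net(next_obs).argmax(dim=1, keepdim=True)    # online net selects
        next_q = target_net(next_obs).gather(1, next_a).squeeze(1)   # target net evaluates
        return reward + gamma * (1.0 - done) * next_q

online, target = torch.nn.Linear(4, 3), torch.nn.Linear(4, 3)        # stand-in Q-networks
obs = torch.randn(2, 4)
print(ddqn_target(online, target, torch.tensor([1.0, 0.0]), obs, torch.tensor([0.0, 1.0])))
```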
10. The role of migration traps in the formation of binary black holes in AGN disks
Authors: Maria Paola Vaccaro, Yannick Seif, Michela Mapelli • Published: 2025-08-05 • Source: arXiv
Binary black holes (BBHs) forming in the accretion disks of active galactic nuclei (AGNs) represent a promising channel for gravitational-wave production. BBHs are typically expected to originate at migration traps, i.e. radial locations where the Type I migration of embedded stellar-mass black holes (BHs) transitions from outwards to inwards. In this work, we test this assumption by explicitly simulating the radial migration of BH pairs in AGN disks under different torque prescriptions, including thermal effects and the switch to Type II migration. We quantify where and when binaries form as a function of supermassive BH (SMBH) mass, disk viscosity, and migrating BH mass. We find that while the majority of pair-up events occur near migration traps, a substantial fraction takes place elsewhere in the disk, particularly for high-viscosity disks ($\alpha=0.1-0.4$) and SMBHs with mass above a threshold of $10^{7.5}$ solar masses, where differential migration is most efficient. The inclusion of thermal torques favors pair-up in outer locations of the disk and facilitates rapid pair-up. We also investigate hierarchical BBH formation, showing that higher-generation pair-ups are more tightly clustered around trap locations. Our results provide realistic prescriptions for BBH pair-up locations and timescales, highlighting the limitations of assuming fixed BBH formation sites.
11. LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay
Authors: Soumik Dey, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li • Published: 2025-08-05 • Source: arXiv
Sellers at eBay are recommended keyphrases to bid on to enhance the performance of their advertising campaigns. The relevance of these keyphrases is crucial in avoiding the overcrowding of search systems with irrelevant items and maintaining a positive seller perception. It is essential that keyphrase recommendations align with both seller and Search judgments regarding auctions. Due to the difficulty in procuring negative human judgment at scale, employing LLM-as-a-judge to mimic seller judgment has been established as the norm in several studies. This study introduces a novel two-step LLM distillation process from an LLM judge used to debias our Embedding Based Retrieval (EBR) model from the various biases that exist in click data. We distill from an LLM teacher via a cross-encoder assistant into a bi-encoder student using a multi-task training approach, ultimately employing the student bi-encoder to retrieve relevant advertiser keyphrases. We show that integrating a knowledge distillation process from LLMs in a multi-task training setup enhances bi-encoder performance in retrieving relevant advertiser keyphrases at eBay.
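A hedged sketch of cross-encoder-to-bi-encoder distillation of the kind described; the in-batch similarity matrix, softened KL objective, and temperature are illustrative assumptions rather than eBay's exact multi-task loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(bi_encoder, teacher_scores, queries, keyphrases, tau=2.0):
    # bi_encoder is assumed to map a batch of texts to L2-normalized embeddings;
    # teacher_scores is the (B, B) matrix of cross-encoder relevance logits.
    q = bi_encoder(queries)         # (B, d) listing/query embeddings
    k = bi_encoder(keyphrases)      # (B, d) keyphrase embeddings
    student = (q @ k.T) / tau       # in-batch similarity logits
    teacher = teacher_scores / tau
    return F.kl_div(F.log_softmax(student, dim=1),
                    F.softmax(teacher, dim=1), reduction="batchmean")
```

The appeal of this pattern is that the expensive cross-encoder scoring can run offline, while the distilled bi-encoder stays cheap enough for large-scale keyphrase retrieval.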
12. Radiative Nonideal MHD Simulations of Inner Protoplanetary Disks: Temperature Structures, Asymmetric Winds, and Episodic Surface Accretion
Authors: Shoji Mori, Xue-Ning Bai, Kengo Tomida • Published: 2025-08-05 • Source: arXiv
We perform two-dimensional global magnetohydrodynamic (MHD) simulations including the full nonideal MHD effects (Ohmic diffusion, Hall effect, and ambipolar diffusion) and approximate radiation transport to understand the dynamics and thermal structure of the inner protoplanetary disks (PPDs). We have developed a simple radiative transfer model for PPDs that reasonably treats stellar non-thermal (XUV), stellar thermal (optical/infrared), and re-emitted radiations, reproducing the temperature structures from Monte Carlo radiative transfer. Our simulations show fast one-sided surface accretion ($\sim 10\%$ of Keplerian velocity) and asymmetric disk winds when the vertical magnetic field is aligned with the disk angular momentum. The asymmetry is due to the failure of the wind on the side with the accretion layer. On the accreting surface, clumps are repeatedly generated and accrete, driven by radiative feedback. For the anti-aligned fields, surface accretion becomes more moderate and time-variable, while the winds remain largely symmetric. For the thermal structure, accretion heating does not affect the disk temperature in any of our runs. This is because (1) the accretion energy dissipates via Joule heating at 2--3 gas scale heights, where low optical depth enables efficient radiative cooling, and (2) the winds remove $\gtrsim 10\%$ of the accretion energy. In contrast, the winds enhance radiative heating by elevating the irradiation front. These results highlight the importance of coupling between gas dynamics and radiation transport in PPDs, and provide observable magnetic activities such as fast episodic accretion, wind asymmetry, and molecular survival in XUV-irradiated winds.
13. Interferometric signature of higher-order images in a parametrized framework
Authors: Fabiano Feleppa, Fabio Aratore, Valerio Bozza • Published: 2025-08-05 • Source: arXiv
This paper investigates gravitational lensing in the strong deflection limit, focusing particularly on higher-order images produced near compact objects such as black holes and their observable impact through the visibility function. Employing a robust parametrization framework proposed by Rezzolla and Zhidenko, the study systematically explores deviations from the Schwarzschild metric. A detailed theoretical analysis of interferometric observables is provided, highlighting how higher-order images imprint distinctive, measurable patterns in the visibility function, notably characterized by a staircase-like structure. By parametrically varying metric coefficients, the analysis reveals clear dependencies between spacetime deviations and key observational signatures, specifically the step heights and periodicities in the interferometric visibility. The results enhance the theoretical groundwork for interpreting data from advanced interferometric observations, potentially enabling precise tests of general relativity and the discrimination among alternative gravitational theories.
14. Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Authors: Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin • Published: 2025-08-05 • Source: arXiv
We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models, including closed-source systems with publicly reported performance, under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.
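The model-averaging ingredient can be as simple as a uniform average of checkpoint weights; the sketch below assumes uniform weighting and plain PyTorch state dicts, neither of which the abstract specifies.

```python
import torch

def average_checkpoints(paths, out_path):
    # Uniformly average the parameter tensors of several checkpoints.
    avg = None
    for p in paths:
        sd = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.float().clone() for k, v in sd.items()}
        else:
            for k in avg:
                avg[k] += sd[k].float()
    for k in avg:
        avg[k] /= len(paths)
    torch.save(avg, out_path)

# average_checkpoints(["ckpt_step1000.pt", "ckpt_step2000.pt"], "merged.pt")  # hypothetical paths
```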
15. The problem of sharp notch in microstructured solids governed by dipolar gradient elasticity
Authors: P. A. Gourgiotis, M. D. Sifnaiou, H. G. Georgiadis • Published: 2025-08-05 • Source: arXiv
In this paper, we deal with the asymptotic problem of a body of infinite extent with a notch (re-entrant corner) under remotely applied plane-strain or anti-plane shear loadings. The problem is formulated within the framework of the Toupin-Mindlin theory of dipolar gradient elasticity. This generalized continuum theory is appropriate to model the response of materials with microstructure. A linear version of the theory results by considering a linear isotropic expression for the strain-energy density that depends on strain-gradient terms, in addition to the standard strain terms appearing in classical elasticity. Through this formulation, a microstructural material length is introduced, in addition to the standard Lamé constants. The faces of the notch are considered to be traction-free and a boundary-layer approach is followed. The boundary value problem is attacked with the asymptotic Knein-Williams technique. Our analysis leads to an eigenvalue problem, which, along with the restriction of a bounded strain energy, provides the asymptotic fields. The cases of a crack and a half-space are analyzed in detail as limit cases of the general notch (infinite wedge) problem. The results show significant departure from the predictions of the standard fracture mechanics.
16. DeepFaith: A Domain-Free and Model-Agnostic Unified Framework for Highly Faithful Explanations
Authors: Yuhan Guo, Lizhong Ding, Shihan Jia, Yanyu Ren, Pengqi Li, Jiarun Fu, Changsheng Li, Ye Yuan, Guoren Wang • Published: 2025-08-05 • Source: arXiv
Explainable AI (XAI) builds trust in complex systems through model attribution methods that reveal the decision rationale. However, due to the absence of a unified optimal explanation, existing XAI methods lack a ground truth for objective evaluation and optimization. To address this issue, we propose Deep architecture-based Faith explainer (DeepFaith), a domain-free and model-agnostic unified explanation framework under the lens of faithfulness. By establishing a unified formulation for multiple widely used and well-validated faithfulness metrics, we derive an optimal explanation objective whose solution simultaneously achieves optimal faithfulness across these metrics, thereby providing a ground truth from a theoretical perspective. We design an explainer learning framework that leverages multiple existing explanation methods, applies deduplicating and filtering to construct high-quality supervised explanation signals, and optimizes both pattern consistency loss and local correlation to train a faithful explainer. Once trained, DeepFaith can generate highly faithful explanations through a single forward pass without accessing the model being explained. On 12 diverse explanation tasks spanning 6 models and 6 datasets, DeepFaith achieves the highest overall faithfulness across 10 metrics compared to all baseline methods, highlighting its effectiveness and cross-domain generalizability.
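As one concrete member of the faithfulness-metric family such frameworks unify, here is a deletion-style probe: ablate features in decreasing attribution order and measure the score drop (a scoring model over flat feature arrays is an assumed setup).

```python
import numpy as np

def deletion_faithfulness(model, x, attribution, n_steps=10):
    # Larger average score drop after deleting top-attributed features
    # indicates a more faithful attribution.
    order = np.argsort(-attribution.ravel())
    x_flat = x.ravel().copy()
    scores = [model(x_flat.reshape(x.shape))]
    for chunk in np.array_split(order, n_steps):
        x_flat[chunk] = 0.0                       # ablate the next most important features
        scores.append(model(x_flat.reshape(x.shape)))
    return scores[0] - np.mean(scores[1:])

w, x = np.random.randn(10), np.random.randn(10)
model = lambda v: float(w @ v.ravel())            # toy linear scorer
print(deletion_faithfulness(model, x, attribution=w * x))  # gradient-times-input attribution
```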
17. SAGE-HLS: Syntax-Aware AST-Guided LLM for High-Level Synthesis Code Generation
Authors: M Zafir Sadik Khan, Nowfel Mashnoor, Mohammad Akyash, Kimia Azar, Hadi Kamali • Published: 2025-08-05 • Source: arXiv
In today's rapidly evolving field of electronic design automation (EDA), the complexity of hardware designs is increasing, necessitating more sophisticated automation solutions. High-level synthesis (HLS), as a pivotal solution, automates hardware designs from high-level abstractions (e.g., C/C++). However, it faces significant challenges, particularly in design space exploration and optimization. While large language models (LLMs) have shown notable capabilities in code generation, their application to HLS has been limited due to the scarcity of (publicly) available HLS code datasets. Hence, research in this domain has primarily focused on techniques such as prompt engineering and retrieval-augmented generation (RAG). To overcome this limitation, this paper introduces SAGE-HLS, the first-of-its-kind fine-tuned LLM specifically for HLS code generation. Our method includes three key advancements: (i) We implement Verilog-to-C/C++ porting, converting verified and synthesizable Verilog code into corresponding C, creating a dataset of 16.7K HLS code samples; (ii) We implement a fine-tuning strategy based on instruction prompting for code generation guided by the abstract syntax tree (AST); (iii) We develop a semi-automated evaluation framework using VerilogEval to assess the functionality of the generated HLS code. Our experiments show that SAGE-HLS, fine-tuned on the QwenCoder (2.5) 7B model, achieves a near 100% success rate in code synthesizability and a 75% success rate in functional correctness.
18. Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Authors: Dmitrii Korzh, Dmitrii Tarasov, Artyom Iudin, Elvir Karimov, Matvey Skripkin, Nikita Kuzmin, Andrey Kuznetsov, Oleg Y. Rogov, Ivan Oseledets • Published: 2025-08-05 • Source: arXiv
Conversion of spoken mathematical expressions is a challenging task that involves transcribing speech into a strictly structured symbolic representation while addressing the ambiguity inherent in the pronunciation of equations. Although significant progress has been achieved in automatic speech recognition (ASR) and language models (LM), the problem of converting spoken mathematics into LaTeX remains underexplored. This task directly applies to educational and research domains, such as lecture transcription or note creation. Prior work, based on ASR post-correction, requires two transcriptions, focuses only on isolated equations, has a limited test set, and provides neither training data nor multilingual coverage. To address these issues, we present the first fully open-source large-scale dataset, comprising over 66,000 human-annotated audio samples of mathematical equations and sentences in both English and Russian, drawn from diverse scientific domains. In addition to the ASR post-correction models and few-shot prompting, we apply audio language models, demonstrating comparable character error rate (CER) results on the MathSpeech benchmark (28% vs. 30%) for the equations conversion. In contrast, on the proposed S2L-equations benchmark, our models outperform the MathSpeech model by a substantial margin of more than 40 percentage points, even after accounting for LaTeX formatting artifacts (27% vs. 64%). We establish the first benchmark for mathematical sentence recognition (S2L-sentences) and achieve an equation CER of 40%. This work lays the groundwork for future advances in multimodal AI, with a particular focus on mathematical content recognition.
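The reported CER is edit (Levenshtein) distance divided by reference length; a self-contained implementation for LaTeX strings:

```python
def cer(hypothesis: str, reference: str) -> float:
    # Character error rate via dynamic programming with a rolling row.
    m, n = len(hypothesis), len(reference)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (hypothesis[i - 1] != reference[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(n, 1)

print(cer(r"\frac{a}{b}", r"\frac{a}{c}"))  # one substitution over 11 characters ~ 0.091
```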
19. Semantic Mosaicing of Histo-Pathology Image Fragments using Visual Foundation Models
Authors: Stefan Brandstätter, Maximilian Köller, Philipp Seeböck, Alissa Blessing, Felicitas Oberndorfer, Svitlana Pochepnia, Helmut Prosch, Georg Langs • Published: 2025-08-05 • Source: arXiv
In histopathology, tissue samples are often larger than a standard microscope slide, making stitching of multiple fragments necessary to process entire structures such as tumors. Automated stitching is a prerequisite for scaling analysis, but is challenging due to possible tissue loss during preparation, inhomogeneous morphological distortion, staining inconsistencies, missing regions due to misalignment on the slide, or frayed tissue edges. These issues limit state-of-the-art stitching methods, which rely on boundary shape matching algorithms to reconstruct artificial whole mount slides (WMS). Here, we introduce SemanticStitcher, which uses latent feature representations derived from a visual histopathology foundation model to identify neighboring areas in different fragments. Robust pose estimation based on a large number of semantic matching candidates derives a mosaic of multiple fragments to form the WMS. Experiments on three different histopathology datasets demonstrate that SemanticStitcher yields robust WMS mosaicing and consistently outperforms the state of the art in correct boundary matches.
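A minimal sketch of the robust pose step, assuming matched keypoint coordinates extracted from the foundation-model features and a similarity-transform motion model; OpenCV's RANSAC estimator stands in for whatever estimator the authors actually use.

```python
import numpy as np
import cv2

def fragment_pose_from_matches(pts_a, pts_b):
    # pts_a, pts_b: (N, 2) float32 matched coordinates in two fragments.
    M, inliers = cv2.estimateAffinePartial2D(
        pts_a, pts_b, method=cv2.RANSAC, ransacReprojThreshold=5.0)
    return M, inliers   # 2x3 rotation+scale+translation, inlier mask

pts_a = (np.random.rand(50, 2) * 1000).astype(np.float32)
pts_b = pts_a + np.float32([12.5, -3.0])          # synthetic pure translation
M, _ = fragment_pose_from_matches(pts_a, pts_b)
print(np.round(M, 2))  # ~ [[1, 0, 12.5], [0, 1, -3.0]]
```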
20. VideoGuard: Protecting Video Content from Unauthorized Editing
Authors: Junjie Cao, Kaizhou Li, Xinchun Yu, Hongxiang Li, Xiaoping Zhang • Published: 2025-08-05 • Source: arXiv
With the rapid development of generative technology, current generative models can generate high-fidelity digital content and edit it in a controlled manner. However, there is a risk that malicious individuals might misuse these capabilities for misleading activities. Although existing research has attempted to shield photographic images from being manipulated by generative models, there remains a significant gap in the protection offered against video content editing. To bridge the gap, we propose a protection method named VideoGuard, which can effectively protect videos from unauthorized malicious editing. This protection is achieved through the subtle introduction of nearly unnoticeable perturbations that interfere with the functioning of the intended generative diffusion models. Due to the redundancy between video frames and the inter-frame attention mechanism in video diffusion models, simply applying image-based protection methods separately to every video frame cannot shield a video from unauthorized editing. To tackle the above challenge, we adopt joint frame optimization, treating all video frames as one optimization entity. Furthermore, we extract video motion information and fuse it into the optimization objectives, so that these alterations can effectively force the models to produce outputs that are implausible and inconsistent. We provide a pipeline to optimize this perturbation. Finally, we use both objective and subjective metrics to demonstrate the efficacy of our method, and the results show that the protection performance of VideoGuard is superior to all the baseline methods.
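A schematic of the joint frame optimization described above, assuming a differentiable surrogate editing_loss for the target diffusion editor and an L-infinity imperceptibility budget; the motion-information term from the abstract is omitted in this sketch.

```python
import torch

def optimize_video_perturbation(frames, editing_loss, eps=8 / 255, steps=100, lr=1e-2):
    # frames: (T, C, H, W) video tensor in [0, 1]; all frames share one optimization.
    delta = torch.zeros_like(frames, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -editing_loss(frames + delta)   # degrade the editor's objective
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)            # keep the perturbation imperceptible
    return (frames + delta).detach()
```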