AI/LLM/Agent/Workflow Papers

🤖 AI-Generated Research Summary

Comprehensive Summary of 20 Recent AI, LLM, Agent, and Workflow Research Papers

1. Key Research Trends

a. Fairness, Evaluation, and Robustness in LLMs and NLP
- Growing focus on fairness (e.g., FairLangProc) and robust evaluation (e.g., CompassVerifier, FPBench) for LLMs, especially in high-stakes or real-world applications.
- Increasing attention to systematic benchmarking and evaluation frameworks for LLMs and retrieval-augmented generation (RAG) systems (e.g., Double-Bench).

b. Multimodal and Domain-Specific AI
- Expansion of AI into multimodal domains (e.g., event-based vision, LiDAR, satellite imagery, endoscopy, protein modeling).
- Development of domain-adapted architectures for specialized tasks (e.g., evTransFER for facial expression, MetaScope for endoscopy, La La LiDAR for scene generation).

c. Generative and Diffusion Models
- Application of generative models (diffusion, flow-matching, conditional flows) to complex data types: LiDAR, satellite images, protein structures, and more.

d. Autonomous Agents and Learning from Human Preferences
- Advances in apprenticeship learning and inverse reinforcement learning (IRL) for aligning agent behavior with human values (e.g., PAC Apprenticeship Learning).

e. Online, Streaming, and Real-Time AI
- Emphasis on online learning and streaming data (e.g., Streaming GP Experts, Inland-LOAM for real-time mapping).

f. AI for Scientific Discovery and Smart Systems
- Use of AI for scientific discovery (e.g., gravitational wave detection, lattice QCD) and smart agriculture (phytobiome communication).

2. Breakthrough Findings

FairLangProc: Provides a unified Python package for fairness assessment and mitigation in NLP, filling a gap in accessible, standardized tools for bias analysis.
ReFuzzer: Introduces a feedback-driven loop to refine LLM-generated code, raising the validity of test programs for compiler fuzzing from 47.0-49.4% to 96.6-97.3%.
Double-Bench: Proposes a comprehensive, large-scale benchmark for document RAG systems, addressing real-world evaluation gaps.
MetaScope: Combines meta-optics and neural networks for ultra-micro endoscopy, enabling high-resolution imaging at micron scales.
CloudBreaker: Generates multi-spectral Sentinel-2 imagery from Sentinel-1 radar data, which is unaffected by cloud cover, using multi-stage conditional flow matching, addressing a major limitation in remote sensing.
FlowBack-Adjoint: Presents a physics-aware, energy-guided generative model for accurate all-atom protein backmapping from coarse-grained data.
FPBench: Establishes the first code generation evaluation framework targeting faulty premises, revealing LLMs' limitations in critical thinking and self-scrutiny.
Streaming GP Experts: Enables scalable, online Gaussian Process learning for real-time control in safety-critical systems.

3. Methodological Approaches

Transfer Learning & Domain Adaptation: Used in evTransFER for event-based vision and in other domain-specific models.
Feedback Loops & Self-Verification: ReFuzzer uses an iterative feedback loop to detect and correct invalid LLM-generated programs, while CompassVerifier provides a lightweight verifier model for checking LLM answers.
Conditional Generative Models: Diffusion models (La La LiDAR), conditional flow matching (CloudBreaker, FlowBack-Adjoint), and layout-guided generation are prominent.
Active and Bayesian Learning: PAC Apprenticeship Learning leverages Bayesian active IRL for robust policy learning.
Physics-Informed and Energy-Guided AI: FlowBack-Adjoint and MetaScope integrate physical constraints and domain knowledge into model design.
Attention Mechanisms and Fusion Networks: DyCAF-Net and CTR-Sink introduce novel attention and fusion strategies for improved performance in detection and recommendation tasks.
Benchmarking and Evaluation Frameworks: Double-Bench, FPBench, and CompassVerifier provide new standards for evaluating LLMs and generative systems.

4. Applications and Use Cases

Fairness in Decision-Making: Tools for bias detection and mitigation in NLP for healthcare, justice, and organizational contexts.
Autonomous Navigation and Mapping: Real-time semantic mapping for inland waterways (Inland-LOAM), layout generation for robotics and autonomous driving (La La LiDAR).
Healthcare and Bioimaging: Ultra-micro endoscopy (MetaScope), deep tissue microscopy (d-FF-OCM), and protein structure modeling (FlowBack-Adjoint).
Remote Sensing and Earth Observation: Cloud removal in satellite imagery (CloudBreaker).
Event-Based Vision: Facial expression recognition with event-based cameras (evTransFER).
Scientific Discovery: Automated algorithmic discovery for gravitational wave detection, lattice QCD computations for particle physics.
Recommendation Systems: Improved click-through rate prediction using language models (CTR-Sink).
Smart Agriculture: Decoding phytobiome communication for precision agriculture.
Code Generation and Software Testing: LLM-based code generation evaluation (FPBench), compiler fuzzing (ReFuzzer).

5. Future Directions

Standardization and Accessibility: Further development of open-source, standardized toolkits for fairness, evaluation, and benchmarking in AI.
Robustness and Trustworthiness: Enhanced self-verification, critical thinking, and error detection in LLMs, especially for code and decision-critical applications.
Multimodal and Cross-Domain AI: Expansion of generative and learning models to new data modalities (e.g., radar, LiDAR, molecular data) and integration across domains.
Human-AI Alignment: Improved methods for aligning autonomous agents with human preferences and values, especially in safety-critical domains.
Scalable, Real-Time AI: Continued focus on online, streaming, and low-latency AI for dynamic environments and real-world deployment.
Physics- and Domain-Informed AI: Deeper integration of physical laws, domain knowledge, and constraints into AI models for scientific and engineering applications.
Explainability and Interpretability: Development of more transparent and interpretable models, especially in high-stakes and regulated environments.

Conclusion

This collection of papers highlights a vibrant and rapidly evolving AI landscape, with strong trends toward fairness, robustness, domain adaptation, and real-world applicability. Researchers are pushing the boundaries of generative modeling, evaluation, and online learning, while also addressing critical challenges in trust, alignment, and multimodal integration. The future promises even greater convergence of AI with scientific discovery, real-time systems, and human-centric applications, underpinned by a growing emphasis on transparency, reliability, and societal impact.

📚 arXiv (20 papers)

1. PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning

Authors: Ondrej Bajgar, Dewi S. W. Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, Michael A. Osborne • Published: 2025-08-05 • Source: arXiv

As AI systems become increasingly autonomous, reliably aligning their decision-making to human preferences is essential. Inverse reinforcement learning (IRL) offers a promising approach to infer preferences from demonstrations. These preferences can then be used to produce an apprentice policy that performs well on the demonstrated task. However, in domains like autonomous driving or robotics, where errors can have serious consequences, we need not just good average performance but reliable policies with formal guarantees -- yet obtaining sufficient human demonstrations for reliability guarantees can be costly. Active IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration. We introduce PAC-EIG, an information-theoretic acquisition function that directly targets probably-approximately-correct (PAC) guarantees for the learned policy -- providing the first such theoretical guarantee for active IRL with noisy expert demonstrations. Our method maximises information gain about the regret of the apprentice policy, efficiently identifying states requiring further demonstration. We also present Reward-EIG as an alternative when learning the reward itself is the primary objective. Focusing on finite state-action spaces, we prove convergence bounds, illustrate failure modes of prior heuristic methods, and demonstrate our method's advantages experimentally.
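For readers unfamiliar with the term, a PAC guarantee on a learned policy is commonly written in the following generic form. The symbols here are assumptions for illustration (apprentice policy $\hat{\pi}$, tolerance $\epsilon$, failure probability $\delta$); the paper's exact definition, including which distribution the probability is taken over, may differ.

```latex
\Pr\left[\, \mathrm{Regret}(\hat{\pi}) \le \epsilon \,\right] \;\ge\; 1 - \delta
```

Read informally: with probability at least $1 - \delta$, the apprentice policy loses no more than $\epsilon$ relative to the best achievable performance.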


2. La La LiDAR: Large-Scale Layout Generation from LiDAR Data

Authors: Youquan Liu, Lingdong Kong, Weidong Yang, Xin Li, Ao Liang, Runnan Chen, Ben Fei, Tongliang Liu • Published: 2025-08-05 • Source: arXiv

Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. While recent diffusion-based models achieve high-fidelity LiDAR generation, they lack explicit control over foreground objects and spatial relationships, limiting their usefulness for scenario simulation and safety validation. To address these limitations, we propose Large-scale Layout-guided LiDAR generation model ("La La LiDAR"), a novel layout-guided generative framework that introduces semantic-enhanced scene graph diffusion with relation-aware contextual conditioning for structured LiDAR layout generation, followed by foreground-aware control injection for complete scene generation. This enables customizable control over object placement while ensuring spatial and semantic consistency. To support our structured LiDAR generation, we introduce Waymo-SG and nuScenes-SG, two large-scale LiDAR scene graph datasets, along with new evaluation metrics for layout synthesis. Extensive experiments demonstrate that La La LiDAR achieves state-of-the-art performance in both LiDAR generation and downstream perception tasks, establishing a new benchmark for controllable 3D scene generation.


3. CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Authors: Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen • Published: 2025-08-05 • Source: arXiv

Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also for serving as the reward model that guides LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization of regex rules or evaluation prompts. Two fundamental limitations persist in current methodologies: 1) the absence of comprehensive benchmarks that systematically evaluate verification capabilities across different LLMs; and 2) the nascent stage of verifier development, where existing approaches lack both the robustness to handle complex edge cases and the generalizability across different domains. In this work, we develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types, including multi-subproblems, formulas, and sequence answers, while effectively identifying abnormal/invalid responses. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of metaerror patterns to enhance CompassVerifier. We anticipate that CompassVerifier and VerifierBench will facilitate answer verification, evaluation protocols, and reinforcement learning research. Code and dataset are available at https://github.com/open-compass/CompassVerifier.
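To make the limitation of rule-based matching concrete, here is a minimal sketch of a regex verifier. The pattern and the function name `regex_verify` are hypothetical illustrations and are not taken from CompassVerifier or any evaluation framework.

```python
import re

# Hypothetical extraction rule: grab the token that follows "answer is".
_ANSWER = re.compile(r"answer is\s+(\S+)", re.IGNORECASE)


def regex_verify(response: str, reference: str) -> bool:
    """Return True only if the extracted answer string equals the reference."""
    match = _ANSWER.search(response)
    if match is None:
        return False  # the response did not follow the expected phrasing
    # Drop trailing punctuation, then compare as plain strings.
    return match.group(1).strip(".,") == reference


print(regex_verify("The answer is 1/2.", "1/2"))   # exact string match
print(regex_verify("The answer is 0.5", "1/2"))    # equivalent value, different form
print(regex_verify("So we get one half", "1/2"))   # phrasing the rule does not cover
```

Only the first call succeeds: the second and third responses are arguably correct but are rejected because the rule compares surface strings. Each such edge case needs another hand-written rule, which is the customization burden the abstract describes.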


4. Streaming Generated Gaussian Process Experts for Online Learning and Control

Authors: Zewen Yang, Dongfa Zhang, Xiaobing Dai, Fengyi Yu, Chi Zhang, Bingkun Huang, Hamid Sadeghian, Sami Haddadin • Published: 2025-08-05 • Source: arXiv

Gaussian Processes (GPs), as a nonparametric learning method, offer flexible modeling capabilities and calibrated uncertainty quantification for function approximations. Additionally, GPs support online learning by efficiently incorporating new data with polynomial-time computation, making them well-suited for safety-critical dynamical systems that require rapid adaptation. However, the inference and online updates of exact GPs, when processing streaming data, incur cubic computation time and quadratic storage memory complexity, limiting their scalability to large datasets in real-time settings. In this paper, we propose a streaming kernel-induced progressively generated expert framework of Gaussian processes (SkyGP) that addresses both computational and memory constraints by maintaining a bounded set of experts, while inheriting the learning performance guarantees from exact Gaussian processes. Furthermore, two SkyGP variants are introduced, each tailored to a specific objective, either maximizing prediction accuracy (SkyGP-Dense) or improving computational efficiency (SkyGP-Fast). The effectiveness of SkyGP is validated through extensive benchmarks and real-time control experiments demonstrating its superior performance compared to state-of-the-art approaches.
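The cubic-time and quadratic-memory costs follow from the standard exact GP regression equations. As a reminder in textbook notation (not taken from the paper), with $n$ training inputs $x_1, \dots, x_n$, targets $y$, kernel $k$, kernel matrix $K_{ij} = k(x_i, x_j)$, and noise variance $\sigma_n^2$, the posterior at a test point $x_*$ is:

```latex
\mu(x_*) = k_*^\top \left(K + \sigma_n^2 I\right)^{-1} y,
\qquad
\sigma^2(x_*) = k(x_*, x_*) - k_*^\top \left(K + \sigma_n^2 I\right)^{-1} k_*,
\qquad
k_* = \left[k(x_1, x_*), \dots, k(x_n, x_*)\right]^\top
```

Storing $K$ takes $O(n^2)$ memory and factorizing $K + \sigma_n^2 I$ takes $O(n^3)$ time, so both grow without bound as streaming data arrives. Capping the number of points each expert holds keeps $n$ per expert, and hence both costs, bounded.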


5. FairLangProc: A Python package for fairness in NLP

Authors: Arturo Pérez-Peralta, Sandra Benítez-Peña, Rosa E. Lillo • Published: 2025-08-05 • Source: arXiv

The rise in usage of Large Language Models to near ubiquitousness in recent years has raised societal concern about their applications in decision-making contexts, such as organizational justice or healthcare. This, in turn, poses questions about the fairness of these models in critical settings, which has led to the development of different procedures to address bias in Natural Language Processing. Although many datasets, metrics and algorithms have been proposed to measure and mitigate harmful prejudice in Natural Language Processing, their implementation is diverse and far from centralized. As a response, this paper presents FairLangProc, a comprehensive Python package that provides a common implementation of some of the more recent advances in fairness in Natural Language Processing, with an interface compatible with the popular Hugging Face transformers library, aiming to encourage the widespread use and democratization of bias mitigation techniques. The implementation can be found at https://github.com/arturo-perez-peralta/FairLangProc.


6. Inland-LOAM: Voxel-Based Structural Semantic Mapping for Inland Waterways

Authors: Zhongbi Luo, Yunjia Wang, Jan Swevers, Peter Slaets, Herman Bruyninckx • Published: 2025-08-05 • Source: arXiv

Accurate geospatial information is crucial for safe, autonomous Inland Waterway Transport (IWT), as existing charts (IENC) lack real-time detail and conventional LiDAR SLAM fails in waterway environments. These challenges lead to vertical drift and non-semantic maps, hindering autonomous navigation. This paper introduces Inland-LOAM, a LiDAR SLAM framework for waterways. It uses improved feature extraction and a water surface planar constraint to mitigate vertical drift. A novel pipeline transforms 3D point clouds into structured 2D semantic maps using voxel-based geometric analysis, enabling real-time computation of navigational parameters like bridge clearances. An automated module extracts shorelines and exports them into a lightweight, IENC-compatible format. Evaluations on a real-world dataset show Inland-LOAM achieves superior localization accuracy over state-of-the-art methods. The generated semantic maps and shorelines align with real-world conditions, providing reliable data for enhanced situational awareness. The code and dataset will be publicly available.
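As a rough illustration of what voxel-based analysis of a point cloud can look like, the sketch below bins points into a voxel grid and reads off, per 2D column, the lowest structure above the water surface. This is a generic toy, not the Inland-LOAM pipeline: the function names, the flat water plane at `water_z`, and the `min_gap` filter are all assumptions made for the example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def voxelize(points: Iterable[Point], voxel_size: float) -> Dict[Tuple[int, int, int], List[Point]]:
    """Group points by the integer index of the voxel that contains them."""
    grid = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


def lowest_overhead(points: Iterable[Point], voxel_size: float,
                    water_z: float = 0.0, min_gap: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Per (ix, iy) column, height above water of the lowest point at least min_gap up.

    Points closer than min_gap to the assumed water plane are treated as
    surface returns and ignored.
    """
    clearance: Dict[Tuple[int, int], float] = {}
    for (ix, iy, _), pts in voxelize(points, voxel_size).items():
        for _, _, z in pts:
            gap = z - water_z
            if gap >= min_gap:
                clearance[(ix, iy)] = min(gap, clearance.get((ix, iy), gap))
    return clearance
```

For example, with two points of a bridge deck at heights 5.0 and 6.0 in the same column and one near-surface return in a neighbouring column, `lowest_overhead` reports a clearance of 5.0 for the bridge column and nothing for the other.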


7. CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction

Authors: Zixuan Li, Binzong Geng, Jing Xiong, Yong He, Yuxuan Hu, Jian Chen, Dingwei Chen, Xiyu Chang, Liang Zhang, Linjian Mo, Chengming Li, Chuan Yuan, Zhenan Sun • Published: 2025-08-05 • Source: arXiv

Click-Through Rate (CTR) prediction, a core task in recommendation systems, estimates user click likelihood using historical behavioral data. Modeling user behavior sequences as text to leverage Language Models (LMs) for this task has gained traction, owing to LMs' strong semantic understanding and contextual modeling capabilities. However, a critical structural gap exists: user behavior sequences consist of discrete actions connected by semantically empty separators, differing fundamentally from the coherent natural language in LM pre-training. This mismatch causes semantic fragmentation, where LM attention scatters across irrelevant tokens instead of focusing on meaningful behavior boundaries and inter-behavior relationships, degrading prediction performance. To address this, we propose CTR-Sink, a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information. Specifically, we insert sink tokens between consecutive behaviors, incorporating recommendation-specific signals such as temporal distance to serve as stable attention sinks. To enhance generality, we design a two-stage training strategy that explicitly guides LM attention toward sink tokens and an attention sink mechanism that amplifies inter-sink dependencies to better capture behavioral correlations. Experiments on one industrial dataset and two open-source datasets (MovieLens, Kuairec), alongside visualization results, validate the method's effectiveness across scenarios.
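The idea of inserting sink tokens between consecutive behaviors can be pictured with a small preprocessing sketch. The `Behavior` class, the `[SINK gap=...]` token format, and the use of seconds for temporal distance are hypothetical choices for illustration; the abstract does not specify how CTR-Sink encodes its sink tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    text: str        # textual description of one user action
    timestamp: int   # event time in seconds


def build_sequence(behaviors: List[Behavior]) -> str:
    """Join behaviors into one string, placing a sink token between neighbours.

    Each sink token carries the time gap to the previous behavior, so the
    separator holds a recommendation-specific signal instead of being empty.
    """
    if not behaviors:
        return ""
    parts = [behaviors[0].text]
    for prev, cur in zip(behaviors, behaviors[1:]):
        gap = cur.timestamp - prev.timestamp
        parts.append(f"[SINK gap={gap}s]")
        parts.append(cur.text)
    return " ".join(parts)


history = [Behavior("clicked A", 0), Behavior("clicked B", 90)]
print(build_sequence(history))  # clicked A [SINK gap=90s] clicked B
```

In a real system the sink token would be a dedicated vocabulary entry whose embedding is trained, rather than a literal string.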


8. Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Authors: He Wang, Liang Zeng • Published: 2025-08-05 • Source: arXiv

Computational scientific discovery increasingly relies on algorithms to process complex data and identify meaningful patterns, yet it faces persistent challenges in gravitational-wave signal identification. While existing algorithmic approaches like matched filtering (MF) and deep neural networks (DNNs) have achieved partial success, their shortcomings stem from fundamental limitations: MF's excessive computational demands arise from its reliance on predefined theoretical waveform templates, while DNNs' black-box architectures obscure decision logic and introduce hidden biases. We propose Evolutionary Monte Carlo Tree Search (Evo-MCTS), a framework that addresses these limitations through systematic algorithm space exploration guided by domain-aware physical constraints. Our approach combines tree-structured search with evolutionary optimization and large language model heuristics to create interpretable algorithmic solutions. Our Evo-MCTS framework demonstrates substantial improvements, achieving a 20.2% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset. High-performing algorithm variants consistently exceed thresholds. The framework generates human-interpretable algorithmic pathways that reveal distinct performance patterns. Beyond performance improvements, our framework discovers novel algorithmic combinations, thereby establishing a transferable methodology for automated algorithmic discovery across computational science domains.


9. High-Resolution Dynamic Full-Field Optical Coherence Microscopy: Illuminating Intracellular Activity in Deep Tissue

Authors: Erikas Tarvydas, Austeja Treciokaite, Egidijus Auksorius • Published: 2025-08-05 • Source: arXiv

Dynamic full-field optical coherence microscopy (d-FF-OCM) is a label-free imaging technique that captures intrinsic subcellular motions to generate functional contrast. This dynamic approach yields images with fluorescence-like contrast, highlighting active structures without the need for fluorescent labels. However, current d-FF-OCM implementations struggle to image deep within highly scattering tissues at high resolution. Here, we present a new high-resolution d-FF-OCM system that overcomes these limitations, enabling much deeper high-resolution imaging in such tissues. The setup uses 100x oil-immersion objectives (NA = 1.25) and a high-brightness, laser-pumped incoherent white light source to achieve nanometer-scale resolution at depths up to approximately 100 micrometers in highly scattering samples. We also incorporate real-time reference arm adjustment to maintain signal strength and contrast as the focus moves deeper into the sample. Using this system, we imaged fresh ex vivo mouse liver and small intestine with unprecedented depth and detail. In these tissues, the dynamic contrast clearly revealed fine structures not visible with conventional OCT (for example, the sinusoidal microvasculature and organized cell layers in the liver, as well as neural plexuses and crypts in the intestine), all visualized label-free. By bridging the gap between high-resolution and deep imaging in highly scattering tissue, this advance provides a powerful new tool for biological microscopy, with potential applications from fundamental research to rapid intraoperative pathology.


10. Theoretical framework for lattice QCD computations of $B\to K \ell^+ \ell^-$ and $\bar{B}_s\to \ell^+\ell^- \gamma$ decay rates, including contributions from "Charming Penguins"

Authors: R. Frezzotti, G. Gagliardi, V. Lubicz, G. Martinelli, C. T. Sachrajda, F. Sanfilippo, L. Silvestrini, S. Simula, N. Tantalo • Published: 2025-08-05 • Source: arXiv

We develop a strategy for computing the $B\to K\ell^+\ell^-$ and $\bar{B}_s\to\gamma\ell^+\ell^-$ decay amplitudes using lattice QCD (where $\ell^\pm$ are charged leptons). We focus on those terms which contain complex contributions to the amplitude, due to on-shell intermediate states propagating between the weak operator and electromagnetic current(s). Such terms, which are generally estimated using model calculations and represent significant uncertainties in the phenomenological predictions for these decays, cannot be computed using standard lattice QCD techniques. It has recently been shown that such contributions can be computed using spectral-density methods, and our proposed strategy, which we discuss in detail, is built on this approach. The complex contributions include the "charming penguins" (matrix elements of the current-current operators $O_1^{(c)}$ and $O_2^{(c)}$ defined in Eq. (6) below), in which the charm-quark loop can propagate long distances, particularly close to the region of charmonium resonances. They also include the contributions from the chromomagnetic operator ($O_8$ in standard notation, defined in Eq. (8) below). We discuss the renormalization of the ultraviolet divergences, and in particular those which arise due to "contact" terms, and explain how those which appear as inverse powers of the lattice spacing can be subtracted non-perturbatively. We apply the spectral density methods in an instructive exploratory computation of the charming penguin diagram in $B\to K\ell^+\ell^-$ decays in which the virtual photon is emitted from the charm-quark loop (the diagram in Fig. 1(a) below) and discuss the prospects and strategies for the reliable determination of the amplitudes in future dedicated computations.


11. Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

Authors: Wenxuan Shen, Mingjia Wang, Yaochen Wang, Dongping Chen, Junjie Yang, Yao Wan, Weiwei Lin • Published: 2025-08-05 • Source: arXiv

Retrieval-Augmented Generation (RAG) systems using Multimodal Large Language Models (MLLMs) show great promise for complex document understanding, yet their development is critically hampered by inadequate evaluation. Current benchmarks often focus on a specific part of the document RAG system and use synthetic data with incomplete ground truth and evidence labels, thereby failing to reflect real-world bottlenecks and challenges. To overcome these limitations, we introduce Double-Bench: a new large-scale, multilingual, and multimodal evaluation system that can produce fine-grained assessments of each component within document RAG systems. It comprises 3,276 documents (72,880 pages) and 5,168 single- and multi-hop queries across 6 languages and 4 document types, with streamlined dynamic update support for potential data contamination issues. Queries are grounded in exhaustively scanned evidence pages and verified by human experts to ensure maximum quality and completeness. Our comprehensive experiments across 9 state-of-the-art embedding models, 4 MLLMs and 4 end-to-end document RAG frameworks demonstrate that the gap between text and visual embedding models is narrowing, highlighting the need to build stronger document retrieval models. Our findings also reveal an over-confidence dilemma within current document RAG frameworks, which tend to provide answers even without evidence support. We hope our fully open-source Double-Bench provides a rigorous foundation for future research in advanced document RAG systems. We plan to retrieve a timely corpus and release new benchmarks on an annual basis.


12. Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework

Authors: Jialin Li, Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu • Published: 2025-08-05 • Source: arXiv

With the advancement of code generation capabilities in large language models (LLMs), their reliance on input premises has intensified. When users provide inputs containing faulty premises, the probability of code generation hallucinations rises significantly, exposing deficiencies in their self-scrutiny capabilities. This paper proposes Faulty Premises Bench (FPBench), the first code generation evaluation framework targeting faulty premises. By systematically constructing three categories of faulty premises and integrating multi-dimensional evaluation metrics, it conducts in-depth assessments of 15 representative LLMs. The key findings are as follows: (1) Most models exhibit poor reasoning abilities and suboptimal code generation performance under faulty premises, heavily relying on explicit prompts for error detection, with limited self-scrutiny capabilities; (2) Faulty premises trigger a point of diminishing returns in resource investment, so blindly increasing length fails to enhance quality; (3) The three types of faulty premises each activate distinct defect patterns in models, revealing a triple dissociation in the cognitive mechanisms of code generation models. This study not only highlights the urgent need for LLMs to proactively verify premises in code generation but also, through the proposed FPBench framework and multi-dimensional evaluation system, provides a theoretical foundation and practical pathway for developing reliable, human-centric code generation models.


13. FlowBack-Adjoint: Physics-Aware and Energy-Guided Conditional Flow-Matching for All-Atom Protein Backmapping

Authors: Alex Berlaga, Michael S. Jones, Andrew L. Ferguson • Published: 2025-08-05 • Source: arXiv

Coarse-grained (CG) molecular models of proteins can substantially increase the time and length scales accessible to molecular dynamics simulations of proteins, but recovery of accurate all-atom (AA) ensembles from CG simulation trajectories can be essential for exposing molecular mechanisms of folding and docking and for calculation of physical properties requiring atomistic detail. The recently reported deep generative model FlowBack restores AA detail to protein C-alpha traces using a flow-matching architecture and demonstrates state-of-the-art performance in generation of AA structural ensembles. Training, however, is performed exclusively on structural data, and the absence of any awareness of interatomic energies or forces within training results in small fractions of incorrect bond lengths, atomic clashes, and otherwise high-energy structures. In this work, we introduce FlowBack-Adjoint as a lightweight enhancement that upgrades the pre-trained FlowBack model through a one-time, physics-aware post-training pass. Auxiliary contributions to the flow introduce physical awareness of bond lengths and Lennard-Jones interactions, and gradients of a molecular mechanics force field energy are incorporated via adjoint matching to steer the FlowBack-Adjoint vector field to produce lower-energy configurations. In benchmark tests against FlowBack, FlowBack-Adjoint lowers single-point energies by a median of ~78 kcal/mol·residue, reduces errors in bond lengths by >92%, eliminates >98% of molecular clashes, maintains excellent diversity of the AA configurational ensemble, and produces configurations capable of initializing stable all-atom molecular dynamics simulations without requiring energy relaxation. We propose FlowBack-Adjoint as an accurate and efficient physics-aware deep generative model for AA backmapping from C-alpha traces.
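To show why atomic clashes translate into high energies, here is a minimal sketch of the standard 12-6 Lennard-Jones pair potential, $V(r) = 4\epsilon\left[(\sigma/r)^{12} - (\sigma/r)^{6}\right]$. The reduced units ($\epsilon = \sigma = 1$) are an assumption for illustration; the force field and parameters used by FlowBack-Adjoint are not given in the abstract.

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Standard 12-6 Lennard-Jones pair energy at separation r."""
    if r <= 0:
        raise ValueError("separation r must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


r_min = 2 ** (1 / 6)          # location of the energy minimum (in units of sigma)
print(lennard_jones(r_min))   # approximately -epsilon, the well depth
print(lennard_jones(1.0))     # zero crossing at r = sigma
print(lennard_jones(0.8))     # steeply repulsive: an overlapping ("clashing") pair
```

Because the repulsive term scales as $r^{-12}$, even a small overlap between two atoms produces a large positive energy, which is why a generator with no energy awareness can emit structures that look plausible geometrically yet are unusable for simulation.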


14. evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition

Authors: Rodrigo Verschae, Ignacio Bugueno-Cordova • Published: 2025-08-05 • Source: arXiv

Event-based cameras are bio-inspired vision sensors that asynchronously capture per-pixel intensity changes with microsecond latency, high temporal resolution, and high dynamic range, providing valuable information about the spatio-temporal dynamics of the scene. In the present work, we propose evTransFER, a transfer learning-based framework and architecture for face expression recognition using event-based cameras. The main contribution is a feature extractor designed to encode the spatio-temporal dynamics of faces, built by training an adversarial generative method on a different problem (facial reconstruction) and then transferring the trained encoder weights to the face expression recognition system. We show that this proposed transfer learning method greatly improves the ability to recognize facial expressions compared to training a network from scratch. In addition, we propose an architecture that incorporates an LSTM to capture longer-term facial expression dynamics, and we introduce a new event-based representation, referred to as TIE, both of which further improve the results. We evaluate the proposed framework on the event-based facial expression database e-CK+ and compare it to state-of-the-art methods. The results show that the proposed framework evTransFER achieves a 93.6% recognition rate on the e-CK+ database, significantly improving the accuracy (by 25.9 percentage points or more) when compared to state-of-the-art performance for similar problems.


15. CloudBreaker: Breaking the Cloud Covers of Sentinel-2 Images using Multi-Stage Trained Conditional Flow Matching on Sentinel-1

Authors: Saleh Sakib Ahmed, Sara Nowreen, M. Sohel Rahman • Published: 2025-08-05 • Source: arXiv

Cloud cover and nighttime conditions remain significant limitations in satellite-based remote sensing, often restricting the availability and usability of multi-spectral imagery. In contrast, Sentinel-1 radar images are unaffected by cloud cover and can provide consistent data regardless of weather or lighting conditions. To address the challenges of limited satellite imagery, we propose CloudBreaker, a novel framework that generates high-quality multi-spectral Sentinel-2 signals from Sentinel-1 data. This includes the reconstruction of optical (RGB) images as well as critical vegetation and water indices such as NDVI and NDWI. We employed a novel multi-stage training approach based on conditional latent flow matching and, to the best of our knowledge, are the first to integrate cosine scheduling with flow matching. CloudBreaker demonstrates strong performance, achieving a Fréchet Inception Distance (FID) score of 0.7432, indicating high fidelity and realism in the generated optical imagery. The model also achieved a Structural Similarity Index Measure (SSIM) of 0.6156 for NDWI and 0.6874 for NDVI, indicating a high degree of structural similarity. This establishes CloudBreaker as a promising solution for a wide range of remote sensing applications where multi-spectral data is typically unavailable or unreliable.
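For reference, NDVI and NDWI are both normalized band differences. The sketch below uses the common definitions NDVI = (NIR - Red) / (NIR + Red) and the McFeeters form NDWI = (Green - NIR) / (Green + NIR); which NDWI variant CloudBreaker uses is not stated in the abstract, so treat that choice as an assumption. On Sentinel-2 the corresponding bands are usually B8 (NIR), B4 (Red), and B3 (Green).

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when both inputs are zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index (McFeeters form) from green and NIR."""
    return normalized_difference(green, nir)


# Dense vegetation reflects strongly in NIR and weakly in red.
print(ndvi(nir=0.5, red=0.1))    # about 0.67
# The same pixel has a negative NDWI, as expected for non-water surfaces.
print(ndwi(green=0.1, nir=0.5))  # about -0.67
```

Both indices lie in [-1, 1] for non-negative reflectances, which makes them convenient targets for a structural similarity metric such as SSIM.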

16. ReFuzzer: Feedback-Driven Approach to Enhance Validity of LLM-Generated Test Programs

Authors: Iti Shree, Karine Even-Mendoza, Tomasz Radzik • Published: 2025-08-05 • Source: arXiv

Existing LLM-based compiler fuzzers often produce syntactically or semantically invalid test programs, limiting their effectiveness in exercising compiler optimizations and backend components. We introduce ReFuzzer, a framework for refining LLM-generated test programs by systematically detecting and correcting compilation and runtime violations (e.g. division by zero or array out-of-bounds accesses). ReFuzzer employs a feedback loop with a local LLM to validate and filter erroneous programs before execution, improving fuzzing effectiveness beyond crash detection and enabling the generation of diverse yet valid test programs. We evaluated ReFuzzer's effectiveness across black-, grey- and white-box fuzzing approaches targeting LLVM/Clang. ReFuzzer improved test programs' validity from 47.0-49.4% to 96.6-97.3%, with an average processing time of 2.9-3.5 s per test program on a dual-GPU machine. Further, refuzzing significantly increased code coverage in critical optimization and IR generation components. For example, vectorization coverage had an absolute improvement of 9.2%, 2.3%, and 7.1% in black-, grey-, and white-box fuzzing, enhancing testing effectiveness.
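The feedback loop described above can be illustrated with a minimal sketch. This is not ReFuzzer's implementation: Python's built-in `compile()` stands in for the real compiler check, and `toy_repair` is a hypothetical placeholder for the call to a local LLM. The names `validate`, `refine`, and `max_rounds` are likewise assumptions made for this example.

```python
from typing import Callable, Optional


def validate(source: str) -> Optional[str]:
    """Return None if the program compiles, otherwise an error message.

    Stand-in for invoking a real compiler; only syntax is checked here.
    """
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"


def refine(source: str,
           repair: Callable[[str, str], str],
           max_rounds: int = 3) -> Optional[str]:
    """Validate, feed the error back to `repair`, and retry.

    Returns a valid program, or None if it is still invalid after
    `max_rounds` repair attempts (the candidate is filtered out).
    """
    for _ in range(max_rounds):
        error = validate(source)
        if error is None:
            return source
        source = repair(source, error)
    return source if validate(source) is None else None


def toy_repair(source: str, error: str) -> str:
    # Hypothetical placeholder for the LLM: append one closing parenthesis.
    return source + ")"


fixed = refine("print((1 + 2)", toy_repair)
print(fixed)
```

The key design point is that the error message, not just a pass/fail bit, is passed back to the repair step, and that candidates which never become valid are dropped rather than executed.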

17. DyCAF-Net: Dynamic Class-Aware Fusion Network

Authors: Md Abrar Jahin, Shahriar Soudeep, M. F. Mridha, Nafiz Fahad, Md. Jakir Hossen • Published: 2025-08-05 • Source: arXiv

Recent advancements in object detection rely on modular architectures with multi-scale fusion and attention mechanisms. However, static fusion heuristics and class-agnostic attention limit performance in dynamic scenes with occlusions, clutter, and class imbalance. We introduce Dynamic Class-Aware Fusion Network (DyCAF-Net) that addresses these challenges through three innovations: (1) an input-conditioned equilibrium-based neck that iteratively refines multi-scale features via implicit fixed-point modeling, (2) a dual dynamic attention mechanism that adaptively recalibrates channel and spatial responses using input- and class-dependent cues, and (3) class-aware feature adaptation that modulates features to prioritize discriminative regions for rare classes. Through comprehensive ablation studies with YOLOv8 and related architectures, alongside benchmarking against nine state-of-the-art baselines, DyCAF-Net achieves significant improvements in precision, mAP@50, and mAP@50-95 across 13 diverse benchmarks, including occlusion-heavy and long-tailed datasets. The framework maintains computational efficiency (~11.1M parameters) and competitive inference speeds, while its adaptability to scale variance, semantic overlaps, and class imbalance positions it as a robust solution for real-world detection tasks in medical imaging, surveillance, and autonomous systems.
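The phrase "implicit fixed-point modeling" refers to layers whose output z* satisfies z* = f(z*, x) for the input x. The toy sketch below shows the generic idea with plain forward iteration on a small vector; it is not DyCAF-Net's neck. The function `layer`, the weight `w`, and the tolerance are assumptions chosen so that the map is a contraction and the iteration converges.

```python
import math


def fixed_point(f, x, z0, tol=1e-8, max_iter=200):
    """Iterate z <- f(z, x) until successive iterates differ by less than tol.

    Returns the final iterate and the number of iterations used.
    """
    z = z0
    for i in range(max_iter):
        z_next = f(z, x)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter


def layer(z, x, w=0.5):
    # Input-conditioned update. Because |tanh'| <= 1 and |w| < 1, this map
    # is a contraction in z, so the iteration has a unique fixed point.
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]


x = [0.2, -0.1, 0.7]  # hypothetical input features
z_star, steps = fixed_point(layer, x, z0=[0.0, 0.0, 0.0])
print("equilibrium:", [round(v, 4) for v in z_star], "after", steps, "steps")
```

Practical equilibrium models typically replace the naive loop with a faster root-finding solver and differentiate through the equilibrium implicitly; the loop here only demonstrates what "refining features until they stop changing" means.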

18. MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy

Authors: Wuyang Li, Wentao Pan, Xiaoyuan Liu, Zhendong Luo, Chenxin Li, Hengyu Liu, Din Ping Tsai, Mu Ku Chen, Yixuan Yuan • Published: 2025-08-05 • Source: arXiv

Miniaturized endoscopy has advanced accurate visual perception within the human body. Prevailing research remains limited to conventional cameras employing convex lenses, whose millimetre-scale thickness imposes serious physical constraints on micro-level clinical applications. Recently, with the emergence of meta-optics, ultra-micro imaging based on metalenses (micron-scale) has garnered great attention, serving as a promising solution. However, due to the physical differences of metalenses, there is a large gap in data acquisition and algorithm research. In light of this, we aim to bridge this unexplored gap, advancing the novel metalens endoscopy. First, we establish datasets for metalens endoscopy and conduct preliminary optical simulation, identifying two derived optical issues that physically adhere to strong optical priors. Second, we propose MetaScope, a novel optics-driven neural network tailored for metalens endoscopy driven by physical optics. MetaScope comprises two novel designs: Optics-informed Intensity Adjustment (OIA), rectifying intensity decay by learning optical embeddings, and Optics-informed Chromatic Correction (OCC), mitigating chromatic aberration by learning spatial deformations informed by learned Point Spread Function (PSF) distributions. To enhance joint learning, we further deploy a gradient-guided distillation to transfer knowledge from the foundational model adaptively. Extensive experiments demonstrate that MetaScope not only outperforms state-of-the-art methods in both metalens segmentation and restoration but also achieves impressive generalized ability in real biomedical scenes.

19. Decoding and Engineering the Phytobiome Communication for Smart Agriculture

Authors: Fatih Gulec, Hamdan Awan, Nigel Wallbridge, Andrew W. Eckford • Published: 2025-08-05 • Source: arXiv

Smart agriculture applications, integrating technologies like the Internet of Things and machine learning/artificial intelligence (ML/AI) into agriculture, hold promise to address modern challenges of rising food demand, environmental pollution, and water scarcity. Alongside the concept of the phytobiome, which defines the area including the plant, its environment, and associated organisms, and the recent emergence of molecular communication (MC), there exists an important opportunity to advance agricultural science and practice using communication theory. In this article, we motivate the use of a communication engineering perspective to develop a holistic understanding of phytobiome communication and to bridge the gap between phytobiome communication and smart agriculture. Firstly, an overview of phytobiome communication via molecular and electrophysiological signals is presented and a multi-scale framework modeling the phytobiome as a communication network is conceptualized. Then, how this framework is used to model electrophysiological signals is demonstrated with plant experiments. Furthermore, possible smart agriculture applications, such as smart irrigation and targeted delivery of agrochemicals, through engineering the phytobiome communication are proposed. These applications merge ML/AI methods with the Internet of Bio-Nano-Things enabled by MC and pave the way towards more efficient, sustainable, and eco-friendly agricultural production. Finally, the implementation challenges, open research issues, and industrial outlook for these applications are discussed.

20. Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching

Authors: Muzhaffar Hazman, Susan McKeever, Josephine Griffith • Published: 2025-08-05 • Source: arXiv

Internet memes, now a staple of digital communication, play a pivotal role in how users engage within online communities and allow researchers to gain insight into contemporary digital culture. This engaging user-generated content is characterised by its reuse of visual elements also found in other memes. Matching instances of memes via these shared visual elements, called Meme Matching, is the basis of a wealth of meme analysis approaches. However, most existing methods assume that every meme consists of a shared visual background, called a Template, with some overlaid text, thereby limiting meme matching to comparing the background image alone. Current approaches therefore exclude the many memes that are not template-based, which limits the effectiveness of automated meme analysis and makes these approaches ineffective at linking memes to contemporary web-based meme dictionaries. In this work, we introduce a broader formulation of meme matching that extends beyond template matching. We show that conventional similarity measures, including a novel segment-wise computation of the similarity measures, excel at matching template-based memes but fall short when applied to non-template-based meme formats. However, the segment-wise approach was found to consistently outperform the whole-image measures on matching non-template-based memes. Finally, we explore a prompting-based approach using a pretrained Multimodal Large Language Model for meme matching. Our results highlight that accurately matching memes via shared visual elements, not just background templates, remains an open challenge that requires more sophisticated matching techniques.
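To make the contrast between whole-image and segment-wise similarity concrete, here is a small sketch. It is an illustration under assumptions, not the paper's method: images are tiny grayscale grids, the similarity measure is cosine similarity on raw pixels, segments are a fixed 2x2 grid, and the per-segment scores are aggregated with `max`. The abstract does not specify any of these choices.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def flatten(img):
    return [p for row in img for p in row]


def segments(img, rows, cols):
    """Yield the image's grid cells in row-major order."""
    sh, sw = len(img) // rows, len(img[0]) // cols
    for r in range(rows):
        for c in range(cols):
            yield [row[c * sw:(c + 1) * sw] for row in img[r * sh:(r + 1) * sh]]


def whole_image_similarity(a, b):
    return cosine(flatten(a), flatten(b))


def segment_wise_similarity(a, b, rows=2, cols=2):
    # Score each pair of corresponding cells, then keep the best match
    # (aggregation by max is an assumption of this sketch).
    return max(cosine(flatten(sa), flatten(sb))
               for sa, sb in zip(segments(a, rows, cols), segments(b, rows, cols)))


# Two hypothetical 4x4 "memes" that share only their top-left quadrant.
a = [[9, 1, 0, 5], [1, 9, 5, 0], [3, 0, 0, 7], [0, 3, 7, 0]]
b = [[9, 1, 5, 0], [1, 9, 0, 5], [0, 3, 7, 0], [3, 0, 0, 7]]

print("whole image :", round(whole_image_similarity(a, b), 3))
print("segment-wise:", round(segment_wise_similarity(a, b), 3))
```

In this toy case the shared element dominates one segment but is diluted in the whole-image score, which mirrors the intuition for why a segment-wise measure can help on memes that reuse a visual element rather than a full background template.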
