🤖 AI-Generated Research Summary

Comprehensive Summary of Recent Research on AI, LLMs, Agents, and Workflows

This summary synthesizes insights from 21 recent research papers focusing on large language models (LLMs), retrieval-augmented generation (RAG), autonomous agents, and workflow automation. The analysis is structured to highlight key research trends, breakthrough findings, methodological approaches, applications, and future directions.

1. Key Research Trends

A. Proliferation and Enhancement of RAG Systems

RAG as a Solution to LLM Limitations: Many papers (e.g., 1, 3, 4, 6, 11, 13, 14, 19, 21) focus on RAG to address LLM shortcomings such as hallucinations, outdated knowledge, and lack of transparency.
Benchmarking and Evaluation: New benchmarks (e.g., CRAG, SummHay) and systematic evaluations are being developed to assess RAG systems in diverse, real-world, and domain-specific contexts (papers 4, 10, 13).

B. Emergence of Autonomous and Domain-Specific AI Agents

General-Purpose and Specialized Agents: There is a surge in both general-purpose (e.g., Manus AI, xLAM) and domain-specific agents (e.g., SpatialAgent for biology, MusicAgent for music, FD-LLM for fault diagnosis).
Multi-Agent Simulations: Large-scale agent societies and simulations (e.g., Project Sid) are being explored to model complex, emergent behaviors.

C. Workflow and Business Process Automation

AI-Driven Automation: Integration of AI into business workflows and ERP systems is a growing trend, aiming to optimize resource use, reduce costs, and enhance decision-making (papers 9, 12, 20).
Low-Code and Visual Tools: The adoption of graphical and low-code platforms is recognized as a key enabler for AI-driven workflow automation (paper 12).

D. Security, Privacy, and Robustness

Vulnerabilities in RAG and LLMs: Research is uncovering new attack vectors (e.g., jamming, privacy leaks) and proposing methods to enhance robustness and privacy (papers 1, 3, 21).

2. Breakthrough Findings

Jamming Attacks on RAG (Paper 1): Demonstrates that RAG systems can be easily disrupted by adversarial "blocker" documents, raising concerns about security in open or untrusted databases.
Speculative RAG (Paper 19): Introduces a novel framework that uses a larger LM to efficiently verify multiple RAG drafts, improving both efficiency and accuracy.
Vul-RAG for Vulnerability Detection (Paper 21): Shows that LLMs alone are poor at distinguishing subtle code vulnerabilities, but RAG with multi-dimensional knowledge significantly boosts detection accuracy.
Manus AI (Paper 2): Presents a fully autonomous digital agent capable of end-to-end task execution, bridging reasoning and action in a general-purpose framework.
Project Sid (Paper 16): Pioneers large-scale, many-agent simulations to study emergent civilizational behaviors, enabled by the PIANO architecture.
xLAM (Paper 18): Releases a family of open-source large action models specifically designed for agent tasks, addressing the lack of standardized agent datasets and protocols.

3. Methodological Approaches

Retrieval-Augmented Generation (RAG): Used extensively to supplement LLMs with external, up-to-date, or domain-specific knowledge (papers 6, 11, 13, 14, 19, 21).
Fine-Tuning and Instruction Tuning: Models are fine-tuned on curated datasets or with additional instructions to improve domain adaptation and reduce hallucinations (papers 6, 11, 19).
Benchmarking and Evaluation Frameworks: New benchmarks (CRAG, SummHay) and systematic evaluation protocols are being developed for more realistic and challenging assessments (papers 4, 10, 13).
Multi-Modal and Multi-Source Integration: Some agents (e.g., SpatialAgent, FD-LLM) integrate textual, sensor, and multimodal data for richer reasoning and analysis (papers 5, 17).
Graphical and Low-Code Workflow Modelling: Visual tools and low-code platforms are leveraged to bridge domain and technical expertise in workflow automation (paper 12).
Security and Privacy Audits: Empirical studies and attack simulations are used to uncover vulnerabilities and privacy risks in RAG and LLM systems (papers 1, 3, 21).

4. Applications and Use Cases

Scientific Research: SpatialAgent automates spatial biology research pipelines; FD-LLM applies LLMs to machine fault diagnosis.
Healthcare and Medicine: RAG systems are benchmarked for medical QA, addressing hallucinations and knowledge gaps (paper 13).
Software Engineering and Security: Vul-RAG enhances vulnerability detection in code; Manus AI and xLAM target general and specialized agent tasks.
Business Process Automation: AI is integrated into ERP systems and business workflows for automation and efficiency (papers 9, 12, 20).
Music Understanding and Generation: MusicAgent provides a unified agent for diverse music processing tasks (paper 15).
Cloud Operations: AI agents are designed for autonomous cloud management and AIOps (paper 7).
Gaming and Simulation: MineStudio streamlines agent development in Minecraft; Project Sid explores large-scale agent societies (papers 8, 16).

5. Future Directions

Robustness and Security: Further research is needed to defend RAG and LLM systems against adversarial attacks and privacy breaches.
Standardization and Benchmarks: The field calls for more comprehensive, realistic, and domain-specific benchmarks to guide development and evaluation.
Scalable Multi-Agent Systems: Exploration of large-scale, interactive agent societies will deepen understanding of emergent behaviors and collective intelligence.
Domain Adaptation and Multimodality: Continued integration of diverse data types (text, sensor, multimodal) and domain-specific knowledge will expand AI applicability.
Human-AI Collaboration: Low-code and visual tools will democratize AI workflow automation, enabling broader adoption across industries.
Autonomous End-to-End Agents: The push toward fully autonomous agents capable of complex, real-world task execution will accelerate, with implications for both productivity and oversight.

Conclusion

This collection of papers highlights a vibrant and rapidly evolving landscape in AI research, with a strong focus on enhancing LLMs through RAG, developing robust and autonomous agents, and integrating AI into practical workflows. Security, privacy, and evaluation remain critical challenges, while new benchmarks, methodologies, and applications continue to push the boundaries of what AI systems can achieve. Researchers and practitioners should pay close attention to the interplay between robustness, usability, and domain adaptation as the field moves toward more autonomous, reliable, and impactful AI solutions.

📚 Semantic Scholar (21 papers)

1. From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent

Authors: Minjie Shen, Qikai Yang • Published: 2025-05-04 • Source: Semantic Scholar

Manus AI is a general-purpose AI agent introduced in early 2025, marking a significant advancement in autonomous artificial intelligence. Developed by the Chinese startup Monica.im, Manus is designed to bridge the gap between"mind"and"hand"- combining the reasoning and planning capabilities of large language models with the ability to execute complex, end-to-end tasks that produce tangible outcomes. This paper presents a comprehensive overview of Manus AI, exploring its core technical architecture, diverse applications across sectors such as healthcare, finance, manufacturing, robotics, and gaming, as well as its key strengths, current limitations, and future potential. Positioned as a preview of what lies ahead, Manus AI represents a shift toward intelligent agents that can translate high-level intentions into real-world actions, heralding a new era of human-AI collaboration.

🔗 View Paper

2. SpatialAgent: An autonomous AI agent for spatial biology

Authors: Hanchen Wang, Yichun He, Paula P. Coelho, Matthew Bucci, Abbas Nazir, Bob Chen, Linh Trinh, Serena Zhang, Kexin Huang, Vineethkrishna Chandrasekar, Douglas C. Chung, Minsheng Hao, A. C. Leote, Yongju Lee, Bo Li, Tianyu Liu, Jin Liu, Romain Lopez, Tawaun Lucas, Mingyu Derek Ma, Nikita Makarov, Lisa M. McGinnis, Linna Peng, Stephen Ra, Gabriele Scalia, Avtar Singh, Liming Tao, Masatoshi Uehara, Chenyu Wang, Runmin Wei, Ryan Copping, O. Rozenblatt-Rosen, J. Leskovec, Aviv Regev • Published: 2025-04-06 • Source: Semantic Scholar

Advances in AI are transforming scientific discovery, yet spatial biology, a field that deciphers the molecular organization within tissues, remains constrained by labor-intensive workflows. Here, we present SpatialAgent, a fully autonomous AI agent dedicated for spatial-biology research. SpatialAgent integrates large language models with dynamic tool execution and adaptive reasoning. SpatialAgent spans the entire research pipeline, from experimental design to multimodal data analysis and hypothesis generation. Tested on multiple datasets comprising two million cells from human brain, heart, and a mouse colon colitis model, SpatialAgent’s performance surpassed the best computational methods, matched or outperformed human scientists across key tasks, and scaled across tissues and species. By combining autonomy with human collaboration, SpatialAgent establishes a new paradigm for AI-driven discovery in spatial biology.

🔗 View Paper

3. Graphical AI workflow modelling: Identifying relevant competencies in AI-based automation of business processes

Authors: Iris Graessler, Deniz Oezcan • Published: 2025-01-01 • Source: Semantic Scholar

This research investigates how Artificial Intelligence (AI) can be systematically integrated into existing business processes by combining suitable competencies with graphical AI workflow modelling. While AI offers a high potential for automation and increased efficiency, its implementation often fails due to a lack of interdisciplinary competencies that bridge the gap between domain expertise and IT know-how. Low-code platforms and visual modelling tools are increasingly recognised as enablers, empowering non-programmers to intuitively create graphical AI- based workflows. Nevertheless, specific competencies are required to realise the full potential of AI, the domain specific knowledge and align technical understanding with AI capabilities. The paper reviews the state of the art in AI-driven business process automation and competencies for visual low-code approaches. It then presents a practical solution to identify and systematise essential competence areas. Based on this, a practical competence model is developed to support the design of user-friendly, AI-enabled workflows. This is tested in a practical application context — emergency management — where it supports critical decision-making processes and is validated through expert feedback. The study concludes by offering actionable recommendations to help organisations foster the necessary competencies and methods for competently integrating AI into their digital processes.

🔗 View Paper

4. MineStudio: A Streamlined Package for Minecraft AI Agent Development

Authors: Shaofei Cai, Zhancun Mu, Kaichen He, Bowei Zhang, Xinyue Zheng, Anji Liu, Yitao Liang • Published: 2024-12-24 • Source: Semantic Scholar

Minecraft's complexity and diversity as an open world make it a perfect environment to test if agents can learn, adapt, and tackle a variety of unscripted tasks. However, the development and validation of novel agents in this setting continue to face significant engineering challenges. This paper presents MineStudio, an open-source software package designed to streamline the development of autonomous agents in Minecraft. MineStudio represents the first comprehensive integration of seven critical engineering components: simulator, data, model, offline pre-training, online fine-tuning, inference, and benchmark, thereby allowing users to concentrate their efforts on algorithm innovation. We provide a user-friendly API design accompanied by comprehensive documentation and tutorials. Our project is released at https://github.com/CraftJarvis/MineStudio.

🔗 View Paper

5. FD-LLM: Large Language Model for Fault Diagnosis of Machines

Authors: Hamzah A. A. M. Qaid, Bo Zhang, Dan Li, See-Kiong Ng, Wei Li • Published: 2024-12-02 • Source: Semantic Scholar

Large language models (LLMs) are effective at capturing complex, valuable conceptual representations from textual data for a wide range of real-world applications. However, in fields like Intelligent Fault Diagnosis (IFD), incorporating additional sensor data-such as vibration signals, temperature readings, and operational metrics-is essential but it is challenging to capture such sensor data information within traditional text corpora. This study introduces a novel IFD approach by effectively adapting LLMs to numerical data inputs for identifying various machine faults from time-series sensor data. We propose FD-LLM, an LLM framework specifically designed for fault diagnosis by formulating the training of the LLM as a multi-class classification problem. We explore two methods for encoding vibration signals: the first method uses a string-based tokenization technique to encode vibration signals into text representations, while the second extracts statistical features from both the time and frequency domains as statistical summaries of each signal. We assess the fault diagnosis capabilities of four open-sourced LLMs based on the FD-LLM framework, and evaluate the models' adaptability and generalizability under various operational conditions and machine components, namely for traditional fault diagnosis, cross-operational conditions, and cross-machine component settings. Our results show that LLMs such as Llama3 and Llama3-instruct demonstrate strong fault detection capabilities and significant adaptability across different operational conditions, outperforming state-of-the-art deep learning (DL) approaches in many cases.

🔗 View Paper

6. Project Sid: Many-agent simulations toward AI civilization

Authors: AL Altera., Andrew Ahn, Nic Becker, Stephanie Carroll, Nico Christie, Manuel Cortes, Arda Demirci, Melissa Du, Frankie Li, Shuying Luo, Peter Y Wang, Mathew Willows, Feitong Yang, Guangyu Robert Yang • Published: 2024-10-31 • Source: Semantic Scholar

AI agents have been evaluated in isolation or within small groups, where interactions remain limited in scope and complexity. Large-scale simulations involving many autonomous agents -- reflecting the full spectrum of civilizational processes -- have yet to be explored. Here, we demonstrate how 10 - 1000+ AI agents behave and progress within agent societies. We first introduce the PIANO (Parallel Information Aggregation via Neural Orchestration) architecture, which enables agents to interact with humans and other agents in real-time while maintaining coherence across multiple output streams. We then evaluate agent performance in agent simulations using civilizational benchmarks inspired by human history. These simulations, set within a Minecraft environment, reveal that agents are capable of meaningful progress -- autonomously developing specialized roles, adhering to and changing collective rules, and engaging in cultural and religious transmission. These preliminary results show that agents can achieve significant milestones towards AI civilizations, opening new avenues for large simulations, agentic organizational intelligence, and integrating AI into human civilizations.

🔗 View Paper

7. xLAM: A Family of Large Action Models to Empower AI Agent Systems

Authors: Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, T. Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong • Published: 2024-09-05 • Source: Semantic Scholar

Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality agent datasets and the absence of standard protocols in this area. We introduce and publicly release xLAM, a series of large action models designed for AI agent tasks. The xLAM series includes five models with both dense and mixture-of-expert architectures, ranging from 1B to 8x22B parameters, trained using a scalable, flexible pipeline that unifies, augments, and synthesizes diverse datasets to enhance AI agents' generalizability and performance across varied environments. Our experimental results demonstrate that xLAM consistently delivers exceptional performance across multiple agent ability benchmarks, notably securing the 1st position on the Berkeley Function-Calling Leaderboard, outperforming GPT-4, Claude-3, and many other models in terms of tool use. By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks. Models are available at https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4

🔗 View Paper

8. AI-Enhanced Workflow Automation within ERP Systems

Authors: M. Khaing, Than Than Htike • Published: 2024-08-07 • Source: Semantic Scholar

Enterprise Resource Planning (ERP) systems are critical for managing diverse business processes, including finance, human resources, supply chain management, and customer relationship management. Traditional ERP systems often struggle with workflow automation and real-time decision-making due to their reliance on manual configurations and limited automation capabilities. This paper explores the integration of Artificial Intelligence (AI) into ERP systems to enhance workflow automation. Specifically, it examines the use of Artificial Neural Networks (ANNs) to improve predictive accuracy and operational efficiency. The study includes a detailed analysis of the impact of AI-enhanced workflows on process cycle times, error rates, and resource utilization, supported by a case study involving a retail company. The results demonstrate significant improvements in operational efficiency, accuracy of predictions, and user satisfaction. The paper also discusses the challenges associated with AI integration, such as data quality, system complexity, and user acceptance, and provides recommendations for successful implementation. Future research directions are suggested, including the exploration of additional emerging technologies and the development of comprehensive integration frameworks.

🔗 View Paper

9. Building AI Agents for Autonomous Clouds: Challenges and Design Principles

Authors: Manisha M Shetty, Yinfang Chen, Gagan Somashekar, Ming-Jie Ma, Yogesh L. Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, P. Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, S. Rajmohan • Published: 2024-07-16 • Source: Semantic Scholar

The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using agents for the operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is a growing interest in AI for IT Operations (AIOps), which aims to automate complex operational tasks, like fault localization and root cause analysis, reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging agent-cloud-interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.

🔗 View Paper

10. Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Authors: Zilong Wang, Zifeng Wang, Long T. Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister • Published: 2024-07-11 • Source: Semantic Scholar

Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PopQA, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 50.83% compared to conventional RAG systems on PubHealth.

🔗 View Paper

11. Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Authors: Philippe Laban, A. R. Fabbri, Caiming Xiong, Chien-Sheng Wu • Published: 2024-07-01 • Source: Semantic Scholar

LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation. We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents. The “Summary of a Haystack” (SummHay) task then requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents. Since we have precise knowledge of what insights should appear in a haystack summary and what documents should be cited, we implement a highly reproducible automatic evaluation that can score summaries on two aspects – Coverage and Citation. We generate Haystacks in two domains (conversation, news), and perform a large-scale evaluation of 10 LLMs and corresponding 50 RAG systems. Our findings indicate that SummHay is an open challenge for current systems, as even systems provided with an Oracle signal of document relevance lag our estimate of human performance (56%) by 10+ points on a Joint Score. Without a retriever, long-context LLMs like GPT-4o and Claude 3 Opus score below 20% on SummHay. We show SummHay can also be used to study enterprise RAG systems and position bias in long-context models. We hope future systems can equal and surpass human performance on SummHay.

🔗 View Paper

12. Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou • Published: 2024-06-17 • Source: Semantic Scholar

Although LLMs have shown promising potential in vulnerability detection, this study reveals their limitations in distinguishing between vulnerable and similar-but-benign patched code (only 0.06 - 0.14 accuracy). It shows that LLMs struggle to capture the root causes of vulnerabilities during vulnerability detection. To address this challenge, we propose enhancing LLMs with multi-dimensional vulnerability knowledge distilled from historical vulnerabilities and fixes. We design a novel knowledge-level Retrieval-Augmented Generation framework Vul-RAG, which improves LLMs with an accuracy increase of 16% - 24% in identifying vulnerable and patched code. Additionally, vulnerability knowledge generated by Vul-RAG can further (1) serve as high-quality explanations to improve manual detection accuracy (from 60% to 77%), and (2) detect 10 previously-unknown bugs in the recent Linux kernel release with 6 assigned CVEs.

🔗 View Paper

13. Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents

Authors: Avital Shafran, R. Schuster, Vitaly Shmatikov • Published: 2024-06-09 • Source: Semantic Scholar

Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database and applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with untrusted content are vulnerable to denial-of-service attacks we call jamming. An adversary can add a single ``blocker'' document to the database that will be retrieved in response to a specific query and result in the RAG system not answering this query, ostensibly because it lacks relevant information or because the answer is unsafe. We describe and measure the efficacy of several methods for generating blocker documents, including a new method based on black-box optimization. Our method (1) does not rely on instruction injection, (2) does not require the adversary to know the embedding or LLM used by the target RAG system, and (3) does not employ an auxiliary LLM. We evaluate jamming attacks on several embeddings and LLMs and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents.

🔗 View Paper

14. CRAG - Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Y. Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Dong • Published: 2024-06-07 • Source: Semantic Scholar

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve<=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge and attracted thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.

🔗 View Paper

15. Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases

Authors: Jiarui Li, Ye Yuan, Zehua Zhang • Published: 2024-03-15 • Source: Semantic Scholar

We proposed an end-to-end system design towards utilizing Retrieval Augmented Generation (RAG) to improve the factual accuracy of Large Language Models (LLMs) for domain-specific and time-sensitive queries related to private knowledge-bases. Our system integrates RAG pipeline with upstream datasets processing and downstream performance evaluation. Addressing the challenge of LLM hallucinations, we finetune models with a curated dataset which originates from CMU's extensive resources and annotated with the teacher model. Our experiments demonstrate the system's effectiveness in generating more accurate answers to domain-specific and time-sensitive inquiries. The results also revealed the limitations of fine-tuning LLMs with small-scale and skewed datasets. This research highlights the potential of RAG systems in augmenting LLMs with external datasets for improved performance in knowledge-intensive tasks. Our code and models are available on Github.

🔗 View Paper

16. RAFT: Adapting Language Model to Domain Specific RAG

Authors: Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, M. Zaharia, Ion Stoica, Joseph Gonzalez • Published: 2024-03-15 • Source: Semantic Scholar

Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a"open-book"in-domain settings. In RAFT, given a question, and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call, distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This coupled with RAFT's chain-of-thought-style response helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.

🔗 View Paper

17. The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, Jiliang Tang • Published: 2024-02-23 • Source: Semantic Scholar

Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database. Despite the new risk brought by RAG on the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit both LLMs and RAG systems builders. Our code is available at https://github.com/phycholosogy/RAG-privacy.

🔗 View Paper

18. Benchmarking Retrieval-Augmented Generation for Medicine

Authors: Guangzhi Xiong, Qiao Jin, Zhiyong Lu, Aidong Zhang • Published: 2024-02-20 • Source: Semantic Scholar

While large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks, they still face challenges with hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. However, a RAG system can involve multiple flexible components, and there is a lack of best practices regarding the optimal RAG setting for various medical purposes. To systematically evaluate such systems, we propose the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets. Using MIRAGE, we conducted large-scale experiments with over 1.8 trillion prompt tokens on 41 combinations of different corpora, retrievers, and backbone LLMs through the MedRAG toolkit introduced in this work. Overall, MedRAG improves the accuracy of six different LLMs by up to 18% over chain-of-thought prompting, elevating the performance of GPT-3.5 and Mixtral to GPT-4-level. Our results show that the combination of various medical corpora and retrievers achieves the best performance. In addition, we discovered a log-linear scaling property and the"lost-in-the-middle"effects in medical RAG. We believe our comprehensive evaluations can serve as practical guidelines for implementing RAG systems for medicine.

🔗 View Paper

19. THE ECONOMIC IMPACT OF AI ON LABOR PRODUCTIVITY AND WORKFLOW AUTOMATION

Authors: V. K. Ayupova, Islam A. Magomedov, Dzhamilya A. Borlakova • Published: 2024-01-01 • Source: Semantic Scholar

The article examines the impact of artificial intelligence (AI) on labor productivity and automation of work processes. The process of digitalization and the introduction of AI into various sectors of the economy has a significant impact on improving production efficiency and reducing operating costs. One of the key aspects is the automation of routine tasks and the transformation of work processes, which allows you to optimize the use of resources and speed up business processes. The article also discusses the social and economic consequences of the introduction of AI, including changes in the labor market, the creation of new professions, as well as the risks associated with automation, such as job cuts in low-skilled segments and in-creased inequality. Special attention is paid to the prospects for the further development of AI and its impact on the economy in the long term.

🔗 View Paper

20. Retrieval-Augmented Generation for Large Language Models: A Survey

Authors: Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, Haofen Wang • Published: 2023-12-18 • Source: Semantic Scholar

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces up-to-date evaluation framework and benchmark. At the end, this article delineates the challenges currently faced and points out prospective avenues for research and development.

🔗 View Paper

21. MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian • Published: 2023-10-18 • Source: Semantic Scholar

AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these task to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks. Consequently, it is necessary to build a system to organize and integrate these tasks, and thus help practitioners to automatically analyze their demand and call suitable tools as solutions to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system, named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. More specifically, we build 1) toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web API, etc. 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) to organize these tools and automatically decompose user requests into multiple sub-tasks and invoke corresponding music tools. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect. By granting users the freedom to effortlessly combine tools, the system offers a seamless and enriching music experience.

🔗 View Paper

🤖 AI Research Papers

🤖 AI-Generated Research Summary

Comprehensive Summary of Recent Research on AI, LLMs, Agents, and Workflows

1. Key Research Trends

A. Proliferation and Enhancement of RAG Systems

B. Emergence of Autonomous and Domain-Specific AI Agents

C. Workflow and Business Process Automation

D. Security, Privacy, and Robustness

2. Breakthrough Findings

3. Methodological Approaches

4. Applications and Use Cases

5. Future Directions

Conclusion