DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 4 days ago • 109
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 160
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios Paper • 2509.21766 • Published Sep 26, 2025 • 23