Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.

The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.