Summary

  • Salesforce AI Research has developed MCP-Universe, an open-source benchmark that assesses how well language models (LMs) interact with Model Context Protocol (MCP) servers in real-world situations.
  • MCP-Universe measures performance across a range of tasks, including location navigation, financial analysis, and browser automation, scoring models on how well they complete each task.
  • In testing, GPT-5 achieved the highest success rate, but models struggled with long-context tasks, particularly location navigation, browser automation, and financial analysis, where their performance dropped significantly.
  • These findings show that current LMs cannot yet reliably handle diverse real-world tasks, and that MCP-Universe provides a much-needed testbed for evaluating them.

By Emilia David
