Stop guessing why your LLMs break: Anthropic’s new tool shows you exactly what goes wrong
Summary
As large language models (LLMs) become increasingly common in enterprise operations, they pose a variety of challenges. One of the most difficult to overcome is their “black box” nature, which leaves users without an explanation of why the AI made a specific decision or gave a certain response.
Anthropic, an AI research and development laboratory, has launched an open-source circuit tracing tool that lets researchers and developers inspect a model’s internal workings, helping them tune LLMs and investigate otherwise unexplained errors.
The tool generates attribution graphs, which map how internal features influence one another as the model processes an input; users can then intervene on individual features and observe how the change affects the output.
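To make the idea concrete, here is a minimal, purely illustrative sketch of what an attribution graph and a feature intervention might look like. It does not use Anthropic’s actual circuit tracing library; all names, structures, and weights below are hypothetical stand-ins for the general workflow of ablating a feature and re-running the model.

```python
# Hypothetical sketch: an attribution graph names internal "features" (nodes)
# and the strength with which one feature influences another (edges).
# Intervening means clamping or ablating a feature and re-running the model
# to see how the output changes. Everything here is illustrative only.

from dataclasses import dataclass, field


@dataclass
class AttributionGraph:
    # Edge weights: (source_feature, target_feature) -> attribution strength.
    edges: dict = field(default_factory=dict)

    def influences_on(self, target: str) -> dict:
        """Return the features feeding into `target`, with their weights."""
        return {src: w for (src, tgt), w in self.edges.items() if tgt == target}


def toy_forward(features: dict, graph: AttributionGraph) -> float:
    """A stand-in 'model': the output is the weighted sum of the features
    that flow into a designated 'output' node."""
    return sum(features[src] * w
               for src, w in graph.influences_on("output").items())


if __name__ == "__main__":
    graph = AttributionGraph(edges={
        ("capital-city", "output"): 0.8,
        ("country:France", "output"): 0.5,
    })
    features = {"capital-city": 1.0, "country:France": 1.0}

    baseline = toy_forward(features, graph)

    # Intervention: ablate (zero out) one feature and observe the effect.
    features["capital-city"] = 0.0
    ablated = toy_forward(features, graph)

    print(f"baseline output: {baseline:.2f}, after ablation: {ablated:.2f}")
```

In the real tool, the nodes correspond to learned features inside the LLM and the edges are computed attributions, but the investigative loop is the same: inspect the graph, intervene on a feature, and compare the new output with the baseline.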
The tool still carries notable memory costs and complexity, as circuit tracing is a relatively new area of research, but Anthropic’s decision to open-source it should help these challenges be overcome more rapidly through community collaboration.