About Me

Hi, my name is Eric Todd. I'm a third-year PhD student at Northeastern University advised by Professor David Bau. Prior to beginning my PhD, I studied Applied and Computational Mathematics at Brigham Young University (BYU).

I'm interested in understanding the learned structure of large neural networks, and how their internal representations enable their impressive generalization capabilities.

My research interests generally include machine learning and interpretability. I'm particularly excited by generative models and their applications in natural language and computer vision.


News

2025

January 2025

Our NNsight and NDIF paper was accepted to ICLR 2025! I'm really excited about this framework for enabling research on large-scale AI, and about the mission of NDIF in general.
Contributed to a new review-style preprint (with many others!), Open Problems in Mechanistic Interpretability, which lays out the problems we're currently thinking about as interpretability researchers and the questions we still don't have answers to.

2024

August 2024

Our causal interpretability survey is out on arXiv. As interpretability researchers, we're still trying to understand the right level of abstraction for thinking about neural network computation, but causal methods have emerged as a promising approach for studying it.

July 2024

Our preprint about NNsight and NDIF is out on arXiv. I'm excited about this framework for enabling access to the internal computations of large foundation models!

May 2024

Invited talk on Function Vectors at the Princeton Neuroscience Institute.
Had a great time presenting our Function Vectors work at ICLR 2024.

January 2024

Our Function Vectors paper was accepted to ICLR 2024!

Selected Publications

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals. Jaden Fiotto-Kaufman*, Alexander R. Loftus*, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau. Proceedings of the 2025 International Conference on Learning Representations. (ICLR 2025)

We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. NDIF is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the intervention graph, an architecture developed to decouple experiment design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches. Code, documentation, and tutorials are available at https://nnsight.net/.
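To make the deferred-execution idea concrete, here is a minimal sketch of what an NNsight experiment looks like from the user's side, based on my reading of the library's public interface: interventions are written inside a tracing context, and saved values become available once the trace runs (locally or remotely on NDIF). The model checkpoint, layer index, and exact attribute paths below are illustrative and may differ across NNsight versions.

```python
# Minimal NNsight usage sketch (illustrative; exact attribute paths and the
# need for `.value` vary across NNsight versions).
from nnsight import LanguageModel

# Wrap a pretrained model; the checkpoint name here is only an example.
model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is located in the city of"):
    # Save the residual-stream output of a middle transformer block
    # and the final logits. Nothing executes until the trace exits.
    hidden = model.transformer.h[6].output[0].save()
    logits = model.output.logits.save()

# After the trace, the saved tensors hold concrete values
# (older releases expose them via `hidden.value` / `logits.value`).
print(hidden.shape)   # e.g. (1, seq_len, 768) for GPT-2
print(logits.shape)   # (1, seq_len, vocab_size)
```

The design point the abstract emphasizes is that the body of the trace block builds an intervention graph rather than executing immediately, which is what lets the same experiment code be shipped to a shared remote deployment instead of requiring users to host large models themselves.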

Function Vectors in Large Language Models.
Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau. Proceedings of the 2024 International Conference on Learning Representations. (ICLR 2024)

We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs.
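As a rough illustration of the kind of causal test described above (not the paper's exact procedure or released code), the sketch below adds a precomputed function vector to the residual stream at a middle layer of GPT-2 while running a zero-shot prompt, using a plain PyTorch forward hook. The layer index is arbitrary and the vector is a placeholder; in practice it would be built from attention-head outputs collected over in-context-learning prompts.

```python
# Hypothetical sketch: inject a "function vector" into GPT-2's residual
# stream at one layer during a zero-shot prompt. Placeholder values only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

EDIT_LAYER = 6                          # a middle layer, chosen for illustration
fv = torch.randn(model.config.n_embd)   # placeholder; really averaged head outputs

def add_fv(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden state.
    hidden = output[0]
    hidden[:, -1, :] = hidden[:, -1, :] + fv   # add the FV at the last token
    return (hidden,) + output[1:]

handle = model.transformer.h[EDIT_LAYER].register_forward_hook(add_fv)
ids = tok("banana :", return_tensors="pt")   # a zero-shot-style prompt
with torch.no_grad():
    logits = model(**ids).logits
handle.remove()

# Comparing this patched next-token prediction against an unpatched run gives
# a simple measure of the vector's causal effect on task execution.
print(tok.decode(logits[0, -1].argmax()))
```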