About me

I am a third-year PhD student at the MIT Operations Research Center (ORC), advised by Dimitris Bertsimas, researching language model interpretability. I am interested in developing a mechanistic understanding of neural networks and using it to better monitor, control, and align advanced AI systems. I am fortunate to also collaborate with Neel Nanda and Max Tegmark, and am generously supported by an Open Philanthropy early career grant.

Previously, I was a software engineer at Google, where I worked on data pipelines for the storage analytics team, and I researched fairness-optimized political redistricting with David Shmoys.

Publications

  • Universal Neurons in GPT2 Language Models
    by Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, and Dimitris Bertsimas
    Under review [arXiv] [Twitter]
  • Language Models Represent Space and Time
    by Wes Gurnee and Max Tegmark
    Published at ICLR 2024 [arXiv] [Twitter]
  • Training Dynamics of Contextual N-Grams in Language Models
    by Lucia Quirke, Lovis Heindrich, Wes Gurnee, and Neel Nanda
    Published at the NeurIPS 2023 ATTRIB Workshop [arXiv] [Twitter]
  • Finding Neurons in a Haystack: Case Studies with Sparse Probing
    by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, and Dimitris Bertsimas
    Published in TMLR [Paper] [arXiv] [Twitter]
  • Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization
    by Wes Gurnee and Dimitris Bertsimas
    Published in Nonlinear Dynamics [Paper] [arXiv] [Twitter]
  • Combatting gerrymandering with social choice: The design of multi-member districts
    by Nikhil Garg, Wes Gurnee, David Rothschild, and David Shmoys
    Published at EC 2023 [Paper] [arXiv] [Talk]
  • Fairmandering: A column generation heuristic for fairness-optimized political districting
    by Wes Gurnee and David Shmoys
    Best Paper Award at SIAM ACDA 2021 [Paper] [arXiv] [Talk]

Other Projects and Writing