Our People

Our Mentors

Abram Demski

MENTOR

Abram Demski is a research fellow at MIRI whose work includes logical uncertainty, decision theory, and other topics close to the heart of agency. His current interests include untangling the roots of semantics, finding where ontologies come from, creating formal models of denotation as opposed to connotation, uncovering new generalizations of probability theory, and formulating mechanisms that facilitate cooperation and coordination.

Alan Chan

MENTOR

Alan is a PhD student at Mila, where he works on both technical and sociotechnical approaches to improving coordination for AI safety. His technical work on language models focuses on developing rigorous methodologies for identifying capabilities that could be dangerous or that could contribute to differential technological progress. His sociotechnical work includes characterizing key concepts in AI risk and performing policy research to improve coordination.

Alexander Gietelink Oldenziel

MENTOR

Alexander likes to think about philosophical questions through a mathematical lens: What are abstractions, and are they convergent? What are good frameworks for thinking about minds and goal-directed behaviour? What are the atomic units of computation? How do symbols acquire meaning?
Alexander directs academic outreach at Timaeus, the singular learning theory alignment organization, where he organizes workshops on singular learning theory, computational mechanics, and agent foundations in the context of AI alignment. He is also an intermittent PhD candidate at University College London, working on the theory of computation.

David A. Dalrymple

MENTOR

David is the author of the Open Agency Architecture, an agenda for safe AI. Specific research directions within that agenda include new forms of mild optimization, governance via bargaining solutions in policy-space, category-theoretic world-modeling, and using LLMs and diffusion models to accelerate probabilistic inference and symbolic model-checking. David is a Senior Research Scientist at Protocol Labs, where he co-invented Filecoin and invented Hypercerts; until recently he was a Research Fellow at Oxford's Future of Humanity Institute, and he studied neuroscience at Harvard and AI at MIT.

Jan Kulveit

MENTOR

Jan is a Research Fellow at the Future of Humanity Institute at the University of Oxford and a researcher at the Center for Theoretical Study in Prague.

His research is centered on studying the behavior and interactions of boundedly rational agents and, more generally, on making AI aligned with human interests. Jan is also interested in modeling complex interacting systems and in strategies to influence the long-term future. He co-organizes the Human-aligned AI Summer School.

Jan Hendrik Kirchner

MENTOR

I am a researcher of minds - artificial and biological - with a background in cognitive science and computational neuroscience. After researching the early development of the brain during my PhD, I am now working towards aligning artificial intelligence with human values at OpenAI. I write the blog “On Brains, Minds, And Their Possible Uses” and care about doing good, better.

John Wentworth

MENTOR

Though rumored to be an escaped cartoon villain, John is currently an independent AI alignment researcher. He works on questions like: What makes 'trees' or 'cars' natural high-level abstract objects to think about? Would other minds also organize their knowledge this way? Why are evolved biological organisms so modular? Does such modularity carry over to ML systems?

Joseph Bloom

MENTOR

Joseph Bloom is an independent mechanistic interpretability researcher and an alum of the ML Alignment and Theory Scholars (MATS) and Alignment Research Engineering Accelerator (ARENA) programs. Their research focuses on developing and applying techniques for understanding neural network internals, with a view to building a deeper scientific understanding to underpin AI safety and alignment agendas. To this end, they are currently researching and building infrastructure for sparse autoencoders, a promising new technique that enables us to enumerate a network's internal representations and use them to understand the behavior of AI systems.

Patrick Butlin

MENTOR

Patrick Butlin is a Research Fellow at the Future of Humanity Institute (FHI), where he is a member of the Digital Minds research group. His work is in the philosophy of mind and cognitive science, with a current focus on desire, agency, and moral status in AI.

Tan Zhi-Xuan

MENTOR

Xuan is a PhD candidate with the MIT Probabilistic Computing Project and the Computational Cognitive Science research group. Their research focuses on efficient inference over Bayesian models of human decision-making and normativity more broadly, with an eye towards inferring human goals, values, and norms. To that end, they are broadly interested in inter-subjective accounts of human normativity (what do people agree upon as right?) and metaphysics (how do people develop shared or contested conceptual representations of the world?), and in how these accounts can be formalized precisely enough to serve as targets for AI alignment.

Tomáš Gavenčiak

MENTOR

Tomáš is a researcher at the Center for Theoretical Study in Prague and has spent the past few years as an independent researcher in AI alignment, epidemic modeling, and machine learning. He currently works on topics such as game theory and cooperation on complex networks, hierarchical models of agency, and, more generally, complex risks.

Tsvi Benson-Tilsen

MENTOR

Minds drive the world in directions. We don't understand why minds drive the world in the directions they do well enough to make super-humanly intelligent minds that drive the world in good directions. What are the essential relationships between the creative expansion of thought (mathematical definitions, scientific insight, algorithm search) and pushing the world in directions? Tsvi is employed by the Machine Intelligence Research Institute.

Other Mentors

Lionel Levine
Nicolas Macé
TJ
Clem von Stengel

We also cooperate with new mentors every year, all of them experienced scholars in AI risk, safety, and/or governance.

Affiliates of 2024

Adam Shai

I am broadly interested in how networks, both biological and artificial, compute. My background is in experimental and computational neuroscience, where I studied the cortical mechanisms underlying prediction and understanding of the world. My current research focuses on similar issues in transformers, where I aim to understand how self-supervised prediction training gives rise to understanding. Using tools from physics, dynamical systems, and information theory, I aim to uncover the representations we should expect transformers to form, as well as the computations transformers perform with these representations.

Ann-Kathrin Dombrowski

I'm Annah. I started my AI safety journey with the MATS summer 2023 cohort under the mentorship of Dan Hendrycks, and I previously did a PhD in machine learning at TU Berlin. During MATS I worked on concept extraction, activation steering, and knowledge removal, and I coauthored a paper on representation engineering with my colleagues at CAIS. I'm excited to explore new research directions with PIBBSS.

Clem von Stengel

I am a researcher in the Alignment of Complex Systems group, focussing on formal models of phenomena in evolutionary ecology which could shed light on the AI alignment problem. I've also made my way through a variety of academic disciplines: I'm currently a PhD student in both Informatics and Macroecology, with a background in Mathematics and Theoretical Physics (Bachelor's) and History and Philosophy of Science (Master's).

Guillaume Corlouer

My research aims to understand the development of internal representations and capabilities during the training of deep learning models. I am interested in reducing uncertainty about the emergence of deceptive alignment during training and developing mathematically principled techniques to better detect deceptively aligned goals. In this context, I am looking into the relevance of singular learning theory to understanding the training dynamics of deep learning models, and adapting measures of emergence from multivariate information theory to deep neural networks.

Nischal Mainali

I am Nischal, a PhD student in theoretical neuroscience at the Hebrew University of Jerusalem, interested in mathematical theories of brain function. I'm curious if and how we can use tools and ideas from neuroscience to understand AI.

Fellows of 2024

Agustín Martinez Suñé

I am a Computer Science Ph.D. from Argentina, deeply passionate about understanding the intricate technological landscape of our modern world. My goal is to leverage my expertise to drive positive societal change through advancements in science and technology.
My specialization lies in developing formal methods to analyze software artifacts. These methods employ techniques and tools grounded in logical-mathematical foundations, offering provable guarantees about their output.
In recent years, my interest in ensuring the safety of machine learning systems has grown significantly. This has motivated me to explore the broader social and economic factors that intersect with this technology and can lead to harm. This evolving interest has also reshaped my research trajectory at a technical level, and I am now transitioning to a career in AI safety and AI risk reduction.

Aron Vallinder

I'm an independent researcher. My primary academic background is in philosophy, with a PhD on Bayesian epistemology from the London School of Economics. I'm currently interested in using lessons from cultural evolution to think about AI safety and development.

Baram Sosis

I’m a PhD student in mathematical neuroscience at the University of Pittsburgh. My research focuses on understanding the mechanisms of learning and decision-making in the basal ganglia. I’m currently transitioning to work in AI safety, where I’m interested in exploring a variety of approaches.

Euan McLean

I have a PhD in theoretical particle physics and have worked in ML engineering, technical communications, and macrostrategy research at the Center on Long-Term Risk. I'm interested in questions regarding phenomenal consciousness and wellbeing in AI systems.

Jan Bauer

I'm interested in the tension between expressivity and stability in intelligent systems. How can capricious components give rise to reliable cognition? For example, in the brain, synaptic noise and strong connectivity give rise to chaotic dynamics, whereas in artificial systems, adversarial attacks sometimes prevent robust generalization from training data. Yet both systems are highly capable. As a strong believer in synergies between fields, I approach this question from theoretical neuroscience, biased by a background in statistical physics.

Magdalena Wache

Causality enthusiast trying to become less confused about agency and abstractions. Previously, I did my master's in machine learning with a minor in mathematics, and I have worked on interpretability in the course of the Machine Learning Alignment Theory Scholars program.

Matthew Clarke

I am interested in how networks make decisions, both in machines and in biology. My work as a postdoctoral researcher has focused on understanding the networks that underlie decision making in human cells. Specifically, I research how these decisions go wrong in cancer or are hijacked in viral disease, and how we can best perturb them to treat disease. I am now interested in applying the lessons from this work to the mechanistic understanding of neural networks, as well as bringing methods for interpreting synthetic networks back to biology.

Nadine Spychala

I’m a doctoral researcher in computational neuroscience & complex systems at Sussex University as well as a research software engineer at King’s College London. During the PIBBSS fellowship, I aim to bring together various strands of research (philosophical, formal/mathematical, and empirical) on the concept of emergence to inform and advance research on AI capabilities. I ultimately want to explore whether the insights gained can be channelled into evals-type work to produce a deployable “emergence-assessment pipeline” for assessing AIs with respect to their emergent capabilities.

Shaun Raviv

I'm a freelance print and audio journalist based in Atlanta. I've written features for Wired, Smithsonian, The Intercept, The Ringer, and The Washington Post, as well as several podcast series. Topics I've covered include the free energy principle, the history of facial recognition technology, phone hacking in 1980s Sweden, and ethics and hereditary disease.

Wesley Erickson

I have a PhD in physics, with a specialization in stochastic processes, computational physics, and laser-cooled atoms. My research has involved investigating universal aspects of rare but extreme events, with models that can be applied to systems ranging from the motion of cold atoms to optimal animal foraging strategies. I am interested in exploring similar universal behavior in machine learning algorithms, especially to better understand how to detect signatures of "insight" in the learning process.

Yevgeniy Liokumovich

I am a mathematician interested in using methods from geometry and topology to contribute to the AI safety and alignment problem.

Alumni Fellows of 2023

Aysja Johnson

My academic background is in neuro- and cognitive science; now, I'm learning about biology in search of a better understanding of entities which can cause reality to warp to their goals. Things I like to think about: how life manages to robustly hit narrow targets (such as making a human being starting from one cell), what exactly "levels of abstraction" are and how life uses them, and what the dial is that causes "agency" to vary across different systems (e.g., skin cells seem much less "agentic" than immune cells—why?).
Final presentation - Searching For a Science of Abstraction

Brady Pelkey

I am an independent student with a background in math and philosophy. I'm currently exploring ways to formalize embedded agents and goal-directed subsystems. Other topics I like to think about include maps between causal models, and interactive preference construction.

Cecilia Wood

I'm a PhD student in Economics at the London School of Economics. My research focuses on applying techniques from economic theory, especially mechanism design, to AI safety.
Final presentation - Beyond vNM Self-modification and Reflective Stability

Eleni Angelou

Eleni is a PhD student in the philosophy program at the CUNY Graduate Center. She is currently a visiting researcher at the Center for Science, Technology, Medicine, and Society at UC Berkeley. Her research focuses on scientific cognition in both human and artificial agents. Eleni is also interested in questions related to technological progress, innovation, and the metascience of AI Safety.
Final presentation - Overview of Problems in the Study of Language Model Behavior

Erin Cooper

I am a PhD candidate in Philosophy at Stanford. I specialize in Political Philosophy and Ethics and am completing a dissertation on trust in political philosophy. For the fellowship, I will be doing a project summarizing philosophical approaches to distinguishing between manipulation and non-manipulation.

Gabriel Weil

I am an Assistant Professor at Touro University Law Center. Prior to joining the Touro faculty, I was a research manager at the Climate Leadership Council. My primary research focus is climate governance, but I am interested in applying the tools and methods I have developed in that domain to AI safety.
Final presentation - Tort law as a tool for mitigating catastrophic risk from AI

George Deane

I am a philosopher, currently a postdoctoral researcher on artificial consciousness with the Digital Minds project, a collaboration between philosophers and computer scientists (Yoshua Bengio and his group at Mila) based at the University of Montreal and the University of Oxford. I received my PhD from the University of Edinburgh in 2021, on consciousness, the self, and the altered sense of self in the active inference framework. At the moment I am very interested in the possibility of a sense of self and agency emerging in AI systems.

Giles Howdle

My research background is primarily in the philosophy of action. I am particularly interested in the nature and emergence of agency (and normativity) in humans, social entities, and artificial intelligence. I am also working on the relationship between instrumental rationality and the adoption of values and policies, particularly in the context of cognitively, computationally, and/or temporally bounded agents, and I am keen to investigate the AI risk and ethical implications of these issues.
Final presentation - Auto-Intentional Agency and AI Risk

Guillaume Corlouer

My research aims to understand the development of internal representations and capabilities during the training of deep learning models. I am interested in reducing uncertainty about the emergence of deceptive alignment during training and developing mathematically principled techniques to better detect deceptively aligned goals. In this context, I am looking into the relevance of singular learning theory to understanding the training dynamics of deep learning models, and adapting measures of emergence from multivariate information theory to deep neural networks.
Final presentation - The role of model degeneracy in the dynamics of SGD

Jason Hoelscher-Obermaier

I am an ML research engineer with a Ph.D. in experimental quantum physics and a background in philosophy. I am interested in robust evaluations of AI systems and how to use AI to improve rather than damage our collective epistemics and decision-making.
Final presentation - How LLM Evaluations Influence AI Risks

Martín Soto

I am a Mathematical Logic grad student from Barcelona, working towards understanding intelligence in order to reduce future disvalue. I'm working with Vivek Hebbar (Researcher, MIRI) on theoretical threat models and interpretable architectures. While finishing my studies, I'm also exploring different directions in agent foundations with Abram Demski (Researcher, MIRI) and collaborating with the Center on Long-Term Risk on the reduction of suffering risks.
Final presentation - Constructing Logically Updateless Decision Theory

Matthew Lutz

I am a behavioral ecologist and architect with a PhD in Ecology and Evolutionary Biology from Princeton, where I studied self-assembled structures built by army ants from their own bodies. My current work as a postdoc at the University of Roehampton seeks to understand the evolution of building behavior in termites by comparing nest morphologies among related species. At PIBBSS, I will apply insights drawn from mathematical modeling of these complex insect societies to alignment and coordination problems in multi-agent systems, with the aim of avoiding the evolution of novel predatory AI superorganisms.
Final presentation - Detecting emergent capabilities in multi-agent AI Systems

Ninell Oldenburg

I just graduated from a Master's program in IT and Cognition at the University of Copenhagen and have a background in linguistics and computational linguistics. I am broadly interested in cooperation among humans, among computers, and between the two, currently with a focus on social norms.
Final presentation - Learning and Sustaining Social Norms as Normative Equilibria

Nischal Mainali

I am Nischal, a PhD student in theoretical neuroscience at the Hebrew University of Jerusalem, interested in mathematical theories of brain function. I'm curious if and how we can use tools and ideas from neuroscience to understand AI.
Final presentation - A Geometry Viewpoint for Interpretability

Sambita Modak

I have a PhD in Behavioral Ecology from the Indian Institute of Science, Bangalore, and I am currently working as a researcher at the National Centre for Biological Sciences in Bangalore. While my research background is rooted in examining determinants of animal behavior in an evolutionary biology framework, I am deeply motivated by transdisciplinary approaches to research and problem solving. My current interest is to explore how concepts and skills from my doctoral research in animal behavior and evolution can be applied to other cause areas like AI alignment.

Sammy Martin

I'm currently working with CLR on a project that investigates AI misuse scenarios. I'm also involved in running the Modelling Transformative AI Risk (MTAIR) forecasting project and in conducting technical research in cooperative AI (benchmarking cooperative intelligence). I'm most interested in AI strategy and forecasting, with a strong inclination towards incorporating expertise from diverse fields such as politics, international relations, and other disciplines to address AI strategy questions. I'm also keen to explore methods to aggregate knowledge from various sources and to reason better under deep uncertainty.
Final presentation - An overview of AI misuse risks and what to do about them

Tom Ringstrom

I am a computer scientist interested in the foundations of reward-free compositional planning and intrinsic motivation. I develop theory for constructing compositional representations that agents can use to rapidly stitch together plans. My theory allows advanced agents to plan in dynamic hierarchical environments and to evaluate why achieving some state of the world is good or bad, without succumbing to objectives that accumulate "reward signals," as is common in AI.
Final presentation - A Mathematical Model of Deceptive Policy Optimization

Urte Laukaityte

I am a late-stage PhD candidate in the Philosophy Department at UC Berkeley, focusing on cognitive science, biology, and psychiatry. I am interested in exploring issues around building artificial systems in light of recent developments within the life and mind sciences, particularly basal cognition, soft robotics, and the biogenic approach more generally.

Alumni Fellows of 2022

Adam Prada

I am a PhD student at the Yusuf Hamied Department of Chemistry, University of Cambridge, working on quantum chemical dynamics. During my PIBBSS fellowship, I will be working on the problem of agency and hierarchical agents.

Anand Siththaranjan

I'm a PhD student at UC Berkeley advised by Stuart Russell and Claire Tomlin. I'm interested in leveraging ideas from control theory, learning, and economics as a means of creating principled, beneficial intelligent systems.

Anson Ho

I’m a researcher at Epoch, investigating and forecasting the development of advanced AI to help inform AI governance. I’m particularly interested in neural network interpretability, AI forecasting, and theoretical AI alignment research.

Jan Hendrik Kirchner

I am a researcher of minds - artificial and biological - with a background in cognitive science and computational neuroscience. After researching the early development of the brain during my PhD, I am now working towards aligning artificial intelligence with human values at OpenAI. I write the blog “On Brains, Minds, And Their Possible Uses” and care about doing good, better.

Daniel Hermann

I am a PhD candidate in the Department of Logic and Philosophy of Science at the University of California, Irvine. My primary research areas are decision/game theory and formal epistemology, in which I develop models of agents who reason about the ways in which they might be embedded in their world. I have also worked on clarifying the connection between computational learning theory and Occam's razor, on modeling the invention and evolution of conventions and language, and on applying prediction aggregation methods to social epistemology and policymaking.

Holly Elmore

I have a PhD in Evolutionary Biology from Harvard, where I also did EA community organizing. Now I work as a researcher at Rethink Priorities on wild animal welfare and am interested in applying my evolutionary background to other important cause areas.

Lux Miranda

I would describe myself as a social scientist of intelligent agents such as humans and AI. My research draws from complexity science, anthropology, cognitive science, and (inverse) generative computational modeling. At Uppsala, I will study ethics and alignment surrounding human-like identity cues in social robots and other AI. I do my best to be a source of light. Find me at https://luxmiranda.com/

Martin Stoffel

I'm an Evolutionary Geneticist at the University of Edinburgh, trying to work out how genetic variants spread and disappear and contribute to traits and fitness in wild animal populations. With a background in Psychology and Molecular Ecology, I'm curious how ideas connect across disciplines, and what we can learn about AI alignment from biological systems.

Zachary Peck

I am a PhD student in Philosophy of Science at the University of Cincinnati. Within academic philosophy, my research lies at the intersection of cognitive science, artificial intelligence, social and political philosophy, and the life sciences. Generally speaking, my AI-alignment research interests fall into two categories: agency and abstraction. In particular, I'm interested in how the capacity for acting agentially and thinking abstractly emerges in complex systems (both biological and artificial).

Other Fellows

Aanjaneya Kumar
Abra Ganz
Andrea Luppi
Blake Elias
Ivo Andrews
Jeffery Andrade
Josiah Lopez-Wild
Kai Sandbrink
Mel Andrews
Orowa Sikder
Simon McGregor

Organizing Team

Nora Ammann

DIRECTOR & CO-FOUNDER

Nora co-founded PIBBSS in 2022, driven by the question of how a naturalized understanding of intelligent behavior (across systems, scales, and substrates) can be translated into concrete progress towards making AI systems safe and beneficial. Beyond the natural sciences, she draws inspiration from the philosophy and history of science to understand the specific scientific and epistemological challenges of making progress on questions in AI risk, governance, and safety. She is a Research Affiliate with the Alignment of Complex Systems group at Charles University and is pursuing a PhD in Philosophy and AI. Her prior work included complex-systems-inspired research on group decision making and political processes, and she has several years of experience in research organization, including from her time at the Future of Humanity Institute at the University of Oxford.

Dušan D. Nešić

OPERATIONS LEAD

Dušan is a professor of Finance and Economics at Emlyon Business School and a consultant in the fields of Communication, Operations, and Education. He is a systems thinker with a passion for making the world a better place. In the past, he has volunteered with AIESEC, and he is the President of the Rotary Club Belgrade-Dedinje. He founded EA Serbia and SpEAk and coaches EA organizations on how to communicate their knowledge efficiently.

Lucas Teixeira

PROGRAM LEAD

Coming from an interdisciplinary background, Lucas spent time studying Philosophy, Anthropology, and Computer Science before fully devoting themselves to the alignment problem. As the PIBBSS Program Lead, their time is mostly divided between providing technical support for the various research projects at PIBBSS and gleaning insights from the history and philosophy of science to support pluralistic and non-paradigmatic research practices. Prior to joining the PIBBSS team, they worked at Conjecture as an Applied Epistemologist and Research Engineer.

PIBBSS was co-founded in late 2021 by Tushant (TJ) Jha and Nora Ammann.