Speaker Series

Speaker Series is an ongoing PIBBSS project featuring researchers from both AI Alignment and adjacent fields studying intelligent behavior in some shape or form. The goal is to create a space where we can explore the connections between the work of these scholars and questions in AI Alignment.

By default, our speaker series consists of virtual talks over Zoom, followed by questions and discussion. Recordings of past talks can be found on our YouTube page or in the yearly playlists (2022) (2023) (2024).

Going forward, our speaker series will be grouped into two-week sprints, happening several times per year. The next sprint will run in February 2024. You can find the speakers and registration links for each talk below, and you can add our Google Calendar to always know when the next events will happen.

Upcoming Talks:

Micah Carroll

July 10th, 16:00 UTC, 9:00 PDT, noon ET, 18:00 CET [Zoom Link]
Micah Carroll is an AI PhD student at UC Berkeley advised by Anca Dragan and Stuart Russell. His research investigates what AI Alignment should entail with changing and influenceable humans. In particular, he has worked on user influence brought about by algorithmic choices in recommender systems, and more broadly on characterizing AI manipulation.
 
 
Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. I’ll argue that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. I’ll formalize different notions of AI alignment that account for preference change from the outset, using the language of Dynamic Reward MDPs, which we introduce for analyzing this setting. Comparing the strengths and limitations of 8 such notions of alignment, I’ll argue that they all either err towards causing undesirable AI influence or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. Since grappling with changing preferences cannot be avoided in real-world settings, it is all the more important to handle these issues with care, balancing risks and capabilities.
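As a rough illustration of the kind of formalism involved (the notation below is ours and may differ from the definitions introduced in the talk), the key contrast can be sketched as follows: the reward is judged against a preference state that the interaction itself can influence, and different notions of alignment differ in which preference state the return is evaluated under.

\[
\theta_{t+1} \sim D(\theta_t, s_t, a_t), \qquad r_t = r(s_t, a_t; \theta_t),
\]
\[
J_{\text{init}}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_t \gamma^t\, r(s_t, a_t; \theta_0)\right]
\quad \text{vs.} \quad
J_{\text{dyn}}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_t \gamma^t\, r(s_t, a_t; \theta_t)\right].
\]

Here \(s_t\) and \(a_t\) are the usual MDP state and action, \(\theta_t\) is the user’s (influenceable) preference state with illustrative dynamics \(D\), and \(\gamma\) is a discount factor. Optimizing \(J_{\text{dyn}}\) can implicitly reward the system for steering \(\theta_t\) toward easily satisfied preferences, which is one way the problem described above can arise; evaluating under \(\theta_0\) instead has its own drawbacks when preference change is legitimate.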

Randall D. Beer

July 15th, 16:00 UTC, 9:00 PDT, noon ET, 18:00 CET [Zoom Link]

Randall is a Provost Professor in the Cognitive Science Program, the Neuroscience Program, and the Dept. of Informatics at Indiana University. Broadly speaking, his research concerns how organisms operate as integrated wholes, with a particular focus on how behavior arises from the interaction between brains, bodies and environments. Toward this end, he works on the evolution and analysis of dynamical “nervous systems” for model agents, neuromechanical modeling of animals, biologically-inspired robotics, and dynamical systems and information theoretic approaches to behavior and cognition. He is also interested in computational and theoretical biology, including models of metabolism, gene regulation and development.

Autopoiesis and Enaction in the Game of Life

Enaction plays a central role in the broader fabric of so-called 4E (embodied, embedded, extended, enactive) cognition. Although the origin of the enactive approach is widely dated to the 1991 publication of the book “The Embodied Mind” by Varela, Thompson and Rosch, many of the central ideas trace to much earlier work. Over 40 years ago, the Chilean biologists Humberto Maturana and Francisco Varela put forward the notion of autopoiesis as a way to understand living systems and the phenomena that they generate, including cognition. Varela and others subsequently extended this framework to an enactive approach that places biological autonomy at the foundation of situated and embodied behavior and cognition. Unfortunately, these ideas have mostly been expressed purely verbally, making them difficult to rigorously evaluate and debate. I will describe a research program aimed at placing these ideas on a firmer theoretical foundation by studying them within the context of a toy model universe, the Game of Life (GoL) cellular automaton. This work has both pedagogical and theoretical goals. Simple concrete models provide an excellent vehicle for introducing some of the core concepts of autopoiesis and enaction and explaining how these concepts fit together into a broader whole. In addition, a careful analysis of such toy models can hone our intuitions about these concepts, probe their strengths and weaknesses, and move the entire enterprise in the direction of a more mathematically rigorous theory.
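For readers who want a concrete handle on the toy universe referred to above, here is a minimal Python sketch of the standard Game of Life update rule (the particular patterns and analyses discussed in the talk are not reproduced here):

import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One synchronous update of Conway's Game of Life on a toroidal grid."""
    # Count each cell's eight neighbours by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth: a dead cell with exactly 3 live neighbours becomes live.
    # Survival: a live cell with 2 or 3 live neighbours stays live.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# Example: a glider on a 10x10 toroidal grid; after 4 steps it has
# translated one cell diagonally with its shape intact.
grid = np.zeros((10, 10), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1
for _ in range(4):
    grid = life_step(grid)

Self-maintaining, moving patterns like the glider are among the simplest GoL analogues of the individuated, self-producing entities that autopoiesis and enaction are concerned with.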

[Postponed] Seth Lazar

August 14th, 09:00 UTC, 02:00 PDT, 05:00 EDT, 11:00 CET [Postponed to winter 2024]
Seth Lazar is Professor of Philosophy at the Australian National University, a Distinguished Research Fellow of the University of Oxford Institute for Ethics in AI, a fellow of the Carnegie Endowment for International Peace, and a member of the Executive Committee of the ACM Fairness, Accountability and Transparency Conference. He founded the Machine Intelligence and Normative Theory (MINT) Lab, where he leads research projects in normative philosophy of computing and sociotechnical AI safety. His book Connected by Code: How AI Structures, and Governs, the Ways We Relate, based on his 2023 Tanner Lecture on AI and Human Values, is forthcoming with Oxford University Press. His recent work can be found at linktr.ee/sethlazar.
 
LM Agents: Prospects and Impacts
 
Large Language Models’ growing capabilities in tool-use, reasoning and planning have enabled some to propose their deployment as the executive centre of complex agentic systems, able to conduct (relatively) long sequences of (relatively) consequential unsupervised actions in dynamic settings. While the ‘embers of autoregression’ remain a barrier to robust LLM agents being developed at present, the pace of scientific progress and the scale of investment suggest these limitations may soon be overcome. But the massive investment in expanding LLM agents’ capabilities is not being matched by comparable investment to ensure that they are normatively defensible. And there are significant gaps in our knowledge: not only don’t we know how to align LLM agents to defensible norms, we don’t even know what those norms should be. This talk explores the limitations of current technical alignment paradigms as applied to agents, and outlines the kinds of normative questions that must be answered first, for any approach to alignment to succeed.

Ekdeep Singh Lubana

August 7th, 16:00 UTC, 9:00 PDT, noon ET, 18:00 CET [Zoom Link]

Ekdeep Singh Lubana is a postdoc at the Center for Brain Science, Harvard University. Broadly, his research is focused on model systems for identifying novel challenges and better understanding existing challenges in the alignment of AI systems. His recent work has revolved around developing mechanistic explanations for emergent capabilities in neural networks and demonstrating the brittleness of fine-tuning-based approaches (e.g., RLHF) for alignment.

Explaining emergence in NNs with model systems analysis

A fascinating phenomenon often seen in modern neural networks’ training is the sudden emergence of certain capabilities with scale. Specifically, such capabilities seem to be absent from the model until a critical amount of compute, data, or model size is reached, after which they appear consistently and controllably. Since most policy frameworks for AI regulation are grounded in risk regulation, emergent capabilities are a big hurdle for such frameworks: regulating models for capabilities that are not yet present seems likely to be challenging (if not impossible). In this talk, we borrow the approach of model systems analysis from the natural sciences to develop mechanistic hypotheses for what leads to the sudden emergence of capabilities in neural networks, identifying several unrelated mechanisms for this effect. These mechanisms have characteristic signatures which suggest that preemptively estimating the scale at which such capabilities will be learned may in fact be feasible.

Rio Popper

August 12th, 16:00 UTC, 9:00 PDT, noon ET, 18:00 CET [Zoom Link]

Rio Popper is a Research Fellow at the Global Priorities Institute, University of Oxford. She is also a Ph.D. candidate in economics at Stanford and a J.D. candidate at Yale Law School.

Popper on Popper

This talk addresses Karl Popper’s epistemology and its implications for AGI and ML. It first introduces Popper’s epistemology in its historical context and spells out the influence the theory has had since—both in science and in philosophy—and the implications it has for AI. However, while Popperian epistemology does have important implications for AI, some later philosophers have misapplied the theory. This talk also criticizes those misapplications.

Past Talks:

Abram Demski

Recording can be found here
Abram Demski works on Agent Foundations at MIRI. He is known for his numerous LessWrong posts.
 
Meaning & Agency
 
According to teleosemantics, the meaning of a communication-act is grounded in goals: what it is optimized to mean. This account therefore requires a notion of optimization/agency to flesh it out. I propose to analyze agency in terms of “endorsement” — a series of generalizations of the “reflection principle” of rational cognition.

Fernando Ernesto Rosas De Andraca

Recording can be found here

Fernando is a lecturer at the University of Sussex, and an honorary research fellow at Imperial College London and the University of Oxford. His research aims to develop a fundamental understanding of the scope of interdependencies that can take place in systems involving many interacting parts, to build practical algorithms to measure these in data, and to apply these algorithms to fostering different forms of human well-being.

Towards a computational operationalisation of emergent phenomena

Emergence is one of the most fascinating and challenging aspects of complex systems in general and neural systems in particular, allowing them to exhibit distinctive properties at different spatio-temporal scales. While previous work has successfully developed tools to identify when emergent phenomena take place, these tools are limited in the degree to which they can specify how this happens. This talk introduces various formalisations of emergence, and puts forward a theory that explains emergent phenomena in terms of their computational capabilities.

Daniel Polani

Recording can be found here

Daniel is a Professor of Artificial Intelligence at the University of Hertfordshire, UK. His interest is the modeling of cognition through the lens of information theory. His goal is to use information theory to identify a route toward understanding the principles behind the emergence of intelligent cognition throughout evolution, without the need for an external guide.

Information and its Flow: a Pathway towards Intelligence

In recent years, various forms of information flow have been found to be useful quantities for the characterization of the decision-making of agents, whether natural or artificial. We discuss the consequences of the constraints on cognition imposed by information flow for the emergence of intelligent information processing, but also for the emergence of intrinsic incentives for behaviour, all expressed in informational language. It turns out that the informational perspective can yield surprising insights into how intelligent cognition might have been acquired throughout the course of evolution and, vice versa, how plausible intelligent abilities might be achievable with limited effort in artificial systems.

Tsvi Benson-Tilsen

Recording can be found here

Tsvi is employed by the Machine Intelligence Research Institute to figure out how to determine the underlying intentions of a hypothetical future AGI.

Creating the contexts needed to produce the concepts needed to understand minds

We are fundamentally confused about minds, and about what about a mind determines what about the world. Our concepts don’t automatically support the inferences and design choices we would like to make using those concepts, and there are strong forces that will break weak supports. Drive-by attempts to rework one or a few concepts in isolation don’t work. Minds are too big and structurally entangled within themselves to centrally unravel with a reductionist piecemeal method. The relevance of the most relevant mental elements is essentially provisional and requires the full context of a mind to be understood. The only source of usable data about minds and their intentions is our own minds. Within the context of our own minds and our familiarity with our own minds, we can maybe ask a network of questions that would induce ourselves to understand ourselves better enough – better enough that we could then understand how we could understand more alien minds well enough to design them to have agreeable intentions. We can’t survive while being as horrified as we are to try to understand how we work.

Nathaniel Virgo

Recording can be found here

Nathaniel Virgo is an interdisciplinary scientist with a background in mathematics, computer science, ecology and, more recently, applied category theory. He has a long-standing interest in the origin of life, and specifically the question of how something as complex and purposeful as life could emerge from a world in which it was initially absent.

Extending the classic “good regulator theorem” from control theory

The physical world appears, at first glance, to have things in it that are agents, in the relatively weak sense that they seem to have goals that they try to pursue, along with beliefs about the world that they update in the face of new information. Other things seem not to have these features. But where do beliefs and goals live in relation to the physical world, and why do some systems seem to have them while others don’t? I take the perspective that the difference between agents and non-agents is one of interpretation – goals are something that is attributed to a system by an observer, although some systems are more amenable to having goals attributed to them than others.

Along with my collaborators, I aim to make this idea mathematical, so that the process of attributing goals to a system can be made into a formal one, and the relationships between concepts like goals and beliefs can be fully understood. One ultimate goal is to understand why agent-like systems exist in the physical world at all. In particular, I will talk about extensions of the “good regulator theorem”, a classic result from early control theory that has been stated (somewhat inaccurately) as “every good regulator of a system must be a model of that system”. The original result concerned only fully observable systems, but using ideas from modern mathematics we extend it to a much broader class of systems that interact with their environments in much richer ways. The notion of ‘model’ also becomes richer, resembling Bayesian updating. One extension of the good regulator theorem could be stated as “every system that is a good regulator of itself must have a model of its environment”. The framework builds on recent ideas from categorical systems theory and is compositional in nature, allowing us to talk about multiple interacting systems. This leads to some interesting insights about the relationship between agents and their environments. We can conclude in particular that there is no unique place where the boundary between an agent and its environment should be drawn, although the interpretation in terms of beliefs might look quite different depending on the choice of boundary.
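For orientation, the classical result being extended is often summarised as follows (a rough paraphrase of Conant and Ashby’s 1970 theorem, not the richer categorical statement discussed in the talk): if a regulator \(R\) acts on the basis of the state of a fully observed system \(S\) and achieves optimal regulation of an outcome variable \(Z\) (for instance, minimal entropy \(H(Z)\)), then the simplest such regulator satisfies

\[
R = h(S) \quad \text{for some mapping } h : S \to R,
\]

that is, the regulator’s state is a deterministic function of the system’s state, which is the sense in which it must “model” the system. The extensions described in the talk relax the full-observability assumption and enrich the notion of model toward something resembling Bayesian updating.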

Konrad Paul Körding

Recording can be found here

Konrad is a German neuroscience professor at the University of Pennsylvania and co-founder of Neuromatch. He is known for his contributions to the fields of motor control, neural data methods, and computational neuroscience, as well as his advocacy for, and contributions to, open science and scientific rigor.

On the interpretability of brains and neural networks

In my talk I will distinguish between approaches to infer or observe causality in nervous systems and approaches to understand these systems, that is, approaches to make machine learning understandable to human scientists. I will argue that progress requires a fundamental re-thinking of the goals of systems neuroscience.