PIBBSS presents:
Summer Symposium 2024
The symposium is an opportunity to learn about the work PIBBSS fellows conducted during the summer program. The program has concluded, and recordings are available on our YouTube page.
The symposium took place in person at the LISA offices in London and online, over several days during the week of September 9th.
Find a program overview here.
Find the full agenda (including brief descriptions of each talk) below, organized by day.
Register at this link. We recommend joining through the Zoom application and making sure you have the latest version of Zoom, so that you can join the breakout rooms at the end and chat with the speakers more privately.
Click here to add the schedule to your Google calendar.
Agenda
TUESDAY (10/09)
17:00 GMT [10:00 San Francisco, 18:00 London] – Solvable models of in-context learning
Nischal Mainali
Recent work has shown that linear transformers can learn an in-context regression algorithm at the global minimum of SGD dynamics. We extend this by analytically solving the nonlinear training dynamics of these models. Despite the model’s linearity, the learning dynamics exhibit highly nonlinear training phenomena similar to those observed in practice. Properties of these dynamics can then be used to identify feature learning in real-world models and to compare them. Finally, we also explore conservation laws in the training dynamics and how these, combined with initialization, influence learning capabilities and dynamics.
18:00 GMT [11:00 San Francisco, 19:00 London] – Factored Space Models: Causality Between Levels of Abstraction
Magdalena Wache
Causality plays an important role in understanding intelligent behavior, and there is a wealth of literature on models for causality, most of which is focused on causal graphs. However, causal graphs are limited when it comes to modeling variables that are deterministically related. In the presence of deterministic relationships there is generally no causal graph that satisfies both the Markov condition and the faithfulness condition. This is an important limitation, since deterministic relationships appear in many applications in mathematics, physics and computer science, and when modeling systems at different levels of abstraction. We introduce Factored Space Models as an alternative to causal graphs which naturally represent both probabilistic and deterministic relationships at all levels of abstraction. Moreover, we introduce structural independence and establish that it is equivalent to statistical independence in every distribution that factorizes over the factored space. This theorem generalizes the classical soundness and completeness theorem for d-separation.
19:00 GMT [12:00 San Francisco, 20:00 London] – Fixing our concepts to understand minds and agency: preliminary results
Mateusz Bagiński
The hermeneutic net is a provisional method for large-scale analysis of concepts and cognitive frameworks relevant to a particular domain of investigation. It aims to examine the roles that various elements play in our thinking, their relationships, and their inadequacies, in order to put us in a better position to comprehensively revise our thinking about the domain. Applying this method to the domain of minds and agency was my main focus during the PIBBSS Fellowship. In the presentation I will cover the methodology and trajectory of my project, the results it has produced so far, and how I see its potential contributions to agent foundations and adjacent areas of research.
20:00 GMT – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
WEDNESDAY (11/09)
17:00 GMT [10:00 San Francisco, 18:00 London] – Features that Fire Together Wire Together: Examining Co-occurrence of SAE Features
Matthew A. Clarke
18:00 GMT [11:00 San Francisco, 19:00 London] – Minimum Description Length for singular models
Yevgeny Liokumovich
Understanding the generalization properties of neural networks is crucial for AI safety and alignment. One approach to explaining modern models’ remarkable generalization abilities is via the Minimum Description Length (MDL) principle, a mathematical realization of Occam’s razor. For regular models the MDL principle leads to the Bayesian Information Criterion, but neural networks are generally singular. In this talk I will describe a new formula for the MDL for a class of singular models and discuss its implications for the choice of prior and the geometry of the parameter space.
19:00 GMT [12:00 San Francisco, 20:00 London] – Are Neuro-Symbolic Approaches the Path to Safe LLM-Based Agents?
Agustín Martinez-Suñé
Large Language Model (LLM)-based agents are increasingly used to operate autonomously in both digital and physical environments, given natural language instructions. However, ensuring these agents act safely and avoid unintended harmful consequences remains a significant challenge. This project explores how neuro-symbolic approaches might address this issue. Specifically, we will discuss how integrating automated planning techniques with LLMs’ ability to formalize natural language can not only improve these agents’ safety but also help provide measurable safety guarantees.
20:00 GMT [13:00 San Francisco, 21:00 London] – Heavy-tailed Noise & Stochastic Gradient Descent
Wesley Erickson
Heavy-tailed distributions describe systems where extreme events dominate the dynamics. These distributions are prominent in the gradient noise of Stochastic Gradient Descent (SGD) and are strongly connected to SGD’s ability to generalize. Despite this, it is common in machine learning to assume Normal/Gaussian dynamics, which may impact safety by underestimating the possibility of rare events or sudden changes in behavior or ability. In this talk I explore simulations of ML systems exhibiting heavy-tailed noise, discuss how this noise arises, and consider how its characteristics may act as a useful statistical signature for monitoring and interpreting ML systems.
21:00 GMT – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
THURSDAY (12/09)
17:00 GMT [10:00 San Francisco, 18:00 London] – Exploring the potential of formal approaches to emergence for AI safety
Nadine Spychala
Multi-scale relationships in emergent phenomena and complex systems are studied across various disciplines. These studies explore the unresolved question of how macro- and micro-scales relate to each other: are they independent, reducible, or is their relationship more complex? Historically, the lack of formalism hindered empirical investigation, but recent research has introduced quantitative, information-theoretic measures to quantify emergence in systems whose components evolve over time. In this talk, I elaborate on how such multi-scale measures can be leveraged to characterize and understand evolving macro-scales in AI. This is directly relevant to AI safety, as those macro-scales potentially relate to an AI’s behaviour and capabilities; measures of emergence may thus help detect, predict and perhaps even control them.
Measuring emergence is new terrain; hence the applicability of such measures to empirical data, as well as their validity and informativeness, are active research areas in themselves. During my fellowship, I explored a) the feasibility of applying those measures to neural networks, and b) the value of such an endeavour. I identify important sub-questions and bottlenecks that refine the research agenda, and outline what further progress on quantifying emergence in AI (with decision-making relevance with respect to AI systems’ capabilities) would entail.
18:00 GMT [11:00 San Francisco, 19:00 London] – What I’ve learned as a PIBBSS fellow, and what I plan to do with it
Shaun Raviv
Starting from scratch on AI safety as a journalist with a non-technical background.
19:00 GMT [12:00 San Francisco, 20:00 London] – Searching for indicators of phenomenal consciousness in LLMs: Metacognition & higher-order theory
Euan McLean
It’s time to start the hard grind of testing LMs for indicators of phenomenal consciousness under all the different popular (computational functionalist) theories of mind. We might as well start with higher-order theory (HOT). Some versions of HOT imply that to have phenomenal consciousness, a thing must be able to distinguish reliable internal/mental states from noise. Can language models do this?
20:00 GMT – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
FRIDAY (13/09)
17:00 GMT [10:00 San Francisco, 18:00 London] – Dynamics of LLM beliefs during chain-of-thought reasoning
Baram Sosis
Chain-of-thought and other forms of scaffolding can significantly enhance the capabilities of language models, making a better understanding of how they work an important goal for safety research. Benchmark accuracy can measure how much scaffolding improves performance, but this does not give us insight into how the LLMs arrive at their answers. I will present the results of my work studying how the internal beliefs of LLMs of different sizes evolve over the course of chain-of-thought reasoning as measured by linear probes.
18:00 GMT [11:00 San Francisco, 19:00 London] – Cultural Evolution of Cooperation in LLMs
Aron Vallinder
Cultural evolution has been a critical driver of the scope and sophistication of cooperation in humans, and may plausibly play a similar role in interactions among advanced AI systems. In this talk, I present results from a simulation study in which LLM agents are assigned Big-5 personalities, generate a strategy, and then play an indirect reciprocity game with cultural transmission of strategy. These results indicate that cultural transmission can indeed drive increases in cooperative behavior, although there is large variation across models. Moreover, Big-5 personality plays a substantial role in determining the level of cooperation.
19:00 GMT [12:00 San Francisco, 20:00 London] – The geometry of in-context learning
Jan Bauer
Reasoning is a deductive process: a conclusion is reached through a series of intermediate states. Recent work has shown that the in-context learning capabilities of sequence models can accommodate such a stateful process, termed Chain-of-Thought (CoT). While CoT empirically often leads to superior capabilities, it is unclear what facilitates it at the neural level. For settings like this, where neural activations depend on each other, the study of the biological brain has developed normative analyses that explain global structure through geometric manifolds. Here we adopt this framework in a simple example, which suggests that CoT-like sequence completion is implemented as extrapolation on a manifold whose geometry has been shaped during in-weight learning. Overall, we argue that the global structure of features can be normatively interpreted by studying neural geometry, thereby complementing descriptive analyses that focus on local features in isolation.
20:00 GMT – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
For any questions, please write to us: