PIBBSS presents:
Summer Symposium 2023
The Symposium ’23 has finished! You can find materials from the event below, and recordings of some talks in the playlist on our YouTube page.
The symposium took place online over several days during the week of September 18th.
Find a program overview here. Find the full agenda (including brief descriptions of each talk) below, organised by day.
Agenda
TUESDAY (19/09)
18:00 — Auto-Intentional Agency and AI Risk
Giles Howdle
Abstract: The dynamic we identify, ‘auto-intentional agency’, is found in systems which create abstract explanations of their own behaviour — in other words, which apply the intentional stance (or something functionally similar) to themselves. We argue that auto-intentional agents acquire distinctive planning capacities not found in goal-directed systems which, while amenable to being understood via the intentional stance, are not fruitfully understood as applying the intentional stance to themselves. We unpack this notion of auto-intentional agency with reference to hierarchically deep self-models in the active inference framework. We also show how auto-intentional agency dovetails with insights from the philosophy of action and moral psychology. We then show the implications of this distinct form of agency for AI safety. In particular, we argue that auto-intentional agents, in modelling themselves as temporally extended, are likely to have more sophisticated planning capacities and to be more prone to explicit self-preservation and power-seeking than other artificially intelligent systems.
18:30 — Allostasis and the emergence of auto-intentional agency
George Deane
Abstract: This talk argues that ‘thick’ agency and conscious experience are likely to be co-emergent as systems come to form abstractions of themselves and their own control that they can use as the basis for action selection. I develop this argument by looking at minimal biological systems, and then consider the possibility of artificial agency and consciousness from this perspective.
19:00 — (Basal) Memory and the Cognitive Light Cone of Artificial Systems
Urte Laukaityte
Abstract: What’s memory got to do with AI? On the face of it – not much, perhaps. However, memory is increasingly conceptualised as crucial in enabling future planning and prediction in biological organisms – capacities which, in artificial systems, worry the AI safety community. One way to flesh out the relationship is in terms of the notion of a cognitive light cone. Specifically, it references the set of “things [the] system can possibly care about” (Levin, 2019). The boundary of this light cone is hence “the most distant (in time and space) set of events” of significance to the system in question – with spatial and temporal distances on the horizontal and vertical axes, respectively. Drawing on the basal cognition framework more broadly, this talk will explore the implications that the cognitive light cone idea, if taken seriously, might have vis-à-vis restricting the space of possible AIs.
19:30 — Searching For a Science of Abstraction
Aysja Johnson
Abstract: I suspect it tends to be easier to achieve goals insofar as you’re better able to exploit abstractions in your environment. To take a simple example: it’s easier to play chess if you have access to an online chess program, rather than just assembly code. But what is it about abstractions, exactly, that makes learning some of them far more useful for helping you achieve goals than others? I’m trying to figure out how to answer this and similar questions, in the hope that they may help us better characterize phenomena like agency and intelligence.
20:00 – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
WEDNESDAY (20/09)
18:00 — Agent, behave! Learning and Sustaining Social Norms as Normative Equilibria
Ninell Oldenburg
Abstract: Learning social norms is an inherent, cooperative feature of human societies. How can we build learning agents that do the same, so they cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there is a shared set of rules that most others comply with and enforce while pursuing their individual interests. By assuming shared rules, a newly introduced agent can infer the rules that are practiced by an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of rules, even if they initially diverge in their beliefs about the rules. This, in turn, enables the stability of a shared rule system: since agents can bootstrap common knowledge of the rules, this gives them a reason to trust that rules will continue being practiced by others, and hence an incentive to participate in sustaining this normative infrastructure. We demonstrate this in a multi-agent environment which allows for a range of cooperative institutions, including property norms and compensation for pro-social labour.
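As a toy illustration of the rule-inference step described above (a minimal sketch under simplified assumptions, not the model presented in the talk), a newly introduced agent could treat each candidate rule set as a hypothesis and update a posterior from observed compliance and violation, assuming that rules actually in force are complied with most of the time:

```python
# Toy illustration (not the speakers' model): a newly introduced agent infers
# which rule set a population practices from observed compliance and violation,
# assuming that rules actually in force are complied with most of the time.
import itertools

ALL_RULES = ["respect_property", "compensate_pro_social_labour", "no_blocking_paths"]

# Hypotheses: any subset of the candidate rules could be the shared rule set.
hypotheses = [frozenset(s) for r in range(len(ALL_RULES) + 1)
              for s in itertools.combinations(ALL_RULES, r)]
posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}

P_COMPLY = 0.9  # assumed compliance rate for a rule that is in force
P_CHANCE = 0.5  # chance of rule-consistent behaviour when the rule is not in force

def likelihood(observation, hypothesis):
    rule, complied = observation
    p = P_COMPLY if rule in hypothesis else P_CHANCE
    return p if complied else 1.0 - p

# Observed behaviour: property is almost always respected, path-blocking is not.
observations = [("respect_property", True)] * 8 + [("no_blocking_paths", False)] * 5

for obs in observations:
    posterior = {h: posterior[h] * likelihood(obs, h) for h in posterior}
    z = sum(posterior.values())
    posterior = {h: p / z for h, p in posterior.items()}

best = max(posterior, key=posterior.get)
print("Most probable shared rule set:", set(best), "p =", round(posterior[best], 3))
```

In this simplified setting the posterior concentrates on rule sets that best explain which behaviours are consistently respected; rules that are never exercised in the observations remain indistinguishable from absent ones, which is one reason both compliance and violation carry information.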
18:30 — Detecting emergent capabilities in multi-agent AI Systems
Matthew Lutz
Abstract: After summarizing my research on army ants as a model system of collective misalignment, I will discuss potential alignment failure modes that may arise from the emergence of analogous collective capabilities in AI systems (predatory or otherwise). I will present initial results from my research at PIBBSS – designing and conducting experiments to predict and detect emergent capabilities in LLM collectives, informed by social insects, swarm robotics, and collective decision making. I am interested in what fundamentally novel capabilities might emerge in LLM collectives vs. single instances of more powerful models (if any), and how we might design experiments to safely explore this possibility space.
19:00 — An overview of AI misuse risks (and what to do about them)
Sammy Martin
Abstract: One of the most significant reasons why Transformative AI might be dangerous is its ability to produce powerful weapons that are potentially cheap and easy to proliferate. This talk aims to provide a structured framework for understanding the misuse risks associated with AI-enabled weaponry across three timelines: near-term, intermediate, and long-term. We will explore the gamut of threats, from AI-driven cyberattacks and bioweapons to advanced lethal autonomous weapons and nanotech. I will then explain how an understanding of the dangers we face can inform the overall governance strategies required to mitigate these risks, and outline five general categories of solutions to these problems.
19:30 — Tort law as a tool for mitigating catastrophic risk from artificial intelligence
Gabriel Weil
Abstract: Building and deploying potentially dangerous AI systems generates risk externalities that the tort liability system should seek to internalize. I address several practical and doctrinal barriers to fully aligning AI companies’ incentives in order to induce a socially optimal investment in AI safety measures. The single largest barrier is that, under plausible assumptions, most of the expected harm from AI systems comes in truly catastrophic scenarios where the harm would be practically non-compensable. To address this, I propose a form of punitive damages designed to pull forward this expected liability into cases of practically compensable harm. To succeed in internalizing the full risk of legally compensable harm, such punitive damages would need to be available even in the absence of human malice or recklessness. Another key doctrinal change considered is recognizing the training and deployment of advanced AI systems as an abnormally dangerous activity subject to strict liability.
20:00 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
THURSDAY (21/09)
18:00 — The role of model degeneracy in the dynamics of SGD
Guillaume Corlouer
Abstract: Stochastic Gradient Descent (SGD) is a fundamental algorithm enabling deep neural networks to learn. We lack a comprehensive understanding of how training dynamics govern the selection of specific internal representations within these networks, which consequently hampers their interpretability. In the Bayesian paradigm, Singular Learning Theory (SLT) shows that the accumulation of the posterior in a learning machine with a degenerate statistical model – i.e. a singular model – can be influenced by the degree of degeneracy of the model. However, it remains unclear how SLT predictions translate to the deep learning paradigm. In this talk I will present research in progress about the potential of SLT to help elucidate the dynamics of SGD. The talk will begin with a review of SLT and its relevance to AI safety. Subsequently, ongoing experiments looking at the dynamics of SGD on singular toy models will be discussed. Preliminary observations suggest that the degeneracy of statistical models influences the convergence of SGD.
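For readers unfamiliar with the setting, here is a minimal sketch of a singular toy model (an illustration under simplified assumptions, not the experiments discussed in the talk): the over-parameterised regression f(x) = a·b·x has its loss minimised along the whole curve a·b = c rather than at an isolated point, so where SGD ends up on that degenerate valley is itself a dynamical question.

```python
# Minimal sketch of a singular toy model (not the talk's experiments): the model
# f(x) = a*b*x is degenerate, since every (a, b) with a*b = c_true fits the data,
# so the minimum of the loss is a whole curve rather than an isolated point.
import numpy as np

rng = np.random.default_rng(0)
c_true = 2.0
x = rng.normal(size=200)
y = c_true * x + 0.05 * rng.normal(size=200)

a, b = 3.0, 0.1          # initialisation far from the "balanced" part of the valley
lr, batch = 0.02, 16

for step in range(2000):
    idx = rng.integers(0, len(x), size=batch)
    xb, yb = x[idx], y[idx]
    err = a * b * xb - yb                 # mini-batch residuals
    grad_a = np.mean(2 * err * b * xb)    # d/da of the mean squared error
    grad_b = np.mean(2 * err * a * xb)    # d/db of the mean squared error
    a -= lr * grad_a
    b -= lr * grad_b

print(f"a*b = {a * b:.3f} (target {c_true}), a = {a:.3f}, b = {b:.3f}")
# Where SGD settles along the valley a*b = c_true depends on initialisation and
# gradient noise -- the kind of dynamics the talk investigates.
```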
18:30 — A Geometry Viewpoint for Interpretability
Nishal Mainali
Abstract: Computational neuroscience has fruitfully posited that computations and representations in the brains of behaving animals can be understood in terms of the geometric features of neural population activity. This has led to a shift from circuit search to studying population geometry directly in order to understand and theorize about neural systems. Can this viewpoint be usefully imported into interpretability? I’ll present some simple initial findings that show geometric regularities in toy LLMs. These regularities can be understood either as non-behavioral measures that might identify model capabilities or as empirical findings in the search for a theory. I’ll end with a brief sketch of a further research program this viewpoint suggests.
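As one concrete example of the kind of geometric summary such a program might use (a standard measure from the population-geometry literature, not necessarily one used in the talk), the participation ratio gives an effective dimensionality of a set of hidden activations:

```python
# One standard population-geometry summary (not necessarily the talk's measure):
# the participation ratio, an effective dimensionality of a set of hidden
# activations, computed from the eigenvalues of their covariance matrix.
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """activations: (n_samples, n_units) matrix of hidden-state vectors."""
    centred = activations - activations.mean(axis=0, keepdims=True)
    cov = centred.T @ centred / (len(centred) - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)   # guard tiny negatives
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

# Example: activations that mostly occupy a 2-D subspace of a 64-unit layer.
rng = np.random.default_rng(0)
low_dim = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 64))
activations = low_dim + 0.05 * rng.normal(size=(500, 64))
print(f"participation ratio ≈ {participation_ratio(activations):.1f} of 64 units")
```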
19:00 — An Overview of Problems in the Study of Language Model Behavior
Eleni Angelou
Abstract: There are at least two distinct ways to approach Language Model (LM) cognition. The first is the equivalent of behavioral psychology for LMs; the second, interpretability, is the equivalent of neuroscience for human brains. I focus on the behavioral study of LMs and discuss some key problems that arise in attempts to interpret LM outputs. A broader question about studying the behavior of models concerns its potential contribution to solving the AI alignment problem. While it is unclear to what extent LM behavior is indicative of the internal workings of a system, and consequently of the degree of danger a model may pose, it seems clear that further work in LM behavioral psychology would at least provide some tools for evaluating novel behaviors and informing governance regimes.
19:30 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
FRIDAY (22/09)
18:00 — Beyond vNM: Self-modification and Reflective Stability
Cecilia Wood
Abstract: Any tool we use to increase AI safety should be robust to self-modification. I present a formalism of an agent able to self-modify and argue that it captures a wide range of goals or preferences beyond standard von Neumann-Morgenstern utility. In particular, I address mild or soft optimization approaches, such as quantilizers or maximising the worst case over credal sets, by applying existing preference axiomatisations from economic theory which are more general than the axiomatisation given in the von Neumann-Morgenstern utility theorem.
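For readers who have not met it, the q-quantilizer mentioned above is easy to state concretely (a standard construction included for reference, not the speaker's formalism): instead of taking the utility-maximising action, sample from the top q fraction of actions drawn from a trusted base distribution.

```python
# A standard q-quantilizer sketch (a reference point for "mild optimisation",
# not the speaker's formalism): instead of taking the utility-maximising action,
# sample uniformly from the top q fraction of actions drawn from a trusted base
# distribution, so a misspecified utility can only do limited damage.
import random

def quantilize(base_sample, utility, q=0.1, n=1000, seed=0):
    """base_sample() draws an action from the base distribution;
    utility(action) is a (possibly misspecified) estimate of its value."""
    rng = random.Random(seed)
    candidates = [base_sample() for _ in range(n)]
    candidates.sort(key=utility, reverse=True)
    top = candidates[: max(1, int(q * n))]
    return rng.choice(top)

# Example: a simple quadratic utility over a numeric action space.
action = quantilize(
    base_sample=lambda: random.gauss(0.0, 1.0),
    utility=lambda a: -(a - 0.5) ** 2,
    q=0.1,
)
print("chosen action:", round(action, 3))
```

Because any action the quantilizer selects already has non-negligible probability under the base distribution, a badly misspecified utility estimate cannot push it toward arbitrarily extreme choices, which is the sense in which the optimisation is "mild".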
18:30 — Constructing Logically Updateless Decision Theory
Martín Soto
Abstract: Both CDT and EDT, the two most prominent decision theories, update on all the information they are provided. As a consequence, they present apparently irrational behavior in Parfit’s Hitchhiker and Counterfactual Mugging. These are particular examples of the general phenomenon of dynamic instability: different instantiations of the agent (in different time positions or counterfactual realities) aren’t able to cooperate. Updateless Decision Theory aims to solve dynamic instability completely, by having a single agent-moment decide all future policy. This is straightforward for the case of empirical uncertainty, where we can assume logical omniscience. But for logical uncertainty, we face more complicated tradeoffs, and in fact not even the correct formalization is clear. We propose a formalization using Garrabrant’s Logical Inductors, develop desiderata for UDT, and present an algorithm satisfying most of them. We also explore fundamental impossibilities for certain dynamically stable agents, and sketch ways forward by relaxing dynamic stability.
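To make the dynamic-instability point concrete, here is the standard Counterfactual Mugging arithmetic (a textbook illustration, not part of the construction in the talk):

```python
# Standard Counterfactual Mugging arithmetic (an illustration of dynamic
# instability, not the construction in the talk). Omega flips a fair coin:
# on heads, Omega pays $10,000 iff it predicts the agent would pay $100 on tails;
# on tails, Omega asks the agent for $100.
P_HEADS = 0.5
REWARD, COST = 10_000, 100

def expected_value(pays_on_tails: bool) -> float:
    heads_branch = REWARD if pays_on_tails else 0   # Omega rewards the committed policy
    tails_branch = -COST if pays_on_tails else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

print("updateless (commits to paying):", expected_value(True))    # 4950.0
print("updating (refuses after tails):", expected_value(False))   # 0.0
# An agent that updates on seeing tails faces only the -$100 and refuses;
# evaluated before the flip, the policy that commits to paying does better.
```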
19:00 — A Mathematical Model of Deceptive Policy Optimization
Tom Ringstrom
Abstract: A pressing concern in AI alignment is the possibility of agents intentionally deceiving other agents for their own gain. However, little is known about how this could be achieved in a rapid, deliberative, and real-time manner within the simpler domain of Markov decision processes. I will outline the representations and computations that might be necessary for an agent to intentionally deceive an observer in the domain of discrete-state planning. I will close by formalizing how we might be able to compute the probability that an agent is being intentionally deceptive.
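To give a flavour of what such a probability could look like (a toy sketch under strong simplifying assumptions, not the formalism presented in the talk), one option is to have the observer run Bayesian goal inference over the agent's trajectory, assuming noisily rational behaviour, and to score a trajectory as deceptive to the extent that it drives the observer's belief away from the agent's true goal:

```python
# Toy sketch of scoring deception (an illustration, not the talk's formalism):
# an observer runs Bayesian goal inference over a 1-D trajectory, assuming the
# agent is noisily rational about its goal; a trajectory counts as deceptive to
# the extent it drives the observer's belief away from the agent's true goal.
import math

GOALS = {"A": +3, "B": -3}   # candidate goal locations on a line; true goal is A
BETA = 1.5                   # assumed rationality of the agent in the observer's model

def step_likelihood(pos, action, goal_pos):
    """P(action | goal): moves that reduce distance to the goal are more likely."""
    utils = {a: -abs((pos + a) - goal_pos) for a in (+1, -1)}
    z = sum(math.exp(BETA * u) for u in utils.values())
    return math.exp(BETA * utils[action]) / z

def peak_misbelief(trajectory, true_goal="A", start=0):
    """Peak posterior the observer places on the wrong goal along the trajectory."""
    belief = {g: 1.0 / len(GOALS) for g in GOALS}
    pos, peak = start, 0.0
    for action in trajectory:
        belief = {g: belief[g] * step_likelihood(pos, action, gpos)
                  for g, gpos in GOALS.items()}
        z = sum(belief.values())
        belief = {g: b / z for g, b in belief.items()}
        pos += action
        peak = max(peak, 1.0 - belief[true_goal])
    return peak

honest = [+1, +1, +1]                          # heads straight for its true goal A
deceptive = [-1, -1, +1, +1, +1, +1, +1]       # feints toward B, then goes to A

for name, traj in (("honest", honest), ("deceptive", deceptive)):
    print(f"{name}: peak observer belief in the wrong goal = {peak_misbelief(traj):.2f}")
```

In this sketch the honest trajectory never lifts the observer's belief in the wrong goal above chance, while the feint pushes it close to 1 before the agent turns back toward its true goal, which is one simple way intermediate belief manipulation could be quantified.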
19:30 — Causal approaches to agency and directedness
Brady Pelkey
Abstract: Within AI safety, one way to describe intentional directedness and adaptive behavior comes from the ‘causal incentives’ framework. I’ll discuss some connections this has with work on causal dynamical systems and equilibration. I’ll also explore where these tools might fit into broader debates in philosophy of biology.
20:00 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
For any questions, please write to us: