Emergent Linear Representations in World Models of Self-Supervised Sequence Models

Akisamb@programming.dev to Machine Learning@programming.dev • 6 months ago


Abstract:

How do sequence models represent their decision-making process? Prior work suggests that an Othello-playing neural network learned nonlinear models of the board state (Li et al., 2023). In this work, we provide evidence of a closely related linear representation of the board. In particular, we show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state. This precise understanding of the internal representations allows us to control the model's behaviour with simple vector arithmetic. Linear representations enable significant interpretability progress, which we demonstrate with further exploration of how the world model is computed.
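The abstract makes two technical claims: a linear probe can read each square as "mine", "theirs" or empty, and the probe's directions can be used to edit the representation by vector addition. Below is a toy sketch of both ideas, not the authors' code. Everything in it is an assumption for illustration. The activations are synthetic Gaussian clusters, not outputs of a real Othello model. The probe is plain multinomial logistic regression trained with NumPy gradient descent. The 'B'/'W'/'.' board encoding and the "relative to the player to move" convention are my own choices. The helpers `relative_labels`, `train_linear_probe` and `steer` are hypothetical names.

```python
import numpy as np

# Labels for one board square relative to the player to move (our convention, an assumption).
EMPTY, MINE, THEIRS = 0, 1, 2
N_CLASSES = 3


def relative_labels(board, to_move):
    """Map absolute colours ('B', 'W', '.') to EMPTY/MINE/THEIRS for the player to move."""
    return [EMPTY if c == "." else MINE if c == to_move else THEIRS for c in board]


def sample_activations(directions, n, rng, noise=0.5):
    """Toy stand-in for residual-stream activations: class direction plus Gaussian noise."""
    labels = rng.integers(0, N_CLASSES, size=n)
    acts = directions[labels] + noise * rng.normal(size=(n, directions.shape[1]))
    return acts, labels


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_linear_probe(acts, labels, lr=0.1, steps=300):
    """Multinomial logistic regression (logits = acts @ W + b) fit by full-batch gradient descent."""
    n, d = acts.shape
    W = np.zeros((d, N_CLASSES))
    b = np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[labels]
    for _ in range(steps):
        grad_logits = (softmax(acts @ W + b) - onehot) / n  # gradient of mean cross-entropy
        W -= lr * acts.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b


def probe_logits(acts, W, b):
    return acts @ W + b


def steer(act, W, b, src, dst, margin=1.0):
    """Add a multiple of (w_dst - w_src) so that logit[dst] - logit[src] equals `margin`.

    Adding alpha * v shifts that gap by exactly alpha * ||v||^2, so alpha has a closed form.
    Other logits (e.g. EMPTY) also move; this only controls the dst-vs-src gap.
    """
    v = W[:, dst] - W[:, src]
    logits = probe_logits(act[None, :], W, b)[0]
    alpha = (margin - (logits[dst] - logits[src])) / (v @ v)
    return act + alpha * v


rng = np.random.default_rng(0)
directions = rng.normal(size=(N_CLASSES, 32))  # hypothetical per-class directions
train_x, train_y = sample_activations(directions, 2000, rng)
test_x, test_y = sample_activations(directions, 500, rng)
W, b = train_linear_probe(train_x, train_y)
accuracy = (probe_logits(test_x, W, b).argmax(axis=1) == test_y).mean()
print(f"held-out probe accuracy: {accuracy:.3f}")

# Flip one "mine" square to "theirs", as far as the probe's pairwise readout is concerned.
x = test_x[test_y == MINE][0]
steered = steer(x, W, b, MINE, THEIRS, margin=2.0)
logits = probe_logits(steered[None, :], W, b)[0]
print(f"theirs - mine logit gap after steering: {logits[THEIRS] - logits[MINE]:.3f}")
```

This toy only shows that the probe's readout changes. The abstract's claim is about controlling the model's behaviour, which needs the edited activation to be fed back through the real network, and that cannot be shown here.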
