HAVA: Hybrid Approach to Value Alignment through Reward Weighing for Reinforcement Learning

IRIS Institutional Research Information System - OPENBS Open Archive UniBS


Varys K.; Cerutti F.; Sobey A.; Norman T. J.

2025-01-01

Abstract

Our society is governed by a set of norms which together bring about the values we cherish, such as safety, fairness or trustworthiness. The goal of value alignment is to create agents that not only do their tasks but, through their behaviours, also promote these values. Many of the norms are written as laws or rules (legal/safety norms), but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety/legal norms are often represented explicitly, for example in some logical language, while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value aligned. We carry out two experiments, including a continuous state-space traffic problem, to demonstrate the importance of the written and unwritten norms and show how our method can find value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately.
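The abstract gives no formulas, so the following is only a minimal sketch of the general idea of weighing rewards by a compliance-based reputation. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `ReputationTracker` class, its `decay` and `recovery` parameters, the multiplicative weighting in `weighted_reward`, and the toy `speed_norm` check standing in for an explicit safety/legal norm. It also assumes non-negative task rewards, since scaling a negative reward by a reputation below 1 would soften the penalty rather than strengthen it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReputationTracker:
    """Hypothetical reputation in [0, 1]; NOT the update rule from the paper."""
    reputation: float = 1.0
    decay: float = 0.5     # assumed multiplicative penalty per violation
    recovery: float = 0.1  # assumed fraction of lost reputation regained per compliant step

    def update(self, complied: bool) -> float:
        if complied:
            # Recover a fraction of the remaining distance to full reputation.
            self.reputation += self.recovery * (1.0 - self.reputation)
        else:
            # Shrink reputation after a norm violation.
            self.reputation *= self.decay
        return self.reputation


def speed_norm(speed: float, limit: float = 50.0) -> bool:
    """Toy explicit norm (assumed): the agent must not exceed a speed limit."""
    return speed <= limit


def weighted_reward(task_reward: float, reputation: float) -> float:
    """Scale the task reward by the current reputation (assumed weighting)."""
    return reputation * task_reward


def run_episode(speeds: List[float], task_rewards: List[float]) -> List[float]:
    """Return the reputation-weighted reward for each step of a toy episode."""
    tracker = ReputationTracker()
    weighted = []
    for speed, reward in zip(speeds, task_rewards):
        reputation = tracker.update(speed_norm(speed))
        weighted.append(weighted_reward(reward, reputation))
    return weighted


if __name__ == "__main__":
    # The second step breaks the assumed speed limit, so later rewards are discounted.
    print(run_episode([40.0, 60.0, 45.0], [1.0, 1.0, 1.0]))
```

In this sketch a single violation halves the reward the agent receives, and the reward recovers only gradually while the agent complies again. A learned social-norm model could, under the same assumptions, replace `speed_norm` with any function that returns a compliance signal.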


Year	2025
Volume title	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Main project funding source	Other foreign public institutions
Series	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems
ISI code	WOS:001532048100229
Scopus code	2-s2.0-105009866449
Language(s)	English
Conference title	24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
Conference dates	2025
Conference location	USA
First page	2096
Last page	2104
Number of pages	9
Publisher	International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Keywords	Reinforcement Learning; Reward Shaping; Value Alignment
International co-authors	Yes
Sustainable Development Goals	Not applicable
Fulltext	open
All authors	Varys, K.; Cerutti, F.; Sobey, A.; Norman, T. J.
Faculty site type	273
Type	info:eu-repo/semantics/conferenceObject
Number of authors	4
Type	4 Contribution in conference proceedings (Proceeding)::4.1 Contribution in conference proceedings
Appears in types	4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
paper_aamas_HAVA.pdf (open access; license: PUBLIC - Creative Commons 4.0)	2.98 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/640506

Warning: the data displayed have not been validated by the university.
