APSIPA Transactions on Signal and Information Processing > Vol 11 > Issue 1

Bayesian Multi-Temporal-Difference Learning

Jen-Tzung Chien, Institute of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Taiwan, jtchien@nycu.edu.tw; Yi-Chung Chiu, Institute of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Taiwan
 
Suggested Citation
Jen-Tzung Chien and Yi-Chung Chiu (2022), "Bayesian Multi-Temporal-Difference Learning", APSIPA Transactions on Signal and Information Processing: Vol. 11: No. 1, e34. http://dx.doi.org/10.1561/116.00000037

Publication Date: 24 Nov 2022
© 2022 J.-T. Chien and Y.-C. Chiu
 
Keywords
Bayesian learning, variational autoencoder, sequential learning, temporal-difference learning, state machine
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
Bayesian Sequential Learning 
Bayesian Temporal-Difference Learning 
Bayesian Multi-temporal-difference Learning 
Experiments 
Conclusions 
References 

Abstract

This paper presents a new sequential learning method based on a planning strategy in which future samples are predicted by reflecting on past experiences. Such a strategy is appealing for implementing an intelligent machine that foresees multiple time steps ahead instead of predicting step by step. In particular, a flexible sequential learning scheme is developed to predict future states directly, without visiting all intermediate states. A Bayesian approach to a multi-temporal-difference neural network is accordingly proposed to calculate the stochastic belief state of an abstract state machine, so as to capture large-span context and make high-level predictions. Importantly, the sequence data are represented by multiple jumpy states with varying temporal differences. A Bayesian state machine is trained by maximizing the variational lower bound of the log likelihood of the sequence data. A generalized sequence model with a varying number of Markov states is derived, of which the previous temporal-difference variational autoencoder is a simplified realization. The predictive states are learned to roll forward with jumps. Experiments show that this approach can be trained effectively to predict jumpy states in various types of sequence data.

DOI:10.1561/116.00000037