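The lecture will be held in person only.
1. Date and time: Friday, February 3, 2023, 15:00-17:00
2. Venue: Seminar Room, Innovation Center for Industrial Mathematics
National Institute for Mathematical Sciences, Room 231, Enterprise Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do
Free parking can be registered for up to 2 hours.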
3. Speaker: Prof. Yeoneung Kim (김연응), Gachon University
4. Topic: Discrete optimal control - learning LQR
The Linear Quadratic Regulator (LQR) is one of the most popular structures in optimal control. This talk introduces some basic properties of LQR problems and recent progress in learning LQR. There are two approaches to measuring the performance of a learning algorithm: Bayesian regret and frequentist regret. Their advantages and disadvantages will be discussed, with a particular focus on Thompson sampling. Thompson sampling (TS) is known to address the exploration-exploitation trade-off effectively in online learning problems, including reinforcement learning for LQR. However, the theoretical analysis of TS for learning LQR is often limited to the case of Gaussian noise. Sampling can be performed directly under the further assumption that the unknown system parameters lie in a prespecified compact set, which is seemingly restrictive. We propose a new TS algorithm for LQR that exploits Langevin dynamics to handle a larger class of problems, including those with non-Gaussian noise. The notion of a preconditioner is introduced to generate samples from non-conjugate posterior distributions.
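For readers new to the topic, the known-parameter LQR problem that the learning setting builds on can be sketched in a few lines. The following is a minimal illustration, not the speaker's algorithm: it assumes a discrete-time system x_{t+1} = A x_t + B u_t with stage cost x'Qx + u'Ru, and the function name `lqr_gain` and the double-integrator matrices are hypothetical choices made for this example. It computes the optimal feedback gain K (with u_t = -K x_t) by iterating the Riccati recursion to a fixed point.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500, tol=1e-10):
    """Return (K, P) for the infinite-horizon discrete-time LQR problem.

    Iterates P <- Q + A'P(A - BK) with K = (R + B'PB)^{-1} B'PA,
    which is the discrete-time Riccati recursion, until P stops changing.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the gain from the final P so that K and P are consistent.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


# Hypothetical example system: a double integrator (position, velocity).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)
# The closed loop A - BK should be stable (spectral radius below 1).
rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("gain K:", K)
print("closed-loop spectral radius:", rho)
```

In the learning setting described above, A and B are unknown; a Thompson sampling scheme would draw a plausible (A, B) from the posterior and then compute a gain for that sample in a way similar to this sketch.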