
Research Exchange Seminar: Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

Posted: 2024-05-23 | Author: 박세영
Registration period: 2024-05-14 10:59 ~ 2024-05-30 14:00
Event date: 2024-05-30
Speaker: Dr. 윤상웅 (Korea Institute for Advanced Study)
1. Date & Time: Thursday, May 30, 2024, 14:00-16:00
2. Venue: Seminar Room, Innovation Center for Industrial Mathematics, Pangyo Techno Valley
   - National Institute for Mathematical Sciences, Room 231, Enterprise Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do
   - Free parking is provided for up to 2 hours.
3. Speaker: Dr. 윤상웅 (Korea Institute for Advanced Study)
4. Topic: Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Just as IRL trains a policy using a reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Generalized Contrastive Divergence (GCD), is a minimax problem that reaches equilibrium when both models converge to the data distribution. Entropy maximization plays a key role in GCD, facilitating exploration by the diffusion model and ensuring convergence of the EBM. We also propose Diffusion by Dynamic Programming (DiDP), a novel RL approach for diffusion models, as a subroutine in GCD. DiDP makes the diffusion-model update in GCD efficient by transforming the original problem into an optimal control formulation in which value functions replace back-propagation in time. Our empirical studies show that diffusion models fine-tuned with GCD and DiDP can generate high-quality samples with fewer steps. Additionally, GCD-DiDP enables the training of an EBM without MCMC, stabilizing EBM training dynamics and enhancing outlier detection performance.

# The seminar is scheduled to be streamed on YouTube.
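
For context ahead of the talk, below is a minimal LaTeX sketch of a maximum-entropy IRL minimax objective of the kind the abstract describes. The notation is our own assumption (\pi_\theta for the diffusion sampler viewed as the policy, E_\phi for the energy-based model, \mathcal{H} for entropy) and is not taken from the speaker's paper.

% Hypothetical sketch: adversarial max-entropy IRL objective consistent with
% the abstract's description of GCD; not the speaker's exact formulation.
% \pi_\theta  : diffusion-model sampler, treated as the policy
% E_\phi      : energy-based model, i.e. reward r_\phi(x) = -E_\phi(x)
% \mathcal{H} : entropy of the sampler's output distribution
\min_{\phi}\,\max_{\theta}\;
    \mathbb{E}_{x \sim \pi_\theta}\!\bigl[-E_\phi(x)\bigr]
    + \mathcal{H}(\pi_\theta)
    - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\bigl[-E_\phi(x)\bigr]
% Inner max: the sampler seeks high reward plus high entropy, i.e. the Gibbs
% distribution \pi_\theta(x) \propto \exp(-E_\phi(x)).
% Outer min: the EBM lowers energy on data and raises it on samples,
% a contrastive-divergence-style update.
% At the saddle point both \pi_\theta and \exp(-E_\phi)/Z match the data
% distribution, which is the equilibrium property stated in the abstract.

The value-function step mentioned in the abstract (DiDP) would then be used to solve the inner maximization without back-propagating through the full sampling chain; that part is omitted from this sketch.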