
Academic Events

Seminars

ICIM Research Exchange Seminar (Thu, May 30)

Date posted: 2024-05-14

https://www.nims.re.kr/icim/post/event/1074

  • Speaker: Dr. Sangwoong Yoon (Korea Institute for Advanced Study)
  • Date and time: 2024-05-30, 14:00-16:00

1. Venue: Seminar Room, Innovation Center for Industrial Mathematics, Pangyo Techno Valley

2. Address: National Institute for Mathematical Sciences, Room 231, Business Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do

3. Free parking is available for up to two hours.

4. Topic: Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models


We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Just as IRL trains a policy using a reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Generalized Contrastive Divergence (GCD), is a minimax problem that reaches equilibrium when both models converge to the data distribution. Entropy maximization plays a key role in GCD, facilitating exploration by the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by Dynamic Programming (DiDP), a novel RL approach for diffusion models, as a subroutine in GCD. DiDP makes the diffusion-model update in GCD efficient by transforming the original problem into an optimal control formulation in which value functions replace back-propagation through time. Our empirical studies show that diffusion models fine-tuned with GCD and DiDP can generate high-quality samples in fewer steps. Additionally, GCD-DiDP enables the training of an EBM without MCMC, stabilizing EBM training dynamics and improving outlier detection performance.
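
To make the minimax structure above concrete, here is a minimal, self-contained PyTorch sketch of a GCD-style alternating update. It is written under stated assumptions rather than taken from the paper: the energy network, the short Gaussian sampler standing in for a few-step diffusion model, and all dimensions, signs, and weights are illustrative. For simplicity it backpropagates through the short sampling chain directly instead of using DiDP's value-function updates.

    # Hypothetical GCD-style alternating update (illustrative only, not the paper's code).
    # EBM step: lower energy on data, raise it on samples (contrastive).
    # Sampler step: minimize energy while maximizing entropy (max-ent IRL).
    import torch
    import torch.nn as nn

    DIM, STEPS, BATCH = 2, 4, 256

    ebm = nn.Sequential(nn.Linear(DIM, 64), nn.SiLU(), nn.Linear(64, 1))  # energy f(x)
    # One small network per generation step (the mean of a Gaussian transition),
    # standing in for a few-step diffusion sampler.
    policy = nn.ModuleList(
        [nn.Sequential(nn.Linear(DIM, 64), nn.SiLU(), nn.Linear(64, DIM)) for _ in range(STEPS)]
    )
    log_std = nn.Parameter(torch.zeros(STEPS, DIM))  # learnable per-step noise scale

    opt_ebm = torch.optim.Adam(ebm.parameters(), lr=1e-3)
    opt_pi = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=1e-3)

    def sample_with_entropy(batch):
        """Roll out the Gaussian chain; accumulate per-step conditional entropy
        as a simple stand-in for the path entropy term."""
        x = torch.randn(batch, DIM)
        entropy = torch.zeros(batch)
        for t in range(STEPS):
            std = log_std[t].exp()
            x = policy[t](x) + std * torch.randn_like(x)  # reparameterized sample
            # Differential entropy of a diagonal Gaussian: 0.5 * sum log(2*pi*e*std^2)
            entropy = entropy + 0.5 * (2 * torch.pi * torch.e * std**2).log().sum()
        return x, entropy

    for it in range(1000):
        data = 0.5 * torch.randn(BATCH, DIM) + 2.0  # toy stand-in for training data

        # EBM update: contrast data against (detached) generated samples.
        x_gen, _ = sample_with_entropy(BATCH)
        loss_ebm = ebm(data).mean() - ebm(x_gen.detach()).mean()
        opt_ebm.zero_grad(); loss_ebm.backward(); opt_ebm.step()

        # Sampler update: energy minus entropy is minimized exactly when the
        # sampler matches the EBM's Gibbs distribution exp(-f)/Z.
        x_gen, entropy = sample_with_entropy(BATCH)
        loss_pi = ebm(x_gen).mean() - entropy.mean()
        opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

The sampler's objective (energy minus entropy) reaches its minimum when the sampler reproduces the EBM's Gibbs distribution, which mirrors the equilibrium the abstract describes; the entropy bonus is what keeps the sampler exploring rather than collapsing onto low-energy modes.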

The seminar will also be streamed on YouTube.
