
Research Exchange Seminar: Clustered hidden Markov models

Posted: 2024-10-16 | Author: 박세영
Registration period: 2024-10-16 10:19 to 2024-10-30 14:00
Event date: 2024-10-30 00:00
Speaker: Prof. Sunyoung Shin (신선영), POSTECH
# This seminar will be streamed live on YouTube.

1. Date and time: Wednesday, October 30, 2024, 14:00-16:00
2. Venue: Seminar room, Industrial Mathematics Innovation Center, Pangyo Techno Valley
 - Room 231, Corporate Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do (National Institute for Mathematical Sciences)
 - Free parking is provided for up to 2 hours.
3. Speaker: Prof. Sunyoung Shin (POSTECH)
4. Topic: Clustered hidden Markov models

We consider settings in which a hidden Markov model (HMM) has so many distinct states that its state space needs a simpler structure. We develop a novel framework, the clustered hidden Markov model (CHMM), which decomposes the state space into sets of clustered states by adding a hidden process to the HMM. The CHMM consists of two hidden processes and an observable process: the first hidden process is a Markov process over clustered state representations, and the second hidden process takes values in the original state space. The second hidden process has second-order dependence on the first; that is, the current state of the second hidden process depends on both the current and the previous states of the first hidden process. The CHMM thus offers a new way to design the structure of the hidden states while retaining the information in the HMM.

We derive two hidden Markov submodels from the CHMM. The first hidden process and the observable process form a hidden Markov model with additional dependencies, while the second hidden process and the observable process form a classical hidden Markov model. A key finding is that the hidden states belonging to the same clustered state share the same transition probabilities, which appear as identical rows in the transition matrix of the second hidden Markov model. This identical-row structure motivates a penalization method for learning the CHMM that simultaneously recovers the clustered state space. The penalized estimator maximizes the likelihood regularized by a group smoothly clipped absolute deviation (SCAD) penalty on all pairwise differences between rows of the transition matrix. We establish the asymptotic properties of the penalized estimator.
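The identical-row finding can be checked numerically. The Python sketch below assumes a parameterization that the abstract does not spell out: the clusters partition the original states, the first hidden process Z_t moves between clusters with a transition matrix A, and the second hidden process X_t is drawn from a distribution B indexed by (Z_{t-1}, Z_t) and supported only on the members of cluster Z_t. Under these assumptions X_t is itself Markov, with P[i, j] = A[c(i), c(j)] * B[c(i), c(j), j], where c(i) is the cluster of state i, so any two states in the same cluster get the same row. The function names `chmm_state_transition` and `clusters_from_rows` and the toy sizes are hypothetical.

```python
import numpy as np


def chmm_state_transition(A, B, cluster_of):
    """Transition matrix of the second hidden process X_t implied by a CHMM.

    Hypothetical parameterization (an assumption, not taken from the talk):
      A[l, k]       = P(Z_t = k | Z_{t-1} = l), the first hidden process (clusters)
      B[l, k, j]    = P(X_t = j | Z_t = k, Z_{t-1} = l), zero unless j is in cluster k
      cluster_of[i] = cluster of original state i (the clusters partition the states)
    """
    M = len(cluster_of)
    P = np.zeros((M, M))
    for i in range(M):
        l = cluster_of[i]  # X_{t-1} = i pins down Z_{t-1}
        for j in range(M):
            k = cluster_of[j]  # X_t = j pins down Z_t
            P[i, j] = A[l, k] * B[l, k, j]
    return P  # row i depends on i only through cluster_of[i]


def clusters_from_rows(P, tol=1e-10):
    """Label states so that states with coinciding transition rows share a label."""
    labels = -np.ones(len(P), dtype=int)
    next_label = 0
    for i in range(len(P)):
        if labels[i] >= 0:
            continue
        same = np.linalg.norm(P - P[i], axis=1) < tol
        labels[same & (labels < 0)] = next_label
        next_label += 1
    return labels


# Toy example (hypothetical): 5 original states split into 2 clusters.
rng = np.random.default_rng(0)
cluster_of = np.array([0, 0, 1, 1, 1])
K, M = 2, len(cluster_of)
A = rng.dirichlet(np.ones(K), size=K)  # random row-stochastic K x K matrix
B = np.zeros((K, K, M))
for l in range(K):
    for k in range(K):
        members = np.flatnonzero(cluster_of == k)
        B[l, k, members] = rng.dirichlet(np.ones(len(members)))

P = chmm_state_transition(A, B, cluster_of)
print(np.round(P, 3))
print(clusters_from_rows(P))  # expected: [0 0 1 1 1]
```

Rows 0 and 1 coincide, as do rows 2 to 4, and grouping identical rows recovers the cluster labels. In an estimated transition matrix the rows are only approximately equal, which is why a penalty that encourages exact equality between rows is useful.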
Simulation studies support the superior performance of the proposed method. An application to protein structure sequence data demonstrates that the CHMM simplifies the classification of protein segments and consequently improves interpretability.
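The penalized criterion described in the abstract can be sketched as follows, assuming discrete emissions and evaluating the likelihood of the second (classical) hidden Markov submodel with the scaled forward algorithm. The SCAD formula and the default a = 3.7 follow Fan and Li (2001). Applying SCAD to the Euclidean norm of every pairwise row difference is one reading of "group SCAD", and the factor n = len(y) multiplying the penalty is an assumption. All function names are hypothetical, and the sketch only evaluates the objective; it does not implement the estimation algorithm used in the talk.

```python
import numpy as np


def scad(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise at |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t  # used when t <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < t <= a*lam
    constant = lam**2 * (a + 1) / 2  # used when t > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def group_scad_rows(P, lam, a=3.7):
    """Sum of SCAD(||P_i - P_j||_2) over all row pairs i < j of P."""
    i, j = np.triu_indices(P.shape[0], k=1)
    return scad(np.linalg.norm(P[i] - P[j], axis=1), lam, a).sum()


def hmm_loglik(y, pi, P, E):
    """Log-likelihood of a discrete-emission HMM via the scaled forward algorithm.

    pi[s] = P(X_1 = s), P[s, r] = P(X_t = r | X_{t-1} = s), E[s, v] = P(Y_t = v | X_t = s).
    """
    alpha = pi * E[:, y[0]]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            alpha = (alpha @ P) * E[:, y[t]]  # forward recursion
        c = alpha.sum()  # scaling constant = P(Y_t | Y_1..Y_{t-1})
        loglik += np.log(c)
        alpha = alpha / c
    return loglik


def penalized_objective(y, pi, P, E, lam, a=3.7):
    """Log-likelihood minus n times the group SCAD penalty (n scaling is assumed)."""
    return hmm_loglik(y, pi, P, E) - len(y) * group_scad_rows(P, lam, a)


# Hypothetical usage: a 2-state model whose transition rows are identical.
y = [0, 1, 1, 0]
pi = np.array([0.5, 0.5])
E = np.array([[0.9, 0.1], [0.2, 0.8]])
P_tied = np.array([[0.6, 0.4], [0.6, 0.4]])
print(group_scad_rows(P_tied, lam=0.1))  # 0.0, since identical rows are not penalized
print(penalized_objective(y, pi, P_tied, E, lam=0.1))
```

Because SCAD is constant beyond a*lam, large row differences (rows from genuinely distinct clusters) are not shrunk further, while small differences are pulled toward zero; this is what lets nearly equal rows fuse into exactly identical groups.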