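Date and time: Wednesday, October 30, 2024, 14:00-16:00
Venue: Seminar Room, Industrial Mathematics Innovation Center, Pangyo Techno Valley
Speaker: Prof. Sunyoung Shin (신선영), POSTECH
Topic: Clustered hidden Markov models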
We consider hidden Markov models (HMMs) with so many distinct states that the state space calls for a simpler structure. We develop a framework, the clustered hidden Markov model (CHMM), that decomposes the state space into sets of clustered states by adding a hidden process to the HMM. The CHMM consists of two hidden processes and an observable process: the first hidden process is a Markov process over the clustered state representations, and the second hidden process takes values in the original state space. The second hidden process has a second-order dependence on the first; that is, its current state depends on the current and previous states of the first hidden process. The CHMM thus offers a new way to design the structure of the hidden states while retaining the information in the HMM. We derive two hidden Markov submodels from the CHMM. The first hidden process and the observable process form a hidden Markov model with additional dependencies, while the second hidden process and the observable process form a classical hidden Markov model. A key finding is that the hidden states belonging to the same clustered state share the same transition probabilities, which appear as identical rows in the transition matrix of the second hidden Markov submodel. This identical-row structure motivates a penalization method for learning the CHMM that simultaneously recovers the clustered state space. The penalized estimator maximizes the likelihood regularized by a group smoothly clipped absolute deviation (SCAD) penalty applied to all pairwise differences of the transition matrix rows. We establish the asymptotic properties of the penalized estimator. Simulation studies support the superior performance of the proposed method. An application to protein structure sequence data demonstrates that the CHMM simplifies the classification of protein segments and consequently improves interpretability.
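
The identical-row structure and the pairwise penalty described in the abstract can be illustrated with a small sketch. This is not the speaker's implementation: the function names, the toy 4-state transition matrix, the Euclidean norm on row differences, and the values lam = 0.1 and a = 3.7 are all assumptions chosen for illustration. The sketch only evaluates the penalty term and reads clusters off identical rows; it does not perform the penalized likelihood estimation.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at |t| (requires a > 2)."""
    t = np.abs(t)
    linear = lam * t                                          # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def pairwise_row_penalty(P, lam, a=3.7):
    """Sum of SCAD penalties over all pairwise row differences of P.

    Assumption: each row difference enters through its Euclidean norm.
    """
    K = P.shape[0]
    total = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            total += float(scad(np.linalg.norm(P[i] - P[j]), lam, a))
    return total

def clusters_from_rows(P, tol=1e-8):
    """Assign states with (numerically) identical rows to the same cluster."""
    labels = -np.ones(P.shape[0], dtype=int)
    next_label = 0
    for i in range(P.shape[0]):
        if labels[i] >= 0:
            continue  # state i already belongs to a cluster
        same = np.linalg.norm(P - P[i], axis=1) <= tol
        labels[same & (labels < 0)] = next_label
        next_label += 1
    return labels

# Hypothetical transition matrix: states 0,1 share a row, and so do states 2,3.
P = np.array([
    [0.4, 0.1, 0.3, 0.2],
    [0.4, 0.1, 0.3, 0.2],
    [0.1, 0.5, 0.2, 0.2],
    [0.1, 0.5, 0.2, 0.2],
])

labels = clusters_from_rows(P)
penalty = pairwise_row_penalty(P, lam=0.1)
print(labels.tolist())
print(penalty)
```

In this toy matrix, pairs of states within a cluster contribute zero penalty because their rows coincide, while pairs across clusters fall in the flat region of SCAD and contribute a constant, which is the property that lets a penalty of this type merge rows without heavily shrinking well-separated ones.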