
Academic Events

Seminars

ICIM Research Exchange Seminar (Wed, Aug 21)

Posted: 2024-08-09

https://www.nims.re.kr/icim/post/event/1083

  • Speaker: Dr. 박예찬 (Korea Institute for Advanced Study)
  • Date and time: 2024-08-21, 10:30-12:30
  • Venue: Innovation Center for Industrial Mathematics, National Institute for Mathematical Sciences (Pangyo)
  1. Date and time: Wednesday, August 21, 2024, 10:30-12:30

  2. Venue: Seminar room, Innovation Center for Industrial Mathematics, Pangyo Techno Valley

    • National Institute for Mathematical Sciences, Room 231, Business Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do
    • Free parking is provided for up to 2 hours.
  3. Speaker: Dr. 박예찬 (Korea Institute for Advanced Study)

  4. Topic: Understanding and Acceleration of Grokking Phenomena in Learning Arithmetic Operations via Kolmogorov-Arnold Representation

We propose novel methodologies aimed at accelerating the grokking phenomenon: the sharp increase in test accuracy that occurs after a long period of overfitting, as reported by Power et al. (2022). Focusing on the grokking that arises when a transformer model learns arithmetic binary operations, we begin with a discussion of data augmentation in the case of commutative binary operations. To accelerate grokking further, we elucidate arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the transformer architecture: embedding, decoder block, and classifier. Observing the shared structure among the KA representations associated with binary operations, we suggest various transfer-learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach succeeds in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we show that under embedding transfer the model can learn arithmetic operations from a limited number of tokens, which is likewise supported by experiments.
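As a concrete illustration of the data-augmentation idea for commutative operations mentioned in the abstract, the sketch below builds a modular-addition dataset and adds the swapped operand pair for every training example. All function names, the `(a, b, result)` tuple layout, and the train fraction are illustrative assumptions, not the speaker's actual implementation.

```python
# Sketch: symmetry-based data augmentation for a commutative binary
# operation (here addition mod p). The dataset layout and names are
# assumptions for illustration only.
from itertools import product
import random

def make_dataset(p, train_fraction=0.3, seed=0):
    """Enumerate all triples (a, b, (a + b) % p) and split train/test."""
    triples = [(a, b, (a + b) % p) for a, b in product(range(p), repeat=2)]
    random.Random(seed).shuffle(triples)
    cut = int(len(triples) * train_fraction)
    return triples[:cut], triples[cut:]

def augment_commutative(train):
    """Since the operation is commutative, (a, b, c) implies (b, a, c);
    add each swapped pair that is not already in the training set."""
    seen = set(train)
    extra = [(b, a, c) for a, b, c in train if (b, a, c) not in seen]
    return train + extra

train, test = make_dataset(p=7)
aug = augment_commutative(train)
```

The augmented set stays consistent with the underlying operation (every added label is still correct) while exposing the model to both operand orders, which is the symmetry a commutative operation guarantees for free.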

The seminar will also be streamed on YouTube.

