Journal: Engineering Applications of Artificial Intelligence (0952-1976), 134 (2024), 108682
Indexing type: SCIE
Publication date: 2024-08-01
In this paper, we present a unified spatio-temporal attention MixFormer framework for visual object tracking. Within the vision transformer framework, we design a cohesive network consisting of target template and search region feature extraction, cross-attention utilizing spatial and temporal information, and task-specific heads, all operating in an end-to-end manner.
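
The cross-attention step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `cross_attention`, the token counts, the embedding size, and the random projection matrices are all assumptions chosen for illustration. It only shows the general idea of search-region tokens attending to template tokens gathered from more than one frame, which is one plausible way to combine spatial and temporal information.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(search_tokens, template_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not from the paper).

    Queries come from the search region; keys and values come from the
    template tokens, so each search token aggregates template features.
    """
    q = search_tokens @ w_q
    k = template_tokens @ w_k
    v = template_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 16  # assumed embedding size

# Assumed layout: an 8x8 grid of search-region patches (64 tokens) and
# two 4x4 templates taken from different frames (2 * 16 = 32 tokens).
search = rng.normal(size=(64, dim))
templates = rng.normal(size=(2 * 16, dim))

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, weights = cross_attention(search, templates, w_q, w_k, w_v)
```

In this sketch, `fused` has one row per search token, and a task-specific head (for example, a box regressor) would consume it; the head itself is omitted because the abstract does not describe it.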