• Led the implementation of a novel architecture for zero-shot online temporal action localization in a mixed reality application; achieved precise detection of action start and end points, with accurate labels for both seen and novel video categories.
• Integrated a large language model with structured text descriptions to improve classification results and to guide the visual branch in predicting action timestamps without access to future information or post-hoc refinement.
• Used PyTorch with parallel and distributed computing techniques to handle large models.