PhD Defense: Natural Language Visual Grounding via Multimodal Learning
20 January 2020
Abstract: Natural language visual grounding aims to locate target objects within images given natural language queries, creating a natural communication channel between human beings, physical environments, and intelligent agents. This thesis exploits multimodal learning-based approaches to achieve natural language visual grounding without auxiliary information such as dialogues and gestures. To this end, it proposes three architectures to ground explicit natural language queries and intention-related natural language instructions.
Monday, 20 January 2020, 12:30, Seminar Room F334, Informatikum, Hamburg
Speaker: Jinpeng Mi, Informatics Dept., Univ. Hamburg