Vision-and-Language Grounding

Speaker: Dr Qi Wu

Abstract: Vision-and-Language Navigation (VLN) is a recently proposed research direction that has attracted considerable attention from the computer vision, natural language processing and robotics communities. We opened up this direction in 2018 by proposing the first benchmarked VLN task and dataset, known as Room-to-Room (R2R). In the two years since, many new models and datasets have been proposed, including our recently released REVERIE (Remote Embodied Visual Referring Expression in Real Indoor Environments). In this talk, I will first present the original VLN task and dataset and then discuss some of our recently proposed methods based on it. I will also introduce our REVERIE dataset and show a new general model that can solve all the VLN tasks in a single framework.


Speaker: Dr Abhinav Dhall

Abstract: The availability of image and video manipulation software has made it easier to create deepfake videos. In this work, we analyse the effectiveness of human implicit signals for aiding deepfake content analysis. We will present user-centric and content-centric approaches for detecting fake videos based on user gaze, audio and video signals. Furthermore, we will show how to localise the manipulation in time.