David Romero

I’m a PhD student in Computer Vision at MBZUAI, working with Ivan Laptev. Previously, I worked on vision-language models with Thamar Solorio at the Ritual Lab. Before that, I worked on NLP and speech processing at the Ixa Research Group at the University of the Basque Country with Eneko Agirre, and at the Universidad Politecnica de Madrid with Luis Fernando D’Haro. I am an Electronic Engineer, originally from Ecuador 🇪🇨.

My current research focuses on large-scale diffusion models for world modeling and their applications to Embodied AI. I’m also interested in Vision-Language Models for Video Understanding.

News

Mar 15, 2025 I gave a talk on CVQA at the Adapt Centre, School of Computing, Dublin City University.
Dec 15, 2024 Microsoft has highlighted my work on CVQA in a blog post and a podcast.
Oct 15, 2024 CVQA has been accepted to NeurIPS 2024 Datasets and Benchmarks as an ORAL paper.
Aug 15, 2024 I started my PhD at MBZUAI.
Mar 15, 2024 I gave a talk on Vision-Language Models for Video Understanding at the International Research Experience for Students (IRES), University of Houston (UH) and INAOE, 2024.
Feb 15, 2024 Q-ViD has been accepted to ACL-Findings 2024.
Sep 15, 2023 I’ve joined the Ritual Lab to work with Thamar Solorio.
Jun 15, 2023 I am a co-author of the recently published book “Tos por COVID-19. Caracterización desde la inteligencia artificial”.
Nov 01, 2022 I won the HAP-LAP scholarship given by the Ixa Research Group at the University of the Basque Country.
Oct 01, 2022 My paper has been accepted to ICASSP 2022.

Selected publications

  1. cvqa.png
    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
    David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, et al.
    In NeurIPS - ORAL, 2024
  2. qvid.png
    Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
    David Romero and Thamar Solorio
    In ACL Findings, 2024
  3. icasp.png
    Phonotactic Language Recognition Using A Universal Phoneme Recognizer and A Transformer Architecture
    David Romero, Luis Fernando D’Haro, Marcos Estecha-Garitagoitia, and 1 more author
    In ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Professional Service

Reviewer: NeurIPS 2025, ICCV 2025, EMNLP 2024.

Talks

Virtual Talk - CVQA - Adapt Centre, School of Computing, Dublin City University - 2024
Virtual Talk - Vision-Language Models for Video Understanding - International Research Experience for Students (IRES), University of Houston (UH) and INAOE - 2024