About me:
I'm an Ecuadorian electronic engineer working on AI. I'm currently a PhD student in Computer Vision at MBZUAI, working with Ivan Laptev. Previously, I was a research assistant with Thamar Solorio, working on vision-language models. Before that, I worked on CV, NLP, and speech processing in several research groups: in Ecuador at the Universidad Politecnica Salesiana; in Spain, in the Speech Technology and Machine Learning Group at the Universidad Politecnica de Madrid with Luis Fernando D'Haro, and in the Ixa Research Group at the University of the Basque Country with Eneko Agirre.
News
Selected Publications
Please see my Google Scholar for all my publications.
"CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark".
NeurIPS 2024 Datasets and Benchmarks - ORAL Presentation
D.Romero, Chenyang Liu, Haryo Akbarianto Wibowo, Thamar Solorio, Alham Fikri Aji
Vancouver, Canada
[PDF]
"Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering".
Findings of the Association for Computational Linguistics ACL 2024
D. Romero, T. Solorio
Bangkok, Thailand
[PDF]
"Phonotactic Language Recognition using a Universal Phoneme Recognizer and a Transformer Architecture".
International Conference on Acoustics, Speech, & Signal Processing - ICASSP 2022
D.Romero, L.F.D'Haro, C. Salamea
Singapore
[PDF]
[POSTER]
"Exploring Transformer-based Language Recognition using Phonotactic Information".
Iberspeech 2021
D.Romero, L.F.D'Haro, C. Salamea
Valladolid, Spain
[PDF]
"Convolutional Models for the Detection of Firearms in Surveillance Videos".
Applied Sciences – MDPI 2019
D.Romero, C. Salamea
Basel, Switzerland
[PDF]
[VIDEO]