Afri-MCQA: Multimodal Cultural Question Answering for African Languages
Abstract
The Afri-MCQA benchmark demonstrates the poor performance of open-weight LLMs in African languages, highlighting the need for culturally grounded pretraining and speech-first approaches in AI development.
Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multimodal Cultural Question-Answering benchmark, covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was created entirely by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended QA when queried in a native language, whether in text or speech. To disentangle linguistic competence from cultural knowledge, we include control experiments that assess language ability alone, and we observe significant performance gaps between native languages and English in both text and speech. These findings underscore the need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. To support more inclusive multimodal AI development for African languages, we release Afri-MCQA under a CC BY-NC 4.0 academic license on Hugging Face (https://huggingface.co/datasets/Atnafu/Afri-MCQA).
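For readers who want to try the benchmark, here is a minimal sketch of loading it with the Hugging Face `datasets` library. The repo id is taken from the link above; the default-configuration assumption and the split inspected are illustrative and not confirmed by this page, and gated access may require logging in and accepting the dataset's license terms.

```python
# Minimal sketch: load Afri-MCQA from the Hugging Face Hub.
# Assumes the standard `datasets` library and a default configuration;
# configuration/split names are illustrative, not confirmed here.
from datasets import load_dataset

# May require `huggingface-cli login` and accepting the license on the Hub.
ds = load_dataset("Atnafu/Afri-MCQA")

print(ds)  # inspect the available splits and columns
first_split = next(iter(ds))
print(ds[first_split][0])  # peek at one Q&A record
```

If the dataset defines multiple configurations (e.g., per language or modality), `load_dataset` will raise an error listing them, and a configuration name can then be passed as the second argument.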
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG (2025)
- Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries (2025)
- HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples (2025)
- Can LLMs Solve My Grandma's Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles (2025)
- IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages (2025)
- See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models (2025)
- Multilingual VLM Training: Adapting an English-Trained VLM to French (2025)