MuMiN


A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

The MuMiN dataset is a challenging benchmark for automatic misinformation detection models. The dataset is structured as a heterogeneous graph and features 21,565,018 tweets and 1,986,354 users, belonging to 26,048 Twitter threads, discussing 12,914 fact-checked claims from 115 fact-checking organisations in 41 different languages, spanning a decade.
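To make the term "heterogeneous graph" concrete, here is a minimal, self-contained sketch in plain Python. It is not the MuMiN loader and does not reproduce the dataset's actual schema: the `HeteroGraph` class, the node types (`user`, `tweet`, `claim`) and the relation names (`posted`, `discusses`) are illustrative assumptions chosen to mirror the entities mentioned above.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous graph: every node and every edge carries a type.

    Hypothetical illustration only; not part of any MuMiN package.
    """

    def __init__(self):
        # node type -> set of node ids of that type
        self.nodes = defaultdict(set)
        # (source type, relation, destination type) -> list of (src, dst) pairs
        self.edges = defaultdict(list)

    def add_edge(self, src_type, src_id, relation, dst_type, dst_id):
        """Register both endpoints and store the typed edge."""
        self.nodes[src_type].add(src_id)
        self.nodes[dst_type].add(dst_id)
        self.edges[(src_type, relation, dst_type)].append((src_id, dst_id))

    def num_nodes(self, node_type):
        """Number of distinct nodes of the given type."""
        return len(self.nodes.get(node_type, ()))

    def neighbours(self, relation_key, src_id):
        """Destinations reachable from src_id via one typed relation."""
        return [dst for src, dst in self.edges.get(relation_key, []) if src == src_id]


# Assumed example data: two users each post a tweet about the same claim.
graph = HeteroGraph()
graph.add_edge("user", "u1", "posted", "tweet", "t1")
graph.add_edge("user", "u2", "posted", "tweet", "t2")
graph.add_edge("tweet", "t1", "discusses", "claim", "c1")
graph.add_edge("tweet", "t2", "discusses", "claim", "c1")
```

Keeping edges keyed by a (source type, relation, destination type) triple is one common way to represent such graphs; graph learning libraries typically use a similar convention, though their exact APIs differ.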

Tasks

The dataset comes in three different sizes and features two graph classification tasks:

Getting Started

See Getting Started for a quickstart as well as an in-depth tutorial that covers building and training multiple misinformation classifiers on MuMiN.

Tutorial

We have created a tutorial that walks you through the dataset and shows how to build several kinds of misinformation classifiers on it. The tutorial can be found here.

Leaderboard

See the leaderboard for a list of the best performing models. For new submissions, please email ryan.mcconville@bristol.ac.uk.

Citation

If you use this dataset, you can cite it as follows:

Dan Saattrup Nielsen and Ryan McConville. MuMiN: A large-scale multilingual multimodal fact-checked misinformation social network dataset. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2022 (to appear). bib

or

Dan Saattrup Nielsen and Ryan McConville. “MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset.” arXiv preprint arXiv:2202.11684 (2022). bib

Other MuMiN repos

You can also check out the following MuMiN-related GitHub repositories: