The MuMiN dataset is a challenging benchmark for automatic misinformation detection models. The dataset is structured as a heterogeneous graph and features 21,565,018 tweets and 1,986,354 users, belonging to 26,048 Twitter threads, discussing 12,914 fact-checked claims from 115 fact-checking organisations in 41 different languages, spanning a decade.
The dataset comes in three different sizes and features two graph classification tasks, each of which labels the target node as either misinformation or factual.
See Getting Started for a quickstart as well as an in-depth tutorial, including the building and training of multiple misinformation classifiers on MuMiN.
We have also created a tutorial that walks through the dataset and shows how to build several kinds of misinformation classifiers on it. The tutorial can be found here.
See the leaderboard for a list of the best-performing models. For new submissions, please email ryan.mcconville@bristol.ac.uk.
If you use this dataset, you can cite it as follows:
Dan Saattrup Nielsen and Ryan McConville. MuMiN: A large-scale multilingual multimodal fact-checked misinformation social network dataset. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2022 (to appear).
or
Dan Saattrup Nielsen and Ryan McConville. “MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset.” arXiv preprint arXiv:2202.11684 (2022).
You can also check out the following MuMiN-related Github repositories:
mumin, which allows easy compilation and export of the dataset.
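As a rough sketch of how compilation and export work with the mumin package: the names below (MuminDataset, compile, to_dgl) follow the package's documented API, but the bearer token is a placeholder, and details may differ between package versions, so treat this as an illustration rather than a definitive recipe. A Twitter API bearer token is needed because the tweets must be rehydrated from Twitter.

```python
# Hedged sketch: compiling MuMiN with the `mumin` package and exporting it
# as a graph. The token below is a placeholder, not a real credential.
try:
    from mumin import MuminDataset

    dataset = MuminDataset(
        twitter_bearer_token="<YOUR-BEARER-TOKEN>",  # placeholder
        size="small",  # one of the three dataset sizes
    )
    dataset.compile()         # rehydrate tweets and build the heterogeneous graph
    graph = dataset.to_dgl()  # export to a DGL graph for model training
    status = "compiled"
except Exception:
    # The package is not installed (`pip install mumin`), or no valid
    # Twitter credentials are available in this environment.
    status = "unavailable"

print(status)
```

Without the package installed and a valid bearer token, the sketch simply reports "unavailable"; with both in place, compilation downloads the claims and tweets before building the graph.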