• Project name: WikiDAT.
• Main author: Felipe Ortega.
• License: GPLv3.

WikiDAT (pronounced Wiki-D-A-T) is a project led and maintained by Felipe Ortega. I originally created it to support my PhD thesis and to solve the problem of extracting historical data from the public dump files provided by the Wikimedia Foundation.

Over the past few years, I have added new features, such as support for different types of dump files and faster data retrieval and preparation through multiprocessing techniques (see the examples on this site and the project's wiki on GitHub for additional details). Another important goal is to support as many Wikipedia languages as possible, since many previous studies have focused only on the English Wikipedia. For instance, WikiDAT can detect featured articles in more than 39 different Wikipedia languages.

Finally, because there is little documentation describing Wikipedia dump files (content, format, caveats for interpreting fields, etc.), I have included detailed information about them on this site. This information may help other researchers, practitioners and data scientists working with Wikipedia data.

Please feel free to contact me with any comments or suggestions. You can also help by reporting any bugs you find or by requesting new features through the project's issue tracker on GitHub.


Felipe Ortega
Office 041. Departamental II
University Rey Juan Carlos
Tulipán s/n. 28935
Móstoles, Madrid. SPAIN.