These XML dump files contain metadata about events logged by the MediaWiki engine that powers each Wikipedia site. The list of events tracked by the system is long (see below) and corresponds to the information stored in MediaWiki's logging database table.

Examples of logged events include administrative actions performed by admins and other users with special privileges, such as blocking or unblocking other editors, protecting or unprotecting pages, deleting pages, moving pages, or granting and revoking privileges to other users.

As a result, this metadata is valuable for studying the behaviour of Wikipedia power users, as well as for tracking special actions on Wikipedia pages.

Standard filename format

These files are named using the following format:

File: {lang}-{date}-pages-logging[num].xml.{compress}

Example: eswiki-20150429-pages-logging.xml.gz

Mandatory elements are shown in {}, while optional elements are shown in []. The meaning of each field is:

  • lang: Identifier of the Wikipedia language. The current convention is to prepend the corresponding ISO 639 code for the language to the term wiki, identifying Wikipedia dumps. Dumps from other Wikimedia projects use their own identifier ("wikiquote", "wikibooks", etc.).
  • date: The date on which the dump file was produced. For large dump files, this date may not match the date of the last event included in the file, since compression and integrity checks can take some time.
  • pages-logging: Identifier of the type of dump file, in this case metadata describing all relevant events logged by MediaWiki on each Wikipedia site.
  • num: For large Wikipedia language editions with many pages and edits, producing a single dump file would be impractical. In these cases, the dump is split into several files, usually by page id in ascending order. All these individual files must then be processed to recover the complete metadata for that language.
  • xml: Type of data file. Currently, dumps of logged events are only provided in XML format.
  • compress: Algorithm used to compress the file. It is customary to use gzip.
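As an illustration of the naming scheme above, the following Python sketch splits a logging dump filename into its fields. The regular expression and the names `DumpName` and `parse_dump_filename` are hypothetical, and the position of the optional numeric suffix for split dumps (for example `pages-logging1.xml.gz`) is an assumption; check it against the actual file list of the dump you download.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for {lang}-{date}-pages-logging[num].xml.{compress}.
# The placement of the optional numeric suffix is an assumption.
DUMP_NAME = re.compile(
    r"^(?P<lang>[a-z0-9_]+)-(?P<date>\d{8})-pages-logging(?P<num>\d+)?"
    r"\.xml\.(?P<compress>[a-z0-9]+)$"
)


class DumpName(NamedTuple):
    lang: str            # e.g. "eswiki"
    date: str            # YYYYMMDD, date the dump was produced
    num: Optional[int]   # chunk number for split dumps, None otherwise
    compress: str        # e.g. "gz"


def parse_dump_filename(name: str) -> DumpName:
    """Split a logging dump filename into its fields (ValueError if it does not match)."""
    m = DUMP_NAME.match(name)
    if m is None:
        raise ValueError(f"not a pages-logging dump filename: {name!r}")
    num = int(m["num"]) if m["num"] else None
    return DumpName(m["lang"], m["date"], num, m["compress"])


print(parse_dump_filename("eswiki-20150429-pages-logging.xml.gz"))
# DumpName(lang='eswiki', date='20150429', num=None, compress='gz')
```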

File content

Heading elements

The heading elements are identical to those in the full history dumps.

  • <mediawiki>: This is the root element. It declares the XML namespace of the export format and the location of the XML schema (XSD) that describes it, together with the schema version (0.10 in the examples on this page) and the language of the wiki (xml:lang).
  • <siteinfo>: Includes general information about this Wikipedia site, with the following subelements:
    • <sitename>: Name of this Wikipedia site (in the corresponding language).
    • <dbname>: Internal database name of this wiki (for example, enwiki), which matches the {lang} field of the filename.
    • <base>: Base URL for all pages in this Wikipedia site.
    • <generator>: Version of the MediaWiki engine that produced this dump.
    • <case>: Case convention for page titles (usually first-letter, meaning that the first letter of every title is capitalized).
    • <namespaces>: A list of the name (localized for that language) and the internal numerical code for each namespace in this MediaWiki site. Codes are important, since they identify the namespace that each page belongs to.

Below you can find XML code snippets with examples for each of these elements.


<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/
  http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">

  <siteinfo>
    <sitename>Wikipedia</sitename>
    <dbname>enwiki</dbname>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.25wmf10</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      [... rest of namespace definitions ...]
    </namespaces>
  </siteinfo>

Namespace          Code
Media               -2
Special             -1
Main (none)          0
Talk                 1
User                 2
User talk            3
Wikipedia            4
Wikipedia talk       5
File                 6
File talk            7
MediaWiki            8
MediaWiki talk       9
Template            10
Template talk       11
Help                12
Help talk           13
Category            14
Category talk       15

Warning: The list of namespaces above is non-exhaustive. It only shows some of the most common namespaces and their standard codes in Wikipedia. Please refer to the namespaces database table created by WikiDAT for the actual list of namespaces and codes in the Wikipedia dump that you are analyzing.
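Rather than relying on a fixed table, the namespace codes can also be read from the <namespaces> element of the dump itself. The Python sketch below uses the standard library's xml.etree.ElementTree; it assumes the export-0.10 namespace URI shown in the example header (other schema versions use a different URI), and the helpers read_namespaces and split_title are hypothetical. Since <logtitle> values carry the namespace prefix (for example "File:..."), split_title maps such a title back to its code; it ignores namespace aliases and case variants of the prefix, so treat it as a first approximation.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI (export schema 0.10, as in the example header).
NS = {"mw": "http://www.mediawiki.org/xml/export-0.10/"}

SNIPPET = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def read_namespaces(root: ET.Element) -> dict[str, int]:
    """Map each namespace name to its numeric code; the main namespace has an empty name."""
    return {
        (ns.text or ""): int(ns.get("key"))
        for ns in root.iterfind("mw:siteinfo/mw:namespaces/mw:namespace", NS)
    }


def split_title(title: str, namespaces: dict[str, int]) -> tuple[int, str]:
    """Split a full title such as 'User:Foo' into (namespace code, bare title)."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix], rest
    return 0, title  # no known prefix: assume the main namespace


namespaces = read_namespaces(ET.fromstring(SNIPPET))
print(split_title("User:Jock", namespaces))  # (2, 'Jock')
```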

Main elements

The main body of these XML dump files is a list of <logitem> elements. Each <logitem> block provides metadata describing the special action or event recorded by MediaWiki. Types of events and modalities for each event are identified by the <type> and <action> subelements.

The following tables list possible events (<type>) and their modalities (<action>).

<type>     <action>       Description
delete     delete         Deletion of an entire page (and all its revisions).
delete     restore        Restoration of a deleted page (and all its revisions) or of individual revisions.
delete     revision       Deletion of individual revisions.
delete     event          Deletion of particular log events (for instance, one containing a personal attack on another editor).

block      block          Block an editor for a limited period or permanently.
block      unblock        Unblock an editor who was previously blocked.
block      reblock        Change the settings of an existing block on an editor (for instance, its duration).

protect    protect        Put a page under protection or semi-protection.
protect    unprotect      Remove the protection from a page.
protect    modify         Change protection parameters (such as type or duration).
protect    move_prot      Move protection settings from the old title to the new one when a protected page is moved, so that the page keeps its protected status.

rights     rights         Grant or revoke rights for a user.
rights     autopromote    Automatic change of a user's groups performed by MediaWiki itself (autopromotion).

newusers   newusers
newusers   create
newusers   create2
newusers   autocreate
newusers   byemail

review     approve
review     approve-a
review     unapprove
review     approve-i
review     approve-ia

Warning: The collection of events above is non-exhaustive. More types will be added as soon as WikiDAT includes new features to interpret them.
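Because the tables above are non-exhaustive, an analysis script should expect (type, action) pairs it does not know about; the example content at the end of this page, for instance, contains upload/upload events. A minimal Python sketch of this idea follows, where KNOWN_ACTIONS simply mirrors the tables above and tally is a hypothetical helper that keeps documented and undocumented pairs in separate counters.

```python
from collections import Counter

# (type, action) pairs documented in the tables above. The list is
# non-exhaustive, so anything else is counted separately as unknown.
KNOWN_ACTIONS = {
    "delete": {"delete", "restore", "revision", "event"},
    "block": {"block", "unblock", "reblock"},
    "protect": {"protect", "unprotect", "modify", "move_prot"},
    "rights": {"rights", "autopromote"},
    "newusers": {"newusers", "create", "create2", "autocreate", "byemail"},
    "review": {"approve", "approve-a", "unapprove", "approve-i", "approve-ia"},
}


def tally(events):
    """Count (type, action) pairs, split into documented and undocumented ones."""
    known, unknown = Counter(), Counter()
    for log_type, action in events:
        target = known if action in KNOWN_ACTIONS.get(log_type, ()) else unknown
        target[(log_type, action)] += 1
    return known, unknown


sample = [("block", "block"), ("block", "unblock"), ("upload", "upload"), ("block", "block")]
known, unknown = tally(sample)
print(known)
print(unknown)
```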

The format and content of the XML elements in these files are:

  • <logitem>: Element containing information about a special event logged by MediaWiki.
    • <id>: Positive integer, unique numerical identifier for each logged event.
    • <timestamp>: String, timestamp (date and time) indicating when the event was logged. Times are given in UTC, as indicated by the trailing 'Z'. See the code example below for details about the specific format.
    • <contributor>: Metadata about the user who performed the action logged by the system. See the code example below for details about the specific format.
      • <id>: Unique numerical identifier of the user who performed the action.
      • <username>: Login name of the user who performed the action.
    • <comment>: Comment from the user, describing the action and the reasons for performing it. This element may be missing (see, for instance, the events with id 2 and 3 in the example content below).
    • <type>: Type of special action logged by the system.
    • <action>: Modality of the special action logged by the system. For some types, users can perform different subtypes of action (see the tables above).
    • <logtitle>: String, the full title of the page on which the action was performed (for example, the title of the article that has been protected, or the user page of the editor who has been blocked).
    • <params>: Any additional metadata describing the action logged by the system (for instance, extra information about reviews performed with the FlaggedRevs extension).

  <logitem>
    <id>9</id>
    <timestamp>2005-07-02T05:06:42Z</timestamp>
    <contributor>
      <username>Derek Ross</username>
      <id>9</id>
    </contributor>
    <comment>Needed for copyricht purposes</comment>
    <type>protect</type>
    <action>protect</action>
    <logtitle>Wikipedia:Text o the GNU Free Documentation License</logtitle>
    <params xml:space="preserve" />
  </logitem>
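The element list above maps naturally onto a record type. The following Python sketch, built on xml.etree.ElementTree, converts a single <logitem> into a dataclass; the names LogItem and parse_logitem are hypothetical, and the namespace URI is assumed from the example headers. Optional elements such as <comment> come back as None when absent, while an empty element such as <params xml:space="preserve" /> comes back as an empty string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Assumed namespace URI (export schema 0.10, as in the headers on this page).
NS = {"mw": "http://www.mediawiki.org/xml/export-0.10/"}


@dataclass
class LogItem:
    log_id: int
    timestamp: str
    username: Optional[str]
    user_id: Optional[int]
    comment: Optional[str]
    log_type: str
    action: str
    logtitle: Optional[str]
    params: Optional[str]


def parse_logitem(elem: ET.Element) -> LogItem:
    """Convert one <logitem> element into a LogItem."""

    def text(path: str) -> Optional[str]:
        # findtext gives None if the child is absent and "" if it is empty.
        return elem.findtext(path, default=None, namespaces=NS)

    user_id = text("mw:contributor/mw:id")
    return LogItem(
        log_id=int(text("mw:id")),
        timestamp=text("mw:timestamp"),
        username=text("mw:contributor/mw:username"),
        user_id=int(user_id) if user_id else None,
        comment=text("mw:comment"),
        log_type=text("mw:type"),
        action=text("mw:action"),
        logtitle=text("mw:logtitle"),
        params=text("mw:params"),
    )


# Shortened version of the example above (without <comment>), with the
# default namespace declared so it can be parsed on its own.
SAMPLE = """<logitem xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <id>9</id>
  <timestamp>2005-07-02T05:06:42Z</timestamp>
  <contributor><username>Derek Ross</username><id>9</id></contributor>
  <type>protect</type>
  <action>protect</action>
  <logtitle>Wikipedia:Text o the GNU Free Documentation License</logtitle>
  <params xml:space="preserve" />
</logitem>"""

item = parse_logitem(ET.fromstring(SAMPLE))
print(item)
```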

Format: YYYY-MM-DDTHH:MM:SSZ
The 'T' character separates the date from the time, and the trailing 'Z' indicates that the time is expressed in UTC (ISO 8601 notation).
Example: <timestamp>2005-06-22T10:17:05Z</timestamp>
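In Python, this format can be parsed with datetime.strptime. The sketch below treats the trailing 'Z' as the ISO 8601 designator for UTC and returns a timezone-aware datetime; parse_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a dump timestamp such as '2005-06-22T10:17:05Z' into an aware UTC datetime."""
    # The literal 'Z' in the format string matches the UTC designator.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


ts = parse_timestamp("2005-06-22T10:17:05Z")
print(ts.isoformat())  # 2005-06-22T10:17:05+00:00
```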

<contributor>
  <username>MyLogin Name</username>
  <id>3344555</id>
</contributor>

Example content

The following code snippet shows an excerpt from the beginning of one of these files (the logging dump of the Scots Wikipedia, scowiki):

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ 
http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="sco">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <dbname>scowiki</dbname>
    <base>http://sco.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.25wmf24</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">Wikipedia</namespace>
      <namespace key="5" case="first-letter">Wikipedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
      <namespace key="100" case="first-letter">Portal</namespace>
      <namespace key="101" case="first-letter">Portal talk</namespace>
      <namespace key="828" case="first-letter">Module</namespace>
      <namespace key="829" case="first-letter">Module talk</namespace>
    </namespaces>
  </siteinfo>
  <logitem>
    <id>1</id>
    <timestamp>2005-06-22T12:18:33Z</timestamp>
    <contributor>
      <username>Jdforrester</username>
      <id>6</id>
    </contributor>
    <comment>From [[en:Image:Flag of Scotland Pantone300.png]].</comment>
    <type>upload</type>
    <action>upload</action>
    <logtitle>File:Flag of Scotland Pantone300.png</logtitle>
    <params xml:space="preserve" />
  </logitem>
  <logitem>
    <id>2</id>
    <timestamp>2005-06-22T20:57:46Z</timestamp>
    <contributor>
      <username>Jock</username>
      <id>11</id>
    </contributor>
    <type>upload</type>
    <action>upload</action>
    <logtitle>File:ScotsPairlament.png</logtitle>
    <params xml:space="preserve" />
  </logitem>
  <logitem>
    <id>3</id>
    <timestamp>2005-06-22T21:11:19Z</timestamp>
    <contributor>
      <username>Jock</username>
      <id>11</id>
    </contributor>
    <type>upload</type>
    <action>upload</action>
    <logtitle>File:ScotsPairlament.png</logtitle>
    <params xml:space="preserve" />
  </logitem>
  <logitem>
    <id>4</id>
    <timestamp>2005-06-26T13:01:52Z</timestamp>
    <contributor>
      <username>Colin Angus Mackay</username>
      <id>20</id>
    </contributor>
    <comment>labelled map of scotland</comment>
    <type>upload</type>
    <action>upload</action>
    <logtitle>File:ScotlandLabelled.png</logtitle>
    <params xml:space="preserve" />
  </logitem>
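Real logging dumps are far larger than this excerpt, so loading a whole file with ET.parse may exhaust memory. A common pattern is to stream the file with xml.etree.ElementTree.iterparse and discard each <logitem> once it has been processed. The Python sketch below counts logged events per <type> in a gzip-compressed dump; count_types is a hypothetical name, and the export-0.10 namespace URI is assumed from the header above.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI (export schema 0.10); adjust it for other dump versions.
MW = "{http://www.mediawiki.org/xml/export-0.10/}"


def count_types(source) -> Counter:
    """Stream a gzip-compressed logging dump (path or binary file object) and count events per <type>."""
    counts = Counter()
    with gzip.open(source, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _event, root = next(context)  # the first "start" event is <mediawiki>
        for event, elem in context:
            if event == "end" and elem.tag == MW + "logitem":
                counts[elem.findtext(MW + "type")] += 1
                root.clear()  # drop processed elements so memory use stays flat
    return counts


# Tiny in-memory demo: compress a two-item excerpt and count its event types.
demo = gzip.compress(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<logitem><type>upload</type></logitem>"
    b"<logitem><type>block</type></logitem></mediawiki>"
)
print(count_types(io.BytesIO(demo)))
```

For a real file, pass its path instead, for example count_types("scowiki-20150429-pages-logging.xml.gz") (a hypothetical filename following the naming scheme above).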