These XML dump files contain metadata about events logged by the MediaWiki engine that
powers each Wikipedia site. The list of events tracked by the system is long (see
below) and corresponds to the information stored in MediaWiki's logging
database table.
Some examples of logged events include administrative actions performed by admins and other users with special privileges, such as blocking/unblocking other editors, page protection/unprotection, page deletion, page moving, and granting/revoking privileges to other users.
As a result, these metadata are quite valuable for studying the behaviour of Wikipedia power users, as well as for tracking special actions on Wikipedia pages.
These files are named using the following format:
File: {lang}-{date}-pages-logging.xml.{compress}
Example: eswiki-20150429-pages-logging.xml.gz
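Mandatory elements are shown in {}, while optional elements are shown in [] (this format has none). Going by the example, {lang} identifies the wiki (eswiki), {date} is the dump date written as YYYYMMDD (20150429), and {compress} is the compression format (gz).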
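As a rough illustration, a file name in this format can be split into its fields with a regular expression. This is a minimal sketch: the pattern is inferred from the single example above, and `parse_dump_name` and `DumpName` are hypothetical names, not part of WikiDAT.

```python
import re
from typing import NamedTuple

# Assumption: {lang} is lowercase letters, digits or underscores, {date} is
# eight digits, and {compress} is a plain extension such as "gz".
_DUMP_NAME = re.compile(
    r"^(?P<lang>[a-z0-9_]+)-(?P<date>\d{8})-pages-logging\.xml\.(?P<compress>\w+)$"
)


class DumpName(NamedTuple):
    lang: str
    date: str
    compress: str


def parse_dump_name(filename: str) -> DumpName:
    """Split a pages-logging dump file name into its three fields."""
    match = _DUMP_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a pages-logging dump name: {filename!r}")
    return DumpName(match["lang"], match["date"], match["compress"])


print(parse_dump_name("eswiki-20150429-pages-logging.xml.gz"))
```

Names that do not follow the pattern raise `ValueError` instead of returning partial results, which makes it harder to process the wrong kind of dump by accident.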
The heading elements are identical to those in the full history version.
<mediawiki>
: This is the root element. It provides
information about the XML namespace and the URL of the schema describing
some of the XML elements (v0.10 in the examples below; the schema is still a work in progress).
<siteinfo>
: Includes general information about
this Wikipedia site, with the following subelements:
<sitename>
: Name of this Wikipedia site (in
the corresponding language).
<base>
: Base URL for all pages in this
Wikipedia site.
<generator>
: Version of the MediaWiki engine
that produced this dump.
<case>
: Convention for text case in page titles (usually first-letter,
meaning that the first letter of a title is capitalized automatically).
<namespaces>
: A list of the name (localized
for that language) and the internal numerical code for each namespace
in this MediaWiki site. Codes are important, since they identify the
namespace for each <page>
element.
Below you can find XML code snippets with examples for each of these elements.
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/
http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">
<siteinfo>
<sitename>Wikipedia</sitename>
<dbname>enwiki</dbname>
<base>http://en.wikipedia.org/wiki/Main_Page</base>
<generator>MediaWiki 1.25wmf10</generator>
<case>first-letter</case>
<namespaces>
<namespace key="-2" case="first-letter">Media</namespace>
<namespace key="-1" case="first-letter">Special</namespace>
<namespace key="0" case="first-letter" />
<namespace key="1" case="first-letter">Talk</namespace>
<namespace key="2" case="first-letter">User</namespace>
[... rest of namespace definitions ...]
</namespaces>
</siteinfo>
Namespace | Code |
---|---|
Media | -2 |
Special | -1 |
Main (none) | 0 |
Talk | 1 |
User | 2 |
User talk | 3 |
Wikipedia | 4 |
Wikipedia talk | 5 |
File | 6 |
File talk | 7 |
MediaWiki | 8 |
MediaWiki talk | 9 |
Template | 10 |
Template talk | 11 |
Help | 12 |
Help talk | 13 |
Category | 14 |
Category talk | 15 |
Warning: The above list of namespaces is non-exhaustive. It only shows some of the most common namespaces and their standard codes in Wikipedia. Please refer to the database table namespaces created by WikiDAT for the actual list of namespaces and codes in the Wikipedia dump that you are analyzing.
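Since the codes can differ between wikis, one option is to read them directly from the <siteinfo> header of the dump being analyzed. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `read_namespaces` and the shortened `SITEINFO` sample are assumptions made for illustration, and this is not how WikiDAT itself builds its namespaces table.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Shortened header modelled on the snippet above (illustrative only).
SITEINFO = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def read_namespaces(xml_text: str) -> Dict[int, str]:
    """Map each namespace code to its localized name ('' for the main namespace)."""
    root = ET.fromstring(xml_text)
    # ElementTree stores tags as "{namespace-uri}name"; reuse the root's prefix.
    prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    namespaces = {}
    for elem in root.iter(f"{prefix}namespace"):
        namespaces[int(elem.get("key"))] = elem.text or ""
    return namespaces


print(read_namespaces(SITEINFO))
```

The main namespace has no name in the dump, so it is stored here as an empty string under code 0.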
The main body of these XML dump files is a list of <logitem> elements.
Each <logitem> block provides metadata describing the special action or
event recorded by MediaWiki. Types of events and the modalities of each event are identified
by the <type> and <action> subelements.
The following tables list possible events and their modalities.
<type> | <action> | Description |
---|---|---|
delete | delete | Deletion of an entire page (and all revisions). |
delete | restore | Restore a page (and all revisions) or individual revisions. |
delete | revision | Deletion of individual revisions. |
delete | event | Deletion of particular events (for instance, a personal attack on another editor). |
<type> | <action> | Description |
---|---|---|
block | block | Block an editor for some period or permanently. |
block | unblock | Unblock an editor who was previously blocked. |
block | reblock | Block an editor again for some period. |
<type> | <action> | Description |
---|---|---|
protect | protect | Put a page under protection or semiprotection. |
protect | unprotect | Release the protection over a page. |
protect | modify | Change protection parameters (like type or duration). |
protect | move_prot | Move the protection target so that the protected status follows moved pages (keeping protection for the new title). |
<type> | <action> | Description |
---|---|---|
rights | rights | Grant rights to, or revoke rights from, a user. |
rights | autopromote | Self-assign or self-revoke rights on one's own account (for instance, admin self-demotions). |
<type> | <action> | Description |
---|---|---|
newusers | newusers | |
newusers | create | |
newusers | create2 | |
newusers | autocreate | |
newusers | byemail | |
<type> | <action> | Description |
---|---|---|
review | approve | |
review | approve-a | |
review | unapprove | |
review | approve-i | |
review | approve-ia | |
Warning: The collection of events above is non-exhaustive. More types will be added as soon as WikiDAT includes new features to interpret them.
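Because the collection is non-exhaustive, it can help to separate events that appear in the tables above from those that do not. The sketch below encodes the tables as a Python dictionary; `KNOWN_ACTIONS`, `is_listed` and `split_listed` are hypothetical names introduced here, not WikiDAT identifiers.

```python
from collections import Counter
from typing import Iterable, Tuple

# <type> -> set of <action> values, copied from the tables in this section.
KNOWN_ACTIONS = {
    "delete": {"delete", "restore", "revision", "event"},
    "block": {"block", "unblock", "reblock"},
    "protect": {"protect", "unprotect", "modify", "move_prot"},
    "rights": {"rights", "autopromote"},
    "newusers": {"newusers", "create", "create2", "autocreate", "byemail"},
    "review": {"approve", "approve-a", "unapprove", "approve-i", "approve-ia"},
}


def is_listed(log_type: str, action: str) -> bool:
    """True if the (type, action) pair appears in the tables above."""
    return action in KNOWN_ACTIONS.get(log_type, set())


def split_listed(events: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count (type, action) pairs, keeping listed and unlisted ones apart."""
    listed, unlisted = Counter(), Counter()
    for pair in events:
        target = listed if is_listed(*pair) else unlisted
        target[pair] += 1
    return listed, unlisted


listed, unlisted = split_listed(
    [("block", "block"), ("upload", "upload"), ("block", "block")]
)
print(listed, unlisted)
```

Pairs such as upload/upload, which occur in real dumps but are not described in the tables, end up in the second counter rather than being silently dropped.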
The format and content of the XML elements in this file are:
<logitem>
: Element containing
information about a special event logged by MediaWiki.
<id>
: Positive integer,
unique numerical identifier for each logged event.
<timestamp>
: String,
timestamp (date and time) indicating when the event was logged. No
time zone offset is given; the trailing Z is the ISO 8601 designator for UTC.
See code example below for details about the specific format.
<contributor>
: Metadata
about the user who performed the action logged by the
system. See code example below for details about the specific format.
<id>
: Unique
numerical identifier of the user who performed the action logged by the
system.
<username>
: Login
name of the user who performed the action recorded by the system.
<comment>
: Comment
from the user, describing the special action and the reasons for
performing it.
<type>
: Type of
special action logged by the system.
<action>
: Modality
of the special action logged by the system. For some actions, different
subtypes can be performed by users (see examples in the tables above).
<logtitle>
: String,
the full title of the page on which the action was performed (for example,
the title of the article that has been protected, or the user page of the
editor who has been blocked).
<params>
: Any additional
metadata describing the action logged by the system (for instance, extra
information about reviews performed with the flagged-revisions
plugin).
<logitem>
<id>9</id>
<timestamp>2005-07-02T05:06:42Z</timestamp>
<contributor>
<username>Derek Ross</username>
<id>9</id>
</contributor>
<comment>Needed for copyricht purposes</comment>
<type>protect</type>
<action>protect</action>
<logtitle>Wikipedia:Text o the GNU Free Documentation License</logtitle>
<params xml:space="preserve" />
</logitem>
Format: YYYY-MM-DDTHH:MM:SSZ
The 'T' character separates the date from the time; the trailing 'Z' ends the time and, in ISO 8601, designates UTC.
Example: <timestamp>2005-06-22T10:17:05Z</timestamp>
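A timestamp in this format can be converted into a Python `datetime` with the standard library alone. This is a small sketch that assumes the trailing 'Z' means UTC, as in ISO 8601; the function name `parse_log_timestamp` is made up for this example.

```python
from datetime import datetime, timezone


def parse_log_timestamp(value: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:MM:SSZ string into a timezone-aware datetime."""
    # The literal "Z" is matched as plain text, then UTC is attached explicitly
    # (assumption: 'Z' designates UTC).
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


print(parse_log_timestamp("2005-06-22T10:17:05Z").isoformat())
```

Attaching the time zone explicitly avoids mixing naive and aware datetimes later, for example when comparing events with timestamps from other sources.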
<contributor>
<username>MyLogin Name</username>
<id>3344555</id>
</contributor>
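To show how the elements described above fit together, here is a sketch that reads one <logitem> into a small Python record. `LogItem` and `parse_logitem` are hypothetical names used only for this example. The fragment has no XML namespace; in a full dump, every tag carries the export namespace of the root element, so the lookup paths would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# The <logitem> example from this section, as a standalone fragment.
SAMPLE = """<logitem>
  <id>9</id>
  <timestamp>2005-07-02T05:06:42Z</timestamp>
  <contributor>
    <username>Derek Ross</username>
    <id>9</id>
  </contributor>
  <comment>Needed for copyricht purposes</comment>
  <type>protect</type>
  <action>protect</action>
  <logtitle>Wikipedia:Text o the GNU Free Documentation License</logtitle>
  <params xml:space="preserve" />
</logitem>"""


@dataclass
class LogItem:  # hypothetical container, not a WikiDAT class
    id: int
    timestamp: str
    username: Optional[str]
    user_id: Optional[int]
    comment: Optional[str]  # some items carry no <comment>
    type: str
    action: str
    logtitle: str
    params: str


def _text(elem: ET.Element, path: str) -> Optional[str]:
    """Text of the first element matching path, or None if it is absent."""
    node = elem.find(path)
    return node.text if node is not None else None


def parse_logitem(elem: ET.Element) -> LogItem:
    user_id = _text(elem, "contributor/id")
    return LogItem(
        id=int(_text(elem, "id")),  # direct child only, not contributor/id
        timestamp=_text(elem, "timestamp") or "",
        username=_text(elem, "contributor/username"),
        user_id=int(user_id) if user_id is not None else None,
        comment=_text(elem, "comment"),
        type=_text(elem, "type") or "",
        action=_text(elem, "action") or "",
        logtitle=_text(elem, "logtitle") or "",
        params=_text(elem, "params") or "",
    )


item = parse_logitem(ET.fromstring(SAMPLE))
print(item.type, item.action, item.logtitle)
```

The two <id> elements are kept apart by their position: the event identifier is a direct child of <logitem>, while the user identifier sits inside <contributor>.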
The following code snippet shows a complete excerpt of the content in one of these files:
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/
http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="sco">
<siteinfo>
<sitename>Wikipedia</sitename>
<dbname>scowiki</dbname>
<base>http://sco.wikipedia.org/wiki/Main_Page</base>
<generator>MediaWiki 1.25wmf24</generator>
<case>first-letter</case>
<namespaces>
<namespace key="-2" case="first-letter">Media</namespace>
<namespace key="-1" case="first-letter">Special</namespace>
<namespace key="0" case="first-letter" />
<namespace key="1" case="first-letter">Talk</namespace>
<namespace key="2" case="first-letter">User</namespace>
<namespace key="3" case="first-letter">User talk</namespace>
<namespace key="4" case="first-letter">Wikipedia</namespace>
<namespace key="5" case="first-letter">Wikipedia talk</namespace>
<namespace key="6" case="first-letter">File</namespace>
<namespace key="7" case="first-letter">File talk</namespace>
<namespace key="8" case="first-letter">MediaWiki</namespace>
<namespace key="9" case="first-letter">MediaWiki talk</namespace>
<namespace key="10" case="first-letter">Template</namespace>
<namespace key="11" case="first-letter">Template talk</namespace>
<namespace key="12" case="first-letter">Help</namespace>
<namespace key="13" case="first-letter">Help talk</namespace>
<namespace key="14" case="first-letter">Category</namespace>
<namespace key="15" case="first-letter">Category talk</namespace>
<namespace key="100" case="first-letter">Portal</namespace>
<namespace key="101" case="first-letter">Portal talk</namespace>
<namespace key="828" case="first-letter">Module</namespace>
<namespace key="829" case="first-letter">Module talk</namespace>
</namespaces>
</siteinfo>
<logitem>
<id>1</id>
<timestamp>2005-06-22T12:18:33Z</timestamp>
<contributor>
<username>Jdforrester</username>
<id>6</id>
</contributor>
<comment>From [[en:Image:Flag of Scotland Pantone300.png]].</comment>
<type>upload</type>
<action>upload</action>
<logtitle>File:Flag of Scotland Pantone300.png</logtitle>
<params xml:space="preserve" />
</logitem>
<logitem>
<id>2</id>
<timestamp>2005-06-22T20:57:46Z</timestamp>
<contributor>
<username>Jock</username>
<id>11</id>
</contributor>
<type>upload</type>
<action>upload</action>
<logtitle>File:ScotsPairlament.png</logtitle>
<params xml:space="preserve" />
</logitem>
<logitem>
<id>3</id>
<timestamp>2005-06-22T21:11:19Z</timestamp>
<contributor>
<username>Jock</username>
<id>11</id>
</contributor>
<type>upload</type>
<action>upload</action>
<logtitle>File:ScotsPairlament.png</logtitle>
<params xml:space="preserve" />
</logitem>
<logitem>
<id>4</id>
<timestamp>2005-06-26T13:01:52Z</timestamp>
<contributor>
<username>Colin Angus Mackay</username>
<id>20</id>
</contributor>
<comment>labelled map of scotland</comment>
<type>upload</type>
<action>upload</action>
<logtitle>File:ScotlandLabelled.png</logtitle>
<params xml:space="preserve" />
</logitem>
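Real dump files can be large, so it is usually preferable to stream them rather than load them whole. The sketch below counts (type, action) pairs with `xml.etree.ElementTree.iterparse` reading from a gzip stream. The names `count_events` and `local_name` are assumptions for illustration, and the tiny in-memory sample stands in for a real file, which could be opened with `gzip.open(path, "rb")` instead.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def count_events(stream) -> Counter:
    """Count (type, action) pairs in a pages-logging XML stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        # Direct children only, so contributor/id does not shadow anything.
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        counts[(fields.get("type", ""), fields.get("action", ""))] += 1
        elem.clear()  # drop the children of processed items to limit memory use
    return counts


# Small in-memory stand-in for a compressed dump file (illustrative data).
SAMPLE_XML = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="sco">
  <logitem><id>1</id><type>upload</type><action>upload</action></logitem>
  <logitem><id>2</id><type>upload</type><action>upload</action></logitem>
</mediawiki>"""

with gzip.open(io.BytesIO(gzip.compress(SAMPLE_XML)), "rb") as stream:
    print(count_events(stream))
```

Matching on the local tag name keeps the code independent of the export schema version declared in the root element, at the cost of ignoring the namespace entirely.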