These history dump files contain metadata describing all edits performed by any user on any page of a Wikipedia language edition. They cover the complete edit history of pages in all namespaces (Talk, User, User talk, etc.), not only encyclopedic articles.

These dump files do not contain the text of every revision. Instead, only descriptive metadata fields are provided. Hence, these files are considerably smaller than the full history files (pages-meta-history).

Standard filename format

These files are named using the following format:

File: {lang}-{date}-stub-meta-history[num].xml.{compress}

Example: eswiki-20150429-stub-meta-history1.xml.gz

Mandatory elements are shown in {}, while optional elements are shown in []. The meaning of each field is:

  • lang: Identifier of the Wikipedia language. The current convention is to prepend the corresponding ISO 639 code for the language to the term wiki, identifying Wikipedia dumps. Dumps from other Wikimedia projects use their own identifier ("wikiquote", "wikibooks", etc.).
  • date: The date on which the dump file was produced. For large dump files, this date does not correspond to the date of the last revision included in the file (compression and integrity checks may take some time).
  • stub-meta-history: Identifier of the type of dump file, in this case historical metadata of all edits on all pages in a Wikipedia language (text content not included).
  • num: In large Wikipedia languages with many pages and edits, producing a single dump file would be impractical. In these cases, the dump is split into several files, usually by page id in ascending order. To recover all metadata about pages and revisions in that language, every one of these files must be processed.
  • xml: Type of data file. Currently, edit dumps are only provided in XML format.
  • compress: Algorithm used to compress the file. It is customary to use gzip, as these files are smaller than the full history version and the compression algorithm is fast.
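
The naming convention above can be checked programmatically. The following is a minimal sketch in Python, not part of any official tool: the pattern STUB_NAME and the function parse_stub_filename are hypothetical names, and the pattern assumes that the date has the form YYYYMMDD and that language identifiers contain only lowercase letters and underscores.

```python
import re

# Pattern for names such as "eswiki-20150429-stub-meta-history1.xml.gz".
# Assumptions: the date is YYYYMMDD and the chunk number [num] is optional.
STUB_NAME = re.compile(
    r"^(?P<lang>[a-z_]+)-(?P<date>\d{8})-stub-meta-history"
    r"(?P<num>\d*)\.xml\.(?P<compress>\w+)$"
)

def parse_stub_filename(name):
    """Return the filename fields as a dict, or None if the name does not match."""
    match = STUB_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # An absent chunk number means the dump was not split into several files.
    fields["num"] = int(fields["num"]) if fields["num"] else None
    return fields

print(parse_stub_filename("eswiki-20150429-stub-meta-history1.xml.gz"))
# {'lang': 'eswiki', 'date': '20150429', 'num': 1, 'compress': 'gz'}
```

Sorting the parsed names by num gives one way to process the chunks of a split dump in order.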

File content

Heading elements

The heading elements are identical to those in the full history version.

  • <mediawiki>: This is the root element. It provides information about the XML namespace and the URL of the schema describing the XML elements (version 0.10 in the examples below).
  • <siteinfo>: Includes general information about this Wikipedia site, with the following subelements:
    • <sitename>: Name of this Wikipedia site (in the corresponding language).
    • <dbname>: Database name of this Wikipedia site (for example, enwiki).
    • <base>: Base URL for all pages in this Wikipedia site.
    • <generator>: Version of the MediaWiki engine that produced this dump.
    • <case>: Convention for the case of page titles (usually first-letter: the first character of a title is always capitalized and the rest is case-sensitive).
    • <namespaces>: A list of the name (localized for that language) and the internal numerical code for each namespace in this MediaWiki site. Codes are important, since they identify the namespace for each <page> element.

Below you can find XML code snippets with examples for each of these elements.


<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/
  http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">

<siteinfo>
  <sitename>Wikipedia</sitename>
  <dbname>enwiki</dbname>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.25wmf10</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter" />
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    [... rest of namespace definitions ...]
  </namespaces>
</siteinfo>

Namespace         Code
Media             -2
Special           -1
Main (none)       0
Talk              1
User              2
User talk         3
Wikipedia         4
Wikipedia talk    5
File              6
File talk         7
MediaWiki         8
MediaWiki talk    9
Template          10
Template talk     11
Help              12
Help talk         13
Category          14
Category talk     15

Warning: The above list of namespaces is non-exhaustive. It only shows some of the most common namespaces and their standard codes in Wikipedia. Please refer to the database table namespaces created by WikiDAT for the actual list of namespaces and codes in the Wikipedia dump that you are analyzing.
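
For a quick check without a database, the same mapping can also be read from the <namespaces> block of the dump itself. The sketch below uses Python's standard xml.etree.ElementTree module; the function name read_namespaces and the shortened SAMPLE document are illustrative assumptions, and the XML namespace string is the one declared on <mediawiki> in the snippet above (it differs for other schema versions).

```python
import xml.etree.ElementTree as ET

# XML namespace of the export schema, as declared on <mediawiki> above.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

# Shortened stand-in for the header of a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""

def read_namespaces(root):
    """Map each numerical namespace code to its localized name ('' for Main)."""
    codes = {}
    for elem in root.iter(NS + "namespace"):
        # The main namespace is an empty element, so its text is None.
        codes[int(elem.get("key"))] = elem.text or ""
    return codes

print(read_namespaces(ET.fromstring(SAMPLE)))
# {-2: 'Media', 0: '', 1: 'Talk'}
```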

Main elements

The main body of these XML dump files is a list of <page> elements. Each <page> block shows descriptive information about a wiki page, along with the complete collection of all <revision> elements (edit actions) performed on that page over time.

The main difference with respect to the full history version files is that <text> elements inside <revision> do not contain any text. However, additional attributes provide metadata about the text (in particular, its length in bytes).

  • <page>: Element containing information about a wiki page and its complete collection of edits (as a sublist of <revision> elements).
    • <title>: String, the title of this page.
    • <ns>: Integer, the code of the namespace in which this page is stored.
    • <id>: Positive integer, unique numerical identifier for this page.
    • <revision>: Element encapsulating information about a single edit on this page. For each page element, there will be a list of subelements of this type describing the complete record of all edits performed on this page.
  • <revision>: An edit performed on the corresponding page, indicated by its parent <page> element.
    • <id>: Positive integer, unique numerical identifier for this revision. This identifier is unique across the entire database, not just within this page.
    • <parentid>: Positive integer, identifier of the previous revision (the parent of this version), following the same coding as for <id>. If absent, this is the first revision for this page and the parent revision is assumed to be NULL.
    • <timestamp>: String, timestamp (date and time) indicating when this revision was performed. Timestamps are expressed in UTC, as indicated by the trailing Z. See the example below for details about the specific format.
    • <contributor>: Element showing information about the user who performed this revision. There are two options:
      • Anonymous editor: Only the IP address is provided, as a subelement.
      • Registered editor: Both the unique numerical identifier of the user (for the whole database) and the login name of the user are provided as subelements. See the example code below.
    • <minor />: If present, this single tag indicates that this is a minor revision. If absent, this is not a minor revision.
    • <comment>: String, contains the comment introduced by the user to summarize the changes introduced in this revision.
    • <model>: String, identifies the model to interpret the content inside <text>.
    • <format>: String, indicates the format to parse the wiki text inside <text>.
    • <text>: Empty element, with the following attributes:
      • id: Identifier of the stored text; it matches the id of the revision in the examples below.
      • bytes: Length of the text of the page on this revision, in bytes.
    • <sha1>: String, provides the SHA-1 hash computed on the text content of this revision, encoded in base 36.


<page>
  <title>Inglis leid</title>
  <ns>0</ns>
  <id>2</id>
  <revision>
  [... metadata of first revision ...]
  </revision>
  <revision>
  [... metadata of second revision ...]
  </revision>
  [... rest of revisions for this page ...]
</page>

<revision>
  <id>7</id>
  <timestamp>2005-06-22T10:17:05Z</timestamp>
  <contributor>
    <ip>24.251.198.251</ip>
  </contributor>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text id="7" bytes="334" />
  <sha1>6m5yxiaalrm6te7e3x3fiw1aq7wk9ir</sha1>
</revision>
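
The fields of a <revision> element such as the one above can be collected into a plain dictionary. This is a minimal sketch using Python's standard xml.etree.ElementTree module; revision_metadata and REVISION_XML are hypothetical names. The fragment is parsed on its own, so its tags carry no XML namespace; inside a complete dump file, tag names are qualified with the export namespace declared on <mediawiki>.

```python
import xml.etree.ElementTree as ET

# Stand-alone fragment (no XML namespace), based on the example file content.
REVISION_XML = """<revision>
  <id>8</id>
  <parentid>7</parentid>
  <timestamp>2005-06-22T12:13:55Z</timestamp>
  <minor/>
  <text id="8" bytes="351" />
  <sha1>p09d2l9c1gc8tat2e3x2e72o9fxori2</sha1>
</revision>"""

def revision_metadata(rev):
    """Collect the main metadata fields of a <revision> element into a dict."""
    parent = rev.find("parentid")
    text = rev.find("text")
    return {
        "id": int(rev.findtext("id")),
        # A missing <parentid> means this is the first revision of the page.
        "parentid": int(parent.text) if parent is not None else None,
        "timestamp": rev.findtext("timestamp"),
        # <minor/> is an empty flag element: its presence marks a minor edit.
        "minor": rev.find("minor") is not None,
        # Assumption: the bytes attribute is present, as in the examples here.
        "bytes": int(text.get("bytes")),
        "sha1": rev.findtext("sha1"),
    }

print(revision_metadata(ET.fromstring(REVISION_XML)))
```

Subtracting the bytes value of the parent revision from that of the current one gives the size change introduced by an edit, which is often the reason for working with these files.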

Timestamp format: YYYY-MM-DDTHH:MM:SSZ
The 'T' character separates the date from the time; the trailing 'Z' indicates that the time is expressed in UTC.
Example: <timestamp>2005-06-22T10:17:05Z</timestamp>
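
A timestamp in this format can be converted into a date and time object with Python's standard datetime module. The following is a small sketch; parse_timestamp is a hypothetical helper name, and the trailing Z is read as the ISO 8601 designator for UTC.

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Parse a dump timestamp such as '2005-06-22T10:17:05Z' into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The literal 'Z' stands for UTC, so attach that time zone explicitly.
    return parsed.replace(tzinfo=timezone.utc)

ts = parse_timestamp("2005-06-22T10:17:05Z")
print(ts.isoformat())  # 2005-06-22T10:17:05+00:00
```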

<contributor>
  <ip>24.251.243.233</ip>
</contributor>

<contributor>
  <username>MyLogin Name</username>
  <id>3344555</id>
</contributor>
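
The two kinds of <contributor> element shown above can be told apart by checking for the <ip> subelement. The sketch below is illustrative only: describe_contributor is a hypothetical name, and the fragments are parsed without an XML namespace, unlike elements read from a complete dump file.

```python
import xml.etree.ElementTree as ET

def describe_contributor(contrib):
    """Return (kind, user_id, name) for a <contributor> element."""
    ip = contrib.findtext("ip")
    if ip is not None:
        # Anonymous editors are identified only by their IP address.
        return ("anonymous", None, ip)
    user_id = contrib.findtext("id")
    return ("registered", int(user_id) if user_id else None,
            contrib.findtext("username"))

anon = ET.fromstring("<contributor><ip>24.251.243.233</ip></contributor>")
user = ET.fromstring(
    "<contributor><username>MyLogin Name</username><id>3344555</id></contributor>"
)
print(describe_contributor(anon))  # ('anonymous', None, '24.251.243.233')
print(describe_contributor(user))  # ('registered', 3344555, 'MyLogin Name')
```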

Example content

The following code snippet shows a complete excerpt of the content in one of these files:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="sco">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <dbname>scowiki</dbname>
    <base>http://sco.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.25wmf24</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">Wikipedia</namespace>
      <namespace key="5" case="first-letter">Wikipedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
      <namespace key="100" case="first-letter">Portal</namespace>
      <namespace key="101" case="first-letter">Portal talk</namespace>
      <namespace key="828" case="first-letter">Module</namespace>
      <namespace key="829" case="first-letter">Module talk</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>Inglis leid</title>
    <ns>0</ns>
    <id>2</id>
    <revision>
      <id>7</id>
      <timestamp>2005-06-22T10:17:05Z</timestamp>
      <contributor>
        <ip>24.251.198.251</ip>
      </contributor>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text id="7" bytes="334" />
      <sha1>6m5yxiaalrm6te7e3x3fiw1aq7wk9ir</sha1>
    </revision>
    <revision>
      <id>8</id>
      <parentid>7</parentid>
      <timestamp>2005-06-22T12:13:55Z</timestamp>
      <contributor>
        <username>Saforrest</username>
        <id>5</id>
      </contributor>
      <minor/>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text id="8" bytes="351" />
      <sha1>p09d2l9c1gc8tat2e3x2e72o9fxori2</sha1>
    </revision>
  </page>
</mediawiki>
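
Since real dump files can be very large, they are usually read as a stream rather than loaded whole into memory. The following is a minimal sketch of that approach with Python's standard gzip and xml.etree.ElementTree modules. The names open_dump, local_name and count_revisions are hypothetical, the SAMPLE document is a shortened stand-in for the excerpt above, and the code is not the method used by WikiDAT.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def open_dump(path):
    """Open a gzip-compressed dump file as a binary stream."""
    return gzip.open(path, "rb")

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def count_revisions(stream):
    """Yield (page title, number of revisions) for each <page> in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        revisions = 0
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                revisions += 1
        yield title, revisions
        elem.clear()  # drop the children of the finished page to save memory

# Small in-memory stand-in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Inglis leid</title>
    <ns>0</ns>
    <id>2</id>
    <revision><id>7</id></revision>
    <revision><id>8</id><parentid>7</parentid></revision>
  </page>
</mediawiki>"""

print(list(count_revisions(io.BytesIO(SAMPLE))))  # [('Inglis leid', 2)]

# With a real file, the stream would come from open_dump instead:
# with open_dump("eswiki-20150429-stub-meta-history1.xml.gz") as stream:
#     for title, revisions in count_revisions(stream):
#         print(title, revisions)
```

Matching on the local tag name keeps the sketch independent of the schema version declared in the file, at the cost of ignoring the XML namespace altogether.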