
Help talk:Export

From Meta, a Wikimedia project coordination wiki

For demo


first line
second line

new paragraph

  • a
  • b
  • c

end of list

some more

After creating this page and making one more edit, the XML source was copied into the wikitext of this talk page:

<?xml version="1.0" encoding="UTF-8" ?> <mediawiki version="0.1" xml:lang="en">

 <page>
   <title>Help talk:Export</title>
   <revision>
     <timestamp>2005-05-06T11:59:48Z</timestamp>
     <contributor><username>Patrick</username></contributor>
     <comment>for demo</comment>
     <text>For demo:

first line<br>second line

new paragraph

  • a
  • b
  • c

end of list</text>

   </revision>
   <revision>
     <timestamp>2005-05-06T12:00:09Z</timestamp>
     <contributor><username>Patrick</username></contributor>
     <text>For demo:

first line<br>second line

new paragraph

  • a
  • b
  • c

end of list

some more</text>

   </revision>
 </page>

</mediawiki>

and also the text rendered by the browser:

 <?xml version="1.0" encoding="UTF-8" ?> 

- <mediawiki version="0.1" xml:lang="en"> - <page>

 <title>Help talk:Export</title> 

- <revision>

 <timestamp>2005-05-06T11:59:48Z</timestamp> 

- <contributor>

 <username>Patrick</username> 
 </contributor>
 <comment>for demo</comment> 
 <text>For demo: first line
second line new paragraph *a *b *c end of list</text> </revision>

- <revision>

 <timestamp>2005-05-06T12:00:09Z</timestamp> 

- <contributor>

 <username>Patrick</username> 
 </contributor>
 <text>For demo: first line
second line new paragraph *a *b *c end of list some more</text> </revision> </page> </mediawiki>

After copying the first result to the edit box, we get:

For demo: first line
second line

new paragraph

a b c end of list

some more

Caveats


Caveats: should namespaces be the text, or symbolic names? Or should we leave them out entirely and let the parser deal with such a thing?

The parser needs to know the namespace prefixes of the article's language anyway in order to parse the article content, so it does not matter. BTW, cur_counter is missing. -- Nichtich 18:09, 1 Dec 2003 (UTC)

I'm sorry, but I still can't work out how to, e.g., export all the pages. What should I write in the query window at Special:Export?

I am having the same problem. How does it work? --Donrob 08:40, 3 Jan 2005 (UTC)

Seems like instructions for import are needed too.

Yes! It is definitely needed --65.94.224.235 17:14, 20 Jan 2005 (UTC)

To start with an explanation on how to export all pages would be nice. --146.50.205.252 00:54, 1 Feb 2005 (UTC)

First, write a script that exports all pages. --brion 01:37, 2 Feb 2005 (UTC)

To my knowledge, exporting all wiki pages involves several manual steps:

  • get a list of all pages with Special:Allpages
  • save that list into a text file
  • replace tab characters with newlines (e.g. with sed; see the sketch after this list), giving a list of all wiki pages, each page on a separate line
  • paste that list into Special:Export
  • ensure you save the XML, not the HTML representation
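
For the tab-to-newline step, a minimal sketch in PHP (the file names are placeholders; sed or any text editor does the same job):

<?php
// Turn the tab-separated Special:Allpages listing into one page title per line.
$list = file_get_contents('allpages.txt');
file_put_contents('allpages-lines.txt', str_replace("\t", "\n", $list));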

Support for import into another wiki seems to be on the list of tasks for MediaWiki 1.5.

Some thoughts


Sorry, these were two stupid thoughts. What about this: How can I access the "oldid" of a revision? It would be nice to have it for linking revisions to the Wikipedia like it is done in "history". --Vlado 11:53, 9 May 2005 (UTC)

I think the oldid on the other system is not necessarily the same; it should be assigned by the importing system.--Patrick 12:47, 9 May 2005 (UTC)
Or perhaps you mean importing the current versions, plus the history pages, but not the old version pages themselves. That I don't know.--Patrick 13:00, 9 May 2005 (UTC)
There was code for including the page and revision ID numbers in the export file, but it wasn't normally enabled before. It will be on by default in 1.5, see e.g. [1]. (Also every revision, including the current one, will have an id number. This wasn't previously the case.) Of course if you were importing pages from one wiki to a different one, the final id number would be whatever gets assigned there rather than the original wiki's id number. --brion 21:03, 9 May 2005 (UTC)

If an error happens during export, it quietly inserts an error message in your XML, which you might not necessarily notice. Not sure if there's an easy way to improve that.

On my MediaWiki installation such an error occurs if you have any blank lines in the text box, but I notice that's been improved on this wiki, for example. -- Harry Wood 12:07, 30 November 2005 (UTC)

Python script to download all namespaces


You can also try Victor Stinner's script to download all pages:

wget -O import_pages.py 'http://svn.berlios.de/viewvc/*checkout*/happyboom/haypo/mediawiki/import_pages.py?revision=260&pathrev=468'
python import_pages.py http://www.example.com/wiki/

(where svn is Subversion)

It will create a subdirectory containing files like "namespace_0.xml". TODO:

  • Support page list (Special:Allpages) on several subpages.
  • Support split into small files (for fast re-import).
  • Support "only last version" import.

This script doesn't seem to work any more. Are there any later revisions? It seems like that source tree is gone. 85.229.120.9 15:57, 1 May 2011 (UTC)

Copying the whole database using mysqldump


There's another page called 'How to move a MediaWiki Database' which describes using the mysqldump tool to move the whole lot. This should be mentioned or linked on this page somehow. I would do it myself, but having not tried it, there are some things I'm not clear on. Is the mysqldump approach better, easier, and more complete than doing an XML export and then an XML import? Obviously it's not always possible (e.g. if you don't have system admin access). Using the XML export/import approach, can you do other tricks such as merging two wikis? -- Harry Wood 16:44, 29 November 2005 (UTC)


MediaWiki XML to PDF/LaTeX etc?


Are there any tools which convert MediaWiki's export XML to, e.g., a PDF? See WikiPDF, the WikiPDF wiki, and also pdf_export

There is one for LaTeX and thus PDF: MediaWiki to LaTeX

Images


How can I dump them? I don't have access to mysqldump!

32.59.2.18 18:28, 28 December 2005 (UTC)

Need them JWatley7089k (talk) 15:01, 5 February 2018 (UTC)

Export User Accounts


Hi, I exported a wiki to my local wiki with dumpBackup.php and importDump.php. I have all pages which were created by a user on my new local wiki, but I am missing all the user accounts and all the user pages. How can I export/import these?

Special:Export on En.Wikipedia.Org Not Working


Hi. I just tried to export the full history of "Allegations of Israeli apartheid" but I keep getting a blank result. It seems I can export other pages. Is there a restriction on exporting pages with long edit histories? If you respond, can you respond on my Wikipedia user page, User:Bhouston? --64.230.121.208 22:00, 22 July 2006 (UTC)

I also cannot get this to work. I really want to get the template "Otheruses4" off Wikipedia and have tried all kinds of different ways (on the Wikipedia site). I tried the latest versions of Firefox and Internet Explorer. I have tried random articles on this site as well and it does nothing. The page essentially does nothing; it acts as if the page just reloaded, with no change whatsoever. A bug in the latest MediaWiki? 221.143.3.14

Error message


I am attempting to use Special:export on my wiki, and I get this error message:

The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.
Invalid at the top level of the document. Error processing resource

In Firefox:

XML Parsing Error: not well-formed
Location: http://www.***.com/wiki/index.php?title=Special:Export&action=submit
Line Number 2, Column 1:<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
^

I am running MediaWiki 1.5.7. siteground.com is hosting my MediaWiki, and I can't upgrade to 1.7.

Any suggestions? Odessaukrain 06:49, 2 August 2006 (UTC)

Response from Byon on IRC in #mediawiki:
odess: check for extra whitespace at the beginning or end of LocalSettings.php, any extension files, or any other modified .php files
I figured it out!
There was a large gap between these lines of code:
require_once( "extensions/****.php" ); # ****



# If PHP's memory limit is very low, some operations may fail.
# ini_set( 'memory_limit', '20M' );
When I changed it to:
require_once( "extensions/****.php" ); # ****
# If PHP's memory limit is very low, some operations may fail.
# ini_set( 'memory_limit', '20M' );
Problem solved! Odessaukrain 03:06, 5 August 2006 (UTC)

Export to XHTML


Is there no way to export to plain XHTML? Just to save or copy-paste the article to a simple HTML file, without the various stylesheets (and hidden edit links). --80.104.165.175 10:05, 23 October 2006 (UTC)

Export full histories


I'm trying to transwiki a page from wikipedia: to StrategyWiki:. However, the page has more than 100 revisions, meaning that Export doesn't give me everything. This article says that the pywikipediabot can be used to export files from Wikipedia, but I haven't been able to find any mention of it anywhere. Any pointers would be greatly appreciated (or another way to export the complete history rather than going 100 edits at a time). -- Prod 04:35, 6 January 2007 (UTC)

special characters


How do I export articles that are named like "Müllabfuhr"? Export does not find the article.

Namespaces in Export Format


Looking at the export format, Help:Export#Export_format, I can't see any specification of the namespace for an article. From what I can tell, it is simply taken from the page title, such as Help:Contents. Is that correct? The reason I'm asking is that we are generating text to be imported into a wiki using 3rd-party software, and we want to ensure that it is imported into a custom namespace. So, am I correct in assuming that there doesn't have to be any ID number or anything associated with the namespace in the XML file that is imported? --Dr DBW 01:32, 6 June 2007 (UTC)

To answer my own question, yes you just specify that at the start of the page title and it is all taken care of. Of course, the namespace has to be defined and all, but you don't need the ID number etc. --Dr DBW 23:19, 18 June 2007 (UTC)

Which Tags Required in Export Format


It would be handy to know which fields in the export format are required for it to work, i.e. what I am interested in is the bare minimum required to get a valid XML file that will get imported into the wiki. --Dr DBW 01:46, 6 June 2007 (UTC)

I have reduced it to the following tags, and it appears to work fine: --Dr DBW 02:24, 6 June 2007 (UTC)
    • <mediawiki></mediawiki>
    • <page></page>
    • <title></title>
    • <revision></revision>
    • <timestamp></timestamp>
    • <text></text>
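
For reference, a minimal skeleton using only those tags, nested as in the export sample near the top of this page (the title, timestamp and text are placeholders, and the version attribute depends on your export schema):

<mediawiki version="0.1" xml:lang="en">
  <page>
    <title>Some page</title>
    <revision>
      <timestamp>2007-06-06T00:00:00Z</timestamp>
      <text>Page text goes here.</text>
    </revision>
  </page>
</mediawiki>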

Easiest/fastest way to export main namespace articles


Hi,

I would like to export only the pages from the main namespace of a Wikipedia project, to use for a school assignment (information retrieval and machine learning).

What is the easiest/fastest way to do this?

I already downloaded an XML page dump, but this contains all namespaces.

--GrandiJoos 08:04, 5 October 2007 (UTC)

How to export a full version


I've listed the instructions for how to export a full version of an article using Special:Export.

  1. Use http://en.wikipedia.org/w/index.php?title=Special:Export&pages=ARTICLE_NAME&history=1&action=submit
  2. Save the file as something.xml
  3. Use the find/replace feature of a text editor to find all "</username>" and replace them with "@en.wikipedia.org</username>" (see the sketch after these steps)
  4. Save. You should now be ready to import the file via Special:Import on another MediaWiki wiki.
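
For step 3, a minimal PHP sketch (the file name is the one from step 2; presumably the suffix keeps imported revisions from being attributed to unrelated local accounts of the same name):

<?php
// Append "@en.wikipedia.org" to every contributor username in the export file.
$xml = file_get_contents('something.xml');
file_put_contents('something.xml', str_replace('</username>', '@en.wikipedia.org</username>', $xml));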


From what I've seen this is pretty much not documented anywhere. I'll take a crack at updating the main page in a bit. There are a few things that need to be updated here. -- Ned Scott 03:06, 2 July 2008 (UTC)

Today I see "Exporting the full history of pages through this form has been disabled due to performance reasons" when exporting. I wonder why.--Jusjih 19:07, 28 September 2008 (UTC)
The problem I mentioned seems to be gone now. I just wonder why it happened in the first place.--Jusjih 01:34, 1 October 2008 (UTC)


I don't think this *always* exports a full version. It clips exports at 1000 history entries. The main page here says 100, but that may be old or a typo? I'm leery of updating it, but it contradicts the Special:Export page information - "Full history exports are limited to 1000 revisions." (at http://en.wikipedia.org/wiki/Special:Export). Does anyone know a way around that limit without resorting to curl (which I can't get to work, likely a U53R error - or however that joke goes)? --Nobboddoddy 11:05, 27 May 2010 (UTC)

How can we export the results of a linksearch page?


Any clues? --Anshuk 09:27, 13 October 2008 (UTC)

dumpBackup.php


Could someone please update the link? --194.114.62.70 13:09, 5 November 2009 (UTC)


Is there a way to get dumpBackup.php to *NOT* dump IDs?

Export Category (Content/History) via XML Export


The export function allows exporting content and history from pages, and even allows exporting templates. So we are wondering why the export function does not have an option to export category content and history through XML (as it only exports the link objects).

As the category is a taxonomy/ontology object, it could (and in our case does) have descriptive text about how and why to use the category, supporting a common understanding of this entity within the wiki.

Even with version 1.16beta1 it is still not possible to export content from categories. Does anyone have a solution, besides making an SQL dump? -- 25 March 2010 / 213.163.84.215

I'm searching for this feature too :(

Using History Flow with Special Export page


Hi, I am using History Flow (IBM) to see the history of editing of an article. It is designed to get data from [2]. However, the Special:Export page makes "only current revision" the automatic choice.

How can I get the entire history of an article in this situation? Can I change the settings of Special:Export?

Thanks for the help! Zeyi 15:44, 24 May 2010 (UTC)

Exporting page logs?


Is there a way to export logs? I'd like to export one of the specific logs (rather than the entire page log datadump), but am not sure if there's currently a way to do this. Thanks. --Dfinzer —Preceding undated comment added 06:08, 29 June 2010 (UTC).

I second this. The Special:Log page specifies some events using obscure units—such as “months”—and there is no way to get the precise meaning other than to examine the source code. Incnis Mrsi (talk) 12:59, 8 April 2019 (UTC)
mw:API:Logevents resolved my problem. I am unsure what exactly Dfinzer wanted to achieve. Incnis Mrsi (talk) 13:12, 8 April 2019 (UTC)
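
A hedged sketch of pulling a single log type through that API (the letype value "delete" and the limit are just examples; Wikimedia servers may also expect a descriptive User-Agent):

<?php
// Query the MediaWiki API for recent entries of one specific log.
ini_set('user_agent', 'LogExportSketch (contact: example@example.com)');
$url = 'https://meta.wikimedia.org/w/api.php?action=query&list=logevents'
     . '&letype=delete&lelimit=50&format=json';
$data = json_decode(file_get_contents($url), true);
foreach ($data['query']['logevents'] as $event) {
    echo $event['timestamp'] . ' ' . $event['type'] . '/' . $event['action'] . "\n";
}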

extension for Special:Export because current page is so terrible?


I use Special:Export a lot and I am appalled at how terrible it is to get page names; these steps are complex:


  1. You can achieve that relatively quickly if you paste the names into, say, MS Word - use Paste Special as unformatted text - then open the replace function (CTRL+H), enter ^t in "Find what", enter ^p in "Replace with" and then hit the Replace All button. (This doesn't seem to work - there are no tabs between the page names.)
  2. The text editor Vim also allows for a quick way to fix line breaks: after pasting the whole list, run the command :1,$s/\t/\r/g to replace all tabs by carriage returns and then :1,$s/^\n//g to remove every line containing only a newline character.
  3. Another approach is to copy the formatted text into any editor exposing the HTML. Remove all <tr> and </tr> tags and replace all <td> tags with <tr><td> and all </td> tags with </td></tr>. The HTML will then be parsed into the needed format.

Does anyone know an extension which gives all results in ONE COLUMN, on ONE PAGE and with the namespace name? For example Template:Dirt, instead of Dirt. Adamtheclown 23:24, 16 December 2010 (UTC)

Move to MediaWiki.org?


Any reason not to move this to MediaWiki.org? --Varnent 22:31, 4 January 2012 (UTC)

dumpBackup.php & MW 1.7.1


According to the page for this, and the download page, dumpBackup.php is supposed to be compatible with MW ver 1.5 and later. Using MW ver 1.7.1, I got endless errors; using php5 dumpBackup.php fixed a lot of that. Altering the backup.inc & commandline.inc directories within dumpBackup.php so it could be run from the root MW directory was necessary. Trying to run it from either maintenance or includes produces directory errors in LocalSettings.php. If I fix those in LocalSettings, it breaks the wiki.

Eventually it boiled down to /includes/Export.php not having the proper line for "Class Constant 'TEXT'". The specific message was "Fatal error: Undefined class constant 'TEXT' in /xxx/xxx/xxx/dumpBackup.php on line 72". This was the hardest part to solve, and the real crux of the issue. The version of Export.php included with 1.7.1, and probably prior versions, does not include the right code to run dumpBackup.php. Replacing it with Export.php from the most recent version of MW allowed dumpBackup to execute, but I am not certain the output is exactly what it is supposed to be, since I am mixing versions of files at this point. There is an error "Cannot modify header information" or something at the top of the dump. Skimming through the output, it seems to have produced the contents of the wiki, however. Toastysoul (talk) 21:18, 30 May 2012 (UTC)

Alternative script, in PHP, for getting all page titles


If you want to, say, export all of Wikipedia's templates, and need the page titles, this PHP code could do the trick. It polls the API (mw:API:Allpages) to get that data, and then puts it in output.txt.

<?php
ini_set('user_agent', 'User-Agent: MyCoolTool (http://example.com/MyCoolToolPage/)');
$fp = fopen("output.txt","w");
$thisTitle = ''; // initialise so the first loop comparison has defined values
$apfrom = '';
do {
    $lastTitle = $thisTitle;
    $url = 'http://en.wikipedia.org/w/api.php?action=query&list=allpages&aplimit=500&apnamespace=10&format=xml' . $apfrom;
    $str = file_get_contents($url);
    $curPos = 1;
    $count = 0;
    while ( $curPos != false ) {
        $curPos = strpos ( $str , 'title="' , $curPos );
        if ( $curPos != false ) {
            $nextQuote = strpos ( $str, '"', $curPos + 8 );
            $thisTitle = substr ( $str, $curPos + 7, $nextQuote - $curPos - 7 );
            $count++;
            if ( $thisTitle != $lastTitle ) {
                echo $count.' '.$thisTitle . "\n";
                fwrite($fp,html_entity_decode ( $thisTitle , ENT_QUOTES) ."\n");
            }
            $curPos++;
        }
    }
    $truncTitle = rawurlencode ( substr ( $thisTitle, 9 ) ); // strip the leading "Template:" (9 chars) before using the title as apfrom
    $apfrom = "&apfrom=$truncTitle";
} while ( $lastTitle != $thisTitle );
fclose($fp);

If anyone knows of a more elegant way to parse the XML, feel free to provide some alternative code. This worked pretty well, but the parsing methodology is obviously not ideal. Leucosticte (talk) 07:40, 23 June 2012 (UTC)
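
One possible alternative, sketched with SimpleXML instead of strpos() (untested; it assumes the pre-1.21 "query-continue" continuation format that the script above also relies on):

<?php
// Fetch all Template-namespace titles via list=allpages and write them to output.txt.
ini_set('user_agent', 'MyCoolTool (http://example.com/MyCoolToolPage/)');
$fp = fopen('output.txt', 'w');
$apfrom = '';
do {
    $url = 'http://en.wikipedia.org/w/api.php?action=query&list=allpages&aplimit=500&apnamespace=10&format=xml' . $apfrom;
    $xml = simplexml_load_string(file_get_contents($url));
    foreach ($xml->query->allpages->p as $p) {
        fwrite($fp, (string) $p['title'] . "\n");
    }
    if (isset($xml->{'query-continue'})) {
        // Continue from where the previous batch ended.
        $apfrom = '&apfrom=' . rawurlencode((string) $xml->{'query-continue'}->allpages['apfrom']);
    } else {
        $apfrom = '';
    }
} while ($apfrom !== '');
fclose($fp);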

Export Pages of a category and subcategories into PDF with mwc2pdf.php


Hi there, I wrote a little script in PHP that allows exporting all pages of a category, including all pages of subcategories, into a single PDF file. The script is called "mwc2pdf.php".

mwc2pdf uses MediaWiki's "api.php" to collect all the data and create a "pagetree". mwc2pdf prints every item of that pagetree to a single PDF file using "wkhtmltopdf". It then combines all the single PDF files into one PDF file called "MWC2PDF.pdf" using "pdftk".

The code is available on github: https://github.com/produnis/myscripts/blob/master/PHP/mwc2pdf.php

Produnis (talk) 13:40, 18 March 2015 (UTC)