mw2html -- Export Mediawiki to static, traditional website
mw2html is a Python program which exports a Mediawiki site to a traditional-looking HTML site. No indices are generated, and Wikipedia sidebars and extra formatting are removed to give a simple, streamlined site (you can substitute in your own sidebars if you wish). The generated HTML source code is Monobook-specific and rather verbose at the moment, but the exported sites don't look bad. (Source code).

27 Comments:
Please add more documentation. I want to use your script, but I'm not good at Python or PHP. I'm an experienced programmer (Java/ASP), so I know the concepts. Please help.
Many Thanks!
First, install Python from [1]. Next, verify that Python is in your system shell's PATH by typing python *enter* from the shell. If successful, this should place you in a Python interactive interpreter. If this fails, modify the PATH environment variable to include the Python binary directory (on Windows this is C:\Python2.x\).
Next, save mw2html.py somewhere. Install your copy of Mediawiki [2]. Now at the command line, type (on one line, without the % character)
% python mw2html.py http://yourwikiurl.com/ outdir
Browse the output directory to see the generated HTML. You can use python mw2html.py with no arguments to get a list of available command line options.
This advice is pretty general; if you have a more specific problem let me know.
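For anyone who would rather script these steps, here is a minimal sketch that checks whether an interpreter is on the PATH and assembles the same command line shown above. The helper names `find_python` and `build_export_command` are made up for illustration and are not part of mw2html. The sketch itself is written for Python 3, while mw2html was written for Python 2.x, so the interpreter you pass in should be one that can actually run the script.

```python
import shutil
import subprocess


def find_python(name="python"):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


def build_export_command(wiki_url, outdir, script="mw2html.py", python="python"):
    """Assemble the argv list for: python mw2html.py <wiki_url> <outdir>."""
    return [python, script, wiki_url, outdir]


def run_export(wiki_url, outdir, script="mw2html.py", python="python"):
    """Run the export and return the process exit code."""
    if find_python(python) is None:
        raise RuntimeError("%r was not found on the PATH" % python)
    return subprocess.call(build_export_command(wiki_url, outdir, script, python))
```

Calling `run_export("http://yourwikiurl.com/", "outdir")` from the directory that holds mw2html.py should then behave like typing the command by hand.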
I just tried mw2html.py to back up a fairly new mediawiki. I ran 'python mw2html.py http://127/0/0/1/wiki /tmp/wiki'
File "mw2html.py", line 218
SyntaxError: EOL while scanning single-quoted string
I can't include the call (doc.replace) because blogger thinks I am trying to use server side scripting.
Do you have a spare clue as to what I might be doing wrong? If you need more info, let me know and I can send email.
Never mind. I found the problem: I had Privoxy running and it replaced some open commands. Ugh.
I also had to remove an assertion at line 265, but it now runs to completion.
ImportError: No module named textwrap
Any hints on how I might be able to get around that?
This worked perfectly for me, where many other methods had failed. Thank you very much.
Is there a way for this to access password protected wikis? I'd like to grab an output of the one I use for work clients that I have passworded so I can store it on my key for those days I don't have easy web access. Thanks.
It works very well.
I am modifying your code to remove the footer entirely and make it suitable for generating software documentation and such, from wiki.
However, I have a little problem. The script freezes for minutes right after the start, and again at some files while generating the output directory. Task Manager shows no activity. Please let me know if there is any workaround.
Hi,
I'm a newbie. Installed Python. It's in my path, but python *enter* appears to be invalid... and so are the other commands... Can you help?
Hi
I tried that html-exporter on a fresh mediawiki 2.6.8 installation but there is a weird problem:
When I start the script, the whole network connection goes down. In another shell I started a persistent ping to some host, and as soon as I start the exporter I get ping timeouts and can't surf the net anymore. If I cancel the exporter it works again. My network monitor shows 0% net activity though.
After a _long_ time the HTML was in the outdir, but badly formatted because it couldn't fetch some files, like:
Error opening: http://censoredurl/skins/monobook/main.css?7
Error opening: http://censoredurl/skins/common/IEFixes.js
Those are accessible via browser without any problems.
My System:
WinXP SP2, Python 2.4, htmldata 1.1.0, mediawiki 2.6.8, no proxy, wiki is not locally installed
Any ideas?
Hi Connelly, thanks for the really cool script! Unfortunately there seems to be a little error(?) in the function parse_css. One of my .css files contains a reference to a w3c.com page. Here's a patch to fix this, so that parse_css handles external URLs just like the parse_html function does.
Cheers!
--- mw2html.py.orig 2006-09-03 15:04:21.000000000 +0200
+++ mw2html.py 2006-09-03 14:55:53.000000000 +0200
@@ -480,10 +480,13 @@
L = htmldata.urlextract(doc, url, 'text/css')
for item in L:
- # Store url locally.
u = item.url
- new_urls += [u]
- item.url = url_to_relative(u, url, config)
+ if should_follow(url, u):
+ # Store url locally.
+ new_urls += [u]
+ item.url = url_to_relative(u, url, config)
+ else:
+ item.url = rewrite_external_url(item.url, config)
newdoc = htmldata.urljoin(doc, L)
newdoc = post_css_transform(newdoc, url, config)
When I run the command:
python mw2html.py http://localhost/mediawiki/ ~/wiki1
and open main_page.html, the wiki's sidebar is missing, so if I browse to a child article I cannot get back to the main page. Could you tell me how to export the wiki's sidebar into sidebar.html?
Thank you
Hello,
Thanks for your post
about using mw2html to export mediawiki to html :) it is such a great
help to me.
However, I have a problem exporting the left menu of the wiki into
each exported html page. Without it, it is hard to navigate from one
html page to the others.
(Left side menu includes:
* Main Page
* Community portal
* Current events
* Recent changes
* Random page
* Help
* Donations
...
)
I tried to use the command with option -l to export the left side menu bar,
but it failed. Could you tell me how to export the left side menu bar of
the wiki into each html page?
Thank you very much and looking forward to hearing from you.
With the left sidebar option you pass an HTML filename as an argument, and that file's contents are pasted into the "left sidebar" region. Thus you can make a custom left sidebar HTML file with whichever links you desire.
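A custom sidebar file like that can be written by hand, or generated. The following is a minimal Python 3 sketch that builds such an HTML fragment from a list of (title, link) pairs. The helper names `build_sidebar` and `write_sidebar` are hypothetical, and the exact markup the left sidebar region expects is an assumption (a plain unordered list is used here), so adjust it to taste.

```python
from html import escape


def build_sidebar(links):
    """Return an HTML fragment listing (title, href) pairs as links."""
    items = [
        '<li><a href="%s">%s</a></li>' % (escape(href, quote=True), escape(title))
        for title, href in links
    ]
    return "<ul>\n%s\n</ul>\n" % "\n".join(items)


def write_sidebar(path, links):
    """Write the sidebar fragment to `path` as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_sidebar(links))
```

For example, `write_sidebar("sidebar.html", [("Main Page", "main_page.html")])` produces a one-link sidebar; that file name is then what you pass to the left sidebar option. Run python mw2html.py with no arguments to confirm the option's exact spelling.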
Is there any way to restrict the link-following depth, along the lines of wget? Or better yet, to prevent any link following outside of a given URL?
For example, we have lots of stuff on
oursite.com/wiki
that we want to save,
and stuff on
oursite.com/
that we don't.
Thanks for providing a very useful piece of software!
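One way to approach this kind of restriction is a URL prefix test. The patch earlier in this thread shows that mw2html decides which links to follow through a should_follow function; the code below is not that function, only a sketch of a check that someone comfortable editing the script might adapt there. The name `is_under_prefix` is hypothetical. It is written for Python 3 (in Python 2 the same `urlsplit` function lives in the `urlparse` module).

```python
from urllib.parse import urlsplit


def is_under_prefix(url, root):
    """Return True if `url` is on the same host as `root` and at or below its path."""
    u, r = urlsplit(url), urlsplit(root)
    # Different scheme or host: never follow.
    if (u.scheme, u.netloc.lower()) != (r.scheme, r.netloc.lower()):
        return False
    root_path = r.path.rstrip("/")
    # Match the root itself or anything beneath it, but not look-alike
    # siblings such as /wikipedia when the root is /wiki.
    return u.path == root_path or u.path.startswith(root_path + "/")
```

With the root set to http://oursite.com/wiki, a page such as http://oursite.com/wiki/Main_Page passes the test while http://oursite.com/about does not.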
Superb work, solved a big headache. Easy to install, worked without a hitch.
fantastic - everything worked fine from the start
Many thanks for the great work !
I was sure I would need to write my own Mediawiki converter before I found this one.
Thanks a lot!
I have a mediawiki installed on my local intranet but I can't seem to get the script working. It always stalls out and gives an error. Is there a problem being behind a proxy? I tried removing all of the external links in case that was the problem, but I can't seem to get the script to finish. Any tips? The server is running IIS 6.0 w/ MySQL.
Hey man,
This hit the spot for my needs.
I was looking for a way to build a static copy of an internal wiki - and this tool got the job done. My MacBook running Leopard already had Python - so it was almost zero installation to use.
Good stuff, thanks for posting it...
Hello Connelly,
First, thank you for creating this tool. I hope you are still available to answer questions about it. I am running mw2html on the linux box that hosts my wiki. It runs great, however after it has processed what I think is a fair chunk of the pages, I get an error. I'm not a Python user so I'm at a loss as to how to proceed. The traceback I get is as follows:
Traceback (most recent call last):
File "mw2html.py", line 742, in ?
main()
File "mw2html.py", line 738, in main
run(config)
File "mw2html.py", line 600, in run
(doc, new_urls) = parse_css(doc, url, config)
File "mw2html.py", line 489, in parse_css
newdoc = post_css_transform(newdoc, url, config)
File "mw2html.py", line 286, in post_css_transform
doc = monobook_hack_skin_css(doc, url, config)
File "mw2html.py", line 265, in monobook_hack_skin_css
assert c1 in doc
AssertionError
Is there any way of knowing which page it may have been processing when it hit this error?
Thanks in advance,
Dan
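One generic way to find out which URL was being processed is to wrap the failing call so that the URL is printed before the error propagates. The sketch below assumes only what the traceback shows, namely that the call has the shape parse_css(doc, url, config); the wrapper name `call_with_url_context` is hypothetical and not part of mw2html.

```python
import sys


def call_with_url_context(func, doc, url, config):
    """Call func(doc, url, config); on AssertionError, report the URL and re-raise."""
    try:
        return func(doc, url, config)
    except AssertionError:
        # Name the offending URL, then let the original traceback continue.
        sys.stderr.write("Failed while processing: %s\n" % url)
        raise
```

In the run function, the line from the traceback would then become something like `(doc, new_urls) = call_with_url_context(parse_css, doc, url, config)`.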
Since this is still a top Google hit for making a static copy of a MW, here's my 2¢. Run the command with no parameters to see the options and try them! I had more success using the disable skin hack option.
Thank you for the wonderful script. My personal wiki is protected by http authentication. This fix allows you to download your wiki even when it is protected in such a way:
(add this code just before the 'while' statement in the run function)
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
username = raw_input("Username: ")
password = raw_input("Password: ")
password_mgr.add_password(None, config.rooturl, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
Thanks for this excellent program, it is what I was looking for.
Italy, 26 June 2009 (just to show the date!)
I found your wonderful script after so much searching on the Internet. It solves the problem of converting mediawiki to html very elegantly. My compliments!
By the way: I have found a (dirty) trick to convert only a section of the wiki. I assign only the pages of interest to a particular Category. Then I "export by category" (there are nice extensions to do this) to an empty wiki server by means of an XML export file. From that wiki server, which now holds only the category of interest, I extract everything with your script.
Carlo