Thursday, October 06, 2005

mw2html -- Export Mediawiki to static, traditional website

mw2html is a Python program that exports a Mediawiki site to a traditional-looking static HTML site. No indices are generated, and the Wikipedia-style sidebars and extra formatting are removed to give a simple, streamlined site (you can substitute your own sidebars if you wish). The generated HTML is Monobook-specific and rather verbose at the moment, but the exported sites don't look bad. (Source code).

27 Comments:

Anonymous Donato said...

Please add more documentation. I want to use your script, but I'm not good at Python or PHP. I'm an experienced programmer (Java/ASP), so I know the concepts. Please help.

Many Thanks!

7:10 AM  
Blogger Connelly said...

First, install Python from [1]. Next, verify that Python is in your system shell's PATH by typing python *enter* from the shell. If successful, this should place you in a Python interactive interpreter. If this fails, modify the PATH environment variable to include the Python binary directory (on Windows this is C:\Python2.x\).

Next, save mw2html.py somewhere. Install your copy of Mediawiki [2]. Now, at the command line, type (on one line, without the % character):

% python mw2html.py http://yourwikiurl.com/ outdir

Browse the output directory to see the generated HTML. You can use python mw2html.py with no arguments to get a list of available command line options.
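
As a rough illustration of the steps above, here is a minimal sketch that assembles the same command line from Python. The helper names `build_command` and `run_export` are made up for this example and are not part of mw2html. The script dates from the Python 2 era, so the interpreter chosen here is assumed to be one that can actually run it.

```python
import shutil
import subprocess
import sys


def build_command(wiki_url, outdir, script="mw2html.py", python=None):
    """Return the argument list for: python mw2html.py <wiki_url> <outdir>."""
    if python is None:
        # Prefer a "python" found on PATH; fall back to the interpreter running this sketch.
        python = shutil.which("python") or sys.executable
    return [python, script, wiki_url, outdir]


def run_export(wiki_url, outdir, **kwargs):
    """Run the export as a child process and return its exit code."""
    return subprocess.call(build_command(wiki_url, outdir, **kwargs))


# Only builds the argument list; nothing is executed until run_export() is called.
command = build_command("http://yourwikiurl.com/", "outdir", python="python")
```

Calling `run_export("http://yourwikiurl.com/", "outdir")` would then start the export, provided mw2html.py is in the current directory.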

This advice is pretty general; if you have a more specific problem let me know.

9:55 PM  
Anonymous Andy Davidson said...

I just tried mw2html.py to back up a fairly new mediawiki. I ran 'python mw2html.py http://127/0/0/1/wiki /tmp/wiki'

File "mw2html.py", line 218

SyntaxError: EOL while scanning single-quoted string

I can't include the call (doc.replace) because blogger thinks I am trying to use server side scripting.

Do you have a spare clue as to what I might be doing wrong? If you need more info, let me know and I can send email.

4:30 PM  
Anonymous Andy Davidson said...

Never mind. I found the problem: I had Privoxy running and it replaced some open commands. Ugh.

I also had to remove an assertion at line 265, but it now runs to completion.

3:18 PM  
Blogger Forrest O. said...

ImportError: No module named textwrap

Any hints on how I might be able to get around that?

1:52 PM  
Anonymous Anonymous said...

This worked perfectly for me, where many other methods had failed. Thank you very much.

8:16 AM  
Anonymous Michael Kennedy said...

Is there a way for this to access password protected wikis? I'd like to grab an output of the one I use for work clients that I have passworded so I can store it on my key for those days I don't have easy web access. Thanks.

1:05 AM  
Anonymous Anonymous said...

It works very well.

I am modifying your code to remove the footer entirely and make it suitable for generating software documentation and such, from wiki.

However, I have a little problem. The script freezes for minutes right after it starts, and again on some files while it generates the output directory. Task Manager shows no activity. Please let me know if there is any workaround.

8:47 AM  
Anonymous Anonymous said...

Hi,

I'm a newbie. I installed Python. It's in my path, but python *enter* appears to be invalid, and so are the other commands. Can you help?

2:02 AM  
Anonymous Anonymous said...

Hi

I tried the HTML exporter on a fresh mediawiki 2.6.8 installation, but there is a weird problem:
When I start the script, the whole network connection goes down. In another shell I started a persistent ping to some host, and as soon as I start the exporter I get ping timeouts and can't surf the net anymore. If I cancel the exporter, it works again. My network monitor shows 0% net activity, though.

After a _long_ time the HTML was in the outdir, but badly formatted because it couldn't fetch some files, like:

Error opening: http://censoredurl/skins/monobook/main.css?7

Error opening: http://censoredurl/skins/common/IEFixes.js

Those are accessible via browser without any problems.

My System:
WinXP SP2, Python 2.4, htmldata 1.1.0, mediawiki 2.6.8, no proxy, wiki is not locally installed

Any ideas?

6:32 AM  
Anonymous Niels de Vos said...

Hi Connelly, thanks for the really cool script! Unfortunately there seems to be a little error(?) in the function parse_css. One of my .css files contains a reference to a w3c.com page, which parse_css treats as a URL to store locally. Here's a patch to fix this, so that parse_css handles such external URLs just like the parse_html function does.

Cheers!

--- mw2html.py.orig 2006-09-03 15:04:21.000000000 +0200
+++ mw2html.py 2006-09-03 14:55:53.000000000 +0200
@@ -480,10 +480,13 @@
 
     L = htmldata.urlextract(doc, url, 'text/css')
     for item in L:
-        # Store url locally.
         u = item.url
-        new_urls += [u]
-        item.url = url_to_relative(u, url, config)
+        if should_follow(url, u):
+            # Store url locally.
+            new_urls += [u]
+            item.url = url_to_relative(u, url, config)
+        else:
+            item.url = rewrite_external_url(item.url, config)
 
     newdoc = htmldata.urljoin(doc, L)
     newdoc = post_css_transform(newdoc, url, config)

6:23 AM  
Anonymous Anonymous said...

When I run the command:
python mw2html.py http://localhost/mediawiki/ ~/wiki1

and open main_page.html, the wiki's sidebar is missing, so once I browse to a child article I cannot get back to the main page. Could you explain how to export the wiki's sidebar into sidebar.html?
Thank you

2:03 AM  
Anonymous Anonymous said...

Hello,

Thanks for your post about using mw2html to export Mediawiki to HTML :) It is such a great help to me.

However, I have a problem: I cannot work out how to export the wiki's left menu into each exported HTML page, so it is hard to navigate from one HTML page to another.

(The left side menu includes:
* Main Page
* Community portal
* Current events
* Recent changes
* Random page
* Help
* Donations
...
)

I tried the command with the -l option to export the left side menu bar, but it failed. Could you tell me how to export the wiki's left side menu bar into each HTML page?

Thank you very much; I look forward to hearing from you.

5:30 PM  
Blogger Connelly said...

With the left sidebar option you pass an HTML filename as an argument, and that file's contents are pasted into the "left sidebar" region. Thus you can make a custom left sidebar HTML file with whichever links you desire.
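
For instance, a custom sidebar file could be generated with a few lines of Python. This is only a sketch: `make_sidebar` is a made-up helper, and the page filenames are hypothetical placeholders that should be replaced with the filenames that actually appear in your output directory.

```python
from html import escape


def make_sidebar(links):
    """Build a small HTML fragment with one list item per (title, href) pair."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(href, quote=True), escape(title))
        for title, href in links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


# Hypothetical page names; check your output directory for the real filenames.
sidebar_html = make_sidebar([
    ("Main Page", "main_page.html"),
    ("Help", "help.html"),
])
```

Writing `sidebar_html` to a file such as sidebar.html and passing that filename to the left sidebar option should then paste these links into every exported page.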

11:15 PM  
Anonymous Anonymous said...

Is there any way to restrict the link-following depth, along the lines of wget? Or better yet, to prevent any link following outside of a given URL?

For example, we have lots of stuff on

oursite.com/wiki
that we want to save,

and stuff on
oursite.com/
that we don't.

Thanks for providing a very useful piece of software!

10:33 AM  
Anonymous Anonymous said...

Superb work, solved a big headache. Easy to install worked without a hitch.

3:45 AM  
Blogger Marija said...

fantastic - everything worked fine from the start

Many thanks for the great work !

3:04 AM  
Anonymous Vlad said...

I was sure I would need to write my own MediaWiki converter before I found this one.

Thanks a lot!

3:53 AM  
Blogger Dave said...

I have a mediawiki installed on my local intranet but I can't seem to get the script working. It always stalls out and gives an error. Is there a problem being behind a proxy? I tried removing all of the external links in case that was the problem, but I can't seem to get the script to finish. Any tips? The server is running IIS 6.0 w/ MySQL.

9:29 AM  
Anonymous Kryten said...

Hey man,

This hit the spot for my needs.

I was looking for a way to build a static copy of an internal wiki - and this tool got the job done. My MacBook running Leopard already had Python - so it was almost zero installation to use.

Good stuff, thanks for posting it...

2:46 PM  
Blogger Dan said...

Hello Connelly,
First, thank you for creating this tool. I hope you are still available to answer questions about it. I am running mw2html on the linux box that hosts my wiki. It runs great, however after it has processed what I think is a fair chunk of the pages, I get an error. I'm not a Python user so I'm at a loss as to how to proceed. The traceback I get is as follows:

Traceback (most recent call last):
  File "mw2html.py", line 742, in ?
    main()
  File "mw2html.py", line 738, in main
    run(config)
  File "mw2html.py", line 600, in run
    (doc, new_urls) = parse_css(doc, url, config)
  File "mw2html.py", line 489, in parse_css
    newdoc = post_css_transform(newdoc, url, config)
  File "mw2html.py", line 286, in post_css_transform
    doc = monobook_hack_skin_css(doc, url, config)
  File "mw2html.py", line 265, in monobook_hack_skin_css
    assert c1 in doc
AssertionError

Is there any way of knowing which page it may have been processing when it hit this error?

Thanks in advance,
Dan

8:21 AM  
Anonymous Anonymous said...

Since this is still a top Google hit for making a static copy of a MW, here's my 2¢. Run the command with no parameters to see the options and try them! I had more success using the disable skin hack option.

11:11 AM  
Anonymous Ron said...

Thank you for the wonderful script. My personal wiki is protected by HTTP authentication. This fix lets you download your wiki even when it is protected in that way:

(add this code just before the 'while' statement in the run function)

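import urllib2  # Python 2 standard library

# Ask for the HTTP credentials and register them for every URL under the wiki root.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
username = raw_input("Username: ")
password = raw_input("Password: ")
password_mgr.add_password(None, config.rooturl, username, password)
# Install a global opener so that later urllib2.urlopen() calls can answer auth challenges.
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)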
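
For readers on Python 3, where urllib2 no longer exists, roughly the same idea can be sketched with urllib.request. This is not part of mw2html. The helper name `build_auth_opener` and the example.com URL and credentials are placeholders invented for this sketch.

```python
import urllib.request


def build_auth_opener(root_url, username, password):
    """Return (opener, password_mgr) able to answer HTTP Basic auth challenges under root_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None registers the credentials for any realm at URLs below root_url.
    password_mgr.add_password(None, root_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)
    return opener, password_mgr


# Building the opener does not touch the network; all values here are placeholders.
opener, password_mgr = build_auth_opener("http://example.com/wiki/", "alice", "secret")
```

Calling `urllib.request.install_opener(opener)` afterwards makes plain `urllib.request.urlopen()` calls use these credentials. Reading the password with `getpass.getpass()` instead of hard-coding it keeps it from being echoed on screen.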

4:42 AM  
Blogger Domos said...

Thanks for this excellent program; it is what I was looking for.

1:47 PM  
Anonymous carlo said...

Italy, 26 June 2009 (just to show the date!)

I found your wonderful script after much searching on the Internet. It solves the problem of converting Mediawiki to HTML very elegantly. My compliments!
By the way, I have found a (dirty) trick for converting only a section of the wiki: assign the pages of interest to a particular Category, then "export by category" (there are nice extensions to do this) to an XML export file and load that file into an empty wiki server. From that wiki server, which now holds only the category of interest, I extract everything with your script.

Carlo

10:40 AM  
