Sunday, October 02, 2005

Python htmldata module

Extract and modify URLs in an HTML/XHTML/CSS document. Translate HTML/XHTML documents to and from list data structures. Use cases: robots, proxy CGI scripts, filtering of HTML and CSS, and flexible wget-like mirroring. (Module website).
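To give a feel for the URL-extraction use case, here is a minimal sketch that uses only the Python standard library. It is not htmldata's own API: the names URLCollector, extract_urls and URL_ATTRS are hypothetical, chosen for this illustration, and the list of URL-carrying attributes is an assumption that is far from exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry URLs (a small, non-exhaustive set).
URL_ATTRS = {"href", "src", "action"}


class URLCollector(HTMLParser):
    """Collect (tag, attribute, URL) triples from an HTML document."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative URLs against the base URL, if one was given.
                self.urls.append((tag, name, urljoin(self.base_url, value)))


def extract_urls(doc, base_url=""):
    """Return every URL found in the assumed URL attributes of doc."""
    parser = URLCollector(base_url)
    parser.feed(doc)
    parser.close()
    return parser.urls
```

A robot or mirroring script could call extract_urls(page_text, page_url) and queue each returned URL for fetching. Rewriting the URLs in place, which the module also covers, would need more bookkeeping than this sketch shows.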

2 Comments:

orz said...

Hi...
Where can I get the htmldata source code? The htmldata web site seems to be down.

8:49 PM  
Anonymous said...

Hi, how do you install this module in Python? Note that I know nothing about this language. I have already tried the basic commands I could find, such as "python htmldata.py install" and "python htmldata.py install --prefix=/usr".

Here is the result, which seems OK (i.e., no errors reported):
Unit tests:
_remove_comments: OK
_shlex_split: OK
_tag_dict: OK
_tuple_replace: OK
tagextract*: OK
tagextract (unicode)*: OK
urlextract*: OK
urlextract (unicode)*: OK

* The corresponding join method has been tested as well.

However, when I try to use mw2html, it still asks for htmldata to be installed.

Could you please help?
Regards

9:02 AM  
