The ihMarkup module provides two low-level types for
creating, examining, and editing HTML markup trees. These markup
trees represent the underlying structure and content of an HTML
document, in a form that can be easily manipulated by a program.
The basic component of this representation is the node: an HTML tag
(e.g., <HEAD>, <IMG>, or <P>), piece of text, comment, etc. Each node
can have zero or more children, corresponding to the markup nested
inside it.
For example, consider the following simple HTML document:
    <HTML>
    <HEAD>
    <TITLE>Example Document</TITLE>
    </HEAD>
    <BODY>
    <H1>Example Title</H1>
    <P>Example paragraph.</P>
    <HR>
    <ADDRESS>Example address.</ADDRESS>
    </BODY>
    </HTML>
The tree representation of this that you would manipulate with these classes looks like this:
    <HTML>
      |
      +--<HEAD>
      |    |
      |    +--<TITLE>
      |         |
      |         +--"Example Document"
      |
      +--<BODY>
           |
           +--<H1>
           |    |
           |    +--"Example Title"
           |
           +--<P>
           |    |
           |    +--"Example paragraph."
           |
           +--<HR>
           |
           +--<ADDRESS>
                |
                +--"Example address."
Or, in a perhaps more familiar form:
                         <HTML>
                            |
        +-------------------+-------------------+
        |                                       |
     <HEAD>                                  <BODY>
        |                                       |
     <TITLE>            +-------------------+---+---------+-------------+
        |               |                   |             |             |
"Example Document"    <H1>                 <P>          <HR>        <ADDRESS>
                        |                   |                           |
                 "Example Title"  "Example paragraph."         "Example address."
This module does not give direct access to the tree at the node
level, however; instead, it defines a type HTMLTree that consists of
a complete tree, which may be only one node but more often is some
larger sub-tree. A second type, HTMLCursor, represents a particular
location/node in that tree, and is used to perform the actual
manipulation of the tree.
In addition, this module defines various helper functions for performing common operations on these HTML trees.
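To make the tree structure concrete, the example document above can be modelled in plain Python. This is only an illustrative sketch: the Node class and the text() and dump() helpers are hypothetical stand-ins and are not part of ihMarkup, which exposes trees only through HTMLTree and HTMLCursor.

```python
# Hypothetical stand-in for a markup tree node; not part of ihMarkup.
class Node:
    def __init__(self, tag, text=None, children=None):
        self.tag = tag                  # e.g. "HTML", "P", or ">text"
        self.text = text                # only meaningful for ">text" nodes
        self.children = children or []  # zero or more child nodes


def text(s):
    """Build a text node."""
    return Node(">text", text=s)


def dump(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    label = '"%s"' % node.text if node.tag == ">text" else "<%s>" % node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


doc = Node("HTML", children=[
    Node("HEAD", children=[
        Node("TITLE", children=[text("Example Document")]),
    ]),
    Node("BODY", children=[
        Node("H1", children=[text("Example Title")]),
        Node("P", children=[text("Example paragraph.")]),
        Node("HR"),
        Node("ADDRESS", children=[text("Example address.")]),
    ]),
])

print("\n".join(dump(doc)))
```

Each tag, and each run of text, becomes one node; a tag such as <HR> with nothing inside it is simply a node with no children.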
Tag
An HTMLTag object, which is used to get the identifiers for HTML
tags. For example, "Tag.A" is the browser's identifier for the <A>
tag, "Tag.TITLE" is the browser's identifier for the <TITLE> tag,
etc.
There are no global exceptions defined by this module.
FillinMarkup(tree, dict)
tree: The HTMLTree that is to be modified.
dict: A dictionary containing (keyword, value) pairs that map the
tag IDs to modify to the markup (as an ASCII string) to insert at
that point.
For example, if tree is an HTMLTree representing:
    <HTML>
      <HEAD>
        <TITLE ID=TITLE>
          "Document
      <BODY>
        <H1 ID=TITLE>
        <P ID=CREATOR>
          "made this thing.
        <HR>
        <P ID=PICTURE>
          "This is me.
And dict is the dictionary:
    { "TITLE": 'Spiffy iHTML Example',
      "CREATOR": 'Dianne Hackborn',
      "PICTURE": '<IMG SRC="dianne.gif" ALT="_.oo_Q_Q_oo._">' }
Then calling FillinMarkup() with these two arguments will result in
tree being modified to look like:
    <HTML>
      <HEAD>
        <TITLE ID=TITLE>
          "Spiffy iHTML Example
          "Document
      <BODY>
        <H1 ID=TITLE>
          "Spiffy iHTML Example
        <P ID=CREATOR>
          "Dianne Hackborn
          "made this thing.
        <HR>
        <P ID=PICTURE>
          <IMG SRC="dianne.gif" ALT="_.oo_Q_Q_oo._">
          "This is me.
This function can be used to perform complex insertion operations on
an HTMLTree. The given dictionary defines a mapping from tag
identifier names (i.e., a tag's ID attribute) to an ASCII string of
HTML markup to insert into the tree immediately after (or, more
formally, as the first child of) any tags that have that ID. The
string is converted into an HTMLTree at each point it is inserted
into tree by calling NewHTMLTree().
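The insertion rule described above can be sketched in plain Python. This is not the module's implementation: Node, parse_markup() and fillin() are hypothetical stand-ins, parse_markup() merely wraps the string where the real function would call NewHTMLTree(), and the choice not to search newly inserted markup for further IDs is an assumption.

```python
# Hypothetical stand-in node type; ihMarkup itself works on HTMLTree objects.
class Node:
    def __init__(self, label, node_id=None, children=None):
        self.label = label
        self.node_id = node_id          # value of the tag's ID attribute
        self.children = children or []


def parse_markup(markup):
    """Toy replacement for NewHTMLTree(): wraps the string in one node."""
    return Node("parsed: " + markup)


def fillin(node, mapping):
    """Insert parsed markup as the first child of every node whose ID is
    a key of mapping. A fresh subtree is built at each insertion point."""
    # Visit the existing children first, so that markup inserted at this
    # node is not itself searched for IDs (an assumption of this sketch).
    for child in node.children:
        fillin(child, mapping)
    if node.node_id in mapping:
        node.children.insert(0, parse_markup(mapping[node.node_id]))


tree = Node("HTML", children=[
    Node("HEAD", children=[
        Node("TITLE", "TITLE", [Node("text: Document")]),
    ]),
    Node("BODY", children=[
        Node("H1", "TITLE"),
        Node("P", "CREATOR", [Node("text: made this thing.")]),
    ]),
])

fillin(tree, {"TITLE": "Spiffy iHTML Example", "CREATOR": "Dianne Hackborn"})

title = tree.children[0].children[0]
h1 = tree.children[1].children[0]
print([c.label for c in title.children])
print([c.label for c in h1.children])
```

Both nodes carrying ID=TITLE receive their own copy of the parsed markup, placed ahead of any children they already had.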
NewHTMLCursor()
NewHTMLCursor(tree)
NewHTMLCursor(cursor)
tree: An HTMLTree that the cursor is to be on.
cursor: An HTMLCursor representing the place where this cursor
should also be.
Returns a new HTMLCursor object.
This is the function used to create new HTMLCursor objects. It has
one optional parameter that, if supplied, is the initial location of
the cursor. It can be one of two types: if it is an HTMLTree, then
the cursor begins at the first node of that tree; otherwise, if it
is an HTMLCursor, then the cursor begins on the same tree and node
as the given cursor. If no parameter is supplied, the cursor begins
on no tree.
NewHTMLTree()
NewHTMLTree(markup)
markup: A string of HTML markup.
Returns a new HTMLTree object.
This is the function used to create new HTMLTree objects. It has one
optional parameter that, if supplied, is an ASCII string of HTML
markup; this is run through the browser's HTML parser, and the
resulting parse tree is used as the initial HTMLTree object. If not
supplied, the HTMLTree starts out empty.
HTMLTag
No methods are defined by this type.
No member variables are (explicitly) defined by this type, but see below.
The HTMLTag type provides access to the host browser's internal
identifiers for its HTML tree nodes. These identifiers may be
accessed as if they were member variables of the object. Each value
is an opaque object whose type is browser-dependent; the only thing
that can be done with these values is to compare them to other tag
values.
Unlike other Python identifiers, the tags defined by this class are
case-insensitive. This means, for example, that "Tag.TITLE" is the
same as "Tag.title", "Tag.Title", or even "Tag.tItLe".
In addition to the expected HTML tag names (HTML, P, IMG, STRONG,
etc.), there are two special identifiers defined:
>text
>unknown
Because these names are not valid Python identifiers, Python's
getattr() function should be used to retrieve their values. For
example,
textTag = getattr(ihMarkup.Tag,'>text')
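The lookup behaviour described here (case-insensitive names, opaque values that can only be compared, and special names reached through getattr()) can be imitated in ordinary Python. The sketch below is a hypothetical stand-in, not the real HTMLTag type; CaseInsensitiveTags and the list of names passed to it are assumptions made for illustration.

```python
class CaseInsensitiveTags:
    """Hypothetical stand-in for an HTMLTag-like object."""

    def __init__(self, names):
        # Map each upper-cased name to a distinct opaque object.
        self._ids = {name.upper(): object() for name in names}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails. Private names
        # are rejected first to avoid recursing on a missing _ids.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._ids[name.upper()]
        except KeyError:
            raise AttributeError(name) from None


Tag = CaseInsensitiveTags(["HTML", "TITLE", "P", ">text", ">unknown"])

# Any spelling of a tag name yields the same identifier object.
same = Tag.TITLE is Tag.title is Tag.tItLe

# ">text" cannot be written as Tag.>text, so getattr() is required.
text_tag = getattr(Tag, ">text")

print(same, text_tag is getattr(Tag, ">TEXT"), Tag.TITLE is Tag.P)
```

The identifier objects carry no information of their own; as with the real tag values, the only useful operation on them is comparison.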
HTMLTree
This type encapsulates zero or more nodes in an HTML parse tree.
AddCursor(cursor)
Adds an HTMLCursor to the tree.
cursor: The HTMLCursor to be added to the tree.
RemCursor(cursor)
Removes an HTMLCursor from the tree.
cursor: An HTMLCursor that is currently on the tree.
No member variables are defined by this type.
The HTMLTree type represents some piece of an HTML parse tree,
consisting of zero or more nodes. This class itself does not allow
the tree to be manipulated or examined; instead it allows an
HTMLCursor to be added to the tree, which allows the individual
nodes that make up the tree to be examined, and through which
operations on the tree are performed.
HTMLCursor
Locates particular nodes in an HTMLTree and serves as a proxy
through which the tree is manipulated.
MoveChild()
MoveChild(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if its current
node has no children.
MoveNext()
MoveNext(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if there is no
node after it.
MoveNextDepth()
MoveNextDepth(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if there is no
node after it.
MoveNextID(id)
Moves the cursor to the next node with the given ID attribute.
id: A string representing the ID to search for. This is a
case-sensitive search.
Returns the HTMLCursor in its new position, or None if there is no
node after it with the given ID.
Like MoveNextDepth(), but only stops the cursor at nodes that have
the given ID attribute value.
MoveNextTag(tag)
Moves the cursor to the next node with the given Tag identifier.
tag: A string representing the Tag identifier to search for.
Returns the HTMLCursor in its new position, or None if there is no
node after it with the given Tag.
Like MoveNextDepth(), but only stops the cursor at nodes that are
the same type as the given Tag identifier.
MoveParent()
MoveParent(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if there is no
parent to its current node.
MovePrev()
MovePrev(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if there is no
node before it.
MovePrevDepth()
MovePrevDepth(pos)
pos: An optional HTMLCursor or HTMLTree at which to position this
cursor before it is moved.
Returns the HTMLCursor in its new position, or None if there is no
node before it.
SetPos()
SetPos(tree)
SetPos(cursor)
tree: An HTMLTree on which to position the cursor.
cursor: An HTMLCursor at which this cursor should also be
positioned.
Returns the HTMLCursor.
If the argument is an HTMLTree, then the cursor is located at the
first node of that tree; otherwise, if it is an HTMLCursor, then the
cursor is located on the same tree and node as the given cursor.
PasteAfter(tree)
tree: The HTMLTree to insert into the existing tree.
Returns the HTMLCursor.
For example, if the cursor is on the HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
And tree contains the HTML tree:
    <STRONG>
      "This is our new text.
Then calling this method, when the cursor is positioned at the <P>
tag, will result in the complete HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
        <STRONG>
          "This is our new text.
PasteBefore(tree)
tree: The HTMLTree to insert into the existing tree.
Returns the HTMLCursor.
For example, if the cursor is on the HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
And tree contains the HTML tree:
    <STRONG>
      "This is our new text.
Then calling this method, when the cursor is positioned at the <P>
tag, will result in the complete HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <STRONG>
          "This is our new text.
        <P>
          "This is cursor position.
PasteHead(tree)
tree: The HTMLTree to insert into the existing tree.
Returns the HTMLCursor.
For example, if the cursor is on the HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
And tree contains the HTML tree:
    <STRONG>
      "This is our new text.
Then calling this method, when the cursor is positioned at the <P>
tag, will result in the complete HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          <STRONG>
            "This is our new text.
          "This is cursor position.
PasteTail(tree)
tree: The HTMLTree to insert into the existing tree.
Returns the HTMLCursor.
For example, if the cursor is on the HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
And tree contains the HTML tree:
    <STRONG>
      "This is our new text.
Then calling this method, when the cursor is positioned at the <P>
tag, will result in the complete HTML tree:
    <HTML>
      <HEAD>
        <TITLE>
          "Example Document
      <BODY>
        <H1>
          "PasteAfter Example
        <P>
          "This is cursor position.
          <STRONG>
            "This is our new text.
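The four paste methods differ only in where the new tree lands relative to the cursor's node. The sketch below shows one reading of the examples above, assuming PasteBefore() and PasteAfter() insert a sibling while PasteHead() and PasteTail() insert a first or last child. Node and the paste_* functions are hypothetical stand-ins operating on plain Python objects, not the HTMLCursor methods themselves.

```python
class Node:
    """Hypothetical stand-in node; not part of ihMarkup."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = []
        for child in children:
            child.parent = self
            self.children.append(child)


def paste_head(at, new):
    """new becomes the first child of at."""
    new.parent = at
    at.children.insert(0, new)


def paste_tail(at, new):
    """new becomes the last child of at."""
    new.parent = at
    at.children.append(new)


def paste_before(at, new):
    """new becomes the sibling immediately before at."""
    siblings = at.parent.children
    new.parent = at.parent
    siblings.insert(siblings.index(at), new)


def paste_after(at, new):
    """new becomes the sibling immediately after at."""
    siblings = at.parent.children
    new.parent = at.parent
    siblings.insert(siblings.index(at) + 1, new)


def make_body():
    """Build the <BODY> part of the example; return it with its <P> node."""
    p = Node("P", [Node("text: This is cursor position.")])
    body = Node("BODY", [Node("H1"), p])
    return body, p


def labels(node):
    return [child.label for child in node.children]


for paste in (paste_before, paste_after, paste_head, paste_tail):
    body, p = make_body()
    paste(p, Node("STRONG"))
    print(paste.__name__, labels(body), labels(p))
```

Under this reading, the sibling operations change the child list of <BODY>, while the head and tail operations change the child list of <P> itself.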
CurHTMLTree
The HTMLTree object that this cursor's node is a part of.
First
True if the node at the cursor is the first at its level (i.e.,
MovePrev() would move off the tree), or False if there are nodes
before it.
Last
True if the node at the cursor is the last at its level (i.e.,
MoveNext() would move off the tree), or False if there are nodes
after it.
Leaf
True if the node at the cursor is at the bottom of the tree (i.e.,
it does not have any children), or False if there are nodes below
it.
NodeAttr
Gives access to the attributes of the node at the cursor. For
example:
link_url = cursor.NodeAttr.HREF
Unlike most Python identifiers, the identifier names in this object are case-insensitive.
NodeText
A string representing the text associated with this node, if it is
of type ">text".
Root
True if the node at the cursor is at the top of the tree (i.e., it
does not have a parent node), or False if there are nodes above it.
Tag
The Tag identifier of the node at the cursor. See the HTMLTag type
for more information.
Objects of this type are used to represent specific nodes in an HTML tree, and provide an interface to the operations that can be performed on the tree. A cursor is a mutable type -- moving the cursor involves changing the object's state, rather than creating a new cursor object at the new position.
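The mutable-cursor design can be illustrated with a simplified model. Everything below is a hypothetical sketch rather than the real HTMLCursor: the Node and Cursor classes and their lower-case method names are invented for illustration, a failed move is assumed to leave the cursor where it was, and move_next_depth() assumes that MoveNextDepth() visits nodes in depth-first (pre-order) order, which this documentation does not spell out.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Cursor:
    """Hypothetical, simplified cursor; not the real HTMLCursor."""

    def __init__(self, node):
        self.node = node

    def _move(self, target):
        # Assumption: a failed move leaves the cursor in place.
        if target is None:
            return None
        self.node = target      # mutate this cursor; no new object is made
        return self

    @staticmethod
    def _sibling(node, offset):
        if node.parent is None:
            return None
        siblings = node.parent.children
        i = siblings.index(node) + offset
        return siblings[i] if 0 <= i < len(siblings) else None

    def move_child(self):
        kids = self.node.children
        return self._move(kids[0] if kids else None)

    def move_parent(self):
        return self._move(self.node.parent)

    def move_next(self):
        return self._move(self._sibling(self.node, +1))

    def move_prev(self):
        return self._move(self._sibling(self.node, -1))

    def move_next_depth(self):
        # Pre-order successor: the first child if there is one, else the
        # next sibling of the nearest ancestor-or-self that has one.
        node = self.node
        if node.children:
            return self._move(node.children[0])
        while node is not None:
            nxt = self._sibling(node, +1)
            if nxt is not None:
                return self._move(nxt)
            node = node.parent
        return None


tree = Node("HTML", [
    Node("HEAD", [Node("TITLE", [Node("text: Example Document")])]),
    Node("BODY", [Node("H1"), Node("P")]),
])

cur = Cursor(tree)
visited = [cur.node.label]
while cur.move_next_depth() is not None:
    visited.append(cur.node.label)
print(visited)
```

Because each successful move returns the same cursor object, a loop can test the return value against None and then read the new position from the cursor it already holds.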
The editing operations PasteAfter(), PasteBefore(), PasteHead(), and
PasteTail() result in one HTMLTree being placed inside of another.
This means that moving a cursor will often involve moving between
different HTMLTree objects. When this happens, the cursor
automatically calls the trees' appropriate HTMLTree.AddCursor() and
HTMLTree.RemCursor() methods.
Note: the methods for deleting nodes and more complex editing
operations (e.g., inserting a single node around one or more nodes
in the tree) are not currently available from the HTMLCursor.
There are no global classes defined by this module.
Dianne Kyra Hackborn <hackbod@angryredplanet.com> | Last modified: Fri Sep 13 06:08:49 PDT 1996 |