NOTE:
This project is no longer being maintained: it was developed for my master's thesis, which was completed in early 1997. I still, however, welcome any questions or comments that people may have.



iHTML Python Language

Standard ihMarkup Module

The ihMarkup module provides two low-level types for creating, examining, and editing HTML markup trees. These markup trees represent the underlying structure and content of an HTML document, in a form that can be easily manipulated by a program. The basic component of this representation is the node, which corresponds to a single part of the document: a tag (such as <HEAD>, <IMG>, or <P>), a piece of text, a comment, etc. Each node can have zero or more children, corresponding to the content of that node.

For example, consider the following simple HTML document:

<HTML>
  <HEAD>
    <TITLE>Example Document</TITLE>
  </HEAD>
  <BODY>
    <H1>Example Title</H1>
    <P>Example paragraph.</P>
    <HR>
    <ADDRESS>Example address.</ADDRESS>
  </BODY>
</HTML>
      

The tree representation of this document, as manipulated through these types, looks like this:

<HTML>+-+-+<HEAD>+----+<TITLE>+----+|"Example Document"|
        |
        +-+<BODY>+-+----+<H1>+-----+|"Example Title"|
                   |
                   +----+<P>+------+|"Example paragraph."|
                   |
                   +----+<HR>
                   |
                   +--+<ADDRESS>+--+|"Example address."|
      

Or, in a perhaps more familiar form:

                              <HTML>
                                |
      +-------------------------+---------------+
      |                                         |
    <HEAD>                                    <BODY>
      |                                         |
   <TITLE>             +-------------------+----+-----+---------+
      |                |                   |          |         |
"Example Document"    <H1>                <P>        <HR>   <ADDRESS>
                       |                   |                    |
                 "Example Title"  "Example paragraph."  "Example address."
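The ihMarkup module itself hides individual nodes behind HTMLTree and HTMLCursor, but it can help to picture the tree above as plain data. The following is only a rough model in present-day Python: the `Node` class and the `text()` and `outline()` helpers are hypothetical and not part of ihMarkup. Each node has a tag name, with `>text` standing for document text, and an ordered list of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one node of a markup tree (not an ihMarkup type)."""
    tag: str                                   # e.g. "HTML", "P", or ">text"
    text: Optional[str] = None                 # only used by ">text" nodes
    children: List["Node"] = field(default_factory=list)


def text(s: str) -> Node:
    """Make a document-text node."""
    return Node(">text", s)


# The example document above, built by hand.
doc = Node("HTML", children=[
    Node("HEAD", children=[
        Node("TITLE", children=[text("Example Document")]),
    ]),
    Node("BODY", children=[
        Node("H1", children=[text("Example Title")]),
        Node("P", children=[text("Example paragraph.")]),
        Node("HR"),                            # a leaf: no children
        Node("ADDRESS", children=[text("Example address.")]),
    ]),
])


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node, parents before their children."""
    label = repr(node.text) if node.tag == ">text" else "<%s>" % node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(doc)))
```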
      

This module does not give direct access to the tree at the node level, however. Instead, it defines a type, HTMLTree, that holds a complete tree, which may be a single node but is more often some larger sub-tree. A second type, HTMLCursor, represents a particular location (node) in that tree and is used to perform the actual manipulation of the tree.

In addition, this module defines various helper functions for performing common operations on these HTML trees.


Exported Values


Tag Names

Tag
This is an HTMLTag object, which is used to get the identifiers for HTML tags. For example, "Tag.A" is the browser's identifier for the <A> tag, "Tag.TITLE" is the browser's identifier for the <TITLE> tag, and so on.

Exported Exceptions


There are no global exceptions defined by this module.


Exported Functions


FillinMarkup

Synopsis
FillinMarkup(tree, dict)
Arguments
tree
The HTMLTree that is to be modified.
dict
A dictionary of (key, value) pairs: each key is a tag ID to look for, and each value is the markup (as an ASCII string) to insert at every tag that has that ID.
Result
nothing.
Example
If tree is a HTMLTree representing:
<HTML> <HEAD> <TITLE ID=TITLE> "Document
       <BODY> <H1 ID=TITLE>
              <P ID=CREATOR>   "made this thing.
              <HR>
              <P ID=PICTURE>   "This is me.
	  

And dict is the dictionary:

{
  "TITLE":   'Spiffy iHTML Example',
  "CREATOR": 'Dianne Hackborn',
  "PICTURE": '<IMG SRC="dianne.gif" ALT="_.oo_Q_Q_oo._">'
}
	  

Then calling FillinMarkup() with these two arguments will result in tree being modified to look like:

<HTML> <HEAD> <TITLE ID=TITLE> "Spiffy iHTML Example
                               "Document
       <BODY> <H1 ID=TITLE>    "Spiffy iHTML Example
              <P ID=CREATOR>   "Dianne Hackborn
                               "made this thing.
              <HR>
              <P ID=PICTURE>   <IMG SRC="dianne.gif" ALT="_.oo_Q_Q_oo._">
                               "This is me.
	  

This function can be used to perform complex insertion operations on an HTMLTree. The given dictionary maps tag identifier names (i.e., the values of tags' ID attributes) to ASCII strings of HTML markup. Each string is inserted into the tree immediately after (or, more formally, as the first child of) every tag that has that ID. Because the string is converted into a new HTMLTree, by calling NewHTMLTree(), at each point where it is inserted, an ID shared by several tags (such as TITLE above) receives a separate copy of the markup at each one.
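The insertion rule can be modelled in a few lines of present-day Python. This is a sketch under stated assumptions, not the module's implementation: `Node`, `text()`, and `fillin_markup()` are hypothetical names, and to keep it short each dictionary value is an already-built list of nodes standing in for the tree that NewHTMLTree() would produce from the markup string. The tree reproduces the example above.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a markup-tree node (not an ihMarkup type)."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def text(s: str) -> Node:
    return Node(">text", text=s)


def fillin_markup(tree: Node, fills: Dict[str, List[Node]]) -> None:
    """Hypothetical model of FillinMarkup(): prepend fills[id] to the children
    of every node whose ID attribute is a key of `fills`."""
    # Find all targets first, so freshly inserted nodes are never searched.
    targets = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.attrs.get("ID") in fills:
            targets.append(node)
        stack.extend(node.children)
    for node in targets:
        # Each target gets its own copy, much as the real function builds a
        # new tree from the markup string at every insertion point.
        node.children[0:0] = copy.deepcopy(fills[node.attrs["ID"]])


doc = Node("HTML", children=[
    Node("HEAD", children=[
        Node("TITLE", {"ID": "TITLE"}, children=[text("Document")]),
    ]),
    Node("BODY", children=[
        Node("H1", {"ID": "TITLE"}),
        Node("P", {"ID": "CREATOR"}, children=[text("made this thing.")]),
        Node("HR"),
        Node("P", {"ID": "PICTURE"}, children=[text("This is me.")]),
    ]),
])

fillin_markup(doc, {
    "TITLE": [text("Spiffy iHTML Example")],
    "CREATOR": [text("Dianne Hackborn")],
    "PICTURE": [Node("IMG", {"SRC": "dianne.gif", "ALT": "_.oo_Q_Q_oo._"})],
})
```

Collecting the targets before inserting anything keeps the walk from descending into markup that was just added, which matters if an inserted fragment itself carries one of the dictionary's IDs.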


NewHTMLCursor

Synopsis
NewHTMLCursor()
NewHTMLCursor(tree)
NewHTMLCursor(cursor)
Arguments
tree
A HTMLTree that the cursor is to be on.
cursor
A HTMLCursor representing the place where this cursor should also be.
Result
A newly created HTMLCursor object.

This is the function used to create new HTMLCursor objects. It has one optional parameter that, if supplied, gives the initial location of the cursor. It can be one of two types: if it is an HTMLTree, the cursor begins at the first node of that tree; if it is an HTMLCursor, the cursor begins on the same tree and node as the given cursor. If no parameter is supplied, the cursor begins on no tree.


NewHTMLTree

Synopsis
NewHTMLTree()
NewHTMLTree(markup)
Arguments
markup
An ASCII string of HTML markup.
Result
A newly created HTMLTree object.

This is the function used to create new HTMLTree objects. It has one optional parameter that, if supplied, is an ASCII string of HTML markup; this is run through the browser's HTML parser, and the resulting parse tree is used as the initial contents of the HTMLTree object. If it is not supplied, the HTMLTree starts out empty.
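The browser's parser is not available outside the browser, but the idea of turning a markup string into a tree can be sketched with the Python standard library's `html.parser` module. This is an assumption-laden stand-in, not how NewHTMLTree() works internally: `Node`, `TreeBuilder`, `new_html_tree()`, and the `EMPTY_TAGS` list are hypothetical, and unlike a real browser parser it does not infer omitted end tags. Note that `html.parser` lower-cases tag and attribute names; the sketch upper-cases tag names to match this document but leaves attribute names in lower case.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical, incomplete list of tags that never have content, so the
# builder must not wait for their end tags.
EMPTY_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class Node:
    """Hypothetical markup node (not an ihMarkup type)."""

    def __init__(self, tag: str,
                 attrs: Optional[List[Tuple[str, Optional[str]]]] = None,
                 text: Optional[str] = None):
        self.tag = tag
        self.attrs = dict(attrs or [])   # html.parser gives lower-case names
        self.text = text
        self.children: List["Node"] = []


class TreeBuilder(HTMLParser):
    """Collects parser events into a list of top-level Nodes."""

    def __init__(self) -> None:
        super().__init__()
        self.roots: List[Node] = []
        self.open_elements: List[Node] = []   # elements awaiting an end tag

    def _add(self, node: Node) -> None:
        parent = self.open_elements[-1].children if self.open_elements else self.roots
        parent.append(node)

    def handle_starttag(self, tag, attrs):
        node = Node(tag.upper(), attrs)
        self._add(node)
        if tag not in EMPTY_TAGS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the innermost matching element; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i].tag == tag.upper():
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():                 # drop whitespace-only text between tags
            self._add(Node(">text", text=data.strip()))


def new_html_tree(markup: str = "") -> List[Node]:
    """Hypothetical analogue of NewHTMLTree(markup): empty if no markup given."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.roots


for node in new_html_tree('<P ID=PICTURE><IMG SRC="dianne.gif">This is me.</P>'):
    print(node.tag, node.attrs, [child.tag for child in node.children])
```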


Exported Types


HTMLTag

Methods

No methods are defined by this type.

Members

No member variables are (explicitly) defined by this type, but see below.

Description

The HTMLTag type provides access to the host browser's internal identifiers for its HTML tree nodes. These identifiers may be accessed as if they were member variables of the object. Each value is an opaque object whose type is browser-dependent; the only thing that can be done with it is to compare it to other tag values.

Unlike other Python identifiers, the tags defined by this type are case-insensitive. This means, for example, that "Tag.TITLE" is the same as "Tag.title", "Tag.Title", or even "Tag.tItLe".

In addition to the expected HTML tag names (HTML, P, IMG, STRONG, etc.), there are two special identifiers defined:

>text
is the identifier for a node that contains document text, and
>unknown
is the identifier for a node that is not a tag name the browser recognizes.

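Because these names begin with ">", they are not valid Python identifiers and cannot be written as "Tag.>text"; to retrieve these values, use Python's getattr() function instead. For example,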

textTag = getattr(ihMarkup.Tag,'>text')
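The behaviour described above (case-insensitive names that map to opaque, comparable values, including names that can only be reached through getattr()) can be imitated in present-day Python with `__getattr__`. The `TagTable` class and its list of names are hypothetical; a real HTMLTag object gets its identifiers from the host browser.

```python
class TagTable:
    """Hypothetical model of an HTMLTag object: tag names are looked up
    case-insensitively and map to opaque identifiers that are only useful
    for comparison."""

    def __init__(self, names):
        # One fresh, featureless object per tag name, keyed by upper-case name.
        self._ids = {name.upper(): object() for name in names}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for tag names.
        try:
            return self._ids[name.upper()]
        except KeyError:
            raise AttributeError(name) from None


Tag = TagTable(["HTML", "HEAD", "TITLE", "P", "IMG", ">text", ">unknown"])

same = Tag.TITLE is Tag.tItLe          # True: lookup ignores case
text_tag = getattr(Tag, ">text")       # "Tag.>text" would be a syntax error
```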
      

HTMLTree

This type encapsulates zero or more nodes in an HTML parse tree.

Methods

AddCursor(cursor)
Summary
Add a HTMLCursor to the tree.
Arguments
cursor
The HTMLCursor to be added to the tree.
Result
nothing.
Description
This function adds the given cursor to the HTML tree, positioning it at the first node on the tree.
RemCursor(cursor)
Summary
Remove a HTMLCursor from the tree.
Arguments
cursor
An HTMLCursor that is currently on the tree.
Result
nothing.
Description
This function removes the given cursor from the HTML tree, if the cursor is currently on it. If the cursor is not on the tree, no action is performed.

Members

No member variables are defined by this type.

Description

The HTMLTree represents some piece of an HTML parse tree, consisting of zero or more nodes. This type itself does not allow the tree to be examined or manipulated; instead, it allows an HTMLCursor to be added to the tree, through which the individual nodes that make up the tree are examined and operations on the tree are performed.


HTMLCursor

Locates particular nodes in an HTMLTree and serves as a proxy through which the tree is manipulated.

Methods

MoveChild()
MoveChild(pos)
Summary
Go to the first child node of the current location.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if its current node has no children.
Description
This method moves the cursor to the first child of the current node it is pointing at.
MoveNext()
MoveNext(pos)
Summary
Go to the next node in this level of the HTML tree.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if there is no node after it.
Description
This method moves the cursor to the next node that is at the same level as the current node. In other words, it moves to the next child node of its parent.
MoveNextDepth()
MoveNextDepth(pos)
Summary
Go to the next node of the tree, in depth-first order.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if there is no node after it.
Description
This method moves the cursor to the node that would occur after this one, if a depth-first traversal of the tree were being performed. In other words, this traverses the nodes of the tree in the same order that they occurred in the original HTML document.
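One way to compute this successor, given parent links, is sketched below. It is a standard pre-order walk, not necessarily how HTMLCursor does it: the `Node` class, `next_sibling()` (the MoveNext() step), and `next_depth()` are hypothetical names, and the tree is the example document from the start of this section with its text nodes omitted.

```python
from typing import List, Optional


class Node:
    """Hypothetical node with a parent pointer (not an ihMarkup type)."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = list(children)
        for child in self.children:
            child.parent = self


def next_sibling(node: Node) -> Optional[Node]:
    """The node MoveNext() would reach: the parent's next child, if any."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    i = siblings.index(node)
    return siblings[i + 1] if i + 1 < len(siblings) else None


def next_depth(node: Node) -> Optional[Node]:
    """The node MoveNextDepth() would reach: the pre-order successor."""
    if node.children:
        return node.children[0]        # go down first...
    current: Optional[Node] = node
    while current is not None:
        sibling = next_sibling(current)
        if sibling is not None:
            return sibling             # ...then across...
        current = current.parent       # ...climbing as far as needed.
    return None                        # ran off the end of the tree


doc = Node("HTML",
           Node("HEAD", Node("TITLE")),
           Node("BODY", Node("H1"), Node("P"), Node("HR"), Node("ADDRESS")))

node: Optional[Node] = doc
order = []
while node is not None:
    order.append(node.name)
    node = next_depth(node)
print(order)
# ['HTML', 'HEAD', 'TITLE', 'BODY', 'H1', 'P', 'HR', 'ADDRESS']
```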
MoveNextID(id)
Summary
Go to the next node with a specific ID attribute.
Arguments
id
A string representing the ID to search for. This is a case-sensitive search.
Result
This HTMLCursor in its new position, or None if there is no node after it with the given ID.
Description
This method moves the cursor in a depth-first traversal, as with MoveNextDepth(), but only stops the cursor at nodes that have the given ID attribute value.
MoveNextTag(tag)
Summary
Go to the next node with a specific Tag identifier.
Arguments
tag
A string representing the Tag identifier to search for.
Result
This HTMLCursor in its new position, or None if there is no node after it with the given Tag.
Description
This method moves the cursor in a depth-first traversal, as with MoveNextDepth(), but only stops the cursor at nodes that are the same type as the given Tag identifier.
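Both searches amount to a depth-first walk with a filter, which a short Python sketch can show. The helpers below (`preorder()`, `move_next_matching()`, `move_next_id()`, `move_next_tag()`) are hypothetical; tags are compared as plain upper-case strings rather than as HTMLTag identifiers, and the tree mirrors the FillinMarkup() example with its text nodes omitted.

```python
from typing import Callable, Dict, Iterator, List, Optional


class Node:
    """Hypothetical markup node (not an ihMarkup type)."""

    def __init__(self, tag: str, attrs: Optional[Dict[str, str]] = None,
                 children: Optional[List["Node"]] = None):
        self.tag = tag
        self.attrs = attrs or {}
        self.children = children or []


def preorder(node: Node) -> Iterator[Node]:
    """Yield nodes in document (depth-first, pre-order) order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def move_next_matching(root: Node, start: Node,
                       wanted: Callable[[Node], bool]) -> Optional[Node]:
    """First node after `start`, in document order, for which wanted() is true."""
    nodes = preorder(root)
    for node in nodes:            # skip up to and including the start node
        if node is start:
            break
    for node in nodes:            # then resume the same traversal
        if wanted(node):
            return node
    return None


def move_next_id(root: Node, start: Node, id_: str) -> Optional[Node]:
    # Case-sensitive comparison, as documented for MoveNextID().
    return move_next_matching(root, start, lambda n: n.attrs.get("ID") == id_)


def move_next_tag(root: Node, start: Node, tag: str) -> Optional[Node]:
    return move_next_matching(root, start, lambda n: n.tag == tag)


title = Node("TITLE", {"ID": "TITLE"})
h1 = Node("H1", {"ID": "TITLE"})
creator = Node("P", {"ID": "CREATOR"})
picture = Node("P", {"ID": "PICTURE"})
doc = Node("HTML", children=[
    Node("HEAD", children=[title]),
    Node("BODY", children=[h1, creator, Node("HR"), picture]),
])
```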
MoveParent()
MoveParent(pos)
Summary
Go to the parent node of the current location.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if there is no parent to its current node.
Description
This method moves the cursor to the parent of the current node it is pointing at.
MovePrev()
MovePrev(pos)
Summary
Go to the previous node in this level of the HTML tree.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if there is no node before it.
Description
This method moves the cursor to the previous node that is at the same level as the current node. In other words, it moves to the previous child node of its parent.
MovePrevDepth()
MovePrevDepth(pos)
Summary
Go to the previous node of the tree, in depth-first order.
Arguments
pos
Optional HTMLCursor or HTMLTree at which to position this cursor, before it is moved.
Result
This HTMLCursor in its new position, or None if there is no node before it.
Description
This method moves the cursor to the node that would occur before this one, if a depth-first traversal of the tree were being performed. In other words, this traverses the nodes of the tree in the reverse order that they occurred in the original HTML document.
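In a pre-order walk, a first child is preceded by its parent, and any other node is preceded by the deepest last descendant of its previous sibling. The sketch below applies that rule to a hypothetical `Node` class with parent links; `prev_depth()` is an illustrative name, not HTMLCursor's code, and the tree is the example document with its text nodes omitted.

```python
from typing import List, Optional


class Node:
    """Hypothetical node with a parent pointer (not an ihMarkup type)."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = list(children)
        for child in self.children:
            child.parent = self


def prev_depth(node: Node) -> Optional[Node]:
    """The node MovePrevDepth() would reach: the pre-order predecessor."""
    if node.parent is None:
        return None                     # the root has no predecessor
    siblings = node.parent.children
    i = siblings.index(node)
    if i == 0:
        return node.parent              # a first child comes right after its parent
    prev = siblings[i - 1]
    while prev.children:                # else: the last node inside the previous sibling
        prev = prev.children[-1]
    return prev


doc = Node("HTML",
           Node("HEAD", Node("TITLE")),
           Node("BODY", Node("H1"), Node("P"), Node("HR"), Node("ADDRESS")))

node: Optional[Node] = doc.children[1].children[-1]   # start at <ADDRESS>
order = []
while node is not None:
    order.append(node.name)
    node = prev_depth(node)
print(order)
# ['ADDRESS', 'HR', 'P', 'H1', 'BODY', 'TITLE', 'HEAD', 'HTML']
```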
SetPos()
SetPos(tree)
SetPos(cursor)
Summary
Change the cursor's location.
Arguments
tree
A HTMLTree on which to position the cursor.
cursor
A HTMLCursor at which this cursor should also be positioned.
Result
The same HTMLCursor.
Description
This method directly changes the cursor's current position. If no arguments are supplied, the cursor is not positioned on any tree. The single argument can be one of two types: if it is a HTMLTree, then the cursor is located at the first node of that tree; otherwise, if it is a HTMLCursor, then the cursor is located on the same tree and node as the given cursor.
PasteAfter(tree)
Summary
Insert a HTML tree after the cursor.
Arguments
tree
The new HTMLTree to insert into the existing tree.
Result
The same HTMLCursor.
Example
If the HTML tree that the cursor is located on is:
<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteAfter Example
              <P>       "This is cursor position.
	  

And tree contains the HTML tree:

              <STRONG>  "This is our new text.
	  

Then calling this method, when the cursor is positioned at the <P> tag, will result in the complete HTML tree:

<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteAfter Example
              <P>       "This is cursor position.
              <STRONG>  "This is our new text.
	  
Description
This method inserts a given HTML tree into the tree that this cursor is on, after the node it is located at.
PasteBefore(tree)
Summary
Insert a HTML tree before the cursor.
Arguments
tree
The new HTMLTree to insert into the existing tree.
Result
The same HTMLCursor.
Example
If the HTML tree that the cursor is located on is:
<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteBefore Example
              <P>       "This is cursor position.

And tree contains the HTML tree:

              <STRONG>  "This is our new text.

Then calling this method, when the cursor is positioned at the <P> tag, will result in the complete HTML tree:

<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteBefore Example
              <STRONG>  "This is our new text.
              <P>       "This is cursor position.
	  
Description
This method inserts a given HTML tree into the tree that this cursor is on, before the node it is located at.
PasteHead(tree)
Summary
Insert a HTML tree before the cursor's first child.
Arguments
tree
The new HTMLTree to insert into the existing tree.
Result
The same HTMLCursor.
Example
If the HTML tree that the cursor is located on is:
<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteHead Example
              <P>       "This is cursor position.

And tree contains the HTML tree:

              <STRONG>  "This is our new text.

Then calling this method, when the cursor is positioned at the <P> tag, will result in the complete HTML tree:

<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteHead Example
              <P>       <STRONG>  "This is our new text.
                        "This is cursor position.
	  
Description
This method inserts a given HTML tree into the tree that this cursor is on, placing it as the first child of the node at the cursor's location.
PasteTail(tree)
Summary
Insert a HTML tree after the cursor's last child.
Arguments
tree
The new HTMLTree to insert into the existing tree.
Result
The same HTMLCursor.
Example
If the HTML tree that the cursor is located on is:
<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteTail Example
              <P>       "This is cursor position.

And tree contains the HTML tree:

              <STRONG>  "This is our new text.

Then calling this method, when the cursor is positioned at the <P> tag, will result in the complete HTML tree:

<HTML> <HEAD> <TITLE>   "Example Document
       <BODY> <H1>      "PasteTail Example
              <P>       "This is cursor position.
                        <STRONG>  "This is our new text.
	  
Description
This method inserts a given HTML tree into the tree that this cursor is on, placing it as the last child of the node at the cursor's location.
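In terms of a parent's ordered list of children, the four paste operations differ only in which list they splice into and at what index. The sketch below is a hypothetical model (`Node`, `splice()`, `names()`, and the `paste_*` functions are not ihMarkup names): a plain list of nodes stands in for the HTMLTree being pasted, and the cursor's node is assumed to have a parent for the PasteAfter() and PasteBefore() cases.

```python
from typing import List, Optional


class Node:
    """Hypothetical node with a parent pointer (not an ihMarkup type)."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        splice(self, 0, list(children))


def splice(parent: Node, index: int, nodes: List[Node]) -> None:
    """Insert `nodes` into parent's children at `index`, fixing parent pointers."""
    for n in nodes:
        n.parent = parent
    parent.children[index:index] = nodes


def paste_after(cur: Node, nodes: List[Node]) -> Node:
    splice(cur.parent, cur.parent.children.index(cur) + 1, nodes)  # next siblings
    return cur                          # like the real methods, return the cursor


def paste_before(cur: Node, nodes: List[Node]) -> Node:
    splice(cur.parent, cur.parent.children.index(cur), nodes)      # previous siblings
    return cur


def paste_head(cur: Node, nodes: List[Node]) -> Node:
    splice(cur, 0, nodes)               # new first children
    return cur


def paste_tail(cur: Node, nodes: List[Node]) -> Node:
    splice(cur, len(cur.children), nodes)   # new last children
    return cur


def names(node: Node) -> List[str]:
    return [c.name for c in node.children]


body = Node("BODY", Node("H1"), Node("P"))
paste_after(body.children[1], [Node("STRONG")])
print(names(body))                      # ['H1', 'P', 'STRONG']
```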

Members

CurHTMLTree
Read-only. The actual HTMLTree object that this cursor's node is a part of.
First
Read-only. True if the node at the cursor is the first at its level (i.e., MovePrev() would move off the tree), or False if there are nodes before it.
Last
Read-only. True if the node at the cursor is the last at its level (i.e., MoveNext() would move off the tree), or False if there are nodes after it.
Leaf
Read-only. True if the node at the cursor is at the bottom of the tree (i.e., it does not have any children), or False if there are nodes below it.
NodeAttr
Read-only. An object containing the attributes associated with this node. They can be retrieved as members of the object, e.g.:
link_url = cursor.NodeAttr.HREF
	  

Unlike most Python identifiers, the identifier names in this object are case-insensitive.

NodeText
Read-only. A string representing the text associated with this node, if it is of type ">text".
Root
Read-only. True if the node at the cursor is at the top of the tree (i.e., it does not have a parent node), or False if there are nodes above it.
Tag
Read-only. This node's Tag identifier. See the HTMLTag type for more information.
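The four boolean members First, Last, Leaf, and Root follow directly from a node's parent and children. The sketch below computes them on a hypothetical `Node` class; it is not the HTMLCursor implementation, and it treats a node with no parent as both first and last, which may not match an HTMLTree that has several top-level nodes.

```python
from typing import List, Optional


class Node:
    """Hypothetical node (not an ihMarkup type) showing how the First, Last,
    Leaf, and Root members relate to the tree structure."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = list(children)
        for child in self.children:
            child.parent = self

    @property
    def root(self) -> bool:             # no parent: MoveParent() would return None
        return self.parent is None

    @property
    def leaf(self) -> bool:             # no children: MoveChild() would return None
        return not self.children

    @property
    def first(self) -> bool:            # MovePrev() would move off the tree
        return self.parent is None or self.parent.children[0] is self

    @property
    def last(self) -> bool:             # MoveNext() would move off the tree
        return self.parent is None or self.parent.children[-1] is self


body = Node("BODY", Node("H1"), Node("P"), Node("HR"))
h1, p, hr = body.children
```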

Description

Objects of this type are used to represent specific nodes in an HTML tree, and provide an interface to the operations that can be performed on the tree. A cursor is a mutable type -- moving the cursor involves changing the object's state, rather than creating a new cursor object at the new position.

The editing operations PasteAfter(), PasteBefore(), PasteHead(), and PasteTail() result in one HTMLTree being placed inside of another. This means that moving a cursor will often involve moving between different HTMLTree objects. When this happens, the cursor automatically calls the appropriate trees' HTMLTree.AddCursor() and HTMLTree.RemCursor() methods.
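This bookkeeping can be pictured with a small hypothetical model in which each tree keeps the set of cursors currently on it, and repositioning a cursor removes it from its old tree and adds it to the new one. `Tree`, `Cursor`, `add_cursor()`, `rem_cursor()`, and `set_pos()` are illustrative stand-ins for HTMLTree, HTMLCursor, AddCursor(), RemCursor(), and SetPos(); node positions are left out entirely.

```python
from typing import Optional, Set, Union


class Tree:
    """Hypothetical stand-in for HTMLTree that only tracks its cursors."""

    def __init__(self, name: str):
        self.name = name
        self.cursors: Set["Cursor"] = set()

    def add_cursor(self, cursor: "Cursor") -> None:    # cf. AddCursor()
        self.cursors.add(cursor)

    def rem_cursor(self, cursor: "Cursor") -> None:    # cf. RemCursor()
        self.cursors.discard(cursor)                   # no-op if not present


class Cursor:
    """Hypothetical stand-in for HTMLCursor; node positions are omitted."""

    def __init__(self, pos: Union[Tree, "Cursor", None] = None):
        self.tree: Optional[Tree] = None
        self.set_pos(pos)

    def set_pos(self, pos: Union[Tree, "Cursor", None] = None) -> "Cursor":
        # A cursor argument means "the same tree as that cursor".
        new_tree = pos.tree if isinstance(pos, Cursor) else pos
        if new_tree is not self.tree:
            # Leaving one tree for another: keep both trees' bookkeeping current.
            if self.tree is not None:
                self.tree.rem_cursor(self)
            if new_tree is not None:
                new_tree.add_cursor(self)
            self.tree = new_tree
        return self                                    # like SetPos()


outer, inner = Tree("outer"), Tree("inner")
c = Cursor(outer)
c.set_pos(inner)          # e.g. moving into a pasted-in sub-tree
print(len(outer.cursors), len(inner.cursors))          # 0 1
```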

Note: methods for deleting nodes and for more complex editing operations (e.g., inserting a single node around one or more nodes in the tree) are not currently available from the HTMLCursor.


Exported Classes


There are no global classes defined by this module.



_________.oo_Q_Q_oo.____________________________________________
Dianne Kyra Hackborn <hackbod@angryredplanet.com>
Last modified: Fri Sep 13 06:08:49 PDT 1996

This web page and all material contained herein is Copyright (c) 1997 Dianne Hackborn, unless otherwise noted. All rights reserved.