Binder Data Model

The Binder defines a standard data model, which is used for things such as creating the Binder Namespace, providing data to the BListView widget, etc.

Though the data model is documented as part of the Storage Kit, it actually spans both the Storage Kit and the Binder Kit. The core data model interfaces and client APIs (INode, IIterable, IDatum, ICatalog, ITable, SNode, SIterator, SDatum) are included in the Binder Kit, but all of their implementation is in the Storage Kit.

  1. The Interfaces
    1. The Data Model as a Filesystem
    2. The Data Model as a Database
  2. Data Model Clients
  3. Detailed Topics
    1. Naming Entries
    2. Entries and Identity
    3. Where Are the Files?
    4. Datum Considerations
    5. IDatum or SValue?

The Interfaces

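The basic data model is simply a set of well-defined Binder interfaces, creating a hierarchical namespace of data. The most important interfaces are INode (meta-data about an entry, and walking a path to the entries below it), IIterable (browsing or querying the contents of a container), ICatalog (adding, renaming, and deleting entries), and IDatum (retrieving and placing data at a particular location).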

Note:
Clients will usually use the SNode, SIterator, and SDatum classes instead of calling these interfaces directly. These classes provide the same functionality as the interfaces, in a somewhat more convenient form.

It is important to note that everything in the namespace is an object. (Or more specifically, everything is an IBinder.) This is why there is a specific IDatum interface: it is the protocol through which you retrieve and place data at a particular location in the namespace.

Traversal in the namespace only goes from parent to child. The core data model does not include access from a child back to its parent. The reason for this is to support hard links in the namespace – two or more nodes that share a common child. In this situation the child has multiple parents, so it is not feasible for it to keep track of its parent.
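
The hard-link argument can be made concrete with a small sketch. It uses a hypothetical ToyNode type and std::shared_ptr in place of Binder references; neither is part of the real INode API. One child is placed under two parents, and both walks reach the same object, so there is no single parent the child could point back to.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical toy node (not the Binder INode API): children are reachable
// by name, and a node deliberately stores no pointer back to a parent.
struct ToyNode {
    std::map<std::string, std::shared_ptr<ToyNode> > children;

    // Walks one step from parent to child; returns null if the name is absent.
    std::shared_ptr<ToyNode> Walk(const std::string& name) const {
        auto it = children.find(name);
        if (it == children.end()) return std::shared_ptr<ToyNode>();
        return it->second;
    }
};

// Hard-links one child under two parents and checks that both walks reach
// the very same object, which therefore has no single parent to point to.
inline bool SharedChildIsReachableFromBothParents() {
    std::shared_ptr<ToyNode> shared = std::make_shared<ToyNode>();
    ToyNode a, b;
    a.children["doc"] = shared;
    b.children["alias"] = shared;
    std::shared_ptr<ToyNode> viaA = a.Walk("doc");
    std::shared_ptr<ToyNode> viaB = b.Walk("alias");
    assert(viaA && viaB);
    return viaA == viaB;
}
```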

This does not mean that APIs built on top of the basic data model can't have back-pointers; it just means that they are not a requirement. For example, the entire view hierarchy is visible in the namespace, and through the view hierarchy APIs you can travel back up from a child to its parent. Similarly, some data space objects may implement the IReferable interface to provide a full path back to the object.

There are two major use-cases these interfaces are designed for: filesystems and databases. Instead of discussing the interfaces themselves in abstract terms, we will describe them as concrete mappings to these traditional kinds of structured data.

The Data Model as a Filesystem

A filesystem is a hierarchical organization of directories, where a directory can contain other directories as well as leaf files holding data. In addition, each file or directory has some fixed set of meta-data associated with it, often creation and modification dates and permission flags.

All files and directories in a filesystem implement the INode interface. This is to provide meta-data (such as creation and modification date) about that item, and for directories it is used to walk a path through the directory hierarchy.

A file also implements the IDatum interface, allowing access to that file's data. In particular, the IDatum::Open() method is the equivalent of opening a file, giving you byte stream interfaces through which you can read and write data in the file.

A directory, in contrast, implements IIterable and ICatalog. The IIterable interface lets you browse through the contents of the directory; this is, for example, the API that would be used to implement an "ls" command. The ICatalog interface provides various methods for adding new entries (files or directories) to the directory, renaming entries, and deleting existing entries.
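
As a rough illustration of this mapping, the sketch below models the four roles with hypothetical toy types. ToyEntry, ToyFile, ToyDir, and all of their method signatures are assumptions made for this example; the real Binder interfaces are richer and are not shown here. A file carries meta-data and data, while a directory can be walked, listed, and modified.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the roles described above.
struct ToyEntry {
    virtual ~ToyEntry() {}
    std::string mimeType;   // INode-like meta-data shared by files and directories
};

// A file: meta-data plus a datum-like payload.
struct ToyFile : ToyEntry {
    std::string bytes;
    const std::string& Open() const { return bytes; }   // loosely plays the role of IDatum::Open()
};

// A directory: walkable (INode-like), listable (IIterable-like), editable (ICatalog-like).
struct ToyDir : ToyEntry {
    std::map<std::string, std::shared_ptr<ToyEntry> > entries;

    // Resolves a slash-separated path one component at a time.
    std::shared_ptr<ToyEntry> Walk(const std::string& path) const {
        std::string::size_type slash = path.find('/');
        auto it = entries.find(path.substr(0, slash));
        if (it == entries.end()) return std::shared_ptr<ToyEntry>();
        if (slash == std::string::npos) return it->second;
        std::shared_ptr<ToyDir> sub = std::dynamic_pointer_cast<ToyDir>(it->second);
        if (!sub) return std::shared_ptr<ToyEntry>();
        return sub->Walk(path.substr(slash + 1));
    }

    // Lists entry names, as an "ls" command would.
    std::vector<std::string> List() const {
        std::vector<std::string> names;
        for (const auto& kv : entries) names.push_back(kv.first);
        return names;
    }

    void AddEntry(const std::string& name, std::shared_ptr<ToyEntry> e) { entries[name] = e; }

    bool RenameEntry(const std::string& from, const std::string& to) {
        auto it = entries.find(from);
        if (it == entries.end() || entries.count(to) != 0) return false;
        std::shared_ptr<ToyEntry> e = it->second;
        entries.erase(it);
        entries[to] = e;
        return true;
    }

    bool RemoveEntry(const std::string& name) { return entries.erase(name) == 1; }
};

// Builds /docs/note.txt, renames the directory, then reads the file by path.
inline std::string ExampleReadFile() {
    std::shared_ptr<ToyDir> root = std::make_shared<ToyDir>();
    std::shared_ptr<ToyDir> docs = std::make_shared<ToyDir>();
    std::shared_ptr<ToyFile> note = std::make_shared<ToyFile>();
    note->bytes = "hello";
    docs->AddEntry("note.txt", note);
    root->AddEntry("docs", docs);
    bool renamed = root->RenameEntry("docs", "papers");
    assert(renamed);
    std::shared_ptr<ToyFile> found =
        std::dynamic_pointer_cast<ToyFile>(root->Walk("papers/note.txt"));
    return found ? found->Open() : std::string();
}
```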

The Data Model as a Database

A database is a flat collection of tables. Each table is a two-dimensional structure, consisting of an arbitrary number of rows, each containing a set of data items corresponding to a fixed set of columns defined by the table.

The table of a database revolves around the IIterable interface. This allows you to create queries on the table by calling IIterable::NewIterator() with the appropriate options, such as "select" for the columns to include (the projection), "sort_by" to specify row ordering, and "where" to do filtering. These options correspond fairly directly to the SQL operation that must be done on the underlying database, with the returned IIterator being a cursor on the query results.

For each database row returned by this IIterator, you get back an INode containing data of the table's columns for that row. The INode::Walk() method is used to retrieve the desired column data by name.

Each of the pieces of data associated with a row and column is provided through the IDatum interface. You will usually access that data through IDatum::Value() to retain type information.

Note that at each of these levels in the database other secondary interfaces are often implemented. For example, tables should always try to implement ITable to provide some more efficient mechanisms for modifying and watching their contents. Tables also usually implement INode to be able to associate a MIME type and other meta-data with the table. If there is a known key column associated with the table, the INode may also allow you to Walk() to single rows in the table using that column as a name.

Rows usually implement IIterable for you to be able to step through all of the contents of a row, looking very much like a directory in a filesystem. The IDatums under the row, however, usually don't implement INode because there is no specific meta-data associated with them.
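
To show how the "select", "sort_by", and "where" options fit together, here is a small in-memory sketch. The ToyRow, ToyQuery, and RunQuery names are hypothetical, and "where" is reduced to a single column equality test as a simplifying assumption. It is not the IIterable::NewIterator() API; it only mirrors the filter, order, and project steps that such a query implies.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> ToyRow;   // column name -> value

// Hypothetical query options mirroring "select", "sort_by", and "where".
struct ToyQuery {
    std::vector<std::string> select;   // columns to keep; empty keeps all
    std::string sort_by;               // column to order by; empty keeps table order
    std::string where_column;          // equality filter only (an assumption of this sketch)
    std::string where_value;
};

// Returns the value of one column, or an empty string if the row lacks it.
inline std::string Column(const ToyRow& row, const std::string& name) {
    ToyRow::const_iterator it = row.find(name);
    if (it == row.end()) return std::string();
    return it->second;
}

// Filters, then sorts, then projects; the result plays the role of the cursor.
inline std::vector<ToyRow> RunQuery(const std::vector<ToyRow>& table, const ToyQuery& q) {
    std::vector<ToyRow> rows;
    for (const ToyRow& row : table) {
        if (q.where_column.empty() || Column(row, q.where_column) == q.where_value)
            rows.push_back(row);
    }
    if (!q.sort_by.empty()) {
        const std::string key = q.sort_by;
        std::stable_sort(rows.begin(), rows.end(),
            [key](const ToyRow& a, const ToyRow& b) { return Column(a, key) < Column(b, key); });
    }
    if (!q.select.empty()) {
        for (ToyRow& row : rows) {
            ToyRow projected;
            for (const std::string& name : q.select) {
                ToyRow::const_iterator it = row.find(name);
                if (it != row.end()) projected[name] = it->second;
            }
            row.swap(projected);
        }
    }
    return rows;
}

// Names of everyone in Oslo, ordered by name.
inline std::vector<ToyRow> ExampleQuery() {
    std::vector<ToyRow> people;
    people.push_back(ToyRow{{"name", "Bob"}, {"city", "Oslo"}});
    people.push_back(ToyRow{{"name", "Cy"}, {"city", "Rome"}});
    people.push_back(ToyRow{{"name", "Ann"}, {"city", "Oslo"}});
    ToyQuery q;
    q.select.push_back("name");
    q.sort_by = "name";
    q.where_column = "city";
    q.where_value = "Oslo";
    std::vector<ToyRow> result = RunQuery(people, q);
    assert(result.size() <= people.size());   // filtering never adds rows
    return result;
}
```

Each returned ToyRow corresponds to the INode you would get per row, and looking up a column by name corresponds to walking to that column's datum.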

Data Model Clients

Clients of the data model interfaces should work through the SNode, SIterator, and SDatum classes for ease of use.

One of the most common places you come in contact with the data model interfaces is in the Binder Namespace, usually accessed through an SContext object. While this class has some conveniences for doing common operations on the namespace, you can also use SContext::Root() to directly access the INode that provides the root of its namespace.

Detailed Topics

Naming Entries

The introduction of meta-data raises the question of how you walk to those entries. The meta-data formally exists under each INode as a separate node called ":", so you can ls that node just like any other node:

$ ls img.png/:
mimeType
creationDate
modifiedDate

And you can get to individual pieces of meta-data simply by walking through the attributes node:

$ cat img.png/:/mimeType
vnd.palm.catalog/vnd.palm.plain

As a convenience, we specify that the ':' at the front of a path name is a special identifier for the attributes namespace, so you can also treat it as an entry at the same level as the INode it is associated with:

$ cat img.png/:mimeType
vnd.palm.catalog/vnd.palm.plain

This is how you will normally access attributes, and it is important to allow this so that these attributes can be accessed at the same level as the normal catalog entries. For example, consider accessing different pieces of data in the 'font' service:

$ cat services/font/value
Some text

$ cat services/font/:mimeType
vnd.palm.catalog/vnd.palm.plain

However, operations other than Walk() do not expose the meta-data alongside the normal entries; you must list the ':' node explicitly to see it:

$ ls services/font
value
items

$ ls services/font/:
mimeType
creationDate
modifiedDate
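
The naming rules shown in these transcripts can be sketched as a small path resolver. Everything here is hypothetical: ToyNode, Split, Cat, and Ls are names made up for this example, and attributes are stored as plain strings. The point is only that a walk accepts both "node/:/attr" and "node/:attr", while a listing shows attributes only when the ':' node itself is listed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy node: data, attributes (formally under ":"), and normal entries.
struct ToyNode {
    std::string data;
    std::map<std::string, std::string> attrs;
    std::map<std::string, std::shared_ptr<ToyNode> > entries;
};

// Splits "a/b/c" into its non-empty components.
inline std::vector<std::string> Split(const std::string& path) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type slash = path.find('/', start);
        if (slash == std::string::npos) slash = path.size();
        if (slash > start) parts.push_back(path.substr(start, slash - start));
        start = slash + 1;
    }
    return parts;
}

// Walk-like lookup: fills *out with an entry's data or an attribute's value.
inline bool Cat(const ToyNode& root, const std::string& path, std::string* out) {
    assert(out != nullptr);
    const ToyNode* cur = &root;
    const std::vector<std::string> parts = Split(path);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const std::string& part = parts[i];
        if (part[0] == ':') {
            // ":" alone names the attributes node; ":name" is shorthand for ":/name".
            std::string attr;
            if (part.size() > 1) attr = part.substr(1);
            else if (i + 1 < parts.size()) attr = parts[++i];
            else return false;                          // the ":" node itself has no data here
            if (i + 1 != parts.size()) return false;    // attributes are leaves in this sketch
            auto found = cur->attrs.find(attr);
            if (found == cur->attrs.end()) return false;
            *out = found->second;
            return true;
        }
        auto it = cur->entries.find(part);
        if (it == cur->entries.end()) return false;
        cur = it->second.get();
    }
    *out = cur->data;
    return true;
}

// Listing: shows normal entries only, unless the path ends in the ":" node.
inline std::vector<std::string> Ls(const ToyNode& root, const std::string& path) {
    std::vector<std::string> names;
    const ToyNode* cur = &root;
    const std::vector<std::string> parts = Split(path);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (parts[i] == ":") {
            if (i + 1 != parts.size()) return names;
            for (const auto& kv : cur->attrs) names.push_back(kv.first);
            return names;
        }
        auto it = cur->entries.find(parts[i]);
        if (it == cur->entries.end()) return names;
        cur = it->second.get();
    }
    for (const auto& kv : cur->entries) names.push_back(kv.first);
    return names;
}

// Builds services/font with one attribute and two entries, as in the transcripts.
inline ToyNode BuildExampleTree() {
    std::shared_ptr<ToyNode> value = std::make_shared<ToyNode>();
    value->data = "Some text";
    std::shared_ptr<ToyNode> font = std::make_shared<ToyNode>();
    font->attrs["mimeType"] = "vnd.palm.catalog/vnd.palm.plain";
    font->entries["value"] = value;
    font->entries["items"] = std::make_shared<ToyNode>();
    std::shared_ptr<ToyNode> services = std::make_shared<ToyNode>();
    services->entries["font"] = font;
    ToyNode root;
    root.entries["services"] = services;
    return root;
}
```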

Entries and Identity

The things inside of a node are called entries. Each entry consists of one or more Binder interfaces, representing the capabilities of that entry. The node will pick one of these Binder interfaces to be the identity of the entry. When the node returns that entry, it returns the IBinder for that selected identity interface.

Data model objects must obey the rule of identity persistence. This says that if a client requests an entry and keeps a strong reference on it, then later requests the same entry again, the IBinder of the first and second requests will be the same. If, however, the entry in the node has changed between the first and second requests, then the interface must return a different IBinder for that entry. (If the data inside the entry changes, it is still the same entry, and thus the identity must persist.) Of course, if a client releases all references on an entry, what it will receive next time is completely undefined (it must be this way since it is undefined by the underlying Binder object model as well).

It is important to state this rule explicitly because many data model implementations will manage their entries in special ways. Consider, for example, a directory on a file system. In general there will not be one object created up-front for every entry in the underlying file system directory. Instead, the directory node will construct objects for individual entries in the file system on demand, destroying them later when all clients have finished using them. A node is free to manage entries however it wants – creating and destroying actual INode and IDatum objects as needed – as long as it publicly obeys entry identity persistence.

One other implication to be aware of is that holding a weak pointer on a namespace object may not behave as you expect. For example, trying to promote that to a strong pointer may fail even though an attempt to re-retrieve the same object would still succeed. This is to allow node implementations to dynamically construct objects on demand, meaning those constructed objects may be destroyed after all strong references on them are released.
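
Both the identity rule and the weak pointer caveat can be demonstrated with a minimal sketch. It assumes std::shared_ptr and std::weak_ptr as stand-ins for Binder strong and weak references, and uses a hypothetical ToyDirNode that builds its entry objects on demand and remembers them only weakly. This is one possible way for a node to obey identity persistence, not a description of how any real node is implemented.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct ToyEntry {
    std::string name;
};

// Hypothetical directory node: entry objects are created on demand and are
// tracked only through weak references, so they die when clients let go.
class ToyDirNode {
public:
    std::shared_ptr<ToyEntry> Walk(const std::string& name) {
        std::weak_ptr<ToyEntry>& slot = cache_[name];
        std::shared_ptr<ToyEntry> live = slot.lock();
        if (!live) {
            // No client holds this entry right now, so build a fresh object.
            live = std::make_shared<ToyEntry>();
            live->name = name;
            slot = live;
        }
        return live;
    }

private:
    std::map<std::string, std::weak_ptr<ToyEntry> > cache_;
};

// While a strong reference is held, repeated requests return the same object.
inline bool IdentityPersistsWhileHeld() {
    ToyDirNode dir;
    std::shared_ptr<ToyEntry> first = dir.Walk("img.png");
    std::shared_ptr<ToyEntry> second = dir.Walk("img.png");
    assert(first);
    return first == second;
}

// A weak reference alone does not keep the entry alive: promotion fails,
// yet asking the node again still succeeds (with a newly built object).
inline bool WeakPromotionCanFail() {
    ToyDirNode dir;
    std::weak_ptr<ToyEntry> weak = dir.Walk("img.png");   // the strong temporary is released here
    bool promoteFailed = (weak.lock() == nullptr);
    bool refetchWorks = (dir.Walk("img.png") != nullptr);
    return promoteFailed && refetchWorks;
}
```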

Where Are the Files?

A question that comes up in this namespace is "how do I know whether some entry is a file?" Because it is possible to have entries that are both a datum and a node, it isn't at all clear how you know whether a particular entry you are looking at should be considered a file. A good example to make this more concrete is implementing a file browser: if you are showing the user a list of entries in a node and they click on an entry that is both a datum and a node, do you open the datum or dig down into the node?

For this purpose our definition of a file is "any entry that is a datum". Thus you will always open an entry if it allows that. This definition implies that datums will tend to appear toward the leaves of the namespace; a datum that appears closer to the root of the namespace will tend to hide any node beyond it.
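
For the file browser example, this rule reduces to a simple precedence check. The sketch below uses hypothetical capability flags (ToyEntryCaps) in place of probing an entry's actual Binder interfaces.

```cpp
#include <cassert>

// Hypothetical capability flags for an entry; a real client would find out
// which interfaces the entry provides instead.
struct ToyEntryCaps {
    bool isDatum;
    bool isNode;
};

enum ToyAction { kOpenData, kDescend, kNothing };

// Applies the rule "any entry that is a datum is a file": the datum wins
// whenever it is present, even if the entry is also a node.
inline ToyAction ActionOnClick(const ToyEntryCaps& caps) {
    if (caps.isDatum) return kOpenData;
    if (caps.isNode) return kDescend;
    return kNothing;
}

// An entry that is both a datum and a node is opened, not entered.
inline void ExampleClicks() {
    ToyEntryCaps both = {true, true};
    ToyEntryCaps nodeOnly = {false, true};
    assert(ActionOnClick(both) == kOpenData);
    assert(ActionOnClick(nodeOnly) == kDescend);
}
```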

At some point in the future we may introduce facilities to map actions to mimeTypes, so that you could retrieve the mimeType of the object you have and determine from that whether you should open its data or dig further into its node.

Datum Considerations

It is very important that, in the formal data model, every piece of data in the namespace has an object behind it – this object serves as the identity of that data, through which you can grant access to other entities in the system, perform monitoring operations, etc. Simply enforcing this as the only way to access data, however, would have a significant performance impact: every access to a data entry would involve transferring a new object to the client, followed by a second IPC to retrieve the data.

To address this, a client may ask an INode or IIterable to directly return the contents of an entry's IDatum as an SValue, skipping the intermediate IDatum object altogether. This facility makes it much more practical to interact with small pieces of data (file attributes, individual items in schema databases, data in the settings catalog, etc.) through the standard namespace.

A node is not required to provide this direct data mechanism – any client making use of it must be able to deal with receipt of an IDatum object and retrieving the actual data through that. An individual node may even have different behavior for each entry – for example, a file system may allow you to directly retrieve the data of files less than 512 bytes, but always return datums for files larger than that. This direct data access is purely an optimization hint that the client makes at the time of the request.

The SDatum::FetchValue() and SDatum::FetchTruncatedValue() methods are very important conveniences for clients wanting to make use of this optimization.
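
The client-side obligation described above, being ready for either a direct value or a datum object, can be sketched as follows. ToyDatum, ToyReply, ToyFsNode, and FetchEither are hypothetical names, the wantValue flag is an assumed way of expressing the hint, and the 512-byte threshold simply reuses the example from the text. This is not how SDatum::FetchValue() is implemented; it only shows the fallback logic any such client needs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical datum object; Value() stands in for the second round trip.
struct ToyDatum {
    std::string bytes;
    std::string Value() const { return bytes; }
};

// Hypothetical reply: either the data itself or the object behind it.
struct ToyReply {
    bool hasValue;                       // true when the data came back directly
    std::string value;
    std::shared_ptr<ToyDatum> datum;     // set when the node returned an object instead
    ToyReply() : hasValue(false) {}
};

// Hypothetical file system node that honors the hint only for small files.
class ToyFsNode {
public:
    void Add(const std::string& name, const std::string& bytes) {
        std::shared_ptr<ToyDatum> d = std::make_shared<ToyDatum>();
        d->bytes = bytes;
        files_[name] = d;
    }

    // wantValue is only a hint; the node decides per entry whether to follow it.
    ToyReply Walk(const std::string& name, bool wantValue) const {
        ToyReply reply;
        auto it = files_.find(name);
        if (it == files_.end()) return reply;
        if (wantValue && it->second->bytes.size() < 512) {
            reply.hasValue = true;
            reply.value = it->second->bytes;
        } else {
            reply.datum = it->second;
        }
        return reply;
    }

private:
    std::map<std::string, std::shared_ptr<ToyDatum> > files_;
};

// Client helper: asks for the value directly, but copes with getting a datum.
inline bool FetchEither(const ToyFsNode& node, const std::string& name, std::string* out) {
    assert(out != nullptr);
    ToyReply reply = node.Walk(name, true);
    if (reply.hasValue) { *out = reply.value; return true; }
    if (reply.datum) { *out = reply.datum->Value(); return true; }
    return false;   // no such entry
}
```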

A similar facility is available for dealing with nodes. When iterating you can request that sub-nodes be collapsed into value mappings in the returned iterator. This is a very useful optimization, for example, when using a node to populate a list view, where the list view needs to know certain entries from each sub-node to populate the data in its rows. As with datums, this optimization is entirely optional and the caller needs to deal gracefully with situations when it doesn't happen.

The SNode class provides conveniences for dealing with node collapsing.

IDatum or SValue?

At their core, the IDatum and SValue express a similar concept: a typed piece of data. Ignoring the slightly different coating, what is the difference between these? Do we really need both of them?

The key thing to understand about these two APIs is their semantic usage. An SValue is an anonymous blob of data. You generally pass it by value (each thing effectively has its own copy of the data), and there is no identity attached to an SValue beyond the very primitive concept of a C++ "pointer to an object".

An IDatum, in contrast, blatantly carries an identity. It is a Binder object, meaning that it has a very well-defined identity that can be carried across processes. Because of this, it is always passed by reference – if you give an IDatum to someone else and they modify what they got, you will see those changes as well. This implies that the datum API must provide reasonable support for multithreaded accesses, whereas the SValue API is fundamentally not thread-safe.

Another way to look at this is that an SValue is a raw piece of data, and an IDatum wraps up an SValue in a Binder interface (plus other facilities more appropriate for dealing with larger streams of data).
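
The by-value versus by-reference distinction can be illustrated with two hypothetical types: ToyValue, a plain copyable struct standing in for SValue, and ToyDatum, a shared object standing in for IDatum. The mutex is an assumption about how "reasonable support for multithreaded accesses" might look in a sketch; the real classes are not shown here.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// SValue-like: plain data, copied on assignment, no shared identity.
struct ToyValue {
    std::string text;
};

// IDatum-like: one object with identity that wraps a value, is shared by
// reference, and guards its contents for concurrent callers.
class ToyDatum {
public:
    explicit ToyDatum(const ToyValue& v) : value_(v) {}

    ToyValue Value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;   // callers receive their own copy
    }

    void SetValue(const ToyValue& v) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = v;
    }

private:
    mutable std::mutex mutex_;
    ToyValue value_;
};

// Changing a copy of a value leaves the original untouched.
inline bool CopyIsIndependent() {
    ToyValue mine;
    mine.text = "a";
    ToyValue theirs = mine;
    theirs.text = "b";
    return mine.text == "a";
}

// Changing a datum through one reference is visible through every other.
inline bool ReferenceIsShared() {
    std::shared_ptr<ToyDatum> mine = std::make_shared<ToyDatum>(ToyValue{"a"});
    std::shared_ptr<ToyDatum> theirs = mine;
    assert(mine == theirs);   // same identity
    ToyValue changed;
    changed.text = "b";
    theirs->SetValue(changed);
    return mine->Value().text == "b";
}
```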