ralphm.net

ralphm's blog

Wednesday, 30 July 2008

XMPP and Social Networks, part 2: Nodes

How to organize the nodes that can be subscribed to and what identifiers to use for them.

In part 1 I wrote about what you can subscribe to and how a social network service will send out notifications. I often used the word node for the thing you subscribe to, a term that comes directly from the XMPP Publish-Subscribe specification (XEP-0060). In other publish-subscribe implementations this is often referred to as a topic. Nodes are kept by a publish-subscribe service, and, among other things, this service is responsible for keeping the list of subscribers and sending out notifications.
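To make that division of labour concrete, here is a minimal in-memory sketch of what a publish-subscribe service does with its nodes: it keeps the subscriber list per node and fans out a notification to every subscriber when an item is published. The class and method names are made up for illustration; a real XMPP service would send message stanzas instead of appending to a list.

```python
from collections import defaultdict


class PubSubService:
    """Toy in-memory publish-subscribe service (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # node id -> set of subscriber JIDs
        self.outbox = []  # (recipient JID, node id, payload) tuples "sent" so far

    def subscribe(self, node, jid):
        self.subscribers[node].add(jid)

    def unsubscribe(self, node, jid):
        self.subscribers[node].discard(jid)

    def publish(self, node, payload):
        # The service, not the publisher, delivers a notification to each subscriber.
        for jid in sorted(self.subscribers[node]):
            self.outbox.append((jid, node, payload))


service = PubSubService()
service.subscribe("updates", "alice@example.org")
service.subscribe("updates", "bob@example.net")
service.publish("updates", "<entry>Hello</entry>")
print(service.outbox)
```

The publisher only talks to the service once per item; how many subscribers there are is the service's concern.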

Publish-subscribe services currently come in two forms: dedicated publish-subscribe services with their own domain (e.g. pubsub.ik.nu) and publish-subscribe services tied to a user account (often mentioned in combination with the Personal Eventing Protocol, also known as PEP). In the latter case, nodes are kept at the bare JID of a user's account (e.g. ralphm@ik.nu). Personal pubsub nodes have nice properties, such as the ability to directly associate a particular node with a person, and the possibility of basing access control on the user's contact list (roster).
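From a subscriber's point of view the two forms differ mainly in where the request is addressed. The sketch below builds an XEP-0060 subscription request with Python's standard ElementTree; the same stanza goes either to a dedicated service or to a user's bare JID. It also includes a roster-based check in the spirit of the XEP-0060 'presence' access model. The node names and the roster-as-dictionary representation are hypothetical.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def bare_jid(jid):
    """Strip the resource: 'alice@example.org/home' -> 'alice@example.org'."""
    return jid.split("/", 1)[0]


def subscribe_request(service_jid, node, subscriber, stanza_id="sub1"):
    """Build an XEP-0060 subscription request as an XML string.

    service_jid is either a dedicated service (e.g. pubsub.ik.nu) or, for
    personal (PEP) nodes, the bare JID of the account holding the node.
    """
    iq = ET.Element("iq", {"type": "set", "from": subscriber,
                           "to": service_jid, "id": stanza_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    ET.SubElement(pubsub, "subscribe", {"node": node, "jid": bare_jid(subscriber)})
    return ET.tostring(iq, encoding="unicode")


def may_subscribe(owner_roster, requester):
    """Roster-based access check (hypothetical helper): allow the requester if
    the owner's roster has a presence subscription of 'from' or 'both' for the
    requester's bare JID. owner_roster maps bare JID -> subscription state."""
    return owner_roster.get(bare_jid(requester)) in ("from", "both")


# Same request shape, two kinds of service address; node names are hypothetical.
print(subscribe_request("pubsub.ik.nu", "some-topic", "alice@example.org/home"))
print(subscribe_request("ralphm@ik.nu", "some-personal-node", "alice@example.org/home"))
```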

Node organization

In the context of federating social networks, a service needs to decide where to put the nodes that other entities can subscribe to, and from which it sends out notifications. In some cases it makes sense to keep nodes at user accounts, while in other cases it is better to provide the nodes at the domain of the service itself. This depends on the nature of the social objects and the subscribable units you provide. Let's explore some use cases.

Jaiku

In Jaiku, social objects (microblog posts and aggregated items like photos, bookmarks, etc.) are organized in streams. Streams are tied to either a user or a channel, and don't change ownership. The social objects themselves are static: once created, they cannot be edited. They can have comments associated with them, but those cannot be edited either. The only thing that can happen to streams, stream items, and comments is deletion.
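Because items never change after creation, such a stream maps onto a node that only ever sees two operations: publishing a new item and retracting (deleting) an existing one. XEP-0060 does define item retraction; the class below is only a hypothetical sketch of that restricted behaviour, not Jaiku's code.

```python
class ImmutableStream:
    """Hypothetical model of a Jaiku-style stream node: each item is
    published exactly once and can afterwards only be retracted."""

    def __init__(self):
        self.items = {}          # item id -> payload
        self.notifications = []  # ("publish" | "retract", item id) events

    def publish(self, item_id, payload):
        if item_id in self.items:
            # Re-publishing an existing id would amount to an edit,
            # which this kind of social object does not allow.
            raise ValueError(f"item {item_id!r} already exists and cannot be edited")
        self.items[item_id] = payload
        self.notifications.append(("publish", item_id))

    def retract(self, item_id):
        # Deletion is the only other operation; raises KeyError for unknown ids.
        del self.items[item_id]
        self.notifications.append(("retract", item_id))


stream = ImmutableStream()
stream.publish("123456", "Reading about XMPP nodes")
try:
    stream.publish("123456", "An edited version")
except ValueError as err:
    print(err)
stream.retract("123456")
print(stream.notifications)  # [('publish', '123456'), ('retract', '123456')]
```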

Here, it makes sense to have a node for each stream, and possibly a node for the comments on each stream item. Those can be tied to the owner's JID (e.g. ralphm@jaiku.com or #jabber@jaiku.com). Another possible node is one holding all comments by a person. Yet another node an entity might want to subscribe to is one with all public microblog posts. Such a node would be associated with the domain of the service rather than with any particular user's JID.
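One consequence of this layout is that a single social object may be published to several nodes at once. The sketch below illustrates that fan-out under stated assumptions: nodes are addressed as (JID, node identifier) pairs, it borrows the identifiers presence, presence/<post id> and explore suggested in the naming discussion later in this post, and comments (all comments by a person) is a hypothetical name.

```python
from collections import defaultdict

SERVICE_JID = "jaiku.com"


class StreamFanOut:
    """Hypothetical sketch: publish each social object to every node it belongs to."""

    def __init__(self):
        self.nodes = defaultdict(list)  # (JID, node id) -> published items

    def post(self, owner, post_id, text, public=True):
        item = {"id": post_id, "author": owner, "text": text}
        self.nodes[(owner, "presence")].append(item)           # the owner's stream
        if public:
            self.nodes[(SERVICE_JID, "explore")].append(item)  # all public posts

    def comment(self, author, post_owner, post_id, text):
        item = {"on": post_id, "author": author, "text": text}
        self.nodes[(post_owner, f"presence/{post_id}")].append(item)  # one post's thread
        self.nodes[(author, "comments")].append(item)                 # all comments by author


fan = StreamFanOut()
fan.post("ralphm@jaiku.com", "123456", "Writing about nodes")
fan.comment("alice@jaiku.com", "ralphm@jaiku.com", "123456", "Nice!")
for address in sorted(fan.nodes):
    print(address)
```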

anyMeta

The company I work for, Mediamatic Lab, has a (proprietary) CMS called anyMeta. Instead of 'content', the C in CMS here stands for Community, to highlight the social network features it provides. anyMeta is a highly semantic system that deals in things (a person, an article, an event, a blog) and edges (the relations between things, each with a predicate like friend-of, author-of, etc.). I mainly work on federating instances of anyMeta.

Things in anyMeta are usually editable, so it makes sense to want to stay informed about changes. For example, an article can go through a large number of edits, and a person might move, change employers or make other changes to his profile. Thus, we chose, at a minimum, to make each thing a subscribable unit. Upon creating a thing, a new node is created, and a representation of the thing is published to the node. Editing a thing results in subsequent publishes. Subscribers receive notifications as the node gets published to.
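The create-then-edit lifecycle can be sketched as follows. This is a hypothetical model, not anyMeta's implementation: the class name and the node-<n> naming are made up, and the representation is a plain dictionary standing in for whatever payload the real system publishes.

```python
import itertools
from collections import defaultdict


class ThingPublisher:
    """Hypothetical sketch: one publish-subscribe node per editable thing."""

    def __init__(self):
        self.node_of_thing = {}              # thing id -> node id
        self.items = defaultdict(list)       # node id -> published representations
        self.subscribers = defaultdict(set)  # node id -> subscriber JIDs
        self.notifications = []              # (subscriber, node id, representation)
        self._counter = itertools.count(1)

    def create_thing(self, thing_id, representation):
        # Creating a thing creates its node and publishes the first version.
        node = f"node-{next(self._counter)}"  # hypothetical node naming
        self.node_of_thing[thing_id] = node
        self._publish(node, representation)
        return node

    def edit_thing(self, thing_id, representation):
        # Every edit is simply another publish to the same node.
        self._publish(self.node_of_thing[thing_id], representation)

    def subscribe(self, node, jid):
        self.subscribers[node].add(jid)

    def _publish(self, node, representation):
        self.items[node].append(representation)
        for jid in sorted(self.subscribers[node]):
            self.notifications.append((jid, node, representation))


things = ThingPublisher()
node = things.create_thing("person-1", {"name": "Ralph"})
things.subscribe(node, "alice@example.org")
things.edit_thing("person-1", {"name": "Ralph", "employer": "Mediamatic Lab"})
print(things.notifications)
```

Since the subscription happens after creation, only the edit produces a notification here.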

We organized the nodes in a flat namespace, tied to a domain, rather than a user. One reason is that the owner of any particular thing might change. Tying a node to the first owner, and then needing to move it when the owner changes, is cumbersome.
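The benefit shows up in a small model. In this hypothetical sketch, a node's address is the pair (service JID, node identifier) and the owner is stored alongside it as metadata, so a change of ownership touches one field and leaves the address and the subscriptions alone. Had the node lived at the owner's JID, the same change would mean creating a node elsewhere and migrating every subscriber.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    owner: str
    subscribers: set = field(default_factory=set)


class FlatNodeNamespace:
    """Hypothetical sketch: all nodes live at the service's own JID, and the
    owner is plain metadata rather than part of the node's address."""

    def __init__(self, service_jid):
        self.service_jid = service_jid
        self.nodes = {}  # node id -> NodeRecord

    def create(self, node_id, owner):
        self.nodes[node_id] = NodeRecord(owner)
        return (self.service_jid, node_id)  # the address subscribers use

    def change_owner(self, node_id, new_owner):
        # Only metadata changes: the address and the subscriptions stay as they are.
        self.nodes[node_id].owner = new_owner


space = FlatNodeNamespace("pubsub.example.org")
address = space.create("thing-42", "alice@example.org")
space.nodes["thing-42"].subscribers.add("bob@example.net")
space.change_owner("thing-42", "carol@example.com")
print(address, space.nodes["thing-42"])
```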

Node naming

Each node has an identifier that is unique within the publish-subscribe service holding it. Since each user's account acts as its own publish-subscribe service, you could have two nodes named updates tied to two different users. Node identifiers are opaque; one should not derive meaning from how a node identifier looks. Embedded slashes might suggest some hierarchy, for example, but an application should not assume that such a hierarchy actually exists.
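In code, this means a node is only fully identified by the pair of service JID and node identifier, and the identifier is compared as a whole string. A minimal sketch (the JIDs are hypothetical; a real implementation would also normalize JIDs before comparing them, which this sketch does not do):

```python
from typing import NamedTuple


class NodeAddress(NamedTuple):
    """A node is identified by the JID of the publish-subscribe service that
    holds it plus a node identifier that is unique only within that service."""
    service: str
    node: str


# Two different nodes that happen to share the identifier "updates".
alice_updates = NodeAddress("alice@example.org", "updates")
bob_updates = NodeAddress("bob@example.org", "updates")
assert alice_updates != bob_updates

# Subscriptions are keyed on the full address; the node identifier is never
# split on "/" to infer a parent or child relationship.
subscriptions = {alice_updates: "subscribed", bob_updates: "pending"}


def is_known(address):
    return address in subscriptions


print(is_known(NodeAddress("alice@example.org", "updates")))    # True
print(is_known(NodeAddress("alice@example.org", "updates/1")))  # False
```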

That said, it makes perfect sense to use logical, human readable identifiers for nodes. They might even be very similar to the URI layout of the service's web site. Let's check what one could do for the examples given above.

Jaiku

For the regular posts (which Jaiku calls presence), it makes sense to use presence as the node identifier, and presence/123456 for the nodes of individual posts (with their comments), where the number is the same as the one used in the web page for that post. Both could be tied to a JID representing me at Jaiku: ralphm@jaiku.com.

The node for all public posts could be called explore and located at the JID of the whole service: jaiku.com. This would be similar to the web site, where all public posts can be viewed at http://jaiku.com/explore.
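On the providing side, a naming scheme like this can be generated mechanically. The helpers below are a hypothetical sketch of that scheme, returning (JID, node identifier) pairs; they are not an actual Jaiku API. The structure is the provider's choice, and a consuming application should still treat the resulting identifiers as opaque.

```python
SERVICE_JID = "jaiku.com"


def user_jid(username):
    return f"{username}@{SERVICE_JID}"


def stream_node(username):
    """The user's regular posts ('presence'), held at the user's JID."""
    return (user_jid(username), "presence")


def post_node(username, post_id):
    """A single post with its comments, reusing the number from the post's web page."""
    return (user_jid(username), f"presence/{post_id}")


def explore_node():
    """All public posts, held at the service's own JID (cf. http://jaiku.com/explore)."""
    return (SERVICE_JID, "explore")


print(stream_node("ralphm"))        # ('ralphm@jaiku.com', 'presence')
print(post_node("ralphm", 123456))  # ('ralphm@jaiku.com', 'presence/123456')
print(explore_node())               # ('jaiku.com', 'explore')
```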

It might also make sense to have a dedicated node for a user's profile information, which can be retrieved and presented by a service or application that consumes the social object updates. At least a (full) name and an icon or headshot would be nice to have there. Obviously, subscribing to such a node means that future profile changes will also propagate to the consuming entities. An example identifier would be profile, kept at the user's JID.
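On the consuming side, keeping profile information current then amounts to handling notifications from the profile node. The sketch below parses an XEP-0060 event notification and caches the name and headshot it carries. The payload format (a urn:example:profile namespace with name and photo elements) is a hypothetical placeholder, since no profile format is specified here, and the stanza omits the jabber:client stream namespace for simplicity.

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://jabber.org/protocol/pubsub#event"
PROFILE_NS = "urn:example:profile"  # hypothetical payload namespace

profiles = {}  # publisher JID -> latest known profile


def handle_notification(stanza):
    """Update the cached profile from a notification; return True if handled."""
    message = ET.fromstring(stanza)
    items = message.find(f"{{{EVENT_NS}}}event/{{{EVENT_NS}}}items")
    if items is None or items.get("node") != "profile":
        return False
    for item in items.findall(f"{{{EVENT_NS}}}item"):
        profile = item.find(f"{{{PROFILE_NS}}}profile")
        if profile is not None:
            # A later notification simply overwrites the earlier profile.
            profiles[message.get("from")] = {
                "name": profile.findtext(f"{{{PROFILE_NS}}}name"),
                "photo": profile.findtext(f"{{{PROFILE_NS}}}photo"),
            }
    return True


notification = """<message from='ralphm@jaiku.com' to='alice@example.org'>
  <event xmlns='http://jabber.org/protocol/pubsub#event'>
    <items node='profile'>
      <item id='current'>
        <profile xmlns='urn:example:profile'>
          <name>Ralph Meijer</name>
          <photo>http://example.org/headshot.png</photo>
        </profile>
      </item>
    </items>
  </event>
</message>"""

handle_notification(notification)
print(profiles["ralphm@jaiku.com"]["name"])  # Ralph Meijer
```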

anyMeta

In anyMeta, each thing has an identifier that could be used for the node identifier as well. However, in the current implementation, all nodes are held by a loosely coupled, generic publish-subscribe service that caters to multiple anyMeta instances. We chose to use unique identifiers generated by the publish-subscribe service, which bear no relation to the thing identifier.
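A small sketch of that arrangement, under assumptions: the class names are hypothetical, and the generic/ prefix followed by a UUID is modelled on the example node identifier quoted in the next paragraph, not on Idavoll's actual code. It also shows one property of service-generated identifiers when several instances share a service: two instances can each have a thing with the same local identifier without their nodes colliding.

```python
import uuid


class GenericPubSubService:
    """Hypothetical sketch of a shared service that generates its own node ids."""

    def __init__(self):
        self.nodes = set()

    def create_node(self):
        # Format modelled on the example identifier: "generic/" + a random UUID.
        node_id = f"generic/{uuid.uuid4()}"
        self.nodes.add(node_id)
        return node_id


class CmsInstance:
    """One of several CMS instances sharing the service; it remembers which
    generated node belongs to which of its own things."""

    def __init__(self, service):
        self.service = service
        self.node_of_thing = {}  # local thing id -> generated node id

    def node_for_thing(self, thing_id):
        if thing_id not in self.node_of_thing:
            self.node_of_thing[thing_id] = self.service.create_node()
        return self.node_of_thing[thing_id]


shared = GenericPubSubService()
site_a, site_b = CmsInstance(shared), CmsInstance(shared)
# Both instances have a thing with local id 1, yet they get distinct nodes.
print(site_a.node_for_thing(1))
print(site_b.node_for_thing(1))
```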

As you might have guessed, some of what is discussed here has already been implemented in anyMeta. The publish-subscribe service used is Idavoll. It has grown an HTTP interface that is used (internally) to create new nodes, publish items that represent things, and subscribe to, and receive notifications from, remote publish-subscribe nodes. The thing that holds my Mediamatic profile is represented by the node generic/4efe2253-2242-4e01-bfdf-957cc2a9481d at pubsub.mediamatic.nl. All things on this site, as well as on the PICNIC site, have nodes like this. In a future post I will explore what we do with these nodes.

In this part, we explored how one could organize the nodes that entities can subscribe to in order to get updates. Some might be tied to the (virtual) JID of the user's account, others associated with the JID of the service itself. Node identifiers might be human-guessable and resemble the web URIs, or could be seemingly random opaque strings. Implementations that subscribe to, and consume notifications from, the nodes at social networking services should not assume anything about the organization and naming of the providing service. This presents a challenge for the next episode: how does one know which nodes are there and what they are called? So, up next: discovery. Homework assignment: look carefully at the HTML of my Mediamatic profile page.