PubSubHubbub is a protocol and reference implementation for doing publish-subscribe using web hooks. A hub fetches feeds when triggered by a ping from the publisher, and POSTs Atom entries to notify subscribers. The notification part is similar to what I've been working on for the publish-subscribe stuff at Mediamatic Lab, where we spiced up Idavoll with an HTTP interface to bridge the gap between XMPP Publish-Subscribe and HTTP-speaking entities.
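The publisher's side of this is small. In the PubSubHubbub specification, a publisher pings the hub with a form-encoded POST that carries `hub.mode=publish` and `hub.url` set to the feed that changed. The hub then fetches that feed itself. Below is a minimal Python sketch. The hub and feed URLs are hypothetical, and the function only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoints, for illustration only.
HUB_URL = "http://hub.example.org/"
FEED_URL = "http://example.org/feed.atom"


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) a PubSubHubbub publish ping for feed_url."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),  # a body makes urllib use POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_publish_ping(HUB_URL, FEED_URL)
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

To actually send the ping, pass the request to `urllib.request.urlopen`. Note that the ping carries no content. The hub pulls the feed afterwards, and that pull is the step an XMPP-aware hub could replace.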
Although I spend a lot of time working on XMPP-based publish-subscribe, I understand the reasons for going with a fully HTTP-based approach. XMPP can be intimidating for developers of web applications. While the differences between XMPP and HTTP are important (stateful connections, asynchronous processing, etc.), the mere fact that it is different is often reason enough. Hosting facilities don't always offer ways to do XMPP, and there is not nearly enough running code out there to make it easy for people to play with these technologies and spice up their web applications with non-IM XMPP functionality. Platforms like Google App Engine offering the sending and handling of raw XMPP stanzas as part of their API would surely help.
That said, PubSubHubbub has two separate sides to it: the publishing part and the notification part. There's nothing that prevents a hub from doing the publishing part using regular XMPP publish-subscribe. Instead of fetching the Atom feed over HTTP every time, it could use autodiscovery to find the corresponding publish-subscribe node and upgrade by subscribing to it instead. Similarly, the notification part could send out XMPP notifications. Combined with an existing HTTP aggregator, that setup is very similar to how the aggregator for Mimír works.
I'm still not convinced that PubSubHubbub is the answer to the efficient exchange of updates on social objects, but I do think it is a good way for smaller entities to become part of a federation of social networking sites. To begin with, we'll likely see a hybrid approach.
Last month I was fortunate enough to attend Social Web FooCamp at O'Reilly HQ in Sebastopol, CA, a follow-up to Social Graph FooCamp in 2008. I can't express how inspiring such events are: being able to have a continuous, in-depth conversation with so many bright minds about so many of the topics that keep you busy on regular days, and more. I'll give a quick overview of the whole trip, and then go into depth in a series of posts.
My trip started with a visit to friend and former Jaiku colleague Andy Smith, who was kind enough to take me in at Houseku. As soon as I landed at SFO, I got an SMS from him asking me to make a detour to his office. Besides meeting a bunch of Andy's fellow Googlers, I got to spend some time with Brett Slatkin talking about PubSubHubbub.
The next day I got a ride to Sebastopol from Edwin Aoki. After a trip full of interesting conversation, we arrived at the O'Reilly offices. Sebastopol was a lot warmer than San Francisco, perfect for camping. There were lots of familiar faces, but also a lot of new ones. On Friday evening, apart from the general introduction, I didn't get to any sessions, but instead spent the time talking to a bunch of people about XMPP, Publish-Subscribe and the work I am doing on federating social networks under the name Open-CI at Mediamatic Lab.
The next two days were filled with sessions and hallway talk on OpenID, OAuth, different approaches to Publish-Subscribe and inter-site communication, resource and service discovery, and service scalability. While most of the topics were similar to last year's, I was glad to share what we've done at Mediamatic Lab over the past year, while learning how others have fared. We used these technologies to build a true federation of social networking sites, where you can make cross-site relations between people and their social objects. Some of our discoveries were shared by other participants, while others had interesting alternative approaches.
Especially interesting to me was a session on OAuth and OpenID, where I could explain how we tried to improve the user experience. Both technologies have a bad reputation in this area. With some smart defaults and trust between sites, we could eliminate some of the screens. There was talk about using pop-ups in some situations, either as lightboxes or as new (small) windows. In our experience, the former can't be used if you want to do SSL, because the user can't validate the address and certificate. The latter was deemed confusing in our user tests. Research is still ongoing, I suppose. The other issue had to do with presenting OpenID providers. We currently use a drop-down, but that doesn't scale very nicely. Logos might work, but in the end they have the same issue.
I also got to show Blaine Cook the code I wrote recently to make it easier to write XMPP publish-subscribe enabled services (code-as-a-node), that has been included in the recent Wokkel release. In turn, Blaine shared his thoughts on simple addressing on the web and we got to hash it out with a bunch of people like Brad Fitzpatrick, who also organized the pubsub shootout session. Finally, Eran Hammer-Lahav showed his work on XRD.
I'm pretty sure I forgot to mention a lot of things, but when it comes back to me, I'll write about it some other time.
In part 1 I wrote about what you can subscribe to and how a social network service will send out notifications. I often used node for the thing you subscribe to, a term that comes directly from the XMPP Publish-Subscribe specification. In other publish-subscribe implementations this is often referred to as a topic. Nodes are kept by a publish-subscribe service, and, among other things, this service is responsible for keeping the list of subscribers and sending out notifications.
Publish-subscribe services currently come in two forms: dedicated
publish-subscribe services with their own domain (e.g.
pubsub.ik.nu) and publish-subscribe services tied
to a user account (often mentioned in combination with the Personal Eventing
Protocol, also known as PEP). In the latter case, nodes are
kept at the bare JID of a user's account (e.g.
ralphm@ik.nu). Personal pubsub nodes have nice
properties, like the ability to directly associate a particular node
with a person, and the possibility of doing access control on the
user's contact list (roster).
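Either way, a subscriber addresses a node with two parts: the JID of the service holding it and the node identifier. For personal nodes, that JID is the user's bare JID. Here is a minimal Python sketch of an XEP-0060 subscribe request built with the standard library's ElementTree. The subscriber JID and the node name used in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def bare_jid(full_jid: str) -> str:
    """Strip the resource part: 'ralphm@ik.nu/home' -> 'ralphm@ik.nu'."""
    return full_jid.split("/", 1)[0]


def subscribe_iq(subscriber: str, service: str, node: str,
                 iq_id: str = "sub1") -> ET.Element:
    """Build an XEP-0060 subscribe request for `node` at `service`.

    `service` is either a dedicated pubsub domain (e.g. pubsub.ik.nu)
    or, for personal (PEP) nodes, a user's bare JID (e.g. ralphm@ik.nu).
    """
    iq = ET.Element("iq", {"type": "set", "from": subscriber,
                           "to": service, "id": iq_id})
    # Declaring the namespace as a plain xmlns attribute keeps the
    # serialized XML free of generated ns0: prefixes.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    # The jid attribute carries the subscriber's bare JID.
    ET.SubElement(pubsub, "subscribe",
                  {"node": node, "jid": bare_jid(subscriber)})
    return iq
```

For example, `ET.tostring(subscribe_iq("alice@example.org/laptop", "ralphm@ik.nu", "presence"))` yields a request for a personal node. Changing only the `to` address to `pubsub.ik.nu` targets a dedicated service instead.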
In the context of federating social networks, a service needs to decide where to put the nodes it wants to allow other entities to subscribe to and send out notifications from. In some cases it makes sense to keep nodes at user accounts, though in some other cases it is better to provide the nodes at the domain of the service itself. This depends on the nature of the social objects and the subscribable unit you provide. Let's explore some use cases.
In Jaiku, social objects (microblog posts and aggregated items like photos, bookmarks, etc.) are organized in streams. Streams are tied to either a user or a channel, and don't change ownership. The social objects themselves are static: once created, they cannot be edited. They can have comments associated with them, but those also cannot be edited. The only thing that can happen to streams, stream items, and comments is deletion.
Here, it makes sense to have a node for each stream, and
possibly a node for the comments to each stream item. Those can
be tied to the owner's JID (e.g.
ralphm@jaiku.com or
#jabber@jaiku.com). Another possible node could
be: all comments by a person. Another node an entity might want to
subscribe to is: all public microblog posts. Such a node would be
associated with the domain of the service rather than any
particular user's JID.
The company I work for, Mediamatic Lab has a (proprietary) CMS called anyMeta. Instead of 'content', the C in CMS here stands for Community, to highlight the social network properties it provides. anyMeta is a highly semantic system that deals in things (a person, an article, an event, a blog), and edges (the relations between things, each with a predicate like friend-of, author-of, etc). I mainly work on federating instances of anyMeta.
Things in anyMeta are usually editable, so it makes sense to want to stay informed about changes. For example, an article can have a large number of edits, and a person might move, change employers or make other changes to his profile. Thus, we chose to provide at least each thing as a subscribable unit. Upon creating a thing, a new node is created, and a representation of the thing is published to the node. Editing a thing results in subsequent publishes. Subscribers will receive notifications as the node gets published to.
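The lifecycle described here is: create a thing, create its node and publish, then publish again on every edit. The toy, in-memory Python model below illustrates that lifecycle. All class and method names are hypothetical. This is not anyMeta's or Idavoll's actual API, and real notifications would go out as XMPP messages rather than local callbacks.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber is modelled as a callback taking (node, representation).
Notify = Callable[[str, dict], None]


class ThingNodes:
    """Toy model: one pubsub node per thing; each create or edit is a publish."""

    def __init__(self) -> None:
        self.items: Dict[str, List[dict]] = {}
        self.subscribers: Dict[str, List[Notify]] = defaultdict(list)

    def create_thing(self, node: str, representation: dict) -> None:
        # Creating a thing creates its node and publishes the first version.
        self.items[node] = []
        self.publish(node, representation)

    def edit_thing(self, node: str, representation: dict) -> None:
        # Every edit results in a subsequent publish to the same node.
        self.publish(node, representation)

    def subscribe(self, node: str, callback: Notify) -> None:
        self.subscribers[node].append(callback)

    def publish(self, node: str, representation: dict) -> None:
        # Raises KeyError if the node was never created.
        self.items[node].append(representation)
        for notify in self.subscribers[node]:
            notify(node, representation)
```

A subscriber that arrives after the thing was created sees only later edits. That is one reason a consumer may also want to retrieve the node's latest item when it first subscribes.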
We organized the nodes in a flat namespace, tied to a domain, rather than a user. One reason is that the owner of any particular thing might change. Tying a node to the first owner, and then needing to move it when the owner changes, is cumbersome.
Each node has an identifier that is unique within the
publish-subscribe service holding them. So you could have two nodes
named updates tied to two different users. Node
identifiers are opaque; one should not derive meaning from how the
node identifier looks. Embedded slashes might suggest some
hierarchy, for example, but an application should not assume that
such a hierarchy actually exists.
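In code, this means a node's key is the pair (service JID, node identifier), and the identifier is compared as a whole string. A minimal Python sketch follows. The JIDs and node contents are hypothetical examples.

```python
from typing import Dict, Tuple

# A node is addressed by the pair (service JID, node identifier).
NodeAddress = Tuple[str, str]

# Hypothetical nodes: the same identifier at two different JIDs are
# two unrelated nodes, and slashes imply no hierarchy.
nodes: Dict[NodeAddress, str] = {
    ("alice@example.org", "updates"): "Alice's updates",
    ("bob@example.org", "updates"): "Bob's updates",
    ("pubsub.example.org", "a/b/c"): "slashes, but no implied parent",
}


def lookup(service: str, node: str) -> str:
    # Treat the identifier as an opaque string: never split it on "/".
    return nodes[(service, node)]
```

Under this scheme, the existence of `a/b/c` says nothing about whether a node `a/b` exists at the same service.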
That said, it makes perfect sense to use logical, human readable identifiers for nodes. They might even be very similar to the URI layout of the service's web site. Let's check what one could do for the examples given above.
It makes sense to have the node identifier for the regular
posts (called presence) be presence and the
nodes for the individual posts (with comments)
presence/123456, where the number is the same
as used in the web page for that post. Those two examples could be
tied to a JID representing me at Jaiku:
ralphm@jaiku.com.
The node for all public posts could be called
explore and located at the JID of the whole
service: jaiku.com. This would be similar to
the web site, where all public posts can be viewed at http://jaiku.com/explore.
It might also make sense to have a dedicated node for a user's
profile information, that can be retrieved and presented at a
service or application that consumes the social object updates. At
least a (full) name and some icon or headshot would be nice to
have there. Obviously, subscribing to such a node would mean that
future profile changes will also propagate to the consuming
entities. An example identifier would be
profile, to be kept at the user's JID.
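The Jaiku examples above can be collected into one small mapping from kinds of subscribable units to (JID, node) pairs. The Python sketch below is a hypothetical helper that follows the naming suggested in this post. It is not an actual Jaiku implementation.

```python
from typing import Optional, Tuple

SERVICE = "jaiku.com"


def node_address(kind: str, user: Optional[str] = None,
                 post_id: Optional[int] = None) -> Tuple[str, str]:
    """Map the example subscribable units to (JID, node identifier).

    Hypothetical helper; mirrors the identifiers proposed in the text.
    """
    if kind == "explore":
        # All public posts: held at the JID of the whole service.
        return (SERVICE, "explore")
    if user is None:
        raise ValueError(f"{kind!r} nodes live at a user's JID")
    user_jid = f"{user}@{SERVICE}"  # channels work too, e.g. '#jabber'
    if kind == "presence":
        return (user_jid, "presence")
    if kind == "post":
        if post_id is None:
            raise ValueError("a post node needs the post's number")
        # Same number as in the post's web page; node includes comments.
        return (user_jid, f"presence/{post_id}")
    if kind == "profile":
        return (user_jid, "profile")
    raise ValueError(f"unknown node kind {kind!r}")
```

For example, `node_address("post", "ralphm", 123456)` gives `("ralphm@jaiku.com", "presence/123456")`. Consumers should still discover these names rather than construct them, because the identifiers are opaque to anyone but the providing service.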
In anyMeta, each thing has an identifier that could be used as the node identifier as well. However, in the current implementation, all nodes are held by a loosely coupled, generic publish-subscribe service that caters to multiple anyMeta instances. We chose to use unique identifiers generated by the publish-subscribe service, which bear no relation to the thing identifier.
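Because the service mints the identifiers, whoever publishes on behalf of a thing has to remember which node belongs to which thing. The Python sketch below shows that bookkeeping using random UUIDs. The `generic/` prefix mirrors the example node identifier in this post. The registry class and its placement are assumptions for illustration, and UUID generation is a guess at how such identifiers could be minted, not a description of Idavoll.

```python
import uuid
from typing import Dict


class NodeRegistry:
    """Hypothetical mapping from thing identifiers to opaque node identifiers."""

    def __init__(self, prefix: str = "generic/") -> None:
        self.prefix = prefix
        self.by_thing: Dict[str, str] = {}

    def node_for(self, thing_id: str) -> str:
        # Reuse the node if this thing already has one; otherwise mint a
        # new identifier that carries no information about the thing.
        if thing_id not in self.by_thing:
            self.by_thing[thing_id] = f"{self.prefix}{uuid.uuid4()}"
        return self.by_thing[thing_id]
```

Keeping the identifier free of meaning lets one generic service host nodes for several anyMeta instances without clashes between their thing identifiers.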
As you might have guessed, some of the stuff being discussed
here has already been implemented in anyMeta. The
publish-subscribe service used is Idavoll. It has grown an
HTTP
interface that is used (internally) to create new nodes,
publish items that represent things, and subscribe to, and receive
notifications from, remote publish-subscribe nodes. The thing that
holds my
Mediamatic profile is represented by the node
generic/4efe2253-2242-4e01-bfdf-957cc2a9481d at
pubsub.mediamatic.nl. All things in this site,
but also the PICNIC
site, have nodes like this. In a future post I will
explore what we do with these nodes.
In this part, we explored how one could organize the nodes that entities can subscribe to in order to get updates. Some might be tied to the (virtual) JID of the user's account, others associated with the JID of the service itself. Node identifiers might be human-guessable and resemble the web URIs, or they could be seemingly random opaque strings. Implementations that subscribe to, and consume notifications from, the nodes at social networking services should not assume anything about the organization and naming used by the providing service. This presents a challenge for the next episode: how does one know which nodes are there and what they are called? So, up next: discovery. Homework assignment: look carefully at the HTML of my Mediamatic profile page.
The use of XMPP publish-subscribe in federation and third-party applications deviates a bit from the standard use-case. Usually publishing, subscribing and receiving notifications happen through the same protocol on specific (leaf) nodes. Entities subscribe to a node that represents a particular thing they are interested in getting updates for, and when an item is published to that node, these subscribers will receive a notification for that item.
For federating social networks, the focus is on the exchange of updates on social objects or comments between services. For third-party applications, the most important thing is getting updates, preferably as soon as possible. So, for both of those use cases, receiving notifications through XMPP gives it an edge over HTTP: no polling, lower latency, fewer connections.
How these items are published does not really matter that much. What you will typically see is that services somehow have a new item available (submitted via the web, SMS, e-mail or a web-based API) and want to expose it through XMPP. Posting a new update through XMPP from a third-party client usually does not provide an advantage over existing web-based APIs.
For a service like Jaiku, Twitter or Identi.ca to provide XMPP publish-subscribe support, it is important to define the subscribable unit and provide that as a node. Such a node will usually not be published to directly, but is more of an aggregate node. Examples would be: all updates by a particular user, all updates in a particular channel, all updates by a user and his contacts, all public updates. Another example could be: all comments on a particular social object.
Conceptually, all such aggregate nodes are internally
subscribed to a particular subset of new and updated social objects
and comments. You might even implement it exactly like that. Think
of a prospective search that is captured by a node: every time a
new item comes into the service, it is determined which of the
provided nodes would be a match for this item, based on author,
contact lists and permissions. Subsequently, for all of those
nodes, a notification will be sent out to its subscribers. Telling
items apart in this scenario is then likely not done using the
service JID, node identifier or item identifier, but using some
identifier in the payload, like Atom's id
element, although those other identifiers might provide a
context.
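The prospective search idea can be sketched in a few lines of Python. Each aggregate node is modelled as a standing query, a predicate over incoming items. A new item is matched against every node, and each matching node's subscribers get a notification. On the receiving side, copies that arrive through several nodes are told apart by their Atom `id`. All names, the `public` flag (standing in for permissions) and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Item:
    atom_id: str  # the Atom <id> of the entry, stable across nodes
    author: str
    public: bool  # stand-in for the service's permission checks


@dataclass
class AggregateNode:
    name: str
    matches: Callable[[Item], bool]  # the standing query this node captures
    subscribers: List[str] = field(default_factory=list)


def route(item: Item, nodes: List[AggregateNode]) -> Dict[str, List[str]]:
    """For a new item, return per subscriber JID the matching node names."""
    deliveries: Dict[str, List[str]] = {}
    for node in nodes:
        if node.matches(item):
            for jid in node.subscribers:
                deliveries.setdefault(jid, []).append(node.name)
    return deliveries


class Receiver:
    """Consumer side: drop copies of an item already seen via another node."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def accept(self, item: Item) -> bool:
        if item.atom_id in self.seen:
            return False
        self.seen.add(item.atom_id)
        return True
```

For instance, a post by alice could match "all updates by alice", "bob and his contacts" and "all public updates" at once. One subscriber might then receive it twice, and `Receiver.accept` keeps only the first copy. A real service would index its standing queries instead of scanning every node per item.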
For those familiar with the concept of XMPP publish-subscribe collection nodes: those would be a special form of aggregate node that makes its relationship to the nodes it aggregates items from explicit.
The major topics at the 5th XMPP Summit were Jingle and XMPP
as a complementary protocol next to HTTP for building social
networking services, as
stpeter briefly mentioned. While I think
that the consensus on OAuth
over XMPP was very important, I think we also settled on
a good set of best practices for federating social networks using
XMPP
Publish Subscribe.
This particular topic has had my full attention over the last year or so, and it is about time that I start writing about it, explaining the aforementioned best practices in their context. As this covers a lot of ground, I'd like to make a series out of it, each part detailing a particular aspect.
Topics that will come up include: the subscribable unit and how notifications are generated, payload formats, discovery, local representation and implementation strategies.
I am a month into my new job at Mediamatic Lab. As part of one of our projects, we are working on federating the different instances of anyMeta (a CMS used to build websites like those for PICNIC and Reboot). We want to use open protocols for this. To prevent us from working on our own little island, and noticing the huge buzz around opening up social networking services, I've been pretty busy organizing an upcoming workshop on federating social networks. Next Saturday (8 December) we will get a bunch of smart people together to talk about what it takes to have social networking services, as well as more generic CMSs, work together, so that people are not caught between the numerous walls that are currently in place around each respective garden. My colleague wok wrote a related piece on this: Solving Social Networking Fatigue.
Although it was pretty short notice, there will be a fair number of people from the Dutch social networking crowd, like Robert Gaal of Wakoopa, who chaired the Portable Social Networks session at PICNIC '07. Also, we got two great people over from San Francisco. One of them is David Recordon of SixApart. I met David at Web 2.0 Expo Berlin, where he did a presentation on opening up social networking services. He gave a good overview of the issues and stated that most of the tools are there, and we should just put them together.
The other one is Blaine Cook of Twitter, who I met at XTech 2007 in Paris. In his talk together with Kellan Elliott-McCrea, he went into why XMPP is a great technology to complement HTTP in building online services. Also, we spoke about hooking up social networking sites like Twitter and Jaiku using XMPP. During the summer we discussed the bits and pieces in XMPP (like the publish-subscribe extension) that'd be needed for that. This resulted in my presentation at Web 2.0 Expo Berlin, and the workshop we are doing now.
The day will start off with a couple of presentations and introductions, followed by sessions of discussion (and a lunch). From 17:00h, there'll be a borrel (drink) with a few short higher level presentations for a broader group of people, with less focus on the technical aspects. Afterwards, we'll probably head into the city for some food and stuff.
Unfortunately, because of the short notice and other obligations, a lot of people who really wanted to come said they couldn't make it. Maybe we should try to do a follow-up event early next year. For them and anyone interested in all this, we will try to provide some live coverage on the #fsn Jaiku channel. All in all, I am pretty excited about doing this event and hope to see you there!
In a few minutes my presentation on Federating Social Networks at Web 2.0 Expo Berlin will start. I will talk about exchanging (changes to) social objects between sites like Jaiku, Twitter and all the others using open formats like Atom on top of XMPP Publish-Subscribe. Here are the slides.
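To make "Atom on top of XMPP Publish-Subscribe" concrete, here is a minimal Python sketch of the notification a subscriber would receive. It is an XEP-0060 event message whose item payload is an Atom entry, built with ElementTree. The JIDs, node name and entry values in the usage note are hypothetical. The entry carries only the elements Atom requires of a standalone entry (id, title, updated, author).

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://jabber.org/protocol/pubsub#event"
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_notification(service: str, subscriber: str, node: str,
                      item_id: str, entry_id: str, title: str,
                      updated: str, author_name: str) -> ET.Element:
    """Build a pubsub event notification carrying a minimal Atom entry."""
    msg = ET.Element("message", {"from": service, "to": subscriber})
    event = ET.SubElement(msg, "event", {"xmlns": EVENT_NS})
    items = ET.SubElement(event, "items", {"node": node})
    item = ET.SubElement(items, "item", {"id": item_id})
    # The payload: an Atom entry in its own namespace.
    entry = ET.SubElement(item, "entry", {"xmlns": ATOM_NS})
    ET.SubElement(entry, "id").text = entry_id
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "updated").text = updated  # RFC 3339 timestamp
    author = ET.SubElement(entry, "author")
    ET.SubElement(author, "name").text = author_name
    return msg
```

For example, `ET.tostring(atom_notification("pubsub.example.org", "bob@example.org", "presence", "item1", "tag:example.org,2007:1", "Hello", "2007-11-07T10:00:00Z", "Ralph"))` produces the stanza. Using Atom as the payload means a consumer can reuse its existing feed-handling code for XMPP-delivered updates.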
The last eight months must have been the most hectic ones I have experienced. This started out with my leaving the TU/e and starting at Jaiku. Apart from microblogging effectively killing any urge to write in this blog, there weren't many dull moments since then.
The most important happening last summer, which I've documented on Jaiku but not here, was the birth of my daughter Birgit on 28 June. I don't think I've ever been more proud and happy than at the instant I first held her. Watching her grow up and discover the world around her is awesome. Her smiles make every day a joy.
About a month ago, it was announced that Jaiku has been acquired by Google. Many people contacted me to congratulate me on this news, and I'd like to thank you all! I feel it is a compliment to the great team I've had the pleasure of working with, and to our community of users and application developers that helped make Jaiku what it is today.
Naturally, the acquisition didn't happen in a day, so leading up to it there were a lot of questions that kept us and our families busy for a while. What will happen? Will we need to move? When? Where? Exciting and exhausting at the same time. Eventually the deal obviously took place, and most of the team have left for the Bay Area for a few months, while some didn't. I was one of the guys who was not included in the acquisition, so I'm not moving after all. We did have a great combined party, though: a Valve/Jaiku/Thinglink new office warming and a Jaiku farewell. I'd like to say thanks to all my former colleagues for the great experience it has been. For the foreseeable future, I will be associated with Jaiku, the service, but not in any official capacity.
So what now? As of today I am officially employed by Mediamatic Lab, where I will continue work on XMPP publish-subscribe technologies and open standards. More on that later, though.
Before today, my last entry here was three months ago. What has happened since? In short, I stopped working for the university (thanks guys, I really enjoyed my time with you) and have been working for Jaiku since mid-March, while staying in Eindhoven. My tasks are mostly focussed on adding IM support and, in general, working on XMPP, standards and Twisted in our service. Obviously, I now keep a life stream, which might explain the lack of entries on my regular blog. Or maybe it is all the travelling I've been doing lately.
In that entry from February, I was contemplating going to XTech because of the interesting Jabber related talks. I did go, but basically replaced Jyri and gave a presentation on Jaiku myself. There are a lot of exciting things to write about this trip, and Jaiku as well. I might do that on the plane back from Helsinki, where I am now for a gathering at Jaiku HQ.
Next week I'll be at Reboot 9.0, which, like XTech, gathers a lot of interesting people and already has a nice line-up of talks. Hope to see you there!
Another exciting thing that I didn't mention here before is my upcoming fatherhood! Irma is due in July and that is approaching quite fast. More on that later, of course.
I added the feeds of the students that have one of the
Jabber-related Google Summer of
Code projects, on cue by
stpeter. The blogs
are marked with the image on the right that I kindly ripped and remixed
from the Planet SoC
logo. Obviously we expect plenty of updates throughout the summer and
wish the students lots of fun!