pubsub.com and XMPP
A few months ago, I wrote about pubsub.com. The guy behind pubsub.com is Bob Wyman, who has been involved in the discussions on the direction of the pubsub JEP.
Recently, pubsub has been getting more attention again, and at one point Bob sent the list a link to "PubSub.com's use of XMPP PubSub". This document describes their experimental support for pushing Atom snippets using the pubsub JEP over Jabber. It is an interesting read, I must say. The comments I made in my previous post still stand, and I'll focus on the new stuff in the JEP.
First of all, pubsub.com focuses on content-based subscriptions. You subscribe to the results of a certain query, which is matched against content as it becomes available. This is an interesting concept, which is also used in the Location Linked Information (LLI) project. In a recent mail by Bob, sent while I was already working on this blog entry, he talks about some of the problems he encountered while trying to implement such a system. He proposes some modifications to JEP 0060 to introduce the concept of subnodes. For LLI, however, I recommended an approach which might also be applicable for pubsub.com. It uses existing Jabber protocols, without alterations.
A user requests a Data Form via Jabber Search, in which he can specify the query he wants to subscribe to. The completed form is submitted back to the service. If that particular query has not been used before, a new pubsub node is created, to which the query results are published. The requesting user is automatically subscribed. If someone else later submits the same query, he can simply be subscribed to the existing node, without a new one being created. The node is then reported back in the result of the Jabber Search.
<iq type="get" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search1">
  <query xmlns="jabber:iq:search"/>
</iq>
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search1">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="form">
      <title>PubSub.com query</title>
      <instructions>Fill in the query to search in the pubsub.com feeds</instructions>
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field type="text-single" label="Query" var="query"/>
    </x>
  </query>
</iq>
<iq type="set" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="submit">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="query">
        <value>(SOURCE:pubsub.com AND "RSS")</value>
      </field>
    </x>
  </query>
</iq>
<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="result">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="host">
        <value>pubsub.example.com</value>
      </field>
      <field var="node">
        <value>e09d3ead0285a6d20d211916a783b4a9</value>
      </field>
    </x>
  </query>
</iq>
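On the service side, the bookkeeping behind this exchange could look roughly like the Python sketch below. It is only an illustration of the "one node per distinct query" idea. The class name QueryNodeRegistry, the second subscriber bob@example.org, and the choice to derive the node name from an MD5 hash of the query text are all assumptions of mine; the service is free to name nodes any way it likes.

```python
import hashlib


class QueryNodeRegistry:
    """Maps each distinct query to one pubsub node and tracks its subscribers."""

    def __init__(self, host):
        self.host = host
        self.queries = {}       # node name -> original query text
        self.subscribers = {}   # node name -> set of subscriber JIDs

    @staticmethod
    def node_for(query):
        # Assumption: the node name is a hash of the query text, with
        # surrounding whitespace stripped so identical queries collide.
        return hashlib.md5(query.strip().encode("utf-8")).hexdigest()

    def search(self, jid, query):
        """Handle a submitted search form; return the (host, node) to report."""
        node = self.node_for(query)
        if node not in self.queries:
            # First time this query is seen: create the node.
            self.queries[node] = query.strip()
            self.subscribers[node] = set()
        # Subscribe the requester, whether the node is new or already existed.
        self.subscribers[node].add(jid)
        return self.host, node


registry = QueryNodeRegistry("pubsub.example.com")
first = registry.search("ralphm@ik.nu", '(SOURCE:pubsub.com AND "RSS")')
second = registry.search("bob@example.org", '(SOURCE:pubsub.com AND "RSS") ')
```

Both calls report the same host and node, so the second user ends up subscribed to the node that was created for the first one. Keeping the original query text alongside the node is what would let the node configuration form show it later.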
The user should be allowed to request unsubscription and view the current subscriptions via the protocols specified in JEP 0060. The node configuration form can be used to let the user check back on the specifics of that node, like the original query.
The above covers the dynamic subscriptions. Pubsub.com still has a lot of topic-based nodes, which I think can be retained. The content providers publish the news to their node, and the pubsub.com backend then matches those items against the user-defined queries and republishes the data to the respective dynamic nodes described above. Of course, this can be an internal action that does not actually involve the Jabber protocol. The user just receives the notifications from the nodes they subscribed to.
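To make the matching step a bit more concrete, here is a minimal Python sketch. It says nothing about how pubsub.com really evaluates queries: the predicates below are hypothetical stand-ins for compiled queries, and the node names and the entry are made up for the example.

```python
def matching_nodes(entry, dynamic_nodes):
    """Return the names of the dynamic nodes whose query matches the entry."""
    return [node for node, matches in dynamic_nodes.items() if matches(entry)]


# Hypothetical stand-ins for user-defined queries: plain predicates on a dict.
dynamic_nodes = {
    "node-rss": lambda e: "RSS" in e["title"],
    "node-atom": lambda e: "Atom" in e["title"],
    "node-jabber": lambda e: "Jabber" in e["title"],
}

# An item as a content provider might publish it to a topic-based node.
entry = {"id": "tag:example.org,2004:1", "title": "Atom and RSS compared"}

# The backend would republish the entry to each of these dynamic nodes.
targets = matching_nodes(entry, dynamic_nodes)
```

Here the entry would be republished to node-rss and node-atom, but not to node-jabber.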
Bob also briefly mentions the situation where a user might receive items more than once, because they match multiple queries. I'm not sure you would have to solve that on the server side. Sure, it generates a bit more traffic, but it is immediately clear which query yielded a particular result. A client application could always filter the data on the unique identifier that is embedded in the payload, and leave out duplicates.
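Such a client-side filter could be as small as the Python sketch below. I'm assuming here that the unique identifier is available as an "id" key on each parsed entry, and that notifications arrive as (node, entry) pairs; both are choices made for the example, not something the protocol prescribes.

```python
def unique_entries(notifications):
    """Yield (node, entry) pairs, skipping entries whose id was already seen."""
    seen = set()
    for node, entry in notifications:
        if entry["id"] in seen:
            continue  # same item, delivered again via another query's node
        seen.add(entry["id"])
        yield node, entry


# The same item arrives through two different query nodes.
notifications = [
    ("node-rss", {"id": "tag:example.org,2004:1", "title": "Atom and RSS compared"}),
    ("node-atom", {"id": "tag:example.org,2004:1", "title": "Atom and RSS compared"}),
    ("node-atom", {"id": "tag:example.org,2004:2", "title": "Atom over XMPP"}),
]

shown = list(unique_entries(notifications))
```

Only the first delivery of each item is kept, together with the node (and thus the query) that delivered it first.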
Secondly, I like the concept of using Atom for holding the news entries. Basically, you get mini feeds with just one entry. I would propose using the entry element as the root, instead of feed. This is legal XML, and shouldn't really pose a problem. I would want to keep the ps:source-feed idea, but only have it contain the title and link elements (but using the Atom namespace). The author element is not needed, as it can be put in the entry payload, if not already present. The modified element holds no real value for the end user, as we are not really dealing with files anymore.
Maybe it is desirable to be informed of changes in the feed's metadata. For normal topic-based subscriptions, I'd create one specifically named node item (e.g. feed-info) that holds the data normally found in the header of an Atom file. It is only updated when something in that data changes (not sure about the modified element there). For content-based subscriptions, I would propose having a link element there that points to the topic node that contains all entries, using the XMPP URI Format. That allows the client to request the feed's metadata via Jabber, instead of having to refer to the original feed via HTTP. If it is also desirable to keep informed of changes to the metadata of feeds covered by a particular query, there could also be a link element pointing to a sister node of the feed's node that only gets the feed's metadata published.