ralphm's blog

Sunday, 13 June 2004

pubsub.com and XMPP

Getting there...

A few months ago, I wrote about pubsub.com. The guy behind pubsub.com is Bob Wyman, who has been involved in the discussions on the direction of the pubsub JEP.

Recently, pubsub has been getting more attention again, and at one point Bob sent a link to the list about PubSub.com's use of XMPP PubSub. This is a document describing the experimental support of pushing Atom snippets using the pubsub JEP over Jabber. It is an interesting read, I must say. The comments I made in my previous post still stand, and I'll focus on the new stuff in the JEP.

First of all, pubsub.com focusses on content-based subscriptions. You subscribe to the results of a certain query based on content that becomes available. This is an interesting concept, which is also used in the Location Linked Information (LLI) project. In a recent mail by Bob, sent while I was already working on this blog entry, he talks about some of the problems he encountered while trying to implement such a system. He proposes some modifications to JEP 0060 to have the concept of subnodes. For LLI, however, I recommended an approach which might also be applicable for pubsub.com. It uses existing Jabber protocols, without alterations.

A user asks for a Data Form, using Jabber Search, in which he can specify the query he wants to subscribe to. This form is sent back to the service. If that particular query hasn't been used earlier, a new pubsub node is created to which the query results are published. The requesting user is automatically subscribed, and if someone else afterwards asks the same query, he can just be subscribed to the same node, without creating a new one. The node is then reported back in the result of the Jabber Search.

<iq type="get" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search1">
  <query xmlns="jabber:iq:search"/>
</iq>

Client requests form

<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search1">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="form">
      <title>PubSub.com query</title>
      <instructions>Fill in the query to search in the pubsub.com feeds</instructions>
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field type="text-single" label="Query" var="query"/>
    </x>
  </query>
</iq>

Service returns form

<iq type="set" from="ralphm@ik.nu/pubsub.com_client" to="pubsub.example.com" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="submit">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="query">
        <value>(SOURCE:pubsub.com AND "RSS")</value>
      </field>
    </x>
  </query>
</iq>

Client submits form

<iq type="result" from="pubsub.example.com" to="ralphm@ik.nu/pubsub.com_client" id="search2">
  <query xmlns="jabber:iq:search">
    <x xmlns="jabber:x:data" type="result">
      <field type="hidden" var="FORM_TYPE">
        <value>jabber:iq:search</value>
      </field>
      <field var="host">
        <value>pubsub.example.com</value>
      </field>
      <field var="node">
        <value>e09d3ead0285a6d20d211916a783b4a9</value>
      </field>
    </x>
  </query>
</iq>

Service returns result
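
On the service side, the bookkeeping for this exchange could look roughly like the sketch below. It is only an illustration: the QueryNodeRegistry class, the whitespace normalization and the choice of an MD5 digest as node identifier are all assumptions of mine (the node in the example merely looks like a 32-character hex digest), and bob@example.com is a made-up second subscriber.

```python
import hashlib


class QueryNodeRegistry:
    """Hypothetical helper mapping content-based queries to pubsub nodes."""

    def __init__(self):
        # node id -> {"query": original query text, "subscribers": set of JIDs}
        self.nodes = {}

    @staticmethod
    def node_for(query):
        # Assumption: the node id is a digest of the whitespace-normalized query.
        normalized = " ".join(query.split())
        return hashlib.md5(normalized.encode("utf-8")).hexdigest()

    def subscribe(self, jid, query):
        """Subscribe jid to the node for query, creating the node if needed."""
        node = self.node_for(query)
        created = node not in self.nodes
        entry = self.nodes.setdefault(node, {"query": query, "subscribers": set()})
        entry["subscribers"].add(jid)
        return node, created


registry = QueryNodeRegistry()
query = '(SOURCE:pubsub.com AND "RSS")'
node_a, created_a = registry.subscribe("ralphm@ik.nu", query)
# A second user (hypothetical JID) submitting the same query reuses the node.
node_b, created_b = registry.subscribe("bob@example.com", query)
```

Keeping the original query next to the node is what would let the node configuration form report it back later.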

The user should be allowed to request unsubscription and view the current subscriptions via the protocols specified in JEP 0060. The node configuration form can be used to let the user check back on the specifics of that node, like the original query.

The above is for the dynamic subscriptions. Pubsub.com still has a lot of topic-based nodes, which I think can be retained. The content providers publish the news to their node, and the pubsub.com backend then matches those to the user-defined queries, and republishes the data to the respective dynamic nodes described above. Of course this can be an internal action that does not actually involve the Jabber protocol; the user just receives the notifications from the nodes they subscribed to.
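
As a rough sketch of that internal matching step: the function below decides to which dynamic nodes a newly published entry should be republished. The real pubsub.com query language is not described here, so a query is represented by a plain list of required words. That representation, the matching_nodes function and the node names are stand-ins of mine, not how the backend actually works.

```python
def matching_nodes(entry_text, queries_by_node):
    """Return the dynamic nodes whose query matches the entry text.

    Assumption: a query is a list of words that must all occur in the text.
    """
    words = set(entry_text.lower().split())
    return sorted(
        node
        for node, terms in queries_by_node.items()
        if {term.lower() for term in terms} <= words
    )


# Hypothetical dynamic nodes and their stored queries.
queries_by_node = {
    "node-rss": ["rss"],
    "node-jabber-atom": ["jabber", "atom"],
}

targets = matching_nodes("Pushing Atom snippets over Jabber", queries_by_node)
```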

Bob also briefly mentions the situation where a user might receive items more than once, because they match multiple queries. I'm not sure if you would have to solve that server side. Sure, it generates a bit more traffic, but it is immediately clear which query yielded a particular result. A client application could always filter the data on the unique identifier that is embedded in the payload, and leave out duplicates.
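
A minimal sketch of such a client-side filter, assuming the client has already pulled the unique identifier out of each payload. The DuplicateFilter class and the node and entry names are made up for the example.

```python
class DuplicateFilter:
    """Remembers entry identifiers and rejects ones seen before."""

    def __init__(self):
        self.seen = set()

    def accept(self, entry_id):
        # Returns True only the first time an identifier is offered.
        if entry_id in self.seen:
            return False
        self.seen.add(entry_id)
        return True


# (node, entry id) pairs as they might arrive; entry-1 matched two queries.
notifications = [
    ("node-a", "entry-1"),
    ("node-b", "entry-1"),
    ("node-b", "entry-2"),
]

dedup = DuplicateFilter()
shown = [(node, entry_id) for node, entry_id in notifications if dedup.accept(entry_id)]
```

The client still knows which node delivered each notification, so it can show all matching queries for an entry if it wants to.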

Secondly, I like the concept of using Atom for holding the news entries. Basically, you get mini feeds with just one entry. I would propose to use the entry element as the root, instead of feed. This is legal XML, and shouldn't really pose a problem. I would want to keep the ps:source-feed idea, but only have it contain the title and link elements (but using the Atom namespace). The author element is not needed, as it can be put in the entry payload, if not already present. The modified element holds no real value for the end user, as we are not really dealing with files anymore.

Maybe it is desirable to be informed of changes in the feed's meta data. For normal topic-based subscriptions, I'd create one specifically named node item (e.g. feed-info) that holds the data normally found in the header of an Atom file. It is only updated when something in that data changes (not sure about the modified element there). For content-based subscriptions, I would propose to have a link element there that points to the topic node that contains all entries, using the XMPP URI Format. That allows the client to request the feed's meta data via Jabber, instead of having to refer to the original feed via HTTP. If it is also desirable to keep informed of changes to the meta data of feeds covered by a particular query, there could also be a link element pointing to a sister node of the feed's node that only gets the feed's meta data published.

Wednesday, 9 June 2004

Planet Jabber

All blogs unite!

For a while I had been having the idea of setting up a Planet for Jabber. Originally done for the GNOME project as Planet GNOME, a Planet is the aggregation of a set of weblogs of people in a particular community.

Planets are a great way to keep track of things that are going on in a community, all in one place. People reading weblogs on a regular basis have a large number of blogs on their blogroll, but a Planet also allows you to see entries of people whose blog you don't normally read, or just didn't discover yet.

As I said, I've been having this idea for the Jabber developers community for a while now, and with the start of Planet IM, where this blog is also aggregated (thanks Christian!), I just set up Planet Jabber using Planet, a feed aggregator written in Python.

Why a Planet for just Jabber? Although I think Planet IM is a very good idea, Jabber is much more than just IM. There'll be some overlap in the content, but that's ok. I hope many people will start blogging (more), so we read about all the cool things built using Jabber. If you want to appear on Planet Jabber, let me know.

Wednesday, 2 June 2004

Idavoll rewrite

Twistifying matters...

Idavoll needs some loving. Apart from minor updates, it hasn't really moved forward. My luck is that pubsub has not really gotten any (visible) traction apart from being advanced to Draft status last October.

Recently, however, I got kicked by temas and pgmillard to fix some small bugs in Idavoll, which they are hooking up to the jabber.org server. They are playing with it a bit now, with some serious plans for the near future.

In the meantime, I have been wanting to redo the implementation of Idavoll in Twisted, an event-driven networking framework written in Python. I've been talking to dizzyd, who wrote most of the Jabber protocol code in Twisted, for some guidance on how to go about this. Looking at Proxy65 and the nice step-by-step tutorial for implementing a full-featured, scalable and extensible finger service, Twisted from Scratch, or the Evolution of Finger, I have finally started on the rewrite of Idavoll using Twisted.

The idea is to make the service allow for different, pluggable backends and also, in the philosophy of Twisted, make it possible to hook up other protocols for interfacing with this backend. Hopefully, this means that it should become rather easy to hook up a web-based interface to help administer the component. Let's see how that evolves.

One of the bugs I needed to fix in Idavoll was to have the Jabber component return error stanzas to signal that unknown queries are not implemented. I did a quick fix on the current implementation, and then started to wonder how to do that using the Twisted framework. This proved to be impossible to do nicely. In Twisted, you can hook up handlers to XPath-like queries that match incoming XML elements. However, there is no ordering among these handlers, so although you can make a catch-all handler, and let other handlers signal whether they did or did not handle a certain stanza, the catch-all can easily be called before the more specific handler.

For now, I hacked xish, which does the XPath matching and calling of the observers, to have XPathQuery objects contain a priority attribute, and implemented the __cmp__() and __hash__() methods to be able to sort the objects. After modifying the methods for adding observers, you can now give a priority to each XPath query, much like template matching in XSLT. This works nicely. I just give the catch-all observer a priority of -1, with 0 being the default. This way, existing code is unaffected.
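
To illustrate the idea outside of Twisted, here is a small stand-alone sketch of prioritized observers. It is not the xish code or its API: the Dispatcher class, its method names and the use of plain predicate functions instead of XPath queries are simplifications of mine, and the stanzas are just dictionaries.

```python
class Dispatcher:
    """Calls observers in priority order until one handles the stanza."""

    def __init__(self):
        self._observers = []  # (priority, sequence, predicate, handler)
        self._sequence = 0

    def add_observer(self, predicate, handler, priority=0):
        self._observers.append((priority, self._sequence, predicate, handler))
        self._sequence += 1
        # Highest priority first; equal priorities keep registration order.
        self._observers.sort(key=lambda obs: (-obs[0], obs[1]))

    def dispatch(self, stanza):
        for _priority, _seq, predicate, handler in self._observers:
            if predicate(stanza) and handler(stanza):
                return True  # handled, so lower priorities are skipped
        return False


log = []


def handle_search(stanza):
    log.append("search")
    return True


def not_implemented(stanza):
    # Catch-all: the place where an error stanza would be sent back.
    log.append("error")
    return True


dispatcher = Dispatcher()
# Registered first, but priority -1 makes it run after the default (0).
dispatcher.add_observer(lambda s: True, not_implemented, priority=-1)
dispatcher.add_observer(lambda s: s.get("xmlns") == "jabber:iq:search", handle_search)

dispatcher.dispatch({"name": "iq", "xmlns": "jabber:iq:search"})
dispatcher.dispatch({"name": "iq", "xmlns": "jabber:iq:version"})
```

Because the default priority is 0, observers added without a priority behave as before, and only the catch-all needs to know about priorities.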

Now, on to the rest of the implementation. If only to prove boyd wrong.

Birthday

Again...

Today, one day later than most years, it is my birthday again. For the 28th time in my life. Most normal people start to count at 1, though, so let me say it like this: I am now 27 years of age. Time sure flies.