We are using and developing ejabberd to integrate into our web infrastructure and to serve the “instant messaging” part of Wire. So far, this has been quite a move away from OpenFire, which only served Jabber chat to Wire, nothing more, nothing less.
Over time we came to the conclusion that it would be a lot better if ejabberd did not have its own user database and authentication. Instead, we would authenticate against our own webservice, which in turn asks our user database. Here is the first “blocker”:
While ejabberd lets its admin implement whatever kind of authentication is needed by using an external script, clients then have to authenticate to ejabberd using the PLAIN method. This forced us to update our client to support this authentication method and to deploy it to all users. Now, though, we authenticate against an external script written in Python and experience neither a slowdown in login time nor any kind of unreliability. Good!
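For readers who have not seen such a script, here is a minimal sketch of what an external authentication script can look like. It is not our production script. It assumes ejabberd's extauth framing as I understand it: a 2-byte big-endian length followed by a payload like "auth:user:server:password" on stdin, answered on stdout with a 2-byte length of 2 and a 2-byte result (1 or 0). The function check_credentials is a hypothetical placeholder for the call to the web service.

```python
import struct
import sys


def decode_request(payload):
    """Split an extauth payload such as b"auth:user:server:password"."""
    # maxsplit=3 keeps any colons inside the password intact.
    return payload.decode("utf-8").split(":", 3)


def encode_result(ok):
    """Build the 4-byte reply: length 2, then 1 (success) or 0 (failure)."""
    return struct.pack(">HH", 2, 1 if ok else 0)


def check_credentials(user, server, password):
    """Hypothetical placeholder: ask your own web service here."""
    return False


def handle(parts, check=check_credentials):
    """Answer "auth" requests; refuse everything else in this sketch."""
    if len(parts) == 4 and parts[0] == "auth":
        return bool(check(parts[1], parts[2], parts[3]))
    return False


def main():
    """Serve requests until ejabberd closes the pipe."""
    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
    while True:
        header = stdin.read(2)
        if len(header) < 2:
            break
        (size,) = struct.unpack(">H", header)
        parts = decode_request(stdin.read(size))
        stdout.write(encode_result(handle(parts)))
        stdout.flush()
```

To use it as a script, call main() at the bottom of the file and point ejabberd's external authentication setting at it. Because the password arrives in the clear on stdin, the client has to send it in the clear too, which is why PLAIN is required.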
After this, we thought it would be nice if ejabberd did not have its own database for rosters, which kept us updating two databases and notifying two servers on any change (be it from the website or from the Wire client). So the idea was to ask another webservice of our own to hand us the roster (in JSON) in our own mod_roster_wire and then load that into ejabberd. The absolutely worst part of this was the freely available documentation, in particular on how data should be returned by hooks that are registered with ejabberd. This had us “reverse engineering” for quite some time and constantly checking our code (and especially the return values) against, for example, mod_roster_odbc.erl and mod_roster.erl itself. Very tedious. But now it’s running smoothly and we got rid of the burden of having two databases for the same set of data. Yay!
(Expect a blog post titled “How to write your own mod_roster” once I find a few more minutes and have gained some more confidence in my Erlang skills.)
The next thought we had was that we could use ejabberd’s pubsub architecture to do some event-related stuff in Wire (like people joining a gather, for example). This proved to be a generally good idea, and inspired by this blog post by the creators of ejabberd we happily connected to ejabberd every time something changed in the gather data to push an event to Wire. This taught us a few things:
- we have more data than we thought
- PHP sucks for sockets
- ejabberd does not like “short” connections
- sessions seem “expensive”
1.) The pubsub tree that we had for the gather feature in Wire (which was removed from the latest beta for unrelated reasons, but will be back in one of the next) easily consisted of a few thousand nodes, one for each gather, happily living under subnodes (one for each country, then below that one for each game, then below that the gathers). We rapidly ran into memory problems, which later got fixed by a patch to ejabberd. Memory usage now seems to be fine.
2.) Sockets in PHP seem to be very flaky and unreliable…or I am too stupid to use them correctly. Opening a socket to the Jabber server every “once in a while” (that is, “a few times per second”) rapidly proved to be a bad idea. Since most of our backend is in PHP, we are still looking for a good way to make “cheap” connections to the Jabber server to notify of events in “live mode”. Currently I favor using the “curl” extension of PHP to talk to either ejabberd’s own BOSH implementation or (preferably) a BOSH proxy (like punjab). I will let you know whether this scales better…since:
3.) It quickly became clear that connecting and authenticating to ejabberd is quite expensive. Not only does it add considerable overhead on socket communication, it also seems to put quite some load on ejabberd. Since we only need to publish “one event at a time”, we basically always did this:
- connect, authenticate
- publish event to pubsub-node
- disconnect
- goto 10
This kept the publishing of events “live” but made ejabberd grind to a halt at peak times, literally sucking up ALL of the memory AND the swap space on the machine it ran on. Turning off that massive “pubsub feature” on the side of the application server brought us back to normal. So I guess:
4.) Reducing logins on the Jabber server and “reusing” existing connections can give you a lot of advantages and save you from a non-responsive Jabber server. Combine this with clustering, then use one server for “user requests” and another one for your “application server uses”, and you will get a lot more uptime and better responsiveness from the server. We are currently trying to scale down our connections to the Jabber server by (ab)using features of BOSH and punjab. The main idea is to have punjab keep _one_ connection to the Jabber server open; then (in PHP) we share the session ID across all our requests and simply publish (by HTTP POST using “curl”) our event down that pipe to the Jabber server. This saves us the overhead of authenticating every time we only need to publish one item, it should be a lot more “lag-resistant”, we don’t need to use sockets in PHP but can rely on “curl”, and ejabberd does not have to hand out “a dozen” sessions per second…thus keeping its memory footprint low. I am anxious to see how and whether this works out.
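To make the “one pipe” idea a bit more concrete, here is a rough sketch in Python (our real backend is PHP with curl, so treat this as an illustration only). It builds a pubsub publish request and wraps it in a BOSH body that reuses an existing session ID. The assumptions are mine: the session (sid) has already been opened and authenticated, the request ID (rid) is increased by one for every request in that session, and the service name, node name, payload element and proxy URL are made-up placeholders.

```python
import itertools
import urllib.request
import xml.etree.ElementTree as ET

BOSH_NS = "http://jabber.org/protocol/httpbind"
PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def rid_sequence(start):
    """Request IDs for one BOSH session: each request uses the previous rid + 1."""
    return itertools.count(start)


def build_publish_iq(service, node, payload, iq_id):
    """Build an <iq/> that publishes one item (the payload element) to a node."""
    iq = ET.Element("iq", {"xmlns": "jabber:client", "type": "set",
                           "to": service, "id": iq_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item")
    item.append(payload)
    return iq


def wrap_in_bosh_body(stanza, sid, rid):
    """Wrap a stanza in a BOSH <body/> that reuses the shared session ID."""
    body = ET.Element("body", {"xmlns": BOSH_NS, "sid": sid, "rid": str(rid)})
    body.append(stanza)
    return ET.tostring(body, encoding="unicode")


def post_body(url, body_xml):
    """Send the body to the BOSH endpoint (hypothetical URL) via HTTP POST."""
    request = urllib.request.Request(
        url,
        data=body_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

In use, every event would take the next value from rid_sequence, build the publish request, wrap it and POST it, with no login per event. The catch is that the sid and the current rid have to live somewhere all backend requests can reach, and two requests must never use the same rid.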
So after we learned all this the hard way and made a few modifications to ejabberd (and a few more that were only to make things more “like we would expect it to be”, as you can see here), we are now on the verge of having an event-based messaging and application server that integrates with our existing infrastructure and serves our needs. It took some tweaking here and there and some “learning by burning”, and it still needs more work (and more “burning”, I guess), but we are now quite confident that it was a good choice to move from OpenFire to ejabberd. And I learned a new programming language on the way…so who am I to complain?
(Now if only the ICQ/AIM/MSN-Transports were better…)
Author: steam
What about a ‘relay’? -> The website connects to a little C, C++ or Java tool that keeps the connection alive.
Some modifications to the ejabberd server software -> a static connection to the ‘relay’.
Try it XD
That is basically what punjab does, it’s a “small” Python script that relays stuff it gets over HTTP to the appropriate Jabber server. With “hijacking” of sessions and some request-ID forgery we were able to push all our messages down one “pipe”…this should reduce load a lot.