I Love Cocoon!

A survey of extensible server strategies
S V Ramu (2002-05-10)

A 'home grown' strategy for designing a website

About two years back, I had my first opportunity to design and implement a dynamic website with a team. Though it took a few weeks to digest HTTP's stateless model and the associated browser quirks, we could soon apply all the standard application tricks to HTML page design. For an OOP addict, the ASP or plain JSP model of mixing script and HTML was as harsh as metal scratching metal. It was immediately clear that if we used JSP at all, custom tags were the way to go: through the elegant taglib model, they separate the tag structure from the Java code.

Even so, this total dependence of the application on the web infrastructure still smelled bad. After all, a web interface is only one face of an application. We were also not ready for a full-fledged, application-server-based design, because it seemed too heavy. So JSP was out and the servlet was in. At least JSP, as a way of mixing HTML and script, was definitely out. But the servlet has its own drawbacks. Generating HTML tags from Java code was like squashing a bug with a sledgehammer. We needed the flexibility of JSP with the independence of the servlet.

The newly learnt jargon of XML and XSL came to the rescue. With XSL you can generate HTML, just as in JSP, but unlike JSP you cannot mix server-side script into it. XSL is a way of generating pure presentation output. Any code-based manipulation has to be performed externally and fed in as XSL parameters. This was at once appealing and liberating. So the model now was to use a servlet for request marshaling and to generate the HTML with a battery of XSL files, using Java's XSLT API. This meant that all our content had to be converted from regular SQL result sets to XML, through Java.
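
To make the idea concrete, here is a minimal sketch of that XSL step using the standard javax.xml.transform API. It is not our original code: the class name XslPageDemo, the customer XML, the stylesheet and the 'title' parameter are all made up for illustration. It only shows that the stylesheet holds pure presentation, while anything computed in Java arrives as a parameter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslPageDemo {
    // Hypothetical content, as it might look after converting a SQL result set to XML.
    static final String CUSTOMERS_XML =
        "<customers><customer><name>Asha</name></customer>"
      + "<customer><name>Ravi</name></customer></customers>";

    // Hypothetical stylesheet: presentation only, with one externally supplied parameter.
    static final String PAGE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:param name='title'/>"
      + "<xsl:template match='/customers'>"
      + "<html><body><h1><xsl:value-of select='$title'/></h1><ul>"
      + "<xsl:for-each select='customer'><li><xsl:value-of select='name'/></li></xsl:for-each>"
      + "</ul></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    /** Transforms the XML with the stylesheet, passing 'title' in as an XSL parameter. */
    static String render(String xml, String xsl, String title) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        transformer.setParameter("title", title); // value computed on the Java side
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(render(CUSTOMERS_XML, PAGE_XSL, "Customer list"));
    }
}
```

In a real servlet the output would go to the response writer instead of a string, and the compiled stylesheet would probably be cached rather than re-parsed on every request.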

Soon we also decided that we needed only one servlet, whose sole job was to route the requests and their parameters to the appropriate Java classes (we even made this dynamic, by loading those classes only at runtime, as configured in a property file). Initially we felt guilty about such a drastic simplification, reducing the whole dependence on the web architecture to just that one servlet. But it made sense once we realized that a servlet is for just that: connecting our server-side code to the client's browser.
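
The routing part can be sketched in plain Java, without the servlet API. This is an illustration under assumptions, not our original code: the RequestHandler interface, the HelloHandler class and the 'hello' route are hypothetical names, and the Properties object stands in for the property file that a real deployment would load from disk.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

class DispatcherDemo {
    /** Hypothetical contract for the classes that the single servlet hands requests to. */
    interface RequestHandler {
        String handle(Map<String, String> params);
    }

    /** Hypothetical handler; in the model above it would return XML for the XSL stage. */
    static class HelloHandler implements RequestHandler {
        @Override
        public String handle(Map<String, String> params) {
            return "<hello name=\"" + params.getOrDefault("name", "world") + "\"/>";
        }
    }

    private final Properties routes; // action name -> fully qualified class name

    DispatcherDemo(Properties routes) {
        this.routes = routes;
    }

    /** Looks up the configured class for the action, loads it at runtime and invokes it. */
    String dispatch(String action, Map<String, String> params) throws ReflectiveOperationException {
        String className = routes.getProperty(action);
        if (className == null) {
            throw new IllegalArgumentException("No handler configured for action: " + action);
        }
        RequestHandler handler = Class.forName(className)
            .asSubclass(RequestHandler.class)
            .getDeclaredConstructor()
            .newInstance();
        return handler.handle(params);
    }

    /** Stand-in for the property file, e.g. a line such as: hello=some.package.HelloHandler */
    static Properties sampleRoutes() {
        Properties p = new Properties();
        p.setProperty("hello", HelloHandler.class.getName());
        return p;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        DispatcherDemo dispatcher = new DispatcherDemo(sampleRoutes());
        System.out.println(dispatcher.dispatch("hello", Collections.singletonMap("name", "Cocoon")));
    }
}
```

The one servlet would then do little more than copy the request parameters into the map, call dispatch, and pass the returned XML on to the XSL stage.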

The curiosity, the Struts and the Cocoon

Even today, I'm happy that we hit upon such a model. But we were quietly aware that we were not alone in this drastic rethinking of the web architecture. I soon stumbled upon the Apache project called Cocoon. Initially I was put off by its huge size and seemingly steep learning curve. We also heard from many people that there was another promising project, called Struts, playing in the same arena. I wondered why there were two projects from the same stable solving the same problem. So the exploration began, and this essay is the result. I should warn you that this article is in no way a tutorial on these open source products. It is just my sort of First Impression Report, and a leisurely reflection upon the architecture of these two products with respect to our model above.

Put very simply: both Struts and Cocoon, just like the 'Home Grown Model', rely very little on servlets (in fact, usually on only one servlet)! This was very heartwarming. Where, then, did they differ? What extra did they offer? Again, from the very thin study I've done, it seems that Struts relies more on Java, while Cocoon relies almost completely on the XML-XSL infrastructure. As Cocoon was close to my heart, I went into it in some depth, while giving Struts just a quick glance. So this article will be more about Cocoon and its philosophy than about Struts. I am in fact an advocate of a completely XMLised world, with XSLT as the transforming node (see the next section for more explanation).

There was an interesting surprise in the Cocoon model, which made me think that we could have been bolder in simplifying the 'Home Grown Model'. From the conventional viewpoint, a run-time XSL transformation is a heavy overhead compared to pure static HTML. True, but ASP, JSP and servlets carry a load not much different from XSLT's. In fact there is not much to design in a static website that has only HTML pages and some pictures. So, once we have decided that we need dynamic content, XSLT is not too different from JSP-like models. With this mental background, where XSLT barely breaks even with JSP, the idea of multiple XSL transformations before sending the response over HTTP was like an unpardonable sin! But this is exactly what Cocoon does. Short of eliminating Java completely, it manages with just XML and XSL, with Java only in the silent background.

'Isn't OOP dead?'

Cocoon reminded me of my pet theory that, properly planned, future programming could reduce to a suitable graph network, where the nodes are XSLT converters and the edges are XML content transmission protocols.

...what then are the modern Application Architecture options available to us? If key module-to-module communication can be in neutral terms with ports and XML, then it doesn't matter whether these modules are in the same machine or across the world. Of course, performance is still a deciding factor, before going overboard and converting all our method calls into port calls. All the same, an application can now be imagined as a bunch of Service modules which communicate over neutral channels and formats, and possibly with neutral semantics as well (i.e. the XML schema too might be a global standard, instead of a proprietary one). And all that remains of programming is to code these Service Nodes, which just transform some input XML to some other output XML, not altogether an OO demand at all.

If you imagine all the service nodes as points, and their interconnections with other services as directed lines, then what we have for an application is a network of points and lines. Now, if the semantics of a communication (i.e. if XML is the universal format, a particular schema of tag structure is the semantics) is an international standard, draw that line in red, and if it is proprietary, draw it in black. Our application network would then be many points with red or black directed (arrowed) lines connecting them. We can say that as the number of red lines grows, the application becomes, to that extent, a more extensible and maintainable product, since any new vendor can deliver a module with better performance and yet with complete integration assurances. This is really the promise of the Web Services paradigm, where a service is the software equivalent of the IC in electronics.
(Isn't OOP dead?)

In this light, seeing Cocoon made me envious of those admirable minds who dared to go beyond the fear of too many runtime transformations becoming a bottleneck, to a dream of completely separating the concerns, to the point of reducing Java-like coding to the absolute minimum. You must realize that today, with tremendous processor speeds, spacious RAM and, above all, optimized monster servers, speed is really not the concern. You can always throw in more hardware. The issue now is having portable content, which is ultimately extensible and scalable. Cocoon realizes this fully, and hence exploits and combines the simplicity of XML with the versatility of XSL. Its design advocates multi-level XSL transformation before sending out the response. All the same, Cocoon 2 claims to optimize fiercely for production quality, by using SAX parsers instead of the memory and CPU gobbling DOM.

A very rough overview of Cocoon

There is a lot of jargon to learn in Cocoon. As Cocoon itself admits, the concepts of SiteMap and XSP (eXtensible Server Pages) have a steep learning curve. But the heart of the whole framework is nothing short of a revolution (as its early founder, Stefano Mazzocchi -mad-zoki-, rightfully claims). After some time with its docs, I'm a bit uncomfortable with its oversimplified model of Actions, which seems to reduce all the programming needs, almost ridiculously, with the help of the elegant Apache Jakarta Avalon Framework. If this is true, it means that the whole site management can be done with XML-XSL alone, with Java only for producing those starting XML 'seeds' (so to say).

Basically the model consists of the following concepts...
  1. Parse the URI and select the appropriate process:
    Match the request URI with RegEx, Wildcards etc. and branch it off to an appropriate Pipeline. This process selection phase can also be done with Selectors (which can use things like Browser types, parameters etc.), and with Actions which are just Java classes which take in a list of Name-Value pairs and churn out a modified list of those Name-Values.

  2. Setup a Pipeline of XML transformers:
    A pipeline is just the Generation of the initial XML for the given request, the Transforming of it in many stages, and finally the Serializing of it into a response format. There is also a very nice capability of Aggregating the XML output of two or more Pipelines, and continuing with the Transformation.

  3. Generate the Initial XML:
    The key idea here is to use one of the Generators to create the initial XML. This generation of XML could just be a physical XML file, or a JIT created Directory structure (with Ant like selectability), or RDBMS, or maybe from other Template based content creators like Apache Jakarta Velocity, or from a legacy Java Script file, etc.

  4. Transform the XML:
    Once the initial XML is created, we can Transform it with our own XSL file, or with any one of the standard transformers that come with Cocoon, such as those for I18N, Logging or SQL. There can be as many transformations as you like.

  5. Finally Serialize the XML for output:
    Serialize just means converting the XML output of the transformations into (usually) a non-XML format like PDF (using the Apache FOP project), a PNG/GIF/JPEG image file (Yes! through SVG - the Scalable Vector Graphics XML markup language - using the Apache Batik project), or the MS Excel or Word file formats (using the Apache Jakarta POI project), etc.
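
The generate-transform-serialize idea in steps 2 to 5 can be imitated with the SAX side of the standard JAXP API. To be clear, this is not Cocoon's own code or API, only a sketch of the same shape: the PipelineDemo class, the news XML and the two stylesheets are hypothetical. The stages are chained as SAX event handlers, so no stage hands a DOM tree to the next, and the output format is decided by the final serializer rather than by the stylesheets.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;

class PipelineDemo {
    // Hypothetical 'seed' XML, standing in for what a generator would produce.
    static final String NEWS_XML = "<news><item>Cocoon</item><item>Struts</item></news>";

    // Stage 1 (hypothetical): raw content -> a neutral 'page' vocabulary.
    static final String TO_PAGE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/news'><page><title>News</title>"
      + "<xsl:for-each select='item'><entry><xsl:value-of select='.'/></entry></xsl:for-each>"
      + "</page></xsl:template></xsl:stylesheet>";

    // Stage 2 (hypothetical): 'page' vocabulary -> HTML elements.
    static final String TO_HTML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/page'><html><body><h1><xsl:value-of select='title'/></h1><ul>"
      + "<xsl:for-each select='entry'><li><xsl:value-of select='.'/></li></xsl:for-each>"
      + "</ul></body></html></xsl:template></xsl:stylesheet>";

    /** Runs the XML through the stylesheets in order, then serializes the result as HTML. */
    static String run(String xml, String... stylesheets) throws TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("This JAXP implementation has no SAX transformer support");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // Serializer: an identity handler that writes the incoming SAX events out as HTML text.
        StringWriter out = new StringWriter();
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        serializer.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        serializer.setResult(new StreamResult(out));

        // Transformers: wire the chain back to front, each stage feeding SAX events to the next.
        ContentHandler next = serializer;
        for (int i = stylesheets.length - 1; i >= 0; i--) {
            TransformerHandler stage =
                stf.newTransformerHandler(new StreamSource(new StringReader(stylesheets[i])));
            stage.setResult(new SAXResult(next));
            next = stage;
        }

        // Generator: an identity transformer parses the source and pushes events into stage 1.
        stf.newTransformer().transform(new StreamSource(new StringReader(xml)), new SAXResult(next));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(NEWS_XML, TO_PAGE_XSL, TO_HTML_XSL));
    }
}
```

Adding another transformation is just one more stylesheet argument, and swapping the serializer would change the response format without touching any stage, which is the separation the pipeline model is after.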

Of course, there are many other tricks for handling Errors, Views etc., for which you can use the decent documentation available with the Cocoon downloads. Installing Cocoon is just a matter of copying the Cocoon.war file to the Tomcat webapps folder (but due to some mismatch of XML parser versions, you have to follow a few jar copying rules stipulated in the Cocoon installation pages).


While trying to understand the Cocoon project, I happened to stumble upon many nice projects in Apache Jakarta that are used by Cocoon. I do realize that what little I've explained here about Cocoon is pathetically cryptic. But my idea is to start this survey of distributed server/web strategies, and to give you a taste of the motivation behind such notable efforts, through the eyes of personal experience. To me, the realization that the efforts of my previous team were up-to-date enough was heartwarming. I hope this gives you as much confidence in innovating as it gave us. Soon, I'll try to continue exploring Cocoon, Struts and others in much more detail. Mainly, I'd like to arrive at some unifying architectures that we can discuss and standardize at TATTVUM, so as not to be put off by so many wonderful upcoming projects and models. So, please do comment.