
16. XML projects

Quoting directly from the Apache XML project website, the project's goals are:

  • To provide commercial-quality standards-based XML solutions that are developed in an open and cooperative fashion.
  • To provide feedback to standards bodies (such as IETF and W3C) from an implementation perspective.
  • To be a focus for XML-related activities within Apache projects.

The project homepage is at http://xml.apache.org. The project is an umbrella for a variety of subprojects, described in the following sections.

16.1 Introduction to XML

This is a quick introduction to XML. To learn more about XML, a good starting point is http://www.xml.com. XML is a markup language (think HTML) for describing structured content using tags and attributes. Once content is separated from presentation, you can choose how to display it (on a cellphone, as HTML, as plain text) or how to exchange it. The XML standard only describes how tags and attributes can be arranged, not what their names are or what they mean. Apache provides the tools described in the following sections.
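
To make the last point concrete, here is a minimal Java sketch using JAXP, the standard XML API bundled with the JDK (not an Apache-specific API). It only checks whether a text is well-formed: made-up tag names pass, while overlapping tags fail. The class name and sample documents are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    // Returns true if the text obeys XML's syntax rules (one root element,
    // properly nested tags, quoted attribute values). Element and attribute
    // names are not checked against any list: XML itself does not define them.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented tag names are accepted as long as they nest properly.
        System.out.println(isWellFormed("<phone-book><entry name=\"Ana\">555-0100</entry></phone-book>")); // true
        // Overlapping tags break the syntax rules.
        System.out.println(isWellFormed("<b><i>bold italic</b></i>")); // false
    }
}
```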

16.2 Xerces

An XML parser is a tool that gives programs access to the contents of XML documents. The Xerces project provides XML parsers for a variety of languages, including Java, C++ and Perl. The Perl bindings are based on the C++ sources. There are also Tcl bindings for Xerces in version 2.0 of TclXML, by Steve Ball; at the moment, this 2.0 version is only available through the Ajuba CVS repository. Xerces supports the following standards:

  • DOM: Document Object Model. XML documents are hierarchical by nature (tags nest inside other tags), so they can be accessed through a tree-like interface. The process is as follows:
    • Parse the document
    • Build the tree
    • Add, delete or modify nodes
    • Serialize the tree
  • SAX: Simple API for XML. This is a stream-based API: the parser calls back into your code as it encounters each element, instead of building the whole tree in memory. These callbacks can be used, for example, to construct a DOM tree.
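  • XML Namespaces: a way to qualify element and attribute names with a URI, so that vocabularies from different sources can be mixed in one document without name clashes.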
  • XML Schema: the XML standard provides the syntax for writing documents, while XML Schema provides the tools for defining what a document may contain (its structure and data types). For example, a schema can state that a certain element must contain an integer between 10 and 20.
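
The four DOM steps listed above map onto the standard org.w3c.dom interfaces, which Xerces Java implements. The following sketch uses the JDK's built-in JAXP parser rather than calling Xerces classes directly, and the catalog, book and draft element names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomSteps {

    static String process(String xml) {
        try {
            // Parse the document and build the in-memory tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // Modify: mark every <book> element as reviewed.
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                ((Element) books.item(i)).setAttribute("reviewed", "yes");
            }

            // Delete: remove every <draft> element. The NodeList is live,
            // so walk it backwards while removing.
            NodeList drafts = doc.getElementsByTagName("draft");
            for (int i = drafts.getLength() - 1; i >= 0; i--) {
                Node draft = drafts.item(i);
                draft.getParentNode().removeChild(draft);
            }

            // Add: append a new <book> to the root element.
            Element added = doc.createElement("book");
            added.setTextContent("New Title");
            root.appendChild(added);

            // Serialize the tree back to text with an identity transform.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(process("<catalog><book>Old Title</book><draft>x</draft></catalog>"));
    }
}
```

Running main should print the catalog with the draft removed, the original book marked reviewed and a second book appended.
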
The initial code base of the Xerces project was donated by IBM. You can find more information on the Xerces Java, Xerces C and Xerces Perl homepages.
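
The XML Schema bullet in the list above mentions restricting an element to an integer between 10 and 20. Below is a minimal sketch of exactly that constraint, assuming the JDK's javax.xml.validation API rather than any Xerces-specific interface; the element name level is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class RangeSchemaCheck {

    // Hypothetical schema: <level> must contain an integer from 10 to 20 inclusive.
    private static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"level\">"
          + "<xs:simpleType>"
          + "<xs:restriction base=\"xs:integer\">"
          + "<xs:minInclusive value=\"10\"/>"
          + "<xs:maxInclusive value=\"20\"/>"
          + "</xs:restriction>"
          + "</xs:simpleType>"
          + "</xs:element>"
          + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
    }

    // Returns true if the document is well-formed and satisfies the schema.
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Out-of-range or non-numeric content is reported here.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String doc : new String[] {"<level>15</level>", "<level>25</level>", "<level>ten</level>"}) {
            System.out.println(doc + " valid? " + isValid(doc));
        }
    }
}
```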

16.3 Xalan

Xalan is an XSLT processor available for Java and C++. XSL is a stylesheet language for XML, and the T in XSLT stands for Transformations. XML is good at storing structured data (information), but we sometimes need to display this data to the user or apply some other transformation to it. Xalan takes the original XML document, reads the transformation instructions from a stylesheet and outputs HTML, plain text or another XML document. You can learn more about Xalan at the Xalan Java and Xalan C project homepages.
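
A minimal Java sketch of such a transformation, assuming the standard JAXP javax.xml.transform API: the JDK ships its own XSLT processor behind it, and Xalan Java can be plugged in through the same TransformerFactory interface. The menu document and the stylesheet are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltToHtml {

    // Stylesheet: turn each <item> of a <menu> into an HTML list entry.
    private static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:template match=\"/menu\">"
          + "<ul><xsl:apply-templates select=\"item\"/></ul>"
          + "</xsl:template>"
          + "<xsl:template match=\"item\">"
          + "<li><xsl:value-of select=\"@name\"/>: <xsl:value-of select=\".\"/></li>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML document and returns the HTML.
    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<menu><item name=\"Espresso\">1.50</item><item name=\"Latte\">2.20</item></menu>"));
    }
}
```

Swapping in a different stylesheet (for example one with method="text") changes the output format without touching the data.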

16.4 FOP

From the website: FOP is a Java application that reads a formatting object tree and then turns it into a PDF document. In other words, FOP takes an XML document written in XSL formatting objects (XSL-FO) and outputs PDF, much as Xalan outputs HTML or text. You can learn more about FOP at the FOP homepage.
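
In practice the formatting object tree is often produced from ordinary XML by an XSLT stylesheet, so Xalan and FOP can be chained. The sketch below covers only that first step, using the JDK's JAXP transformer; handing the resulting XSL-FO to FOP for rendering needs the FOP library and is not shown. The letter and para input elements are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ToFormattingObjects {

    // Stylesheet turning <letter><para>...</para></letter> into a one-page
    // XSL-FO document: an A4 page layout, then one fo:block per paragraph.
    private static final String XSL =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
          + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
          + "<xsl:template match=\"/letter\">"
          + "<fo:root>"
          + "<fo:layout-master-set>"
          + "<fo:simple-page-master master-name=\"page\" page-height=\"29.7cm\" page-width=\"21cm\">"
          + "<fo:region-body margin=\"2cm\"/>"
          + "</fo:simple-page-master>"
          + "</fo:layout-master-set>"
          + "<fo:page-sequence master-reference=\"page\">"
          + "<fo:flow flow-name=\"xsl-region-body\">"
          + "<xsl:for-each select=\"para\">"
          + "<fo:block font-size=\"12pt\"><xsl:value-of select=\".\"/></fo:block>"
          + "</xsl:for-each>"
          + "</fo:flow>"
          + "</fo:page-sequence>"
          + "</fo:root>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Produces the XSL-FO text for the given letter document.
    static String toFo(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The printed XSL-FO is the kind of input that would be handed to FOP for PDF output.
        System.out.println(toFo("<letter><para>Dear reader,</para><para>Hello.</para></letter>"));
    }
}
```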

16.5 Cocoon

Cocoon leverages other Apache XML technologies such as Xerces, Xalan and FOP to provide a comprehensive publishing framework. Cocoon is based on XML and XSL and is targeted at sites of medium to high complexity. It separates content, logic and presentation, as described on the website:

  • XML creation: the XML file is created by the content owners. They need no specific knowledge of how the XML content is processed further, only of the chosen DTD or namespace. This layer is always performed by humans, directly through normal text editors or XML-aware tools and editors.
  • XML process generators: the logic that generates or processes XML is kept separate from the content file.
  • XSL rendering: the created document is then rendered by applying an XSL stylesheet to it and formatting it into the specified resource type (HTML, PDF, XML, WML, XHTML).
You can learn more about Cocoon at the project homepage.
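
The separation into three layers can be illustrated with a small Java sketch. It is not Cocoon's API: it assumes only JAXP, uses a made-up article document as content, a word-count step standing in for a generator, and two stylesheets that render the same enriched document as HTML or as plain text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MiniPipeline {

    // Content layer: written by the content owner, with no presentation details.
    static final String CONTENT =
            "<article><title>Release notes</title><para>Faster startup.</para></article>";

    // Presentation layer: one stylesheet per output format.
    static final String TO_HTML =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:template match=\"/article\">"
          + "<html><body><h1><xsl:value-of select=\"title\"/></h1>"
          + "<p><xsl:value-of select=\"para\"/></p></body></html>"
          + "</xsl:template></xsl:stylesheet>";

    static final String TO_TEXT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/article\">"
          + "<xsl:value-of select=\"title\"/>"
          + "<xsl:text> (</xsl:text><xsl:value-of select=\"@words\"/><xsl:text> words)</xsl:text>"
          + "</xsl:template></xsl:stylesheet>";

    // Logic layer: a stand-in "generator" that enriches the content with
    // computed data (a word count of all <para> elements) before rendering.
    static Document generate(String content) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(content)));
        int words = 0;
        NodeList paras = doc.getElementsByTagName("para");
        for (int i = 0; i < paras.getLength(); i++) {
            String text = paras.item(i).getTextContent().trim();
            if (!text.isEmpty()) {
                words += text.split("\\s+").length;
            }
        }
        doc.getDocumentElement().setAttributeNS(null, "words", String.valueOf(words));
        return doc;
    }

    // Runs the whole pipeline: content -> generator -> stylesheet -> output text.
    static String publish(String content, String stylesheet) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)))
                    .transform(new DOMSource(generate(content)), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(publish(CONTENT, TO_HTML));
        System.out.println(publish(CONTENT, TO_TEXT));
    }
}
```

The content string never changes between the two calls; only the stylesheet does, which is the point of keeping the layers apart.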

16.6 Xang

The goal of the Xang project is to make it easy for developers to build commercial-quality, XML-aware applications for the Web. The application logic is defined in a hierarchical XML file that can be scripted with JavaScript. This file defines how to access the data (which can come from other XML files, Java plug-ins, etc.). The Xang engine takes care of mapping HTTP requests to the appropriate handlers. You can learn more about Xang at the project homepage.
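
A rough sketch of the request-mapping idea, under loud assumptions: the XML format below is hypothetical and is not Xang's file format, and the handlers are plain Java functions instead of the JavaScript that Xang embeds. The path strings stand in for HTTP request paths.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlDispatcher {

    // Hypothetical application file. This is NOT Xang's real format; it only
    // illustrates declaring request handlers in a hierarchical XML document.
    static final String APP =
            "<application>"
          + "<handler path=\"/hello\" greeting=\"Hello\"/>"
          + "<handler path=\"/bye\" greeting=\"Goodbye\"/>"
          + "</application>";

    // Request path -> handler. Xang scripts its handlers in JavaScript; this
    // sketch substitutes plain Java functions of the user name.
    private final Map<String, Function<String, String>> routes = new HashMap<>();

    XmlDispatcher(String appXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(appXml)));
            NodeList handlers = doc.getElementsByTagName("handler");
            for (int i = 0; i < handlers.getLength(); i++) {
                Element handler = (Element) handlers.item(i);
                String greeting = handler.getAttribute("greeting");
                routes.put(handler.getAttribute("path"), user -> greeting + ", " + user);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read application file", e);
        }
    }

    // Maps a request path to its handler, the way an engine dispatches HTTP requests.
    String handle(String path, String user) {
        Function<String, String> handler = routes.get(path);
        return handler == null ? "404 Not Found" : handler.apply(user);
    }

    public static void main(String[] args) {
        XmlDispatcher app = new XmlDispatcher(APP);
        System.out.println(app.handle("/hello", "Ana"));   // Hello, Ana
        System.out.println(app.handle("/nowhere", "Ana")); // 404 Not Found
    }
}
```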

16.7 SOAP

Apache SOAP is an implementation of the SOAP (Simple Object Access Protocol) submission to the W3C. It is based on, and supersedes, the IBM SOAP4J implementation.

From the draft W3C specification: SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts:

  • An envelope that defines a framework for describing what is in a message and how to process it,
  • a set of encoding rules for expressing instances of application-defined datatypes,
  • and a convention for representing remote procedure calls and responses.
Think of SOAP as an XML-based remote procedure call system, similar in purpose to CORBA. It is based on HTTP and XML. On one hand, this makes it verbose and slow compared to other systems. On the other hand, it eases interoperability, debugging, and the development of clients and servers in a variety of languages (C, Java, Perl, Python, Tcl, etc.), since most modern languages have HTTP and XML modules. You can learn more at the Apache SOAP homepage.
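
To show the envelope, encoding and RPC parts in one place, here is a minimal sketch that builds a SOAP 1.1 request with the JDK's DOM API (not Apache SOAP's client API). The getQuote method and the urn:example-quotes namespace are hypothetical, and sending the message with an HTTP POST is left out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SoapEnvelope {

    // SOAP 1.1 namespaces for the envelope and for the SOAP encoding rules.
    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String ENC_NS = "http://schemas.xmlsoap.org/soap/encoding/";
    // Hypothetical service namespace; a real service defines its own.
    static final String SERVICE_NS = "urn:example-quotes";

    // Builds an RPC-style request for a hypothetical getQuote(symbol) method.
    static String buildRequest(String symbol) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Envelope: the framework describing what is in the message.
            Element envelope = doc.createElementNS(ENV_NS, "SOAP-ENV:Envelope");
            doc.appendChild(envelope);
            Element body = doc.createElementNS(ENV_NS, "SOAP-ENV:Body");
            envelope.appendChild(body);

            // RPC convention: one element named after the method, with one
            // child element per parameter, using the SOAP encoding rules.
            Element call = doc.createElementNS(SERVICE_NS, "ns1:getQuote");
            call.setAttributeNS(ENV_NS, "SOAP-ENV:encodingStyle", ENC_NS);
            body.appendChild(call);
            Element param = doc.createElementNS(null, "symbol");
            param.setTextContent(symbol);
            call.appendChild(param);

            // Serialize the DOM tree to the text that would travel over HTTP.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In a real client this text would be POSTed to the service's HTTP endpoint.
        System.out.println(buildRequest("IBM"));
    }
}
```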

Related talk

  • W02: Rub-a-dub-dub-dubya: SOAP and the Web

16.8 Batik

Batik is a Java-based toolkit for applications that want to use images in the Scalable Vector Graphics (SVG) format for purposes such as viewing, generation or manipulation.

It is XML-centric and compliant with the W3C SVG specification. It is somewhat atypical among Apache projects in that it provides a graphical component. Batik provides hooks to extend the framework through custom tags, and it can convert SVG to other formats such as JPEG or PNG.
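
Because SVG is plain XML, an SVG image can be produced with any XML API and then viewed or converted by a toolkit such as Batik. A minimal sketch with the JDK's DOM API (not Batik's own SVG generator) that writes a single circle; the method name and sizes are illustrative.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SvgSketch {

    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Builds an SVG document containing one filled circle that exactly fits the canvas.
    static String circle(int radius, String color) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            String size = String.valueOf(2 * radius);
            Element svg = doc.createElementNS(SVG_NS, "svg");
            svg.setAttributeNS(null, "width", size);
            svg.setAttributeNS(null, "height", size);
            doc.appendChild(svg);

            Element circle = doc.createElementNS(SVG_NS, "circle");
            circle.setAttributeNS(null, "cx", String.valueOf(radius));
            circle.setAttributeNS(null, "cy", String.valueOf(radius));
            circle.setAttributeNS(null, "r", String.valueOf(radius));
            circle.setAttributeNS(null, "fill", color);
            svg.appendChild(circle);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Save the output as circle.svg and open it in an SVG viewer.
        System.out.println(circle(20, "red"));
    }
}
```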

You can learn more at the Batik homepage.

Related talk

  • W14: Introduction to the Batik project.

16.9 Crimson

Crimson is an alternative Java-based XML parser that supports XML 1.0 through a variety of interfaces. It is the parser currently shipping in Sun products, and an intermediate step until version 2 of Xerces is released.
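
JAXP, the subject of the related talk listed below, lets code obtain a parser through a factory without naming the implementation, so the same program can run on Crimson, Xerces or the parser built into a modern JDK. A minimal SAX sketch under that assumption, which also shows the callback style described in the Xerces section; the element names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCounter {

    // Counts how often each element name occurs, using SAX callbacks only:
    // no tree is built, the handler just reacts to each start tag.
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            // JAXP returns whichever SAX parser implementation is configured.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<order><item/><item/><note>x</note></order>"));
        // {item=2, note=1, order=1}
    }
}
```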

You can learn more at the Crimson homepage.

Related talk

  • TH08: Java API for XML processing (JAXP) version 1.1

16.10 Other XML projects

There are other projects based on Apache and XML that do not live under the Apache XML umbrella:

  • mod_xslt: a C-based Apache module for delivering XML/XSL-based content. It is released under the GPL.
  • AxKit: an XML-based application server for mod_perl and Apache. It allows separation of content and presentation.

Related talk

  • TH04: AxKit - An XML Application server for Apache
