Perl for XML Processing

This page covers how to process XML properly using Perl and recommended modules from CPAN (the Comprehensive Perl Archive Network).

Technologies of Interest

XML-LibXML

XML-LibXML is the de facto standard for XML processing in Perl. It is a comprehensive CPAN distribution built on the libxml2 library, and it provides a DOM (Document Object Model) interface, SAX (a stream parser), a pull parser, and XPath support; XSLT support comes from the companion XML-LibXSLT distribution. XML-LibXML has good reference documentation and is actively maintained. The Perl XML::LibXML by Example site provides a tutorial suitable for beginners.
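As a rough illustration of the DOM interface, the following minimal sketch parses a small document and walks it with DOM-style methods. The sample XML and the variable names are made up for this example; only the XML::LibXML calls themselves (load_xml, documentElement, getElementsByTagName, getAttribute, textContent) come from the module.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, inlined so the script is self-contained.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="b1"><title>Programming Perl</title></book>
  <book id="b2"><title>Learning Perl</title></book>
</library>
XML

# Parse the string into a DOM tree; load_xml() dies on malformed input.
my $doc  = XML::LibXML->load_xml( string => $xml );
my $root = $doc->documentElement;

# Walk the tree with DOM-style methods and collect the titles.
my @titles;
for my $book ( $root->getElementsByTagName('book') ) {
    my $id = $book->getAttribute('id');
    my ($title_elem) = $book->getElementsByTagName('title');
    my $title = $title_elem->textContent;
    push @titles, $title;
    print "$id: $title\n";
}
```

To parse a file instead of a string, load_xml also accepts location => $path.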

Before using this library, make sure you understand XML namespaces and how they interact with the DOM and the XML-LibXML API.
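A common namespace pitfall can be sketched as follows: when a document declares a default namespace, an unprefixed name in an XPath expression does not match its elements, so a prefix has to be registered first. The feed below is a simplified fragment invented for this example (it is not a complete Atom document), and the atom prefix is an arbitrary choice.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Simplified, hypothetical feed that uses a default namespace.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this query finds nothing even though <entry> elements are present.
my @unprefixed = $doc->findnodes('//entry');

# Bind a prefix of our choosing to the namespace URI, then query with it.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( atom => 'http://www.w3.org/2005/Atom' );
my @prefixed = $xpc->findnodes('//atom:entry');

printf "without prefix: %d, with prefix: %d\n",
    scalar(@unprefixed), scalar(@prefixed);
```

The prefix used in the query does not need to match any prefix in the document; only the namespace URI matters.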

XPath

XPath is an XML-related technology (though not itself written in XML syntax) that lets you locate nodes in XML documents using a compact syntax. XML::LibXML supports it directly; avoid the old, slow, and largely unmaintained XML::XPath CPAN distribution.
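The sketch below shows what XPath queries through XML::LibXML might look like, using findnodes to select nodes and findvalue to get a string result. The sample data is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data.
my $xml = <<'XML';
<library>
  <book year="1991"><title>Programming Perl</title></book>
  <book year="1993"><title>Learning Perl</title></book>
  <book year="2005"><title>Perl Best Practices</title></book>
</library>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# findnodes() returns the matching nodes; here, books dated before 2000.
my @old_books = $doc->findnodes('/library/book[@year < 2000]');

# Each node can be used as the context for a further, relative query.
print $_->findvalue('title'), "\n" for @old_books;

# findvalue() returns the string value of the expression's result.
my $count      = $doc->findvalue('count(//book)');
my $last_title = $doc->findvalue('/library/book[last()]/title');
print "$count books; last one: $last_title\n";
```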

Another useful module is HTML-Selector-XPath, which converts CSS-style selectors to XPath and provides functionality similar to that offered by JavaScript libraries such as jQuery. For example, you can write selector_to_xpath('ul.myclass a') to find all a elements inside a ul element with a CSS class of myclass.
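Here is a minimal sketch of combining the two modules: the selector is converted to an XPath expression, which is then handed to XML::LibXML. The markup is invented for this example and deliberately uses no namespace, so the generated expression can be used without registering a prefix.

```perl
use strict;
use warnings;
use XML::LibXML;
use HTML::Selector::XPath qw(selector_to_xpath);

# Hypothetical, namespace-free markup.
my $markup = <<'XML';
<div>
  <ul class="myclass other">
    <li><a href="/one">One</a></li>
    <li><a href="/two">Two</a></li>
  </ul>
  <ul>
    <li><a href="/three">Three</a></li>
  </ul>
</div>
XML

my $doc = XML::LibXML->load_xml( string => $markup );

# Convert the CSS selector into an equivalent XPath expression.
my $xpath = selector_to_xpath('ul.myclass a');
print "XPath: $xpath\n";

# Run the generated expression against the parsed document.
my @links = $doc->findnodes($xpath);
print $_->getAttribute('href'), "\n" for @links;
```

Only the links inside the list that carries the myclass class are selected, even though that list has a second class as well.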

To learn about XPath, consult the following resources:

Custom XPath Functions

XML::LibXML allows the programmer to register custom XPath functions, written in Perl, that can then be called from XPath expressions. For more information, see XML::LibXML::XPathContext.
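As an illustration, the sketch below registers a Perl callback with registerFunction and calls it from an XPath expression. The function name uppercase and the sample data are made up for this example; string arguments arrive in the callback as plain Perl scalars.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample data.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<library>
  <book><title>Programming Perl</title></book>
  <book><title>Learning Perl</title></book>
</library>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);

# 'uppercase' is a name invented for this example, not a built-in function.
# The callback receives the XPath string argument as an ordinary scalar.
$xpc->registerFunction( 'uppercase', sub {
    my ($text) = @_;
    return uc $text;
} );

# Call the custom function from inside an XPath expression.
my $shouted = $xpc->findvalue('uppercase(string(/library/book[1]/title))');
print "$shouted\n";
```

To avoid clashing with other function names, registerFunctionNS can be used to place the function in a namespace instead.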

XSLT

XSLT stands for Extensible Stylesheet Language Transformations and is a language for transforming XML documents into other XML documents or into other formats such as HTML or plain text. Perl has good support for version 1.0 of XSLT through the XML-LibXSLT distribution.

(Please avoid using XML-XSLT, which is old and largely unmaintained. Use XML-LibXSLT instead.)
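A minimal sketch of a transformation with XML-LibXSLT might look like this. Both the source document and the stylesheet are hypothetical and inlined to keep the script self-contained; in practice they would usually be loaded from files.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical source document.
my $source = XML::LibXML->load_xml( string => <<'XML' );
<library>
  <book><title>Programming Perl</title></book>
  <book><title>Learning Perl</title></book>
</library>
XML

# Hypothetical XSLT 1.0 stylesheet that renders the books as an HTML list.
my $style_doc = XML::LibXML->load_xml( string => <<'XSL' );
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/library">
    <ul>
      <xsl:for-each select="book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL

# Compile the stylesheet once; it can then be applied to many documents.
my $xslt       = XML::LibXSLT->new;
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($source);

# Serialise the result according to the stylesheet's xsl:output settings.
my $html = $stylesheet->output_as_chars($result);
print $html;
```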

For more about XSLT, see the following links. Note that XSLT makes extensive use of XPath, so you should learn XPath first.

Web Pages about Perl and XML

The Perl XML Project Home Page

Their Frequently Asked Questions List (FAQ)

What to Avoid

XML-Simple

XML-Simple is not so simple to use correctly, and it takes the wrong approach to dealing with XML. Please avoid it and use XML-LibXML instead, which is an easy and fast alternative.
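To give an idea of the alternative, the sketch below reads a small configuration-style document with XML::LibXML and builds exactly the Perl data structure the program needs, rather than relying on a generic XML-to-hash conversion. The document layout and the %port_for hash are assumptions made for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical configuration document.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<config>
  <server name="alpha"><port>8080</port></server>
  <server name="beta"><port>8081</port></server>
</config>
XML

# Build the structure we actually want: server name => port.
# The shape of the result is decided here, explicitly, and does not
# change if the document happens to contain only one <server>.
my %port_for;
for my $server ( $doc->findnodes('/config/server') ) {
    $port_for{ $server->getAttribute('name') } = $server->findvalue('port');
}

print "$_ => $port_for{$_}\n" for sort keys %port_for;
```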

Parsing XML Using Regular Expressions

You should also avoid parsing XML using regular expressions, because XML's grammar is not regular and is difficult to handle correctly with them. Instead, use a parser. For more information, see:

  1. “Parsing HTML the Cthulhu Way”.

  2. Comment on Stack Overflow (funny).
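The problem can be illustrated with a small sketch. The document below is made up for this example: it contains a commented-out title, and the real title carries an attribute and a CDATA section. A naive regular expression picks up the wrong text, while a parser returns the real one.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document with a commented-out element and a CDATA section.
my $xml = <<'XML';
<?xml version="1.0"?>
<doc>
  <!-- <title>Old draft</title> -->
  <title lang="en"><![CDATA[Tom & Jerry]]></title>
</doc>
XML

# Naive regex: it matches the title inside the comment, and it cannot
# match the real <title> element at all because that one has an attribute.
my ($by_regex) = $xml =~ m{<title>(.*?)</title>};

# Parser: comments are not elements, and CDATA is decoded transparently.
my $doc       = XML::LibXML->load_xml( string => $xml );
my $by_parser = $doc->findvalue('/doc/title');

print "regex:  $by_regex\n";
print "parser: $by_parser\n";
```

Each of these cases could be patched in the regular expression, but every patch makes it longer and more fragile, whereas the parser already handles them.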

Modules for Dealing with Specific Grammars

In addition to generic XML parsers and manipulators, CPAN has many specialised modules for dealing with specific XML grammars. Many of them reside under the XML:: namespace. Some prominent examples include:
