S-XML is a simple XML parser implemented in Common Lisp. It was originally written by Sven Van Caekenberghe and is now maintained by Sven Van Caekenberghe, Rudi Schlatte and Brian Mastenbrook. S-XML is used by S-XML-RPC and CL-PREVALENCE.
This XML parser implementation has the following features:
This XML parser implementation has the following limitations:
You can download the LLGPL source code and documentation as s-xml.tgz (signature: s-xml.tgz.asc for which the public key can be found in the common-lisp.net keyring) (build and/or install with ASDF).
You can view the CVS Repository or get anonymous CVS access as follows:
    $ cvs -d:pserver:anonymous@common-lisp.net:/project/s-xml/cvsroot login
    (Logging in to anonymous@common-lisp.net)
    CVS password: anonymous
    $ cvs -d:pserver:anonymous@common-lisp.net:/project/s-xml/cvsroot co s-xml
The plain API exported by the package S-XML (automatically generated by LispDoc) is available in S-XML.html.
Using a DOM parser is easier, but usually less efficient: see the next sections. To use the event-based API of the parser, you call the function start-parse-xml on a stream, specifying 3 hook functions:
As an example, consider the following tracer that shows how the different hooks are called:
    (defun trace-xml-new-element-hook (name attributes seed)
      (let ((new-seed (cons (1+ (car seed)) (1+ (cdr seed)))))
        (trace-xml-log (car seed)
                       "(new-element :name ~s :attributes ~:[()~;~:*~s~] :seed ~s) => ~s"
                       name attributes seed new-seed)
        new-seed))

    (defun trace-xml-finish-element-hook (name attributes parent-seed seed)
      (let ((new-seed (cons (1- (car seed)) (1+ (cdr seed)))))
        (trace-xml-log (car parent-seed)
                       "(finish-element :name ~s :attributes ~:[()~;~:*~s~] :parent-seed ~s :seed ~s) => ~s"
                       name attributes parent-seed seed new-seed)
        new-seed))

    (defun trace-xml-text-hook (string seed)
      (let ((new-seed (cons (car seed) (1+ (cdr seed)))))
        (trace-xml-log (car seed)
                       "(text :string ~s :seed ~s) => ~s"
                       string seed new-seed)
        new-seed))

    (defun trace-xml (in)
      "Parse and trace a toplevel XML element from stream in"
      (start-parse-xml
       in
       (make-instance 'xml-parser-state
                      ;; seed car is xml element nesting level
                      ;; seed cdr is ever increasing from element to element
                      :seed (cons 0 0)
                      :new-element-hook #'trace-xml-new-element-hook
                      :finish-element-hook #'trace-xml-finish-element-hook
                      :text-hook #'trace-xml-text-hook)))
This is the output of the tracer on two small XML documents. The seed is a CONS that keeps track of the nesting level in its CAR and of its flow through the hooks, as an ever-increasing number, in its CDR:
    S-XML 31 > (with-input-from-string (in "<FOO X='10' Y='20'><P>Text</P><BAR/><H1><H2></H2></H1></FOO>") (trace-xml in))
    (new-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :seed (0 . 0)) => (1 . 1)
    (new-element :name :P :attributes () :seed (1 . 1)) => (2 . 2)
    (text :string "Text" :seed (2 . 2)) => (2 . 3)
    (finish-element :name :P :attributes () :parent-seed (1 . 1) :seed (2 . 3)) => (1 . 4)
    (new-element :name :BAR :attributes () :seed (1 . 4)) => (2 . 5)
    (finish-element :name :BAR :attributes () :parent-seed (1 . 4) :seed (2 . 5)) => (1 . 6)
    (new-element :name :H1 :attributes () :seed (1 . 6)) => (2 . 7)
    (new-element :name :H2 :attributes () :seed (2 . 7)) => (3 . 8)
    (finish-element :name :H2 :attributes () :parent-seed (2 . 7) :seed (3 . 8)) => (2 . 9)
    (finish-element :name :H1 :attributes () :parent-seed (1 . 6) :seed (2 . 9)) => (1 . 10)
    (finish-element :name :FOO :attributes ((:Y . "20") (:X . "10")) :parent-seed (0 . 0) :seed (1 . 10)) => (0 . 11)
    (0 . 11)

    S-XML 32 > (with-input-from-string (in "<FOO><UL><LI>1</LI><LI>2</LI><LI>3</LI></UL></FOO>") (trace-xml in))
    (new-element :name :FOO :attributes () :seed (0 . 0)) => (1 . 1)
    (new-element :name :UL :attributes () :seed (1 . 1)) => (2 . 2)
    (new-element :name :LI :attributes () :seed (2 . 2)) => (3 . 3)
    (text :string "1" :seed (3 . 3)) => (3 . 4)
    (finish-element :name :LI :attributes () :parent-seed (2 . 2) :seed (3 . 4)) => (2 . 5)
    (new-element :name :LI :attributes () :seed (2 . 5)) => (3 . 6)
    (text :string "2" :seed (3 . 6)) => (3 . 7)
    (finish-element :name :LI :attributes () :parent-seed (2 . 5) :seed (3 . 7)) => (2 . 8)
    (new-element :name :LI :attributes () :seed (2 . 8)) => (3 . 9)
    (text :string "3" :seed (3 . 9)) => (3 . 10)
    (finish-element :name :LI :attributes () :parent-seed (2 . 8) :seed (3 . 10)) => (2 . 11)
    (finish-element :name :UL :attributes () :parent-seed (1 . 1) :seed (2 . 11)) => (1 . 12)
    (finish-element :name :FOO :attributes () :parent-seed (0 . 0) :seed (1 . 12)) => (0 . 13)
    (0 . 13)
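The seed threading visible in the trace can be modeled without the parser itself. What follows is a minimal sketch and not S-XML code: the function names walk-node and trace-seed and the nested-list document format are hypothetical, introduced only to show in which order the three hooks run and which seed each one receives.

```lisp
;; Hypothetical model of seed threading; NOT part of S-XML.
;; A node is a string (text) or a list (name attributes . children).
(defun walk-node (node seed new-element finish-element text)
  "Fold the three hooks over NODE and return the final seed."
  (if (stringp node)
      (funcall text node seed)
      (destructuring-bind (name attributes &rest children) node
        ;; The seed handed to new-element is remembered as the parent seed.
        (let ((child-seed (funcall new-element name attributes seed)))
          ;; Each child receives the seed returned by its previous sibling.
          (dolist (child children)
            (setf child-seed
                  (walk-node child child-seed
                             new-element finish-element text)))
          ;; finish-element sees the parent seed and the last child seed.
          (funcall finish-element name attributes seed child-seed)))))

(defun trace-seed (document)
  "Apply the tracer's (level . counter) arithmetic to DOCUMENT."
  (walk-node document
             (cons 0 0)
             (lambda (name attributes seed)
               (declare (ignore name attributes))
               (cons (1+ (car seed)) (1+ (cdr seed))))
             (lambda (name attributes parent-seed seed)
               (declare (ignore name attributes parent-seed))
               (cons (1- (car seed)) (1+ (cdr seed))))
             (lambda (string seed)
               (declare (ignore string))
               (cons (car seed) (1+ (cdr seed))))))
```

Under these assumptions, (trace-seed '(:foo () (:p () "Text") (:bar ()) (:h1 () (:h2 ())))) makes eleven hook calls and returns (0 . 11), the same final seed as the first trace above. The value returned by a finish-element hook becomes the running seed of the enclosing element, which is why a hook can build a result bottom-up.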
The following example counts tags, attributes and characters:
    (defclass count-xml-seed ()
      ((elements :initform 0)
       (attributes :initform 0)
       (characters :initform 0)))

    (defun count-xml-new-element-hook (name attributes seed)
      (declare (ignore name))
      (incf (slot-value seed 'elements))
      (incf (slot-value seed 'attributes) (length attributes))
      seed)

    (defun count-xml-text-hook (string seed)
      (incf (slot-value seed 'characters) (length string))
      seed)

    (defun count-xml (in)
      "Parse a toplevel XML element from stream in, counting elements, attributes and characters"
      (start-parse-xml
       in
       (make-instance 'xml-parser-state
                      :seed (make-instance 'count-xml-seed)
                      :new-element-hook #'count-xml-new-element-hook
                      :text-hook #'count-xml-text-hook)))

    (defun count-xml-file (pathname)
      "Parse XML from the file at pathname, counting elements, attributes and characters"
      (with-open-file (in pathname)
        (let ((result (count-xml in)))
          (with-slots (elements attributes characters) result
            (format t "~a contains ~d XML elements, ~d attributes and ~d characters.~%"
                    pathname elements attributes characters)))))
This example removes XML markup:
    (defun remove-xml-markup (in)
      (let* ((state (make-instance 'xml-parser-state
                                   :text-hook #'(lambda (string seed)
                                                  (cons string seed))))
             (result (start-parse-xml in state)))
        (apply #'concatenate 'string (nreverse result))))
The next example is from the xml-element struct DOM implementation, where the SSAX parser hook functions are building the actual DOM:
    (defun standard-new-element-hook (name attributes seed)
      (declare (ignore name attributes seed))
      '())

    (defun standard-finish-element-hook (name attributes parent-seed seed)
      (let ((xml-element (make-xml-element :name name
                                           :attributes attributes
                                           :children (nreverse seed))))
        (cons xml-element parent-seed)))

    (defun standard-text-hook (string seed)
      (cons string seed))

    (defmethod parse-xml-dom (stream (output-type (eql :xml-struct)))
      (car (start-parse-xml
            stream
            (make-instance 'xml-parser-state
                           :new-element-hook #'standard-new-element-hook
                           :finish-element-hook #'standard-finish-element-hook
                           :text-hook #'standard-text-hook))))
The parser state can be used to specify the initial seed value (nil by default) and the set of known entities (by default the 5 standard entities lt, gt, amp, quot and apos, plus nbsp).
Using a DOM parser is easier, but usually less efficient. Currently three different DOMs are supported, selected by the keywords :lxml, :sxml and :xml-struct that appear in the examples below.
There is a generic API that is identical for each type of DOM, with an extra parameter input-type or output-type used to specify the type of DOM. The default DOM type is :lxml. Here are some examples:
    ? (in-package :s-xml)
    #<Package "S-XML">
    ? (setf xml-string "<foo id='top'><bar>text</bar></foo>")
    "<foo id='top'><bar>text</bar></foo>"
    ? (parse-xml-string xml-string)
    ((:|foo| :|id| "top") (:|bar| "text"))
    ? (parse-xml-string xml-string :output-type :sxml)
    (:|foo| (:@ (:|id| "top")) (:|bar| "text"))
    ? (parse-xml-string xml-string :output-type :xml-struct)
    #S(XML-ELEMENT :NAME :|foo| :ATTRIBUTES ((:|id| . "top")) :CHILDREN (#S(XML-ELEMENT :NAME :|bar| :ATTRIBUTES NIL :CHILDREN ("text"))))
    ? (print-xml * :pretty t :input-type :xml-struct)
    <foo id="top">
      <bar>text</bar>
    </foo>
    NIL
    ? (print-xml '(p "Interesting stuff at " ((a href "http://slashdot.org") "SlashDot")))
    <P>Interesting stuff at <A HREF="http://slashdot.org">SlashDot</A></P>
    NIL
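Because the default :lxml result is an ordinary list, it can be taken apart with plain list operations. The helpers below are a sketch under stated assumptions: their names (node-tag, node-attributes, node-children, node-text) are hypothetical and are not claimed to be part of the S-XML API, and they only cover the two element shapes visible in the session above, namely (tag child...) and ((tag attribute value ...) child...).

```lisp
;; Hypothetical helpers for the LXML list shapes shown in the session;
;; illustrative names only, not claimed to be part of S-XML.
(defun node-tag (node)
  "Return the tag keyword of the LXML element NODE."
  (let ((head (first node)))
    (if (consp head) (first head) head)))

(defun node-attributes (node)
  "Return the attributes of NODE as a property list, or NIL."
  (let ((head (first node)))
    (if (consp head) (rest head) nil)))

(defun node-children (node)
  "Return the child elements and text strings of NODE."
  (if (consp node) (rest node) nil))

(defun node-text (node)
  "Concatenate all text strings found at or below NODE."
  (cond ((stringp node) node)
        ((consp node)
         (apply #'concatenate 'string
                (mapcar #'node-text (node-children node))))
        ;; Any other atom contributes no text in this sketch.
        (t "")))
```

With the parse result from the session, '((:|foo| :|id| "top") (:|bar| "text")), node-tag returns :|foo|, (getf (node-attributes doc) :|id|) returns "top", and node-text returns "text".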
Tag and attribute names are converted to keywords. Because XML is case-sensitive, the case of each name is preserved, so any name that is not entirely upper case is printed in Common Lisp's escaped symbol syntax, as in :|foo|.
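The printed form of such keywords follows from standard Common Lisp reader and printer behavior and can be checked without the parser. The sketch below uses only standard functions; the helper name keyword-printed is hypothetical, and the idea that the parser interns each name string unchanged is an assumption made for illustration.

```lisp
;; Standard Common Lisp only; nothing here depends on S-XML.
(defun keyword-printed (name)
  "Intern NAME as a keyword and return its printed representation."
  (with-standard-io-syntax
    (prin1-to-string (intern name :keyword))))

(defun same-keyword-p (name keyword)
  "True when interning NAME unchanged yields exactly KEYWORD."
  (eq (intern name :keyword) keyword))
```

Here (keyword-printed "FOO") returns ":FOO", while the keyword for "foo" needs escaping and is shown as :|foo| in the session above. Since the reader upcases unescaped symbols by default, (same-keyword-p "foo" :foo) is false and (same-keyword-p "foo" :|foo|) is true, which is why code that matches on lower-case tag names has to write the bars.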
2006-01-19 Sven Van Caekenberghe
* added a set of patches contributed by David Tolpin dvd@davidashen.net : we're now using char of type Character and #\Null instead of null, read/unread instead of peek/read and some more declarations for more efficiency - added hooks for customizing parsing attribute names and values

2005-11-20 Sven Van Caekenberghe
* added xml prefix namespace as per REC-xml-names-19990114 (by Rudi Schlatte)

2005-11-06 Sven Van Caekenberghe
* removed Debian packaging directory (on Luca's request)
* added CDATA support (patch contributed by Peter Van Eynde pvaneynd@mailworks.org)

2005-08-30 Sven Van Caekenberghe
* added Debian packaging directory (contributed by Luca Capello luca@pca.it)
* added experimental XML namespace support

2005-02-03 Sven Van Caekenberghe <svc@mac.com>
* release 5 (cvs tag RELEASE_5)
* added :start and :end keywords to print-string-xml
* fixed a bug: in a tag containing whitespace, like <foo> </foo> the parser collapsed and ignored all whitespace and considered the tag to be empty! this is now fixed and a unit test has been added
* cleaned up xml character escaping a bit: single quotes and all normal whitespace (newline, return and tab) are preserved; a unit test for this has been added
* IE doesn't understand the &apos; XML entity, so I've commented that out for now. Also, using actual newlines for newlines is probably better than using #xA, which won't get any end of line conversion by the server or user agent.

June 2004 Sven Van Caekenberghe <svc@mac.com>
* release 4
* project moved to common-lisp.net, renamed to s-xml
* added examples counter, tracer and remove-markup, improved documentation

13 Jan 2004 Sven Van Caekenberghe <svc@mac.com>
* release 3
* added ASDF systems
* optimized print-string-xml

10 Jun 2003 Sven Van Caekenberghe <svc@mac.com>
* release 2
* added echo-xml function: we are no longer taking the car when the last seed is returned from start-parse-xml

25 May 2003 Sven Van Caekenberghe <svc@mac.com>
* release 1
* first public release of working code
* tested on OpenMCL
* rewritten to be event-based, to improve efficiency and to optionally use different DOM representations
* more documentation

end of 2002 Sven Van Caekenberghe <svc@mac.com>
* release 0
* as part of an XML-RPC implementation
CVS version $Id: index.html,v 1.12 2006/01/31 11:56:06 scaekenberghe Exp $