xml basics

XML (Extensible Markup Language) has two main uses: marking up documents and exchanging data.

A tag is a delimiter that marks the region of the document containing the tagged content. XML is a standard that describes the syntax of marked-up documents and of data for exchange. XML is a simplified version of SGML (Standard Generalized Markup Language).

document xml example

An example of an XML document is:

examples/letter.xml

<?xml version="1.0" encoding="UTF-8"?>
<letter>
    <to>Complaint Department</to>
    <from>me</from>
    <subject>Service Complaint</subject>
    <body>
        I would like to complain.
    </body>
</letter>
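
As a rough illustration of how a program might read such a document, the sketch below parses the letter with Python's standard xml.etree.ElementTree module. The helper name read_letter is hypothetical and exists only for this example.

```python
import xml.etree.ElementTree as ET

# Same content as examples/letter.xml, inlined so the sketch is self-contained.
LETTER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<letter>
    <to>Complaint Department</to>
    <from>me</from>
    <subject>Service Complaint</subject>
    <body>
        I would like to complain.
    </body>
</letter>
"""


def read_letter(xml_bytes):
    """Return a dict mapping each child tag of <letter> to its trimmed text."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "").strip() for child in root}


letter = read_letter(LETTER_XML)
for field, value in letter.items():
    print(f"{field}: {value}")
```

The text of each element is trimmed because the indentation inside the body element is part of its content as far as the parser is concerned.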

data xml example

We can also use XML to create self-describing external representations of data structures. For example:

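class Rectangle {
    int x, y;
    int width, height;
    Rectangle(int x, int y, int width, int height) { // used by the call below
        this.x = x; this.y = y;
        this.width = width; this.height = height;
    }
}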

The object created by Rectangle r = new Rectangle(3, 4, 10, 20) can be serialized as:

examples/rect.xml

<?xml version="1.0" encoding="UTF-8"?>
<shapes>
    <!-- with child nodes -->
    <rect>
        <x> 3 </x>
        <y> 4 </y>
        <width> 10 </width>
        <height> 20 </height>
    </rect>
    <!-- with attribute nodes -->
    <rect x="3" y="4" width="10" height="20"/>
</shapes>
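
The two serializations above can be produced programmatically. The following is a minimal sketch in Python rather than Java, assuming the standard xml.etree.ElementTree module; the function names to_child_elements and to_attributes are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FIELDS = ("x", "y", "width", "height")


@dataclass
class Rectangle:
    x: int
    y: int
    width: int
    height: int


def to_child_elements(r):
    """Serialize each field as a child element of <rect>."""
    rect = ET.Element("rect")
    for name in FIELDS:
        ET.SubElement(rect, name).text = str(getattr(r, name))
    return rect


def to_attributes(r):
    """Serialize each field as an attribute of an empty <rect> element."""
    return ET.Element("rect", {name: str(getattr(r, name)) for name in FIELDS})


r = Rectangle(3, 4, 10, 20)
children_form = ET.tostring(to_child_elements(r), encoding="unicode")
attribute_form = ET.tostring(to_attributes(r), encoding="unicode")
print(children_form)
print(attribute_form)
```

Both forms carry the same information; the attribute form is more compact, while the child-element form leaves room for nested structure later.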

tags and elements

An XML document is built from tags, each written as <, followed by the tag name, and then >. An opening tag is given by:

<tagname>

A closing tag is denoted by:

</tagname>

Each opening tag must have a matching closing tag. The tags must nest inside each other. Thus

<a> <b> </b> </a>

is correct, while

<a> <b> </a> </b>

is not well-formed.
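
A parser enforces the nesting rule. The sketch below, which assumes Python's standard xml.etree.ElementTree module, feeds both fragments to the parser; the helper name is_well_formed is chosen only for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


nested_ok = is_well_formed("<a> <b> </b> </a>")
crossed_ok = is_well_formed("<a> <b> </a> </b>")
print(nested_ok, crossed_ok)
```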

A start tag, its matching end tag, and everything between them is called an element. Each content item is called a child node of the element.

tagname and attributes

Every tag name and attribute name must conform to

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':'
Name ::= (Letter | '_' | ':') (NameChar)*
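
The grammar above translates almost directly into a regular expression. The sketch below is a simplification that assumes Letter and Digit cover only ASCII characters, whereas the XML standard also allows many non-ASCII letters; the names NAME_RE and is_name are hypothetical.

```python
import re

# Name ::= (Letter | '_' | ':') (NameChar)*
# Assumption: Letter is A-Z or a-z and Digit is 0-9 (ASCII only).
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_name(candidate):
    """Return True if the whole string matches the simplified Name rule."""
    return NAME_RE.fullmatch(candidate) is not None


for candidate in ("tagname", "rod:letter", "_x.1-2", "1abc", "-a", "a b"):
    print(candidate, is_name(candidate))
```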

Each attribute consists of a name/value pair. Attributes are placed inside the start tag and follow the form:

Attribute ::= Name '=' Value

The value is enclosed by single or double quotes, and the opening and closing quotes must match. An attribute must have both a name and a value. Attributes are also considered child nodes of an element. An example is:

<table border="1" class="big" >
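
To see these rules in action, the sketch below reads the attributes of the table tag and then tries two malformed variants. It assumes Python's standard xml.etree.ElementTree module, and the start tag is given a closing tag so that it forms a complete element; the helper name parses is made up for the example.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted, as long as each value's quotes match.
table = ET.fromstring("""<table border="1" class='big'></table>""")
attributes = dict(table.attrib)
print(attributes)


def parses(text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# An opening double quote closed by a single quote is rejected.
mismatched_quotes_ok = parses('''<table border="1'></table>''')
# An attribute name without a value is rejected as well.
missing_value_ok = parses("<table border></table>")
print(mismatched_quotes_ok, missing_value_ok)
```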

more xml details

The <, > and & characters are included in a document with the following entity references:

&gt; gives >
&lt; gives <
&amp; gives &

An arbitrary character can be encoded with a character reference of the form &#ddddd; where ddddd is the sequence of decimal digits that specifies the Unicode character. For example:

&#10; specifies a linefeed
&#32; specifies a space
&#48; specifies the '0' character
&#160; specifies a non-breaking space
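
The sketch below shows a parser decoding these references and a library function producing them. It assumes Python's standard xml.etree.ElementTree module and the escape function from xml.sax.saxutils; the sample strings are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces entity and character references with the characters they name.
element = ET.fromstring("<t>&lt;tag&gt; &amp; &#48;&#160;&#32;end</t>")
decoded = element.text
print(repr(decoded))

# Going the other way, escape() rewrites &, < and > so text can be placed in a document.
escaped = escape("a < b && c > d")
print(escaped)
```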

Comments are denoted by:

<!-- a comment -->

Arbitrary text, which the parser does not interpret as markup, is placed in a CDATA section:

<![CDATA[ ... ]]>
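
As a small check of this behaviour, the sketch below parses an element whose content is a CDATA section containing characters that would otherwise need escaping. It assumes Python's standard xml.etree.ElementTree module; the element name code and its content are made up.

```python
import xml.etree.ElementTree as ET

doc = "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
element = ET.fromstring(doc)

# The parser hands back the raw text; the CDATA markers themselves are not part of it.
text = element.text
print(text)

# When the element is written out again, this serializer escapes the special characters.
reserialized = ET.tostring(element, encoding="unicode")
print(reserialized)
```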

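An element with no content is said to be empty. Applications often treat an element that contains only whitespace the same way, although strictly the whitespace is content.

<tag></tag>

An empty element can be denoted by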

<tag/>
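
The sketch below compares how the short form, the long form, and a whitespace-only element appear to a parser. It assumes Python's standard xml.etree.ElementTree module, which reports missing text as None.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<tag/>")
long_form = ET.fromstring("<tag></tag>")
spaced = ET.fromstring("<tag> </tag>")

# Both spellings of the empty element look identical after parsing.
print(short_form.text, long_form.text)

# The whitespace-only element keeps its single space as text,
# so an application that wants to treat it as empty has to strip it.
spaced_is_blank = (spaced.text or "").strip() == ""
print(repr(spaced.text), spaced_is_blank)
```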

XML is a tree

A tree starts from the trunk and then branches to the rest of the tree. Likewise, an XML document has one outermost element and is further divided by nested elements, so an XML document follows a tree structure.

examples/tree.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a>
        <b> 1 </b>
        <b> 2 </b>
    </a>
    <c>
        <d>
            <e> 3 </e>
            <e> 4 </e>
        </d>
        <f> 5 </f>
    </c>
    hi 
</root>

Text consisting only of whitespace is ignored in this example.
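
The tree structure can be made visible by walking the parsed document. The sketch below assumes Python's standard xml.etree.ElementTree module, which stores text that follows a child element in that child's tail attribute; the function name outline and the "#text" label are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Same content as examples/tree.xml, inlined so the sketch is self-contained.
TREE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a>
        <b> 1 </b>
        <b> 2 </b>
    </a>
    <c>
        <d>
            <e> 3 </e>
            <e> 4 </e>
        </d>
        <f> 5 </f>
    </c>
    hi
</root>
"""


def outline(element, depth=0):
    """Return one indented line per node, skipping whitespace-only text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
        tail = (child.tail or "").strip()  # text that follows the child's end tag
        if tail:
            lines.append("  " * (depth + 1) + "#text: " + tail)
    return lines


lines = outline(ET.fromstring(TREE_XML))
print("\n".join(lines))
```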

declarations and processing instructions

Every XML file should begin with a declaration such as the following:

<?xml version="1.0" encoding="iso-8859-1" ?>

The declaration is used by the XML parser to determine the version of XML to process. The encoding attribute specifies the character encoding of the document. The encoding shown here, iso-8859-1, is an 8-bit code usually referred to as Latin-1; it is an extension of ASCII, a 7-bit code.

The XML standard specifies a method to auto-detect the character encoding.
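
The effect of the encoding attribute can be seen by storing the same text under two encodings. The sketch below assumes Python's standard xml.etree.ElementTree module; the element name word and the helper document are invented for the illustration.

```python
import xml.etree.ElementTree as ET

WORD = "caf\u00e9"  # "café": the last character is outside 7-bit ASCII


def document(encoding):
    """Build the same one-element document as bytes in the given encoding."""
    text = f'<?xml version="1.0" encoding="{encoding}"?><word>{WORD}</word>'
    return text.encode(encoding)


latin1_doc = document("iso-8859-1")  # the accented letter is the single byte 0xE9
utf8_doc = document("UTF-8")         # the accented letter is the two bytes 0xC3 0xA9

# The parser reads the declared encoding and recovers the same text from both.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
print(latin1_text == utf8_text)
```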

XML provides a method to pass information to the programs handling the XML document. A processing instruction is given by

<?name attributes ?>

where name is the name of the program handling the document, and the list of attributes specifies the actions to be taken by the program.
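
As an illustration, the sketch below reads a processing instruction with Python's standard xml.dom.minidom module. It uses the widely seen xml-stylesheet instruction as the example; the file name style.xsl is hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>'
)

# The processing instruction precedes the root element, so it is the
# first child of the document node.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

# target is the name after "<?"; data is the rest, passed through uninterpreted.
print(is_pi, pi.target, pi.data)
```

The parser does not interpret the data part; it is up to the named program to make sense of it.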

namespaces

Namespaces were added to XML by a separate W3C recommendation, Namespaces in XML, published after XML 1.0. Namespaces allow elements from different sources that have identical tag names but serve different roles to be distinguished when the documents are merged. Each element name can be associated with a URI (Uniform Resource Identifier) that identifies its namespace. The most common URIs are URLs (e.g., http://somewhere/something).

Since URIs can be very long, XML provides a shorthand with tag prefixes.

examples/ns.xml

<?xml version="1.0" encoding="UTF-8"?>
<root
    xmlns:rod="http://www.cs.mun.ca/~rod"
    xmlns="http://www.what.com/default">
    <rod:letter> hi there </rod:letter>
    <!-- rod:letter is a qualified name -->
    <letter>
        another letter with the same local name
    </letter>
    <!-- letter is in the "http://www.what.com/default"
          namespace -->
</root>

Namespaces are similar to packages in Java.
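
A namespace-aware parser resolves each prefix to its URI. The sketch below assumes Python's standard xml.etree.ElementTree module, which writes a qualified name as {uri}localname; the prefix d used in the lookup table is an arbitrary choice for this example.

```python
import xml.etree.ElementTree as ET

# A condensed copy of examples/ns.xml, inlined so the sketch is self-contained.
NS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root
    xmlns:rod="http://www.cs.mun.ca/~rod"
    xmlns="http://www.what.com/default">
    <rod:letter> hi there </rod:letter>
    <letter>
        another letter with the same local name
    </letter>
</root>
"""

root = ET.fromstring(NS_XML)

# Each tag is reported with its namespace URI in braces.
tags = [child.tag for child in root]
print(tags)

# Searches use a prefix-to-URI table; the prefixes need not match the document's.
prefixes = {"rod": "http://www.cs.mun.ca/~rod", "d": "http://www.what.com/default"}
rod_letter = root.find("rod:letter", prefixes)
default_letter = root.find("d:letter", prefixes)
print(rod_letter.text.strip())
print(default_letter.text.strip())
```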

xml applications

A defined structure for XML documents (the tags and where they can nest) is called an XML application. Example applications include:

xhtml: Extensible HyperText Markup Language
svg: Scalable Vector Graphics
gpx: GPS Exchange Format
xslt: Extensible Stylesheet Language Transformations
soap: Simple Object Access Protocol

XML is becoming the standard technique for the creation and exchange of information. There are many software tools available for the processing of XML documents.

document xml example

An example of a GPX document is:

examples/gpx.xml

<?xml version="1.0"?>
<gpx
 version="1.0"
 xmlns="http://www.topografix.com/GPX/1/0">
    <trk>
        <name>ACTIVE LOG</name>
        <trkseg>
            <trkpt lat="47.581916" lon="-52.710171">
                <ele>81.368774</ele>
                <time>2004-08-30T11:44:55Z</time>
            </trkpt>
            <trkpt lat="47.581916" lon="-52.710214">
                <ele>81.368774</ele>
                <time>2004-08-30T11:44:57Z</time>
            </trkpt>
        </trkseg>
    </trk>
</gpx>
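
Because GPX is an XML application, a general XML parser is enough to extract the track. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name track_points and the prefix gpx in the lookup table are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Prefix table for searches; the document itself uses this URI as its default namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/0"}

# Same content as examples/gpx.xml, inlined so the sketch is self-contained.
GPX_XML = b"""<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
    <trk>
        <name>ACTIVE LOG</name>
        <trkseg>
            <trkpt lat="47.581916" lon="-52.710171">
                <ele>81.368774</ele>
                <time>2004-08-30T11:44:55Z</time>
            </trkpt>
            <trkpt lat="47.581916" lon="-52.710214">
                <ele>81.368774</ele>
                <time>2004-08-30T11:44:57Z</time>
            </trkpt>
        </trkseg>
    </trk>
</gpx>
"""


def track_points(xml_bytes):
    """Return (lat, lon, elevation, time) for every track point in the document."""
    root = ET.fromstring(xml_bytes)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        points.append((
            float(pt.get("lat")),
            float(pt.get("lon")),
            float(pt.findtext("gpx:ele", namespaces=GPX_NS)),
            pt.findtext("gpx:time", namespaces=GPX_NS),
        ))
    return points


points = track_points(GPX_XML)
for point in points:
    print(point)
```

Latitude and longitude are stored as attributes while elevation and time are child elements, so the sketch reads them in two different ways.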

document xml example

An example of an SVG document is:

examples/rect.svg

<?xml version="1.0" standalone="no"?>
<svg width="12cm" height="4cm" viewBox="0 0 1200 400"
     xmlns="http://www.w3.org/2000/svg" version="1.1">
  <desc>Example rect01 - rectangle with sharp corners</desc>

  <!-- Show outline of canvas using 'rect' element -->
  <rect x="1" y="1" width="1198" height="398"
        fill="none" stroke="blue" stroke-width="2"/>

  <rect x="400" y="100" width="400" height="200"
        fill="yellow" stroke="navy" stroke-width="10"  />
</svg>

lexical entities

The characters in an XML document can be grouped into the following lexical entities: