Overview of XML

Like an HTML document, an XML document has tags and data. Unlike HTML, XML tags can be named almost anything, provided the name starts with a letter or an underscore. For example, <B>, <Bb>, and <f5gt6g> are all valid XML start tags, but only <B> in that list is valid HTML. A name such as <4f5gt6g> is not allowed, because XML names cannot begin with a digit. As in HTML, an XML document can have data between the start and end tags, for example, <B>text</B> and <Bb>some text</Bb>. In XML, the combination of start tag, data, and end tag is referred to as an element.

This figure shows the different parts of an XML element:

An element consists of one start tag and one end tag, any number of optional attributes, optional character data, and optional sub-elements (child nodes). Each element is considered a node. Examples of start and end tags are <first></first> and <last></last>. Tag names are case-sensitive, so the name in the end tag must match the name in the start tag exactly. The same element name may appear many times in a document. An element can be a container for other elements, or it can contain character data. An attribute is part of an element, for example, <first id="4">, where id="4" is the attribute and first is the name of the element. Attributes resemble the entries of an associative array: each one is a name-value pair, and an attribute name can appear only once within a given element.
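The parts of an element described above can be inspected programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and the names John and Smith are made up for illustration and are not taken from the chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <name> is a container element with two child elements.
xml_text = '<name><first id="4">John</first><last>Smith</last></name>'

root = ET.fromstring(xml_text)
first = root.find("first")

element_name = first.tag      # the element name taken from the start tag
attributes = first.attrib     # attributes as a name -> value mapping
content = first.text          # character data between the start and end tags
child_names = [child.tag for child in root]  # sub-elements of <name>

print(element_name, attributes, content, child_names)
```

Here the attributes arrive as a dictionary, which matches the name-value view of attributes given in the text.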

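The XML tags in a document should have two characteristics. They should be:

  • Well-formed: every start tag has a matching end tag, elements are properly nested, and the document has a single root element.

  • Valid: the document is well-formed and also conforms to a DTD or schema that defines which elements may appear and how they may be arranged.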

Now let's look at what all these concepts look like in XML files:

The following XML is not well-formed and not valid:

    <root>
      <title>
        <name>some text</title>
      <name>

The following XML is well-formed but it is not valid:

    <root>
      <title>
        <name>some text</name>
      </title>
    </root>
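Well-formedness can be checked by simply trying to parse the text. The sketch below feeds the two snippets above to Python's standard xml.etree.ElementTree parser. The helper name is_well_formed is hypothetical. Note that this parser does not validate against a DTD or schema, so it can only answer the well-formedness question.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The first snippet: <name> is closed by </title>, and <root> is never closed.
broken = """<root>
  <title>
    <name>some text</title>
  <name>
"""

# The second snippet: every start tag has a properly nested end tag.
fixed = """<root>
  <title>
    <name>some text</name>
  </title>
</root>"""

print(is_well_formed(broken), is_well_formed(fixed))
```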

The following XML is well-formed and valid:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE root [
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT root (title)>
    <!ELEMENT title (name)>
    ]>
    <root>
      <title>
        <name>some text</name>
      </title>
    </root>

The XML Framework

XML forms the core of the family of the XML specifications:

  • XML is the foundation

  • Namespaces are used to extend XML
    Without namespaces, two elements that share a name but come from different vocabularies cannot be told apart within a document. XML namespaces were created to overcome this problem. An XML file can reference one or more namespaces. A namespace is a collection of defined XML tag names, identified by a URI. Namespaces become important in complex XML documents where we need to use established external definitions for the tags in the document.

  • SAX, DOM, and PRAX are APIs that allow access to the XML document.

  • DTD and Schema are used to provide a definition for a specific XML document
    A DTD or a schema is used to describe the elements within an XML file. A DTD describes which elements can be present in an XML document and what those elements have to look like. DTDs are not written in XML and offer very limited data typing. They remain part of the XML specification, but a schema is usually the better choice for new work if you are concerned about valid XML. A schema is similar to a DTD in that it also describes an XML document. Schemas are written in XML and can express more information than a DTD, at the cost of being more verbose.

  • XSL and XSLT are used to transform XML
    XML documents can be manipulated using the eXtensible Stylesheet Language (XSL) and XSL Transformations (XSLT). XSL has two roles: it is used to describe the formatting of an XML document, and it is used to transform an XML document. XSLT is the part of XSL that transforms XML documents into different formats or structures.

  • XPath, XPointer, and XLink are used to provide data access capability
    XPath is a W3C recommendation for locating nodes in an XML document tree. XPath is not used standalone, but in conjunction with other tools such as XSL, which rely on it heavily. We discuss XPath briefly in the DOM section of this chapter, where XPath is used to find specific parts of an XML document. XPointer builds on XPath to address fragments of a document, and XLink defines links between XML resources. Neither is used or discussed in this chapter.
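Two of the items above, namespaces and XPath, can be illustrated together. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The namespace URIs, the prefixes bk and job, and the sample content are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Two <title> elements with the same local name but different namespaces.
xml_text = """<catalog xmlns:bk="http://example.com/ns/book"
         xmlns:job="http://example.com/ns/job">
  <bk:title>Overview of XML</bk:title>
  <job:title>Technical Editor</job:title>
</catalog>"""

root = ET.fromstring(xml_text)

# Prefix -> URI mapping used when evaluating the path expressions below.
ns = {
    "bk": "http://example.com/ns/book",
    "job": "http://example.com/ns/job",
}

# Path expressions select each title by its namespace-qualified name.
book_title = root.find("./bk:title", ns).text
job_title = root.find("./job:title", ns).text

# Internally each tag is stored as {namespace-uri}local-name.
expanded_tags = [child.tag for child in root]

print(book_title, job_title, expanded_tags)
```

Because the parser expands each prefix to its URI, the two title elements stay distinct even though their local names are identical.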

XML vs. Databases

A big misconception about XML is that it is a replacement for a database. It's not. The value of XML lies not in being another way to store and retrieve data, but in its translation and markup capabilities, and in its use for transferring data between web sites, for example with SOAP.

Data stored in an XML document is fundamentally different from data stored in a database. An XML document is a discrete text file, such as sample.xml. It has one basic view, and we have to use XSL to transform it, or combine XSL and XPath, to extract data.

Data in a database is stored in tables. In most cases the value of a database is the ability to present multiple dynamic views of the same data sets.

Important 

One way to think about XML is that it's like a recordset or the result of a query of the database, where the different tags correspond to the names of the table columns and values in the table correspond to the data in the XML between the tags.
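The recordset analogy can be made concrete with a small sketch. It assumes a query result is available as a list of dictionaries. The column names, the sample rows, and the function name rows_to_xml are hypothetical, and column names are assumed to be valid XML names.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: one dictionary per row, keyed by column name.
rows = [
    {"id": 1, "first": "John", "last": "Smith"},
    {"id": 2, "first": "Jane", "last": "Doe"},
]

def rows_to_xml(rows, root_name="recordset", row_name="row"):
    """Build an element tree where each column name becomes a tag."""
    root = ET.Element(root_name)
    for row in rows:
        row_el = ET.SubElement(root, row_name)
        for column, value in row.items():
            # The column value becomes the character data inside the tag.
            ET.SubElement(row_el, column).text = str(value)
    return root

root = rows_to_xml(rows)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```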

In comparison:

| XML | Database |
|---|---|
| XML is well-suited to describing both simple and complex data formats. It is especially well-suited to describing data that uses dynamic, complex, or nested structures, such as DocBook. | Databases are well-suited to storing and retrieving "linear" data structures that can be represented in a table-type format. |
| Parsing and using XML data is resource intensive. The simplicity and flexibility of the format reduce performance. | Databases are much faster at writing and retrieving data. The structured nature of the data improves performance at the loss of flexibility. |
| XML is very easy to transport. | Databases are more difficult to move. |
| XML can be written and read by humans, although an XML editor is very handy. | Few humans can manually read and write database files. |

XML and a database can be used together to get the best of both worlds. XML can be stored in the database as a BLOB (Binary Large OBject) or a CLOB (Character Large OBject) column type, or as text. By doing this, we get the performance of a database and the flexibility of XML.
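As a rough illustration of storing XML in a database column, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name documents and its columns are assumptions made for this example, and SQLite's TEXT type stands in for the CLOB type mentioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with one TEXT column holding a whole XML document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

xml_text = "<root><title><name>some text</name></title></root>"
conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (1, xml_text))
conn.commit()

# The database locates the row; the XML parser then handles the nested data.
(stored,) = conn.execute(
    "SELECT body FROM documents WHERE id = ?", (1,)
).fetchone()
name = ET.fromstring(stored).find("title/name").text
conn.close()

print(name)
```

The database handles lookup by key, while the structure inside the document stays as flexible as any other XML file.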

If the software you are developing needs a very high degree of flexibility and customisability in how it stores its data, you should consider a pure XML database or a traditional RDBMS which can be extended with XML inside the database itself, for example Oracle.

