【XML】Learning XML

Posted by 西维蜀黍 on 2018-11-12, Last Modified on 2024-05-02

The Difference Between XML and HTML

XML and HTML were designed with different goals:

  • XML was designed to carry data - with focus on what data is
  • HTML was designed to display data - with focus on how data looks
  • XML tags are not predefined like HTML tags are

XML Syntax Rules

1.XML Documents Must Have a Root Element

XML documents must contain one root element that is the parent of all other elements:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

In this example is the root element:

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

2.The XML Prolog

This line is called the XML prolog:

<?xml version="1.0" encoding="UTF-8"?>

The XML prolog is optional. If it exists, it must come first in the document.

3.All XML Elements Must Have a Closing Tag

In HTML, some elements might work well, even with a missing closing tag:

<p>This is a paragraph.
<br>

In XML, it is illegal to omit the closing tag. All elements must have a closing tag:

<p>This is a paragraph.</p>
<br />

The XML prolog does not have a closing tag. This is not an error. The prolog is not a part of the XML document.

4.XML Tags are Case Sensitive

XML tags are case sensitive. The tag is different from the tag .

Opening and closing tags must be written with the same case:

<Message>This is incorrect</message>
<message>This is correct</message>

5.XML Elements Must be Properly Nested

In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, “Properly nested” simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

6.XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs just like in HTML.

In XML, the attribute values must always be quoted.

INCORRECT:

<note date=12/11/2007 orderId=123456>
  <to>Tove</to>
  <from>Jani</from>
</note>

CORRECT:

<note date="12/11/2007" orderId="123456">
  <to>Tove</to>
  <from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

Entity References

Some characters have a special meaning in XML.

If you place a character like “<” inside an XML element, it will generate an error because the parser interprets it as the start of a new element.

This will generate an XML error:

<message>salary < 1000</message>

To avoid this error, replace the “<” character with an entity reference:

<message>salary &lt; 1000</message>

There are 5 pre-defined entity references in XML:

&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark

Only < and & are strictly illegal in XML, but it is a good habit to replace > with &gt; as well.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

<!-- This is a comment -->

Two dashes in the middle of a comment are not allowed.

Not allowed:

<!-- This is a -- comment -->

Strange, but allowed:

<!-- This is a - - comment -->

Well Formed XML

XML documents that conform to the syntax rules above are said to be “Well Formed” XML documents.

XML Attributes

XML Attributes Must be Quoted

Attribute values must always be quoted. Either single or double quotes can be used.

For a person’s gender, the element can be written like this:

<person gender="female">

or like this:

<person gender='female'>

If the attribute value itself contains double quotes you can use single quotes, like in this example:

<gangster name='George "Shotgun" Ziegler'>

or you can use character entities:

<gangster name="George &quot;Shotgun&quot; Ziegler">

XML Namespaces

XML Namespaces - The xmlns Attribute

When using prefixes in XML, a namespace for the prefix must be defined.

The namespace can be defined by an xmlns attribute in the start tag of an element.

The namespace declaration has the following syntax. xmlns:prefix=“URI”.

<root>

<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table xmlns:f="https://www.w3schools.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

</root>

In the example above:

The xmlns attribute in the first element gives the h: prefix a qualified namespace.

The xmlns attribute in the second element gives the f: prefix a qualified namespace.

When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.

Namespaces can also be declared in the XML root element:

<root xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="https://www.w3schools.com/furniture">

<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

</root>

Note: The namespace URI is not used by the parser to look up information.

The purpose of using an URI is to give the namespace a unique name.

However, companies often use the namespace as a pointer to a web page containing namespace information.

XML Validator

XML Errors Will Stop You

Errors in XML documents will stop your XML applications.

The W3C XML specification states that a program should stop processing an XML document if it finds an error. The reason is that XML software should be small, fast, and compatible.

HTML browsers are allowed to display HTML documents with errors (like missing end tags).

With XML, errors are not allowed.

XML DTD

Valid XML Documents

A “Valid” XML document is a “Well Formed” XML document, which also conforms to the rules of a DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The DOCTYPE declaration, in the example above, is a reference to an external DTD file. The content of the file is shown in the paragraph below.

XML DTD

The purpose of a DTD is to define the structure of an XML document. It defines the structure with a list of legal elements:

<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

The DTD above is interpreted like this:

  • !DOCTYPE note defines that the root element of the document is note
  • !ELEMENT note defines that the note element must contain the elements: “to, from, heading, body”
  • !ELEMENT to defines the to element to be of type “#PCDATA”
  • !ELEMENT from defines the from element to be of type “#PCDATA”
  • !ELEMENT heading defines the heading element to be of type “#PCDATA”
  • !ELEMENT body defines the body element to be of type “#PCDATA”

#PCDATA means parse-able text data.

When to Use a DTD/Schema?

With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.

With a DTD, you can verify that the data you receive from the outside world is valid.

You can also use a DTD to verify your own data.

If you want to study DTD, please read our DTD Tutorial.

When to NOT to Use a DTD/Schema?

XML does not require a DTD/Schema.

When you are experimenting with XML, or when you are working with small XML files, creating DTDs may be a waste of time.

If you develop applications, wait until the specification is stable before you add a document definition. Otherwise, your software might stop working because of validation errors.

Reference

https://www.w3schools.com/xml/xml_whatis.asp