What is XML? — Extensible Markup Language Explained
Definition
XML (Extensible Markup Language) is a markup language and file format designed to store, transport, and reconstruct arbitrary data using custom tags. It is defined by a World Wide Web Consortium (W3C) specification and is both human-readable and machine-readable. Unlike HTML, which has a fixed set of tags, XML lets authors define their own tags and document structure, which makes it a meta-language for defining other markup languages.
Development of XML began in 1996 under the W3C's XML Working Group (originally known as the SGML Editorial Review Board), and XML 1.0 became a W3C Recommendation in February 1998. It is the foundation for numerous document formats, including RSS, Atom, SVG, XHTML, and Office Open XML (OOXML), the format family used by Microsoft Office.
XML Syntax Rules
XML syntax follows a strict set of rules designed to ensure consistency across platforms and parsers:
- All elements must have a closing tag: every opening tag requires a corresponding closing tag (or the self-closing form for empty elements)
- Tags are case-sensitive: <Book> and <book> are different elements
- Elements must be properly nested: if <a> opens before <b>, then <b> must close before <a> does
- All attributes must be quoted: attribute values must be enclosed in single or double quotes
- The document must have a single root element: all content sits inside one top-level element
- Empty elements can be self-closing: <br /> is valid shorthand for <br></br>
- Comments use <!-- --> syntax: their content is not parsed as markup, and a parser may or may not pass them on to the application
- Entity references replace reserved characters: &amp; for &, &lt; for <, &gt; for >, &quot; for ", and &apos; for '
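The rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per syntax rule.
samples = {
    "closed and nested": "<a><b>ok</b></a>",
    "missing closing tag": "<a><b>oops</a>",
    "case mismatch": "<Book>text</book>",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<a id=1/>",
    "two root elements": "<a/><b/>",
    "self-closing element": "<a><br /></a>",
    "raw ampersand": "<a>Tom & Jerry</a>",
    "escaped ampersand": "<a>Tom &amp; Jerry</a>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first, seventh, and last samples should be accepted; every other sample violates one of the rules listed above.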
XML Prolog / Declaration
An XML document usually begins with an XML declaration. It is the first item in the document's prolog and specifies the XML version, the character encoding, and whether the document is standalone:
<?xml version="1.0" encoding="UTF-8"?>
The declaration attributes are:
- version: The XML version (currently 1.0 or 1.1)
- encoding: The character encoding (commonly UTF-8 or ISO-8859-1)
- standalone: Whether the document depends on an external DTD (yes or no)
The declaration is optional in XML 1.0 (XML 1.1 requires it) and is strongly recommended. If present, it must be the very first thing in the document, with no whitespace or other characters before <?xml.
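As a rough illustration of the declaration in practice, the sketch below serializes a small element with an explicit declaration and then shows that leading whitespace makes the document unparseable. It assumes Python 3.8 or later (for the `xml_declaration` argument of `ElementTree.tostring`); the `note` element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("note")
root.text = "café"

# Serialize to bytes with an explicit declaration and encoding.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(data.decode("utf-8"))

# The parser picks up the declared encoding from the bytes it receives.
parsed = ET.fromstring(data)

# A single space before "<?xml" is enough to make parsing fail.
try:
    ET.fromstring(b" " + data)
    leading_space_ok = True
except ET.ParseError:
    leading_space_ok = False

print("accepted with leading space:", leading_space_ok)
```

Passing bytes rather than a string lets the parser honor the declared encoding, which matters once the content contains non-ASCII characters.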
XML vs HTML
| Aspect | XML | HTML |
|---|---|---|
| Purpose | Store and transport data | Display and present content |
| Tags | Custom, user-defined | Predefined by HTML specification |
| Syntax | Strict (must be well-formed) | Forgiving (browsers tolerate errors) |
| Case Sensitivity | Case-sensitive | Case-insensitive |
| Closing Tags | Always required (or self-closing form) | Omitted for void elements (<br>, <img>) and optional for some others (<p>, <li>) |
| Attributes | Must be quoted | Quotes optional in HTML5 |
| Root Element | Required (one root) | Implied (<html> is root) |
| Whitespace | Preserved by default | Collapsed in rendering |
| Support for Attributes | Full attribute support | Full attribute support |
| Namespaces | Supported via xmlns | Not available in HTML syntax, apart from built-in handling of embedded SVG and MathML |
| Parsing Complexity | Simple with SAX/DOM | Complex (error recovery) |
| Extensibility | Fully extensible | Fixed schema |
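The "Syntax" row of the table can be seen directly by feeding the same sloppy markup to an HTML tokenizer and to an XML parser. This is a small sketch using Python's standard library; `html.parser` is only a tokenizer, not a full browser-style parser, and the `TagCollector` class and the snippet are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and a bare <br>: common in HTML, ill-formed as XML.
snippet = "<p>First line<br>Second line"


class TagCollector(HTMLParser):
    """Record every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(snippet)
collector.close()
print("HTML tokenizer saw:", collector.tags)

try:
    ET.fromstring(snippet)
    xml_accepts = True
except ET.ParseError as err:
    xml_accepts = False
    print("XML parser rejected it:", err)
```

The HTML side carries on despite the missing end tags, while the XML parser stops at the first violation.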
Example XML Document
Here is a typical XML document representing a catalog of books:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies in this hilarious novel.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology society, the young survivors lay the foundation for a new society.</description>
</book>
</catalog>
Key observations from this example:
- <catalog> is the root element containing all content
- Each <book> element has an attribute (id) that uniquely identifies it
- Elements are nested in a tree structure
- The document is well-formed — every open tag has a matching close tag
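To make the tree structure concrete, here is a short sketch that reads a trimmed copy of the catalog with Python's `xml.etree.ElementTree` and collects each book's title and price by id. The two-book excerpt and the `books` dictionary are assumptions made for the example, not part of the original document.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the catalog above (two books, fewer fields).
catalog_xml = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <price>5.95</price>
  </book>
</catalog>"""

root = ET.fromstring(catalog_xml.encode("utf-8"))

# Map each book id to its (title, price); prices become exact decimals.
books = {
    book.get("id"): (book.findtext("title"), Decimal(book.findtext("price")))
    for book in root.iter("book")
}

for book_id, (title, price) in books.items():
    print(book_id, title, price)
```

The attribute is read with `get`, while child element text is read with `findtext`, which mirrors the split between metadata and content in the example.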
XML Elements
An XML element is the fundamental building block of an XML document. Elements consist of:
- A start tag: <tagname>
- Content (text or nested elements)
- An end tag: </tagname>
Element naming rules:
- Names can contain letters, numbers, hyphens, underscores, and periods
- Names must start with a letter or underscore (not a number or punctuation)
- Names are case-sensitive
- Names should not start with xml (any case), which is reserved by the W3C
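The naming rules above can be approximated with a regular expression. The sketch below is deliberately ASCII-only: the XML specification also permits many Unicode letters and the colon (used for namespace prefixes), so treat `check_element_name` as a hypothetical helper for illustration, not a complete implementation of the specification's name grammar.

```python
import re

# ASCII-only approximation of the rules listed above.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def check_element_name(name: str) -> str:
    """Classify a candidate element name as 'ok', 'reserved', or 'invalid'."""
    if not NAME_PATTERN.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"
    return "ok"


for candidate in ["publish_date", "book.title", "2nd-edition", "my tag", "xmlData"]:
    print(candidate, "->", check_element_name(candidate))
```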
XML Attributes
XML elements can have attributes — name-value pairs that provide additional information about an element:
<book id="bk101" lang="en" currency="USD">
<title lang="en">XML Developer's Guide</title>
</book>
Rules for attributes:
- Attribute values must be enclosed in quotes (single or double)
- An element cannot have duplicate attributes
- Attributes are often used for metadata (IDs, references, types)
- Attribute values are plain text and cannot contain child elements, so structured or lengthy data belongs in elements rather than attributes
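The attribute rules can be exercised with a few lines of Python. This sketch parses the example element with `xml.etree.ElementTree`, reads its attributes, and confirms that a duplicated attribute is rejected; the `duplicate_ok` flag is just a name chosen for the example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book id="bk101" lang="en" currency="USD">'
    "<title lang='en'>XML Developer's Guide</title>"
    "</book>"
)

# Attributes behave like a dictionary of strings.
print(book.attrib)
print(book.get("id"), book.get("missing"))
print(book.find("title").get("lang"))

# Repeating an attribute name on one element is a well-formedness error.
try:
    ET.fromstring('<book id="bk101" id="bk102"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

print("duplicate attribute accepted:", duplicate_ok)
```

Both single and double quotes are accepted around values, as the `title` element shows.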
CDATA Sections
A CDATA section tells the XML parser to treat enclosed content as character data rather than markup. This is useful when your content contains characters that would otherwise be interpreted as XML syntax:
<example>
<![CDATA[
if (x < y && y > 0) {
console.log("x < y and y > 0");
}
]]>
</example>
CDATA rules:
- Starts with <![CDATA[ and ends with ]]>
- Everything between is treated as literal text
- Do not nest CDATA sections
- The string ]]> cannot appear inside CDATA content
- Whitespace inside CDATA is preserved exactly
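Because ]]> cannot appear inside a CDATA section, code that wraps arbitrary text has to split any such sequence across two sections. The sketch below shows one common workaround; `wrap_cdata` is a hypothetical helper, and the round trip through `xml.etree.ElementTree` is only there to show that the parser hands back the original text.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


code = 'if (x < y && y > 0) { console.log("x < y and y > 0"); }'
element = ET.fromstring(f"<example>{wrap_cdata(code)}</example>")
print(element.text == code)

# Text that itself contains the terminator still survives the round trip.
tricky = "a[b[0]]>c"
round_trip = ET.fromstring(f"<example>{wrap_cdata(tricky)}</example>").text
print(round_trip == tricky)
```

The parser joins adjacent CDATA sections into a single text value, so the split is invisible to the consuming application.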
Well-Formed vs Valid XML
| Concept | Well-Formed XML | Valid XML |
|---|---|---|
| Definition | Follows XML syntax rules | Conforms to a DTD or Schema |
| Required | Always required | Optional |
| Checked by | XML parser | Validator (with DTD/Schema) |
| Example Rule | All tags must close | Element order defined in XSD |
| Enforces | Structural correctness | Data constraints and types |
Well-Formed XML
A well-formed XML document must:
- Have exactly one root element
- Have properly closed tags
- Have correctly nested elements
- Have quoted attribute values
- Avoid illegal characters (use entities instead)
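When XML is assembled from strings, the last rule is usually handled by an escaping helper rather than by hand. Here is a minimal sketch using `escape`, `quoteattr`, and `unescape` from Python's standard `xml.sax.saxutils` module; the `note` element and its values are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape

title = 'Tom & "Jerry"'
body = "x < y && y > 0"

# escape() handles &, < and > in text; quoteattr() also picks safe quotes.
print(escape(body))
print(quoteattr(title))

document = f"<note title={quoteattr(title)}>{escape(body)}</note>"
note = ET.fromstring(document)

# Parsing reverses the escaping, so the original values come back.
print(note.get("title") == title, note.text == body)
print(unescape("&lt;b&gt;"))
```

Building the tree with an XML library and serializing it avoids manual escaping entirely; the string approach is shown only to make the entity substitution visible.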
Valid XML
A valid XML document is well-formed and also follows the rules defined in a DTD (Document Type Definition) or XML Schema (XSD). Validation adds structural rules and cardinality constraints and, with XSD, data typing.
DTD — Document Type Definition
A DTD (Document Type Definition) defines the legal building blocks of an XML document. It can be declared inline or referenced externally:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
DTD limitations:
- Uses a non-XML syntax (must be learned separately)
- Limited data typing (no numbers, dates, or custom types)
- Does not support namespaces natively
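Python's standard `xml.etree.ElementTree` parser reads a DOCTYPE but does not validate against it, so a document can be well-formed and still break its DTD. The sketch below hand-codes the content model (to,from,heading,body) from the example to show what a validating parser would enforce. It is an illustration only, not a DTD validator, and `matches_note_model` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to,from,heading,body)> from the example DTD.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text: str) -> bool:
    """Hand-written check of the note content model (not a real validator)."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # Element content allows only whitespace between the children.
    stray_text = [root.text] + [child.tail for child in root]
    if any(text and text.strip() for text in stray_text):
        return False
    # (#PCDATA) children hold text only, so none may contain elements.
    return all(len(child) == 0 for child in root)


good = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Hi</body></note>"
wrong_order = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><from>Jani</from><to>Tove</to><heading>Reminder</heading><body>Hi</body></note>"""

print(matches_note_model(good))
# Parses without error despite the DOCTYPE, yet fails the content model.
print(matches_note_model(wrong_order))
```

For real validation, a validating parser or a dedicated schema library is the appropriate tool.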
XML Schema (XSD)
XML Schema Definition (XSD) is a more powerful alternative to DTD. An XSD itself is written in XML syntax and provides:
- Strong data typing (strings, integers, dates, decimals)
- Custom type definitions
- Element and attribute constraints (min/max occurrences)
- Pattern-based validation via regular expressions
- Namespace support
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Why XSD over DTD:
- XML-native syntax (no separate language to learn)
- Rich data types (xs:date, xs:integer, xs:decimal, etc.)
- Reusable complex types
- Stronger tooling support in modern IDEs
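To give a feel for what typed validation adds, the sketch below imitates the lexical checks that xs:decimal and xs:date would apply to values such as a price or a publication date. It is a simplified approximation written for illustration: the function names are invented, time zone suffixes and negative years allowed by xs:date are ignored, and a real XSD validator should be used in practice.

```python
import re
from datetime import date

# Simplified lexical forms; the real xs:date also allows a time zone suffix.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def looks_like_xs_decimal(value: str) -> bool:
    """Accept plain decimal notation only (no exponents, no separators)."""
    return DECIMAL_RE.fullmatch(value.strip()) is not None


def looks_like_xs_date(value: str) -> bool:
    """Accept YYYY-MM-DD values that name a real calendar date."""
    match = DATE_RE.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


for price in ["44.95", "5,95", "1e5"]:
    print(price, looks_like_xs_decimal(price))
for published in ["2000-10-01", "2000-13-01", "10/01/2000"]:
    print(published, looks_like_xs_date(published))
```

A DTD can only say that such elements contain text; a schema can reject the malformed values before application code ever sees them.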
Common XML Use Cases
RSS / Atom Feeds
XML powers web syndication. RSS and Atom feeds distribute blog posts, news articles, and podcast episodes using XML-formatted documents consumed by feed readers.
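As a small illustration of how a feed reader consumes such a document, the sketch below parses a hand-written RSS 2.0 fragment with Python's `xml.etree.ElementTree`. The feed content and the example.com URLs are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; a real reader would download this over HTTP.
rss = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(rss).find("channel")
feed_title = channel.findtext("title")

# Each <item> is one entry; collect (title, link) pairs in document order.
entries = [(item.findtext("title"), item.findtext("link")) for item in channel.findall("item")]

print(feed_title)
for title, link in entries:
    print("-", title, link)
```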
SVG (Scalable Vector Graphics)
SVG images are XML documents containing vector shapes, paths, text, and animations. Every SVG file you open in a browser or design tool is an XML document rendered as graphics.
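Since SVG is plain XML, an image can be produced with any XML library. The sketch below builds a one-circle drawing with Python's `xml.etree.ElementTree`; the sizes and the fill color are arbitrary values chosen for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace as the default namespace (no prefix).
ET.register_namespace("", SVG_NS)

svg = ET.Element(
    f"{{{SVG_NS}}}svg",
    {"width": "100", "height": "100", "viewBox": "0 0 100 100"},
)
ET.SubElement(
    svg,
    f"{{{SVG_NS}}}circle",
    {"cx": "50", "cy": "50", "r": "40", "fill": "teal"},
)

markup = ET.tostring(svg, encoding="unicode")
print(markup)

# The output is ordinary XML, so it parses back into the same structure.
reparsed = ET.fromstring(markup)
circle = reparsed.find(f"{{{SVG_NS}}}circle")
```

Saving `markup` to a file with an .svg extension should give a file that browsers render as a teal circle.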
SOAP APIs
SOAP (Simple Object Access Protocol) uses XML for both request and response payloads. SOAP envelopes define a strict structure for web service communication.
Android Layouts
Android uses XML to define user interface layouts, themes, and resources. Every activity_main.xml or AndroidManifest.xml is an XML file.
Microsoft Office Formats
Microsoft Office Open XML (OOXML) formats — .docx, .xlsx, .pptx — are ZIP archives containing XML files that define document structure, styles, and content.
Configuration Files
Many tools and frameworks use XML for configuration: Maven (pom.xml), Ant (build.xml), Spring (applicationContext.xml), Tomcat (server.xml), and Jenkins (config.xml).
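Configuration files like these are often namespaced, which affects how elements are looked up. The sketch below reads a minimal, made-up Maven-style pom.xml with Python's `xml.etree.ElementTree`; the project coordinates are placeholders, and the prefix `m` is an arbitrary label chosen for the lookup.

```python
import xml.etree.ElementTree as ET

# Minimal hypothetical POM; real files carry many more elements.
pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0</version>
</project>"""

root = ET.fromstring(pom)
ns = {"m": "http://maven.apache.org/POM/4.0.0"}

# Elements live in the default namespace, so lookups must be qualified.
artifact = root.findtext("m:artifactId", namespaces=ns)
version = root.findtext("m:version", namespaces=ns)
unqualified = root.findtext("artifactId")

print(artifact, version)
print("lookup without namespace:", unqualified)
```

The unqualified lookup finds nothing, which is a frequent source of confusion when reading namespaced configuration files.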
UI Serialization
WPF (Windows Presentation Foundation) uses XAML, an XML-based language for declarative UI design. Similar approaches appear in Xamarin and Avalonia.
Data Interchange
XML remains widely used in enterprise environments where schema validation, namespaces, and document-centric workflows are required — financial transactions, healthcare (HL7 FHIR), and scientific data.
Why XML Still Matters
While JSON has become the dominant format for web APIs, XML continues to be essential in:
- Enterprise integrations where strict validation is required
- Document-oriented workflows (publishing, legal, compliance)
- Markup-centric domains (SVG, MathML, DocBook)
- Legacy systems with established XML pipelines
- Mixed-content documents that combine text and markup (like XHTML)
- Applications requiring schemas, namespaces, and extensibility
XML's ability to represent complex, hierarchical data with attributes, namespaces, and mixed content keeps it relevant alongside newer formats like JSON, YAML, and TOML.