XML is a tag-based data storage format used throughout enterprise applications today. Many business branches and partners use XML to send and receive data. Generally, the more sophisticated the transmission mechanism, the more highly abstracted the programming API for working with the data. For example, SOAP Web services use XML under the covers to transmit their data payload, but the apps on both ends of the wire generally work with class instances of the underlying data, not the raw XML itself.
There are, however, many less sophisticated transmission methods than SOAP Web services. In these cases you can't avoid getting the oil and grit of low-level XML on your hands. .NET provides great built-in support for working with XML at both high abstraction levels (SOAP Web services, for example) and low abstraction levels (reading raw XML directly).
This document takes a look at a few techniques for reading XML with low-level .NET Framework APIs. This programmability lives in the System.Xml namespace. As you'll soon see, XML's low-level oil and grit isn't as bad as it may first seem.
XML documents
Figure 1 shows a small XML document that uses element values to describe its data. That is, the text between each opening <beatle> tag and each closing </beatle> tag provides that element's value. Note that opening tags are simply text (no embedded white space is allowed in tag names) bounded by "<" and ">"; closing tags are the same, except they start with "</". All XML tags are case sensitive, so </BEATLES> would not successfully close the <beatles> opening tag. If an XML document isn't 100% valid, an exception is thrown when you attempt to open it. The word "document" is a little imposing but it's the convention when discussing XML. For all practical purposes, an XML file and an XML document are the same thing.
0001 <beatles>
0002 <beatlesAsElements>
0003 <beatle>John Lennon</beatle>
0004 <beatle>George Harrison</beatle>
0005 <beatle>Ringo Starr</beatle>
0006 <beatle>Paul McCartney</beatle>
0007 </beatlesAsElements>
0008 </beatles>
Figure 1. XML document with element values only
If an element is empty (that is, it has no value), you can close it as shown in either line 4 or line 5 of Figure 2. Notice that in line 5 the element name is followed by a slash character before the closing ">". This is shorthand for the longer form shown in line 4. Valid XML insists that all tags be closed as shown in line 3, 4, or 5 of Figure 2. Any non-closed tag renders the XML document invalid.
0001 <beatles>
0002 <beatlesAsElements>
0003 <beatle>John Lennon</beatle>
0004 <beatle></beatle>
0005 <beatle/>
0006 </beatlesAsElements>
0007 </beatles>
Figure 2. XML document with empty elements
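As a quick cross-check of the closing rules above, here is a minimal sketch in Python rather than AVR. It uses Python's standard xml.etree.ElementTree parser, not the System.Xml classes this article covers, and the helper name is_well_formed is made up for illustration. It only shows that a parser accepts both empty-element forms and rejects a mis-cased or missing closing tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Both ways of closing an empty element are accepted.
ok_long = is_well_formed("<beatles><beatle></beatle></beatles>")
ok_short = is_well_formed("<beatles><beatle/></beatles>")

# Tag names are case sensitive, so </BEATLES> does not close <beatles>.
bad_case = is_well_formed("<beatles><beatle/></BEATLES>")

# A tag that is never closed is rejected too.
bad_unclosed = is_well_formed("<beatles><beatle></beatles>")

print(ok_long, ok_short, bad_case, bad_unclosed)
```

The same kind of failure is what surfaces as an exception when the XmlDocument object tries to load a malformed file.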
Persisted on disk, XML files are simply plain text. You can open an XML file with any text editor (such as NotePad) to view or modify it. Be very careful doing this; it doesn't take much to accidentally render an XML document unreadable with a text editor. For read-only purposes, you can view XML in your browser. I like to view XML documents with Google's Chrome browser as shown in Figure 3. However, doing so takes a free XML viewer add-in for Chrome available here. As you click various elements in the XML document, the add-in displays the path to the currently selected element in a textbox. Later in this article we'll see how these element tags are used as "keys" to select data in the XML document.

Figure 3. Viewing an XML document in Google Chrome with a free XML viewer add-in
The XML document object
The XmlDocument class in .NET's System.Xml namespace provides the programming abstraction needed to represent an XML document. Through the APIs this abstraction offers, it's pretty easy to read and write the underlying XML document. However, the XmlDocument class has one drawback: when you use it to work with an XML document, the entire XML document is loaded into memory. Thus, be prudent about the size of documents you use with the XmlDocument object.
Having said that, the XmlDocument object can work with very large documents, much larger than seems reasonable. For example, I opened a 13MB iTunes music library XML document with the XmlDocument object in about 2.3 seconds on a four-year-old middle-of-the-road Dell laptop. This document has 306,000 lines in it. While it would probably be prudent to avoid working with XML documents this large with the XmlDocument object (especially on a Web server where several users may each have the large document open concurrently), at least know that you can load very large files.
This article specifically discusses using the XmlDocument object for reading XML documents. With enough demand, a follow-up article will show how to use the XPathNavigator object to work with XML. For larger documents, the XPathNavigator, typically paired with the read-only XPathDocument, is probably the better choice because it is lighter-weight than the XmlDocument object; when a document is too large to hold in memory at all, the forward-only XmlReader streams it instead. However, most XML documents you work with will be of reasonable size and the XmlDocument object provides a great balance of capability versus ease of use.
Reading an XML document's elements
Let's turn our attention to reading the XML document in Figure 1. The code in Figure 4 shows how to open the XML document and read some data from it. To work with an XML document with the XmlDocument object (which requires a "Using System.Xml" statement at the top of your class) you first need to instantiate an XmlDocument object (line 1) and then use its Load() method to load the specified XML document (line 6).
This code doesn't show it, but you should add error handling code around the Load() method. If the specified file isn't found, or if the specified file isn't valid XML, an exception would be thrown at line 6.
Individual elements in an XML document are represented by the XmlNode object (also in the System.Xml namespace). The XmlDocument object provides a SelectSingleNode() method that reaches into the XML document and returns a given node. To locate a given element, you pass SelectSingleNode() a single argument that specifies the path (known as the XPath) of the desired node. Read more about XPath here. Simple XPath strings are just the path to the node, specified very much like the path to a DOS folder but with forward slashes.
In line 10 of Figure 4 you can see "/beatles/beatlesAsElements" used as the XPath argument. If you're an RPG coder, think of SelectSingleNode() as working like RPG's CHAIN operation code does. Given a "key," it returns a reference to the node with that key. If SelectSingleNode() doesn't find a matching node, it returns a null reference. In production code it's important to check the return result from SelectSingleNode(). If it is *Nothing, then the operation didn't find a node with the given XPath value.
XPath values are case-sensitive. Using "/Beatles/BeatlesAsElements" as the XPath for SelectSingleNode() with Figure 1's XML document would fail. Let me say it again: XPath values are case-sensitive. Take great care with XML's case-sensitivity; incorrectly cased XML values are at the root of many problems when working with XML documents. (When viewing XML documents in Google Chrome with the add-in mentioned previously, you can click anywhere in the XML document and the XPath for where you clicked is shown in a textbox above the document.)
0001 DclFld xml Type( XmlDocument ) New()
0002 DclFld ParentNode Type( XmlNode )
0003 DclFld ChildNode Type( XmlNode )
0004 DclFld XPath Type( *String )
0005
0006 xml.Load( "c:\beatles.xml" )
0007
0008 XPath = "/beatles/beatlesAsElements"
0009
0010 ParentNode = xml.SelectSingleNode( XPath )
0011 ForEach ChildNode Collection( ParentNode.ChildNodes )
0012 MsgBox ChildNode.InnerText
0013 EndFor
Figure 4. Reading an XML node with SelectSingleNode
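The defensive checks described above (malformed XML at load time, and a lookup that finds no node) can be sketched as follows. This is Python's standard xml.etree.ElementTree, used here only as an analog of Load() and SelectSingleNode(), not the System.Xml API itself. The function name child_values is hypothetical, the XML is parsed from a string instead of a file, and ElementTree paths are relative to the root element, so "beatlesAsElements" stands in for "/beatles/beatlesAsElements".

```python
import xml.etree.ElementTree as ET

# The document from Figure 1, inlined so the sketch is self-contained.
BEATLES_XML = """<beatles>
  <beatlesAsElements>
    <beatle>John Lennon</beatle>
    <beatle>George Harrison</beatle>
    <beatle>Ringo Starr</beatle>
    <beatle>Paul McCartney</beatle>
  </beatlesAsElements>
</beatles>"""


def child_values(xml_text, path):
    """Return the text of each child of the node at path.

    Returns None if the XML cannot be parsed or no node matches path.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # the equivalent of Load() throwing on invalid XML

    parent = root.find(path)
    if parent is None:
        return None  # the equivalent of SelectSingleNode() returning null

    return [child.text for child in parent]


names = child_values(BEATLES_XML, "beatlesAsElements")
wrong_case = child_values(BEATLES_XML, "BeatlesAsElements")

print(names)
print(wrong_case)
```

The second call fails to find a node only because of the capital "B", which is the case-sensitivity trap described above.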
In the case of our example Beatles XML document in Figure 1, the beatlesAsElements node has four children. A very common task in reading an XML document is reading a given node's children. There are several ways to traverse a node's child nodes; AVR's For/Each is probably the best way. It requires the least code and is pretty clear in what it's doing. In Figure 4's line 11, the ChildNodes property (which is a collection of nodes) is used with the For/Each to traverse the child nodes of a given parent node. If there aren't any child nodes, this loop wouldn't do anything. To ensure there are child nodes to iterate over, you could check the Count property of the ChildNodes property to determine exactly how many child nodes there are. The XmlNode object also has a HasChildNodes Boolean property; you could also check whether ParentNode.HasChildNodes is true to determine if there are child nodes to process. Note also that the InnerText property is what you use to report a node's value. The output from Figure 4 would show each of the four Beatles' names.
Figure 5 below shows two additional ways to traverse a node's children. My preference would generally be the For/Each but you can use whatever works for you.
0001 // Using FirstChild and NextSibling to traverse children.
0002 ChildNode = ParentNode.FirstChild
0003 DoWhile ( ChildNode <> *Nothing )
0004 MsgBox ChildNode.InnerText
0005 ChildNode = ChildNode.NextSibling
0006 EndDo
0007
0008 // Using ChildNodes array to traverse children.
0009 // Assumes an i *Integer4 has been declared.
0010 If ( ParentNode.HasChildNodes )
0011 For Index( i = 0 ) To( ParentNode.ChildNodes.Count - 1 )
0012 MsgBox ParentNode.ChildNodes[ i ].InnerText
0013 EndFor
0014 EndIf
Figure 5. Alternative ways to read a node's children
Reading an XML node's attributes
Having seen how to read an XML document's element values, let's turn our attention to a node's attributes. Attributes are optional name/value pairs attached to an XML node. They can further define an XML element or they can be used as an alternative to element values. In lines 9 through 14 of Figure 6, the four Beatles nodes provide the "name" value as an attribute value of the node instead of as the element's value. Any one node can have as many attributes as necessary. There isn't a right or wrong here; lines 2-7 and lines 9-14 are both valid, reasonable XML. If you're creating an XML document from scratch it's up to you to use elements, attributes, or a combination of the two.
There is an argument that says the element provides the primary value (the primary noun) and attributes provide further detail for that primary value (adjectives). For example, consider lines 16-21. Here you see the name provided as the element value, and an attribute used to provide additional secondary information. Again, it's your choice. Remember, too, that sometimes you don't get to choose. A vendor may say, "Here's our document, deal with it." So it's good to know how to read both element values and attribute values.
0001 <beatles>
0002 <beatlesAsElements>
0003 <beatle>John Lennon</beatle>
0004 <beatle>George Harrison</beatle>
0005 <beatle>Ringo Starr</beatle>
0006 <beatle>Paul McCartney</beatle>
0007 </beatlesAsElements>
0008
0009 <beatlesAsAttributes>
0010 <beatle name="John Lennon"/>
0011 <beatle name="George Harrison"/>
0012 <beatle name="Ringo Starr"/>
0013 <beatle name="Paul McCartney"/>
0014 </beatlesAsAttributes>
0015
0016 <beatlesAsElementsAndAttributes>
0017 <beatle born="1940">John Lennon</beatle>
0018 <beatle born="1943">George Harrison</beatle>
0019 <beatle born="1940">Ringo Starr</beatle>
0020 <beatle born="1942">Paul McCartney</beatle>
0021 </beatlesAsElementsAndAttributes>
0022 </beatles>
Figure 6. An XML document with attributes
To read attribute values, you simply use the XML element's GetAttribute() method, with one catch. The XmlNode object doesn't provide a GetAttribute() method, but the XmlElement object (which also lives in the System.Xml namespace) does. The XmlElement is derived from the XmlNode object and is a more specialized type of node. So, a way to read lines 17-20 in Figure 6's XML is with the code below in Figure 7:
0001 DclFld ChildElement Type( XmlElement )
0002
0003 ParentNode = xml.SelectSingleNode( XPath )
0004 ForEach ChildElement Collection( ParentNode.ChildNodes )
0005 MsgBox childElement.InnerText + " " ++
0006 childElement.GetAttribute( "born" )
0007 EndFor
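Figure 7. Reading XML element and attribute values
The only gotcha here is that ChildElement needs to be declared as an XmlElement instead of an XmlNode. Figure 7 also assumes the xml, ParentNode, and XPath fields from Figure 4 are available, with XPath set to "/beatles/beatlesAsElementsAndAttributes".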
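For comparison, here is a minimal sketch of reading an element value together with an attribute value in Python's standard xml.etree.ElementTree. It is an analog of Figure 7, not the System.Xml API: the .text property plays the role of InnerText and .get() plays the role of GetAttribute(). The XML is the third block of Figure 6, inlined as a string, and the path is relative to the root element.

```python
import xml.etree.ElementTree as ET

# Lines 16-21 of Figure 6, wrapped in the <beatles> root element.
BEATLES_XML = """<beatles>
  <beatlesAsElementsAndAttributes>
    <beatle born="1940">John Lennon</beatle>
    <beatle born="1943">George Harrison</beatle>
    <beatle born="1940">Ringo Starr</beatle>
    <beatle born="1942">Paul McCartney</beatle>
  </beatlesAsElementsAndAttributes>
</beatles>"""

root = ET.fromstring(BEATLES_XML)
parent = root.find("beatlesAsElementsAndAttributes")

# Pair each element's value with its "born" attribute.
lines = [f"{beatle.text} {beatle.get('born')}" for beatle in parent]

# Asking for an attribute the element doesn't have yields None in Python.
no_such_attribute = parent[0].get("died")

for line in lines:
    print(line)
```

One difference worth knowing: Python's get() returns None for a missing attribute, whereas .NET's GetAttribute() returns an empty string.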
Using SelectSingleNode() with attribute values
There may be a time when you need to select a given XML node based on an attribute value. For example, let's consider needing to find the beatle node whose "born" attribute is 1940 in the XML in lines 16 through 21 of Figure 6. To do so, you'd use the code shown in Figure 8:
0001 DclFld ParentNode Type( XmlElement )
0002 DclFld XPath Type( *String )
0003
0004 XPath = "/beatles/beatlesAsElementsAndAttributes/beatle"
0005 XPath = XPath + "[@born='1940']"
0006
0007 ParentNode = xml.SelectSingleNode( XPath ) *As XmlElement
Figure 8. Reading a specific node with an attribute value
The awkward concatenation in lines 4-5 is to keep the XPath value from wrapping in this article. XPaths can be specified with one long string or be built up with String.Format or a StringBuilder object, whichever you prefer. The XPath value created in lines 4 and 5 of Figure 8 says to select the XML node at the given XPath where the attribute "born" is 1940. In Figure 6 both John Lennon and Ringo Starr match; SelectSingleNode() returns only the first matching node in document order, which is John Lennon's. If no such element exists, a null element is returned.
There is one minor wrinkle, though: the casting required on line 7. SelectSingleNode() specifically returns an XmlNode object. XmlNode objects don't offer the cream rich goodness that an XmlElement does. So, ParentNode is declared as an XmlElement object and the result from SelectSingleNode() is cast as that type.
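The attribute predicate is plain XPath, so it can be tried outside .NET as well. The sketch below uses Python's standard xml.etree.ElementTree, which supports the [@attribute='value'] predicate; it is an illustration of the XPath expression from Figure 8, not of the System.Xml API. Here find() stands in for SelectSingleNode(), the path is relative to the root element, and the year 1950 is simply a value chosen because no node has it.

```python
import xml.etree.ElementTree as ET

# Lines 16-21 of Figure 6, wrapped in the <beatles> root element.
BEATLES_XML = """<beatles>
  <beatlesAsElementsAndAttributes>
    <beatle born="1940">John Lennon</beatle>
    <beatle born="1943">George Harrison</beatle>
    <beatle born="1940">Ringo Starr</beatle>
    <beatle born="1942">Paul McCartney</beatle>
  </beatlesAsElementsAndAttributes>
</beatles>"""

root = ET.fromstring(BEATLES_XML)
path = "beatlesAsElementsAndAttributes/beatle"

# find() returns the first match in document order, or None.
first_1940 = root.find(path + "[@born='1940']")

# findall() returns every match, which shows why "first" matters here.
all_1940 = [beatle.text for beatle in root.findall(path + "[@born='1940']")]

# No beatle has born="1950", so nothing is found.
missing = root.find(path + "[@born='1950']")

print(first_1940.text)
print(all_1940)
print(missing)
```

Because two nodes share the 1940 value, a single-node lookup silently picks the first one; if that matters, select on an attribute that is unique.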
That's all, folks
As you can see, with just a few guidelines under your belt, reading XML is easy. Study the System.Xml namespace; there are lots of interesting XML-related objects and capabilities in it.