Stuff I Learned Scripting - Parsing XML in a One-Liner
Last Updated: 2011-11-10 14:12:05 UTC
by Rob VandenBrink (Version: 1)
This is the second story in this "Stuff I Learned Scripting" series. As I write scripts, I tend to stumble over commands or methods that I didn't know even existed before, and I thought I'd share these with our readers as they come up. Since I'm finding some of these commands for the first time, I invite you to post any more "elegant" or correct methods in our comment form.
If you're like me, you get a generally "good feeling" when you see config files set up as XML: it's an open standard with loads of tools to parse it.
However ... I was recently tasked with parsing variables out of an XML file, using *only* what is available in Windows. This turned out to be trickier than I thought - XML is a tad more complex than your traditional "variable=value" Windows INI file (or registry key for that matter). This is one of the reasons I've been (subconsciously I think) avoiding writing automation scripts against XML.
On the face of it, it might look easy - for instance:
<some variable> value </some variable>
is easy to get with the "find" command. But the same construct could just as easily be represented as:
which is *not* so easy to pull out using the find command in Windows.
Also, XML is hierarchical, so:
is different altogether from:
At that point, I took a deep breath and decided it was time to dive into Powershell. Powershell has everything needed to parse and write XML out of the box, and it fills the requirement that it's actually on every box (well, every new Windows box anyway). There are plenty of sites out there that explain how to do complex XML gymnastics, but in security audits generally all that is needed is a simple read of specific target variables.
For instance, if you are auditing a VMware vCenter configuration against the VMware Hardening Guide, you should be looking at variables in the "vpxd.cfg" file, which is formatted in XML. One of the variables you'll want to look at is "enableHttpDatastoreAccess", which if enabled allows you to browse your ESX/ESXi datastores with a web browser (and appropriate credentials of course). The Hardening Guide recommends that this be turned off in some circumstances (their term is "SSLF" - Specialized Security Limited Functionality), so during an audit this value should at least be noted. In the config file, this value is represented as:
... other config variables and constructs ...
You can do this in two lines in Powershell (though they may wrap on your display, depending on your screen resolution), with something like:
|[xml]$vpxdvars = Get-Content ./vpxd.cfg||reads an entire XML-formatted file into a Powershell variable "vpxdvars"|
|write-Host $vpxdvars.config.enableHttpDatastoreAccess||the hierarchical structure of the XML file is expressed by dot-separation. In this example we simply print (using write-Host) the target variable, represented as config.enableHttpDatastoreAccess in the XML file|
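To make the earlier points about layout and hierarchy concrete, here is a minimal sketch of the same kind of read in Python. Python is not part of a stock Windows install, so treat this as an illustration rather than a substitute for the Powershell lines above. The sample XML strings, the "other" parent element and the read_setting helper are all made up for the example; none of them come from a real vpxd.cfg.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same setting written on one line, spread
# over several lines, and repeated under a different parent element.
ONE_LINE = "<config><enableHttpDatastoreAccess>false</enableHttpDatastoreAccess></config>"

MULTI_LINE = """<config>
  <enableHttpDatastoreAccess>
    false
  </enableHttpDatastoreAccess>
</config>"""

NESTED = """<config>
  <other><enableHttpDatastoreAccess>true</enableHttpDatastoreAccess></other>
  <enableHttpDatastoreAccess>false</enableHttpDatastoreAccess>
</config>"""


def read_setting(xml_text, path):
    """Return the stripped text found at `path`, or None if it is absent.

    `path` is relative to the root element, with "/" between levels,
    for example "other/enableHttpDatastoreAccess".
    """
    root = ET.fromstring(xml_text)
    value = root.findtext(path)  # None when no element matches the path
    return value.strip() if value is not None else None


if __name__ == "__main__":
    for sample in (ONE_LINE, MULTI_LINE, NESTED):
        print(read_setting(sample, "enableHttpDatastoreAccess"))
    print(read_setting(NESTED, "other/enableHttpDatastoreAccess"))
```

Because the parser works on structure rather than on lines of text, all three layouts are read with the same call, and the two same-named elements stay distinct because their paths differ.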
But how do you stuff this into a CMD file in Windows? Simple - use the Powershell "-Command" option, and string the Powershell commands together with semicolons. The line shown here will run from the command line or (more usefully) from within a CMD file:
powershell -Command "[xml]$vpxd = Get-Content ./vpxd.cfg" ; "write-Host $vpxd.config.enableHttpDatastoreAccess"
And yes, I know, I know, this probably has existed in Linux forever, but in most enterprises, Windows scripts tend to be preferred (he said as he looks hastily up for thunderclouds and lightning bolts). Having said that (and survived, so far anyway), I tend to use xpath in Linux if I need something simple in a bash script. It comes as part of the Perl Library XML::XPath, and is preinstalled on most major distributions (if you install perl). For instance, the query above might be represented as (command output is also shown):
# xpath -e '/config/enableHttpDatastoreAccess' ./vpxd.cfg
Found 1 nodes in ./test.xml:
-- NODE --
To get just the value, we'll use the "q" (for quiet) option, which filters out the "Found" and "NODE" lines, leaving only the path. Then we'll filter out the path by using grep to ignore anything with a ">" in it:
# xpath -q -e '/config/enableHttpDatastoreAccess' ./vpxd.cfg | grep -v '>'
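On a box without XML::XPath, roughly the same query can be sketched with Python's standard library. This is an illustration under stated assumptions: xpath_text is a hypothetical helper, the sample document is invented, and ElementTree supports only a small subset of XPath, so the helper handles plain /a/b/c paths and nothing fancier.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample standing in for a config file; not a real vpxd.cfg.
SAMPLE = """<config>
  <enableHttpDatastoreAccess>false</enableHttpDatastoreAccess>
</config>"""


def xpath_text(xml_file, query):
    """Return the stripped text of each node matching a simple absolute
    path such as '/config/enableHttpDatastoreAccess'.

    `xml_file` may be a filename or an open file object.
    """
    root = ET.parse(xml_file).getroot()
    steps = query.strip("/").split("/")
    if steps[0] != root.tag:
        return []          # the path does not start at this document's root
    if len(steps) == 1:
        nodes = [root]     # the query names the root element itself
    else:
        # ElementTree paths are relative to the element they are called on,
        # so drop the leading root step before searching.
        nodes = root.findall("/".join(steps[1:]))
    return [(node.text or "").strip() for node in nodes]


if __name__ == "__main__":
    # io.StringIO stands in for a file on disk so the sketch is self-contained.
    for value in xpath_text(io.StringIO(SAMPLE), "/config/enableHttpDatastoreAccess"):
        print(value)
```

The helper returns values only, so no grep step is needed to drop the markup lines.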
And yes, you could do this simple example query in SED (though every time I think I have it right I find a case where it breaks). GREP and AWK are also tools you can use for XML parsing, with a similar caveat. But xpath commands are much easier and much more "readable" - and readable scripts are REALLY important if you are planning to give them to a client, especially if they're not a SED / AWK / GREP / scripting guru. If you expect someone else to read your script, complex is NOT better. So you'll tend to see understandable, simple scripts in this series.
For more complex XML operations and results, a more complex tool is usually required - if you need true "XML gymnastics", it might be time to write a more complex program in Perl, Powershell or Python (or your favourite language that supports XML, it doesn't necessarily need to start with a "P").
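As one hedged example of what such a program might look like, the Python sketch below flattens a whole config into dotted paths and compares them against a list of expected values. The flatten and audit helpers, the sample XML and the "hypothetical" element are invented for illustration. Repeated sibling elements that share a path simply overwrite each other here, which a real audit tool would need to handle.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names other than enableHttpDatastoreAccess are made up.
SAMPLE = """<config>
  <enableHttpDatastoreAccess>true</enableHttpDatastoreAccess>
  <hypothetical>
    <timeout>30</timeout>
  </hypothetical>
</config>"""


def flatten(element, prefix=""):
    """Yield (dotted.path, text) for every leaf element under `element`."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from flatten(child, path)


def audit(xml_text, expected):
    """Compare leaf values against `expected` (dotted path -> required value).

    Returns {path: (expected, actual)} for each mismatch; actual is None
    when the path does not exist in the document.
    """
    actual = dict(flatten(ET.fromstring(xml_text)))
    return {
        path: (want, actual.get(path))
        for path, want in expected.items()
        if actual.get(path) != want
    }


if __name__ == "__main__":
    findings = audit(SAMPLE, {"config.enableHttpDatastoreAccess": "false"})
    for path, (want, got) in findings.items():
        print(f"{path}: expected {want!r}, found {got!r}")
```

The dotted paths deliberately mirror the config.enableHttpDatastoreAccess notation used in the Powershell example, so a list of expected values stays readable for whoever inherits the script.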
As always, I'm sure that there are true XML and Powershell experts out there (I'm not an expert at either) - if there's a better / simpler way to get this done than the one method I've described, please share on our comment form!
If this particular example (and the certificate example I used on Monday) are of particular interest to you, they are both from the Security Class SANS SEC579 - Virtualization and Private Cloud Security ( http://www.sans.org/security-training/virtualization-private-cloud-security-1651-mid ), which will be offered first in January. (shameless plug - I'm a co-author for that course)