Thursday, December 4, 2008

How Not to Use XML

One aspect of my job is to work with 3rd parties to include commerce links to buy stuff from our site(s). This involves the 3rd party giving us a data feed of products, which we take and try to match up with products in our own database. Sometimes the feeds will be a delimited text file. Sometimes they'll be XML. Here's a small sample of some XML I recently had the pleasure of working with. The actual element names and data have been changed to protect the guilty party.

<Thing>
<stuff name="product_id">0926INGBS10075810</stuff>
<stuff name="company">Curabitur Fringilla Corp.</stuff>
<stuff name="title">Nullam Enim Justo (PC/Mac)</stuff>
<stuff name="Price">24.99</stuff>
<stuff name="Availability">See Site</stuff>
<stuff name="upc">0600100098554</stuff>
<stuff name="description">Lorem ipsum dolor sit amet</stuff>
<stuff name="boxshot">http://www.blah.com/product.gif</stuff>
<stuff name="purchase_url">http://www.blah.com/buy.asp?id=0123456789</stuff>
</Thing>

At this point, why not just provide the data in a delimited text format? Giving every element the same name and using attributes to identify the different fields actually makes parsing the feed *more* difficult and error-prone.
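To give a rough idea of the extra work, here is a minimal Python sketch of reading that layout with the standard library's xml.etree.ElementTree. The function name parse_thing, the trimmed-down SAMPLE string, and the choice to lowercase the keys are my own assumptions for illustration, not anything the feed provider documented.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample feed above (assumed to be well-formed).
SAMPLE = """<Thing>
<stuff name="product_id">0926INGBS10075810</stuff>
<stuff name="Price">24.99</stuff>
<stuff name="upc">0600100098554</stuff>
</Thing>"""


def parse_thing(xml_text):
    """Collect every <stuff> element into a dict keyed by its name attribute."""
    root = ET.fromstring(xml_text)
    fields = {}
    for node in root.findall("stuff"):
        # The field name lives in an attribute, so it has to be pulled out
        # by hand. Lowercasing papers over "Price" vs. "product_id".
        key = node.get("name", "").lower()
        fields[key] = (node.text or "").strip()
    return fields


record = parse_thing(SAMPLE)
price = float(record["price"])
```

Nothing in the markup itself says which fields exist, so a missing or misspelled name attribute only shows up later as a missing dictionary key.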

It seems to me someone at this company said, "XML is what all the cool kids use. Our feed needs to be in XML. Then we'll be one of the cool kids too." Then someone who doesn't really understand what XML was intended to accomplish put this together.

Would it have been so hard to do this instead?

<thing>
<product_id>0926INGBS10075810</product_id>
<company>Curabitur Fringilla Corp.</company>
<title>Nullam Enim Justo (PC/Mac)</title>
<price>24.99</price>
<availability>See Site</availability>
<upc>0600100098554</upc>
<description>Lorem ipsum dolor sit amet.</description>
<boxshot>http://www.blah.com/product.gif</boxshot>
<purchase_url>http://www.blah.com/buy.asp?id=0123456789</purchase_url>
</thing>
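With one element per field, reading a value is a single lookup by name. A minimal Python sketch, again using the standard library's xml.etree.ElementTree; the trimmed-down SAMPLE string and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the element-per-field layout above.
SAMPLE = """<thing>
<product_id>0926INGBS10075810</product_id>
<price>24.99</price>
<upc>0600100098554</upc>
</thing>"""

root = ET.fromstring(SAMPLE)

# Each field is addressed directly by its element name.
product_id = root.findtext("product_id")
price = float(root.findtext("price"))
upc = root.findtext("upc")
```

This layout could also be checked against a schema or DTD, since each field has its own element name to validate.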
