Excel Fixation

A couple of weeks back, I came across an interesting task – convert a PMML file to human-comprehensible format. PMML stands for Predictive Model Markup Language. It uses XML to store its data. A plain XML file is always human-readable, but you can’t do much with it on screen, other than feed it to an XML-aware application.

My first response was to write a stylesheet for the file. That worked pretty well. I had the data stacked for display in proper tables. It was more comprehensible than looking at the XML file in say, a text editor or in a browser without the stylesheet. The hitch? Human-comprehensible means “should open in Excel, ready for action”. The pretty thing being displayed in the browser was obviously not what they wanted.

So, I blew off the dust on my Perl skills, pulled in two modules and dumped the PMML data into honest-to-goodness Excel-compatible BIFF. The big question (whose size increased in part because the code I was writing, while trivial, was frighteningly repetitive and I’m not a fast typist) was: why in the world would these guys insist on Excel?

Back in my BE days, numerical techniques was one of the most interesting subjects around for the just-born programmers. Approximating mathematical functions into arithmetic algorithms and fleshing them out with our newly learned skills in C and Matlab – slick. If you are in the number crunching profession, you can’t ask for much more powerful tools. But does anything beat the convenience of the “snap of the finger” processing that Excel provides? By “snap of the finger”, I mean operations like sum, sort, count, average, rounding off or approximating, filters, frequency distribution, elementary logic, stats and math functions and (the most important?) fetching data from all over the place using cell addresses and ranges. This is certainly not an exhaustive list, for I have seen these guys do a lot more.

I certainly don’t know the answer. Let’s say you have the data in a plain text file – CSV, TSV or whatever – even XML. Then you use whatever is your tool – C, Perl, Python, Octave, Matlab – to operate on the file and show the output – in as pretty a format as you want. I’m sure high-level languages like Python or tools like Octave can ease the pain for non-programmers. And of course, software like Octave make it easy to write out “snap of the finger” operations.

But the operative word is convenience. I’ll be deluding myself if I say all those tools are more convenient than Excel. And so, Excel wins. In spite of flaws in its formulas – try to multiply 850 by 77.1 in Excel 2007 – and security framework. Even though its spreadsheets (at least till recently) were limited to 65K rows. Sure, Office 2007 breaks that limit, but introduces OO-XML… end of story.

OK, so maybe, this is a childish tantrum – maybe spreadsheets are a good enough tool for their purpose. If they were in a *nix environment, they would be using OpenOffice.org Calc and I wouldn’t be writing this. So why am I pissed?

Because it hurts. Like your aesthetic senses would be hurt when you are confronted with crap masquerading as art. Like you would wince when you hear noise claiming to be music. XML is meant to be read by multiple applications – yes. But that hardly means you should open it in Excel 2003/2007 and claim to be working on an accurate representation of the XML file’s payload. Not unless you have a filter or translator plugged in.

As a web developer in training, I know that an XML file has to “interpreted” and its payload “translated” before it can be termed as useful. Excel users assume Excel knows how to do this already; for any XML file they load in it.

Excel doesn’t. Atleast not at the “snap of a finger”.