HTML for Everyone
While reading an (excellent) article on how to build a better web browser (hat tip: Matt), I started thinking about why the makers of word processing software weren’t using HTML as their native format (this is one of the strengths of the human brain: it doesn’t matter how many teraflops a supercomputer has, it still can’t make leaps like that). I use MS Office at work and at home, I also have a copy of StarOffice, which I use as my default package, and I’ve previously played with OpenOffice. I’ve also used a variety of other word processing and office packages over the years. Since the rise of the internet, most word processing software has offered the option to save a document as HTML. None of the packages I have used, however, does this by default. Word has its own proprietary format, and StarOffice’s Writer saves in an XML-derived format, I believe. What if they acted like a WYSIWYG web editor instead, producing valid HTML?
It might not make full use of CSS and XHTML (even the professional web packages have trouble producing that), but valid HTML 4 would be sufficient and easy enough to generate. Almost everything a word processing package does is available in HTML 4: font styles, alignment, images, font size, page layout. These probably cover 95% of the tasks carried out by 99% of users.
The front end, the look and operation, could remain the same, but the file would be saved in HTML by default. So what? Well, this is what HTML was designed for. XML, while an excellent markup language (I really must get on with learning more about it), is too cumbersome for simple text documents; besides, without a fixed set of tags it’s far from universal, and it needs definition files to make it truly portable. For databases and datasets where searching or complicated processing is needed, great, but most documents consist of nothing more than headings and paragraphs. For this sort of thing the predefined tags of HTML are perfect, again, by design. We already have software that supports it on practically any OS and hardware you can think of. No more converting files to PDF or RTF so your Mac buddies can read them, and no need to save your Word 2000 files in Word 97 format so you can read them in the office (yes, I’m back on NT 4/Office 97 in Holland). Anyone who has a browser can read it. Anyone with a text editor could, at a push, edit it. It’s a free and open standard, which means no one can hijack it. It saves developing and maintaining multiple formats and standards. You can share your documents immediately: drag them onto a web server and anyone can view them, anywhere (no need for Word to be installed and hooked into your browser, or to download the file to view it). META tags, part of the HTML standard, are also ideal for helping locate relevant documents. In an age where we are producing ever more electronic documents and storing them for longer periods, we stand a good chance of drowning in a sea of information. META tags let you add a short description, keywords, the author, and a whole host of other things, perfect for allowing you to find files again.
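As a rough illustration of the META tag idea, here is a minimal Python sketch using the standard library’s html.parser module. It pulls the name/content pairs out of a document’s META tags and checks the keywords list for a search term. The helper names read_meta and matches_keyword, and the sample report, are hypothetical and made up for this example. A real document search tool would also need to walk folders and cope with far messier markup.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html: str) -> dict[str, str]:
    """Return the document's META name/content pairs (hypothetical helper)."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


def matches_keyword(html: str, keyword: str) -> bool:
    """True if the comma-separated keywords META tag lists the given keyword."""
    keywords = read_meta(html).get("keywords", "")
    wanted = keyword.strip().lower()
    return any(k.strip().lower() == wanted for k in keywords.split(","))


# A made-up HTML 4 document of the kind a word processor might save.
report = """<html><head>
<title>Quarterly report</title>
<meta name="description" content="Sales figures for Q3">
<meta name="keywords" content="sales, Q3, Holland">
<meta name="author" content="A. Writer">
</head><body><h1>Q3</h1><p>Sales were up.</p></body></html>"""

print(read_meta(report)["author"])         # A. Writer
print(matches_keyword(report, "holland"))  # True
```

Because the tags are standard, the same few lines work on any HTML document, whichever program saved it.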
For the most part, HTML is ideal for how most people use documents. It is let down by a few flaws, not least of which is the lack of support for embedding fonts. Keeping images and other files separate from the page I could live with. It’s not the neatest way, but if you wanted to email a document to a friend, you could attach the HTML file and all of the associated files; not great, but it’d work as long as the person receiving them saved them all to the same folder. And how many documents do you produce with pictures or other files embedded in them anyway? Fonts, on the other hand, have practically no support outside the standard set that is guaranteed to be on every machine. If the person receiving the file has the font I wrote it in, fine, they’ll see it as I see it. If they don’t, it drops back to a default font, which, while readable, may not be what you want (I say “may”; personally, it’s definitely not what I want). Maybe this is the reason for the lack of adoption. On the other hand, CSS allows you to designate a family (a list) of fonts, and I’m sure building in a function that automatically adds some similar alternatives to the font-family list when a non-standard font is selected would be easy to do.
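The font fallback function suggested here could start out as little more than a lookup table. The Python sketch below assumes a hypothetical table, FALLBACKS, that maps a few non-standard fonts to broadly similar alternatives. The groupings are illustrative guesses, not a definitive list. It builds a CSS font-family declaration with the chosen font first and makes sure the list always ends in a generic family such as serif or sans-serif, so the browser always has something sensible to fall back on.

```python
# Hypothetical lookup table: a few fonts and broadly similar alternatives.
# These groupings are illustrative assumptions, not an authoritative mapping.
FALLBACKS = {
    "garamond": ["Georgia", "Times New Roman", "serif"],
    "book antiqua": ["Palatino Linotype", "Georgia", "serif"],
    "tahoma": ["Verdana", "Arial", "sans-serif"],
    "lucida console": ["Courier New", "monospace"],
}

# CSS generic family keywords (CSS 2.1).
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def quote_family(name: str) -> str:
    # Generic keywords must stay unquoted; quote every specific family name.
    return name if name.lower() in GENERIC_FAMILIES else f'"{name}"'


def font_family_declaration(chosen: str, default_generic: str = "serif") -> str:
    """Build a CSS font-family declaration: chosen font first, then fallbacks."""
    families = [chosen] + FALLBACKS.get(chosen.lower(), [default_generic])
    # Make sure the list ends in a generic family the browser can always satisfy.
    if families[-1].lower() not in GENERIC_FAMILIES:
        families.append(default_generic)
    # Drop case-insensitive duplicates while preserving order.
    seen: set[str] = set()
    unique: list[str] = []
    for family in families:
        if family.lower() not in seen:
            seen.add(family.lower())
            unique.append(family)
    return "font-family: " + ", ".join(quote_family(f) for f in unique) + ";"


print(font_family_declaration("Garamond"))
# font-family: "Garamond", "Georgia", "Times New Roman", serif;
print(font_family_declaration("Comic Sans MS"))
# font-family: "Comic Sans MS", serif;
```

Generic keywords are left unquoted on purpose: CSS treats a quoted "serif" as a font literally named serif rather than the generic family.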
It seems a shame, really, that a wonderful, universal standard is let down by a few relatively small points. Sure, a couple of other things are missing, but nothing crucial, and I’m sure none of them poses an insurmountable problem. Maybe we should all try using a WYSIWYG HTML editor to produce our text documents in future. Just a thought.