Discussion on XML:
There are two ways of defining the structure of an XML document: using something called a DTD (Document Type Definition) and something called a Schema (XSD). The Schema is the stricter of the two and the DTD is the easier one to deal with.
The point to me is that there is SOME way of automatically validating a document. The most common ‘dirty data’ problems I’ve come across are with databases that are not validated properly against some kind of template. In the world of validation, I separate problems into human-type errors and computer-type errors.
Human-type errors are things like miskeyed addresses, duplicating entries and writing in a premium value for the policy limit, for example. A computer-type error would be something like repeating the column heading as a value in every single field.
The point is that computer-type errors are really easy for humans to spot and human-type errors are easy for computers to spot. That’s why automatic validation is so powerful: if you can combine a computer validation process with a human dataset, you’re probably going to get something that works. And vice versa.
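To make that concrete, here is a rough sketch of what the computer half of the job might look like. Everything in it is made up for illustration: the column names, the sample rows, and the `find_problems` helper are assumptions, and the "limit equals premium" rule is just one crude guess at how a program might catch a premium keyed into the limit field.

```python
import csv
import io

# Hypothetical policy data; column names and values are invented for illustration.
RAW = """policy_id,address,premium,policy_limit
P001,12 Main St,1200,1000000
P002,policy_id,address,premium
P001,12 Main St,1200,1000000
P003,9 Oak Ave,950,950
"""

def find_problems(text):
    """Return (line_number, description) pairs for rows that look wrong."""
    reader = csv.DictReader(io.StringIO(text))
    headers = set(reader.fieldnames)
    seen = set()
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        values = tuple(row.values())
        # Computer-type error: a column heading shows up as a field value.
        if any(v in headers for v in values):
            problems.append((line_no, "header repeated as value"))
            continue
        # Human-type error: the same entry keyed in twice.
        if values in seen:
            problems.append((line_no, "duplicate entry"))
        seen.add(values)
        # Human-type error (crude heuristic): premium typed into the limit field.
        if row["premium"] == row["policy_limit"]:
            problems.append((line_no, "limit equals premium"))
    return problems

for problem in find_problems(RAW):
    print(problem)
```

A miskeyed address is much harder to catch this way, since the program would need a list of real addresses to compare against.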
From a user-experience standpoint, a good validation process can be maddening (I have to CAPITALIZE the first letter every single time?!) because most of it is perceived as fiddly formatting busywork. Luckily, increases in computing power have ushered in the dawn of the autofill era and this bug has become a feature (we’ll help you type!).
Anyway, I’m supposed to learn that a DTD is a bit messier because the IDs aren’t typed and there is no control for sets of keys, blah blah blah, but these problems aren’t there when using a Schema/XSD. The XSD document can be daunting so my ‘homework’ is to download an XML schema and play around with it.
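As far as I can tell, the practical difference is that a schema lets you declare a data type for a value (say, `xs:integer`) and a uniqueness rule for a set of elements (`xs:key` or `xs:unique`), while a DTD mostly treats attribute values as plain strings. The sketch below is not schema validation. It hand-rolls those two checks in Python with the standard library's `xml.etree.ElementTree`, just to show what the schema would be doing for me. The element names, attribute names, and the `check` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are invented for illustration.
DOC = """
<policies>
  <policy id="101" limit="1000000"/>
  <policy id="102" limit="lots"/>
  <policy id="101" limit="500000"/>
</policies>
"""

def check(xml_text):
    """Hand-rolled versions of two rules an XSD could declare."""
    root = ET.fromstring(xml_text)
    errors = []
    seen_ids = set()
    for policy in root.iter("policy"):
        pid = policy.get("id")
        # Roughly what a uniqueness constraint on policy ids would enforce.
        if pid in seen_ids:
            errors.append(f"duplicate id {pid}")
        seen_ids.add(pid)
        # Roughly what typing the attribute as a non-negative integer would enforce.
        limit = policy.get("limit")
        if not limit.isdigit():
            errors.append(f"policy {pid}: limit {limit!r} is not an integer")
    return errors

for error in check(DOC):
    print(error)
```

With a real XSD these rules would live in the schema file and a validating parser would report the violations, so none of this checking code would need to be written by hand.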
How boring. I feel like I’m in “school” now, a feeling I loathe.
Speaking of which, relational algebra is a ridiculous topic. I don’t care if it’s the “underpinnings” of query languages. I hardly needed to learn C to build Python scripts, even though C is its “underpinnings”. If I had some deep problem with SQL I’d be happy to mess around with relational algebra to figure it out. But until then, keep me in the dark, please.