This is starting to seem quite important to me.
For the weekend project, I’ve been delving deeper and deeper into Python and have come across a distinction between “1998 HTML” and “2003 to today HTML”.
In 1998 there was much less standardization for coding websites. The HTML was poorly written or written by programs that produced sloppy code. An interesting consequence of this is that many older websites are harder to analyze with scraping programs.
The code today is massively improved. Today’s websites benefit from standards that get updated with ‘best practices’, which can spur automation of all kinds of functions.
The upshot is that information is becoming much easier to find, analyze and publish. And no new technology, just old technology maturing.
And we’re just getting started. A lot of websites now have something called an “API“, which is just a web address you can point your computer to and fling requests for information at. The idea is that regular browser websites are great for people, but computers don’t need all that stupid formatting. They just want data.
Well, for some reason, lots of websites will have different content offered on their API from the website.
It’s bizarre, particularly because, with minimal-to-moderate effort, any industrious programmer can build a simple scraping routine and pull the data out of the ‘human’ interface. Why the hurdle? It’s just wasting time.
The hurdle’s there for cultural reasons having nothing to do with technology. These cultural blocks are preventing the sharing of information and so are preventing innovation.
And they’ll change, I think.
Much of my job is concerned with translating one system’s way of recording insurance information into another system’s ‘language’. There is no standardization for the way information is stored in insruance management systems. This makes for lots of people with jobs like mine, which are basically wasting time and money. My business isn’t about analyzing information, after all.
Standardizing data formatting is going to be the next dislocation in the economy. We already pity those poor suckers in the back office wrestling with legacy systems.
Eventually they will go the way of the typing pool.