Sunday, March 1, 2009

Five theories why developers write garbage HTML

Justin James reports that garbage HTML appears to be alive and well. He offers his theories on why some developers still crank out junk and discusses whether he thinks the problem is hopeless.

—————————————————————————————————————

A little more than two years ago, I pleaded with my fellow developers to stop writing garbage HTML! I think that problem has gone from bad to worse.

A few weeks ago, I asked Programming and Development readers, “How good is your HTML?” and roughly 25% of you said that it meets all of the standards. I know a lot of the readers of this blog fairly well, so I believe those numbers; but I also know that makes my readers a very small minority. Garbage HTML is, unfortunately, alive and well.

Here is some really sad news: fewer than 7% of the Alexa Top 500 sites validate! It’s just a problem with those pages, right? Wrong. According to the researchers, “This is a slightly higher percentage rate than the much larger overall MAMA population, but the quantity and difference are still too small to declare any trends.” In other words, that sorry 6.57% valid rate is actually better than the Internet on the whole.

For a moment, let’s assume that Web standards are not the “be all and end all” of Web development. Judging from the more than 50% of readers who said their HTML is “good enough for it to display the way I want it to on all or most major browsers,” I think it can be said that 100% standards compliance may be a goal, but it is not the most important one. For the majority of developers, the standards are a means to an end, and that end is looking good. All that being what it is, I don’t think a Web page that is slightly off spec is garbage HTML. It may not be perfect, but that doesn’t make it garbage.

Garbage HTML has a special something to it, a unique blend of being not just invalid, but disgustingly so by going beyond minor misunderstandings or typos and far into the realm of negligence — improperly nested tags, tags that are never properly ended, incorrect attribute usage, and so on. Why do developers crank out this junk? Here are my theories:

  • Ignorance: A good number of developers don’t know better, whether because they did not seek to learn good HTML, or where they learned it from did a lousy job of teaching HTML. The folks who “learned” HTML by copy/pasting other people’s junk code fall into this bucket.
  • Poor tools: Many developers count on a tool to produce their HTML, whether it’s some sort of HTML WYSIWYG system, or maybe a framework/library that is cranking out bad code. There are reasons why some of these tools can’t generate good code. For example, until recently, those on-screen, Web-based HTML editors had a much easier time rendering the font tag than a CSS class or even an inline style. The end result? HTML widgets that were generating the font tag nearly 10 years after font had become deprecated. Another problem is that there is not a 1:1 mapping between stylistic effects and the ways to make those effects happen. Without editors that are strictly context-driven, there is no good way for the tool to know that [Ctrl]I should result in the em tag as opposed to a CSS style that makes something italic.
  • HTML is too semantic: HTML has been transitioning to being as purely semantic as possible without completely breaking backwards compatibility. In general, this is a good thing. But for a developer who is trying to finish their work and go home, it is a nightmare. Which tag does the developer use? What if the developer wants the effect that a tag produces, without the meaning of the tag? It’s enough to drive anyone insane. So what happens? Developers stop caring about what the HTML “means” and just do enough to make the page render the way the client wants it to look. In the shuffle, the HTML ends up being a mess.
  • Developers who never updated their HTML skill sets: This is a huge problem. I know developers who, this very week, probably used the font tag or maybe a frameset. Maybe they learned HTML in 1997, or the book they bought from the bargain bin in 2002 was from 1997. Who knows? But their HTML is stuck in the HTML 3 years, and they haven’t updated it since.
  • Server-side technologies: In my opinion, using print statements (or the equivalent) to produce large amounts of HTML is asking for trouble. The page can’t be “visualized” as HTML without first mentally executing software. That is a sure way to not be able to get the code right.
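To make the last theory concrete, here is a minimal sketch of one alternative to print statements: keep the markup in a template that still reads as HTML, and fill in the values separately. It uses Python's standard string.Template and html.escape; the names PAGE, ITEM, and render_products, and the product list markup, are made up for illustration and are not from any particular framework.

```python
from html import escape
from string import Template

# Hypothetical templates: the markup stays readable as HTML on its own,
# so nobody has to "mentally execute" code to see the page structure.
PAGE = Template("""\
<ul class="products">
$items
</ul>""")

ITEM = Template("  <li><em>$name</em>: $price</li>")


def render_products(products):
    """Fill the templates from (name, price) pairs, escaping the text."""
    items = "\n".join(
        ITEM.substitute(name=escape(name), price=escape(price))
        for name, price in products
    )
    return PAGE.substitute(items=items)


if __name__ == "__main__":
    print(render_products([("Tea & Coffee", "4.00"), ("Mugs", "7.50")]))
```

The point is not this particular helper. It is that every opening tag and its closing tag sit next to each other in one place, which is much harder to guarantee when the tags are scattered across print calls.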

Is the situation hopeless?

I think some parts of this problem are not likely to be corrected. Most developers who care about doing things right are using tools that combine just enough visualization to make life a bit easier, without completely taking over the process. Some developers may be running their code through validators or keeping current on what is correct and what isn’t. But at the end of the day, there is no way to force a developer to start caring about these things unless you are their boss. After all, browsers will display even the worst HTML code, as long as it somewhat makes sense.
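For a sense of what even a crude check can catch, here is a rough sketch built on Python's standard html.parser module. It is not a real validator and is no substitute for one: it only looks for the kinds of garbage described above (badly nested tags, tags that are never ended, and the font and frameset tags). The names NestingChecker and check, the partial VOID list, and the wording of the messages are all assumptions made for this example. It is also stricter than HTML itself, because it expects an explicit end tag even where the spec lets you omit one (on p or li, for instance).

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial list, for illustration).
VOID = {"br", "hr", "img", "input", "link", "meta"}
# Tags the article calls out as outdated.
OUTDATED = {"font", "frameset"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records nesting problems."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in OUTDATED:
            self.problems.append(f"outdated tag <{tag}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif tag in self.open_tags:
            inner = self.open_tags[-1]
            self.problems.append(f"</{tag}> closes before <{inner}> is closed")
            # Discard everything up to and including the matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> has no matching start tag")


def check(html_text):
    """Return a list of human-readable problems found in html_text."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    leftovers = [f"<{tag}> is never closed" for tag in checker.open_tags]
    return checker.problems + leftovers


if __name__ == "__main__":
    for problem in check('<p><font color="red"><b><i>Sale!</b></i></font>'):
        print(problem)
```

A real validator checks far more than this, including attribute usage and which elements may contain which, so treat the sketch as a quick smoke test at most.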

I’d be willing to bet that the percentage of developers writing garbage HTML has remained steady over the years. The 80/20 rule probably applies here (as it seems to do in so many other areas). In my experience, only about 20% of developers really understand how to accomplish something as well as why they should do it that way and what the consequences are. That’s the same 20% or so who are probably writing decent HTML.

I am sure that the Web would be better off if more developers put forth the effort to write better HTML.

J.Ja

Disclosure of Justin’s industry affiliations: Justin James has a working arrangement with Microsoft to write an article for MSDN Magazine. He also has a contract with Spiceworks to write product buying guides.
