I want to manipulate HTML and XHTML documents with the PHP DOM implementation. I use the DOMDocument->loadHTML() method to load the content.
I want to know whether the loaded content is XHTML or HTML. DOMDocument has a doctype object that holds the DOCTYPE declaration from the document itself. So far I have thought about comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".
Is there any better way anyone can think of?
Sorry if my question was a bit unclear. I updated the question since it might have been confusing. But to make it clear now: This question is not about handling HTML with PHP DOM in general or whether XHTML is good or bad.
If you're loading from an external source, you can check the MIME type the document is served with. If it's application/xhtml+xml, the content is almost certainly meant to be XHTML (though a server can of course send that type along with horribly malformed markup). If it's text/html, it will be parsed as HTML tag soup. Validity of the actual markup aside, the doctype declaration is your next best way of telling whether the content is (or claims to be) HTML or XHTML.
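The MIME type check could be sketched roughly like this. The function name flavorFromContentType is made up for illustration, and it assumes you already have the raw Content-Type header value from whatever HTTP client you use:

```php
<?php
// Hypothetical helper (not part of PHP): classify a Content-Type header value.
// Returns 'xhtml', 'html', or null when the type says nothing useful.
function flavorFromContentType(string $contentType): ?string
{
    // Drop parameters such as "; charset=utf-8" and normalize case.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));

    if ($mime === 'application/xhtml+xml') {
        return 'xhtml';
    }
    if ($mime === 'text/html') {
        return 'html';
    }
    return null; // fall back to inspecting the doctype
}
```

A null result just means the header didn't settle it, so you would move on to the doctype.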
As you say, you can check the public identifier and/or the system identifier (the DTD URI) and determine the type from there.
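For the doctype route, something along these lines might work. It's only a sketch: detectMarkupFlavor is a made-up name, and it assumes that any public or system identifier containing "XHTML" means the document claims to be XHTML. It can't tell HTML5 from XHTML5, since both use an identifier-less doctype.

```php
<?php
// Hypothetical helper: guess the flavor from the DOCTYPE that DOMDocument exposes.
function detectMarkupFlavor(string $markup): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $doctype = $dom->doctype;
    if ($doctype === null) {
        // Note: libxml may add a default HTML doctype when none is present,
        // so this branch is not guaranteed to be reached for such input.
        return 'unknown';
    }

    // Assumption: "XHTML" in either identifier means the document claims XHTML.
    $publicId = (string) $doctype->publicId;
    $systemId = (string) $doctype->systemId;
    if (stripos($publicId, 'XHTML') !== false || stripos($systemId, 'xhtml') !== false) {
        return 'xhtml';
    }
    return 'html';
}
```

Keep in mind this only tells you what the document claims to be, not whether the markup is actually well-formed XHTML.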