Using DOMDocument to Modify HTML with PHP


One of the first things you learn when you want to implement a service worker on a website is that the site must be served over SSL (an https address). Ever since I saw the blinding speed service workers can give a website, I've been obsessed with readying my site for SSL. Enforcing SSL with .htaccess was easy; the hard part is updating asset links in blog content. You start out feeling as though regular expressions will be the quick cure, but anyone with regular expression experience knows that matching URLs inside HTML is a nightmare, and regex is probably the wrong tool for the job.
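To make the regex problem concrete, here is a deliberately naive, hypothetical pattern (`naive_https` is just a name for this sketch) that upgrades `src="http://` to `src="https://`. It handles the double-quoted case and quietly skips a single-quoted attribute, which is perfectly valid HTML. Every fix you bolt on (quote styles, whitespace around `=`, uppercase attribute names) makes the pattern harder to read.

```php
<?php
// Hypothetical naive approach: upgrade image URLs with a regular expression.
function naive_https(string $html): string {
    // Only matches src="http:// exactly: double quotes, lowercase, no spaces around "=".
    return preg_replace('/src="http:\/\//', 'src="https://', $html);
}

$html = '<img src="http://a.example/1.png"><img src=\'http://a.example/2.png\'>';
echo naive_https($html), "\n";
// <img src="https://a.example/1.png"><img src='http://a.example/2.png'>
```

The first image is upgraded; the second one, with its single-quoted attribute, is not.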

The right decision is DOMDocument, a class built into PHP that lets you work with HTML in a logical, pleasant way. You load the HTML into a DOMDocument instance and then use its predictable methods (getElementsByTagName, getAttribute, setAttribute, saveHTML) to make changes.

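    // Formats post content for SSL
    function format_post_content($content = '') {
        // loadHTML() rejects an empty string, so return early
        if ($content === '') {
            return $content;
        }

        $doc = new DOMDocument();
        $doc->loadHTML($content);

        // Switch every <img> src from http:// to https://
        $tags = $doc->getElementsByTagName('img');
        foreach ($tags as $tag) {
            $tag->setAttribute('src', str_replace('http://', 'https://', $tag->getAttribute('src')));
        }

        return $doc->saveHTML();
    }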

In my example above, I find every img element and switch its src protocol to `https://`. I'll do the same with iframe src, a href, and a few other rarely used tags. Once my modifications are done, calling saveHTML returns the updated HTML as a string.
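Extending `format_post_content` to `iframe src` and `a href` might look like the sketch below. This is a minimal illustration, not my final code: the function name `https_upgrade_links` and the tag-to-attribute map are hypothetical. It also works around one DOMDocument quirk: given a bare fragment, `loadHTML()` adds a doctype and an `<html><body>` wrapper, and `saveHTML()` with no argument returns all of it. Here the fragment is wrapped in an explicit document and only the children of `<body>` are serialized.

```php
<?php
// Hypothetical helper: upgrade http:// URLs to https:// on several tag/attribute pairs.
function https_upgrade_links(string $content): string {
    // Assumed list of tag => attribute pairs to rewrite; extend as needed.
    $targets = ['img' => 'src', 'iframe' => 'src', 'a' => 'href'];

    $doc = new DOMDocument();

    // Keep libxml warnings (e.g. about HTML5 tags it doesn't know) out of the output.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in an explicit document: the meta tag tells libxml the
    // input is UTF-8, and the already-open <body> keeps leading <script>/<style>
    // tags from being moved into <head>.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $content
        . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($targets as $tagName => $attr) {
        foreach ($doc->getElementsByTagName($tagName) as $tag) {
            $value = $tag->getAttribute($attr);
            // Only rewrite values that start with http:// (case-insensitive).
            if (strncasecmp($value, 'http://', 7) === 0) {
                $tag->setAttribute($attr, 'https://' . substr($value, 7));
            }
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        // Defensive only: the wrapper above always opens <body>.
        return $content;
    }

    // Serialize only the children of <body>, dropping the wrapper added above.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}
```

Checking the prefix instead of calling str_replace on the whole value means a URL that merely contains `http://` later on (in a query parameter, say) is left alone, as are relative and already-secure URLs.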

Don't fall into the trap of trying to use regular expressions with HTML; you're in for a future of failure. DOMDocument ships with PHP and will make your code far easier to maintain.