java - How to parse xhtml with jsoup without changing Html or parsing Html entities

Question:

I am using the jsoup parser to manipulate an XHTML file.

My input file contains the following tag:

<param name="video_title" value="&lt;p&gt;Renewable Energy&lt;/p&gt;" />

Answer:

Depending on the jsoup version, this may work:

Document document = ...;
// Restrict the output charset to ASCII so jsoup writes characters outside it as entities
document.outputSettings().charset(Charset.forName("ASCII"));
System.out.println(document.body().html());

One option is to downgrade to a jsoup version below 1.8.x, because the escaping behavior changed between 1.7.x and 1.8.x.

Here is an example of the output from each version:

  • 1.7.3 <a href="#" title="Test&lt;br&gt;Test">Test<br />Test</a>
  • 1.8.1 <a href="#" title="Test<br>Test">Test<br>Test</a>

There is more information on this topic in this related question:
jsoup: different result after updating from 1.7.3 to 1.8.1, how to avoid this?

Another option is StringEscapeUtils from Apache Commons Lang.
After parsing, escape the attribute value and write the escaped value back into the element attribute.

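// Select only elements that already carry a "value" attribute;
// select("*") would add an empty value attribute to every other element.
org.jsoup.select.Elements elementsWithValue = blogContentDocument.select("[value]");
for (Element element : elementsWithValue) {
    // escapeHtml is the Commons Lang 2.x name; Commons Lang 3 calls it escapeHtml4
    String escaped = StringEscapeUtils.escapeHtml(element.attr("value"));
    element.attr("value", escaped);
    System.out.println(element);
}

// Check whether the content changed in the serialized document.
// jsoup applies its own escaping on output, so verify the result is not double-escaped.
System.out.println(blogContentDocument.html());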
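
If adding Apache Commons just for this step is not desirable, the escaping itself can be sketched with a small hand-written helper. This is only an illustration: AttrEscaper and its escapeHtml method are hypothetical names, not part of jsoup or Apache Commons. The helper covers only the four markup characters shown, whereas StringEscapeUtils also converts many non-ASCII characters to entities.

```java
// Hypothetical helper for illustration; not part of jsoup or Apache Commons.
class AttrEscaper {

    // Replaces the four characters that are significant inside an attribute value.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The decoded attribute value from the question
        System.out.println(escapeHtml("<p>Renewable Energy</p>"));
        // prints: &lt;p&gt;Renewable Energy&lt;/p&gt;
    }
}
```

The returned string could then be passed to element.attr("value", ...) in the loop above in place of the StringEscapeUtils call.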