Fix HTML, CSS and JS leaking into conversation list preview. Bug 714317
When generating the preview, only the first 128 bytes of the first MIME
part are fetched and used. If this part is text/html with a significant
amount of embedded CSS, then there is a good chance the string passed to
Geary.HTML::remove_html_tags() will be invalid, or be missing closing
elements. Since that function uses regexes that require balanced tags to
remove script and style blocks, it was very possible that in these
cases the method would fail to remove these blocks.
To solve this, remove_html_tags() is removed and its call sites are
replaced by calls to Geary.HTML::html_to_text(), which has been tidied
up to produce a more human-readable result.
Add unit tests to cover new html_to_text functionality and its call
sites.
* src/engine/util/util-html.vala: Remove remove_html_tags(). Update
html_to_text() to not just insert line breaks, but also insert spaces
and alt text, and ignore tags like HEAD, SCRIPT and STYLE, as
appropriate. Add an optional param to also allow skipping BLOCKQUOTE
elements, which we don't want in the preview.
2016-12-18 23:28:53 +11:00
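The failure mode described in the message above can be reproduced with a short sketch. It assumes a stand-in regex for the balanced-tag matching; the pattern and the variable names are illustrative and are not taken from the removed Geary code. Only GLib.Regex is used.

```vala
// Illustrative sketch only: the pattern below is an assumption standing in
// for a regex that needs a balanced <style>...</style> pair to match.
void main() {
    try {
        var style_block = new GLib.Regex(
            "<style[^>]*>.*?</style>",
            GLib.RegexCompileFlags.CASELESS | GLib.RegexCompileFlags.DOTALL
        );

        // Complete input: the closing tag is present, so the block is removed.
        string complete = "<style>td { font-size: 12px; }</style>Hi Kenneth";
        // Truncated input, as a short preview fetch might return it:
        // the closing tag never arrives.
        string truncated = "<html><head><style>.bodyblack { font-family: Verdana, ";

        // Prints "[Hi Kenneth]".
        print("[%s]\n", style_block.replace(complete, -1, 0, ""));
        // Prints the truncated input unchanged: with no closing tag the
        // pattern never matches, so the CSS stays in the output.
        print("[%s]\n", style_block.replace(truncated, -1, 0, ""));
    } catch (GLib.RegexError e) {
        warning("Regex error: %s", e.message);
    }
}
```

This should build with a plain `valac` invocation. The style_truncated case in the test below exercises the same situation against html_to_text().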
/*
 * Copyright 2016 Michael Gratton <mike@vee.net>
 *
 * This software is licensed under the GNU Lesser General Public License
 * (version 2.1 or later). See the COPYING file in this distribution.
 */

class Geary.HTML.UtilTest : TestCase {
    public UtilTest() {
        base("Geary.HTML.Util");
        add_test("preserve_whitespace", preserve_whitespace);
        add_test("smart_escape_div", smart_escape_div);
        add_test("smart_escape_no_closing_tag", smart_escape_no_closing_tag);
        add_test("smart_escape_img", smart_escape_img);
        add_test("smart_escape_xhtml_img", smart_escape_xhtml_img);
        add_test("smart_escape_mixed", smart_escape_mixed);
        add_test("smart_escape_text", smart_escape_text);
        add_test("smart_escape_text_url", smart_escape_text_url);
        add_test("remove_html_tags", remove_html_tags);
    }

    public void preserve_whitespace() throws GLib.Error {
        assert_equal(smart_escape("some text"), "some text");
        assert_equal(smart_escape("some text"), "some text");
        assert_equal(smart_escape("some text"), "some text");
        assert_equal(smart_escape("some\ttext"), "some text");

        assert_equal(smart_escape("some\ntext"), "some<br>text");
        assert_equal(smart_escape("some\rtext"), "some<br>text");
        assert_equal(smart_escape("some\r\ntext"), "some<br>text");

        assert_equal(smart_escape("some\n\ntext"), "some<br><br>text");
        assert_equal(smart_escape("some\r\rtext"), "some<br><br>text");
        assert_equal(smart_escape("some\n\rtext"), "some<br><br>text");
        assert_equal(smart_escape("some\r\n\r\ntext"), "some<br><br>text");
    }

    public void smart_escape_div() throws Error {
        string html = "<div>ohhai</div>";
        assert_equal(smart_escape(html), html);
    }

    public void smart_escape_no_closing_tag() throws Error {
        string html = "<div>ohhai";
        assert_equal(smart_escape(html), html);
    }

    public void smart_escape_img() throws Error {
        string html = "<img src=\"http://example.com/lol.gif\">";
        assert_equal(smart_escape(html), html);
    }

    public void smart_escape_xhtml_img() throws Error {
        string html = "<img src=\"http://example.com/lol.gif\"/>";
        assert_equal(smart_escape(html), html);
    }

    public void smart_escape_mixed() throws Error {
        string html = "mixed <div>ohhai</div> text";
        assert_equal(smart_escape(html), html);
    }

    public void smart_escape_text() throws GLib.Error {
        assert_equal(smart_escape("some text"), "some text");
        assert_equal(smart_escape("<some text"), "<some text");
        assert_equal(smart_escape("some text>"), "some text>");
    }

    public void smart_escape_text_url() throws GLib.Error {
        assert_equal(
            smart_escape("<http://example.com>"),
            "<http://example.com>"
        );
        assert_equal(
            smart_escape("<http://example.com>"),
            "<http://example.com>"
        );
    }

    public void remove_html_tags() throws Error {
        string blockquote_body = """<blockquote>hello</blockquote> <p>there</p>""";

        string style_complete = """<style>
.bodyblack { font-family: Verdana, Arial, Helvetica, sans-serif; font-size:
12px; }
td { font-size: 12px; }
.footer { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10
px; }
</style>""";

        string style_truncated = """<html><head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<style>
.bodyblack { font-family: Verdana, """;

        assert_equal(html_to_text(HTML_BODY_COMPLETE), HTML_BODY_COMPLETE_EXPECTED);
        assert_equal(html_to_text(blockquote_body), "hello\n there\n");
        assert_equal(html_to_text(blockquote_body, false), " there\n");
        assert_equal(html_to_text(HTML_ENTITIES_BODY), HTML_ENTITIES_EXPECTED);
        assert_string(html_to_text(style_complete)).is_empty();
        assert_string(html_to_text(style_complete)).is_empty();
        assert_string(html_to_text(style_truncated)).is_empty();
    }

    private static string HTML_BODY_COMPLETE = """<html><head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<style>
.bodyblack { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; }
td { font-size: 12px; }
.footer { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10px; }
</style>
</head>
<body><table cellSpacing="0" cellPadding="0" width="450" border="0" class="bodyblack"><tr><td>
<p><br />Hi Kenneth, <br /> <br /> We xxxxx xxxx xx xxx xxx xx xxxx x xxxxxxxx xxxxxxxx.
<br /> <br /> <br /> <br />Thank you, <br /> <br />XXXXX
X XXXXXX<br /><br />You can reply directly to this message or click the following link:<br /><a href="https://app.foobar.com/xxxxxxxx752a0ab01641966deff6c48623aba">https://app.foobar.com/xxxxxxxxxxxxxxxx1641966deff6c48623aba</a><br /><br />You can change your email preferences at:<br /><a href="https://app.foobar.com/xxxxxxxxxxxxx">https://app.foobar.com/xxxxxxxxxxx</a></p></td></tr>
</table></body></html>
""";
    private static string HTML_BODY_COMPLETE_EXPECTED = """

Hi Kenneth,

We xxxxx xxxx xx xxx xxx xx xxxx x xxxxxxxx xxxxxxxx.




Thank you,

XXXXX
X XXXXXX

You can reply directly to this message or click the following link:
https://app.foobar.com/xxxxxxxxxxxxxxxx1641966deff6c48623aba

You can change your email preferences at:
https://app.foobar.com/xxxxxxxxxxx

""";
    private static string HTML_ENTITIES_BODY = """<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>What if I said that I'd like to go to the theater tomorrow night.</div>

<div> </div>

<div>I think we could do that!</div>
""";
    private static string HTML_ENTITIES_EXPECTED = """

What if I said that I'd like to go to the theater tomorrow night.





I think we could do that!



""";
}