{"id":10514,"date":"2026-02-10T04:03:46","date_gmt":"2026-02-10T04:03:46","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2026\/02\/10\/32692\/"},"modified":"2026-02-10T04:03:46","modified_gmt":"2026-02-10T04:03:46","slug":"32692","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2026\/02\/10\/32692\/","title":{"rendered":"Quick Howto: Extract URLs from RTF files, (Mon, Feb 9th)"},"content":{"rendered":"<div>\n<p>Malicious RTF (Rich Text Format) documents are back in the news with the <a href=\"https:\/\/www.helpnetsecurity.com\/2026\/02\/03\/russian-hackers-are-exploiting-recently-patched-microsoft-office-vulnerability-cve-2026-21509\/\">exploitation of CVE-2026-21509 by APT28<\/a>.<\/p>\n<p>The malicious documents\u00a0<a href=\"https:\/\/www.virustotal.com\/gui\/file\/c91183175ce77360006f964841eb4048cf37cb82103f2573e262927be4c7607f\">BULLETEN_H.doc<\/a> and\u00a0<a href=\"https:\/\/www.virustotal.com\/gui\/file\/b2ba51b4491da8604ff9410d6e004971e3cd9a321390d0258e294ac42010b546\">Consultation_Topics_Ukraine(Final).doc<\/a> mentioned in the news are RTF files (despite their .doc extension, a common trick used by threat actors).<\/p>\n<p>Here is a quick tip to extract URLs from RTF files. 
Use the following command:<\/p>\n<pre>\n<code>rtfdump.py -j -C SAMPLE.vir | strings.py --jsoninput | re-search.py -n url -u -F officeurls<\/code><\/pre>\n<p>Like this:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/isc.sans.edu\/diaryimages\/images\/20260209-120912.png?ssl=1\" style=\"width: 993px; height: 187px;\"><\/p>\n<p>BTW, if you are curious, this is what that document looks like when opened:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/isc.sans.edu\/diaryimages\/images\/20260208-212413.png?ssl=1\" style=\"width: 1600px; height: 846px;\"><\/p>\n<p>Let me break down the command:<\/p>\n<ul>\n<li>\n<a href=\"https:\/\/github.com\/DidierStevens\/DidierStevensSuite\/blob\/master\/rtfdump.py\">rtfdump.py<\/a> -j -C SAMPLE.vir: this parses the RTF file SAMPLE.vir and produces JSON output with the content of all the items found in the RTF document. Option -C ensures that all combinations are included in the JSON data: the item itself, the hex-decoded item (-H), and the hex-decoded and shifted item (-H -S). So for each item found inside the RTF file, 3 entries are produced in the JSON data.<\/li>\n<li>\n<a href=\"https:\/\/github.com\/DidierStevens\/DidierStevensSuite\/blob\/master\/strings.py\">strings.py<\/a> --jsoninput: this takes the JSON data produced by rtfdump.py and extracts all strings.<\/li>\n<li>\n<a href=\"https:\/\/github.com\/DidierStevens\/DidierStevensSuite\/blob\/master\/re-search.py\">re-search.py<\/a> -n url -u -F officeurls: this extracts all URLs (-n url) found in the strings produced by strings.py, removes duplicates (-u), and filters out all URLs that belong to Office document definitions (-F officeurls).<\/li>\n<\/ul>\n<p>So I found one domain (wellnesscaremed) and one private IP address (192.168&#8230;). 
What I like to do next is search for these keywords in the string list, like this:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/isc.sans.edu\/diaryimages\/images\/20260209-122140.png?ssl=1\" style=\"width: 993px; height: 243px;\"><\/p>\n<p>I found extra IOCs: a UNC path and a &#8220;malformed&#8221; URL. The URL has its hostname followed by @ssl, which does not conform to the URL standard. An @ can be used to introduce credentials, but then the credentials must come in front of the hostname, not after it, and that is not the case here. More on this later.<\/p>\n<p>Here are the results for the other document:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/isc.sans.edu\/diaryimages\/images\/20260209-122843.png?ssl=1\" style=\"width: 993px; height: 353px;\"><\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/isc.sans.edu\/diaryimages\/images\/20260208-212510.png?ssl=1\" style=\"width: 1600px; height: 846px;\"><\/p>\n<p>Notice that this time, we have @80.<\/p>\n<p>I believe that this @ notation is used by Microsoft to specify the port number when WebDAV requests are made (via UNC paths). If you know more about this, please post a comment.<\/p>\n<p>In an upcoming diary, I will show how to extract URLs from ZIP files embedded in the objects in these RTF files.<\/p>\n<p>Didier Stevens<br \/>\nSenior handler<br \/>\n<a href=\"http:\/\/blog.didierstevens.com\/\">blog.DidierStevens.com<\/a><\/p>\n<p> (c) SANS Internet Storm Center. 
https:\/\/isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.<\/p><\/div>\n<p><a href=\"https:\/\/isc.sans.edu\/diary\/rss\/32692\">Go to isc.sans.edu<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Quick Howto: Extract URLs from RTF files, (Mon, Feb 9th) Malicious RTF (Rich Text Format) documents are back in the news with the exploitation of CVE-2026-21509 by APT28. The malicious RTF documents\u00a0BULLETEN_H.doc and\u00a0Consultation_Topics_Ukraine(Final).doc mentioned in the news are RTF files (despite their .doc extension, a common trick used by threat actors). Here is a quick [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[56],"tags":[69],"class_list":["post-10514","post","type-post","status-publish","format-standard","hentry","category-isc-sans-edu","tag-isc-sans-edu"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/10514"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=10514"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/10514\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=10514"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=10514"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=10514"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}