Office: About OLE and ZIP Files

Published: 2020-09-07
Last Updated: 2020-09-07 16:41:49 UTC
by Didier Stevens (Version: 1)

A reader asked whether a particular Emotet sample was a malformed ZIP file. It is not, and in this diary entry I explain why you might think it is.

I create an example Word document and save it as a .doc file (an OLE file).
When I look at it with my tool zipdump.py, I get this output:


Why do I get output for a ZIP file when the .doc file is an OLE file?
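A quick way to see the mismatch is to compare the file's leading magic bytes with a plain search for the PKZIP local file header signature. The minimal sketch below classifies a buffer by its first bytes only (OLE files start with D0 CF 11 E0 A1 B1 1A E1, ZIP files with "PK\x03\x04"). The helper names are hypothetical and the sample buffer is synthetic, not a real .doc file.

```python
# Magic bytes at the start of an OLE (Compound File Binary) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a PKZIP local file header record.
ZIP_LOCAL_HEADER = b"PK\x03\x04"

def container_type(data: bytes) -> str:
    """Classify a file by its leading magic bytes only."""
    if data.startswith(OLE_MAGIC):
        return "OLE"
    if data.startswith(ZIP_LOCAL_HEADER):
        return "ZIP"
    return "unknown"

def contains_pkzip_signature(data: bytes) -> bool:
    """True if a PKZIP local file header signature occurs anywhere in the data."""
    return ZIP_LOCAL_HEADER in data

# Synthetic sample: an OLE header followed, further on, by a PKZIP signature.
sample = OLE_MAGIC + bytes(8) + ZIP_LOCAL_HEADER + bytes(16)
print(container_type(sample), contains_pkzip_signature(sample))  # OLE True
```

Both answers are correct at the same time: the file is an OLE file, and it also contains PKZIP records.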


What the reader noticed is that when they used my tool zipdump.py with option -f L to find and list all PKZIP records, the output showed that there was data before the first PKZIP record (p = prefix, 10566 bytes) and after the last PKZIP record (s = suffix, 12898 bytes):
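Here is a minimal sketch of one way to compute prefix and suffix numbers like these. It scans for the three main PKZIP record signatures, counts the bytes before the first hit, and counts the bytes after the end of the last end-of-central-directory (EOCD) record. This is not how zipdump.py is implemented, and the function names are hypothetical. A naive byte search like this can also report false positives inside compressed data. The demo wraps a small in-memory ZIP in an OLE header plus padding instead of using a real .doc file.

```python
import io
import struct
import zipfile

# PKZIP record signatures: "PK" followed by a two-byte record type.
SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory header",
    b"PK\x05\x06": "end of central directory",
}

def find_pkzip_records(data: bytes) -> list:
    """Return (offset, record type) for every PKZIP signature, sorted by offset."""
    records = []
    for signature, name in SIGNATURES.items():
        offset = data.find(signature)
        while offset != -1:
            records.append((offset, name))
            offset = data.find(signature, offset + 1)
    return sorted(records)

def prefix_and_suffix(data: bytes) -> tuple:
    """Count bytes before the first record and after the last EOCD record."""
    records = find_pkzip_records(data)
    eocds = [offset for offset, name in records if name == "end of central directory"]
    if not records or not eocds:
        raise ValueError("no complete ZIP structure found")
    prefix = records[0][0]
    last_eocd = eocds[-1]
    # The EOCD record is 22 bytes plus a comment whose length is a
    # little-endian uint16 stored at offset 20 of the record.
    (comment_length,) = struct.unpack_from("<H", data, last_eocd + 20)
    suffix = len(data) - (last_eocd + 22 + comment_length)
    return prefix, suffix

# Synthetic demo: OLE magic + padding (104 bytes), a tiny ZIP, then 50 bytes.
entry = zipfile.ZipInfo("theme/theme/theme1.xml", date_time=(2020, 9, 7, 0, 0, 0))
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(entry, "<a:theme/>")
data = bytes.fromhex("D0CF11E0A1B11AE1") + bytes(96) + buffer.getvalue() + bytes(50)
print(prefix_and_suffix(data))  # (104, 50)
```

A non-zero prefix or suffix only tells you that the ZIP records are surrounded by other data. It does not tell you why.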


We have indeed seen ZIP files with data prepended or appended to try to fool anti-virus products, but that is not the case here.
What is going on is that each .doc file created with Office contains an embedded ZIP file with theme data.
When I use oledump.py with its YARA option to do an ad hoc search for the filename theme1.xml, I see that this string is in the 1Table stream. This is where the ZIP file is embedded:
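Because the ZIP file is embedded in the 1Table stream, it can be carved out and opened with Python's standard zipfile module. The sketch below assumes you already have the raw bytes of that stream, for example dumped with oledump.py. Carving directly from the .doc file can fail when the stream's sectors are not stored contiguously. The sketch also assumes a single embedded ZIP with no stray PKZIP signatures around it. carve_embedded_zip is a hypothetical helper, and the stream in the demo is synthetic.

```python
import io
import struct
import zipfile

LOCAL_HEADER = b"PK\x03\x04"
END_OF_CENTRAL_DIRECTORY = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22  # EOCD record length without its trailing comment

def carve_embedded_zip(data: bytes) -> zipfile.ZipFile:
    """Carve from the first local file header to the end of the last EOCD record."""
    start = data.find(LOCAL_HEADER)
    eocd = data.rfind(END_OF_CENTRAL_DIRECTORY)
    if start == -1 or eocd == -1 or eocd < start:
        raise ValueError("no embedded ZIP found")
    # Comment length: little-endian uint16 at offset 20 of the EOCD record.
    (comment_length,) = struct.unpack_from("<H", data, eocd + 20)
    end = eocd + EOCD_FIXED_SIZE + comment_length
    return zipfile.ZipFile(io.BytesIO(data[start:end]))

# Synthetic stand-in for a 1Table stream: padding, a tiny ZIP, more padding.
entry = zipfile.ZipInfo("theme/theme/theme1.xml", date_time=(2020, 9, 7, 0, 0, 0))
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(entry, "<a:theme/>")
stream = bytes(64) + buffer.getvalue() + bytes(32)

with carve_embedded_zip(stream) as carved:
    print(carved.namelist())  # ['theme/theme/theme1.xml']
```

If the carved bytes open cleanly, the PKZIP records form a well-formed ZIP file embedded in other data, not a malformed ZIP file.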


This file theme1.xml, found in a ZIP file embedded in an OLE file (.doc), is also present in the OOXML format (.docx):
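A .docx file is itself a ZIP file, so checking it for theme1.xml only needs the standard zipfile module. In Word-generated .docx files, the part is typically stored as word/theme/theme1.xml. The archive below is a synthetic stand-in with only two parts, not a real document, and theme1_entries is a hypothetical helper name. The same function can be applied to the bytes of a ZIP carved from a .doc file, because it matches the file name regardless of folder.

```python
import io
import zipfile

def theme1_entries(archive_bytes: bytes) -> list:
    """Return every entry whose file name is theme1.xml, whatever its folder."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [name for name in archive.namelist() if name.split("/")[-1] == "theme1.xml"]

# Synthetic, .docx-like archive with only two parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as docx:
    docx.writestr("word/document.xml", "<w:document/>")
    docx.writestr("word/theme/theme1.xml", "<a:theme/>")
print(theme1_entries(buffer.getvalue()))  # ['word/theme/theme1.xml']
```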


.doc files (and also .xls files) created with Microsoft Office contain an embedded ZIP file with theme data, and this ZIP file can be found with zipdump.py.


Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com DidierStevensLabs.com
