CFBF Files Strings Analysis
The Office file format that predates the OOXML format, is a binary format based on the CFBF format. I informally call this the ole file format.
It's a binary file format, and is uncompressed (disregarding application specific exceptions, like VBA source code).
That lends itself to strings analysis, as I've wrote about in previous diary entries.
There is a potential problem when you run the strings command on a .doc file, for example. The CFBF file format, is similar to a file system format: it is made up of sectors, and has File Allocation Tables. This means that the data that is contained into streams, is written into sectors. These sectors don't have to be sequential.
If you are looking for URLs for example, you could run the strings command on a .doc file, and grep for string http.
It can happen, that a URL straddles the boundary of 2 sectors: its first part is at the end of sector N, and its last part is at the start of sector N+1. If both sectors are written sequentially into the CFBF file, there is no problem. But if they are not contiguous, the strings command can not extract the complete URL, as it is split into 2 strings that are separated by other data.
I have a couple of solutions to this problem.
The first one I'll cover in this diary entry, is quite simple: my tool oledump.py, a tool to analyze CFBF files, has an option to extract all strings from a stream: -S.
Take this Word document, a .doc file, where I have typed a URL on the first page:
Stream 6 contains the content that I typed:
This is for a single stream. Use "-s a" to extract strings from all streams:
With this last command, you can extract all strings from all streams. And you will not extract strings that are not located in streams, like the names of the streams for example.
You don't have control over oledump's string extraction method, for example, you can not specify a minimum string length (it's minimum 4).
There are other methods where you can do that: that is for another diary entry.
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com
Comments
Anonymous
Dec 3rd 2022
9 months ago
Anonymous
Dec 3rd 2022
9 months ago
<a hreaf="https://technolytical.com/">the social network</a> is described as follows because they respect your privacy and keep your data secure. The social networks are not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go.
<a hreaf="https://technolytical.com/">the social network</a> is not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go. The social networks only collect the minimum amount of information required for the service that they provide. Your personal information is kept private, and is never shared with other companies without your permission
Anonymous
Dec 26th 2022
8 months ago
Anonymous
Dec 26th 2022
8 months ago
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> nearest public toilet to me</a>
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> public bathroom near me</a>
Anonymous
Dec 26th 2022
8 months ago
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> nearest public toilet to me</a>
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> public bathroom near me</a>
Anonymous
Dec 26th 2022
8 months ago
Anonymous
Dec 26th 2022
8 months ago
https://defineprogramming.com/
Dec 26th 2022
8 months ago
distribute malware. Even if the URL listed on the ad shows a legitimate website, subsequent ad traffic can easily lead to a fake page. Different types of malware are distributed in this manner. I've seen IcedID (Bokbot), Gozi/ISFB, and various information stealers distributed through fake software websites that were provided through Google ad traffic. I submitted malicious files from this example to VirusTotal and found a low rate of detection, with some files not showing as malware at all. Additionally, domains associated with this infection frequently change. That might make it hard to detect.
https://clickercounter.org/
https://defineprogramming.com/
Dec 26th 2022
8 months ago
rthrth
Jan 2nd 2023
8 months ago