Friday, 30 May 2014

Dissecting Tips: OLE and Office Open XML

DISCLAIMER: The choice of tools is based on personal preference. The same results can be achieved using a similar set of tools. This is not a step-by-step guide - these are just some tips.

To try what is described below, you'll need the following:

  • Fair understanding of Object Linking and Embedding (OLE) file structure
  • Fair understanding of Office Open XML file structure

NOTE: The file samples used in this blog post were sourced from phishing emails roaming around at the end of March 2014.

The fastest way to check whether an OLE file has any malicious content embedded is to run it through the 'OfficeMalScanner' tool. There are a couple of options to help you do that - 'scan' and 'info'. There are also a couple of switches available - 'brute' and 'debug' - that can further increase the chances of finding malicious content.

OfficeMalScanner usage

The screenshots below show the tool output for a DOC file that was attached to a phishing email. Since we do not know what is hidden inside, it makes sense to analyse the file using both options, starting with 'scan'.

OfficeMalScanner 'scan' option output

No suspicious content was found, but note the comment at the bottom of the output - the tool recommends analysing the file with the 'info' option.

OfficeMalScanner 'info' option output

This time the tool detected an embedded VB script and dumped it into a folder. A quick glance at the script shows that it will download and execute a file.

part of extracted VB script
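When a dumped script is long, a quick way to spot the download location is to search it for URL-like strings. The sketch below is only an illustration: the function name 'find_urls' and the pattern are my own assumptions, and a script that builds its URL from concatenated or encoded pieces will not be caught this way.

```python
import re

# Assumed pattern: plain http/https URLs ending at whitespace, quotes or brackets
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)

def find_urls(script_text):
    """Return the distinct URL-like strings in the order they first appear."""
    seen = []
    for url in URL_PATTERN.findall(script_text):
        if url not in seen:
            seen.append(url)
    return seen
```

The script text would come from the file dumped by the 'info' option, read with an ordinary open() call.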

The 'OfficeMalScanner' tool can detect and extract embedded EXE files. The screenshot below shows an example of the tool output when it detects an EXE file embedded in a DOC file.

OfficeMalScanner output - detected and dumped embedded EXE

The 'OfficeMalScanner' tool can also handle Office Open XML files. Below is an example of the tool output when used with the 'inflate' option.

OfficeMalScanner 'inflate' option output

NOTE: Simply changing an Office Open XML file extension to 'zip' and opening the file with an archiving tool of your choice will allow you to extract its file structure as well.
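The same can be done without renaming anything, since Python's 'zipfile' module will open a DOCX file directly. A minimal sketch, assuming we only want to list the parts and flag any VBA project; the function name 'list_ooxml_parts' is hypothetical:

```python
import zipfile

def list_ooxml_parts(path_or_file):
    """Return (all part names, part names that look like a VBA project)."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
    # A macro container is normally stored as '<app folder>/vbaProject.bin'
    vba_parts = [n for n in names if n.lower().endswith("vbaproject.bin")]
    return names, vba_parts
```

Calling 'extractall()' on the same 'ZipFile' object would unpack the whole structure to a folder for manual inspection.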

The decompressed files will be stored in the 'DecompressedMsOfficeDocument' folder in the user's '%TEMP%' location. In this particular example, the tool flagged one file - 'word/vbaProject.bin' - as suspicious and suggested running the tool against it using the 'scan' or 'info' option.

OfficeMalScanner 'info' option output for the VB script embedded in the DOCX file

The tool has found and extracted an embedded VB script. This script doesn't seem to reach out to any external sources, as we saw in the previous example. Instead, it extracts the 'text' (<w:t>) from each 'paragraph' (<w:p>) in the document, saves the extracted data to a file and executes it.

Extract from malicious VB script (no execution part included)

At this point we know there is an executable file hidden in this document, but since it's represented as text, the 'OfficeMalScanner' tool will not detect it. The screenshot below shows an example of the paragraphs and the text stored in them, which together make up the executable file.

'word/document.xml' file view in 'XML Explorer' tool
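If no XML viewer is at hand, the paragraph text can be listed with Python's standard XML parser. This is a rough sketch rather than a full DOCX reader; the function name 'paragraph_texts' is my own, and it simply joins every <w:t> found under each <w:p>.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the 'w:' prefix maps to
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_texts(path_or_file):
    """Return the text of each <w:p> in word/document.xml as a list of strings."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements
        texts.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return texts
```

Note that the parser decodes XML entities, so a value stored as '&amp;H4D' in the raw file shows up here as '&H4D'.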

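The following simple Python script can help reconstruct the file from the text strings.

import re
import zipfile

def saveFile(filename, content):
    # Write the reconstructed bytes to disk in binary mode
    with open(filename, "wb") as fo:
        fo.write(content)

def main(inputFile, outputFile):
    # A DOCX file is a ZIP archive; the document body is in word/document.xml
    with zipfile.ZipFile(inputFile) as docxFile:
        textContent = docxFile.read('word/document.xml').decode('utf-8')
    # Strip all XML tags, leaving only the text stored in the paragraphs
    textContentInOneString = re.sub('<(.|\n)*?>', '', textContent)
    # Match from the first '&amp;H' prefix to the last pair of hex characters
    bytesOnlyRegexGroup = re.search(re.escape("&amp;H") + ".*[a-zA-Z0-9]{2}", textContentInOneString)
    # Drop the prefixes and convert the remaining hex string to raw bytes (Python 3)
    bytesOnly = bytes.fromhex(bytesOnlyRegexGroup.group().replace("&amp;H", ""))
    saveFile(outputFile, bytesOnly)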

readFrom = "C:\\infected\\27.05.2014\\Law Society message.docx"
saveTo = "C:\\infected\\27.05.2014\\extracted.bin"

main(readFrom, saveTo)

Checking the extracted file.

extracted file header
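For readers without a hex editor, the header can be printed with a few lines of Python. This is just a sketch of a hex dump helper; the name 'hexdump' and the layout are my own choice. An EXE file should start with the bytes 4D 5A ('MZ').

```python
def hexdump(data, width=16):
    """Return hex dump lines: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as dots in the text column
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

Reading the first 64 bytes or so of the extracted file and printing the returned lines is enough to see the 'MZ' marker.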

Target confirmed. Further information on the file is available on VirusTotal (VT).

Other files contained in Office Open XML file structure that might be useful during an analysis

'\[Content_Types].xml' file view in XML Explorer tool

'[Content_Types].xml' file holds the list of all the content types used in the document.
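The same list can be produced without an XML viewer. A minimal sketch, assuming the standard package layout with 'Default' (per extension) and 'Override' (per part) entries; the function name 'content_types' is hypothetical:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Conventions packages
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

def content_types(path_or_file):
    """Return (extension or part name, content type) pairs from the package."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
    entries = []
    for el in root:
        if el.tag == CT_NS + "Default":
            entries.append(("." + el.get("Extension", ""), el.get("ContentType")))
        elif el.tag == CT_NS + "Override":
            entries.append((el.get("PartName"), el.get("ContentType")))
    return entries
```

Content types that mention macros or VBA are the ones worth a second look.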

'\word\_rels\document.xml.rels' file view in XML Explorer tool

The '\word\_rels\document.xml.rels' file contains details about any embedded elements. In the example above it shows 4 embedded OLE objects. These are not necessarily malicious. Objects embedded into a DOCX file are stored as OLE objects, which can be found in the '\word\embeddings' folder and analysed with the 'OfficeMalScanner' tool. If the tool finds nothing suspicious, the 'SSViewer' (Structured Storage Viewer) utility can be used to extract the content of an OLE object for further analysis. The screenshot below shows an OLE file opened in the SSViewer tool. OLE file components can be extracted and saved as a data stream file.
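As a side note on the relationships file discussed above, its entries can also be listed with Python. The sketch below assumes that OLE embeddings use a relationship type ending in '/oleObject'; the function name 'ole_relationships' is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by .rels parts in Open Packaging Conventions packages
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

def ole_relationships(path_or_file, rels_part="word/_rels/document.xml.rels"):
    """Return (relationship Id, Target) pairs for embedded OLE objects."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read(rels_part))
    found = []
    for rel in root.iter(REL_NS + "Relationship"):
        # Assumption: the Type URI of an OLE embedding ends with '/oleObject'
        if rel.get("Type", "").endswith("/oleObject"):
            found.append((rel.get("Id"), rel.get("Target")))
    return found
```

Each 'Target' is a path relative to the 'word' folder, so the matching files can be read from the archive and handed to the tools mentioned above.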

extracting content of an OLE file using SSViewer tool

The content will be saved into a file with a '.stream' extension. Further file header analysis is required to determine the file type. In this particular example, the extracted content turned out to be a WMF (Windows Metafile) file.

example of a file extracted from an OLE object
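The header check can be automated for the most common cases by comparing the first bytes against known signatures. The table below is a small illustrative selection I put together, not a complete list, and the function name 'guess_type' is hypothetical.

```python
# A few well-known file signatures (illustrative selection only)
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE compound file"),
    (bytes.fromhex("D7CDC69A"), "placeable WMF (Windows Metafile)"),
    (b"MZ", "DOS/PE executable"),
    (b"PK\x03\x04", "ZIP archive (including Office Open XML)"),
]

def guess_type(data):
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A result of 'unknown' only means the header is not in this short table; the file still needs a manual look.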

Saving a stream to a file will not always reconstruct the original file. The snapshot below shows a stream extracted from an OLE object that was embedded into a DOCX file.

example of a stream file extracted from an OLE object

The stream shows the location where the embedded file was stored on the originating machine. Note also the 'MZ' and 'This program must' strings. It's safe to assume that the embedded file is an EXE file, but in its current form it carries some extra data. To restore the original file from the saved stream, we need to remove the data preceding the EXE header and the extra data at the end. The preceding data is not a problem: simply removing everything up to 'MZ' gives us the beginning of the original file. Dealing with the extra data at the end can be trickier. One way to deal with it is to use the 'PEStudio' tool.

'overlay' detected in PEStudio

PEStudio has detected some extra bytes (overlay) starting at offset 0x00322E00. Now we need to locate that offset near the end of the stream file and remove the overlay that starts there.

the end of the extracted stream containing the overlay

Once the extra data is removed, the original EXE file is fully restored and can be analysed further. If for whatever reason we want a copy of the overlay data, PEStudio can be used to save it into a file.

saving 'overlay' to a file

extracted 'overlay' file
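The whole clean-up (dropping the data before 'MZ' and cutting off the overlay) can also be sketched in Python. This is a rough approximation under my own assumptions: the end of the image is taken as the end of the last section's raw data as listed in the PE section table, which may not match PEStudio's figure in every case, and anything appended after the last section, such as a digital signature, is treated as overlay. The function names are hypothetical.

```python
import struct

def pe_raw_size(data, base):
    """Return the on-disk size of the PE image starting at 'base', or None."""
    try:
        # e_lfanew at 0x3C holds the offset of the 'PE\0\0' signature
        (e_lfanew,) = struct.unpack_from("<I", data, base + 0x3C)
        nt = base + e_lfanew
        if data[nt:nt + 4] != b"PE\x00\x00":
            return None
        (num_sections,) = struct.unpack_from("<H", data, nt + 6)
        (opt_size,) = struct.unpack_from("<H", data, nt + 20)
        # Section table follows the 24-byte signature + COFF header and the optional header
        table = nt + 24 + opt_size
        end = table + 40 * num_sections - base
        for i in range(num_sections):
            # SizeOfRawData and PointerToRawData sit at offset 16 of each 40-byte entry
            raw_size, raw_ptr = struct.unpack_from("<II", data, table + 40 * i + 16)
            if raw_size:
                end = max(end, raw_ptr + raw_size)
    except struct.error:
        return None
    return end

def carve_pe(stream):
    """Return (exe_bytes, overlay_bytes) for the first valid PE image in 'stream'."""
    start = stream.find(b"MZ")
    while start != -1:
        size = pe_raw_size(stream, start)
        if size is not None:
            return stream[start:start + size], stream[start + size:]
        # 'MZ' can occur by chance in the leading data, so keep looking
        start = stream.find(b"MZ", start + 1)
    raise ValueError("no PE image found in stream")
```

Both returned values can be written to disk with an ordinary binary write, giving the restored EXE and a copy of the overlay.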

Hope these tips are helpful.


  1. Hi,

    Could you please share the sample file? Thanks!

    Best Regards,

  2. Could you please share the samples used, or others if you have more?