Triage is Key! Python to the Rescue!

    Published: 2025-07-29. Last Updated: 2025-07-29 09:29:53 UTC
    by Xavier Mertens (Version: 1)
    0 comment(s)

    When you need to quickly analyze a lot of data, there is one critical step to perform: Triage. In forensic investigations, this step is critical because it allows investigators to quickly identify, prioritize, and isolate the most relevant or high value evidence from large volumes of data, ensuring that limited time and resources are focused on artifacts most likely to reveal key facts about an incident. Sometimes, a quick script will be enough to speed up this task.

    Today, I'm working on a case where I have a directory containing +20.000 mixed files. Amongst them, a lot of ZIP archives (mainly Office documents), containing also lot of files. The idea is to scan all those files (including the ZIP archives) for some keywords. I wrote a quick Python script that will scan all files against the embedded YARA rule and, if a match is found, copy the original file into a destination directory.

    Here is the script:

    #
    # Quick Python triage script
    # Copy files matching a YARA rule to another directory
    #
    import yara
    import os
    import shutil
    import zipfile
    import io
    
    # YARA rule
    yara_rule = """
    rule case_xxxxxx_search_1
    {
        strings:
            $s1 = "string1" nocase wide ascii
            $s2 = "string2" nocase wide ascii
            $s3 = "string3" nocase wide ascii
            $s4 = "string4" nocase wide ascii
            $s5 = "string5" nocase wide ascii
        condition:
            any of ($s*)
    }
    """
    
    source_dir = "Triage"
    dest_dir = "MatchedFiles"
    os.makedirs(dest_dir, exist_ok=True)
    rules = yara.compile(source=yara_rule)
    
    def is_zip_file(filepath):
        """
        Check ZIP archive magic bytes.
        """
        try:
            with open(filepath, "rb") as f:
                sig = f.read(4)
                return sig in (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
        except Exception:
            return False
    
    def safe_extract_path(member_name):
        """
        Returns a safe relative path inside the destination folder (Prevent .. in paths).
        """
        return os.path.normpath(member_name).replace("..", "_")
    
    def scan_file(filepath, file_bytes=None, inside_zip=False, zip_name=None, member_name=None):
        """
        Scan a file with YARA.
        """
        try:
            if file_bytes is not None:
                matches = rules.match(data=file_bytes)
            else:
                matches = rules.match(filepath)
    
            if matches:
                if inside_zip:
                    print("[MATCH] {member_name} (inside {zip_name})")
                    rel_path = os.path.relpath(zip_name, source_dir)
                    filepath = os.path.join(source_dir, rel_path)
                    dest_path = os.path.join(dest_dir, rel_path)
                else:
                    print("[MATCH] {filepath}")
                    rel_path = os.path.relpath(filepath, source_dir)
                    dest_path = os.path.join(dest_dir, rel_path)
                
                # Save a copy
                os.makedirs(os.path.dirname(dest_path), exist_ok=True)
                shutil.copy2(filepath, dest_path)
        except Exception as e:
            print(e)
            pass
    
    # Main
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            filepath = os.path.join(root, name)
            if is_zip_file(filepath):
                try:
                    with zipfile.ZipFile(filepath, 'r') as z:
                        for member in z.namelist():
                            if member.endswith("/"):  # Skip directories
                                continue
                            try:
                                file_data = z.read(member)
                                scan_file(member, file_bytes=file_data, inside_zip=True, zip_name=filepath, member_name=member)
                            except Exception:
                                pass
                except zipfile.BadZipFile:
                    pass
            else:
                scan_file(filepath)

    Now, you can enjoy some coffee while the script does the job:

    [MATCH] docProps/app.xml (inside Triage\xxxxxxx.xlsx)
    [MATCH] xl/sharedStrings.xml (inside Triage\xxxxx.xlsx)
    [MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxxxxxx.xlsx)
    [MATCH] ppt/slides/slide3.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
    [MATCH] ppt/slides/slide12.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
    [MATCH] ppt/slides/slide14.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
    [MATCH] ppt/slides/slide15.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
    [MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxx.xlsx)
    [MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxx.pdf
    [MATCH] Triage\xxxxxxxxxxxxxxxxxxx.xls
    [MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxx.xlsx)
    [MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxxxxx.xls

    You can see that, with a few lines of Python, you can speedup the triage phase in your investigations. Note that the script is written to handle my current files set and is not ready for broader use (lile to handle password-protected archives or other types of archives)

    Xavier Mertens (@xme)
    Xameco
    Senior ISC Handler - Freelance Cyber Security Consultant
    PGP Key

    0 comment(s)
    ISC Stormcast For Tuesday, July 29th, 2025 https://isc.sans.edu/podcastdetail/9546

      Comments


      Diary Archives