The EMBA book ‐ Chapter 1: Firmware Extraction Layer
Imagine you've just bought a new smart device, like a fancy Wi-Fi router or a high-tech security camera. These devices run on special software called firmware. If you're curious about how secure your device is, or if you want to find out what software components it's truly running, you can't just "look inside" it directly. The firmware is usually packed up like a digital onion, with many layers and different types of files hidden within.
This is where EMBA's Firmware Extraction Layer comes in! It's the very first crucial step EMBA takes. Think of it like a meticulous detective carefully taking apart a complex device. Its job is to peel back these digital layers, identify all the different parts inside the firmware image, and extract them into a usable format for deeper inspection.
Our central goal in this chapter is to understand how EMBA automatically takes a single firmware file and breaks it down into all its individual components, preparing them for later analysis.
At its heart, a firmware image is just one big file that contains all the software a device needs to operate. It's often compressed, encrypted, or specially formatted by the manufacturer.
Imagine you have a sealed toy box. Inside, there aren't just toys; there are different bags of LEGO bricks, an instruction manual, and maybe even a few small tools. The entire toy box is like your firmware image. EMBA's job is to open this box and sort out everything inside.
The "things inside" the firmware image are often:
- Filesystems: These are like organized folders and files, similar to what you find on your computer (e.g., Linux filesystems like `ext4` or `Squashfs`).
- Data Blobs: These are chunks of raw data that might be encrypted, compressed, or just unstructured information the device needs.
- Executables and Libraries: The actual programs and code that make the device work.
You, as the user, don't need to be a digital locksmith to use EMBA. You simply point EMBA to your firmware file, and it intelligently handles the extraction process.
EMBA doesn't just try one tool. It has a strategy that involves:
- Initial Identification (Pre-checking): First, it tries to identify the type of firmware file. Is it a standard archive? A disk image? Something proprietary?
- Specialized Extraction: If it identifies a known type, it uses a specific, best-suited tool to unpack it.
- General-Purpose Extraction: If the specialized modules don't work or only partially succeed, EMBA uses powerful general-purpose unpackers like `unblob` or `binwalk` to try to extract whatever they can.
- Deep Extraction (Recursive Unpacking): Some firmware packages are like "Matryoshka dolls": a package inside a package, inside another package! If a Linux root filesystem isn't immediately found, EMBA will recursively try to extract files found within the first layer, then files from the second layer, and so on, until it finds a root filesystem or runs out of options.
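The staged strategy above can be sketched as a simple fallback chain. This is only an illustration, not EMBA's actual control flow: the function names `try_specialized`, `try_generic`, and `run_extraction`, as well as the type labels `ext` and `opaque`, are hypothetical, and the extractors are stubbed out so the ordering of the stages stays visible.

```shell
# Hypothetical stubs: each "extractor" returns 0 on success, 1 on failure.
try_specialized() {
  # Succeeds only for a type this sketch pretends to recognize.
  [ "${1:-}" = "ext" ]
}

try_generic() {
  # Stand-in for a general-purpose unpacker; gives up on "opaque" input.
  [ "${1:-}" != "opaque" ]
}

# Walk the stages in order and report which one handled the input.
run_extraction() {
  local lTYPE="${1:-unknown}"
  if try_specialized "${lTYPE}"; then
    echo "specialized"
  elif try_generic "${lTYPE}"; then
    echo "generic"
  else
    echo "unextracted"
  fi
}
```

The point of the ordering is that cheaper, format-specific handling is tried first, and the broader tools only run when that fails.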
Let's see how this works in practice.
To start an analysis with EMBA, you would typically run a command like this:
```shell
./emba -f my_router_firmware.bin -l ~/your_log_directory
```

When you run this command, EMBA automatically activates its Firmware Extraction Layer. It will begin by inspecting `my_router_firmware.bin`, identifying its type, and then using the most suitable tools to extract its contents. You'll see output on your screen describing what EMBA finds, such as:

```
[+] Identified Linux ext2 filesytem - using EXT filesytem extraction module
[*] Extracting firmware to directory /logs/firmware/unblob_extracted
```
This tells you EMBA recognized a specific type of filesystem and is now extracting it.
Let's peek behind the curtain to see how EMBA’s Firmware Extraction Layer works its magic.
The process can be visualized as a sequence of steps:
```mermaid
sequenceDiagram
    participant User
    participant EMBA
    participant Pre-Checker
    participant SpecializedExtractor
    participant ExtractedFirmwareFolder
    User->>EMBA: Provide firmware file (e.g., my_router_firmware.bin)
    EMBA->>Pre-Checker: "What kind of firmware is this?"
    Note over Pre-Checker: Analyzes headers, strings, and entropy
    Pre-Checker->>EMBA: "It looks like a Linux Ext2 filesystem!" (or other type)
    EMBA->>SpecializedExtractor: "Extract this Ext2 firmware for me!"
    Note over SpecializedExtractor: Uses specific tools like mount or unblob
    SpecializedExtractor->>ExtractedFirmwareFolder: Places extracted files here
    ExtractedFirmwareFolder->>EMBA: "Extraction complete!"
    EMBA->>EMBA: Identify root filesystem, architectures, etc.
    EMBA->>User: Display extraction results
```
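The entropy analysis mentioned in the diagram can be approximated with standard tools. The sketch below is not EMBA's implementation: `byte_entropy` is a hypothetical helper that computes a single Shannon entropy value for a whole file, whereas a real entropy graph plots the value over successive blocks.

```shell
# Print the Shannon entropy of a file in bits per byte (0.00 to 8.00).
byte_entropy() {
  local lFILE="${1:-}"
  # od prints every byte as an unsigned decimal number;
  # awk counts each value and sums -p * log2(p) over all values seen.
  od -An -v -tu1 "${lFILE}" | awk '
    { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
    END {
      if (total == 0) { print "0.00"; exit }
      for (b in count) {
        p = count[b] / total
        h -= p * log(p) / log(2)
      }
      printf "%.2f\n", h
    }'
}
```

Values close to 8 bits per byte usually indicate compressed or encrypted data, while plain text and code sit noticeably lower.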
- Initial Inspection (Pre-Checking): EMBA first runs a "pre-checking" module (`P02_firmware_bin_file_check.sh`). This module is like a quick scanner that examines the firmware file's characteristics (like its file type, checksums, and initial bytes) to guess what kind of firmware it might be.

  Here's a simplified look at how EMBA might detect a firmware type:

  ```shell
  # From modules/P02_firmware_bin_file_check.sh
  # Simplified version of fw_bin_detector function
  fw_bin_detector() {
    local lCHECK_FILE="${1:-}"
    local lFILE_BIN_OUT # Stores output of 'file' command

    # Use the 'file' command to identify the file type
    lFILE_BIN_OUT=$(file "${lCHECK_FILE}")

    # Check for specific patterns in the file output
    if [[ "${lFILE_BIN_OUT}" == *"Linux rev 1.0 ext2 filesystem data"* ]]; then
      # If it's an Ext2 filesystem, set a flag for other modules
      print_output "[+] Identified Linux ext2 filesytem"
      export EXT_IMAGE=1 # This variable tells EMBA to use the Ext extractor
    elif [[ "${lFILE_BIN_OUT}" == *"VMware4 disk image"* ]]; then
      # If it's a VMDK image, set another flag
      print_output "[+] Identified VMWware VMDK archive file"
      export VMDK_DETECTED=1 # This flag triggers the VMDK extractor
    # ... other detection logic for different types
    fi
  }
  ```

  Based on what `file` (a common Linux command for identifying file types) reports or other internal checks, EMBA sets internal flags (like `EXT_IMAGE=1` or `VMDK_DETECTED=1`). These flags act as signals for which specialized extractor modules should be activated next. EMBA also generates a visual "entropy graph" of the firmware, which can sometimes reveal hidden encrypted sections.

- Specialized Extraction: Once EMBA has a good idea of the firmware's type, it calls upon one of its many specialized extraction modules. Each module is designed to handle a particular firmware format or encryption scheme.

  Here are a few examples of specific extractors and what they do:
  - EXT Filesystem Extractor (`P14_ext_mounter.sh`): If EMBA detects a Linux `ext2`, `ext3`, or `ext4` filesystem, this module will "mount" it (make it accessible like a regular folder) and copy its contents.

    ```shell
    # From modules/P14_ext_mounter.sh (simplified)
    ext_extractor() {
      local lEXT_PATH_="${1:-}"
      local lEXTRACTION_DIR_="${2:-}" # Destination for the copied files
      local lTMP_EXT_MOUNT="${TMP_DIR}/ext_mount_${RANDOM}"

      mkdir -p "${lTMP_EXT_MOUNT}"
      print_output "[*] Trying to mount ${lEXT_PATH_} to ${lTMP_EXT_MOUNT} directory"
      # Mounts the image read-only
      mount -o ro "${lEXT_PATH_}" "${lTMP_EXT_MOUNT}"

      # If successful, copy files to the extraction directory
      if mount | grep -q ext_mount; then
        cp -pri "${lTMP_EXT_MOUNT}"/* "${lEXTRACTION_DIR_}"
        umount "${lTMP_EXT_MOUNT}" # Unmount after copying
      fi
    }
    ```

    This snippet shows how the module creates a temporary mount point, mounts the filesystem image there, copies the contents, and then cleans up by unmounting it.
  - Windows Executable Extractor (`P07_windows_exe_extract.sh`): If the firmware contains Windows `.exe` or `.msi` files (common in some IoT devices), EMBA uses `7z` to extract them.

    ```shell
    # From modules/P07_windows_exe_extract.sh (simplified)
    exe_extractor() {
      local lFIRMWARE_PATH="${1:-}"
      local lEXTRACTION_DIR="${2:-}"

      mkdir "${lEXTRACTION_DIR}"
      print_output "[*] Extracting with 7z to ${lEXTRACTION_DIR}"
      # Extracts the contents of the Windows executable/installer
      7z x -o"${lEXTRACTION_DIR}" "${lFIRMWARE_PATH}"
    }
    ```

    This command is like unzipping a regular archive, but it's specifically for Windows executable formats.
  - Foscam Decryptor (`P20_foscam_decryptor.sh`): For encrypted firmware from specific vendors like Foscam, EMBA might try a known decryption method using tools like `openssl` with pre-defined keys.

    ```shell
    # From modules/P20_foscam_decryptor.sh (simplified)
    foscam_enc_extractor() {
      local lFOSCAM_ENC_PATH_="${1:-}"
      local lEXTRACTION_FILE_="${2:-}"
      local l_FOSCAM_KEY="your_secret_key" # Example key

      print_output "[*] Testing FOSCAM decryption key ${l_FOSCAM_KEY}."
      # Tries to decrypt the file with a known key
      openssl enc -d -aes-128-cbc -md md5 -k "${l_FOSCAM_KEY}" \
        -in "${lFOSCAM_ENC_PATH_}" > "${lEXTRACTION_FILE_}"
    }
    ```

    This shows EMBA attempting to decrypt a Foscam firmware file using a specific algorithm (`aes-128-cbc`) and a known key.
  - Unblob Extractor (`P55_unblob_extractor.sh`): This is a powerful, generic extraction module that acts as EMBA's primary "go-to" tool. If other specialized extractors don't apply or fail, `unblob` is often used as a robust fallback. It can identify and extract various embedded formats.

    Besides `unblob`, EMBA also uses `binwalk` as the second generic firmware extraction framework:

    | Unpacker Tool | What it Does Best | Analogy |
    |---|---|---|
    | Binwalk | Identifies and extracts embedded files and filesystems within a firmware. Great for finding hidden structures and recursing. | A multi-tool with many specialized openers. |
    | Unblob | A powerful, modern binary blob extractor designed to handle many complex and custom formats automatically. | An advanced robot that intelligently disassembles complex devices. |

    ```shell
    # From modules/P55_unblob_extractor.sh (simplified)
    unblobber() {
      local lFIRMWARE_PATH="${1:-}"
      local lOUTPUT_DIR_UNBLOB="${2:-}"

      mkdir -p "${lOUTPUT_DIR_UNBLOB}"
      print_output "[*] Extracting with unblob to ${lOUTPUT_DIR_UNBLOB}"
      # The core unblob command for deep extraction
      unblob -k -e "${lOUTPUT_DIR_UNBLOB}" "${lFIRMWARE_PATH}"
    }
    ```

    The `unblob` tool attempts to find and extract many different types of nested files within a firmware image. The `-e` flag sets the extraction directory, and the `-k` flag (`--keep-extracted-chunks`) keeps the carved chunks on disk after they have been extracted, so the intermediate data stays available for later inspection.
  - Binwalk Extractor (`P50_binwalk_extractor.sh`): The `binwalker_matryoshka` function (from `helpers/helpers_emba_extractors.sh`) wraps the `binwalk` command:

    ```shell
    # Simplified from helpers/helpers_emba_extractors.sh
    binwalker_matryoshka() {
      local lFIRMWARE_PATH="${1:-}"      # Path to the firmware file
      local lOUTPUT_DIR_BINWALK="${2:-}" # Directory for extracted files

      print_output "[*] Extracting firmware with binwalk..."
      # This is the core binwalk command:
      "${BINWALK_BIN[@]}" -v -e -c -M -d "${lOUTPUT_DIR_BINWALK}" "${lFIRMWARE_PATH}"
      # -e: Extracts all files
      # -c: Carves known file types (e.g., JPEGs)
      # -M: Enables "Matryoshka" mode for recursive extraction
      # -d: Specifies the output directory
      # ... more options and error handling ...
    }
    ```

    This command tells `binwalk` to go deep (`-M` for Matryoshka mode) and extract everything it finds (`-e`).
  - Deep Extractor (`P60_deep_extractor.sh`): This module is the "last resort" and tries multiple rounds of extraction on every file found so far. It ensures that even deeply nested or unusually packed components are uncovered.

    EMBA doesn't stop at one layer. It performs deep extraction (`P60_deep_extractor.sh`) by repeatedly scanning and extracting files found within the firmware. If a new archive or compressed file is found inside an already extracted layer, EMBA will try to unpack that as well. This is how it handles the "Matryoshka doll" effect.

    Additionally, firmware often contains common package formats like Debian packages (`.deb`), Android packages (`.apk`), or OpenWrt packages (`.ipk`). The `P65_package_extractor.sh` module specifically identifies and extracts these, adding their contents to the overall extracted filesystem.

    For instance, here's a simplified look at how `P65_package_extractor.sh` might handle a Debian package:

    ```shell
    # Simplified from modules/P65_package_extractor.sh
    deb_extractor() {
      local lDEB_ARCHIVES_ARR=() # List of .deb files found

      print_output "[*] Identifying Debian archives..."
      # Finds all .deb files in the extracted firmware
      mapfile -t lDEB_ARCHIVES_ARR < <(find "${FIRMWARE_PATH_CP}" -name "*.deb")

      if [[ "${#lDEB_ARCHIVES_ARR[@]}" -gt 0 ]]; then
        print_output "[*] Found ${#lDEB_ARCHIVES_ARR[@]} Debian archives - extracting them..."
        for lDEB in "${lDEB_ARCHIVES_ARR[@]}"; do
          local lR_PATH="${FIRMWARE_PATH_CP}" # Extract to the main firmware directory
          print_output "[*] Extracting $(basename "${lDEB}") to ${lR_PATH}"
          # Uses dpkg-deb to extract the package contents
          dpkg-deb --extract "${lDEB}" "${lR_PATH}" || true
        done
      fi
    }
    ```

    This snippet shows how EMBA specifically targets common package types to ensure every possible layer is uncovered.
- Post-Extraction Organization and Analysis: After the initial extraction, EMBA isn't done. It then:
  - Fixes permissions and symlinks: Extracted files often lose their permissions or end up with broken symbolic links. Helper scripts like `helpers/fix_bins_lnk_emulation.sh` fix these issues, making the extracted files more usable for later analysis.
  - Identifies root directories: EMBA tries to figure out where the "main" part of the Linux filesystem is within the extracted data using `helpers/helpers_emba_prepare.sh`. This is crucial for navigating the extracted firmware efficiently.
  - Analyzes architectures: It scans the extracted executable files to determine their CPU architecture (e.g., ARM, MIPS, x86) and endianness (the byte order used for multi-byte values). This information is vital for later steps like emulation.
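The round-based "Matryoshka" unpacking described for the deep extractor can be sketched as a loop that keeps unpacking newly discovered archives until a root filesystem appears or no progress is possible. This is a simplified stand-in, not the logic of `P60_deep_extractor.sh`: `deep_extract` is a hypothetical function, `tar` replaces EMBA's real unpackers, and "a directory named `etc` exists" replaces its real root filesystem check.

```shell
# Unpack nested archives round by round until a root filesystem shows up.
# Assumptions: only *.tar files count as archives, and any "etc" directory
# counts as evidence of a root filesystem.
deep_extract() {
  local lDIR="${1:-}"
  local lMAX_ROUNDS="${2:-5}"
  local lROUND=0
  local lNEW lARCHIVE lLIST
  lLIST="$(mktemp)"
  while :; do
    # Stop as soon as something resembling a root filesystem is present.
    if [ -n "$(find "${lDIR}" -type d -name etc)" ]; then
      rm -f "${lLIST}"
      echo "root filesystem candidate found after ${lROUND} round(s)"
      return 0
    fi
    [ "${lROUND}" -ge "${lMAX_ROUNDS}" ] && break
    lNEW=0
    find "${lDIR}" -type f -name '*.tar' > "${lLIST}"
    while IFS= read -r lARCHIVE; do
      # Skip archives that an earlier round already unpacked.
      [ -d "${lARCHIVE}_extracted" ] && continue
      mkdir -p "${lARCHIVE}_extracted"
      tar -xf "${lARCHIVE}" -C "${lARCHIVE}_extracted" || true
      lNEW=$((lNEW + 1))
    done < "${lLIST}"
    # Nothing new was unpacked, so further rounds cannot make progress.
    [ "${lNEW}" -eq 0 ] && break
    lROUND=$((lROUND + 1))
  done
  rm -f "${lLIST}"
  echo "no root filesystem found after ${lROUND} round(s)"
  return 1
}
```

Capping the number of rounds and stopping when a round unpacks nothing new are the two guards that keep such a loop from running forever on pathological input.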
The Firmware Extraction Layer is designed to be comprehensive, ensuring that EMBA gets the most complete view possible of the device's internal software.
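The root directory identification step can be pictured as a scoring heuristic: a directory that contains several of the usual Linux top-level folders is probably a root filesystem. The sketch below is an assumption-laden illustration rather than the code in `helpers/helpers_emba_prepare.sh`; the function names, the list of folder names, and the threshold of 3 are all hypothetical choices.

```shell
# Count how many typical Linux top-level folders a directory contains.
rootfs_score() {
  local lDIR="${1:-}"
  local lSCORE=0
  local lNAME
  for lNAME in bin etc lib sbin usr var; do
    [ -d "${lDIR}/${lNAME}" ] && lSCORE=$((lSCORE + 1))
  done
  echo "${lSCORE}"
}

# Print every directory below the extraction path that scores at least 3.
find_root_dirs() {
  local lEXTRACTED="${1:-}"
  local lCANDIDATE
  find "${lEXTRACTED}" -type d | while IFS= read -r lCANDIDATE; do
    if [ "$(rootfs_score "${lCANDIDATE}")" -ge 3 ]; then
      echo "${lCANDIDATE}"
    fi
  done
}
```

A threshold lower than the full list tolerates stripped-down embedded layouts, at the cost of occasional false positives.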
This entire process is automated. You feed EMBA a firmware file, and it automatically applies its "unpacker" team to get it ready for analysis.
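As one concrete piece of that automation, the architecture analysis from the post-extraction step can be approximated by parsing what `file` reports for an ELF binary. The helper below, `elf_arch_summary`, is hypothetical and only recognizes a handful of description strings; EMBA's own detection is more thorough.

```shell
# Summarize CPU architecture and endianness from a `file` description string.
elf_arch_summary() {
  local lDESC="${1:-}"
  local lENDIAN="unknown"
  local lARCH="unknown"
  # Anything that is not an ELF binary is out of scope for this sketch.
  case "${lDESC}" in
    *"ELF"*) ;;
    *) echo "not-elf"; return 1 ;;
  esac
  # `file` marks byte order as LSB (little-endian) or MSB (big-endian).
  case "${lDESC}" in
    *" LSB "*) lENDIAN="little-endian" ;;
    *" MSB "*) lENDIAN="big-endian" ;;
  esac
  case "${lDESC}" in
    *", ARM aarch64,"*) lARCH="ARM64" ;;
    *", ARM,"*)         lARCH="ARM" ;;
    *", MIPS,"*)        lARCH="MIPS" ;;
    *", x86-64,"*)      lARCH="x86-64" ;;
    *", Intel 80386,"*) lARCH="x86" ;;
  esac
  echo "${lARCH} ${lENDIAN}"
}
```

In practice it would be fed live output, for example `elf_arch_summary "$(file -b path/to/busybox)"`, where the path is whatever binary was found in the extracted root filesystem.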
Here's a table summarizing some of the common firmware types EMBA can extract and the modules/tools it uses:
| Firmware Type | Key Detection Clue | Primary Extractor Module | Core Tool/Method |
|---|---|---|---|
| Generic Binary Firmware | Any binary blob | `P55_unblob_extractor.sh` | `unblob` |
| Generic Binary Firmware | Any binary blob | `P50_binwalk_extractor.sh` | `binwalk` v3 |
| Linux `extX` Filesystem | `file` output contains "extX" | `P14_ext_mounter.sh` | `mount` |
| VMware VMDK Image | `file` output contains "VMware" | `P10_vmdk_extractor.sh` | `guestmount`, `7z` |
| UEFI/BIOS Firmware | Strings like "UEFI", "BIOS" | `P35_UEFI_extractor.sh` | `UEFITool`, `uefi-firmware-parser` |
| DJI Drone Firmware | Specific header strings like "PRAK" | `P40_DJI_extractor.sh` | `dji-firmware-tools`, `unblob` |
| Windows Executable (`.exe`) | `file` output contains "PE32" | `P07_windows_exe_extract.sh` | `7z` |
| Android OTA (`payload.bin`) | Magic bytes "CrAU" | `P25_android_ota.sh` | `payload_dumper.py` |
| UBI Filesystem | `file` output contains "UBI image" | `P15_ubi_extractor.sh` | `ubireader` |
| Zyxel Encrypted ZIP | `.ri` file + specific ELF executable | `P22_Zyxel_zip_decrypt.sh` | `qemu-user`, `7z` |
| QEMU QCOW2 Image | `file` output contains "QEMU QCOW2" | `P23_qemu_qcow_mounter.sh` | `qemu-nbd` |
| Compressed (GPG) Firmware | Specific GPG header bytes | `P17_gpg_decompress.sh` | `gpg` |
| Package Archives (`.deb`, `.apk`, `.ipk`, `.rpm`) | Filename extension, file type | `P65_package_extractor.sh` | `dpkg-deb`, `unzip`, `cpio` |
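The detection clues in the table lend themselves to a small lookup. The sketch below maps a `file` description string to the module the table lists for it and checks the Android OTA magic bytes. It is a simplification: `pick_extractor` and `is_android_ota` are hypothetical names, the glob patterns are reduced from the table's clue column, and EMBA itself sets flags such as `EXT_IMAGE=1` rather than returning a single module name.

```shell
# Map a `file` description string to the module the table lists for it.
pick_extractor() {
  local lFILE_OUT="${1:-}"
  case "${lFILE_OUT}" in
    *"ext2 filesystem"*|*"ext3 filesystem"*|*"ext4 filesystem"*)
      echo "P14_ext_mounter.sh" ;;
    *"VMware"*)     echo "P10_vmdk_extractor.sh" ;;
    *"PE32"*)       echo "P07_windows_exe_extract.sh" ;;
    *"UBI image"*)  echo "P15_ubi_extractor.sh" ;;
    *"QEMU QCOW2"*) echo "P23_qemu_qcow_mounter.sh" ;;
    *)              echo "P55_unblob_extractor.sh" ;; # generic fallback
  esac
}

# Check whether a file starts with the Android OTA magic bytes "CrAU".
is_android_ota() {
  [ "$(dd if="${1:-}" bs=1 count=4 2>/dev/null)" = "CrAU" ]
}
```

A typical call would be `pick_extractor "$(file my_router_firmware.bin)"`; anything the patterns do not recognize falls through to the generic `unblob` module, mirroring the first row of the table.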
The Firmware Extraction Layer is the foundational step in EMBA's analysis. By automatically and intelligently dissecting complex firmware images, EMBA transforms a raw, often unusable binary blob into a well-organized collection of files and directories. This allows the subsequent analysis phases to work on meaningful data, uncovering vulnerabilities and providing detailed insights.
Now that EMBA has successfully extracted and organized the firmware's components, the next step is to examine their contents in detail without running them. This is where the Analysis Core comes into play.
EMBA - firmware security scanning at its best