The exFAT file system was designed with Unicode file names and optional vendor-specific extensions in mind. To keep things simple, the file system specification allows multiple directory entries to describe a single file, so additional file metadata is stored in additional directory entries. This solution is similar to the VFAT extension for the FAT12/16/32 file systems, which was designed as a hack on top of the original file system format. Originally, only one directory entry described a single file, so long file names were implemented as additional directory entries, which are “invisible” to operating systems without VFAT support.
In the exFAT file system, a typical file consists of these entries (in this order, with no other entries between):
- one file entry,
- one stream extension entry,
- one or more file name entries (as needed to store the file name),
- zero or more vendor-specific entries (which can be ignored if not supported).
The first two entries describe all file metadata (attributes, timestamps, data size, first cluster, etc.), while the file name entries contain the strings that form the file name (each file name entry stores no more than 15 Unicode characters, and a file name is no longer than 255 characters). Together, these entries are called a directory entry set, and a set must contain at least three entries.
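To make the layout concrete, here is a minimal Python sketch that builds and parses a directory entry set. It uses the field offsets from the exFAT specification: entry types 0x85 (file), 0xC0 (stream extension) and 0xC1 (file name); NameLength at offset 3 and FirstCluster/DataLength at offsets 20 and 24 of the stream extension entry; 15 UTF-16 code units at offset 2 of each file name entry. The `build_entry_set` helper is hypothetical and fills only the fields needed here: checksums, the name hash, timestamps, flags and ValidDataLength are left zero.

```python
import math
import struct

ENTRY_SIZE = 32
FILE_ENTRY = 0x85          # File directory entry (InUse bit set)
STREAM_EXTENSION = 0xC0    # Stream Extension directory entry (InUse bit set)
FILE_NAME = 0xC1           # File Name directory entry (InUse bit set)
CHARS_PER_NAME_ENTRY = 15  # UTF-16 code units per File Name entry


def build_entry_set(name: str, first_cluster: int, data_length: int, attributes: int = 0x20) -> bytes:
    """Hypothetical helper: checksums, name hash, timestamps and flags are left zero."""
    units = name.encode("utf-16-le")
    name_length = len(units) // 2
    name_entries = math.ceil(name_length / CHARS_PER_NAME_ENTRY)
    file_entry = bytearray(ENTRY_SIZE)
    file_entry[0] = FILE_ENTRY
    file_entry[1] = 1 + name_entries                   # SecondaryCount
    struct.pack_into("<H", file_entry, 4, attributes)  # FileAttributes
    stream = bytearray(ENTRY_SIZE)
    stream[0] = STREAM_EXTENSION
    stream[3] = name_length                            # NameLength
    struct.pack_into("<IQ", stream, 20, first_cluster, data_length)  # FirstCluster, DataLength
    padded = units.ljust(name_entries * 30, b"\x00")
    names = b"".join(bytes([FILE_NAME, 0]) + padded[k * 30:(k + 1) * 30] for k in range(name_entries))
    return bytes(file_entry) + bytes(stream) + names


def parse_entry_set(data: bytes) -> dict:
    """Parse an in-use set: file entry, stream extension entry, file name entries."""
    entries = [data[i:i + ENTRY_SIZE] for i in range(0, len(data), ENTRY_SIZE)]
    if len(entries) < 3 or entries[0][0] != FILE_ENTRY or entries[1][0] != STREAM_EXTENSION:
        raise ValueError("not an in-use exFAT directory entry set")
    file_entry, stream = entries[0], entries[1]
    name_length = stream[3]
    first_cluster, data_length = struct.unpack_from("<IQ", stream, 20)
    # File name entries follow the stream extension entry, first characters first.
    name_entries = entries[2:2 + math.ceil(name_length / CHARS_PER_NAME_ENTRY)]
    if any(entry[0] != FILE_NAME for entry in name_entries):
        raise ValueError("missing file name entries")
    name = b"".join(entry[2:] for entry in name_entries).decode("utf-16-le")[:name_length]
    return {
        "entries": len(entries),
        "attributes": struct.unpack_from("<H", file_entry, 4)[0],
        "first_cluster": first_cluster,
        "data_length": data_length,
        "name": name,
    }


entry_set = build_entry_set("123456789012345678901234567890.txt", first_cluster=5, data_length=1234)
print(parse_entry_set(entry_set))
```

A 34-character name needs three file name entries, so this set occupies five directory entries in total.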
When a file is deleted, its directory entry set is marked as free. This process is very similar to what happens to a deleted file in the FAT12/16/32 file systems: the first byte of a directory entry is changed to mark it as free.
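In exFAT, the first byte of each entry is its entry type, and marking an entry as free means clearing the InUse bit (bit 7) of that byte: a deleted set keeps entries of types 0x05, 0x40 and 0x41 instead of 0x85, 0xC0 and 0xC1. FAT12/16/32, in contrast, overwrites the first byte of a deleted entry with 0xE5. The following Python sketch shows both conventions; the function names are hypothetical.

```python
ENTRY_SIZE = 32
IN_USE = 0x80       # bit 7 of the exFAT EntryType byte
FAT_DELETED = 0xE5  # first byte of a deleted FAT12/16/32 directory entry


def exfat_delete_set(entry_set: bytearray) -> None:
    """Clear the InUse bit of every entry in an exFAT directory entry set."""
    for offset in range(0, len(entry_set), ENTRY_SIZE):
        entry_set[offset] &= ~IN_USE & 0xFF


def exfat_is_free(entry_type: int) -> bool:
    """0x00 marks the end of the directory; 0x01-0x7F are unused (e.g. deleted) entries."""
    return 0x01 <= entry_type <= 0x7F


def fat_delete_entry(entry: bytearray) -> None:
    """FAT overwrites the first byte, i.e. the first character of the short name."""
    entry[0] = FAT_DELETED


entry_set = bytearray(b"".join(bytes([t]) + bytes(31) for t in (0x85, 0xC0, 0xC1)))
exfat_delete_set(entry_set)
print([hex(t) for t in entry_set[::ENTRY_SIZE]])  # ['0x5', '0x40', '0x41']

short_entry = bytearray(b"FILE    TXT" + bytes(21))
fat_delete_entry(short_entry)
print(bytes(short_entry[:11]))  # b'\xe5ILE    TXT'
```

Because the rest of each exFAT entry is untouched, a deleted set can still be parsed as long as its entries are not reused.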
And, of course, it is possible to recover a deleted file when its directory entry set and data clusters are not overwritten. If the directory entry set is partially overwritten (by new directory entries), the following can be observed:
- if a file entry is overwritten, the timestamps and attributes are lost;
- if a stream extension entry is overwritten, the first cluster and data size become unknown, so remnant file data is no longer linked to this file;
- if file name entries are overwritten, the file name is lost.
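These three cases can be checked mechanically. The Python sketch below walks a directory and, for every deleted file entry (type 0x05), reports which metadata can still be read. The first cluster and data size are taken only if the next entry is still a deleted stream extension entry (0x40). The name is assembled only from the deleted file name entries (0x41) that follow. It is a simplified illustration under the offsets given in the exFAT specification; sets whose file entry was overwritten are not found this way, and the function name and test data are hypothetical.

```python
import struct

ENTRY_SIZE = 32
DELETED_FILE = 0x05       # file entry with the InUse bit cleared
DELETED_STREAM = 0x40     # stream extension entry with the InUse bit cleared
DELETED_FILE_NAME = 0x41  # file name entry with the InUse bit cleared


def salvage_deleted_sets(directory: bytes) -> list:
    """Report what survives of each deleted set that still starts with its file entry."""
    entries = [directory[i:i + ENTRY_SIZE] for i in range(0, len(directory), ENTRY_SIZE)]
    found = []
    for i, entry in enumerate(entries):
        if entry[0] != DELETED_FILE:
            continue
        record = {"index": i, "attributes": struct.unpack_from("<H", entry, 4)[0],
                  "first_cluster": None, "data_length": None, "name": None}
        chunks = []
        for name_entry in entries[i + 2:i + 1 + entry[1]]:  # entry[1] is SecondaryCount
            if name_entry[0] != DELETED_FILE_NAME:
                break  # this slot was reused, so the rest of the name is lost
            chunks.append(name_entry[2:])
        name = b"".join(chunks).decode("utf-16-le", errors="replace")
        stream = entries[i + 1] if i + 1 < len(entries) else None
        if stream is not None and stream[0] == DELETED_STREAM:
            record["first_cluster"], record["data_length"] = struct.unpack_from("<IQ", stream, 20)
            name = name[:stream[3]]     # NameLength
        else:
            name = name.rstrip("\x00")  # NameLength is unknown without the stream extension
        record["name"] = name or None
        found.append(record)
    return found


# Hypothetical deleted set: file entry, stream extension entry, one file name entry.
file_entry = bytes([DELETED_FILE, 2, 0, 0]) + struct.pack("<H", 0x20) + bytes(26)
stream_entry = bytes([DELETED_STREAM, 0, 0, 5]) + bytes(16) + struct.pack("<IQ", 9, 4096)
name_entry = bytes([DELETED_FILE_NAME, 0]) + "a.txt".encode("utf-16-le").ljust(30, b"\x00")
print(salvage_deleted_sets(file_entry + stream_entry + name_entry))
```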
Now, let’s take a look at what happens when free directory entries are overwritten in the FAT32 and exFAT file systems.
In the FAT32 file system (and in the FAT12/16 file systems too), long file names are stored in entries preceding the short file name entry, with no gaps between these entries. Only the short file name entry contains timestamps and a pointer to file data (its first cluster); long file name entries store Unicode characters only, apart from some internal metadata used to identify and validate each entry. Also, long file name entries are stored in reverse order: the first characters of a long name are stored in the last long file name entry, immediately before the short name entry.
When new entries are allocated (in the example shown above, a file with no long name is created), they typically go to the beginning of the array of directory entries, overwriting previously deallocated directory entries. So, a deleted file loses characters of its long name (starting from the last ones) when its free directory entries are overwritten.
For example, let’s create a file with the following long name:
Then, let’s delete this file and check its name. It is the same:
Now, let’s create a new file with a short file name (“1.TXT”). Then, let’s check the deleted file, its name now is:
As you can see, five characters (“0.txt”) were lost because the first directory entry of the deleted file (containing the last part of the long file name) was overwritten (see the layout above). Still, the short file name entry is intact, so we can read the data.
This means that we can recover deleted file metadata (except the full long file name) until its short file name entry is overwritten. And, since the short file name entry is stored after the long file name entries, it is not going to be overwritten before them (unless we consider some unusual directory entry allocation scenarios).
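The FAT behavior can be reproduced at the byte level. In a VFAT long file name entry, the sequence number is at offset 0 (0x40 flags the entry holding the last characters), the attribute byte at offset 11 is 0x0F, and 13 UTF-16 code units are split across offsets 1 to 10, 14 to 25 and 28 to 31. The Python sketch below builds long file name entries in on-disk order for a hypothetical name (the short name checksum is left zero), then decodes them with and without the first on-disk entry, which is the one a first-fit allocation overwrites.

```python
BYTES_PER_LFN = 26   # 13 UTF-16 code units per long file name entry
LFN_ATTRIBUTE = 0x0F
LAST_LFN_FLAG = 0x40


def make_lfn_entries(name: str) -> list:
    """Hypothetical helper: LFN entries for `name` in on-disk order (checksum left zero)."""
    units = name.encode("utf-16-le")
    count = -(-len(units) // BYTES_PER_LFN)  # ceiling division
    if len(units) % BYTES_PER_LFN:
        units += b"\x00\x00"                 # terminator, then 0xFFFF padding
    units = units.ljust(count * BYTES_PER_LFN, b"\xff")
    entries = []
    for seq in range(1, count + 1):
        part = units[(seq - 1) * BYTES_PER_LFN:seq * BYTES_PER_LFN]
        entry = bytearray(32)
        entry[0] = seq | (LAST_LFN_FLAG if seq == count else 0)
        entry[1:11] = part[0:10]    # characters 1-5
        entry[11] = LFN_ATTRIBUTE
        entry[14:26] = part[10:22]  # characters 6-11
        entry[28:32] = part[22:26]  # characters 12-13
        entries.append(bytes(entry))
    return entries[::-1]  # the entry with the last characters is stored first


def decode_lfn_entries(entries: list) -> str:
    """Join surviving LFN entries (given in on-disk order) into a long name."""
    raw = b"".join(e[1:11] + e[14:26] + e[28:32] for e in reversed(entries))
    return raw.decode("utf-16-le", errors="replace").split("\x00", 1)[0]


entries = make_lfn_entries("long_file_name_0.txt")  # hypothetical 20-character name
print(decode_lfn_entries(entries))      # long_file_name_0.txt
print(decode_lfn_entries(entries[1:]))  # long_file_nam
```

Losing the first on-disk entry removes the tail of the name (here, seven characters), while the remaining entries and the short name entry stay readable.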
The situation is different in the exFAT file system.
Here, file name entries are stored after the stream extension entry, and in natural order: the first entry following the stream extension entry contains the first characters of the file name.
When new entries are allocated (in the example shown above, a file with a shorter name is created, so it uses one file name entry instead of two), they can go to the beginning of the array of directory entries (overwriting previously deallocated directory entries). So, a deleted file loses its critical metadata (its attributes, timestamps, data size, first cluster, etc.), but some of its file name entries can survive!
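A toy model makes the difference between the two layouts visible. The Python sketch below is a simplification, not the code of any particular driver: it treats a directory as a list of (label, in use) slots and places a new set of entries into the first run of free slots that is long enough. The labels are hypothetical.

```python
def allocate_first_fit(slots: list, new_labels: list) -> int:
    """Overwrite the first run of free slots long enough for `new_labels`; else append."""
    run = 0
    for i, (_, in_use) in enumerate(slots):
        run = 0 if in_use else run + 1
        if run == len(new_labels):
            start = i - run + 1
            slots[start:i + 1] = [(label, True) for label in new_labels]
            return start
    start = len(slots)
    slots.extend((label, True) for label in new_labels)
    return start


def surviving_deleted(slots: list) -> list:
    """Labels of deleted entries that were not overwritten."""
    return [label for label, in_use in slots if not in_use]


# FAT: a deleted file with two LFN entries (stored in reverse order) and its short name entry.
fat = [("LFN #2 (last characters)", False), ("LFN #1 (first characters)", False),
       ("short name entry", False), ("NEXT.TXT", True)]
allocate_first_fit(fat, ["1.TXT"])  # a new file with no long name

# exFAT: a deleted file with two file name entries, followed by an allocated file.
exfat = [("File", False), ("Stream extension", False), ("File name, part 1", False),
         ("File name, part 2", False), ("next file", True)]
allocate_first_fit(exfat, ["new File", "new Stream extension", "new File name"])

print(surviving_deleted(fat))    # ['LFN #1 (first characters)', 'short name entry']
print(surviving_deleted(exfat))  # ['File name, part 2']
```

In the FAT case the short name entry (with the first cluster and timestamps) survives; in the exFAT case only a piece of the name does.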
Tests show that the Windows exFAT driver does not follow this allocation algorithm. Instead, for relatively small directories, the driver extends the array of directory entries and appends new entries to the end of the extended array. Deleted directory entries remain intact in this case.
However, the exFAT drivers found in the Linux kernel and in the macOS operating system try to allocate new entries closer to the beginning of the array.
Since a file can’t be described using fewer than three directory entries, putting a file described by three directory entries into a gap (space between two allocated entries) previously occupied by a file described by four directory entries leaves one free directory entry intact (like the “File name, part 2” entry in the layout above). This entry can’t be reused, because a new file requires three or more contiguous directory entries, so it stays unallocated until the next or the preceding file is deleted.
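The arithmetic can be checked with a small helper. Assuming that a new set always needs at least three contiguous free entries and that the directory can grow at its end, any free run shorter than three entries between allocated entries is stranded until a neighbor is deleted. The function name and the layout below are hypothetical.

```python
MIN_SET_SIZE = 3  # file entry + stream extension entry + at least one file name entry


def stranded_free_runs(in_use: list) -> list:
    """Return (start, length) of free runs between allocated entries too short for a new set."""
    stranded, start = [], None
    for i, used in enumerate(in_use):
        if not used and start is None:
            start = i
        elif used and start is not None:
            if i - start < MIN_SET_SIZE:
                stranded.append((start, i - start))
            start = None
    return stranded  # a trailing free run is not reported: allocation can extend past it


# File A (3 entries), then a former 4-entry gap reused by a 3-entry set, then file B (3 entries).
layout = [True] * 3 + [True, True, True, False] + [True] * 3
print(stranded_free_runs(layout))  # [(6, 1)]
```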
This means that the exFAT implementations found in the Linux kernel and in macOS tend to leave many orphan file name entries in gaps between allocated files. Such entries contain partial (or, if you are lucky, full) file names of previously deleted files. (Here, orphan means having no corresponding file and stream extension entries.)
This behavior was confirmed for Ubuntu 21.10 and macOS Monterey (I have also observed it in real-world data).
An example (macOS Monterey) is below.
Here is a list of files in a directory (as shown by The Sleuth Kit, 4.11.1):
r/r 83971: 123456789012345678901234567890.txt
r/r 83977: 2.txt
r/r 83980: 3.txt
r/r 83983: s0.txt
r/r 83986: s1.txt
r/r 83989: s2.txt
r/r 83992: s3.txt
r/r 83995: s4.txt
r/r 83998: s5.txt
r/r 84001: s6.txt
r/r 84004: s7.txt
r/r 84007: s8.txt
r/r 84010: s9.txt
r/r 84013: s10.txt
r/r 84016: s11.txt
r/r 84019: s12.txt
[...]
r/r 84727: s248.txt
r/r 84730: s249.txt
d/d 84733: 321
d/d 84736: 123
(255 files in total; no files marked as deleted were found.)
And here is a hex dump of that directory (only the beginning of the directory is shown):
And, finally, this is how this directory is shown when using the dfir_ntfs project: