If you don’t know why transaction log files matter when dealing with registry hives from Windows 8.1 and Windows 10 installations, please read this and this.
In this post, I will talk about an easy way to programmatically explore intermediate states of a registry hive using its transaction log files.
What is an intermediate state of a registry hive?
It’s a state of a registry hive after a log entry has been applied to it but before the recovery is finished.
The Windows kernel delays writes to the primary files of registry hives by up to an hour. This does not count hibernation and sleep periods, so the real delay may be longer if a computer isn’t actively used. As a result, a registry flush appends dirty (modified) data to a transaction log file while the primary file remains unmodified (see the links above for more information). By applying log entries from the transaction log files one by one, we can explore every state of a hive recorded by recent flush operations.
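To make the idea of an intermediate state concrete, here is a toy model in Python. It is not yarp code and it ignores the real on-disk format of transaction log files: a "log entry" here is just a hypothetical list of (offset, bytes) pairs that overwrite parts of an in-memory copy of the primary file, and replay() is a hypothetical helper. The callback sees the hive after every applied entry, not only the final result.

```python
from typing import Callable, List, Tuple

# Toy log entry: (offset, dirty_bytes) pairs. This is NOT the real format
# of a transaction log file, only an illustration of the replay order.
LogEntry = List[Tuple[int, bytes]]


def replay(primary: bytes, entries: List[LogEntry],
           callback: Callable[[int, bytes], None]) -> bytes:
    """Apply toy log entries one by one, calling `callback` after each."""
    state = bytearray(primary)
    for number, entry in enumerate(entries, start=1):
        for offset, dirty in entry:
            # Overwrite the dirty range in place (length stays the same).
            state[offset:offset + len(dirty)] = dirty
        # Intermediate state: some, but not necessarily all, entries applied.
        callback(number, bytes(state))
    return bytes(state)


states = []
final = replay(b'AAAAAAAA',
               [[(0, b'BB')], [(4, b'CC')], [(0, b'DD')]],
               lambda number, state: states.append(state))
print(states)  # [b'BBAAAAAA', b'BBAACCAA', b'DDAACCAA']
```

Only the last element of `states` equals the fully recovered image; the earlier ones are the intermediate states that a normal recovery would never expose.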
For example, we can collect more data for a timeline when multiple timestamps of a registry key are recorded in log entries (which is not unusual).
Take a look at the following timeline built by the yarp-timeline tool (the registry hive is from the 2018 Lone Wolf Scenario):
```
$ yarp-timeline ./Users/jcloudy/NTUSER.DAT | grep -Fa 'Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist\{CEBFF5CD-ACE2-4F4F-9178-9926F41749EA}\Count'
./Users/jcloudy/NTUSER.DAT Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist\{CEBFF5CD-ACE2-4F4F-9178-9926F41749EA}\Count False True 2018-04-06 12:50:40.341634
./Users/jcloudy/NTUSER.DAT Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist\{CEBFF5CD-ACE2-4F4F-9178-9926F41749EA}\Count False True 2018-04-06 12:47:52.767166
./Users/jcloudy/NTUSER.DAT Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist\{CEBFF5CD-ACE2-4F4F-9178-9926F41749EA}\Count False True 2018-04-06 12:43:39.746196
```
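The same filtering can be done in Python to gather the distinct timestamps of one key. This is a sketch, not part of yarp: key_timestamps() is a hypothetical helper, and it assumes only that each line of the yarp-timeline output ends with a date and a time separated by whitespace, as in the lines above. It makes no assumption about the meaning of the other columns.

```python
from datetime import datetime


def key_timestamps(lines, key_path):
    """Return the sorted, distinct timestamps of lines mentioning `key_path`.

    Like `grep -F`, this is a plain substring match. The last two
    whitespace-separated fields of a line are parsed as date and time.
    """
    found = set()
    for line in lines:
        if key_path not in line:
            continue
        fields = line.split()
        stamp = datetime.strptime(fields[-2] + ' ' + fields[-1],
                                  '%Y-%m-%d %H:%M:%S.%f')
        found.add(stamp)
    return sorted(found)
```

Fed the three lines above, it returns three timestamps (12:43:39, 12:47:52 and 12:50:40) in chronological order.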
As you can see, it’s easy to catch additional timestamps!
Extracting data from intermediate states of a registry hive
The yarp library implements a simple interface to access intermediate states of a registry hive when applying transaction log files – a log entry callback.
If a log entry callback function has been assigned to a Registry.RegistryHive instance, this function is called after each log entry is applied (so it can be called many times). From within the callback, the Registry.RegistryHive instance can be used to access everything in the “intermediate hive”, just like in the “normal hive”.
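The callback pattern can be wrapped into a reusable helper that records every state into a list instead of printing it. This is a sketch: collect_states() is a hypothetical name, and `hive` is expected to behave like yarp's Registry.RegistryHive, relying only on the members used by the script in this post (find_key(), value(), data_raw(), last_written_timestamp(), log_entry_callback and recover_auto()). That find_key() and value() return None for missing items is an assumption, marked in the comments.

```python
def collect_states(hive, key_path, value_name, log1, log2):
    """Replay the logs and record (timestamp, raw data) for every hive state."""
    states = []

    def record():
        # Log entry callback: the hive may be in an intermediate state here.
        key = hive.find_key(key_path)
        if key is None:  # Assumption: find_key() returns None if missing.
            states.append((None, None))
            return
        value = key.value(value_name)
        # Assumption: value() returns None if the value is missing.
        data = value.data_raw() if value is not None else None
        states.append((key.last_written_timestamp(), data))

    hive.log_entry_callback = record
    record()  # Record the state before any log entry is applied.
    hive.recover_auto(None, log1, log2)  # Calls record() after each entry.
    return states
```

Returning a list keeps the replay separate from the analysis: the states can be compared, exported or sorted afterwards.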
Let’s take a look at a value in the same registry hive. The code is here:
```python
#!/usr/bin/env python3

from yarp import *

primary = open('/mnt/tmp/Users/jcloudy/NTUSER.DAT', 'rb')
log1 = open('/mnt/tmp/Users/jcloudy/ntuser.dat.LOG1', 'rb')
log2 = open('/mnt/tmp/Users/jcloudy/ntuser.dat.LOG2', 'rb')

hive = Registry.RegistryHive(primary)

previous_data = None

def parse_key():
    """This is a log entry callback."""

    global previous_data

    key = hive.find_key('Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\UserAssist\\{CEBFF5CD-ACE2-4F4F-9178-9926F41749EA}\\Count')
    timestamp = key.last_written_timestamp()

    # UserAssist value names are ROT13-encoded; this one decodes to
    # F:\Programs\Imager_Lite_3.1.1\FTK Imager.exe. A raw string keeps the
    # backslashes from being treated as escape sequences.
    value = key.value(r'S:\Cebtenzf\Vzntre_Yvgr_3.1.1\SGX Vzntre.rkr')
    data = value.data_raw()

    print(timestamp)
    if previous_data is None or data != previous_data:
        print('Data:')
        print(RegistryHelpers.HexDump(data))
        previous_data = data
    else:
        print('Same data')
    print('---')

hive.log_entry_callback = parse_key  # Assign the log entry callback.
parse_key()  # Run it before replaying the log files.
hive.recover_auto(None, log1, log2)  # Replay the log files.

primary.close()
log1.close()
log2.close()
```
The output of that code is:
```
2018-04-06 12:47:52.767166
Data:
00000000 00 00 00 00 01 00 00 00-01 00 00 00 E1 41 02 00 .............A..
00000010 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000020 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000030 00 00 80 BF 00 00 80 BF-FF FF FF FF C0 2B 91 90 .............+..
00000040 A4 CD D3 01 00 00 00 00 ........
---
2018-04-06 12:43:39.746196
Data:
00000000 00 00 00 00 01 00 00 00-01 00 00 00 4E 1B 02 00 ............N...
00000010 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000020 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000030 00 00 80 BF 00 00 80 BF-FF FF FF FF C0 2B 91 90 .............+..
00000040 A4 CD D3 01 00 00 00 00 ........
---
2018-04-06 12:43:39.746196
Same data
---
2018-04-06 12:43:39.746196
Same data
---
2018-04-06 12:47:52.767166
Data:
00000000 00 00 00 00 01 00 00 00-01 00 00 00 E1 41 02 00 .............A..
00000010 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000020 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000030 00 00 80 BF 00 00 80 BF-FF FF FF FF C0 2B 91 90 .............+..
00000040 A4 CD D3 01 00 00 00 00 ........
---
2018-04-06 12:50:40.341634
Data:
00000000 00 00 00 00 01 00 00 00-01 00 00 00 7B D0 04 00 ............{...
00000010 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000020 00 00 80 BF 00 00 80 BF-00 00 80 BF 00 00 80 BF ................
00000030 00 00 80 BF 00 00 80 BF-FF FF FF FF C0 2B 91 90 .............+..
00000040 A4 CD D3 01 00 00 00 00 ........
---
```
So we got six states of a single registry value, and three of them have distinct value data. The states differ only in the 4 bytes at offset 12 (0x0C).
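A quick way to confirm which bytes change between states is to diff consecutive value data byte by byte. In this sketch, changed_offsets() and userassist_entry() are hypothetical helpers; the latter rebuilds the 72-byte value data from the hex dumps above, substituting only bytes 12 to 15, and the six states are listed in the order the script printed them.

```python
def changed_offsets(states):
    """Return the sorted byte offsets that differ between consecutive states."""
    offsets = set()
    for before, after in zip(states, states[1:]):
        if len(before) != len(after):
            raise ValueError('consecutive states differ in length')
        offsets.update(i for i, (a, b) in enumerate(zip(before, after))
                       if a != b)
    return sorted(offsets)


def userassist_entry(focus_hex):
    """Rebuild the 72-byte value data from the dump, varying bytes 12-15."""
    return (bytes.fromhex('00000000 01000000 01000000')
            + bytes.fromhex(focus_hex)
            + bytes.fromhex('000080BF') * 10
            + bytes.fromhex('FFFFFFFF C02B9190 A4CDD301 00000000'))


# The six states, in the order printed by the script above.
states = [userassist_entry(h) for h in (
    'E1410200', '4E1B0200', '4E1B0200', '4E1B0200', 'E1410200', '7BD00400')]
print(changed_offsets(states))  # [12, 13, 14]
```

Offset 15 is not reported because the most significant byte of the field is zero in all three values; every reported offset falls inside the 4-byte field at offset 12.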
Since we were parsing the UserAssist key, we can find the meaning of these bytes in existing documentation: these bytes represent the focus time.
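The focus time can be decoded with Python's struct module. This sketch assumes the commonly documented layout of UserAssist value data on Windows 7 and later, in which bytes 12 to 15 hold a little-endian unsigned 32-bit focus time in milliseconds. focus_time() is a hypothetical helper, and the inputs are the first 16 bytes of the three distinct value data states from the hex dumps above.

```python
import struct
from datetime import timedelta


def focus_time(data: bytes) -> timedelta:
    """Decode the focus time (assumed: uint32 LE milliseconds at offset 12)."""
    (milliseconds,) = struct.unpack_from('<I', data, 12)
    return timedelta(milliseconds=milliseconds)


for label, prefix in (
        ('12:43:39', '00000000 01000000 01000000 4E1B0200'),
        ('12:47:52', '00000000 01000000 01000000 E1410200'),
        ('12:50:40', '00000000 01000000 01000000 7BD00400')):
    print(label, focus_time(bytes.fromhex(prefix)))
# 12:43:39 0:02:18.062000
# 12:47:52 0:02:27.937000
# 12:50:40 0:05:15.515000
```

Read in timestamp order, the focus time only grows, which fits the idea that each log entry captured a later state of the same counter.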
An attentive reader may notice that the last written timestamp taken from the dirty primary file (12:47) is “in the future”: it is later than the last written timestamp taken from the first log entry (12:43). This is explained here.
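Such cases are easy to spot once the timestamps of all states are collected in order. The sketch below flags every state whose timestamp is later than the one in the next state; out_of_order() is a hypothetical helper, and the six timestamps are those printed by the script above.

```python
from datetime import datetime


def out_of_order(timestamps):
    """Return indices i where state i is later than state i + 1."""
    return [i for i in range(len(timestamps) - 1)
            if timestamps[i] > timestamps[i + 1]]


# Last written timestamps of the six states, in the order observed.
observed = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f') for s in (
    '2018-04-06 12:47:52.767166',  # dirty primary file
    '2018-04-06 12:43:39.746196',  # after the first log entry
    '2018-04-06 12:43:39.746196',
    '2018-04-06 12:43:39.746196',
    '2018-04-06 12:47:52.767166',
    '2018-04-06 12:50:40.341634',
)]
print(out_of_order(observed))  # [0]
```

Only index 0, the state of the dirty primary file, is flagged; equal timestamps are not treated as out of order.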