A long time ago, when I was a kid, I used to play a video game that I really liked... I'm not going to name it because I'm not sure if anyone still cares for this game, but it was released in 2000 for Windows 98 and I think there were rumors of a PlayStation1 release, but that never happened. The game is currently sold on GoG, Steam, and probably a few other platforms. It runs on Windows and Linux thanks to Steam's Proton. Though I hear the GoG version runs on plain Wine as well.
Well damn it, I was not satisfied that I couldn't run this 25-year-old game at my 4K resolution and 144Hz. So like any self-respecting masochistic ignorant nerd I said "Sure, I can remake this in Rust". This is how my journey started. According to Ghidra, the game is ~260K lines of code when decompiled. I'm not "dreaming" of remaking this game in Rust, but loading all of the assets in Bevy actually helped the reverse engineering efforts, so for now that's where I'm heading. The information in this blog post isn't new (shoutout to the FringeSpace folks for advice and moral support). However, a lot of the reverse engineering efforts have been ad hoc, and it sounds like I have already more or less caught up with what has been recovered in the past 25 years. I want to attract and teach people to reverse engineer games for modding and preservation, and also to have a single repository of information about this specific game's file structures, so that's why I'm writing this blog post series.
You don't need to know reverse engineering, ASM, ghidra, or imhex to enjoy this blog post, but you need to know your C.
So let's dive right into it. This post's goal is to figure out how the game stores its assets, whatever those are. I downloaded the game through Steam and looked at the files:
The command just sorts files by size and removes dlls from view. The big outlier is Tachyon.pff, taking up the majority of the folder's size - 365MB. That's likely our target. I tried running common file ID tools against it and searching online, and while I did find the FringeSpace community who had already RE'd the file - the consensus was that the PFF file structure was internal to the company that developed this game. So what can we do? Well, the game reads the file... so it should have code inside of it that reads the file... let's open it in ghidra and see what happens.
I should note that I made a mistake already - Tachyon.exe is a small intro app that lets you browse the authors' website, look for updates, and configure the game before running it. This isn't the actual game, which is the 1.8MB space.exe executable that you see in the file listing above. However, this mistake turned out to be fruitful in the end - the main game binary is obfuscated while this intro app isn't. At the same time - both the intro app and the main game load the Tachyon.pff file - so we're in luck!
When you open Tachyon.exe in Ghidra it presents you with this view. I'd like to note that my view isn't quite the default - I adjusted the color theme and added a few windows I find useful at the bottom. All of these can be added from the Window pulldown menu. Quick note about Jython - it's Python with Ghidra's Java bindings. It can be useful for quick scripting, but I find having the console around really nice for quick dec->hex->bin conversion.
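To give an idea of what those quick conversions look like, here is a small sketch using only Python built-ins, which should behave the same way in the Jython console. The address is the one that shows up later in this post; the other values are just examples I made up.

```python
# Quick dec <-> hex <-> bin conversions with Python built-ins.
addr = 0x42e578                  # hex literal, stored as a plain integer

as_decimal = str(addr)           # integer -> decimal text
as_hex = hex(addr)               # integer -> hex text with a "0x" prefix
as_binary = bin(0xff)            # integer -> binary text with a "0b" prefix

from_hex = int("42e578", 16)     # hex text -> integer
from_binary = int("11111111", 2) # binary text -> integer

print(as_decimal, as_hex, as_binary, from_hex, from_binary)
```

Nothing fancy, but it saves a trip to a calculator every time an offset in the listing needs to be compared against a number in a file.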
So at this point there are several things we can do. Before I dive into what I did - let's think for a second: what is reverse engineering? I like to think of RE as pumping context into math, and you are the pump. Context being this abstract multidimensional latent space of everything you know. Ok, so what do you know about this game? What do I know about it? Well, I know I want to find out how to read the tachyon.pff file. So I can search for strings and see if one of them is tachyon.pff. I also know that the only way to open files is to ask Windows to open them for you - so we can look for variations of the open() function.
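Ghidra has its own string search, but the idea behind it is simple enough to sketch outside of it. The snippet below is a rough approximation, not what Ghidra does internally: it scans raw bytes for runs of printable ASCII and then filters them by a keyword. The function names and the sample bytes are made up for illustration, and it ignores UTF-16 strings entirely.

```python
import re

def find_ascii_strings(data, min_len=4):
    """Yield (offset, text) for every run of printable ASCII bytes."""
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}"
    )
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

def grep_strings(data, needle, min_len=4):
    """Return the strings that contain needle, ignoring case."""
    needle = needle.lower()
    return [
        (offset, text)
        for offset, text in find_ascii_strings(data, min_len)
        if needle in text.lower()
    ]

# Made-up bytes standing in for a chunk of an executable.
sample = b"\x00\x01PFF LOADED FILE\x00\xff\xfeab\x00_lopen\x00"

all_hits = list(find_ascii_strings(sample))
pff_hits = grep_strings(sample, "pff")
open_hits = grep_strings(sample, "open")
```

On a real binary you would read the file with `open(path, "rb").read()` and pass that in instead of `sample`. Searching for a fragment like "pff" rather than the whole file name is the important habit here, since the full name may never appear in the binary.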
When searching for strings - I was not able to find the whole word - tachyon.pff in the file. Bummer. But I was able to find these really interesting strings - this has to be used somewhere where I need to look, right?
If we click on one of the strings - the listing view will take us to it. From there we can see that Ghidra found exactly one function that references this address (the XREF label) - click on that function and you'll see:
Ok, where is that string?
Unfortunately the analysis didn't propagate the fact that this data is a human-readable text to the decompiler. Let's look back to where the string is stored - it's at 0x42e578... Oh look the code lists that as a fourth argument to this other function call - FUN_004063a0. We can right click on it and change parameter definitions:
Ok, so looking at the code now: we call FUN_0040aec0, then if the resulting variable is 0, we return null. Otherwise we do something and call a function that says "PFF LOADED FILE". Ok, that sounds like we're in the right place.
Remember how I mentioned that reverse engineering is pumping context into math? Well, we are looking at some function and we just got a bit of context - it loads a pff file successfully. Now we need to nudge context in from random directions until we bring enough of it to figure out what this function does. I like strings. Human-readable strings are where a lot of context lives for us. During my first run-in with this file I went the manual way. I looked at this function, looked at other functions nearby, looked for the KERNEL32.DLL::_lopen() function and saw who called that... Eventually I brought in enough context to figure this function out. However, I also developed a few scripts to help me along the way. One of them is a modification of ghidra's standard recursive string finder - it now prints not only strings, but also function names and static label names that don't start with FUN_ or DAT_ or LAB_ - essentially everything that I have manually named already. Let's run that script on this function:
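The script itself leans on Ghidra's API, so I won't pretend this is it, but the logic is easy to sketch in plain Python. Assume the call graph and the strings referenced by each function have already been pulled out into dictionaries; every name and string below is a toy placeholder, not something from the game.

```python
# Prefixes Ghidra gives to things nobody has named yet.
AUTO_PREFIXES = ("FUN_", "DAT_", "LAB_")

def is_manually_named(name):
    """True if the name does not look auto-generated."""
    return not name.startswith(AUTO_PREFIXES)

def collect_context(func, calls, strings, seen=None):
    """Depth-first walk from func, gathering strings and named callees."""
    if seen is None:
        seen = set()
    if func in seen:            # guards against recursion in the call graph
        return []
    seen.add(func)

    found = list(strings.get(func, []))
    for callee in calls.get(func, []):
        if is_manually_named(callee):
            found.append(callee)
        found.extend(collect_context(callee, calls, strings, seen))
    return found

# Toy data: FUN_a and FUN_b call each other, FUN_a also calls a named helper.
calls = {"FUN_a": ["FUN_b", "my_named_helper"], "FUN_b": ["FUN_a"]}
strings = {"FUN_a": ["toy string one"], "FUN_b": ["toy string two"]}

context = collect_context("FUN_a", calls, strings)
print(context)
```

The `seen` set matters more than it looks: real call graphs are full of cycles, and without it the walk never ends.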
Oh look at that - devs were kind enough to even leave us function names in the log strings. So FUN_004063a0 is "mem_GetMemEx", FUN_0040fe1b frees something on the heap, FUN_00407260 is a wrapper for _llseek, FUN_0040b220 shows a MessageBoxA with an error message - so that's definitely an error handler of sorts - even more - FUN_0040fc1a accepts a format string as a SECOND argument - I bet that's fprintf! So let's spend some time renaming nearby functions that we can figure out. Eventually we get:
Oh wow that looks... Quite reasonable! And all I did was rename some functions based on what other strings or function calls they had that I knew about. Neat! Ok, so I'm guessing here, but it looks like FUN_0040aec0 maybe reads the file? Overall, param_2 must be a FILE *. Now what's this odd check after we read the file..?
Bit operations? That's odd. XOR? If your spidey-senses aren't tingling yet - don't fret. The year is 2025. This function calls no other functions, and doesn't de-reference anything - it's a perfect contender for what I call "vibe decoding".
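Why should XOR make anyone's ears perk up? Because XOR with a fixed key undoes itself, which makes it a cheap way to scramble data. The sketch below is a generic repeating-key XOR with a key I made up; it is not the game's actual routine, just a demonstration of that one property.

```python
def xor_transform(data, key):
    """XOR every byte of data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"\x5a\xc3"                 # made-up key, purely illustrative
plain = b"hello pff"

scrambled = xor_transform(plain, key)     # looks like noise
restored = xor_transform(scrambled, key)  # same call, same key

print(scrambled.hex())
print(restored)
```

The same function does both directions, so when a decompiled routine is nothing but shifts and XORs over a buffer with a constant, it is worth checking whether running it twice gives back the input.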
Nice! You can read more about what it is but essentially it's a function that can decrypt or encrypt data. Run encrypted data through it - and you get plaintext. Run plaintext through it - and you get encrypted data. Though "encrypted" is weak by modern standards. But hint hint - looking on online forums you'll find references that the game's files are encrypted. And conveniently the decryption key is right there in uVar1. At this point I spent some time going from system calls and checking what kinds of data they accept to propagate everything. Eventually our target function FUN_40b0a0 will look like this:
I asked AI for help several more times during this effort - this "allocatedTaggedBlock" function is part of a custom memory management engine that's found all throughout the game. You'll also notice that only one other function calls this function. That function also has a nice little error string inside of it which names it - FUN_407960 is called "file_LoadFileEx". From there I spent some more time marking nearby functions. It turns out that the main pff file name isn't stored in the binary - it's loaded from a sort of config file called front.cfg. But overall, the read_resource function has everything you need to read the PFF file. Here's the resulting pff extractor: