MHServerEmu Progress Report: November 2023
Hey everyone! Crypto here with the very first MHServerEmu progress report. I will try to do these semi-regularly to shine a light on all the latest things we are working on. Without further ado, let’s dive right in!
Game Database
While things have been on the quiet side this month, that doesn’t mean nothing is happening. Our biggest roadblock has been the game database, and rebuilding it will most likely remain our highest priority for the foreseeable future. So what is it, and why is it taking so long?
The thing is, Marvel Heroes is engineered in a way that’s somewhat unusual for an online RPG. Usually the client contains the bare minimum of information it needs to function, and one of the most time-consuming parts of reverse engineering a server for a game like this is collecting data and creating an approximation of things like where all the enemies should be, how much damage they should deal, what loot they should drop, their AI, scripting, and so on. This can be done using sophisticated tools that analyze packets received from the server, by watching what is happening in-game, or even by simply guessing in some cases. The problem is, the servers for Marvel Heroes are long gone, and it’s impossible to gather any new data, right? Not necessarily.
As we discovered while doing our research, the game client actually contains a complete mirror of all the data used by the server. In a way, Marvel Heroes is built more like a single-player game with an optional multiplayer mode, not unlike Diablo II, but with the single-player functionality stripped out at the last moment. So while the client lacks the “glue” that holds everything together, all the individual pieces that make the game what it is, also known as prototypes, are completely preserved. But there are a lot of pieces (93114 prototypes in the version we are currently working with, plus thousands of auxiliary data files), and putting them together is tricky.
Gazillion developed their own custom framework for managing game data called Calligraphy. It appears most of the actual game was made with it, and it must have also included a set of custom tools reminiscent of programs like the Warcraft III World Editor:
In fact, this is a common way of developing video games, known as data-driven design. Engineers build a set of tools that expose various systems to game designers, who in turn define what the actual game is through data, such as which abilities a hero has access to or how much damage an enemy deals. These tools often include some form of scripting that lets designers define behaviours as well (like the trigger editor from Warcraft III shown in the screenshot above).
So while we do have all the original data for Marvel Heroes, to make use of it we also need to reverse engineer the framework it ran on. Calligraphy is a powerful one: it features fairly advanced scripting capabilities and was used to define enemy AI, mission logic, damage formulas, and much more. And as we know, with great power comes great responsibility: there are a lot of moving parts, and just setting everything up takes many steps.
A good example of this is the hierarchy of data:
- To interface with loaded data, there are over a thousand C++ classes that heavily rely on inheritance.
- Data is deserialized into these classes using a combination of blueprint and prototype files: blueprints contain field definitions, and prototypes contain the actual values.
- Each blueprint is paired with a default prototype that contains default values for a given blueprint. Most of the prototypes that are actually used in the game inherit from these default prototypes and override values as needed. For example, there is a blueprint / default prototype for an avatar, and prototypes for each playable hero override this default avatar prototype. This is a simple example, but there are often multiple levels of data inheritance here.
- Blueprints have their own inheritance that is really more like composition: each blueprint defines its own field group, but it can also reference other blueprints. And then a prototype gets its field groups from the blueprint it is directly bound to and all the other blueprints referenced in it.
- Some prototypes have “mix-in” prototypes that you are supposed to pass some of the field groups to.
- There are also resource prototypes that completely ignore everything mentioned above and use custom deserialization routines. However, they still fit into the same C++ class hierarchy.
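To make the blueprint/prototype relationship a little more concrete, here is a minimal Python sketch of two of the rules above. It is an illustration under simplifying assumptions, not how Calligraphy is actually implemented: the class names, blueprint names, and field names (EntityBlueprint, AvatarBlueprint, BaseHealth, and so on) are all hypothetical, and mix-ins and resource prototypes are left out entirely. A prototype's available fields come from its blueprint plus every blueprint that blueprint references, and a value that a prototype doesn't set is looked up through its parent chain until the default prototype is reached.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Blueprint:
    """Field definitions. Referenced blueprints contribute their field groups (composition)."""
    name: str
    fields: dict[str, type]  # this blueprint's own field group: field name -> expected type
    references: list[Blueprint] = field(default_factory=list)

    def all_fields(self) -> dict[str, type]:
        # Gather field groups from referenced blueprints first, then this blueprint's own.
        # No cycle detection: this sketch assumes references form a tree.
        collected: dict[str, type] = {}
        for ref in self.references:
            collected.update(ref.all_fields())
        collected.update(self.fields)
        return collected


@dataclass
class Prototype:
    """Actual values. Unset fields fall back to the parent, ending at the default prototype."""
    name: str
    blueprint: Blueprint
    values: dict[str, Any]
    parent: Optional[Prototype] = None

    def get(self, field_name: str) -> Any:
        if field_name not in self.blueprint.all_fields():
            raise KeyError(f"{self.blueprint.name} defines no field {field_name!r}")
        proto: Optional[Prototype] = self
        while proto is not None:
            if field_name in proto.values:
                return proto.values[field_name]
            proto = proto.parent
        raise KeyError(f"no value for {field_name!r} in {self.name} or its parents")


# Hypothetical example data: an avatar blueprint that references a more generic entity blueprint.
entity_bp = Blueprint("EntityBlueprint", {"DisplayName": str})
avatar_bp = Blueprint("AvatarBlueprint", {"BaseHealth": int}, references=[entity_bp])

default_avatar = Prototype("DefaultAvatar", avatar_bp, {"DisplayName": "Avatar", "BaseHealth": 100})
rogue = Prototype("Rogue", avatar_bp, {"DisplayName": "Rogue"}, parent=default_avatar)

print(rogue.get("DisplayName"))  # Rogue (overridden by the hero prototype)
print(rogue.get("BaseHealth"))   # 100 (inherited from the default prototype)
```

In the real game there can be several levels between a hero and the default prototype; the parent-chain walk handles that the same way, one level at a time.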
As you can see, there’s a lot to unpack here, and the devil is in the details. Figuring all of this out has been very time-consuming, but progress is definitely being made. And once we get it all up and running, it’s going to open the way for implementing all sorts of in-game systems.
MHDataParser
When we began investigating what data was even present in the client, we started by examining various file types on their own, which led to the server emulator getting some data parsing and exporting functionality. As our understanding of the bigger picture grew, it became evident that rather than using the data structures in the files as-is, the game actually does a significant amount of post-processing during initialization. So this functionality became harder and harder to maintain as our implementation of the game database matured.
However, there is a lot of value in having raw data parsed and represented as-is in a more readable format. It’s more version-agnostic, it can be used for datamining, and eventually it could even be expanded to support modding game data. So, to keep the main codebase clean without losing anything, I moved raw data parsing into a separate tool called MHDataParser. I tested it with various builds of the game, and so far it seems to work on PC clients going as far back as June 2015, as well as on console ports. It won’t work with 2013-2014 versions of the game due to differences in the data archive format, but supporting them is definitely possible in the future.
It’s essentially a piece cut from an older MHServerEmu version, so it works very similarly: you copy game data files to the tool’s subdirectory, run it, wait for it to initialize, and then enter commands to export the data you want to human-readable JSON/TSV files. However, all the file definitions are up to date with the latest versions of the server, and there’s even a little bit of extra functionality that allows you to parse and export locale and string files.
As I was working on this, I made a pretty hilarious discovery: in earlier versions of the game there is a hidden “Pig Latin” localization that was used for testing:
And with a little bit of file renaming, hex editing, and config adjustment I was able to get it running in version 1.52:
Not even S.H.I.E.L.D. Agent Stan Lee is safe from it (or should I say, Say.Hay.Iay.Eay.Lay.Day Agentay Anstay Eelay):
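Judging only by that one example, the test locale looks like textbook Pig Latin: a word's leading consonants move to the end, "ay" is appended, and the capital letter stays at the front. The Python sketch below reproduces the example under that assumption, treating every run of letters as a word so that abbreviations like S.H.I.E.L.D. are converted letter by letter. The function names are made up, and the real test locale may follow different rules (for instance around punctuation, "y", or all-caps words).

```python
import re

VOWELS = set("aeiouAEIOU")


def pig_latin_word(word: str) -> str:
    """Move the leading consonant cluster to the end and append 'ay'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    if i == 0:
        result = word + "ay"  # vowel-initial words just get the suffix
    else:
        result = word[i:] + word[:i].lower() + "ay"
    # Keep the original word's leading capital on the new first letter.
    if word[0].isupper():
        result = result[0].upper() + result[1:]
    return result


def pig_latin(text: str) -> str:
    # Every run of ASCII letters is a "word"; dots, spaces, etc. pass through untouched.
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group()), text)


print(pig_latin("S.H.I.E.L.D"))     # Say.Hay.Iay.Eay.Lay.Day
print(pig_latin("Agent Stan Lee"))  # Agentay Anstay Eelay
```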
Fun aside, this is actually a working proof of concept for translating the game into more languages. And far more ambitious mods may also become possible further down the line.
Region Generation
Even though a lot of our effort has gone into solving the game database conundrum, Alex has also been working very hard on reverse engineering procedural region generation. While it’s still very much a work in progress, I will give you a brief overview of what it entails.
As you probably know, Marvel Heroes has a lot of Diablo DNA that brings with it heavy reliance on procedural generation, so every time you play the game you can explore slightly different zones, fight different enemies, and get random loot. The game world is actually structured like this:
- There are a number of games running on a server. Each game is very similar to a game on Battle.net in Diablo II, but you transition in and out of them seamlessly.
- Each game hosts a number of regions, which are places like Avengers Tower, Midtown, and so on. As you transition between regions, you also often transfer between different games and even servers.
- Each region consists of areas that have their own names and other characteristics. You can go to different areas in a region without a loading screen by moving around. For example, in the Madripoor region the first area would be Buccaneer Beach, from which you can go to Bamboo Forest, and then eventually reach Lowtown.
- Each area is built from cells, which are basic building blocks of the game world.
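The nesting described above can be sketched as a simple Python data structure. This is a hypothetical model for illustration only: the server names, game ids, cell names, and the area names used for Avengers Tower and Midtown are made up, and only the Madripoor region and area names come from the text. It reports the outermost level of the hierarchy that changes when something moves from one location to another; per the list above, moving between areas of the same region happens without a loading screen.

```python
from __future__ import annotations

from dataclasses import astuple, dataclass
from typing import Optional

# Outermost to innermost container.
LEVELS = ("server", "game", "region", "area", "cell")


@dataclass(frozen=True)
class WorldLocation:
    """Where an entity is, from the server down to the cell it stands in."""
    server: str
    game: int
    region: str
    area: str
    cell: str


def transition_kind(src: WorldLocation, dst: WorldLocation) -> Optional[str]:
    """Return the outermost level that differs between two locations, or None if none does."""
    for level, before, after in zip(LEVELS, astuple(src), astuple(dst)):
        if before != after:
            return level
    return None


def is_seamless(kind: Optional[str]) -> bool:
    # Moving between areas (and cells) of one region needs no loading screen.
    return kind in (None, "area", "cell")


# Hypothetical locations: servers, game ids, cells, and the tower/midtown area names are invented.
beach = WorldLocation("server-1", 7, "Madripoor", "Buccaneer Beach", "beach_cell_03")
forest = WorldLocation("server-1", 7, "Madripoor", "Bamboo Forest", "forest_cell_01")
tower = WorldLocation("server-1", 3, "Avengers Tower", "tower_area", "tower_cell")
midtown = WorldLocation("server-1", 12, "Midtown", "midtown_area", "midtown_cell_05")

print(transition_kind(beach, forest))   # area
print(transition_kind(tower, midtown))  # game
```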
The way Marvel Heroes approaches area generation is actually very similar to Diablo III, and I highly recommend watching this talk on Diablo Dungeon Design by Ed Hanes from Blizzard, where he goes quite in-depth and even demonstrates some of their internal tools. Unfortunately, the video is age-restricted, so I can’t embed it here, but I will add some of the slides relevant to Marvel Heroes.
Cells, just like tiles in Diablo, are classified by their cardinal directions:
These cells are combined into areas according to sets of rules defined by game designers in development tools like these:
Marvel Heroes uses a number of different region and area generators for various cases:
Some regions are actually static and do not use procedural generation. But they still follow the same general rules and are built out of cells: regions like Avengers Tower can get away with using a single huge cell, while places like Midtown are collections of regular cells in a predefined layout, also known as a district.
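The cell rules above can be illustrated with a small Python sketch, assuming Diablo-style cells whose openings on each cardinal side are stored as bit flags. Everything here (the Exit flags, the catalog of cell names, and the corridor generator) is hypothetical and far simpler than the real area generators: it only lays out a west-to-east strip in which every neighbouring pair of cells has matching openings, choosing among interchangeable cells with a seeded random generator so the same seed always reproduces the same layout.

```python
from __future__ import annotations

import random
from enum import IntFlag


class Exit(IntFlag):
    """Sides of a cell that have an opening (an assumed encoding)."""
    N = 1
    E = 2
    S = 4
    W = 8


OPPOSITE = {Exit.N: Exit.S, Exit.S: Exit.N, Exit.E: Exit.W, Exit.W: Exit.E}

# Hypothetical catalog of cells: name -> openings.
# plaza_nesw is never picked for a corridor, since corridor cells need exactly E and/or W openings.
CATALOG = {
    "dock_end_e": Exit.E,
    "beach_a_ew": Exit.E | Exit.W,
    "beach_b_ew": Exit.E | Exit.W,
    "jungle_ew": Exit.E | Exit.W,
    "gate_end_w": Exit.W,
    "plaza_nesw": Exit.N | Exit.E | Exit.S | Exit.W,
}


def can_connect(a: Exit, b: Exit, direction: Exit) -> bool:
    """Can cell b sit on the `direction` side of cell a? Openings must line up."""
    return bool(a & direction) == bool(b & OPPOSITE[direction])


def generate_corridor(length: int, seed: int) -> list[str]:
    """Lay out a west-to-east strip of cells whose openings match exactly."""
    if length < 2:
        raise ValueError("a corridor needs at least two cells")
    rng = random.Random(seed)  # same seed -> same layout
    layout = []
    for i in range(length):
        required = Exit(0)
        if i > 0:
            required |= Exit.W  # connect back to the previous cell
        if i < length - 1:
            required |= Exit.E  # leave an opening for the next cell
        candidates = sorted(name for name, exits in CATALOG.items() if exits == required)
        layout.append(rng.choice(candidates))
    return layout


print(generate_corridor(4, seed=1))
```

A static district would skip the random choice entirely and place cells from a predefined layout, but the same opening-matching rule would still have to hold between neighbours.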
Region generation heavily relies on data from the game database, so it won’t be out for a little while, but there may be something exciting happening in the experimental branch sooner.
Documentation
Another thing that happened this month is that I took some time to overhaul our documentation:
- All existing information was restructured and updated with latest discoveries.
- There are two new sections: Networking lays out some of our understanding of the client <-> server communication, while Game Data focuses on the intricacies of file formats, prototypes, and all the other game database related topics.
While it’s still nowhere near its final form, hopefully this will help potential developers get up to speed with what we’ve been doing. When I first started doing this, the lack of pretty much any publicly available technical information on Marvel Heroes was one of the hardest hurdles to overcome, especially when compared to many other well-documented online games.
If you have any suggestions for topics you would like to see covered, be sure to let us know!
Version Research
On a separate note, I’ve been doing some digging into the various versions of Marvel Heroes that we have access to. As you may know if you’ve read our documentation, there are almost 700 different client builds still available to download from Steam if you have ever played the game and still have a license on your account. So what I’ve been doing is downloading and documenting every single one of them. It’s a massive amount of data to go through, and you may wonder if it’s even worth doing. In my opinion, it is, for three reasons: preservation, getting additional data to reference, and finding internal builds.
Preservation should be pretty self-explanatory. As a live service game, Marvel Heroes went through a lot, and many of these moments were rather fleeting. Whether it was gathering resources to open the Bifrost for the very first time, an event you spent a few too many late nights grinding through, or just a memorable login screen, the only way to keep definitive records of all of this is through old versions of the game.
Another important result of this task is getting additional data to reference, which can be extremely useful for development. A very good example of this is version 1.25 released on July 31, 2014. The following was mentioned in the patch notes by the infamous master of spoilers Doomsaw himself:
This update features major changes to how the game processes certain aspects of data, including how data it transmitted to you and players around you.
This will result in a decent improvement to server performance as well as client performance, depending on your exact system specs, CPU and GPU.
To explain what this really means, we need to go through some fundamental aspects of the game’s netcode. At the heart of it are Google’s Protocol Buffers (protobufs): in case you are unfamiliar with them, they are kind of like XML or JSON, but instead of human-readable text, your data is serialized into binary using a custom wire format. They are widely used in video games (for example, Blizzard does a lot of Battle.net communication with protobufs), but as a general-purpose technology they can be suboptimal in cases where you have thousands of messages to process and every microsecond counts.
To work around this, Gazillion developed a custom archive system based on the protobuf wire format. Without going too in-depth here, what they essentially did is cut all the extra bells and whistles that make protobufs more flexible while keeping all the tricks that reduce message size and serialization time. And a significant part of the optimizations done in 1.25 is actually taking some of the “heavier” and frequently used protobuf messages (NetMessageEntityCreate, NetMessageLocomotionStateUpdate, NetMessageActivatePower, NetMessagePowerResult, and NetMessageEntityEnterGameWorld), and expanding the custom archive system to include them.
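For readers unfamiliar with the wire format, here is a rough Python sketch of the size argument. The varint and tag encoding follow the standard protobuf wire format: each field is prefixed by a tag, (field_number << 3) | wire_type, and integers are stored as base-128 varints. The untagged encoder is a hypothetical stand-in for the general idea of an archive where both sides already know the field order; it is not Gazillion's actual archive format.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_tagged(fields: list[tuple[int, int]]) -> bytes:
    """Protobuf-style: every field is preceded by its tag (wire type 0 = varint)."""
    out = bytearray()
    for number, value in fields:
        out += encode_varint((number << 3) | 0)
        out += encode_varint(value)
    return bytes(out)


def encode_untagged(values: list[int]) -> bytes:
    """Hypothetical archive: field order is fixed in code on both ends, so tags are dropped."""
    return b"".join(encode_varint(v) for v in values)


def decode_untagged(data: bytes, count: int) -> list[int]:
    """Read `count` varints back in the agreed-upon order."""
    values, pos = [], 0
    for _ in range(count):
        value, pos = decode_varint(data, pos)
        values.append(value)
    return values


print(encode_tagged([(1, 150)]).hex())                  # 089601
print(len(encode_tagged([(1, 1000), (2, 5), (3, 1)])))  # 7
print(len(encode_untagged([1000, 5, 1])))               # 4
```

The saving comes at the cost of flexibility: without tags, generic tooling can't skip or identify fields, and adding or reordering a field means changing the reader and writer in lockstep.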
For example, let’s take a look at NetMessageEntityEnterGameWorld as it was in 1.24:
```protobuf
message NetMessageEntityEnterGameWorld {
    required uint64 entityId = 1;
    required NetStructPoint3 position = 2;
    optional NetStructPoint3 orientation = 3;
    optional int32 avatarWorldInstanceId = 4;
    optional NetStructLocomotionState locomotionState = 5;
    optional uint64 entityPrototypeId = 6;
    optional bool isClientEntityHidden = 7;
    optional bool newOnServer = 8;
}

message NetStructLocomotionState {
    optional uint32 locomotionflags = 1;
    optional int32 method = 2;
    optional float movespeed = 3;
    optional uint32 height = 4;
    optional uint64 followentityid = 5;
    optional float followentityrange = 6;
    required bool updatepathnodes = 7;
    repeated NetStructLocomotionPathNode pathnodes = 8;
    optional int32 pathgoalnodeindex = 9;
}

message NetStructLocomotionPathNode {
    required NetStructPoint3 vertex = 1;
    required int32 vertexSideRadius = 2;
}
```
And here is NetMessageEntityEnterGameWorld in 1.25+:
```protobuf
message NetMessageEntityEnterGameWorld {
    required bytes archiveData = 1;
}
```
As you can see, a side effect of this optimization is that some messages became a lot less verbose, and therefore less readable. Although it is possible to figure out the overall structure by decompiling and examining the deserialization routine, it takes more time to do blindly, and you have to guess most of the field names. So in some cases looking at older versions of the game can provide invaluable insight for reverse engineering the more recent builds.
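The loss of readability is easy to see at the wire level. In 1.25+, a generic protobuf decoder sees NetMessageEntityEnterGameWorld as nothing more than field 1 with wire type 2 (length-delimited): a tag byte, a length, and an opaque blob, with everything that used to be a named field hidden inside archiveData. The Python sketch below wraps and unwraps such a field using standard protobuf encoding rules; the blob contents are placeholders, since interpreting the real archive requires the reverse-engineered deserialization routine.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


ARCHIVE_DATA_TAG = (1 << 3) | 2  # field 1, wire type 2 (length-delimited) -> 0x0A


def wrap_archive(archive: bytes) -> bytes:
    """Serialize a message whose only field is `required bytes archiveData = 1`."""
    return encode_varint(ARCHIVE_DATA_TAG) + encode_varint(len(archive)) + archive


def unwrap_archive(message: bytes) -> bytes:
    """Extract the raw archive bytes; what they mean is up to the archive's own reader."""
    tag, pos = decode_varint(message, 0)
    if tag != ARCHIVE_DATA_TAG:
        raise ValueError(f"unexpected tag {tag:#x}")
    length, pos = decode_varint(message, pos)
    return message[pos:pos + length]


blob = bytes([0x01, 0x02, 0x03])  # placeholder archive contents
print(wrap_archive(blob).hex())  # 0a03010203
```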
Finally, not all builds are made equal. Most publicly available game clients are compiled using the Shipping configuration that automatically removes all the spicy stuff, such as the developer console and cheats. On the other hand, Internal builds contain everything but the kitchen sink (and even that after Deadpool’s level 52 review). For instance, MichaelMayhem and Ryolnir demonstrated some of their dev might during the Rogue preview livestream in August 2014:
Turns out, there were a number of slip-ups, and some of these internal builds actually ended up being uploaded to Steam. So far I’ve found five of them, the earliest being 1.10.0.69 from late May 2013 and the newest so far 1.0.4932.0 from June 2015. While we have no way of getting them up and running right now, eventually it’s going to be possible, and we may end up being able to access forbidden powers reserved only for the few. Exciting!
And this wraps it up for today. Thanks for reading (or scrolling) all the way to the end, and I hope to see you all again next time!