Playing around with the s2protocol

Finally, last week Blizzard released something official to parse replays properly: s2protocol. Before that, there were a bunch of scripts parsing StarCraft 2 replays more or less successfully. But with every SC2 patch, you had to be afraid that your script wouldn’t work anymore.
Blizzard now says that they support every old and every new version of the replays:

s2protocol supports all StarCraft II replay files that were written with retail versions of the game. The current plan is to support all future publicly released versions, including public betas.

The s2protocol is written in Python and comes with a small Python script that parses a replay and prints the results as Python objects/dicts. That’s pretty cool for easy and quick testing of its abilities.
The great thing is that it provides you with a huge amount of data: from the lobby state (slots, etc.), through player settings, battlenet IDs and regions, to all actions done in the match, like moving the camera, selecting, moving and building units, chat messages, etc… Ah, and of course the players and the results ;) And muuuuch more!
The horrible thing is: It’s not documented! It… is… NOT… DOCUMENTED!! And this can drive you crazy! I was facing a couple of problems for which I haven’t found any solutions on the web yet (neither on teamliquid nor day9…) but which I was able to solve. And I want to tell you what I have found out so far:

General info

First I want to tell you what you can expect from the different data the script is giving you:

Trackerevents

Everything related to the units/infrastructure of a player. Here you can find events when a unit is completed, has died, was moved or an upgrade is finished.
One thing which took me a while to understand is, what they write at GitHub for tracker events:

NNet.Replay.Tracker.SUnitInitEvent events appear for units under construction. When complete you’ll see a NNet.Replay.Tracker.SUnitDoneEvent with the same unit tag.
NNet.Replay.Tracker.SUnitBornEvent events appear for units that are created fully constructed.

A unit is not only a Probe, a Medivac or an Ultralisk. Buildings are units, too! And I was asking myself “Which army-unit appears fully constructed? Every unit has a production time, except the MULE”. But you have to think from the replay’s perspective: For the replay, every army-unit appears fully constructed, because while it’s in the build process, it doesn’t appear on the map!
So for army-units, you will only receive a NNet.Replay.Tracker.SUnitBornEvent event. For buildings, you will receive a NNet.Replay.Tracker.SUnitInitEvent event when the build process is initiated and a NNet.Replay.Tracker.SUnitDoneEvent when the building is complete!
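To make the init/done/born logic a bit more tangible, here is a minimal sketch that groups these three events by unit tag. It works on plain dicts shaped like the parser output. The `_event` key and the `m_unitTagIndex` / `m_unitTagRecycle` field names are assumptions about that output, and the sample events are made up for illustration.

```python
from collections import defaultdict

INIT = 'NNet.Replay.Tracker.SUnitInitEvent'
DONE = 'NNet.Replay.Tracker.SUnitDoneEvent'
BORN = 'NNet.Replay.Tracker.SUnitBornEvent'


def unit_tag(event):
    # Assumption: a unit is identified by these two fields together.
    return (event['m_unitTagIndex'], event['m_unitTagRecycle'])


def unit_lifecycles(tracker_events):
    """Map each unit tag to the gameloops of its init/done/born events."""
    units = defaultdict(dict)
    for event in tracker_events:
        name = event['_event']
        if name in (INIT, DONE, BORN):
            units[unit_tag(event)][name] = event['_gameloop']
    return dict(units)


# Made-up sample: one unit that is born directly, one that is
# initiated first and completed later.
sample_events = [
    {'_event': BORN, '_gameloop': 0, 'm_unitTagIndex': 1, 'm_unitTagRecycle': 1},
    {'_event': INIT, '_gameloop': 320, 'm_unitTagIndex': 2, 'm_unitTagRecycle': 1},
    {'_event': DONE, '_gameloop': 720, 'm_unitTagIndex': 2, 'm_unitTagRecycle': 1},
]

lifecycles = unit_lifecycles(sample_events)
for tag, stages in sorted(lifecycles.items()):
    kind = 'born complete' if BORN in stages else 'constructed on the map'
    print(tag, kind, stages)
```

A unit with only a born event appeared fully constructed. A unit with an init event was visible on the map while it was being built.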

A complete list of events can be found here: https://github.com/Blizzard/s2protocol/blob/master/protocol24944.py#L307

Gameevents

This is basically everything and more what counts into the APM of a player. Selecting units, grouping units, moving them, moving the camera, etc.

A complete list of events can be found here: https://github.com/Blizzard/s2protocol/blob/master/protocol24944.py#L204

Details

General match information like the mapname, matchtime, playernames and the match result. See the “Problems & solutions” section for how to get full player info and what to do with the matchtime!

Header

Contains the elapsed gameloops and the client version of the replay.

Initdata

This is the biggest information pool, I think. It contains basically the whole lobby data, including the slots and their settings, and a lot of information about each player: his settings, bnet ID, whether he’s using his own layout, and so on… See the “Problems & solutions” section for how to get the full player info out of the initdata.

Messageevents

Yeah, all chat messages :) And I think also the ingame messages like “game paused” but not sure yet.

Attributeevents

Haven’t taken a look into that yet, will give you an update on this in the next days.

Problems & solutions

What is this weird value inside m_timeUTC?

If you want to find out the matchtime you think “Ah, there is this m_timeUTC field with a timestamp in it… I just convert this and… WTF!?”. Yes: WTF! This thing doesn’t look anything like a UNIX timestamp! To better understand:
UNIX timestamp: 1368796283
A value inside m_timeUTC: 130132642839470955
After a bit of research a colleague was able to help me out: It is a Windows NT timestamp :) Which means:

Windows NT time is specified as the number of 100 nanosecond intervals since January 1st, 1601. UNIX time is specified as the number of seconds since January 1st, 1970. There are 134,774 days (or 11,644,473,600 seconds) between these dates.

Funny, heh? So to get a UNIX timestamp out of it, you have to divide it by 10,000,000 and subtract 11,644,473,600 from it. Now you can use it outside of Windows applications ;)
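As a minimal sketch, the conversion can be written like this in Python. The function and constant names are my own; the numbers are the ones from the quote above, and the input is the example m_timeUTC value from this article.

```python
from datetime import datetime, timezone

NT_TICKS_PER_SECOND = 10 ** 7        # 100-nanosecond intervals per second
NT_TO_UNIX_EPOCH_SECONDS = 11644473600  # seconds between 1601-01-01 and 1970-01-01


def nt_to_unix(nt_time):
    """Convert a Windows NT timestamp to whole UNIX seconds."""
    return nt_time // NT_TICKS_PER_SECOND - NT_TO_UNIX_EPOCH_SECONDS


unix_ts = nt_to_unix(130132642839470955)
print(unix_ts)
print(datetime.fromtimestamp(unix_ts, tz=timezone.utc).isoformat())
```

Integer division is used on purpose, so the sub-second part is dropped instead of ending up as a float.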

And what to do with the m_timeLocalOffset field?

Yeah, that’s also a bit weird, but if you know how to handle the m_timeUTC field above, it’s also “easy” to decode this field:
This is, as expected, the UTC timezone offset for the matchtime, but (like the m_timeUTC field) it’s stored in units of 100 nanoseconds! So to convert it to hours, simply divide it by 36000000000 (= 60*60*10^7; the 10^7 is there because one second has 10^7 intervals of 100 nanoseconds)! That’s it! :)
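A small sketch of that division, with made-up offset values (the function name is hypothetical):

```python
TICKS_PER_HOUR = 60 * 60 * 10 ** 7  # 100-nanosecond intervals per hour


def local_offset_hours(time_local_offset):
    """Convert an m_timeLocalOffset value to a UTC offset in hours."""
    return time_local_offset / TICKS_PER_HOUR


print(local_offset_hours(72000000000))    # an offset of two hours east of UTC
print(local_offset_hours(-180000000000))  # an offset of five hours west of UTC
```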

playerID <=> userID? Two IDs?

Yes, two IDs! A player has two IDs in a replay:

  • A userID => Used in the lobby and to identify the user’s gameevents
  • A playerID => Used in the playerlist in the details section and to identify the player’s trackerevents

Every user in the lobby has a userID, of course. That means every observer, every AI and every player! You can find the userID inside the slotlist which can be found in the initdata section.
A playerID, on the other hand, is only available for… guess it… players, correct! :) The playerID is currently a guess, but it works every time for the five test replays I have: It is the index of the player inside the playerlist of the details section, starting at 1!

How to get a “complete” player and the userID? And what is this toon?

If you take a look at all these gameevents for example, they all have a m_userId field but in the playerlist inside the data of the details section, there is no userID! That’s why… Man, I don’t know… But I know where you can find it:
Inside the initdata section there are two lists including some information:

  • the whole lobbydata including the slots and
  • the initial data of a player, whatever that means…

The initial player data for example includes the player name, the clantag and other information. But the most useful information is inside the slotlist: It contains the userID! With this userID, you know which events belong to which player, finally \o/ And with that knowledge, you can reference this initial player data, because the index of an element in this list is the userID (this is also guessed, but worked for all my tests)!
But again: How to reference from the slotdata to that player inside the playerlist you got from the details section (for example to get the result of a player)? There’s no userID, like I mentioned before. But I found out that a replay knows about something called a toon. It seems to be a unique identifier throughout the whole battlenet (If you know more about that, please let me know! Even what toon means ;) ). It consists of the player’s

  • Region
  • Program ID (S2 for StarCraft 2)
  • Realm
  • ID (the battlenet ID)

The format is

$playerToon = m_region-m_programId-m_realm-m_id

So for example you get something like this: 2-S2-1-1234567
With this information you can reference an entry inside the playerlist, because the slot has the field m_toonHandle and the playerlist has a field m_toon which contains the toon in pieces.
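Here is a rough sketch of that lookup on plain dicts. The field names m_toon, m_toonHandle, m_region, m_programId, m_realm and m_id come from the text above. Everything else is an assumption for illustration: that a slot carries its userID in a field called `m_userId`, that a player entry has an `m_name`, and all of the sample values. The playerID is the guessed “index starting at 1” rule from above.

```python
def toon_handle(toon):
    """Join the pieces of an m_toon dict into the m_toonHandle format."""
    return '{}-{}-{}-{}'.format(
        toon['m_region'], toon['m_programId'], toon['m_realm'], toon['m_id'])


def link_players(player_list, slots):
    """Attach the slot's userID to every entry of the details playerlist."""
    slots_by_toon = {
        slot['m_toonHandle']: slot for slot in slots if slot.get('m_toonHandle')
    }
    linked = []
    for index, player in enumerate(player_list):
        slot = slots_by_toon.get(toon_handle(player['m_toon']))
        linked.append({
            'name': player['m_name'],
            'playerID': index + 1,  # guessed rule: playerlist index, starting at 1
            'userID': slot['m_userId'] if slot else None,
        })
    return linked


# Made-up sample data: an observer sits in slot 0, so userID and
# playerID do not line up.
sample_slots = [
    {'m_userId': 0, 'm_toonHandle': '2-S2-1-7654321'},  # observer
    {'m_userId': 1, 'm_toonHandle': '2-S2-1-1234567'},
    {'m_userId': 2, 'm_toonHandle': '2-S2-1-2222222'},
]
sample_players = [
    {'m_name': 'PlayerA',
     'm_toon': {'m_region': 2, 'm_programId': 'S2', 'm_realm': 1, 'm_id': 1234567}},
    {'m_name': 'PlayerB',
     'm_toon': {'m_region': 2, 'm_programId': 'S2', 'm_realm': 1, 'm_id': 2222222}},
]

linked = link_players(sample_players, sample_slots)
for entry in linked:
    print(entry)
```

In real replay data the program ID may need some cleanup (for example decoding bytes) before the joined string matches the handle, so treat the string formatting as a starting point.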

What’s a gameloop and how to convert them into seconds?

Every event has the field _gameloop which contains the “time” when it happened in the game. Converting the gameloop is not necessary to detect the build order, but it is necessary to display it in a human readable format, in hours/minutes/seconds, because it’s better to say “He built the cybernetics core at second X”!
And wow, it took me a while to find out how to convert it! After multiple rounds of Google research, I finally found a statement from a Blizzard developer (“Rotidecs”. And no, the user “turtles” is not me or somebody else from Turtle Entertainment ;) ):

Note that you can use a wait time of zero to wait for the smallest possible amount of time, which is one game loop (or 1/16 of a second).
(source: http://us.battle.net/sc2/en/forum/topic/7004015250#2 )

So my guess was valid: 16 gameloops happen in a second, so just divide the gameloop by 16 and you know which second it is. Also note that these are ingame seconds, so it depends on the gamespeed! If you, for example, divide the m_elapsedGameLoops of the header section by 16, you will get the match duration in ingame time! To find out how to convert this ingame time to the real seconds that passed, please see this article at Liquipedia: http://wiki.teamliquid.net/starcraft2/Game_Speed
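A minimal sketch of the conversion, assuming the 16 gameloops per second from the quote. The `speed_factor` parameter is my own addition for the gamespeed correction: leave it at 1.0 for ingame seconds and take the actual factor for your game speed from the Liquipedia page. The 1.4 used in the example is only a placeholder value.

```python
GAMELOOPS_PER_SECOND = 16


def gameloop_to_seconds(gameloop, speed_factor=1.0):
    """Convert a gameloop to seconds.

    With speed_factor=1.0 the result is in ingame seconds; pass the
    factor of the game speed to get real seconds instead.
    """
    return gameloop / GAMELOOPS_PER_SECOND / speed_factor


def format_clock(seconds):
    """Format a number of seconds as h:mm:ss."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return '{:d}:{:02d}:{:02d}'.format(hours, minutes, secs)


print(format_clock(gameloop_to_seconds(1600)))        # ingame time
print(format_clock(gameloop_to_seconds(13440, 1.4)))  # placeholder speed factor
```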

Still unknown

Which events are counting into the APM?

Currently, I try to reconstruct the average APM in a match, which can be seen in the APM tab in the replay. But I still don’t know which events to count in… At the moment I’m counting only these gameevents:

  • ‘NNet.Game.SSelectionDeltaEvent’,
  • ‘NNet.Game.SCmdEvent’,
  • ‘NNet.Game.SControlGroupUpdateEvent’,
  • ‘NNet.Game.SGameUserLeaveEvent’

But I don’t know if this is correct and if I also have to count some trackerevents… Or multiply it by the gamespeed difference, as mentioned here (which doesn’t make much sense): http://wiki.teamliquid.net/starcraft2/APM
As soon as I’m able to reconstruct it, I’ll let you know!
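For reference, the counting described above could be sketched like this. It is only the tentative approach from this section, not a confirmed APM formula. The `_event` key and the flat `m_userId` field on each event are assumptions about the parser output (the ID may be nested differently in real data), and the sample events are made up.

```python
from collections import Counter

# The event types currently counted, as listed above (unconfirmed).
APM_EVENTS = {
    'NNet.Game.SSelectionDeltaEvent',
    'NNet.Game.SCmdEvent',
    'NNet.Game.SControlGroupUpdateEvent',
    'NNet.Game.SGameUserLeaveEvent',
}


def average_apm(game_events, elapsed_gameloops):
    """Return counted actions per ingame minute for every userID."""
    minutes = elapsed_gameloops / 16 / 60  # 16 gameloops per ingame second
    counts = Counter(
        event['m_userId']  # assumption: the userID sits directly on the event
        for event in game_events
        if event['_event'] in APM_EVENTS
    )
    return {user_id: count / minutes for user_id, count in counts.items()}


# Made-up sample: 960 gameloops are exactly one ingame minute.
sample_game_events = [
    {'_event': 'NNet.Game.SSelectionDeltaEvent', 'm_userId': 0},
    {'_event': 'NNet.Game.SCmdEvent', 'm_userId': 0},
    {'_event': 'NNet.Game.SControlGroupUpdateEvent', 'm_userId': 0},
    {'_event': 'NNet.Game.SCameraUpdateEvent', 'm_userId': 0},  # not counted
    {'_event': 'NNet.Game.SCmdEvent', 'm_userId': 1},
]

apm = average_apm(sample_game_events, 960)
print(apm)
```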

 

Sooo, I think that’s it from me at this point! If you have any feedback or also found out something, please feel free to contact me or leave a comment.
I hope this was a bit helpful to start working with the s2protocol!

Greetings,

Andy! 

[UPDATE]

I found out, that a user/player has two different IDs in a replay! I’ve edited the article accordingly.

[UPDATE 2]

I found out, how to convert the gameloop and I’ve added my current problem, the APM. I’ve edited the article accordingly.

[UPDATE 3]

And more found out: Now I can also read the value inside the m_timeLocalOffset of the details section. See above!

Author: Andreas Hofmann

  • Sebastian

    I’m currently working on it, too, so i’d love to see more updates on this :)
    I struggled at first with the python objects, since i wanted to use another language, but i got it to export as JSON (with some encoding problems at first).
    Currently my attempt to userid vs playerid is incrementing the userid by one, because the playerid 0 is the environment. guess it’s the same way you do it in practice.
    At least i can already create nice benchmarks :)
    gl with your further work!

    • Andreas Hofmann

      Hey Sebastian,

      yeah, I encountered these encoding problems, too. You have to drop the ‘m_cacheHandles’ value from the initData event:

      del initdata['m_syncLobbyState']['m_gameDescription']['m_cacheHandles']

      Also, in replays older than 2.0.8, the cacheHandles are also in the details event, so do this then:

      del details['m_cacheHandles']

      I also started with a language that I have worked with more than with Python: PHP. But yesterday I started to rewrite my whole code in Python. It’s 10 times faster than converting it to JSON in Python, calling the Python script from PHP, fetching the output, parsing the output/JSON and then processing it ;) But it was only for testing, I knew I would do it in Python someday ;)
      I’ll keep posting if I find out something. I would also love to hear more, what you have found out!

      Greetings from Cologne,

      Andy!

      • sudosu

        Hi, first of all thanks a lot for making it open source.

        I’m working on the python php binding to get the replay results and make everything functional in my website (just need to get back a little bit in python it’s been a while). Have you already some piece of code doing this ? It would be interesting.

        regards.

        • http://andy-hofmann.com/ Andreas Hofmann

          Hey sudosu,

          what do you mean by python-php-binding? You mean that you upload a replay on your php powered website and receive/parse it in your python script? We’re using a message broker for this: RabbitMQ. But this might be a bit overkill for you to set up (depending on your website size and infrastructure) ;)

          But a small introduction how this works:
          With a message queue like RabbitMQ, you can define exchanges and queues where you can listen on (like a mailbox). If a user uploaded a replay in your php script, you can send a message into this MQ-queue to say “hey, here’s a replay! Parse it and tell me back!”. Your python script will listen on this queue, receives the message, parses the replay and throws a message back into another queue saying “hey I’m finished!”.

          The easiest way will be to call the python script from within your php script, with popen(), exec(), system(), whatever :)

          Hope I could help you!? If you have more questions, just ask :)

          Greetings,

          Andy!

          • sudosu

            I meant exactly what you just described (obviously didn’t meant implementing a php-python binding :p).

            I just wondered how anyone else would implement this :).
            It seems producing a json, fetching and parsing it is the better way to use it in a simple php-based app that do not have any specific performance requirements.

          • Sebastian

            beware! it is not a json, even if it looks like one. It is a python-object.
            You can parse it somehow with some regexp, but i wouldn’t recommend it.

            I’ve done it the following way: after uploading a replay, the python script gets called by php (via exec). I modified the python script itself and changed the return to real json. furthermore i had to drop some data, which contained weird, undecodable stuff.

            The return-data of the python script now gets saved in a temp variable (keep in mind that some replays can produce big data, eg. a goody replay with 10 observers in it. I had to allow more memory for this script).

            Now you can decode it as json and have fun with the data :)

          • sudosu

            Thanks for clearing that up.

            What was the data you had to drop ?

            Any chance to be able to take a look at your modified python script ?

          • http://andy-hofmann.com/ Andreas Hofmann

            I think he’s talking about the cache handles, we discussed earlier. I’m also dropping them, as you can see here:
            https://github.com/TurtleEntertainment/aiur/blob/master/teSc2ReplayParser.py#L95

            In older replays, they also appear in the details, but (at least since 2.0.8, maybe earlier) currently they only appear in the initData.

          • sudosu

            yeah okay, thanks for being helpful guys :)

          • http://andy-hofmann.com/ Andreas Hofmann

            No problem, you’re welcome! Feel free to ask.

            Maybe you haven’t seen, but we released the code, I was talking about in this article. Can be found here: https://github.com/TurtleEntertainment/aiur

            It’s a small solution which gives you common information you need for a replay/match. Just make an instance of teSc2ReplayParser, pass the filename to it and call getMatchDetails(). This will give you a dict containing many information about the match.

            Feel free to use, fork and develop :)

          • http://andy-hofmann.com/ Andreas Hofmann

            Correct :) Looks like JSON, but it isnt. It’s a dict.
            @sudosu:disqus: To convert/export it to JSON, import the json module at the top and call dumps(). For example:

            import json
            json.dumps(myMatchDict)

            This will output it as pure JSON.

  • Sebastian

    Update to the tracker-events:
    It seems like not only buildings generate init/done events (while units do have the unitBorn event), but also units warped in by the protoss player. I guess the logic behind this is that they already appear on the map while warping in.
    This makes it more difficult to distinguish between buildings and units.
    I guess i have to go for the not so elegant way of mapping unit/building names to types.
    I wonder if zerg units also have the init and done events, since they appear already while constructing as a cocoon.
    Have to check that later on