As previous posts might have already implied, I'm a gamer. That's why a lot of the work I do is directly related to the games I play and the data that revolves around them. For this article I've decided to give a little insight into Steam leaderboards, specifically for a game known as Awesomenauts.
Locating the data
The data for Steam leaderboards is available at the following url:
- http://steamcommunity.com/stats/<appid>/leaderboards/?xml=1
Since we will examine the leaderboard data for Awesomenauts, we will replace <appid> with 204300.
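As a small convenience, the substitution can be wrapped in a shell function. This is only a sketch: leaderboards_url is a hypothetical helper name that is not used anywhere else in this article, and it does nothing more than fill the app id into the url shown above.

```shell
# Hypothetical helper (an assumption, not an existing tool): print the
# leaderboard index url for the Steam app id given as the first argument.
leaderboards_url() {
    printf 'http://steamcommunity.com/stats/%s/leaderboards/?xml=1\n' "$1"
}

leaderboards_url 204300
```

Any command that fetches the index could then use "$(leaderboards_url 204300)" in place of the literal url.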
Parsing
We also need a program that can parse xml. I am going to use xmlstarlet, which has many different commands for extracting and manipulating data. The ones we will take a closer look at are sel (select) and el (elements). Starting with the latter, it can be used to find out which elements reside within the document. Let's find all unique elements:
% wget -qO- 'http://steamcommunity.com/stats/204300/leaderboards/?xml=1' | iconv -f iso-8859-15 -t utf-8 | xml el -u
xmlstarlet was not pleased with the encoding of the document, hence the use of iconv to convert it to utf-8.
The above command will return the following:
response
response/appFriendlyName
response/appID
response/leaderboard
response/leaderboard/display_name
response/leaderboard/displaytype
response/leaderboard/entries
response/leaderboard/lbid
response/leaderboard/name
response/leaderboard/onlyfriendsreads
response/leaderboard/onlytrustedwrites
response/leaderboard/sortmethod
response/leaderboard/url
response/leaderboardCount
Let's jump ahead a bit. What we want is the url where name starts with PLAYERRANK. To find these we can make use of the XPath function called starts-with().
% wget -qO- 'http://steamcommunity.com/stats/204300/leaderboards/?xml=1' | iconv -f iso-8859-15 -t utf-8 | \
xml sel -t -v "response/leaderboard[starts-with(name, 'PLAYERRANK')]/url"
Quick overview of the arguments passed to xml:
- -t, or template, which gives us access to the options following the argument
- -v, print the value of the XPath expression
- "response/leaderboard[starts-with(name, 'PLAYERRANK')]/url", following the path response/leaderboard we wish to retrieve the url of entries where the name starts with PLAYERRANK
Example output:
http://steamcommunity.com/stats/204300/leaderboards/89564/?xml=1
http://steamcommunity.com/stats/204300/leaderboards/145095/?xml=1
http://steamcommunity.com/stats/204300/leaderboards/331874/?xml=1
http://steamcommunity.com/stats/204300/leaderboards/397491/?xml=1
http://steamcommunity.com/stats/204300/leaderboards/483346/?xml=1
Extending
We now have the urls for the leaderboards representing each season. Unfortunately they are out of order, but we catch a break as the id seems to increment for each season, meaning all we need to do is sort this list according to the leaderboard id:
% wget -qO- 'http://steamcommunity.com/stats/204300/leaderboards/?xml=1' | iconv -f iso-8859-15 -t utf-8 | \
xml sel -t -v "response/leaderboard[starts-with(name, 'PLAYERRANK')]/url" | \
sed 's|.*leaderboards/\([^/]*\).*|\1|' | sort -n
- sed, extract the id from the url /leaderboards/<id>/?xml=1
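The sed and sort stages can be tried on their own, without touching the network, by feeding them a few urls by hand. The sketch below uses three of the urls from the example output, deliberately out of order; the variable name ids is only there for illustration.

```shell
# Extract the leaderboard id from each url, then sort the ids numerically.
ids=$(printf '%s\n' \
    'http://steamcommunity.com/stats/204300/leaderboards/397491/?xml=1' \
    'http://steamcommunity.com/stats/204300/leaderboards/89564/?xml=1' \
    'http://steamcommunity.com/stats/204300/leaderboards/145095/?xml=1' |
    sed 's|.*leaderboards/\([^/]*\).*|\1|' | sort -n)

echo "$ids"
```

A plain sort would compare the ids as text and put 145095 before 89564, which is why the numeric -n flag matters here.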
While we're at it, let's declare this a function:
% function awsmboards() { ... code from above ... }
Because that will make what I'm about to do way more readable. I want to create an array of season ids in zsh:
% seasons=(${(f)"$(awsmboards)"})
So, what just happened? We performed Parameter Expansion within an array declaration where we called the function previously defined. The parameter expansion flag f splits the values at each newline, meaning each line of the output is its own element within the array.
Let's print the first 5 elements of the season array:
% echo ${seasons:0:5}
89564 145095 145658 165967 167738
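For shells without zsh's f flag, a lookup by season number can be sketched on the stream itself instead of an array. This assumes that the sorted ids line up with the seasons and that the very first id belongs to season 0; season_id is a hypothetical helper name.

```shell
# Hypothetical helper: read sorted leaderboard ids on stdin (one per line)
# and print the id for the season number given as the first argument.
# Season 0 sits on line 1, hence the + 1.
season_id() {
    sed -n "$(( $1 + 1 ))p"
}

# The five ids printed above stand in for the full output of awsmboards.
printf '%s\n' 89564 145095 145658 165967 167738 | season_id 3
```

With the real function this would read awsmboards | season_id 11.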
At the time of writing this we're at the end of season 11. Awesomenauts' very first season was 0, and since zsh array indexes start from 1, the id of season 11 should be found at ${seasons[12]}. Let's have a look at the elements of this season's document:
% wget -qO- "http://steamcommunity.com/stats/204300/leaderboards/${seasons[12]}/?xml=1" | xml el -u
response
response/appFriendlyName
response/appID
response/entries
response/entries/entry
response/entries/entry/details
response/entries/entry/rank
response/entries/entry/score
response/entries/entry/steamid
response/entries/entry/ugcid
response/entryEnd
response/entryStart
response/leaderboardID
response/nextRequestURL
response/resultCount
response/totalLeaderboardEntries
It is fairly straightforward what all of the elements mean. Let's write a function that returns the rank of a specific steamid for this season:
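function awsmrank() {
  # $1 is the steamid; it is quoted inside the XPath so that it is compared as a
  # string (17-digit ids do not fit exactly in XPath's floating-point numbers)
  wget -qO- "http://steamcommunity.com/stats/204300/leaderboards/${seasons[12]}/?xml=1&steamid=$1" | \
    xml sel -t -v "response/entries/entry[steamid='$1']/rank" -n
}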
What's new here is an additional parameter in the url that specifies the user whose rank we are interested in. The returned document will also include all of said user's friends who are ranked on the leaderboard. This is why the XPath expression specifically asks for the rank of the entry whose steamid matches the one passed as an argument to the function.
Let's try out our new function:
% awsmrank 12345678901234567
125
Another thing worth mentioning is the value at response/entries/entry/details. Let's have a look at the details belonging to the user at the top of the leaderboard:
% wget -qO- "http://steamcommunity.com/stats/204300/leaderboards/${seasons[12]}/?xml=1&end=1" | xml sel -t -v "response/entries/entry/details" -n
0200000076080000a0050000a50e0000710400000a000000060000008f0100005c0000000800000000000000
What we are looking at is most likely information that is displayed on the leaderboard but doesn't fit into any of the other elements. This would be stats including wins, losses and favourite naut.
% details=0200000076080000a0050000a50e0000710400000a000000060000008f0100005c0000000800000000000000
% echo $(( ${#details} % 8 ))
0
% for (( i=0;i<${#details};i+=8 )); do sub=${details:$i:8}; echo $sub; done
02000000
76080000
a0050000
a50e0000
71040000
0a000000
06000000
8f010000
5c000000
08000000
00000000
- We assign the value to a variable
- Let's assume every piece of information is 8 characters long
- Print each part on its own line
Seems we have 11 values. What are they? Having a look at the ApplicationPersistent.log file located in the Awesomenauts directory reveals the following:
ColumnType: LCT_ENTRY_VERSION
ColumnType: LCT_WINS
ColumnType: LCT_LOSSES
ColumnType: LCT_KILLS
ColumnType: LCT_DEATHS
ColumnType: LCT_PRESTIGE_LEVEL
ColumnType: LCT_FAVORITE_CLASS_INDEX
ColumnType: LCT_SEASON_WINS
ColumnType: LCT_SEASON_LOSSES
ColumnType: LCT_PRESTIGE_ICON_CONTRIBUTION_PRIMARY
ColumnType: LCT_PRESTIGE_ICON_CONTRIBUTION_SECONDARY
Let's try converting the hexadecimal values:
% for (( i=0;i<${#details};i+=8 )); do
sub=${details:$i:8}
echo $(( 16#${sub:6:2}${sub:4:2}${sub:2:2}${sub:0:2} ))
done
2
2166
1440
3749
1137
10
6
399
92
8
0
- loop over the length of the variable $details in 8 character increments
- assign a sub variable
- perform arithmetic evaluation and let the shell know it's a hexadecimal number
- change endianness (reverse the byte order)
A quick look at the leaderboard confirms that this is in fact correct.
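Putting the two pieces together, the decoded numbers can be printed next to the column names from the log. This is a sketch that assumes the columns in ApplicationPersistent.log are listed in the same order as the fields in the details string; decode_details is a hypothetical helper name. It sticks to POSIX tools (fold, sed) so it also runs outside zsh.

```shell
# Hypothetical helper: label each 32-bit little-endian field of a details
# string with a column name (assumed order: as listed in the log file).
# fold puts one 8-character field on each line, sed reverses the byte
# order, and the loop pairs every field with a name read from fd 3.
decode_details() {
    printf '%s\n' "$1" |
        fold -w 8 |
        sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/' |
        while read -r hex && read -r name <&3; do
            echo "$name $(( 0x$hex ))"
        done 3<<'EOF'
LCT_ENTRY_VERSION
LCT_WINS
LCT_LOSSES
LCT_KILLS
LCT_DEATHS
LCT_PRESTIGE_LEVEL
LCT_FAVORITE_CLASS_INDEX
LCT_SEASON_WINS
LCT_SEASON_LOSSES
LCT_PRESTIGE_ICON_CONTRIBUTION_PRIMARY
LCT_PRESTIGE_ICON_CONTRIBUTION_SECONDARY
EOF
}

decode_details 0200000076080000a0050000a50e0000710400000a000000060000008f0100005c0000000800000000000000
```

Each output line is a column name followed by its decimal value, for example LCT_WINS 2166 for the second field.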
Most of these columns don't need much explanation except LCT_FAVORITE_CLASS_INDEX and the ones regarding CONTRIBUTION. I'm more interested in finding out what different nauts people use so I've compiled a list:
arr[1]=froggy
arr[2]=lonestar
arr[5]=leon
arr[6]=clunk
arr[3]=voltar
arr[12]=gnaw
arr[8]=coco
arr[9]=skolldir
arr[4]=yuri
arr[11]=rae
arr[7]=derpl
arr[16]=vinnie
arr[18]=genji
arr[14]=ayla
arr[20]=swiggins
arr[21]=mcpain
arr[19]=penny
arr[22]=sentry
arr[23]=skree
With this information we could write yet another function. Let's find out what the top nauts are in League 1:
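function nauts250() {
  # -n makes xml end its output with a newline, so that read also sees the last entry
  wget -qO- "http://steamcommunity.com/stats/204300/leaderboards/${seasons[12]}/?xml=1&end=250" | \
    xml sel -t -v 'response/entries/entry/details' -n | \
    while read -r details; do
      # the favourite class index is the low byte of the 7th field (offset 6*8 = 48)
      echo $arr[$(( 16#${details:48:2} ))]
    done | sort | uniq -c | sort -nr
}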
- mostly the same as before, we change the end parameter in the url to only retrieve the first 250 entries
- this time around we're only interested in the data containing the naut and we know the specific location within the string
- sort, count how many times each naut occurs and sort by that count in descending order
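The offset used in the function follows from the field layout: LCT_FAVORITE_CLASS_INDEX is the seventh field, so it starts at offset 6 * 8 = 48, and because the value is little-endian its low byte comes first. Here is a sketch of just that step, using cut so it is not tied to zsh; favourite_index is a hypothetical helper name, and reading a single byte assumes the index stays below 256.

```shell
# Hypothetical helper: print the favourite class index stored in a details string.
favourite_index() {
    # cut counts characters from 1, so offset 48 is character 49.
    byte=$(printf '%s' "$1" | cut -c 49-50)
    echo $(( 0x$byte ))
}

favourite_index 0200000076080000a0050000a50e0000710400000a000000060000008f0100005c0000000800000000000000
```

For the details string of the top player this prints 6, which is clunk in the list above.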
Trying it out:
% nauts250
29 leon
29 froggy
28 lonestar
21 coco
19 skolldir
14 vinnie
14 ayla
12 rae
11 skree
11 penny
10 clunk
9 yuri
9 mcpain
8 genji
7 swiggins
7 sentry
6 gnaw
4 voltar
2 derpl
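The counting at the end of nauts250 can also be tried in isolation. The sketch below feeds it a handful of made-up names (sample input, not leaderboard data); tally is a hypothetical helper name, and the awk stage is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical helper: count identical lines on stdin, most frequent first.
tally() {
    sort | uniq -c | sort -nr | awk '{ print $1, $2 }'
}

# Made-up sample input, one naut name per line.
printf '%s\n' leon froggy leon coco froggy leon | tally
```

The first sort is needed because uniq -c only merges lines that are adjacent.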