There are 3 primary Hotsapi usage scenarios:

  1. A public dataset on Google BigQuery. If you want to quickly run some queries on the Hotsapi data without setting up your own database instance, you can use this dataset. Queries are written in familiar SQL and can be of any complexity without worrying about server performance. The dataset functions in "requester pays" mode: Hotsapi pays only for data storage, and users pay for the queries they run. Most casual users will probably fit into the 1 TB/month free usage tier (though they will still need to activate a Google Cloud account to use it). This is useful for ad hoc queries, posts like "patch ... 7 days later", and similar data mining efforts.

  2. A stream of parsed match details objects. If you only need the data that Hotsapi already extracts, you can skip parsing replay files yourself. You will need to keep your own database with the downloaded replay data and run all your queries against it. First, download a seed database dump and import it into your SQL server. Then periodically poll the /replays/parsed endpoint with the min_parsed_id parameter to get newly parsed replay data (see the polling sketch after this list).

  3. A stream of raw replay files. If you need to extract some advanced data from replay files, you can parse them yourself. First you will need to batch-download and parse the existing files from our AWS S3 storage. The storage functions in "requester pays" mode: Hotsapi pays only for data storage, and users pay for file downloads; downloads are free within the same AWS region (eu-west-1). Then periodically poll the /replays endpoint to get new replays.

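For scenario 2, polling boils down to remembering the highest parsed_id you have seen and asking the API for anything newer. Below is a minimal sketch in Python; the base URL, the response shape (a JSON list of replay objects with a parsed_id field), and the polling interval are assumptions for illustration, so check the interactive API documentation for the exact contract.

    import time
    import requests

    # The base URL is an assumption; check the interactive API docs for the exact host/version.
    API = "https://hotsapi.net/api/v1"

    def poll_parsed(min_parsed_id):
        """Fetch replays parsed after min_parsed_id and return the new high-water mark.

        The response is assumed to be a JSON list of replay objects with a
        parsed_id field; adjust this to the actual response shape.
        """
        resp = requests.get(f"{API}/replays/parsed",
                            params={"min_parsed_id": min_parsed_id})
        resp.raise_for_status()
        for replay in resp.json():
            print(replay["id"])  # insert into your own database here
            min_parsed_id = max(min_parsed_id, replay["parsed_id"])
        return min_parsed_id

    # Start from the max_parsed_id value shipped with the seed database dump.
    last_id = int(open("max_parsed_id").read())
    while True:
        last_id = poll_parsed(last_id)
        time.sleep(60)  # poll periodically rather than continuously
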
Interacting with BigQuery

Our public dataset has the id cloud-project-179020:hotsapi and can be found here. You will need to create a Google Cloud account to use it. Queries against it cost $5 per TB of data processed, and the first TB each month is free (most likely all your queries will fit into the free tier). BigQuery, like most data warehouses, uses a columnar storage format, so the cost depends less on how complex your query is and more on how many columns it reads.

Using BigQuery doesn't require installing any software or servers; you can run all queries from the web UI. It uses a dialect of SQL, so it's easy to start querying Hotsapi data quickly.

The Hotsapi dataset contains denormalized data: all information about a replay is stored in a single table using nested columns. In that respect it is similar to document (JSON) databases like MongoDB. A human-readable YAML schema of the tables can be found here.
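If you prefer to run queries programmatically rather than from the web UI, the standard BigQuery client library works as usual, and nested columns are flattened with UNNEST. The sketch below assumes a replays table with a repeated players record containing hero and winner fields; those names are assumptions for illustration, so verify them against the published YAML schema first.

    from google.cloud import bigquery

    # Queries are billed to your own Google Cloud project ("requester pays").
    client = bigquery.Client(project="your-billing-project")

    # Table and column names are assumptions; check the YAML schema for the real ones.
    query = """
        SELECT p.hero,
               COUNT(*) AS games,
               COUNTIF(p.winner) / COUNT(*) AS win_rate
        FROM `cloud-project-179020.hotsapi.replays` AS r,
             UNNEST(r.players) AS p
        GROUP BY p.hero
        ORDER BY games DESC
    """
    for row in client.query(query).result():
        print(row.hero, row.games, round(row.win_rate, 3))
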

Database dumps

Since importing a full database dump in .sql format can take days or weeks, we split the dump into a few parts:

The CSV files contain only parsed replays and are append-only, since the data is mostly immutable after parsing (except some unimportant flags, like deleted, that will drift out of sync with the current Hotsapi state). The CSV files can be imported into a MySQL instance using the LOAD DATA INFILE statement, which works significantly faster than loading .sql files (see the import sketch after the file list below). Keep in mind that the uncompressed data in a MySQL instance can take more than 10x the size of the compressed .csv files. The max_parsed_id file contains the maximum parsed_id included in this dump. Here's the full list of files for the seed DB:

https://storage.googleapis.com/hotsapi/db/schema/heroes.sql.gz
https://storage.googleapis.com/hotsapi/db/schema/schema.sql.gz
https://storage.googleapis.com/hotsapi/db/data/replays.csv.gz
https://storage.googleapis.com/hotsapi/db/data/bans.csv.gz
https://storage.googleapis.com/hotsapi/db/data/players.csv.gz
https://storage.googleapis.com/hotsapi/db/data/scores.csv.gz
https://storage.googleapis.com/hotsapi/db/data/player_talent.csv.gz
https://storage.googleapis.com/hotsapi/db/data/max_parsed_id
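As a rough illustration of the import step, the sketch below decompresses one dump file and loads it with LOAD DATA LOCAL INFILE through the MySQL Python connector. The connection credentials are placeholders, the field/line terminators are assumptions about the CSV format, and both the client (allow_local_infile) and the server (local_infile) must permit local loads.

    import gzip
    import shutil

    import mysql.connector

    # Decompress one of the dump files (replays.csv.gz -> replays.csv).
    with gzip.open("replays.csv.gz", "rb") as src, open("replays.csv", "wb") as dst:
        shutil.copyfileobj(src, dst)

    conn = mysql.connector.connect(
        host="localhost", user="hotsapi", password="secret", database="hotsapi",
        allow_local_infile=True,  # required for LOAD DATA LOCAL INFILE
    )
    cur = conn.cursor()
    # Field and line terminators are assumptions; adjust them to the actual CSV format.
    cur.execute(
        "LOAD DATA LOCAL INFILE 'replays.csv' INTO TABLE replays "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )
    conn.commit()
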

Downloading replay files

All files are stored on the Amazon S3 service. It currently runs in "Requester pays" mode, which means that traffic fees are paid by the clients that download files rather than by the website owner. This allows us to keep server costs low and run the service even with low donations. S3 traffic is free if you download to an AWS EC2 instance in the EU (Ireland) (eu-west-1) region, or $0.09/GB if you download to a non-Amazon server. A good way to avoid costs is to launch a free-tier EC2 instance, use it to download and analyze replays, and then stream the results to your main website. In any case you will need an AWS account and must authenticate every download request. Further documentation can be found here. If your downloads fail, make sure you didn't forget to include x-amz-request-payer in your request headers or to set the corresponding option in your SDK.
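With boto3 the requester-pays flag is passed per request, roughly as below. The bucket name and object key shown here are placeholders; use the values the Hotsapi API returns for each replay, and make sure your AWS credentials are configured.

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")  # same region as the bucket

    # Bucket and key are placeholders; take them from the replay data returned by the API.
    s3.download_file(
        Bucket="hotsapi",
        Key="replays/example.StormReplay",
        Filename="example.StormReplay",
        ExtraArgs={"RequestPayer": "requester"},  # omitting this gets the request denied
    )
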

To save costs we delete old replay files from storage; the metadata in the database is kept forever. We currently retain files matching at least one of the following criteria:

API reference

Interactive API documentation can be found on this page.