Bot Name: MetaBot
Bot Race: Protoss
Author Name(s): Anderson Tavares, Tiago Oliveira, Guilherme Oliveira, Luiz Chaimowicz
Affiliation(s): Anderson and Guilherme: Universidade Federal do Rio Grande do Sul (UFRGS); Tiago and Luiz: Universidade Federal de Minas Gerais (UFMG)
Nationality(s): Brazil
Occupation(s): Lecturer (Anderson), Professor (Luiz), Undergrad students (Tiago and Guilherme)
(These will be listed on the competition website)
Bot URL: https://github.com/andertavares/MetaBot
Personal URL:
Affiliation URL: www.inf.ufrgs.br -- www.dcc.ufmg.br

Questions about your bot (please answer as many as you can, especially Q 1-3)

Q: What is the overall strategy/strategies of your bot? Why did you choose them?

MetaBot is the successor of MegaBot. Like its predecessor, it selects a sub-bot to play each match on its behalf. MegaBot had the sub-bots' code integrated into a single, monolithic project; MetaBot contains only the meta-reasoning module and loads the sub-bots as DLLs. Currently, MetaBot's portfolio contains the AIIDE 2015 versions of AIUR, Ximp, and Skynet (its predecessor MegaBot used Skynet, Xelnaga, and NUSBot).

The meta-reasoning module implements minimax-Q, maintaining a payoff matrix with a row per portfolio member and a column per tournament opponent. MetaBot does not know all opponents beforehand, so columns are added as new opponents are faced. When MetaBot selects portfolio member 'a' against opponent 'o', the payoff matrix V is updated as:

    V[a, o] <- (1 - alpha) * V[a, o] + alpha * result

where result is +1 for a victory, 0 for a draw, and -1 for a defeat, and alpha in [0, 1] is the learning rate. This is the minimax-Q update rule, reduced to the single-shot case (no state information). MetaBot uses epsilon-greedy to select the portfolio member to activate: it selects randomly with probability epsilon, and greedily otherwise.
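The update rule and selection scheme just described can be sketched in a few lines of Python. This is an illustrative sketch only (MetaBot itself is written in C++ and this is not its actual code); all names here (select_member, update, ALPHA, EPSILON) are hypothetical, and the payoff matrix is stored as a dict keyed by (member, opponent), so new opponent "columns" appear implicitly with value zero:

```python
import random
from collections import defaultdict

ALPHA = 0.1      # learning rate, alpha in [0, 1]
EPSILON = 0.1    # exploration probability for epsilon-greedy

PORTFOLIO = ["Skynet", "Ximp", "AIUR"]

# V[(a, o)] estimates the payoff of portfolio member a against opponent o.
# defaultdict gives unseen opponents an implicit column of zeros, mirroring
# how columns are added as new opponents are faced.
V = defaultdict(float)

def select_member(opponent, rng=random):
    """Epsilon-greedy selection: explore with probability EPSILON, otherwise
    pick the member with the highest estimated payoff against this opponent
    (the minimax policy collapses to this because the opponent's 'column'
    is known)."""
    if rng.random() < EPSILON:
        return rng.choice(PORTFOLIO)
    return max(PORTFOLIO, key=lambda a: V[(a, opponent)])

def update(member, opponent, result):
    """Single-shot minimax-Q update; result is +1 (win), 0 (draw), -1 (loss)."""
    V[(member, opponent)] = (1 - ALPHA) * V[(member, opponent)] + ALPHA * result
```

After a few wins with one member against a given opponent, greedy selection converges to that member, which is exactly the contextual-bandit behavior discussed below.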
Within the minimax-Q framework, acting greedily means calculating the minimax policy for the current payoff matrix (which corresponds to a Nash equilibrium) and sampling the portfolio member from that policy. However, since we know the opponent, i.e., the 'column' of the matrix we are playing against, the minimax policy reduces to selecting the best-performing portfolio member against that opponent. In other words, the selection process reduces to a contextual bandit, where the 'context' is the opponent we are playing against. Although we could formulate the meta-reasoning directly as a contextual bandit, we formulate it as minimax-Q because it naturally handles multi-stage settings (i.e., switching the activated portfolio member according to the state). We plan to implement mid-game switches in StarCraft (as we successfully did in microRTS -- see [3]), but technical issues (bots do not resume a game started by another bot) have put this idea on hold until we find a workaround.

Related publications:

[1] Tavares, Azpúrua, Santos, and Chaimowicz. Rock, Paper, StarCraft: Strategy Selection in Real-Time Strategy Games. In Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 93--99, October 2016.
[2] A. R. Tavares, D. K. S. Vieira, T. Negrisoli, and L. Chaimowicz. Algorithm Selection in Adversarial Settings: From Experiments to Tournaments in StarCraft. IEEE Transactions on Games, 2018 (to appear). DOI 10.1109/TG.2018.2880147.
[3] A. R. Tavares, S. Anbalagan, L. S. Marcolino, and L. Chaimowicz. Algorithms or Actions? A Study in Large-Scale Reinforcement Learning. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2717--2723, 2018.

Q: Did you incorporate any of the following AI techniques in your bot?
If you did, please be as specific as possible.

a) Search-Based AI (Path-Finding, A*, MiniMax, MCTS, etc): No
b) Offline Machine Learning (Supervised or Unsupervised, but not RL): No
c) Offline Reinforcement Learning: No, although nothing prevents us from training MetaBot offline so that it starts with some knowledge against existing bots.
d) Online Learning of any kind (Including competition file IO for strategy selection): Yes, as outlined in the bot description (minimax-Q reduced to a contextual bandit).
e) Influence Maps: No
f) Custom Map Analysis: Not in the meta-reasoning code, but yes in the portfolio bots.
g) Hard-coded or rule-based strategy / tactics: Not in the meta-reasoning code, but the bots in the portfolio use some rule-based strategies.
h) Analysis of bots from previous competitions / hard-coded specific bot counter strategies: No
i) Any techniques not mentioned here: No

Q: How did you become interested in StarCraft AI?

StarCraft poses challenges to AI techniques in many directions. It is also engaging, since I enjoyed playing the game back in the day.

Q: How long have you been working on your bot?

The bot has existed for about two years, but direct work on it has not exceeded a couple of months.

Q: About how many lines of code is your bot?

All counts are as reported by the cloc command-line utility, excluding comments and blank lines. The meta-reasoning code -- MetaBot itself, which loads the sub-bots' DLLs to play the game -- has 5837 lines of code. The numbers below are from the source code of the current bots in the portfolio:

    Skynet: 16562
    Ximp: 7502
    AIUR: 19611

Q: Why did you choose the race of your bot?

In previous tournaments, Protoss had bots that interacted in cyclical ways -- like rock, paper, scissors (see [1, 2] in the publications mentioned before) -- and we thought they would make good components for our portfolio.

Q: Did you use any existing code as the basis for your bot? If so, why, and what did you change?
The meta-reasoning code is almost the same as MegaBot's; we only adapted the parts that activate the sub-bots so that they load the sub-bots' DLLs. The sub-bots' code is not counted as a basis because MetaBot loads their compiled DLLs as black boxes; their source code is submitted to tournaments due to the openness rule.

Q: What do you feel are the strongest and weakest parts of your bot's overall performance?

The strongest part is the ability to activate the portfolio component appropriate to the context. The weakest part is that, so far, the context is only the opponent's identity: MetaBot is still unable to select a new bot mid-game according to the current state information. This is due to technical issues (the bots do not resume a game started by another bot).

Q: If you competed in previous tournaments, what did you change for this year's entry?

We changed from a monolithic, all-sources-incorporated project to a modular, DLL-loading project. We also changed the portfolio: Xelnaga and NUSBot were replaced by Ximp and AIUR.

Q: Have you tested your bot against humans? If so, how did it go?

No tests against humans so far.

Q: Any fun or interesting stories about the development / testing of your bot?

We were having a hard time developing a StarCraft bot when the idea of MegaBot/MetaBot came: "what if we use the available bots rather than develop a new one?". We were aware that bots had their strengths and weaknesses, so a meta-reasoning module should be able to recognize those, mapping each situation to the strongest bot available in the portfolio. We were happy that it placed among the top 50% of bots in the CIG and AIIDE StarCraft AI tournaments in 2016 (its debut year).

Q: Any other projects you're working on that you'd like to advertise?

We successfully tested the mid-game switching approach in microRTS (see [3]), and hopefully we will port it to MetaBot in StarCraft.

Optional Opinion Questions:

Q: What is your opinion on the current state of StarCraft AI?
How long do you think before computers can beat humans in a best-of-7 match?

I think ideas from OpenAI Five might boost bot performance, although only big companies have the infrastructure required for that. However, much as happened in chess, more sophisticated ideas will later allow the same level of performance on much lighter infrastructure.

Q: What do you feel is the biggest hurdle (technological or otherwise) in improving your bot's AI?

The bots we have tested so far are not able to pick up an ongoing game and resume it (it is as if they were not "Markovian"). Of course, this is not a fault of their developers; we were simply trying to use them in a scenario they were not designed for.

Q: Which bots are the most interesting to you and why?

I'm a little outdated on newer bot details, but of the ones I know (2017 and earlier), UAlbertaBot was the most interesting, because it employed actual scientific approaches and still reached good performance.

AIIDE Specific Question:

Q: Do you feel that the current format of iterated round-robin win percentage is a good indicator of bot skill ranking? If not, how would you change it?

I'm happy with the current format. I'd only suggest some additional 'honorable mention' categories, for example: best learning curve, shortest code (in terms of lines or commands), etc. It would also be fun to have a small developer tournament, where all developers present at the conference play each other to see whether they're better as programmers or as players :)