From: AI 2015
Date: Sat, Feb 28, 2015 at 9:11 PM
Subject: AI 2015 notification for paper 58
To: Scott Watson

Dear Scott,

Following up on a previous email, we are sending you the reviewers' scores to supplement the narrative reviews you already received. We hope you will find them useful.

Please note that there was discussion among the PC before the decision was reached. It was more than a pure numerical exercise, as it depended on the confidence of the reviewers and on the completeness and degree of elaboration and justification of their review scores. Strong positive or negative scores without adequate justification were discounted during the deliberations.

Regards,
Evangelos & Denilson

----------------------- REVIEW 1 ---------------------
PAPER: 58
TITLE: Exploring Options for Efficiently Evaluating the Playability of Computer Game Agents
AUTHORS: Todd Wareham and Scott Watson

OVERALL EVALUATION: -2 (reject)
REVIEWER'S CONFIDENCE: 4 (high)
Is the research objective clearly stated?: 3 (fair)
Is the description of the methodology clear?: 2 (poor)
How is the readability?: 2 (poor)
How is the organization?: 3 (fair)
Is related work adequately referenced?: 3 (fair)
Are the results / claims compelling and well supported by experimental evaluation or proofs?: 2 (poor)
How is the originality / novelty?: 2 (poor)
Is the paper technically sound?: 2 (poor)

----------- REVIEW -----------
This paper studies game playability evaluation. It is intended to analyze when such evaluation is and is not tractable. However, no rigorous analysis is presented, as I elaborate below.

First, the types of games considered are not clearly presented. Section 2 attempts to define games in which agents can only exchange items and facts, but which practical games can be represented in this way is not clearly articulated. The example given on page 4 does not illustrate the issue clearly. What is the objective of the game? What are the consequences of the actions?
As presented, the interaction sequences in Fig. 2 make no sense to the reader.

In Sections 3, 4 and 5, a total of 7 complexity results on playability evaluation are presented. None of them has a proof or a sufficiently convincing justification; the reader is expected to simply accept them as correct. As a result, the paper makes no convincing contribution at all to the issue of playability evaluation.

----------------------- REVIEW 2 ---------------------
PAPER: 58
TITLE: Exploring Options for Efficiently Evaluating the Playability of Computer Game Agents
AUTHORS: Todd Wareham and Scott Watson

OVERALL EVALUATION: -1 (weak reject)
REVIEWER'S CONFIDENCE: 4 (high)
Is the research objective clearly stated?: 3 (fair)
Is the description of the methodology clear?: 2 (poor)
How is the readability?: 3 (fair)
How is the organization?: 3 (fair)
Is related work adequately referenced?: 4 (good)
Are the results / claims compelling and well supported by experimental evaluation or proofs?: 1 (very poor)
How is the originality / novelty?: 3 (fair)
Is the paper technically sound?: 2 (poor)

----------- REVIEW -----------
In this paper, Wareham and Watson present some theoretical results on the difficulty of, essentially, testing whether interactions between an NPC and a player can achieve given goals. As expected, without quite substantial restrictions, this problem is intractable in the usual theoretical sense, i.e., not solvable in polynomial time. Since these results are rather expected, many people might find this research uninteresting, especially given that from a practical point of view (i.e., that of game developers) general intractability has little impact, as long as the particular algorithm used to create NPCs comes with guarantees that the interactions a player needs to achieve the goals are possible (for example, by requiring that limited searches suffice to find the right interactions).
Despite that, I would lean toward accepting the paper if the theory part were stronger; in other words, if the proofs, or at least proof sketches, were provided. To me, these proof sketches would be the contribution of the paper, and naturally the contribution should be in the paper itself, not require looking them up on the Internet. I am aware that 12 pages are a big challenge for this, but if the authors need more space, they should consider a conference allowing more pages (or a journal). In its current form, the paper does not provide any interesting contribution to me, although there is clearly potential. Hence my weak reject.

----------------------- REVIEW 3 ---------------------
PAPER: 58
TITLE: Exploring Options for Efficiently Evaluating the Playability of Computer Game Agents
AUTHORS: Todd Wareham and Scott Watson

OVERALL EVALUATION: 0 (borderline paper)
REVIEWER'S CONFIDENCE: 3 (medium)
Is the research objective clearly stated?: 4 (good)
Is the description of the methodology clear?: 3 (fair)
How is the readability?: 3 (fair)
How is the organization?: 3 (fair)
Is related work adequately referenced?: 3 (fair)
Are the results / claims compelling and well supported by experimental evaluation or proofs?: 2 (poor)
How is the originality / novelty?: 2 (poor)
Is the paper technically sound?: 3 (fair)

----------- REVIEW -----------
The paper presents proofs that in most cases, evaluating agent playability is NP-hard and not fixed-parameter tractable. The proofs appear to be correct, but there are some problems with the paper that need to be addressed:

- The results appear to be quite straightforward, even though the setting used is not as general as other intractability results from multi-agent systems. The proof techniques used are standard. Therefore, it is not clear what the novelty of this approach is.

- The results show that only unrealistic cases are tractable.
The paper discusses this issue and suggests that the results provide ways of breaking a game into solvable pieces, but the argument presented is still quite weak. In practice, using simulations as well as human testers is a methodology that has been used quite successfully and will likely continue to be used. This is akin to the fact that, in theory, proving that a program is correct or conforms to a specification is very hard, yet in practice people use test cases and establish correctness anyway (albeit not in a formal manner). Hence, these results are likely to have very little practical impact. The significance of the results from a practical point of view needs to be discussed further.

- While the paper is generally well written, it is troublesome that all the proofs are in the appendix. It would be better if the example agents were significantly shortened and replaced with an overview of the main proof steps, since the proofs appear to be the main contribution.