The latest issue of the Journal of Usability Studies has recently been released. Among other papers, there is one about usability evaluation and game development: "Do Usability Expert Evaluation and Testing Provide Novel and Useful Data For Game Development?" by Sauli Laitinen (Journal of Usability Studies, Vol. 1, Issue 2, February 2006, pp. 64-75).
In short, game developers were asked to rate the findings of a usability expert evaluation and a usability test of their game, and to give feedback about the methods used and the results obtained. Some results:
Practitioner's Take Away:
- Traditional usability expert evaluation and testing provide novel and useful data for game development.
- Not all of the usability specialists who participate in the usability expert evaluation of a game have to be double experts.
- When designing a game usability test, it is important to note that thinking aloud and interrupting the player are not always possible. Design the test so that there is a mixture of think-aloud and uninterrupted play.
- Game developers are interested in learning about the user experience. Use post-test questionnaires and other survey methods to study the user experience.

(...) In addition to the usefulness and face validity of the methods, the study examined whether the usability experts participating in the game usability expert evaluation should be double experts. No significant difference was found in the number or the rated relevancy of the problems that the gamer and non-gamer usability specialists found.
Why do I blog this? Since I work on user experience evaluation of video games (sometimes involving usability testing), this paper offers some thoughts about it. The field of usability testing in games seems to be getting more and more formalized lately. However, I have doubts about some of the issues the author raises, for instance the recommendation to use post-test questionnaires and other survey methods to study the user experience. I find it much better to use interviews (first open interviews, then semi-structured ones with probes); I also find it very useful to confront gamers with a replay of what they did (self-confrontation). Of course, the cost of doing this is high, but I get more valuable data from this kind of verbalization, coupled with both logfiles/game data and videos of the players. Maybe this is because we had enough time to do it, and because I tend to favor mixing qualitative and quantitative data (in research and R&D projects, but also in testing).
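A side note on coupling verbalization with logfiles: the kind of game data I mean does not need to be fancy. Here is a minimal sketch (my own illustration, not from the paper; the file format, event names, and fields are all hypothetical) of how timestamped game events could be recorded so they can later be lined up with the session video during a self-confrontation replay:

```python
import json
import time

class SessionLogger:
    """Append timestamped game events to a JSON-lines file.

    Timestamps are relative to the start of the session, so they can be
    aligned with the session video afterwards (assuming the video
    recording and the logger are started together).
    """

    def __init__(self, path):
        self.start = time.monotonic()
        self.file = open(path, "a", encoding="utf-8")

    def log(self, event, **details):
        record = {
            "t": round(time.monotonic() - self.start, 3),  # seconds since session start
            "event": event,
            **details,
        }
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()  # keep the log usable even if the game crashes mid-session

# Hypothetical usage during a play session:
logger = SessionLogger("session_01.jsonl")
logger.log("level_start", level=1)
logger.log("player_death", level=1, cause="fall")
logger.log("level_complete", level=1, duration_s=312.4)
```

During the self-confrontation interview, each logged event then gives a timecode to seek to in the video, so the player can be probed about a specific moment rather than about the session in general.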