GDC09: Advanced Data Mining and Intelligence from Large-Scale Game Data
Posted by Ben Zeigler on March 27, 2009
Here are my notes from the GDC 09 session "Advanced Data Mining and Intelligence from Large-Scale Game Data". As an overview, it was a discussion of academic analysis of EverQuest 2 log data, co-presented by Dmitri Williams from USC and Bruce Ferguson from Sony Online Entertainment. I may well have written something down incorrectly, so feel free to correct me in the comments.
- Data analyzed came from a combination of database, chat, and gameplay log files, as well as survey data collected directly by the researchers. Data had to be cleaned of personal information, but still had to be collatable across sources. Survey data could be combined with log data to compare self-reporting to actual behavior.
- An in-game item was provided as incentive for completing the surveys, and this proved to be more effective than more traditional small cash payments for research participation.
- Data was analyzed by a team of one half-time coder, and a group of 20 researchers with varying degrees of involvement.
- Storage requirements ended up at about 3x the size of the data set, due to the analysis techniques used. Analysis was performed on beefy machines using some custom code on top of SQL databases.
- The NSF and US Army provided most of the funding for the project, a total of $1.5 million. Rationale for funding was to study how this could help team dynamics.
- In general, play time increased as age increased. Much of the time spent playing EQ2 appears to have come at the expense of television, and not at the expense of social or other activities.
- Female players were fewer (roughly an 80-20 male-to-female split), but tended to play the game more on average, and enjoyed their time more. However, they tended to under-report play time more (3 hours under-reported vs 1 hour for males).
- Hardcore roleplayers made up about 5% of the sample. In general they tended to be unhappier than non-roleplayers, but this appears to be explainable by the fact that a larger percentage are from marginalized social groups, such as those with disabilities. Unhappiness seems to lead to roleplaying, instead of the other way around.
- The data was useful for basic economic analysis. Because they had access to 100% accurate measures, it was good for studying the relationship between things such as GDP and price stability.
- Group size and performance were measured. Solo play was 16% of content, with average XP of 68 per content-unit. 6-player groups were 23% of content, with average XP of 78. Average XP was lowest for 4-player teams, despite them still being fairly popular. This could be a gameplay balance problem, and is worth looking into.
- Political leaning was studied. Players with more moderate political beliefs generally ended up in better groups than those with more extreme beliefs; it is plausible that extreme beliefs could alienate group members.
- Research into gold farmers had only just started. The research was requested by Sony, and they have not done any explicit study of the actual harm caused by farmers.
- 4 types of farmer accounts were described: Collectors who get the money and materials, Mules who hold it, Spammers who advertise, and Traders who do the final transaction. Each had different properties.
- As an early filter, many gold farmers listed their location as Alaska or Antarctica (first in the list). Simple guessing like this could filter out a large percentage.
- More advanced methods were discussed (regression analysis, brute-force pattern matching, network analysis), but nothing conclusive was available because the research is still ongoing.
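To make the first note above more concrete (data cleaned of personal information but still collatable, then survey answers compared against logged behavior), here is a minimal Python sketch. The talk did not describe the actual mechanism, so everything here is an assumption on my part: the keyed-hash approach, the `SECRET_KEY` value, the field names, and the helper functions `pseudonymize`, `scrub`, and `underreport_gap` are all hypothetical, and the rows are made up for illustration.

```python
import hashlib
import hmac
from statistics import mean

# Hypothetical secret held only by the data owner, never given to researchers.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(account_id, key=SECRET_KEY):
    """Map an account ID to a stable token via a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(records, key=SECRET_KEY):
    """Replace the account_id field with a token, keeping every other field."""
    return [
        {
            "token": pseudonymize(r["account_id"], key),
            **{k: v for k, v in r.items() if k != "account_id"},
        }
        for r in records
    ]


def underreport_gap(survey_rows, log_rows):
    """Average of (logged hours - self-reported hours) for players in both sets."""
    logged = {r["token"]: r["logged_hours"] for r in log_rows}
    gaps = [
        logged[r["token"]] - r["reported_hours"]
        for r in survey_rows
        if r["token"] in logged
    ]
    return mean(gaps) if gaps else None


# Made-up illustrative rows, not data from the study.
survey = scrub([
    {"account_id": "alice", "reported_hours": 20.0},
    {"account_id": "bob", "reported_hours": 25.0},
])
logs = scrub([
    {"account_id": "alice", "logged_hours": 23.0},
    {"account_id": "bob", "logged_hours": 26.0},
    {"account_id": "carol", "logged_hours": 10.0},
])

gap = underreport_gap(survey, logs)
print(gap)
```

Because the same account always maps to the same token, the survey and log tables can still be joined after the raw account IDs are dropped, which is one plausible reading of "cleaned but collatable".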
Overall, I quite enjoyed the talk. The mix of academic info and practical knowledge was perfect for me, and the speakers were engaging. It was a good use of my time and I will seek out any future talks on this subject.