Massively has an article about EVE Online’s server architecture and CCP’s plans for its future. The article is a great overview, and it matches the notes I took at a GDC ’07 presentation CCP gave. I’m frankly impressed that CCP has reached a peak concurrency of 40k players, but I really don’t think there’s much room left for improvement. EVE’s population is growing just slowly enough that they can keep up with it, but the fact that they’ve started putting in zone limits shows that even they realize this. Why does the EVE model work, and why can’t it go much further?
First of all, they obviously have some solid programmers. Getting 40k concurrent users on one database server is impressive, ESPECIALLY one based on SQL. My understanding is that they have some hardcore SQL programmers who put a lot of logic into highly optimized stored procedures. Even so, they still needed to buy a military-grade static RAM drive to keep up, and I know they’ve been getting help from IBM and other companies to push performance as high as possible. So database performance is stretched close to the breaking point, but it isn’t actually the current problem.
The problem is performance on their application servers, or SOL servers. These handle combat and all player interaction, and they have always been incredibly laggy. But how are they able to get a few thousand people into a zone in the first place, without resorting to the client-heavy approach used by WoW (WoW does almost everything client-side, which is why hacked servers are possible)? The answer is that the architecture is heavily optimized for automated, predictable activity. For instance, ship movement is not synced every frame; instead, the server sends an update only when a player actually changes their movement parameters. During normal movement, the client solves complicated differential equations to predict each ship’s location, which works perfectly when you’re mining.
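To make the prediction idea concrete, here is a minimal sketch in Python. It assumes a hypothetical one-dimensional movement model in which a ship’s velocity approaches a target speed exponentially (dv/dt = (v_target - v) / tau). The article doesn’t describe CCP’s real equations or message format, so `MovementUpdate`, `predict_position`, and `tau` are illustrative inventions. Because this equation has a closed-form solution, the client can compute a position for any frame from a single update, and no further network traffic is needed until the pilot changes course.

```python
import math
from dataclasses import dataclass


@dataclass
class MovementUpdate:
    """Hypothetical server-to-client message, sent only when a pilot changes course."""
    t0: float        # server time when the update was issued, in seconds
    x0: float        # position at t0 (1-D simplification)
    v0: float        # velocity at t0
    v_target: float  # speed the ship is accelerating toward
    tau: float       # assumed time constant of the exponential approach


def predict_position(u: MovementUpdate, t: float) -> float:
    """Position at time t under dv/dt = (v_target - v) / tau.

    Velocity: v(t) = v_target + (v0 - v_target) * exp(-dt / tau).
    Integrating that from t0 gives the closed form returned below.
    """
    dt = t - u.t0
    return u.x0 + u.v_target * dt + (u.v0 - u.v_target) * u.tau * (1.0 - math.exp(-dt / u.tau))


# One update (ship at rest, ordered to 100 m/s) drives every frame that follows.
update = MovementUpdate(t0=0.0, x0=0.0, v0=0.0, v_target=100.0, tau=5.0)
for frame in range(5):
    t = frame * 0.5  # e.g. a client rendering at 2 Hz, purely for illustration
    print(f"t={t:.1f}s  x={predict_position(update, t):.2f}")
```

In this sketch, server and client evaluate the same function from the same update, so they agree on where the ship is without exchanging per-frame positions.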
When does this model break down? In the most complicated, hardest-to-optimize, and yet most important part of the game: combat. During combat, players are constantly changing their movement and using powers, which defeats all of these optimizations. I’m sure they’ve done work since launch, but interaction between players has always been de-emphasized. From the GDC talk, I learned that the original version of combat in EVE was entirely deterministic, and it took a LOT of complaining from designers to make combat fun at all. So the entire EVE architecture is designed to optimize highly parallel, noninteractive processes, in the vein of a supercomputer. How, then, do they propose to fix the performance problems with combat, which is the least parallel computing activity I know of?
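A rough way to see why combat defeats this scheme: if updates go out only when a pilot’s command changes, the message count tracks how often pilots change their orders, and each update must also be relayed to everyone else on the grid. The sketch below assumes a hypothetical 600-tick session, invented command names, and all-to-all visibility. The numbers illustrate the scaling, not EVE’s actual traffic.

```python
from typing import Sequence


def count_updates(commands: Sequence[object]) -> int:
    """Count the movement updates an event-driven server sends for one pilot.

    A message goes out only on ticks where the pilot's command differs from
    the previous tick ("send only when parameters change").
    """
    updates = 0
    previous: object = object()  # sentinel that never equals a real command
    for command in commands:
        if command != previous:
            updates += 1
            previous = command
    return updates


def messages_per_battle(updates_per_pilot: int, pilots: int) -> int:
    """Total messages if every update is relayed to every other pilot on grid (assumption)."""
    return pilots * updates_per_pilot * (pilots - 1)


TICKS = 600  # hypothetical session length, one tick per second
mining = ["approach asteroid"] * TICKS                 # one order, held all session
combat = [("orbit", i // 3) for i in range(TICKS)]     # new order every 3 ticks

print(count_updates(mining), count_updates(combat))    # 1 200
print(messages_per_battle(count_updates(combat), 1000))  # 199800000
```

The miner costs one update for the whole session, while the pilot in combat, changing orders every three ticks, costs 200. Fanning each of those out to 999 observers, summed over 1,000 such pilots, turns that into roughly 200 million messages.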
As mentioned in the article, they want to fix this by… adding a bunch of supercomputer features. The main thing they’re working on now is setting up InfiniBand interconnects to make it easier to move processes between physical machines. I guess the idea is to split the overtaxed zones across several physical machines, but this is going to be fiendishly complicated. My understanding is that large fleet battles involve a wide variety of connections between players, so splitting them across machines, even with a fast interconnect, means that anything involving players on different physical machines is going to be slow. They’re also going to have to rewrite a large chunk of their code.
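The cost of splitting a fight can be estimated by counting how many player-to-player interactions end up spanning two machines. This Python sketch assumes a worst-case battle in which every pilot can interact with every other pilot (a complete interaction graph) and a simple round-robin assignment of players to machines. Both are simplifying assumptions, not a description of CCP’s planned design.

```python
from itertools import combinations


def cross_machine_fraction(interactions, assignment):
    """Return the fraction of interactions whose two players sit on different machines.

    interactions: list of (player_a, player_b) pairs, e.g. "a has b targeted".
    assignment:   dict mapping each player to a machine id.
    """
    if not interactions:
        return 0.0
    remote = sum(1 for a, b in interactions if assignment[a] != assignment[b])
    return remote / len(interactions)


# Hypothetical worst case: 1,000 pilots, everyone can interact with everyone.
PILOTS = 1000
battle = list(combinations(range(PILOTS), 2))

for machines in (1, 2, 4):
    # Round-robin assignment of pilots to machines (an illustrative choice).
    assignment = {p: p % machines for p in range(PILOTS)}
    print(machines, f"{cross_machine_fraction(battle, assignment):.4f}")
# 1 0.0000
# 2 0.5005
# 4 0.7508
```

Under all-to-all interaction, spreading players evenly over k machines leaves about (k - 1)/k of all interactions crossing the interconnect. A partitioner that keeps fleets or squads together might do better in real battles, where the interaction graph is sparser, but the sketch shows why fast links alone do not make the problem go away.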
Parallelizing multiplayer combat across different processes and physical machines is an insanely complicated task, and I frankly don’t have much confidence that CCP will be able to do it effectively. I could be proven wrong; we’ll see in a year whether EVE is still having horrible combat performance problems.