The Problem with Damage Simulators

By BigStupidJellyfish | 2022-February-28 | Updated 2022-August-13

If it disagrees with experiment, it's wrong.

-Richard Feynman

Girls' Frontline has a lot of dolls, with all sorts of unique mechanics, skills, tile combinations, and more to influence combat. To give people better ways of understanding how echelons perform and of crafting stronger formations, a few online damage simulators were developed. The only reasonably current/functional sim is Hycdes, but before that there was also the Brainlets damage sim (at some point picked up by Matsuda, and eventually abandoned). Either way, they work in a similar fashion - place some dolls in formation, edit their equipment and fairy, define enemy stats, and get some damage-over-time numbers.

An example setup.

I get the feeling that, for a lot of people, these results are considered an important part of building and evaluating teams. Today, I'd like to talk about the limitations such sims face - any such program, in fact. While Hycdes is the only currently usable damage simulator I can reference, this is not a matter exclusive to it.

Sanity Check

Starting with an easy question: do these simulators get the right numbers at all? The closest in-game setup I can think of to a simulator is an echelon of dolls with no special skills, against a lone Target Practice dummy. We can compare the actual clear times against the time a damage simulator lists to deal that much damage, and see if they line up.

Damage variance, accuracy/crit RNG, and walk time differences will mean getting an exact match is unlikely, no matter how accurate the simulator is. You can click the "100x average" button in Hycdes several times and see a bit of variance even after taking that many trials, or "Run Once" for even more variation. Longer battles will help reduce the influence of these factors.
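
To get a feel for how much spread to expect, here is a minimal Monte Carlo sketch. Everything in it is assumed for illustration: the hit and crit chances, the 1.5x crit multiplier, the +/-15% damage roll, the HP totals, and the function names are placeholders, not values taken from Hycdes or from the game's code.

```python
import random
import statistics

def time_to_kill(hp, rng, dmg=100.0, frames_per_shot=15,
                 hit_chance=0.8, crit_chance=0.4, crit_mult=1.5):
    """Seconds for one hypothetical shooter to deal `hp` damage at 30 fps."""
    dealt, shots = 0.0, 0
    while dealt < hp:
        shots += 1
        if rng.random() < hit_chance:
            roll = dmg * rng.uniform(0.85, 1.15)  # assumed damage variance
            if rng.random() < crit_chance:
                roll *= crit_mult
            dealt += roll
    return shots * frames_per_shot / 30.0

def relative_spread(samples):
    """Standard deviation as a fraction of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
short_runs = [time_to_kill(5_000, rng) for _ in range(300)]
long_runs = [time_to_kill(50_000, rng) for _ in range(300)]
# Each entry is the mean of 100 short fights, like one "100x average" click.
averaged = [statistics.mean(time_to_kill(5_000, rng) for _ in range(100))
            for _ in range(30)]

print(f"single short fight: {relative_spread(short_runs):.1%}")
print(f"single long fight:  {relative_spread(long_runs):.1%}")
print(f"100-run average:    {relative_spread(averaged):.1%}")
```

Under these assumptions, both the longer fight and the 100-run average should show a noticeably smaller relative spread than a single short fight, which is the same effect described above.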

Results look pretty good. There is a minor difference with the walk times: with the dolls in a forward F formation, they appear to begin damaging the dummy almost immediately. The usual start-of-battle lag makes it difficult to measure exactly how much time this takes, especially as battle seems to begin before the loading screen even starts to open. This experiment was done back when we still had the Brainlets sim, which showed 0.43s passing before the first bit of damage is dealt. The "adjusted prediction" subtracts that difference, and the average measured performance seems to consistently fall in between the two.

We can also look at its numbers in reference to hand calculations. Using my typical methods for calculating DPS, results are within 1-2% of Hycdes' 100-run averages.
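
For readers who want to try a hand calculation themselves, here is a rough sketch. It is not necessarily the method used for the numbers above: it relies on commonly cited community approximations (hit chance of acc / (acc + eva), floor(1500 / ROF) frames per shot at 30 fps for non-MG guns, a 1.5x crit multiplier), ignores armor and skills, and uses made-up stats. Treat all of those as assumptions.

```python
def frames_per_shot(rof):
    # Assumed community formula for non-MG guns at 30 frames per second.
    return 1500 // rof

def expected_dps(fp, rof, acc, crit_rate, enemy_eva,
                 crit_mult=1.5, links=5):
    """Average damage per second, ignoring armor, skills, and walk time."""
    hit_chance = acc / (acc + enemy_eva)
    avg_crit = 1 + crit_rate * (crit_mult - 1)
    shots_per_sec = 30 / frames_per_shot(rof)
    return fp * avg_crit * hit_chance * shots_per_sec * links

def relative_error(predicted, reference):
    """Signed error of a prediction, as a fraction of the reference."""
    return (predicted - reference) / reference

# Hypothetical doll versus a 0-evasion dummy.
hand_calc = expected_dps(fp=100, rof=77, acc=90, crit_rate=0.4, enemy_eva=0)
print(f"hand calculation: {hand_calc:.1f} DPS")
```

Comparing this sort of figure to a simulator's 100-run average with `relative_error` is the same kind of check as the 1-2% comparison above.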

That's far smaller a difference than anything that could reasonably be detected in normal combat. (To further simplify the scenario, a large delay was added to G41's skill so that it's effectively turned off.)

You'll have to take my word for it now, but the Brainlets sim was also accurate enough in this regard.

Midgame SF

(Note: the following, and previous, tests are lifted from an earlier draft of this writeup before chips were implemented. Exact values will vary, but the same principles still apply.)

So, we can expect reasonably accurate numbers from a damage simulator when everything we're doing lines up with its assumptions. Let's add some difficulty and replace the stationary dummy with some real enemies. They fight back, have far more dummies, and will get weaker as you kill their links. We're still using basic direct-fire dolls and aren't kiting.

Problems show up rather quickly. Overkill, link protection, and so on will make enemies take longer to kill. AN-94's skill has a passive component the simulator doesn't model, so she should perform better in practice. Despite that, real clear times were 15-20% slower than the simulator prediction across the board (whereas if AN-94's skill were the dominant factor, we'd expect to clear faster in practice).

While the simulator cannot give any sort of estimate of this, I've also charted the damage taken against each enemy to the right. This is often a much more important metric than clear time.

Let's replace AN-94 with CZ2000 to see if we get any better predictions, and swap out the enemies for a new roster from 10-3E.

Still mostly in the 15-20% error range. The estimate for id1586 was a bit closer (-11%), but that's also with a fairly wild evasion value plugged in - there are a lot of Guards with 0 evasion, and some Scouts with 80 evasion. Taking the HP-weighted average evasion is not awful, but it's certainly not accurate. It's entirely possible the evasion value is simply wrong in the opposite direction, which would happen to push the prediction closer to the true time.
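
As a small illustration of what that blended evasion value looks like, here is a sketch. The 0 and 80 evasion values come from the paragraph above; the HP pools, the accuracy stat, and the acc / (acc + eva) hit formula are assumptions for the example.

```python
def hp_weighted_evasion(groups):
    """groups: list of (total_hp, evasion) pairs."""
    total_hp = sum(hp for hp, _ in groups)
    return sum(hp * eva for hp, eva in groups) / total_hp

def hit_chance(acc, eva):
    # Assumed community formula, not taken from any simulator's source.
    return acc / (acc + eva)

# Hypothetical HP pools: Guards at 0 evasion, Scouts at 80 evasion.
groups = [(60_000, 0), (20_000, 80)]
blended = hp_weighted_evasion(groups)

ACC = 60  # hypothetical accuracy stat
print(f"blended evasion:    {blended:.0f}")
print(f"hit chance, Guards: {hit_chance(ACC, 0):.0%}")
print(f"hit chance, Scouts: {hit_chance(ACC, 80):.0%}")
print(f"hit chance, blend:  {hit_chance(ACC, blended):.0%}")
```

The blended hit chance matches neither enemy type, so any stretch of the fight spent shooting only Guards or only Scouts proceeds at a different pace than the single-statline prediction.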

From bad to worse

Damage simulators start having issues even in the simplest possible real battles. Combat in Girls' Frontline goes much farther than that.

Grenades

Suppose you want to use 416 mod in your ARSMG. She launches a grenade 6 seconds in, doing 16x damage to each link it hits. A simulator will assume some arbitrary number of enemy links here, multiply that by 16x her FP, and report the result as her grenade damage.

Now, consider whatever enemy group you're up against. Suppose it's something like this:

How many enemies will 416's grenade hit? You could say that there are 3+4=7 groups of 5-linked enemies, but you'll obviously have killed at least some of the Brutes. But what if there's one partially-linked Brute left, and it takes the whole grenade? What if you kill that Brute just in time and her grenade instead deletes every single Striker?

Either way, those enemies pictured have at most 1140 HP/link. With no external buffs and all SPEQs, 416 has 111 FP. Her grenade will deal on average 1776 damage per link, already massively overkilling whatever it does hit. But damage simulators don't know this, and will tell you that every FP buff you add to 416 will continue sending her grenade damage higher and higher.
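
The gap between the reported number and the useful number can be sketched in a few lines. The 111 FP, 16x multiplier, and 1140 HP/link come from the text above; the number of links hit and the buffed FP values are hypothetical, and the sketch assumes full-health links and no damage variance.

```python
def grenade_damage(fp, hp_per_link, links_hit, multiplier=16):
    """Return (nominal, effective) grenade damage.

    nominal   - what a simulator would report
    effective - damage that actually removes enemy HP (overkill is wasted)
    """
    per_link = fp * multiplier
    nominal = per_link * links_hit
    effective = min(per_link, hp_per_link) * links_hit
    return nominal, effective

# 111 FP is the unbuffed value; 140 and 170 are hypothetical buffed values.
for fp in (111, 140, 170):
    nominal, effective = grenade_damage(fp, hp_per_link=1140, links_hit=5)
    print(f"FP {fp}: nominal {nominal}, effective {effective}")
```

The nominal figure keeps climbing with every FP buff, while the effective figure is already capped by the enemy's HP at 111 FP.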

Incidentally, as her grenade will always kill anything it hits in that example, we know her second skill will trigger a secondary explosion. Damage simulators, again, don't know this. They might offer you a toggle box to choose between that effect and the damage amp option, but in many cases the result isn't guaranteed. (And how much is the damage sim helping you if you need to carefully check the mechanics of each individual fight to configure it?)

This is one of the larger sources of error that cause damage simulators to assign bigger numbers to 2AR1HG backlines over 3AR ones, despite the latter typically performing better in actual combat.

Does the left echelon look like a good team? Is it twice as good as the right echelon? Will 4 Shiki deal double M14's damage? Do those numbers even mean anything?

Piercing

It is completely impossible for damage simulators to ever give you even vaguely accurate numbers corresponding to the performance of dolls with piercing skills: Type88 mod, 4 Shiki, Kord, and so on. Kiting pattern variations, hitbox spaghetti, varying enemy ranges and starting positions, and so much more mean the number of enemies pierced each fight is only discoverable through observation and testing.

Targeting

When shooting a single infinite-health dummy, there's hardly a way to distinguish different targeting types. They clearly matter, however - AUG tends to do better than Angelica (pre-chip) from her random spray, despite a theoretical DPS disadvantage. Liu's targeting modes can significantly change your results, despite having similar on-paper DPS. ARSMG teams tend to suffer against Guard/Jaeger comps as the Jaegers have plenty of time to rip your SMGs apart.

Targeting matters, and theoretical damage numbers don't tell you about this.

Other Skillshots

The utility of skillshots is highly dependent on two factors:

The value of the target they hit
Whether they successfully take out that target

How helpful damage charts are here should be rather obvious.

Conditional Skills

G11 mod gains damage based on the max HP of what she shoots. This obviously can vary wildly by encounter, and even throughout the fight against mixed groups. Simulators cannot assign any sort of generally applicable value to this.

Uzi's flames spread to nearby enemies. How many? Simulators don't know.

NTW needs to kill enemies with each skill shot to chain them. Mosin gets FP/ROF boosts depending on how and when she kills them.

Stechkin reduces the EVA of low-HP enemies.

G3's grenade gets different effects based on what % HP enemies have left.

SVCh gets FP buffs by switching targets.

AK-Alfa does more damage to enemies in front of her.

AK-15 strains herself more against larger groups.

Desert Eagle makes specific enemies vulnerable and prioritizes them.

QBU has an explosive active and does another whenever she attacks the same target thrice.

Grape does 45x damage to a non-elite, almost certainly massive overkill. (But very effective!)

I could go on and on scrolling through my armory, but hopefully you get the idea. None of these skill mechanics can be decently represented by damage simulators. The numbers they give you when you do put these dolls in become nonsense.

Non-offensive skills

A damage simulator will tell you that Px4 Storm is almost always better than P22. Ranking usage statistics make it clear players don't feel the same way. The EVA & shield buffs are really good, but damage simulators can't assign any value to those.

Damage sims can't tell you anything remotely useful if you're trying to decide between UMP45, UMP9, or RO635 as your main tank.

ADS has a halfhearted grenade, but isn't really a damage dealer. How good are her debuffs when compared to a normal selfbuffer?

Does Welrod reduce the overall damage you take, or would a buffer be better?

How much does M950A's slow field help? Her MS buff?

Does some echelon have enough armor/other buffs for M500's shieldspam to be enough to survive?

Damage simulators tell you about damage (rather poorly). Damage matters, but is a far cry from everything involved in proper teambuilding.

Kiting

Outside of farming, you probably should be doing at least some kiting. Proper kiting can turn a fight that would critically injure your tanks into 0 damage. When your SMGs/HGs are kiting, they aren't shooting.

Damage simulators have no decent way of representing this, and will happily add every last theoretical bullet from your frontline into your total damage. This can inflate the value of team-wide buffs (example: when comparing Webley's leader vs non-leader buffs in an RFHG), and generally makes its results less representative of actual combat.
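
Here is a toy sketch of how that inflation can flip a comparison. All numbers are hypothetical (they are not Webley's actual buffs or any real echelon's DPS), and `uptime` is simply the assumed fraction of the fight the frontline spends shooting rather than moving.

```python
def team_dps(backline, frontline, uptime=1.0,
             team_buff=0.0, backline_buff=0.0):
    """Total DPS when the frontline only shoots `uptime` of the time."""
    back = backline * (1 + team_buff) * (1 + backline_buff)
    front = frontline * (1 + team_buff) * uptime
    return back + front

BACK, FRONT = 1000.0, 400.0  # hypothetical unbuffed DPS values

for uptime in (1.0, 0.3):    # 1.0 is what a simulator assumes
    team_wide = team_dps(BACK, FRONT, uptime, team_buff=0.10)
    back_only = team_dps(BACK, FRONT, uptime, backline_buff=0.12)
    print(f"uptime {uptime:.0%}: team-wide {team_wide:.0f}, "
          f"backline-only {back_only:.0f}")
```

With these made-up values, the team-wide buff wins when the frontline shoots the whole time and loses once the frontline spends most of the fight kiting.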

Lost links on your frontline throughout a series of battles similarly aren't considered. SR-3MP can look decent on paper, but gets a lot more depressing after she takes a few hits.

Enemies

The enemies you face also don't always map onto damage results that well. Many bosses have iframes, making them immortal for periods of time. Orthrus care a lot about your initial damage, and then about how many hits per second you can get out. No single enemy statline can be assigned in a simulator to a Hydra/Cyclops mob. Patrollers are best countered with manual skill control and surehit burst attacks. Minotaurus require kiting to evade their tasers, and alternate between their normal mode and a mode where they take 80% less damage.

Spaghetti

Damage simulators are made by players of the game, not developers. There are a lot of weird behaviors that border on bugs in actual combat. Said behaviors have no guarantees of being properly implemented, and can be a big deal in the end result.

Did you know that Webley's buff applies two frames after its activation (vs 0 frames after for most skills)? This makes it able to apply its CDR even to skills that activate momentarily afterwards. Your guess is as good as mine as to whether that is accurately programmed in.

Even worse, SVCh's skill is implemented completely incorrectly, even though its actual behavior is simple to discover through regular testing. And that's not even considering her FP-buff-on-kill.

Here's C-93's pigeonwings not even close to giving the right buff.

Hycdes will tell you that your dolls first deal damage 0.00 seconds in. The Brainlets damage sim will tell you that G41 first deals damage 0.63 seconds in, and WA2000 takes 1.23s. These are both completely wrong, and these incorrect offsets mean all future shots are also marked at incorrect times. Even ignoring how walk time exists and will vary by case, dolls have at least a 1-frame startup period and bullets have travel time. They do not wait for their full ROF-based frame delay to shoot. If there's no walk time, damage will start from ~0.13-0.30 seconds in. (See over here for some examples/details on this subject. Usually not a big deal, but it (a) shows yet another area they get wrong, and (b) can lead to certain shots being improperly buffed or not buffed by various skills.)
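
To show how a wrong first-shot offset can change which shots a skill covers, here is a small sketch. The floor(1500 / ROF) frame gap is a commonly cited community approximation, and the ROF of 77, the 5-frame startup, and the buff window are all hypothetical values picked for the example.

```python
def shot_times(rof, first_shot_frame, duration_s, fps=30):
    """Timestamps (seconds) of shots fired within `duration_s`."""
    gap = 1500 // rof  # assumed frames between shots for non-MG guns
    frames = range(first_shot_frame, int(duration_s * fps) + 1, gap)
    return [f / fps for f in frames]

def shots_in_window(times, start_s, end_s):
    """Number of shots landing while a buff is active."""
    return sum(start_s <= t < end_s for t in times)

ROF = 77  # hypothetical; gives a 19-frame gap
# Model A: the doll waits a full ROF-based delay (19 frames) before firing.
full_delay = shot_times(ROF, first_shot_frame=19, duration_s=5)
# Model B: a short startup instead (5 frames, about 0.17s).
startup = shot_times(ROF, first_shot_frame=5, duration_s=5)

# Hypothetical buff active from 1.0s to 2.6s.
print("shots in 5s:   ", len(full_delay), "vs", len(startup))
print("shots buffed:  ", shots_in_window(full_delay, 1.0, 2.6),
      "vs", shots_in_window(startup, 1.0, 2.6))
```

The two models disagree both on how many shots fit in the fight and on how many of them land inside the buff window, even though nothing about the doll's stats changed.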

Conclusion

Damage simulators are at best a questionable measurement of a team's offensive capabilities. They can be mildly useful when handled with extreme caution, but have many ways to lead you far astray. They should not be used as a primary metric for teambuilding.

Please take any damage simulator results for a full echelon with a grain of salt. I would not consider it even remotely convincing evidence of any teambuilding-related idea without significant external justification. There are a massive number of dolls it simply cannot handle decently, and all sorts of other problems either way. Almost any analysis I do relies heavily on some mix of hand calculations and data from combat in Target Practice, and it's for a reason - almost nothing a damage simulator outputs is helpful to me. At least hand calculations let me be very careful and aware of the assumptions I make, and how the results are interpreted. Damage simulators happily present numbers for all sorts of absurd situations without the slightest disclaimer.

Average skill complexity is on a clear upwards trajectory - these issues are only getting more and more severe. Many modern echelons are already 4/5 or 5/5 on dolls not even remotely well approximated by a damage simulator.

Proper analysis depends on all sorts of methods and variables, so it's hard to give a single answer on what to use instead. A basic napkin calculation may be all you need in some cases. In others, even hundreds of test runs would fall short of proper support.

Whatever the case, a damage simulator is probably not the final answer.