The Outlier Tax — A Combine Matchmaking Retrospective

Summary

The shape of the problem and the shape of the fix.

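The current production matchmaker (base_mmr_delta_pct = 0.50) optimises hard for the populated middle of the MMR distribution. The pair-gate window is generous enough that players at the median joiner MMR (411) match in roughly a minute. The cost is paid by extreme-MMR players: the 0–199 MMR cohort hits a 71-minute max wait and an 84% match rate, and the 1000+ MMR cohort only matches at 88%. Both ends suffer because too few peers fall inside their window to fill a 10-player lobby. At the bottom, a percent-of-MMR window is also small in absolute terms (±50 MMR for a 100-MMR player, well short of the populated band that starts around 255). At the top, the mutual pair gate caps each pairing at the smaller of the two players' windows, so a high-MMR player's wide window does not help them reach lower-MMR peers.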

Tighter delta makes balance look better on paper. It also kills the cross-tier exposure combines need.

The naive fix, tightening base_mmr_delta_pct, produces matches that are uniform within tier (98%+ TIGHT spread) but eliminates the cross-tier exposure that combines exist to provide. Calibration data dries up. The right fix is to enable the MMR clamp shipped in PR #300 (mmr_floor / mmr_ceiling), which lets extreme-MMR players "see themselves" at a populated MMR for the pair gate while preserving real MMR for snake-draft balance. With the clamp in place, the wide pair-gate window can stay wide, MIXED-spread lobbies (the calibration sweet spot) stay frequent, and the tails get matched faster.

The hard part is choosing how aggressive the clamp should be. A 100-MMR Recruit clamped to 150 stays inside Recruit; clamped to 300, they're queued into matches against Contenders. The simulator shows the latter is dramatically faster on tail wait (5.4 vs 21.5 minutes bottom-tail p95) but stretches tier fairness. One night of arrival data is too thin to commit to a single answer, so this report presents three candidate configurations and recommends starting with the most conservative one: the codification of what your operators are already doing manually.

Suggested starting hypothesis · iterate after one combine night

Codify the manual MMR-bump-to-150 as `mmr_floor`.

base_mmr_delta_pct = 0.50    # unchanged
max_mmr_delta_pct  = 0.55    # marginal headroom
mmr_floor   = 150            # NEW — automates the manual bump you do today
mmr_ceiling = 1100           # NEW — covers the populated edge of Premier
# everything else unchanged

This is the most conservative possible change: the operator team has been manually editing low-MMR players up to ~150 to make matches happen under the existing config. Setting mmr_floor = 150 automates exactly that. A 100-MMR Recruit clamped to 150 with a 50% delta gets a pair-gate window of 150 ± 75, i.e. 75–225. Since no eligible player sits below 100 MMR, that is effectively 100–225, entirely inside the Recruit tier (100–254), so there is no cross-tier mismatching.

There are stronger candidates in the simulator (see §9), but they aggressively cross tier boundaries: a 100-MMR player clamped to 300 matches in raw range 150–450, spanning Recruit through Contender. The simulator can't fully resolve which side of that trade-off matters more for your league, because we have one night of arrival data and the simulator's 5-second tick fires less often than production's event-driven matchmaker. So: ship the safe change, observe one combine night under the new config, then escalate to a more aggressive clamp if the bottom tail is still waiting too long.

Three concrete candidates are tabulated in §9 — they span the tier-fairness ↔ tail-recovery trade-off. Pick whichever level of aggression matches your appetite for risk.

Data sources

Three layers of evidence, joined by player ID.

Historical matches (Jan 5 – Apr 29, 2026): …
Nights with ≥10 matches: …
Simulator configurations replayed: …
Unique joiners on the analysed night: 384

Bot interaction logs from Loki ({service_name="csc-bot"} |~ "created an interaction with command (queue|leavequeue)"). Each row carries a Discord user ID and millisecond timestamp. Loki retention is approximately six weeks, which means only the most recent combine night supplies replay-grade arrival data.
CombineMatches rows from core's PostgreSQL — scheduled_date is the de-facto pop timestamp (set to tz_now() in _pop_mmr_queue_sync). M2M relations home/away give the rosters. PR #300's CombineQueueMatchLog would give exact per-player wait times directly, but those rows didn't exist for the analysed night — the PR shipped after.
Player snapshots — current MMR, type (DE/FA/Signed), tier_id. MMR is treated as stable retroactively; drift since the queue event is small relative to the population spread.
Historical match rosters from January 5–19 (a 14-night season) and April 27–29. The January nights have no Loki coverage, so they contribute production-side balance and skill-spread metrics only — no replay analysis is possible without arrival timestamps.

The algorithm

What _pop_mmr_queue_sync actually does.

The combine matchmaker lives in apps/matches/mutations.py:_pop_mmr_queue_sync. Each invocation runs as a single transaction with row-level locks on every CombinesQueue row, which is what lets concurrent callers safely race for the same pop. The five-step recipe:

Snapshot under lock. CombinesQueue.objects.select_for_update().filter(…) pulls every queue row into memory. Each row carries its created_at, the player's most recent requeue_reason (NONE/CANCELLED/FINISHED), and a requeue_count.

Compute per-player priority.

type_weight    ∈ {DE: 1.0, FA: 0.6, Signed: 0.3}
time_factor    = log(time_in_queue + 1) / 10
requeue_adj    ∈ {NONE: 0, CANCELLED: +0.5, FINISHED: −0.3}
priority_score = type_weight + time_factor + requeue_adj

Scores break ties between otherwise-compatible candidates and decide who gets anchored first, so DEs and long-waiting players become anchors.

Compute the per-player MMR window.

effective_pct   = clamp(base_mmr_delta_pct + integrated_pct,
                         min_mmr_delta_pct, max_mmr_delta_pct)
effective_mmr   = clamp(player.mmr, mmr_floor, mmr_ceiling)
                  # if/elif — never clamps twice
effective_delta = effective_pct * abs(effective_mmr)

integrated_pct grows with time-in-queue via the warmup ramp + linear expansion in _compute_effective_delta_pct. The clamp only participates in effective_delta and the pair-gate distance below; snake-draft balance and tier resolution always read raw player.mmr.

Greedy 10-player group formation. For each anchor (highest priority first), iterate candidates and keep those whose |c.effective_mmr − a.effective_mmr| ≤ min(a.effective_delta, c.effective_delta). Score each compatible candidate by priority + tier_proximity_adj (same tier +tier_proximity_bonus, ±1 tier +tier_proximity_bonus*0.5, beyond −max_tier_difference_penalty * (tier_diff − 1)). Take top 9, validate every pair in the resulting 10 is mutually within both deltas, lock in.
Snake-draft and persist. Sort the 10-player group by raw MMR descending, deal them home/away on pattern [0,1,1,0,0,1,1,0,0,1]. Create the CombineMatches row with scheduled_date=tz_now(), write a CombineQueueMatchLog audit row (post-#300), observe combines_queue_wait_seconds per player, request a Dathost server.
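Steps 2, 4 and 5 reduce to a few lines of arithmetic. The sketch below is plain Python, not the code in mutations.py: the function names are hypothetical, it assumes the natural log and seconds for time_in_queue, it takes tier_proximity_bonus and max_tier_difference_penalty as the bonus and penalty arguments, and it omits both the Django locking of step 1 and the pair gate of steps 3–4.

```python
import math

# Hypothetical constants mirroring steps 2 and 5. The natural log and
# seconds as the unit of time_in_queue are assumptions of this sketch.
TYPE_WEIGHT = {"DE": 1.0, "FA": 0.6, "Signed": 0.3}
REQUEUE_ADJ = {"NONE": 0.0, "CANCELLED": 0.5, "FINISHED": -0.3}
SNAKE_PATTERN = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1]  # 0 = home, 1 = away


def priority_score(player_type: str, time_in_queue: float, requeue_reason: str) -> float:
    """Step 2: type_weight + log(time_in_queue + 1) / 10 + requeue_adj."""
    time_factor = math.log(time_in_queue + 1) / 10
    return TYPE_WEIGHT[player_type] + time_factor + REQUEUE_ADJ[requeue_reason]


def tier_proximity_adj(tier_diff: int, bonus: float, penalty: float) -> float:
    """Step 4: soft tier preference added to priority when ranking candidates."""
    diff = abs(tier_diff)
    if diff == 0:
        return bonus
    if diff == 1:
        return bonus * 0.5
    return -penalty * (diff - 1)


def snake_draft(mmrs: list[float]) -> tuple[list[float], list[float]]:
    """Step 5: sort by raw MMR descending, then deal home/away by SNAKE_PATTERN."""
    if len(mmrs) != len(SNAKE_PATTERN):
        raise ValueError("snake draft expects exactly 10 players")
    home: list[float] = []
    away: list[float] = []
    for side, mmr in zip(SNAKE_PATTERN, sorted(mmrs, reverse=True)):
        (home if side == 0 else away).append(mmr)
    return home, away
```

For ten players at 100 to 1000 MMR in steps of 100, snake_draft deals 1000, 700, 600, 300 and 200 to home and the rest to away, so the raw team totals come out at 2800 vs 2700.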

Two parameters dominate observed behaviour: base_mmr_delta_pct (initial pair-gate width as a fraction of effective_mmr) and the optional mmr_floor/mmr_ceiling clamp. Everything else acts at the margins: tier proximity is a soft preference within the candidate set, and time-based expansion only grows the window over a multi-minute timescale.

3.1 Try it: pair-gate window

Pick a player, adjust the config knobs, and watch the pair-gate window move across the eligible population. The histogram below is the full 635-player eligible roster (every Player with mmr > 0 and type ≠ Spec in the DB right now), not just one night's queue. Tier bands are tinted in the background so you can see when a window crosses tier boundaries. The shaded green band is the candidate range that falls inside [effective_mmr ± effective_delta]. Compatible peers counts how many of the 635 pass the pair gate with this player; the gate is mutual, so the distance must fit inside both players' windows.

[Interactive controls: player MMR (with its tier), base_mmr_delta_pct, time_in_queue (s), mmr_floor and mmr_ceiling toggles. Readouts: effective_pct, effective_mmr, effective_delta (window half-width, MMR), compatible peers (of 635 eligible), tier coverage (tiers the window spans).]

Histogram = all 635 eligible players in 50-MMR bins. Tier bands tinted in the background. Solid red rule = the player's raw MMR. Dashed teal rule = effective MMR after the clamp (only when clamped). Shaded green band = the pair-gate window. The tier coverage readout is the design lever: a window confined to one tier is gameplay-fair; a window spanning multiple tiers is calibration-rich but stretches fairness.
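The widget's readouts can be reproduced with a short sketch of the §3 window formulas and the mutual pair gate. It assumes time_in_queue = 0 unless integrated_pct is passed in (the warmup ramp in _compute_effective_delta_pct is not reproduced), uses a hypothetical GateConfig container and function names, and sets min_pct = 0 as a placeholder because this report does not state min_mmr_delta_pct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateConfig:
    """Hypothetical container for the knobs named in §3."""
    base_pct: float = 0.50
    min_pct: float = 0.0   # placeholder: min_mmr_delta_pct is not given in this report
    max_pct: float = 0.55
    floor: Optional[float] = None
    ceiling: Optional[float] = None


def window(mmr: float, cfg: GateConfig, integrated_pct: float = 0.0) -> tuple[float, float]:
    """Return (effective_mmr, effective_delta) for one player."""
    pct = min(max(cfg.base_pct + integrated_pct, cfg.min_pct), cfg.max_pct)
    eff = mmr
    # if/elif: clamp up to the floor or down to the ceiling, never both
    if cfg.floor is not None and mmr < cfg.floor:
        eff = cfg.floor
    elif cfg.ceiling is not None and mmr > cfg.ceiling:
        eff = cfg.ceiling
    return eff, pct * abs(eff)


def compatible(a: float, c: float, cfg: GateConfig) -> bool:
    """Mutual pair gate: the distance must fit inside the smaller window."""
    a_eff, a_delta = window(a, cfg)
    c_eff, c_delta = window(c, cfg)
    return abs(c_eff - a_eff) <= min(a_delta, c_delta)


def compatible_peers(mmr: float, others: list[float], cfg: GateConfig) -> int:
    """Count peers that pass the mutual gate at time_in_queue = 0."""
    return sum(compatible(mmr, other, cfg) for other in others)
```

With peers at 150, 160, 225, 300 and 411 MMR, a 100-MMR player has 1 compatible peer under the unclamped config and 3 with mmr_floor = 150: the clamp moves their window from 50–150 to 75–225, picking up the 160- and 225-MMR peers.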

SkillUnits

Why "team-total MMR diff" is the wrong metric.

The simulator's existing balance metric is |sum(home.mmr) − sum(away.mmr)|, with bands at 50/100/200. That metric is fine when MMR is linear in skill, which it isn't on this ladder — tier widths vary by an order of magnitude:

Tier ladder · MMR span varies by ~10×
#	Tier	MMR min	MMR max	Span

A 200-MMR gap inside Recruit (span 154) is the entire tier and then some. The same 200-MMR gap inside Premier (span 1551) is 13% of one tier — barely a ranking distinction. Treating them as equivalent inflates "good balance" numbers in the high-MMR end of the distribution and obscures real mismatches at the low end.

The replacement metric, used throughout this report:

SkillUnits(player) = tier_index + (mmr − tier.mmrMin) / (tier.mmrMax − tier.mmrMin)

1.0 SkillUnit always means "one tier", regardless of where on the ladder the comparison happens. Two derived match metrics:

team_skill_balance: |sum(home_SU) − sum(away_SU)| — does the snake draft produce even teams? Bands at 0.5 / 1.0 / 2.0 SU.
match_skill_spread: max(SU) − min(SU) across all 10 players — how wide is the lobby's skill range? Bands at 1 / 2 / 3 SU.

match_skill_spread is the load-bearing metric for the rest of this report. The team-balance metric reports zero for a lobby with a 1500-MMR player on each side, each flanked by 100-MMR players: the totals cancel, so the lobby looks perfectly balanced even though that match will play terribly. match_skill_spread catches that case directly.
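A minimal sketch of SkillUnits and the two match metrics, applied to the lobby from the previous paragraph. Only the Recruit bounds (100–254) come from this report; the other ladder rows are placeholders, tier_index is assumed to be 0-based, and the function names are hypothetical.

```python
from typing import Sequence

# (name, mmr_min, mmr_max). Only the Recruit row comes from this report; the
# other rows are placeholders so the example runs. tier_index is 0-based here.
EXAMPLE_TIERS = [
    ("Recruit", 100, 254),
    ("placeholder-middle", 255, 999),
    ("placeholder-top", 1000, 2551),
]


def skill_units(mmr: float, tiers=EXAMPLE_TIERS) -> float:
    """tier_index + (mmr - tier.mmrMin) / (tier.mmrMax - tier.mmrMin)."""
    for index, (_name, lo, hi) in enumerate(tiers):
        # first tier whose max covers this MMR; the last tier absorbs the rest
        if mmr <= hi or index == len(tiers) - 1:
            return index + (mmr - lo) / (hi - lo)
    raise ValueError("empty tier ladder")


def team_skill_balance(home: Sequence[float], away: Sequence[float]) -> float:
    """|sum(home_SU) - sum(away_SU)|: does the draft produce even teams?"""
    return abs(sum(map(skill_units, home)) - sum(map(skill_units, away)))


def match_skill_spread(lobby: Sequence[float]) -> float:
    """max(SU) - min(SU) across all players in the lobby."""
    su = [skill_units(m) for m in lobby]
    return max(su) - min(su)
```

For the lobby above (one 1500-MMR and four 100-MMR players per side), team_skill_balance returns 0.0 while match_skill_spread returns about 2.32 SU. The placeholder ladder collapses several real tiers into one row, so on the actual ladder the spread would be larger still.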

What "good" looks like

Combines are placement, not just play. Reframe accordingly.

It would be tempting to treat low match_skill_spread as the optimisation target — pick the config that produces the most TIGHT lobbies. That treatment is wrong for combines specifically. Combines exist to generate cross-tier observation data so that the placement system can correctly bucket players. A combine night where every match is 10 same-tier players gives nearly zero placement signal: the algorithm only learns that Elite plays Elite.

TIGHT (<1 SU): low calibration value
MIXED (1–2 SU): sweet spot
WIDE (2–3 SU): useful but unfair
BLOWOUT (≥3 SU): avoid

So the optimisation target is not "minimise spread." It's maximise MIXED, hold WIDE within tolerance, drive BLOWOUT to zero. TIGHT is acceptable but not desirable — it means the night burned a match opportunity producing no calibration evidence. WIDE is the noisy edge of the useful range — fine in moderation, costly in volume because gameplay quality drops.
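Classifying spreads into these bands and checking the target can be sketched as follows. Treating each band as lower-inclusive (exactly 1.0 SU counts as MIXED) is an assumption, the function names are hypothetical, and wide_tolerance_pct is a free parameter because the report does not put a number on "WIDE within tolerance".

```python
from collections import Counter

# Exclusive upper bounds from the band legend; anything >= 3 SU is BLOWOUT.
# Lower-inclusive bands (exactly 1.0 SU is MIXED) are an assumption.
BAND_UPPER = [("TIGHT", 1.0), ("MIXED", 2.0), ("WIDE", 3.0)]
BAND_NAMES = ["TIGHT", "MIXED", "WIDE", "BLOWOUT"]


def spread_band(spread_su: float) -> str:
    for name, upper in BAND_UPPER:
        if spread_su < upper:
            return name
    return "BLOWOUT"


def band_shares(spreads: list[float]) -> dict[str, float]:
    """Percentage of matches falling in each band."""
    counts = Counter(spread_band(s) for s in spreads)
    return {name: 100.0 * counts[name] / len(spreads) for name in BAND_NAMES}


def meets_target(shares: dict[str, float], wide_tolerance_pct: float) -> bool:
    """Zero BLOWOUT and WIDE at or under a hypothetical tolerance."""
    return shares["BLOWOUT"] == 0 and shares["WIDE"] <= wide_tolerance_pct
```

Fed the §9 simulator shares with a 5% WIDE tolerance (an arbitrary illustration), Tier-fair and Balanced pass and Aggressive fails on its 1% BLOWOUT. This scores only the calibration axis; tail wait is weighed separately in §9.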

Implication for the matchmaker

The pair-gate base_pct controls the spread distribution directly: tight delta produces TIGHT matches, wide delta produces MIXED-and-up. The clamp shifts that distribution rightward at the edges — by mapping extreme players into the populated band, it converts what would be "no match" into either a MIXED or WIDE lobby. Both outcomes feed the placement signal.

The night, in detail

2026-04-28 EDT · 384 joiners · 1568 JOINs · 161 USER_LEAVEs · 118 popped matches · 1 cancelled.

6.1 The eligible population (635 players)

Before the queue-specific view, here's the population that could queue: every Player with mmr > 0 and type ≠ Spec. This frames the floor/ceiling choice: clamping a 100-MMR player to 200 is meaningful only if the 200-MMR band has actual peers.

All 635 eligible players in 50-MMR bins, with tier boundaries marked. The "100-MMR floor" of the system is real — no players sit below it. Recruit (100–254) holds 73 players, the populated middle (Prospect–Challenger, 255–669) holds 380, the top tail above 1000 is 43 players spread to 1500.

Caveat: the empty <150 MMR band is artificial

You'll see 34 players in the 150–199 band and zero below 150. The system's true MMR floor is 100; what looks like a thin band starting at 150 is the result of manual MMR adjustments made on previous combine nights to give bottom-tier players a chance at matches under the existing base_mmr_delta_pct = 0.50 config. Those manual bumps are exactly the operational pain the algorithmic clamp is meant to eliminate. Read the histogram as: there are players who belong at MMR 100, but they've been temporarily lifted to ~150 to function inside the current matchmaker.

6.2 MMR distribution of joiners (one night)

Median joiner MMR is 411, with 78% of joiners between 200 and 700 MMR. Only 25 unique joiners sit below 200 and 17 above 1000. Of the 635 eligible players, 384 (60%) actually queued on this night, so the queue subset is not the population.

Last night's unique joiners, in 50-MMR bins. Tier-band labels above show how the distribution maps onto the tier ladder.

6.3 Wait time, by MMR cohort

Wait times below come from production data: each matched player's wait runs from their most recent JOIN to the POP that included them. The bottom and top cohorts dominate the long-wait list:

Production wait times. Bars show p50, p95, and max in minutes per MMR cohort.

Production wait by cohort · 2026-04-28 EDT
MMR band	joiners	matched	match %	p50	p95	max	>30m	>60m
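The wait columns of this table can be sketched as below. The function names are hypothetical, and the nearest-rank percentile is an assumption: the report does not say which percentile method produced its p50/p95 figures.

```python
import bisect
import math


def wait_seconds(join_times: list[float], pop_time: float) -> float:
    """Pop time minus the player's most recent JOIN at or before that pop."""
    joins = sorted(join_times)
    i = bisect.bisect_right(joins, pop_time)
    if i == 0:
        raise ValueError("no JOIN before this POP")
    return pop_time - joins[i - 1]


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (an assumption, see above)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cohort_row(waits_s: list[float]) -> dict[str, float]:
    """p50 / p95 / max in minutes plus the >30m and >60m counts for one cohort."""
    return {
        "p50": percentile(waits_s, 50) / 60,
        "p95": percentile(waits_s, 95) / 60,
        "max": max(waits_s) / 60,
        ">30m": sum(w > 30 * 60 for w in waits_s),
        ">60m": sum(w > 60 * 60 for w in waits_s),
    }
```

A player who joined at t = 0 s, left, and rejoined at t = 100 s before popping at t = 400 s is charged 300 s, not 400 s: only the most recent JOIN counts.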

6.4 Match outcomes: both balance metrics

Each match is one point below. Horizontal axis: team_skill_balance (snake-draft fairness). Vertical axis: match_skill_spread (lobby skill range). The tightly-clustered points along the bottom are TIGHT same-tier matches; the points climbing the y-axis are the calibration matches we actually want, with BLOWOUT territory above 3.0.

Each circle is one match. Color encodes spread band. Vertical reference lines mark the team-balance band thresholds.

Simulator A/B

14 configs, one event log, the production code path.

The replay command (core/apps/matches/management/commands/replay_combine_night.py) drives the production simulate_pop() function — which is the exact _pop_mmr_queue_sync body extracted in simulate_matchmaking.py — with the night's real arrival/departure stream from Loki. The matchmaker decides pop timing per config; we observe outcomes.
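The replay's shape can be sketched as a tick loop. This is not replay_combine_night.py: the event tuple format, replay() and fifo_pop() are hypothetical, and fifo_pop() stands in for simulate_pop() only so the example runs. It does show the tick effect described in the caveats: a lobby that completes between ticks waits for the next one.

```python
from typing import Callable

# Hypothetical event format: (timestamp_s, player_id, "JOIN" or "LEAVE").
Event = tuple[float, str, str]
PopFn = Callable[[dict[str, float]], list[list[str]]]


def fifo_pop(queue: dict[str, float]) -> list[list[str]]:
    """Toy stand-in for simulate_pop(): full lobbies of 10 in arrival order."""
    ordered = sorted(queue, key=queue.__getitem__)
    full = len(ordered) - len(ordered) % 10
    return [ordered[k:k + 10] for k in range(0, full, 10)]


def replay(events: list[Event], pop_fn: PopFn, tick_s: float = 5.0) -> dict[str, float]:
    """Apply events in time order and attempt a pop once per tick.

    Returns each matched player's wait: pop tick minus their latest JOIN.
    """
    events = sorted(events)
    if not events:
        return {}
    queue: dict[str, float] = {}  # player_id -> latest JOIN time
    waits: dict[str, float] = {}
    i, now = 0, events[0][0]
    end = events[-1][0] + tick_s
    while now <= end:
        # apply every arrival/departure up to this tick
        while i < len(events) and events[i][0] <= now:
            t, pid, kind = events[i]
            if kind == "JOIN":
                queue[pid] = t
            else:
                queue.pop(pid, None)
            i += 1
        for group in pop_fn(queue):
            for pid in group:
                waits[pid] = now - queue.pop(pid)
        now += tick_s
    return waits
```

With ten players joining one second apart, a 5-second tick pops them at t = 10 s and the last joiner waits 1 s; with tick_s=1.0 the pop lands at t = 9 s with zero wait.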

The table below ranks scenarios by MIXED % (the calibration metric), not by TIGHT. Top of the table = configs that produce the most cross-tier exposure while staying compatible with reasonable gameplay quality.

14 scenarios · ranked by MIXED-spread share
scenario	matches	vs actual	tight %	mixed %	wide %	blowout %	spread p95

7.1 Per-cohort wait times across scenarios

Heatmap rows are MMR cohorts, columns are scenarios, cells are p95 wait in minutes. The wide-delta + clamp scenarios (snapshot_300_1000, 40pct_300_1000) keep the middle rows green and collapse the tail rows: the aggressive end of the tier-fairness ↔ tail-recovery trade-off weighed in §9.

p95 wait by cohort, in minutes. Greener cells are faster.

7.2 Match-skill-spread distribution per scenario

Stacked bars show the share of matches in each spread band. Read this looking for the largest green + amber stack at the bottom, with the smallest red sliver on top. snapshot looks great here on calibration share but produces the worst tail wait. Adding the clamp shifts distribution rightward (more WIDE, slightly more BLOWOUT) but rescues the tails.

Distribution of match-skill-spread bands per scenario. Bars sum to 100%.

7.3 Calibration vs throughput

Each point is one configuration. X-axis: matches formed. Y-axis: MIXED + WIDE share (combined "produces calibration data"). The dashed vertical line marks production's actual match count. Top-right is the goal: more matches and more calibration evidence.

Calibration yield vs throughput. Top-right dominates. The recommended config sits there.

Historical season

19 nights, 1 044 matches, looked at through the SkillUnits lens.

Loki retention doesn't cover the January 5–19 stretch, so the simulator has nothing to replay there. We can compute SkillUnits-aware metrics on the actual matches that ran — useful as a baseline for what historical algorithms produced.

8.1 Match volume per night

Matches popped per EDT-anchored combine night.

8.2 Skill-spread distribution under historical algorithms

The histogram below covers all 1 044 historical matches. Note the right tail: the historical algorithms (a mix of the old tier-locked path and the new MMR-based one) produced a meaningful BLOWOUT share that the new MMR-based pair gate eliminates almost entirely in our simulator runs.

Histogram of match-skill spread, all historical matches.

8.3 Per-night skill-spread trajectory

p50 / p95 / max match_skill_spread per night. Premier-heavy nights produce wider spreads; this confirms the SkillUnits choice over raw-MMR balance — the same algorithm produces visibly different skill-spread profiles depending on who showed up.

Skill spread per night. Lower is more uniform; not necessarily better.

Recommendation

Three candidates. Pick a starting point, observe one night, refine.

One night of arrival data is not enough to commit to a single configuration with confidence. The simulator output below maps the trade-off space; the pick depends on which axis matters more right now — tier fairness (how far across tier boundaries the clamp drags low-MMR players for matchmaking) or tail recovery (how aggressively bottom-tail wait time drops).

The constraint to respect: a 100-MMR Recruit (raw tier 100–254) clamped to 300 matches in raw range 150–450, which spans Recruit, Prospect, and Contender. Snake-draft balance still uses raw MMR — they end up on a team with strong Contender teammates against a similar mix on the other side. The match happens, but their gameplay experience is "I am the lowest skill in this lobby by a lot." Codifying that as policy is a real design decision, not a free win.

9.1 Three candidates

Candidates · simulator results from one combine night (2026-04-28)
	Tier-fair	Balanced	Aggressive
base_mmr_delta_pct	0.50	0.40	0.50
mmr_floor	150	250	300
mmr_ceiling	1100	1100	1000
100-MMR player matches in raw range	75–225	150–350	150–450
Tier crossing for a 100-MMR player	none (Recruit)	1 tier (Recruit→Prospect)	2 tiers (R→P→Contender)
Bottom-tail (0–199) p95 wait, sim	21.5m	9.6m	5.4m
Top-tail (1000+) p95 wait, sim	23.3m	24.9m	20.9m
Middle (300–799) p50 wait, sim	1.0m	1.9m	1.0m
Match-skill bands TIGHT / MIXED / WIDE / BLOWOUT (% of matches)	43 / 57 / 0 / 0	30 / 68 / 2 / 0	13 / 55 / 31 / 1
Total matches formed (vs 118 actual)	127	129	135
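The window row follows from the §3 formulas for a 100-MMR player at time_in_queue = 0, assuming integrated_pct starts at zero and base_mmr_delta_pct already lies between min_mmr_delta_pct and max_mmr_delta_pct, so that effective_pct equals base_mmr_delta_pct and effective_mmr equals the floor:

```latex
\begin{aligned}
\text{Tier-fair:}  &\quad 0.50 \times 150 = 75  &&\Rightarrow\ 150 \pm 75  = [75,\ 225] \\
\text{Balanced:}   &\quad 0.40 \times 250 = 100 &&\Rightarrow\ 250 \pm 100 = [150,\ 350] \\
\text{Aggressive:} &\quad 0.50 \times 300 = 150 &&\Rightarrow\ 300 \pm 150 = [150,\ 450]
\end{aligned}
```

Because every candidate below the floor is clamped to the same effective MMR, the practical lower edge in all three cases is the bottom of the population (100 MMR); the upper edge is where the candidates actually differ.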

9.2 How to read the table

The Tier-fair candidate (floor=150) is the smallest possible algorithmic change: it does in code exactly what your operators have been doing manually, bumping low-MMR players up to ~150 to make matches form. A 100-MMR player matches inside Recruit only. The simulator shows a 21.5m bottom-tail p95, worse than current production, but this is almost certainly an artifact of the simulator's 5-second tick. Production fires the matchmaker on every queue mutation, which can happen far more often than once every 5 seconds. The fact that the manual bump to 150 has been working in production tells us this configuration is at or better than current reality, just without the operator overhead.

The Balanced candidate (floor=250, base_pct=0.40) tightens the delta and lifts the floor to where the populated middle starts. A 100-MMR player matches in 150–350 (Recruit + Prospect). Tier-crossing happens but stays at one tier. It is the best Pareto trade-off on this night's simulator data: 9.6m bottom-tail p95, 0% BLOWOUT, 68% MIXED.

The Aggressive candidate (floor=300, base_pct=0.50) optimises hard for tail wait time at the cost of tier fairness. Its 5.4m bottom-tail p95 is by far the fastest, but for the purpose of the pair gate it drags Recruits into matches against Contenders. Snake draft still balances by raw MMR, but the lowest-skill player feels the spread.

9.3 What not to do

The earlier draft of this report recommended base_pct = 0.10–0.15 with various clamps. With the calibration framing established in §5 (combines exist to generate cross-tier observation data), those configs are wrong: they produce 70%+ TIGHT same-tier-only lobbies and starve the placement system of signal. Don't ship them.

9.4 Rollout

Pick a candidate. Recommended starting point: Tier-fair — it formalises what's already working in production manually, and risk of regression is minimal.
Update the active MatchmakingConfig row via the updateMatchmakingConfig GraphQL mutation, or in the Django admin. Both mmr_floor and mmr_ceiling are nullable FloatFields; full_clean() enforces mmr_floor < mmr_ceiling.
Watch combines_queue_wait_seconds in Grafana during the night. The histogram is labelled by player_type and requeue_reason; tail-cohort effects show up most clearly when filtered to player_type matching the typical tail composition.
After one full combine night, re-export with scripts/fetch_queue_timeline.py + export_combine_night, re-run this analysis, and compare predicted vs observed wait p95 per cohort. If bottom-tail p95 is still >15 minutes, escalate to Balanced. If it's already <10 minutes, hold at the current candidate.
Iterate floor/ceiling values seasonally as MMR drifts. The eligible-population histogram in §6.1 is the correct guide — set the floor at the lower edge of the populated middle (where bucket density crosses ~10 players per 100-MMR band), and the ceiling at the corresponding upper edge.
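Rollout steps 2, 4 and 5 can be sketched as small helpers. All names are hypothetical. validate_clamp only mirrors the mmr_floor < mmr_ceiling rule and is not the Django model's full_clean(); the 10–15 minute band returns "review" because the rollout leaves it unspecified; and pick_floor_ceiling reads "crosses ~10 players per 100-MMR band" as the lowest and highest bins holding at least 10 players, without checking that the populated bins are contiguous.

```python
from typing import Optional


def validate_clamp(floor: Optional[float], ceiling: Optional[float]) -> None:
    """Mirror of the mmr_floor < mmr_ceiling rule (both fields are nullable)."""
    if floor is not None and ceiling is not None and not floor < ceiling:
        raise ValueError("mmr_floor must be below mmr_ceiling")


def rollout_decision(bottom_tail_p95_min: float) -> str:
    """Escalation rule after one combine night, in minutes."""
    if bottom_tail_p95_min > 15:
        return "escalate"  # e.g. Tier-fair -> Balanced
    if bottom_tail_p95_min < 10:
        return "hold"
    return "review"  # 10-15 minutes: left to judgment by the report


def pick_floor_ceiling(mmrs: list[float], bin_width: int = 100,
                       min_count: int = 10) -> Optional[tuple[int, int]]:
    """Lower edge of the lowest dense bin and upper edge of the highest one."""
    counts: dict[int, int] = {}
    for m in mmrs:
        b = int(m // bin_width) * bin_width
        counts[b] = counts.get(b, 0) + 1
    dense = sorted(b for b, n in counts.items() if n >= min_count)
    if not dense:
        return None
    return dense[0], dense[-1] + bin_width
```

For a population with 3 players in the 100s, 12 in the 200s, 15 in the 400s and 4 in the 1200s, pick_floor_ceiling returns (200, 500): the sparse bins at both ends are left to the clamp.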

Caveats

Where this analysis is lying to you.

Single-night replay sample. Only the 2026-04-28 EDT session has both Loki bot logs and DB rosters in the analysis. The 14 January nights contribute production-side match_skill_spread distributions only — we can't replay simulator scenarios there. Confidence in the recommended config is bounded by that sample size; soft-canary the change and re-run this analysis after one combine night under the new config.
Simulator approximations. The replay loop ticks every 5 seconds for pop attempts; production is event-driven via the pop_mmr_queue mutation, which fires on every queue mutation that could form a lobby. The simulator under-counts FINISHED requeues (without match-completion data we can't distinguish post-game requeues from a simple JOIN) and models cancellations as CANCELLED requeues whenever a player rejoins after a sim-pop. Server availability is unmodelled: the sim assumes infinite Dathost capacity.
MMR snapshot is current, not historical. A player's MMR moves slightly after each match. The "MMR at queue time" used by the simulator is approximated as "MMR right now." Effect should be O(±20 MMR) for active players, small relative to the 50-MMR histogram bins and 100-MMR cohort buckets.
SkillUnits depends on stable tier boundaries. If Tiers.mmrMin/Tiers.mmrMax change, the metric needs re-computing. Current bands:

Generated end-to-end by scripts/fetch_queue_timeline.py, core/apps/matches/management/commands/{export_combine_night, replay_combine_night, sweep_combine_night}.py, and an inline data bundler. All raw data is embedded in this document — see window.REPORT_DATA in the JS console for the full bundle.