Methodology

How Rolehue calculates salary benchmarks — with full transparency.

1. Data Sources

Rolehue salary data comes from two sources, one primary and one supplementary:

User submissions (primary): verified professionals submit their real compensation anonymously through our submission form. Each submission includes job title, city, base salary, bonus (optional), and years of experience. All submissions are pseudonymized — no identity is stored alongside salary data.
Public data (supplementary): when user submissions are sparse for a role/city combination, we supplement them with publicly available data from H-1B visa salary databases (US), ONS/ASHE (UK), Statistics Canada, ABS (Australia), and Glassdoor/LinkedIn public ranges. Supplementary data is clearly marked and weighted lower than verified submissions.

Minimum cluster size: we only publish benchmarks when a role/city combination has at least 25 data points (n ≥ 25). Below this threshold, the page shows “insufficient data” and invites submissions.

2. IQR-Based Outlier Detection

Before computing benchmarks, we remove outliers using the Interquartile Range (IQR) method:

Calculate Q1 (25th percentile) and Q3 (75th percentile) for the cluster.
Compute IQR = Q3 − Q1.
Define bounds: Lower = Q1 − 1.5 × IQR, Upper = Q3 + 1.5 × IQR.
Data points outside these bounds are flagged as outliers and excluded from public benchmarks.

Outliers are not discarded — they're retained in the database and can be re-included automatically if the cluster grows and their value falls within the new IQR bounds. This handles edge cases like legitimate ultra-high compensation at top-tier companies.
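
The IQR steps above can be sketched in a few lines of Python. This is a minimal illustration, not Rolehue's production code: the function names are hypothetical, and the quartile definition (Python's "inclusive" method) is an assumption, since the text does not say which percentile convention is used.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) IQR bounds for a cluster of salaries."""
    # quantiles(n=4) returns [Q1, Q2, Q3]; the "inclusive" method is an
    # assumption here, other quartile conventions give slightly different bounds.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Split a cluster into (kept, flagged) without discarding anything."""
    lower, upper = iqr_bounds(values, k)
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged

salaries = [100, 110, 120, 130, 140, 150, 160, 170, 1000]
kept, flagged = split_outliers(salaries)
print(kept)     # the eight values between the bounds
print(flagged)  # [1000]
```

Because flagged values stay in the stored data, re-running the split on the full cluster whenever it grows is enough to re-include a point once it falls inside the new bounds.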

3. Clustering Methodology

Salary data is clustered across 6 dimensions:

Dimension	Description	Example
Role	Standardized job title	Senior Software Engineer
City	Metropolitan area	San Francisco, CA
Country	ISO country code	US
Industry	Sector grouping	Technology
Experience	Years of experience (bucketed)	5-9 years
Company Tier	FAANG / Big Tech / Startup / Other	Big Tech

When a sub-cluster (e.g., specific role × city) has n < 25, we roll up by removing the most specific dimension first: City → Country → broader Role category. Users always see the dimensional scope of their results.
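
One way to read the roll-up rule is as an ordered list of scopes that are tried from most specific to broadest until one reaches the minimum cluster size. The sketch below assumes that reading; the field names (role, city, country, role_category) and the exact scope order are illustrative assumptions, not a documented schema.

```python
MIN_CLUSTER_SIZE = 25

# Hypothetical scopes, ordered from most specific to broadest.
ROLLUP_ORDER = [
    ("role", "city"),              # role x city
    ("role", "country"),           # city rolled up to country
    ("role_category", "country"),  # role rolled up to a broader category
]

def select_scope(records, query):
    """Return (dimensions, matching records) for the first scope with n >= 25."""
    for dims in ROLLUP_ORDER:
        matches = [r for r in records if all(r[d] == query[d] for d in dims)]
        if len(matches) >= MIN_CLUSTER_SIZE:
            return dims, matches
    return None, []  # corresponds to the "insufficient data" case

def make_record(city):
    return {"role": "Senior Software Engineer", "role_category": "Engineering",
            "city": city, "country": "US"}

records = [make_record("Austin")] * 10 + [make_record("Denver")] * 20
dims, matches = select_scope(records, make_record("Austin"))
print(dims, len(matches))  # ('role', 'country') 30
```

Returning the dimensions alongside the data makes it straightforward to show users the scope their result was computed at.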

4. Time Decay Weighting

Newer salary data is weighted more heavily than older data using exponential decay:

weight = e^{−λ × age_in_years}

Where λ = ln(2) / 2.0 yields a half-life of 2 years. A salary submitted today carries full weight (1.0); a salary from 2 years ago carries half weight (0.5); from 4 years ago, quarter weight (0.25). Data older than 5 years is excluded entirely.
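
As a quick check of the formula, the following sketch computes the weight directly from the half-life and applies the 5-year cutoff. The constant and function names are illustrative.

```python
import math

HALF_LIFE_YEARS = 2.0
DECAY_RATE = math.log(2) / HALF_LIFE_YEARS  # lambda in the formula above
MAX_AGE_YEARS = 5.0

def decay_weight(age_in_years):
    """Exponential decay weight; data older than 5 years gets zero weight."""
    if age_in_years > MAX_AGE_YEARS:
        return 0.0
    return math.exp(-DECAY_RATE * age_in_years)

for age in (0, 2, 4, 6):
    print(age, round(decay_weight(age), 3))
# 0 1.0
# 2 0.5
# 4 0.25
# 6 0.0
```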

Weight tiers used in computation:

Age	Weight	Tier
0-12 months	1.0	Current
12-24 months	0.6	Recent
24-60 months	0.3	Historical
> 60 months	0.0	Excluded
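
The tier table can be expressed as a simple lookup. The table's boundaries overlap (12 months appears in both the first and second rows), so the sketch below assumes a boundary age falls into the older tier, except that exactly 60 months still counts as Historical. That boundary handling is an assumption, not something the table specifies.

```python
def tier_weight(age_in_months):
    """Map a data point's age to its (weight, tier) from the table above."""
    if age_in_months < 0:
        raise ValueError("age cannot be negative")
    if age_in_months < 12:
        return 1.0, "Current"
    if age_in_months < 24:
        return 0.6, "Recent"
    if age_in_months <= 60:
        return 0.3, "Historical"
    return 0.0, "Excluded"

print(tier_weight(6))   # (1.0, 'Current')
print(tier_weight(18))  # (0.6, 'Recent')
print(tier_weight(42))  # (0.3, 'Historical')
print(tier_weight(61))  # (0.0, 'Excluded')
```

For comparison, the continuous formula gives roughly 0.59 at 18 months and roughly 0.30 at 42 months, close to the Recent and Historical tier weights at the midpoints of those tiers.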

5. Benchmark Computation

After outlier removal and time decay weighting, we compute:

Percentiles: P10, P25, P50 (median), P75, P90 — computed on weighted data.
Mean: weighted arithmetic mean (informational, not the primary metric).
Range: P10 to P90 span.
Sample size (n): always displayed to give confidence context.
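
A weighted percentile can be defined in several ways. The sketch below uses one common definition, assumed here for illustration: sort by salary, accumulate weights, and take the first value whose cumulative weight reaches the requested share of the total, with no interpolation. The function names, the output keys, and the choice to count only positively weighted points in n are all assumptions.

```python
def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight reaches p percent of the total."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("no data points with positive weight")
    threshold = p / 100 * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall

def benchmark(values, weights):
    """Compute the published metrics for one cluster."""
    used = [(v, w) for v, w in zip(values, weights) if w > 0]
    result = {f"p{p}": weighted_percentile(values, weights, p)
              for p in (10, 25, 50, 75, 90)}
    total = sum(w for _, w in used)
    result["mean"] = sum(v * w for v, w in used) / total
    result["range"] = (result["p10"], result["p90"])
    result["n"] = len(used)
    return result

print(benchmark([100, 200, 300, 400], [1, 1, 1, 1]))
```

With equal weights this reduces to an ordinary (non-interpolated) percentile. With decayed weights, older submissions pull the percentiles less than recent ones.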

All monetary values are in USD unless explicitly noted. For non-US cities, we also show a PPP-adjusted equivalent.

6. Data Quality & Verification

Pseudonym consistency: each submission generates a stable pseudonym, letting us detect duplicate submissions from the same person without knowing their identity.
Statistical flagging: submissions that are extreme outliers (more than 3 × IQR beyond Q1 or Q3) are automatically held for review.
Source labeling: each data point is tagged with its source (verified user, H-1B, public survey), and source weights affect the benchmark computation.
Freshness monitoring: clusters without new data in 6+ months are flagged as “stale” to users.
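
Two of the checks above, statistical flagging and freshness monitoring, are simple enough to sketch. The code assumes the 3× IQR threshold is measured from the quartiles (Q1 − 3 × IQR and Q3 + 3 × IQR) and that "6+ months" means six whole calendar months since the latest submission. Both readings, and all names, are assumptions.

```python
import statistics
from datetime import date

REVIEW_MULTIPLIER = 3.0
STALE_AFTER_MONTHS = 6

def needs_review(salary, cluster_values):
    """True if a new submission lies more than 3 x IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(cluster_values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - REVIEW_MULTIPLIER * iqr
    upper = q3 + REVIEW_MULTIPLIER * iqr
    return salary < lower or salary > upper

def whole_months_between(earlier, later):
    """Count complete calendar months from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months

def is_stale(last_submission, today):
    """True if the cluster has had no new data for 6 or more months."""
    return whole_months_between(last_submission, today) >= STALE_AFTER_MONTHS

cluster = [100, 110, 120, 130, 140, 150, 160, 170]
print(needs_review(300, cluster))                      # True
print(is_stale(date(2024, 1, 15), date(2024, 7, 15)))  # True
```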

7. Limitations

Self-selection bias: users who submit salaries may not represent the full population. We mitigate this by including public data sources.
Small sample sizes: benchmarks with low n carry more uncertainty. We enforce n ≥ 25 but still encourage users to consider the sample size.
Total compensation vs. base: Rolehue currently covers base salary plus bonus. Equity (RSUs, options) is on the roadmap but not yet included.
Geographic coverage: US cities have the most data. Non-US coverage is growing and may rely more heavily on supplementary sources.

Questions?

We believe in radical transparency. If you have questions about our methodology or data sources, or want to suggest improvements:

Email: [email protected]