Improvement Score (Severity-Adjusted) introduces a more rigorous and clinically contextualized way to measure patient improvement by adjusting for baseline severity. Instead of comparing raw change scores alone, this feature models each patient's expected improvement based on others who began treatment at a similar severity level, then standardizes the result on a common scale. This "apples-to-apples" approach corrects limitations in traditional outcome metrics, ensuring fairer comparisons across diverse populations and settings. With a severity-aware, standardized metric, clinicians and organizations can interpret treatment effectiveness with greater fairness, accuracy, and clinical rigor.
How to interpret Improvement Score (Severity-Adjusted)
What is effect size?
Before diving into Improvement Score, it helps to understand effect size—the statistical concept this metric builds on.
Effect size measures the magnitude of change in standard deviation units. When measuring treatment outcomes, we often use Cohen's d, which tells us how much patients improve on average.
Cohen (1988) proposed general benchmarks for interpreting effect sizes: 0.2 (small), 0.5 (medium), and 0.8 (large). These benchmarks are widely used in research, though Cohen himself noted they were intended as rough guidelines when domain-specific data isn't available—what counts as "large" can vary by field and context.
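As a quick illustration (using hypothetical assessment scores, not Greenspace data), Cohen's d for pre/post change can be computed as the mean change divided by the standard deviation of the change scores:

```python
import statistics

def cohens_d(pre, post):
    """Cohen's d for paired pre/post scores: mean change / SD of change.

    Assumes higher scores indicate worse symptoms, so a positive d
    means patients improved on average.
    """
    changes = [b - f for b, f in zip(pre, post)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical pre/post symptom scores for five patients
pre = [18, 22, 15, 20, 17]
post = [10, 14, 12, 9, 11]
d = cohens_d(pre, post)  # well above Cohen's 0.8 "large" benchmark
```

By Cohen's benchmarks, this hypothetical cohort's effect size would count as large, though as noted above, what counts as "large" depends on the field and context.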
Why effect size alone falls short
Standard effect size has a limitation: it doesn't account for where patients start.
A patient beginning treatment with severe symptoms has more room to improve than someone starting with mild symptoms. This creates an unfair comparison—programs treating more severe populations can appear more effective simply because their patients had further to improve.
How we adjust for severity
Improvement Score (Severity-Adjusted) addresses this by adjusting for baseline severity before calculating the effect size.
Using Greenspace's dataset, we determine what improvement is typical for patients at each severity level. Then we measure how each patient performed relative to that expectation for someone who started where they did.
How the score is centered
The final score is centered on Greenspace's average Cohen's d—the typical treatment effect size across the Greenspace platform.
This means your organization's Improvement Score reflects how your outcomes compare to this benchmark. If your score equals the Greenspace average, your patients are improving at a typical rate for their severity level. Scores above the average indicate better-than-typical improvement; scores below indicate less-than-typical improvement.
The intuition: Improvement Score (Severity-Adjusted) answers the question: "Compared to what usually happens for someone who started where this patient did, how much did they improve, and is that more or less than the Greenspace average?"
How do I find the new Improvement Score (Severity-Adjusted) tiles?
1. Once you are logged in, go to the Analytics tab.
2. Click the MBC Benchmark dashboard.
3. Scroll down to the Improvement Score (Severity-Adjusted) tile.
How is Improvement Score (Severity-Adjusted) calculated? (with the statistics explained)
Improvement Score (Severity-Adjusted) is a severity-adjusted effect size, meaning it combines a standard effect size calculation with a statistical adjustment for baseline severity.
1. Calculate raw improvement
For each patient, we calculate change from baseline to follow-up:

Raw improvement = Baseline score − Follow-up score

where higher scores indicate worse symptoms. Positive values mean improvement.
2. Model expected improvement based on severity
We fit a linear regression model that predicts how much improvement is typical for patients at different starting severities:

Expected improvement = β₀ + β₁ × Baseline score

This regression line represents the expected improvement for someone starting at a given severity level.
3. Calculate severity-adjusted improvement
For each patient, we calculate how much they improved relative to expectation:

Severity-adjusted improvement = Raw improvement − Expected improvement

This value (the residual) is positive if the patient improved more than expected, and negative if they improved less than expected.
4. Anchor to the average effect size (Cohen’s d)
Rather than centering scores at zero, we anchor Improvement Score (Severity-Adjusted) to the average treatment effect, expressed as a standard effect size:

Average effect size (Cohen's d) = Mean raw improvement / Standard deviation of raw improvement
5. Convert to Improvement Score (Severity-Adjusted) (final score)
Each patient's Improvement Score (Severity-Adjusted) is then calculated as:

Improvement Score = Average effect size + (Severity-adjusted improvement / Standard deviation of severity-adjusted improvement)

This expresses severity-adjusted improvement in standard deviation units, on the same scale as Cohen's d.
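The five steps above can be sketched end-to-end. This is an illustrative reconstruction based on the description in this article, not Greenspace's actual implementation; the sample data and the use of a simple one-predictor regression are assumptions:

```python
import statistics

def severity_adjusted_scores(baseline, followup):
    """Illustrative sketch of the five calculation steps described above."""
    # Step 1: raw improvement (higher scores = worse, so baseline - follow-up)
    raw = [b - f for b, f in zip(baseline, followup)]

    # Step 2: ordinary least squares regression of improvement on baseline
    mb = statistics.mean(baseline)
    mr = statistics.mean(raw)
    slope = (sum((b - mb) * (r - mr) for b, r in zip(baseline, raw))
             / sum((b - mb) ** 2 for b in baseline))
    intercept = mr - slope * mb
    expected = [intercept + slope * b for b in baseline]

    # Step 3: residual = actual improvement minus expected improvement
    residuals = [r - e for r, e in zip(raw, expected)]

    # Step 4: anchor = average effect size (Cohen's d of raw change)
    d_avg = mr / statistics.stdev(raw)

    # Step 5: residuals in SD units, centered on the average effect size
    sd_res = statistics.stdev(residuals)
    return [d_avg + res / sd_res for res in residuals]

# Hypothetical baseline and follow-up scores for five patients
scores = severity_adjusted_scores([18, 22, 15, 20, 17], [10, 14, 12, 9, 11])
```

Because regression residuals average to zero, the scores in this sketch average out to the cohort's Cohen's d, matching the anchoring behavior described above.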
Please note: the Greenspace averages used for comparison are static values; they are not recalculated dynamically.
Frequently Asked Questions (FAQs)
Why is Improvement Score (Severity-Adjusted) important?
Current metrics like percent improvement and standardized scores don't account for where patients start in their treatment journey. This means programs treating more severe populations appear to perform better simply due to greater "room for improvement," making fair comparisons impossible. Improvement Score (Severity-Adjusted) corrects this by adjusting for baseline severity.
How will administrators benefit?
Administrators gain the ability to fairly compare outcomes across programs, providers, and populations. They can benchmark their performance against peer organizations using Greenspace's proprietary dataset, identify true performance differences, and use statistically credible data for funding and value-based care discussions.
Which dashboards will include Improvement Score (Severity-Adjusted)?
At this time, only the MBC Benchmarks dashboard includes Improvement Score (Severity-Adjusted). We plan to roll it out to the Clinical Outcomes and Assessment Results dashboards in the future.
What assessments are included in Improvement Score (Severity-Adjusted) calculations?
Improvement Score (Severity-Adjusted) is calculated for assessments that meet minimum sample size thresholds in Greenspace's dataset. Most assessments in Greenspace meet these criteria; a comprehensive list is outlined below.
Will Improvement Score (Severity-Adjusted) be available for individual patients?
The initial release focuses on aggregate analytics at the organization, clinic, and program levels. Patient-level Improvement Score (Severity-Adjusted) is being considered for future iterations.
How often is Improvement Score (Severity-Adjusted) recalculated?
Improvement Score (Severity-Adjusted) is calculated from completed assessment cases and updated as new data becomes available. The data platform team has validated that results are statistically sound and reproducible, so constant recalculation is not necessary.
Why is this labeled as "Beta"?
We're launching Improvement Score (Severity-Adjusted) as a Beta feature because it introduces a new and statistically sophisticated concept. The Beta label gives us flexibility to refine language, interpretation thresholds, and presentation based on user feedback while maintaining transparency with customers.
How does Improvement Score (Severity-Adjusted) compare to other outcome metrics?
Improvement Score (Severity-Adjusted) complements existing metrics (percent improvement, standardized scores, RCI) by adding a severity-adjusted perspective. Each metric provides different insights, and together they offer a comprehensive view of treatment outcomes.
What if my organization's Improvement Score (Severity-Adjusted) is "Lower than average"?
Improvement Score (Severity-Adjusted) reflects many factors, including treatment approaches, population characteristics, and program design. Lower scores present opportunities for quality improvement. Help Center resources will provide guidance on interpreting scores and identifying improvement strategies.
Can I hide Improvement Score (Severity-Adjusted) from my users?
As a Beta feature, Improvement Score (Severity-Adjusted) will be visible to all users with access to Analytics dashboards. We'll evaluate customization options based on feedback during the Beta period.
How does Greenspace calculate the benchmarks?
Benchmarks are derived from Greenspace's proprietary MBC dataset, which includes diverse organizations, populations, and treatment settings. The median (50th percentile) and 75th percentile scores provide reference points for comparison.
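As an illustration of how a median and 75th-percentile benchmark can be derived from a set of scores (the values below are invented for illustration, not actual Greenspace benchmarks):

```python
import statistics

# Hypothetical organization-level Improvement Scores (invented values)
org_scores = [0.31, 0.42, 0.48, 0.55, 0.61, 0.66, 0.72, 0.80, 0.85]

# statistics.quantiles with n=4 returns the three quartile cut points:
# 25th, 50th (median), and 75th percentiles
q1, median_benchmark, p75_benchmark = statistics.quantiles(org_scores, n=4)
```

An organization's score can then be read against these two cut points to see whether it falls below the median, between the median and the 75th percentile, or above it.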
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587