
Find degrees of freedom for non_central_f distribution#1368

Open
JacobHass8 wants to merge 3 commits into boostorg:develop from JacobHass8:nc-f-find-v1

Conversation

@JacobHass8
Contributor

Towards #1305. I've added functions to find parameters df1 and df2 because they are nearly identical.

All the tests are passing, but changing the initial guess for the degrees of freedom that is passed to bracket_and_solve causes most tests to fail, so I'm not super confident in the current implementation. I will add more spot checks for a wider range of df1/df2 to make sure things don't break. Perhaps the initial guess should change based on whether the difference p - cdf is positive or negative? For now, I'm going to leave it though.

I also had some difficulty determining whether the function passed to bracket_and_solve was rising or falling. I ultimately found that the following worked:

RealType guess = 1; // Changing this to 20, 200 causes errors! 
std::pair<RealType, RealType> ir = tools::bracket_and_solve_root(
   f, guess, RealType(2), f(guess) < 0, tol, max_iter, pol);

where

f_degrees_of_freedom_finder<RealType, Policy> f(x, v, nc, find_v1, p < q ? p : q, p < q ? false : true);

Essentially, if the difference cdf - p at the guess is negative then f is rising, and if it is positive then f is falling. I'm not entirely sure why this works though.

Could all of this rising/falling be avoided by minimizing the squared difference (cdf-p)^2? Maybe this will open up a whole new can of worms though.
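For what it's worth, the squared-difference idea can be tried directly with SciPy (a sketch only, with made-up parameters; p is computed here from a known df1 = 5 so that a zero minimum exists). Note that when two roots exist, the squared objective simply has two zero minima, so the ambiguity would not actually go away:

```python
from scipy.optimize import minimize_scalar
from scipy.stats import ncf

# Hypothetical parameters; p comes from a known df1 = 5 so that the
# squared objective (cdf - p)^2 attains zero somewhere.
x, df2, nc = 10.0, 10.0, 1.0
p = ncf.cdf(x, 5.0, df2, nc)

# Minimizing (cdf - p)^2 sidesteps the rising/falling question, but a
# local minimizer still lands on whichever zero its bracket contains.
res = minimize_scalar(lambda df1: (ncf.cdf(x, df1, df2, nc) - p) ** 2,
                      bounds=(0.5, 1000.0), method='bounded')
```

With these (well-behaved) parameters the minimizer recovers a df1 at which the cdf matches p; with two-root parameters it would silently pick one of them.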

@codecov

codecov bot commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 96.92308% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.34%. Comparing base (a10f3c8) to head (3eb7561).

Files with missing lines Patch % Lines
include/boost/math/distributions/non_central_f.hpp 96.72% 2 Missing ⚠️
@@             Coverage Diff             @@
##           develop    #1368      +/-   ##
===========================================
- Coverage    95.34%   95.34%   -0.01%     
===========================================
  Files          825      825              
  Lines        68160    68224      +64     
===========================================
+ Hits         64987    65048      +61     
- Misses        3173     3176       +3     
Files with missing lines Coverage Δ
test/test_nc_f.cpp 100.00% <100.00%> (ø)
include/boost/math/distributions/non_central_f.hpp 89.70% <96.72%> (+1.73%) ⬆️

... and 1 file with indirect coverage changes



@jzmaddock
Collaborator

I looked into the case:

   double v1 = 5;
   double v2 = 2;
   double nc = 1;
   double x = 1.5;
   double P = 0.49845842011686358665786775091245664;

And trying to find v1, plotting the output of the function we're trying to root find on yields:

[plot of cdf(v1) - p against v1, showing two zero crossings]

So we have two roots, and it's not clear to me that we can reliably determine which way the function slopes?

One thing I thought of was 1) determine if there is a root in [1, INF), if yes, then find it, otherwise 2) determine if there is a root in (0, 1], if yes then find it, otherwise throw.

But it's not clear that this will always work, or if 1 is the correct pivot point in all cases.
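The two-interval strategy above could be sketched like this in Python (using SciPy rather than the Boost internals; the upper bound is made up). When both roots fall inside one interval the endpoints share a sign and the bracket test fails, which is exactly the caveat about this not always working:

```python
from scipy.optimize import brentq
from scipy.stats import ncf

def find_df1(x, df2, nc, p, upper=1e4):
    # Objective whose root is the df1 we want.
    f = lambda df1: ncf.cdf(x, df1, df2, nc) - p
    # 1) look for a root in [1, upper), 2) fall back to (0, 1], else give up.
    for lo, hi in [(1.0, upper), (1e-3, 1.0)]:
        if f(lo) * f(hi) < 0:
            return brentq(f, lo, hi)
    raise ValueError("no bracketing interval found")
```

On a single-root case (e.g. p taken from df1 = 5 with x = 10, df2 = 10, nc = 1, hypothetical values) the first interval brackets and brentq converges; on the two-root cases discussed here both sign tests can fail.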

@JacobHass8
Contributor Author

JacobHass8 commented Feb 21, 2026

From sparse testing, it seems like there are two roots when $x < \nu_2, \nu_1$. See the plots below. Sorry it's a bit of a data dump.

This is just a rough estimate. I can't figure out where the point of inflection is.

[six plots of cdf - p against the degrees of freedom for various parameter combinations]

@jzmaddock
Collaborator

From sparse testing, it seems like there are two roots when x < ν₂, ν₁.

Close, but:

double v1 = 5;
double v2 = 3.5;
double nc = 0;
double x = 3.5000000001;

Has two roots and x > v2.

On the other hand, as the parameters get larger, the likelihood of two roots seems to diminish regardless of the value of x.

@JacobHass8
Contributor Author

JacobHass8 commented Feb 22, 2026

From sparse testing, it seems like there are two roots when x < ν₂, ν₁.

Close, but:

double v1 = 5;
double v2 = 3.5;
double nc = 0;
double x = 3.5000000001;

Has two roots and x > v2.

You're right! Even going to x=3.51 there are two roots.

@JacobHass8
Contributor Author

JacobHass8 commented Feb 22, 2026

Aha, scipy makes a note that it arbitrarily finds a root since the cdf is not necessarily monotonic. See here.

Actually, it's just breaking for cases where there are two roots.


import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import ncf
from scipy.special import ncfdtridfd, ncfdtr, ncfdtridfn

if __name__ == '__main__':
    dfn = 5
    x = 3.51
    dfd = 3.5
    nc = 0
    p = ncfdtr(dfn, dfd, nc, x)
    
    print(ncfdtridfn(p, dfd, nc, x)) # 1e+100

    df1 = np.linspace(0, 10, num=1000)
    cdf1 = ncf.cdf(x, df1, dfd, nc)

    fig, ax = plt.subplots()
    ax.plot(df1, -(p-cdf1))
    ax.hlines(0, 0, 10, ls='--', color='k')
    ax.set_xlabel('df1')
    ax.set_ylabel("cdf-p")
    plt.show()
[plot of cdf - p against df1, showing two zero crossings]

@dschmitz89
Contributor

dschmitz89 commented Feb 23, 2026

Yikes, I was afraid we would encounter something like this. These functions were never tested regularly in SciPy and are likely pretty much unused in the wild, that's why we never got any feedback about it.

One could give users the possibility to choose between the roots (defaulting to the larger one, for example).

The more important issue for me, though, is whether these inverses for the degrees of freedom are actually useful and helpful. I could open an RFC in SciPy to discuss a possible deprecation. Are there statistical applications for these parameter finders? I only know that, for example, statsmodels uses the inverse with respect to the noncentrality, which is well behaved unlike the ones we are looking at here.

CC @steppi @mdhaber: do you see a strong reason/use case to keep the parameter finders for the degrees of freedom of the F distributions around? They can have two possible solutions.

@mdhaber

mdhaber commented Feb 23, 2026

Hmm, no, I'm not familiar with use cases.

@JacobHass8
Contributor Author

The more important issue for me is though if these inverses for the degrees of freedom are actually useful and helpful. I could open a RfC in SciPy to discuss a possible deprecation. Are there statistical applications for these parameter finders? I only know that for example statsmodels uses the inverse with respect to the noncentrality which is well behaved unlike the ones we are looking at here.

After a quick search on github, it doesn't look like any routines in scipy call ncfdtridfd or ncfdtridfn (besides where they are defined). Tbh, I'm not familiar with the use cases for any of these inverses. I'm not that familiar with statsmodels though.

@steppi

steppi commented Feb 23, 2026

ncfdtridfd can be useful for sample size calculations for ANOVA: dfn would be one less than the number of treatments, p the desired power, nc calculated based on the desired effect size, and x the critical value from the central F distribution for the desired type 1 error rate. From dfd you can get the number of replicates to choose per treatment to achieve the desired power to detect the given effect size at the given type 1 error rate. I suspect that in most (all?) practical examples arising from real sample size calculations, there will only be one root.
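This sample-size reading of ncfdtridfd can be sketched as a round trip (all parameter values below are hypothetical, chosen only to sit in a well-behaved single-root regime):

```python
from scipy.special import ncfdtr, ncfdtridfd

# Hypothetical ANOVA-style inputs: 4 treatments -> dfn = 3, a noncentrality
# of 10 from the target effect size, and a critical value x near the 95th
# percentile of a central F distribution.
dfn, nc, x = 3, 10.0, 3.1

# Forward: cdf at a known denominator df, then invert it back.
dfd_true = 20.0
p = ncfdtr(dfn, dfd_true, nc, x)
dfd = ncfdtridfd(dfn, p, nc, x)  # a root of ncfdtr(dfn, dfd, nc, x) == p
```

In a balanced one-way layout dfd = k(n - 1), so the replicates per treatment would follow as n = dfd/k + 1.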

I'm struggling to think of a use for ncfdtridfn though. The effect size of interest is mixed into dfn and nc, (nc is essentially the expected value of the numerator sum of squares, which depends on both the number of treatments and the treatment effect). So ncfdtridfn isn't what one would need to do power calculations for how many groups one can hope to investigate given a fixed sample size or things like that. I'd be reluctant to get rid of it in SciPy just because I can't think of an application offhand though.

@JacobHass8
Contributor Author

This also appears to be an issue with the F-distribution when solving for dfn. See the below example and plot. Essentially, trying to invert to find dfn can yield 2 roots rather than 1.

[plot of cdf - p against dfn, showing two zero crossings]
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import f

dfn = 5
x = 3.51
dfd = 3.5
p = f.cdf(x, dfn, dfd)

dfn = np.linspace(0, 10, num=1000)
cdf1 = f.cdf(x, dfn, dfd)

fig, ax = plt.subplots()
ax.plot(dfn, -(p-cdf1))
ax.hlines(0, 0, 10, ls='--', color='k')
ax.set_xlabel('dfn')
ax.set_ylabel("cdf-p")
plt.show()

I'll try to investigate if there are any regimes where we can get 2 roots when inverting for dfd.

@JacobHass8
Contributor Author

Yup, unfortunately this also happens when inverting for dfd.
[plot of p - cdf against dfd, showing two zero crossings]

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import f

dfn = 6
x = 0.5
dfd = 4
p = f.cdf(x, dfn, dfd)

dfd = np.linspace(0, 10, num=1000)
cdf1 = f.cdf(x, dfn, dfd)

fig, ax = plt.subplots()
ax.plot(dfd, (p-cdf1))
ax.hlines(0, 0, 10, ls='--', color='k')
ax.set_xlabel('dfd')
ax.set_ylabel("p-cdf")
plt.show()

@steppi

steppi commented Feb 24, 2026

But note that x=0.5 isn't a realistic critical value that would be used in a real sample size calculation. No one is looking to control the type 1 error rate at 78.6%. For now I think it would be fine to restrict tests to cases that could arise in real applications and document the limitations. If one could detect the two root case, returning NaN in those situations seems like it could be acceptable too.
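Detecting the two-root case, as suggested, could be approximated by counting sign changes of the objective on a coarse grid before attempting any root find (a heuristic sketch, not the PR's implementation; the grid bounds are made up):

```python
import numpy as np
from scipy.stats import ncf

def count_sign_changes(x, df2, nc, p, grid=None):
    """Count sign changes of cdf(df1) - p over a log-spaced df1 grid.
    More than one change means the inverse for df1 is ambiguous, so
    returning NaN (or throwing) would be defensible."""
    if grid is None:
        grid = np.logspace(-2, 3, 400)
    vals = ncf.cdf(x, grid, df2, nc) - p
    signs = np.sign(vals)
    signs = signs[signs != 0]  # drop exact zeros that land on the grid
    return int(np.count_nonzero(np.diff(signs)))
```

On the two-root example from earlier in the thread (x = 3.51, dfd = 3.5, nc = 0, p taken from dfn = 5) this flags more than one crossing, provided the grid is fine enough to separate the roots.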
