Find degrees of freedom for non_central_f distribution#1368
JacobHass8 wants to merge 3 commits into boostorg:develop from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #1368     +/-   ##
===========================================
- Coverage    95.34%   95.34%   -0.01%
===========================================
  Files          825      825
  Lines        68160    68224     +64
===========================================
+ Hits         64987    65048     +61
- Misses        3173     3176      +3
... and 1 file with indirect coverage changes
Close, but: double v1 = 5; has two roots and x > v2. On the other hand, as the parameters get larger, the likelihood of two roots seems to diminish regardless of the value of x.
You're right! Even going to x=3.51 there are two roots.
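One way to check for multiple roots before calling a solver is to scan the residual cdf(df1) - p on a grid and record every sub-interval where it changes sign. The sketch below uses only the standard library; `sign_change_brackets` and `toy_residual` are hypothetical names, and the quadratic is a stand-in for the real residual, chosen only because it has two roots.

```python
from typing import Callable, List, Tuple


def sign_change_brackets(
    g: Callable[[float], float], lo: float, hi: float, steps: int = 1000
) -> List[Tuple[float, float]]:
    """Return the sub-intervals of [lo, hi] on which g changes sign."""
    brackets = []
    width = (hi - lo) / steps
    left, g_left = lo, g(lo)
    for i in range(1, steps + 1):
        right = lo + i * width
        g_right = g(right)
        # A strict sign flip, or an exact zero at the right edge, marks a root.
        if g_left * g_right < 0 or (g_right == 0 and g_left != 0):
            brackets.append((left, right))
        left, g_left = right, g_right
    return brackets


def toy_residual(df1: float) -> float:
    """Hypothetical stand-in for cdf(df1) - p with roots at df1 = 2 and df1 = 5."""
    return (df1 - 2.0) * (df1 - 5.0)


for bracket in sign_change_brackets(toy_residual, 0.5, 10.0, steps=97):
    print(bracket)
```

To try this on the actual distribution, one could pass a callable that evaluates the CDF at fixed x and df2 and subtracts p. A grid scan can miss two roots that fall inside the same grid cell, so it is a diagnostic rather than a guarantee.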
Aha, SciPy makes a note that it arbitrarily finds a root, since the CDF is not necessarily monotonic. See here. Actually, it's just breaking for cases where there are two roots.
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import ncf
from scipy.special import ncfdtr, ncfdtridfn

if __name__ == '__main__':
    dfn = 5
    x = 3.51
    dfd = 3.5
    nc = 0

    # Forward CDF, then try to invert it for dfn.
    p = ncfdtr(dfn, dfd, nc, x)
    print(ncfdtridfn(p, dfd, nc, x))  # 1e+100

    # Plot cdf - p against df1 to show where it crosses zero.
    df1 = np.linspace(0, 10, num=1000)
    cdf1 = ncf.cdf(x, df1, dfd, nc)
    fig, ax = plt.subplots()
    ax.plot(df1, cdf1 - p)
    ax.hlines(0, 0, 10, ls='--', color='k')
    ax.set_xlabel('df1')
    ax.set_ylabel("cdf-p")
    plt.show()
Yikes, I was afraid we would encounter something like this. These functions were never tested regularly in SciPy and are likely pretty much unused in the wild, which is why we never got any feedback about them. One could give users the possibility to choose between the roots (with default to

The more important issue for me, though, is whether these inverses for the degrees of freedom are actually useful and helpful. I could open an RFC in SciPy to discuss a possible deprecation. Are there statistical applications for these parameter finders? I only know that for example

CC @steppi @mdhaber: do you see a strong reason/use case to keep parameter finders with regard to the degrees of freedom for the F distributions around? They have two possible solutions.
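As a rough illustration of letting the caller choose between the two roots, the sketch below splits the search interval at the turning point of the residual and bisects on the requested side. It assumes the residual has a single dip on the interval; `solve_with_choice`, its `which` argument, and `toy_residual` are hypothetical and not part of SciPy or Boost. The argument has no default here because the comment above does not settle one.

```python
from typing import Callable


def bisect(g: Callable[[float], float], a: float, b: float, tol: float = 1e-12) -> float:
    """Plain bisection; g(a) and g(b) must not have the same sign."""
    ga, gb = g(a), g(b)
    if ga * gb > 0:
        raise ValueError("no sign change on [a, b]")
    while b - a > tol:
        mid = 0.5 * (a + b)
        gm = g(mid)
        if ga * gm <= 0:
            b = mid            # root is in the left half
        else:
            a, ga = mid, gm    # root is in the right half
    return 0.5 * (a + b)


def argmin_unimodal(g: Callable[[float], float], a: float, b: float, tol: float = 1e-9) -> float:
    """Ternary search for the minimiser of a function with one dip on [a, b]."""
    while b - a > tol:
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if g(m1) < g(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)


def solve_with_choice(g: Callable[[float], float], lo: float, hi: float, which: str) -> float:
    """Return the 'lower' or 'upper' root of g on [lo, hi]."""
    turn = argmin_unimodal(g, lo, hi)
    if which == "lower":
        return bisect(g, lo, turn)
    if which == "upper":
        return bisect(g, turn, hi)
    raise ValueError("which must be 'lower' or 'upper'")


def toy_residual(df1: float) -> float:
    """Hypothetical stand-in for cdf(df1) - p with roots at df1 = 2 and df1 = 5."""
    return (df1 - 2.0) * (df1 - 5.0)


print(solve_with_choice(toy_residual, 0.5, 10.0, "lower"))
print(solve_with_choice(toy_residual, 0.5, 10.0, "upper"))
```

If the residual is monotonic on the interval, the turning point collapses onto an endpoint and one of the two choices raises `ValueError`, which at least makes the single-root case explicit.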
Hmm, no, I'm not familiar with use cases.
After a quick search on GitHub, it doesn't look like any routines in SciPy call

I'm struggling to think of a use for

But note that

Towards #1305. I've added functions to find parameters df1 and df2 because they are nearly identical.

All the tests are passing, but changing the initial guess for the degrees of freedom that is passed to bracket_and_solve causes most tests to fail. Thus, I'm not super confident in the current implementation. I will add more spot checks for a wider range of df1/df2 to make sure things don't break. Perhaps the initial guess should change based on whether the difference p - cdf is positive or negative? For now, I'm going to leave it, though.

I also had some difficulty determining whether the function passed to bracket_and_solve was rising or falling. I ultimately found that the following worked
where
Essentially, if the difference cdf - p (or is negative, then f is increasing, and if it is positive, then f is decreasing. I'm not entirely sure why this works, though.

Could all of this rising/falling be avoided by minimizing the squared difference (cdf-p)^2? Maybe this will open up a whole new can of worms, though.
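The interaction between the rising/falling flag and the initial guess can be explored with a small stand-alone model. The sketch below is a simplified Python analogue of a bracket-then-bisect solver, not Boost's implementation; `bracket_and_solve_sketch`, `solve_from`, and `toy_residual` are hypothetical names, and the quadratic stands in for cdf - p with two roots. In this model, setting the flag from the sign of the residual at the guess makes the search always step upward, so the answer depends on where the guess sits relative to the roots.

```python
from typing import Callable


def bisect(g: Callable[[float], float], a: float, b: float, tol: float = 1e-12) -> float:
    """Plain bisection; g(a) and g(b) must not have the same sign."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("no sign change on [a, b]")
    while b - a > tol:
        mid = 0.5 * (a + b)
        gm = g(mid)
        if ga * gm <= 0:
            b = mid
        else:
            a, ga = mid, gm
    return 0.5 * (a + b)


def bracket_and_solve_sketch(
    g: Callable[[float], float], guess: float, factor: float, rising: bool,
    max_iter: int = 100,
) -> float:
    """Grow a bracket from a positive guess by repeated scaling, then bisect."""
    g_guess = g(guess)
    if g_guess == 0:
        return guess
    # A rising function that is negative (or a falling one that is positive)
    # has its root above the guess; otherwise the root is below.
    step_up = (g_guess < 0) == rising
    a, ga = guess, g_guess
    for _ in range(max_iter):
        b = a * factor if step_up else a / factor
        gb = g(b)
        if ga * gb <= 0:
            return bisect(g, min(a, b), max(a, b))
        a, ga = b, gb
    raise RuntimeError("no sign change found")


def toy_residual(df1: float) -> float:
    """Hypothetical stand-in for cdf(df1) - p with roots at df1 = 2 and df1 = 5."""
    return (df1 - 2.0) * (df1 - 5.0)


def solve_from(guess: float) -> float:
    # Rule from the description: negative residual means "rising".
    return bracket_and_solve_sketch(toy_residual, guess, 2.0, rising=toy_residual(guess) < 0)


print(solve_from(1.5))  # guess below both roots
print(solve_from(3.0))  # guess between the roots
```

In this toy setup a guess below both roots finds the lower one, a guess between them finds the upper one, and a guess above both (for example 6.0) never brackets anything and raises `RuntimeError`. That may be related to the test failures seen when the initial guess changes, though this has not been checked against the actual code. Minimizing (cdf-p)^2 would not remove the ambiguity either: the squared residual is zero at every root of cdf - p, and it touches zero without changing sign, which bracketing methods cannot detect.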