Ensure Consistent Key Handling in as.data.table S3 Methods#6865
Ensure Consistent Key Handling in as.data.table S3 Methods#6865ben-schwen merged 21 commits intoRdatatable:masterfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6865 +/- ##
=======================================
Coverage 98.59% 98.59%
=======================================
Files 79 79
Lines 14660 14661 +1
=======================================
+ Hits 14454 14455 +1
Misses 206 206 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ben-schwen
left a comment
There was a problem hiding this comment.
Please also add test cases + NEWS as stated in our Contributing Guidelines
|
Hi @ben-schwen , |
R/as.data.table.R
Outdated
| setattr(ans, "class", .resetclass(x, "data.frame")) | ||
| setalloccol(ans) | ||
| setkeyv(ans, key) | ||
| if (!is.null(key)) setkeyv(ans, key) |
There was a problem hiding this comment.
I think this change is superflous and also prevents some safeguarding like stripping keys from converting data.frame to data.table.
R/as.data.table.R
Outdated
| } | ||
|
|
||
| as.data.table.data.table = function(x, ...) { | ||
| key = list(...)$key |
There was a problem hiding this comment.
why no extra argument for key but packing it into ellipsis?
There was a problem hiding this comment.
I had initially considered adding an explicit argument for key, but in our previous discussion, you mentioned that giving a default value to key might introduce breaking behavior. Based on that feedback, I packed it into ... to avoid any unintended issues. However, I may have misunderstood your intent—if adding an explicit key argument is preferable after all, I’ll adjust accordingly. Could you please clarify?"
There was a problem hiding this comment.
Previously you had key=NULL giving it the default value of NULL and making it hard to distinguished if key is NULL because someone provided NULL as argument or because it is the default value.
inst/tests/tests.Rraw
Outdated
| test(2309.01, { | ||
| key(as.data.table(data.frame(t = c(3:1, 4:5), y = 1:5), key="t")) | ||
| }, "t") | ||
| # tibble to data.table with key | ||
| DF <- data.frame(t = c(3:1, 4:5), y = 1:5) |
There was a problem hiding this comment.
| test(2309.01, { | |
| key(as.data.table(data.frame(t = c(3:1, 4:5), y = 1:5), key="t")) | |
| }, "t") | |
| # tibble to data.table with key | |
| DF <- data.frame(t = c(3:1, 4:5), y = 1:5) | |
| DF = data.frame(t = c(3:1, 4:5), y = 1:5) | |
| test(2309.01, key(as.data.table(DF, key="t")), "t") |
Please keep to our style of tests, e.g. setting it up and then test on a single line. Also keep in mind that we do not use <- when not absolutely necessary.
NEWS.md
Outdated
|
|
||
| 3. `fread(keepLeadingZeros=TRUE)` now correctly parses dates with components with leading zeros as dates instead of strings, [#6851](https://github.com/Rdatatable/data.table/issues/6851). Thanks @TurnaevEvgeny for the report and @ben-schwen for the fix. | ||
|
|
||
| 4. `as.data.table()` now properly handles keys: explicitly specified keys are set, while unspecified keys are cleared (returning NULL from `key()`), [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix. |
There was a problem hiding this comment.
| 4. `as.data.table()` now properly handles keys: explicitly specified keys are set, while unspecified keys are cleared (returning NULL from `key()`), [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix. | |
| 4. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix. |
inst/tests/tests.Rraw
Outdated
| test(2309.03, key(as.data.table(DT, key="t")), "t") | ||
| # data.table to data.table without key | ||
| DT = data.table(t = c(3:1, 4:5), key="t") | ||
| test(2309.04, key(as.data.table(DT)), NULL) |
There was a problem hiding this comment.
Actually we have three cases:
For data.table keyed with "b":
as.data.table(key="a")should set "a" as keyas.data.table()should leave "b" as key intactas.data.table(key=NULL)should remove the key
For a non-keyed data.table:as.data.table(key="a")should set "a" as keyas.data.table()should leave the data.table unkeyedas.data.table(key=NULL)should leave the data.table unkeyed
R/as.data.table.R
Outdated
| } | ||
|
|
||
| as.data.table.data.table = function(x, ...) { | ||
| as.data.table.data.table = function(x, keep.rownames, key=NULL, ...) { |
There was a problem hiding this comment.
| as.data.table.data.table = function(x, keep.rownames, key=NULL, ...) { | |
| as.data.table.data.table = function(x, key=NULL, ...) { |
As mentioned above we don't need a keep.rownames argument for data.table. While you could theoretically set them, we don't even print them
There was a problem hiding this comment.
I understand the reasoning behind simplifying the signature. However, removing keep.rownames from as.data.table.data.table() creates an S3 method mismatch with as.data.table(), which formally declares ...(x, keep.rownames,...). This inconsistency leads to a 'formal argument' warning in R due to the differing signatures. While keep.rownames may not be commonly used with a data.table, it remains necessary to maintain correct dispatch consistency
There was a problem hiding this comment.
Then I'm curious why we didn't have this warning previously? Since it was as.data.table.data.table = function(x, ...) {
There was a problem hiding this comment.
I haven't figured that out yet. I'll run the tests again to double-check whether the warning occurs as I described, and I'll get back to you with my findings
There was a problem hiding this comment.
I rechecked this, and the warning does occur as I described. Based on my research, Previously, using only ... masked this mismatch, and R’s checks didn’t flag it. Once the parameter key became explicit, R recognized an S3 generic/method discrepancy as well as the doc mismatch where the argument name and default values don’t align with the generic.
There was a problem hiding this comment.
It's probably due to argument order. The as.data.table.data.table = function(x, ...) method signature fits the as.data.table = function(x, keep.rownames=FALSE, ...) generic because ... doesn't contradict the keep.rownames argument. On the other hand, function(x, key, ...) would contradict the generic signature because key goes first.
There was a problem hiding this comment.
That sounds right. Moreover, adding new arguments before ... could constitute a breaking change for any downstreams relying on positional matching, c.f.
as.data.table(data.table(a=1), '2')Previously, the '2'is ignored, now it will be matched to keep.rownames=. Obviously pretty silly toy example, but it's more believable for this to happen accidentally in the case of more generic dispatch examples winding up with arguments in ... perhaps unintentionally.
The upshot is -- new arguments should go after .... That requires them to be named, which I think is a good practice anyway, see e.g.
|
One last change (in code + docs) of changing the signature |
|
Thanks @Mukulyadav2004! |
* fixing key setting order * fix key setting order * document argument keep.rownames and key * add tests and news.md * add rownames and key in .Rd * Extract keep.rownames and key from ... * eliminate %>% operator * update test 2309.2 * update tests * set key * update key handling * update test * corrected last test * correct test case * correct 2309.04 * update fix * update changes * Update R/as.data.table.R * Update R/as.data.table.R * update tests * correct signature of as.data.table.data.table --------- Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
* fixing key setting order * fix key setting order * document argument keep.rownames and key * add tests and news.md * add rownames and key in .Rd * Extract keep.rownames and key from ... * eliminate %>% operator * update test 2309.2 * update tests * set key * update key handling * update test * corrected last test * correct test case * correct 2309.04 * update fix * update changes * Update R/as.data.table.R * Update R/as.data.table.R * update tests * correct signature of as.data.table.data.table --------- Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
Closes #6859
This PR standardizes key assignment in as.data.table(), ensuring:
->key(as.data.table(x, key = )) always returns .
->Existing keys are cleared when key = NULL.
Changes:
->Added key = key in as.data.table.data.frame for nested conversions.
->Applied keys only when explicitly requested (if (!is.null(key)) setkeyv(ans, key)).
->Ensured key application to rn before renaming.
->Standardized as.data.table.data.table: key = NULL, keep.rownames = FALSE to prevent unexpected key retention.
->Ensured explicit key assignment: setkeyv(x, key) when provided, setkeyv(x, NULL) when NULL.
@ben-schwen , please review when you have time. Thanks!