Skip to content

Ensure Consistent Key Handling in as.data.table S3 Methods#6865

Merged
ben-schwen merged 21 commits intoRdatatable:masterfrom
Mukulyadav2004:fix_key_inconsistency
Mar 17, 2025
Merged

Ensure Consistent Key Handling in as.data.table S3 Methods#6865
ben-schwen merged 21 commits intoRdatatable:masterfrom
Mukulyadav2004:fix_key_inconsistency

Conversation

@Mukulyadav2004
Copy link
Contributor

Closes #6859
This PR standardizes key assignment in as.data.table(), ensuring:

->key(as.data.table(x, key = )) always returns .
->Existing keys are cleared when key = NULL.

Changes:
->Added key = key in as.data.table.data.frame for nested conversions.
->Applied keys only when explicitly requested (if (!is.null(key)) setkeyv(ans, key)).
->Ensured key application to rn before renaming.
->Standardized as.data.table.data.table: key = NULL, keep.rownames = FALSE to prevent unexpected key retention.
->Ensured explicit key assignment: setkeyv(x, key) when provided, setkeyv(x, NULL) when NULL.

@ben-schwen , please review when you have time. Thanks!

@codecov
Copy link

codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.59%. Comparing base (3afc750) to head (c31d1e7).
Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6865   +/-   ##
=======================================
  Coverage   98.59%   98.59%           
=======================================
  Files          79       79           
  Lines       14660    14661    +1     
=======================================
+ Hits        14454    14455    +1     
  Misses        206      206           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@ben-schwen ben-schwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add test cases + NEWS as stated in our Contributing Guidelines

@Mukulyadav2004
Copy link
Contributor Author

Hi @ben-schwen ,
I sincerely apologize for the excess commits on this pull request, especially around the test cases. I’ve cleaned up any unnecessary changes and would greatly appreciate your review when you have time. Thank you for your patience and understanding.

setattr(ans, "class", .resetclass(x, "data.frame"))
setalloccol(ans)
setkeyv(ans, key)
if (!is.null(key)) setkeyv(ans, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is superflous and also prevents some safeguarding like stripping keys from converting data.frame to data.table.

}

as.data.table.data.table = function(x, ...) {
key = list(...)$key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no extra argument for key but packing it into ellipsis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially considered adding an explicit argument for key, but in our previous discussion, you mentioned that giving a default value to key might introduce breaking behavior. Based on that feedback, I packed it into ... to avoid any unintended issues. However, I may have misunderstood your intent—if adding an explicit key argument is preferable after all, I’ll adjust accordingly. Could you please clarify?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously you had key=NULL giving it the default value of NULL and making it hard to distinguished if key is NULL because someone provided NULL as argument or because it is the default value.

Comment on lines +21090 to +21094
test(2309.01, {
key(as.data.table(data.frame(t = c(3:1, 4:5), y = 1:5), key="t"))
}, "t")
# tibble to data.table with key
DF <- data.frame(t = c(3:1, 4:5), y = 1:5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test(2309.01, {
key(as.data.table(data.frame(t = c(3:1, 4:5), y = 1:5), key="t"))
}, "t")
# tibble to data.table with key
DF <- data.frame(t = c(3:1, 4:5), y = 1:5)
DF = data.frame(t = c(3:1, 4:5), y = 1:5)
test(2309.01, key(as.data.table(DF, key="t")), "t")

Please keep to our style of tests, e.g. setting it up and then test on a single line. Also keep in mind that we do not use <- when not absolutely necessary.

NEWS.md Outdated

3. `fread(keepLeadingZeros=TRUE)` now correctly parses dates with components with leading zeros as dates instead of strings, [#6851](https://github.com/Rdatatable/data.table/issues/6851). Thanks @TurnaevEvgeny for the report and @ben-schwen for the fix.

4. `as.data.table()` now properly handles keys: explicitly specified keys are set, while unspecified keys are cleared (returning NULL from `key()`), [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. `as.data.table()` now properly handles keys: explicitly specified keys are set, while unspecified keys are cleared (returning NULL from `key()`), [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
4. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.

test(2309.03, key(as.data.table(DT, key="t")), "t")
# data.table to data.table without key
DT = data.table(t = c(3:1, 4:5), key="t")
test(2309.04, key(as.data.table(DT)), NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have three cases:
For data.table keyed with "b":

  • as.data.table(key="a") should set "a" as key
  • as.data.table() should leave "b" as key intact
  • as.data.table(key=NULL) should remove the key
    For a non-keyed data.table:
  • as.data.table(key="a") should set "a" as key
  • as.data.table() should leave the data.table unkeyed
  • as.data.table(key=NULL) should leave the data.table unkeyed

}

as.data.table.data.table = function(x, ...) {
as.data.table.data.table = function(x, keep.rownames, key=NULL, ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
as.data.table.data.table = function(x, keep.rownames, key=NULL, ...) {
as.data.table.data.table = function(x, key=NULL, ...) {

As mentioned above we don't need a keep.rownames argument for data.table. While you could theoretically set them, we don't even print them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning behind simplifying the signature. However, removing keep.rownames from as.data.table.data.table() creates an S3 method mismatch with as.data.table(), which formally declares ...(x, keep.rownames,...). This inconsistency leads to a 'formal argument' warning in R due to the differing signatures. While keep.rownames may not be commonly used with a data.table, it remains necessary to maintain correct dispatch consistency

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'm curious why we didn't have this warning previously? Since it was as.data.table.data.table = function(x, ...) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't figured that out yet. I'll run the tests again to double-check whether the warning occurs as I described, and I'll get back to you with my findings

Copy link
Contributor Author

@Mukulyadav2004 Mukulyadav2004 Mar 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rechecked this, and the warning does occur as I described. Based on my research, Previously, using only ... masked this mismatch, and R’s checks didn’t flag it. Once the parameter key became explicit, R recognized an S3 generic/method discrepancy as well as the doc mismatch where the argument name and default values don’t align with the generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably due to argument order. The as.data.table.data.table = function(x, ...) method signature fits the as.data.table = function(x, keep.rownames=FALSE, ...) generic because ... doesn't contradict the keep.rownames argument. On the other hand, function(x, key, ...) would contradict the generic signature because key goes first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds right. Moreover, adding new arguments before ... could constitute a breaking change for any downstreams relying on positional matching, c.f.

as.data.table(data.table(a=1), '2')

Previously, the '2'is ignored, now it will be matched to keep.rownames=. Obviously pretty silly toy example, but it's more believable for this to happen accidentally in the case of more generic dispatch examples winding up with arguments in ... perhaps unintentionally.

The upshot is -- new arguments should go after .... That requires them to be named, which I think is a good practice anyway, see e.g.

https://design.tidyverse.org/dots-after-required.html

@ben-schwen
Copy link
Member

ben-schwen commented Mar 17, 2025

One last change (in code + docs) of changing the signature as.data.table.data.table to as.data.table.data.table = function(x, ..., key=NULL) as suggested by @MichaelChirico and we are good to go! :)

@ben-schwen ben-schwen merged commit 0909048 into Rdatatable:master Mar 17, 2025
10 of 11 checks passed
@ben-schwen
Copy link
Member

Thanks @Mukulyadav2004!

@Mukulyadav2004 Mukulyadav2004 deleted the fix_key_inconsistency branch March 17, 2025 17:53
aitap pushed a commit that referenced this pull request Apr 24, 2025
* fixing key setting order

* fix key setting order

* document argument keep.rownames and key

* add tests and news.md

* add rownames and key in .Rd

* Extract keep.rownames and key from ...

* eliminate %>% operator

* update test 2309.2

* update tests

* set key

* update key handling

* update test

* corrected last test

* correct test case

* correct 2309.04

* update fix

* update changes

* Update R/as.data.table.R

* Update R/as.data.table.R

* update tests

* correct signature of as.data.table.data.table

---------

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
@aitap aitap mentioned this pull request Apr 24, 2025
TysonStanley pushed a commit that referenced this pull request Apr 24, 2025
* fixing key setting order

* fix key setting order

* document argument keep.rownames and key

* add tests and news.md

* add rownames and key in .Rd

* Extract keep.rownames and key from ...

* eliminate %>% operator

* update test 2309.2

* update tests

* set key

* update key handling

* update test

* corrected last test

* correct test case

* correct 2309.04

* update fix

* update changes

* Update R/as.data.table.R

* Update R/as.data.table.R

* update tests

* correct signature of as.data.table.data.table

---------

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

key(as.data.table(x, key = <key>)) is inconsistent, varies based on x

4 participants