Document essential complexity of setDT #6756

Merged
MichaelChirico merged 1 commit into master from setdt-doc-complexity
Jan 24, 2025

Conversation

@MichaelChirico
Member

Closes #6741

@MichaelChirico MichaelChirico requested a review from tdhock January 23, 2025 17:32

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.62%. Comparing base (f72e46b) to head (00e3024).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6756   +/-   ##
=======================================
  Coverage   98.62%   98.62%           
=======================================
  Files          79       79           
  Lines       14641    14641           
=======================================
  Hits        14440    14440           
  Misses        201      201           


- When working on large \code{lists} or \code{data.frames}, it might be both time and memory consuming to convert them to a \code{data.table} using \code{as.data.table(.)}, as this will make a complete copy of the input object before to convert it to a \code{data.table}. The \code{setDT} function takes care of this issue by allowing to convert \code{lists} - both named and unnamed lists and \code{data.frames} \emph{by reference} instead. That is, the input object is modified in place, no copy is being made.
+ When working on large \code{list}s or \code{data.frame}s, it might be both time- and memory-consuming to convert them to a \code{data.table} using \code{as.data.table(.)}, which will make a complete copy of the input object before converting it to a \code{data.table}. \code{setDT} takes care of this issue by converting any \code{list} (named or unnamed, data.frame or not) \emph{by reference} instead. That is, the input object is modified in place with no copy.

+ This should come with low overhead, but note that \code{setDT} does check that the input is valid by looking for inconsistent input lengths and inadmissible column types (e.g. matrix).

great thanks, that explains why it is linear time in the number of columns.
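
For illustration, the documented behaviour can be sketched in a few lines of R. This is a minimal example, assuming the data.table package is installed; the object names (`x`, `df`, `bad`) and the tiny inputs are hypothetical and chosen only to show the in-place conversion and the length check that runs over every column.

```r
library(data.table)

# A plain list with equal-length columns (illustrative data).
x <- list(a = 1:3, b = c("u", "v", "w"))
setDT(x)                       # converts x by reference; x itself becomes a data.table
stopifnot(is.data.table(x))

# as.data.table() instead returns a new object and leaves its input unchanged.
df <- data.frame(a = 1:3)
dt <- as.data.table(df)
stopifnot(!is.data.table(df), is.data.table(dt))

# setDT validates its input: columns of inconsistent length are rejected.
bad <- list(a = 1:3, b = 1:2)
res <- tryCatch(setDT(bad), error = function(e) "error")
stopifnot(identical(res, "error"))
```

Because the validity check has to visit each column, its cost grows with the number of columns even though no column data is copied.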

@MichaelChirico MichaelChirico merged commit 4899b39 into master Jan 24, 2025
10 checks passed
@MichaelChirico MichaelChirico deleted the setdt-doc-complexity branch January 24, 2025 17:29

@jangorecki jangorecki left a comment

There is a PR with a light setDT where the checks can be omitted (default: false).

Development

Successfully merging this pull request may close these issues.

expected/documented time complexity of setDT?

3 participants