Document essential complexity of setDT by MichaelChirico · Pull Request #6756 · Rdatatable/data.table

MichaelChirico · 2025-01-23T17:32:28Z

codecov · 2025-01-23T17:39:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.62%. Comparing base (f72e46b) to head (00e3024).
Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6756   +/-   ##
=======================================
  Coverage   98.62%   98.62%           
=======================================
  Files          79       79           
  Lines       14641    14641           
=======================================
  Hits        14440    14440           
  Misses        201      201


tdhock · 2025-01-24T08:46:59Z

man/setDT.Rd

-  When working on large \code{lists} or \code{data.frames}, it might be both time and memory consuming to convert them to a \code{data.table} using \code{as.data.table(.)}, as this will make a complete copy of the input object before to convert it to a \code{data.table}. The \code{setDT} function takes care of this issue by allowing to convert \code{lists} - both named and unnamed lists and \code{data.frames} \emph{by reference} instead. That is, the input object is modified in place, no copy is being made.
+  When working on large \code{list}s or \code{data.frame}s, it might be both time- and memory-consuming to convert them to a \code{data.table} using \code{as.data.table(.)}, which will make a complete copy of the input object before converting it to a \code{data.table}. \code{setDT} takes care of this issue by converting any \code{list} (named or unnamed, data.frame or not) \emph{by reference} instead. That is, the input object is modified in place with no copy.
+
+  This should come with low overhead, but note that \code{setDT} does check that the input is valid by looking for inconsistent input lengths and inadmissible column types (e.g. matrix).


Great, thanks. That explains why it is linear time in the number of columns.
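
For anyone reading along, here is a minimal sketch of the behaviour the new paragraph describes, assuming data.table is installed. The object and column names (`df`, `bad`, `id`, `value`) are made up for illustration and do not come from the PR; the exact wording of the error message is not shown because it may differ between versions.

```r
library(data.table)

# A plain data.frame; the column names are illustrative only.
df <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# setDT() converts by reference: no assignment is needed, and `df`
# itself is a data.table afterwards.
setDT(df)
is_dt <- is.data.table(df)

# A list whose elements have inconsistent lengths fails the validity
# check mentioned in the new documentation, so setDT() raises an error.
bad <- list(a = 1:3, b = 1:2)
length_error <- tryCatch(
  {
    setDT(bad)
    NULL
  },
  error = function(e) conditionMessage(e)
)
```

The validity check has to look at every element of the input, which is consistent with the cost growing with the number of columns.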

jangorecki

There is a PR with a lightweight setDT in which these checks can be omitted (the option defaults to false).

document essential complexity of setDT

00e3024

MichaelChirico requested a review from tdhock January 23, 2025 17:32

tdhock approved these changes Jan 24, 2025

MichaelChirico merged commit 4899b39 into master Jan 24, 2025
10 checks passed

MichaelChirico deleted the setdt-doc-complexity branch January 24, 2025 17:29

jangorecki reviewed Jan 25, 2025
