Releases: etiennebacher/tidypolars
tidypolars 0.11.0
`tidypolars` requires `polars` >= 0.20.0.
Breaking changes
- `arrange()` now errors with unknown variable names (like `dplyr::arrange()`).
  Previously, unknown variables were silently ignored. Using expressions (like
  `a + b`) is now accepted (#144).
- The parameter `inherit_optimization` is removed from all `sink_*()` functions.
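A minimal sketch of the new `arrange()` behavior, assuming `polars` and `tidypolars` (>= 0.11.0) are installed and `dplyr` is loaded so that the S3 methods dispatch. The data frame and the column name `unknown_col` are made up for illustration.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

pl_df <- as_polars_df(data.frame(a = c(2, 1, 3), b = c(0, 5, 1)))

# Expressions are accepted: rows are sorted by the value of a + b (2, 6, 4)
sorted <- pl_df |> arrange(a + b)

# Unknown variables now raise an error instead of being silently ignored
arrange_failed <- tryCatch(
  {
    arrange(pl_df, unknown_col)
    FALSE
  },
  error = function(e) TRUE
)
```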
New features
- The power operators `^` and `**` now work.
- New function `sink_ndjson()` to write the results of a lazy query to an NDJSON
  file without collecting it in memory.
- `inner_join()` now accepts inequality joins in the `by` argument, including
  the following helpers: `between()`, `overlaps()`, `within()` (#148).
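A sketch of an inequality join with the `between()` helper, assuming `tidypolars` >= 0.11.0 and that `join_by()` comes from `dplyr`. The tables `measures` and `bins` are hypothetical; each measure is matched to the bin whose closed interval `[lo, hi]` contains `val`.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

measures <- as_polars_df(data.frame(id = 1:3, val = c(1, 5, 10)))
bins <- as_polars_df(data.frame(lo = c(0, 4), hi = c(2, 6), label = c("low", "mid")))

# Keep each measure whose `val` falls inside [lo, hi]; val = 10 matches no bin
joined <- inner_join(measures, bins, by = join_by(between(val, lo, hi))) |>
  arrange(id) |>
  as.data.frame()
```

Because this is an inner join, the measure with `val = 10` is dropped rather than kept with missing bin columns.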
Bug fixes
- Using an external object in `case_when()`, `ifelse()` and `if_else()` now works.
- `str_sub()` doesn't error anymore when `start` is positive and `end` is negative.
- `read_*_polars()` functions used to return a standard `data.frame` by mistake.
  They now return a Polars DataFrame.
- Using `[` for subsetting in expressions now works. Thanks @ginolhac for the
  report (#141).
- `bind_cols_polars()` and `bind_rows_polars()` now correctly error, as originally
  intended, if elements are a mix of Polars DataFrames and LazyFrames.
tidypolars 0.10.1
Bug fixes
- Columns with datatype `Null` no longer cause an error. Note that converting
  those columns to R with `as.data.frame()`, `as_tibble()`, or `collect()` is
  still an issue as of `polars` 0.19.1.
tidypolars 0.10.0
`tidypolars` requires `polars` >= 0.19.1.
Breaking changes and deprecations
- `describe()` is deprecated as of tidypolars 0.10.0 and will be removed in a
  future update. Use `summary()` with the same arguments instead (#127).
- `describe_plan()` and `describe_optimized_plan()` are deprecated as of
  tidypolars 0.10.0 and will be removed in a future update. Use `explain()` with
  `optimized = TRUE/FALSE` instead (#128).
- In `sink_parquet()` and `sink_csv()`, all arguments except for `.data` and
  `path` must be named (#136).
New features
- Add support for more functions:
  - from package `base`: `substr()`.
  - from package
- Better error message when a function can come from several packages but only
  one version is translated (#130).
- `row_number()` now works without arguments (#131).
- New functions to import data as Polars DataFrames and LazyFrames (#136):
  - `read_<format>_polars()` to import data as a Polars DataFrame;
  - `scan_<format>_polars()` to import data as a Polars LazyFrame;
  - `<format>` can be "csv", "ipc", "json", "parquet".

  Those can replace functions from `polars`. For example,
  `polars::pl$read_parquet(...)` can be replaced by `read_parquet_polars(...)`.
- New functions to write Polars DataFrames to external files:
  `write_<format>_polars()` where `<format>` can be "csv", "ipc", "json",
  "ndjson", "parquet" (#136).
- New function `sink_ipc()` that is similar to `sink_parquet()` and `sink_csv()`
  but for IPC files (#136).
- `across()` now throws a better error message when the user passes an external
  list to `.fns`. This works with `dplyr` but cannot work with `tidypolars` (#135).
- Added support for argument `.add` in `group_by()`.
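A round-trip sketch with the Parquet variants of the new import and export functions, assuming `tidypolars` >= 0.10.0. The data and file path are passed positionally because the exact argument names are not stated in these notes; the temporary file is only for illustration.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

path <- tempfile(fileext = ".parquet")

# Write a Polars DataFrame to a Parquet file
write_parquet_polars(as_polars_df(mtcars), path)

# Eager import: returns a Polars DataFrame
eager <- read_parquet_polars(path)

# Lazy import: the filter is only executed when compute() runs the query
four_cyl <- scan_parquet_polars(path) |>
  filter(cyl == 4) |>
  compute()
```

The `scan_*` variant lets `polars` optimize the whole query (for example, applying the filter while reading) before any data is materialized.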
Bug fixes
- `stringr::str_sub()` now works when both `start` and `end` are negative.
- Fixed a bug in `str_sub()` when `start` was greater than 1.
- `stringr::str_starts()` and `stringr::str_ends()` now work with a regex.
- `fill()` doesn't error anymore when `...` is empty. Instead, it returns the
  input data.
- `unite()` now provides a proper error message when `col` is missing.
- `unite()` doesn't error anymore when `...` is empty. Instead, it uses all
  variables in the dataset.
- `filter()`, `mutate()` and `summarize()` now work when using a column from
  another data.frame, e.g. `my_polars_df |> filter(x %in% some_data_frame$y)`.
- `replace_na()` no longer converts the column to the datatype of the replacement,
  e.g. `data |> replace_na("a")` will error if the input data is numeric.
- `n_distinct()` now correctly applies the `na.rm` argument when several columns
  are passed as input (#137).
tidypolars 0.9.0
`tidypolars` requires `polars` >= 0.18.0.
New features
- Add support for several functions:
  - from package `base`: `%%` and `%/%`.
  - from package `dplyr`: `dense_rank()`, `row_number()`.
  - from package `lubridate`: `wday()`.
- Better handling of missing values to match R behavior. In the following
  functions, if there is at least one missing value and `na.rm = FALSE` (the
  default), then the output will be `NA`: `max()`, `mean()`, `median()`,
  `min()`, `sd()`, `sum()`, `var()` (#120).
- New argument `cluster_with_columns` in `collect()`, `compute()`, and `fetch()`.
- Add a global option `tidypolars_unknown_args` to control what happens when
  `tidypolars` doesn't know how to handle an argument in a function. The default
  is to warn and the only other accepted value is `"error"`.
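A small sketch of the new missing-value handling, assuming `tidypolars` >= 0.9.0. The one-column data frame is made up; the first summary mirrors base R's `mean(c(1, NA, 3))`, which is `NA`.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

pl_df <- as_polars_df(data.frame(x = c(1, NA, 3)))

res <- pl_df |>
  summarize(
    mean_default = mean(x),            # NA, as in base R
    mean_na_rm = mean(x, na.rm = TRUE) # (1 + 3) / 2 = 2
  ) |>
  as.data.frame()
```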
Bug fixes
- `count()` and `add_count()` no longer overwrite a variable named `n` if the
  argument `name` is unspecified.
tidypolars 0.8.0
`tidypolars` requires `polars` >= 0.17.0.
Breaking changes
- As announced in `tidypolars` 0.7.0, the behavior of `collect()` has changed.
  It now returns a standard R `data.frame` and not a Polars `DataFrame` anymore.
  Replace `collect()` by `compute()` (with the same arguments) to keep the old
  behavior.
- In `bind_rows_polars()`, if `.id` is passed, the resulting column is now of
  type character instead of integer.
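A sketch contrasting the two ways of running a lazy query after this change, assuming `tidypolars` >= 0.8.0. `lazy_query` is an illustrative name; `mtcars` has 11 four-cylinder cars.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

lazy_query <- as_polars_lf(mtcars) |>
  filter(cyl == 4)

# compute() runs the query and keeps the result as a Polars DataFrame
pl_result <- compute(lazy_query)

# collect() runs the query and converts the result to a standard R data.frame
r_result <- collect(lazy_query)
```

Use `compute()` when further `polars` operations follow, and `collect()` when the result is handed to code that expects an R data frame.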
New features
- Add support for several functions:
  - from package `base`: `all()`, `any()`, `diff()`, `ISOdatetime()`,
    `length()`, `rev()`, `unique()`.
  - from package `dplyr`: `consecutive_id()`, `min_rank()`, `na_if()`,
    `n_distinct()`, `nth()`.
  - from package `lubridate`: `make_datetime()`.
  - from package `stringr`: `str_dup()`, `str_split()`, `str_split_i()`,
    `str_trunc()`.
  - from package `tidyr`: `replace_na()` (the data.frame method was already
    translated but not the vector one that can be used in `mutate()` for
    example).
- It is now possible to use explicit namespaces (such as `dplyr::first()` instead
  of `first()`) in `mutate()`, `summarize()` and `filter()` (#114).
- In `bind_rows_polars()`, if all elements are named and `.id` is specified, the
  `.id` column will use the names of the elements (#116).
- It is now possible to rename variables in `select()` (#117).
- Add support for argument `na_matches` in all join functions (except
  `cross_join()` that doesn't need it) (#109).
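A sketch of `.id` with named inputs, assuming `tidypolars` >= 0.8.0 and that `bind_rows_polars()` keeps the names of its `...` arguments. The frames `p1` and `p2` and the names `first`/`second` are hypothetical.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

p1 <- as_polars_df(data.frame(x = 1:2))
p2 <- as_polars_df(data.frame(x = 3L))

# The `source` column holds the element names (character, not integer)
stacked <- bind_rows_polars(first = p1, second = p2, .id = "source") |>
  as.data.frame()
```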
Bug fixes
- Local variables in custom functions could not be used in tidypolars functions
  (reported in a blog post by Art Steinmetz). This is now fixed.
- `across()` now works when `.cols` contains only one variable and `.fns`
  contains only one function.
- In `across()`, the `.cols` argument now takes into account variables created
  in the same `mutate()` or `summarize()` call before `across()`.

  ```r
  as_polars_df(mtcars) |>
    head(n = 3) |>
    mutate(
      foo = 1,
      across(.cols = contains("oo"), \(x) x - 1)
    )
  ```

  ```
  shape: (3, 12)
  ┌──────┬─────┬───────┬───────┬───┬─────┬──────┬──────┬─────┐
  │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ am  ┆ gear ┆ carb ┆ foo │
  │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ ---  ┆ ---  ┆ --- │
  │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64  ┆ f64  ┆ f64 │
  ╞══════╪═════╪═══════╪═══════╪═══╪═════╪══════╪══════╪═════╡
  │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
  │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
  │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 4.0  ┆ 1.0  ┆ 0.0 │
  └──────┴─────┴───────┴───────┴───┴─────┴──────┴──────┴─────┘
  ```

  Note that the `where()` function is not supported here. For example:

  ```r
  as_polars_df(mtcars) |>
    mutate(
      foo = 1,
      across(.cols = where(is.numeric), \(x) x - 1)
    )
  ```

  will not return 0 for the variable `foo`. A warning is emitted about this
  behavior.
- Better handling of negative values in `c()` when called in `mutate()` and
  `summarize()`.
tidypolars 0.7.0
`tidypolars` requires `polars` >= 0.16.0.
Breaking changes and deprecations
- `as_polars()` is now removed. It was deprecated in 0.6.0. Use `as_polars_df()`
  or `as_polars_lf()` instead.
- `to_r()` is now removed. It was deprecated in 0.6.0. Use `as.data.frame()` or
  `as_tibble()` instead.
- For consistency with `dplyr`, the behavior of `collect()` will change in 0.8.0
  as it will perform the lazy query and convert the result to a standard
  `data.frame`. For now, `collect()` only throws a warning about this future
  change. It is recommended to use `compute()` to only perform the query and get
  a Polars DataFrame as output (#101).
New features
- Several improvements and changes for `pivot_wider()` (#95):
  - `names_from` can now take several variables;
  - add support for `id_cols` and `names_glue`;
  - default value of `names_sep` is now `_`, for consistency with `tidyr`;
  - fix documentation as `pivot_wider()` doesn't work on LazyFrame.
- Add support for `stringr::regex()`. Note that only the argument `ignore_case`
  is supported for now (#97).
- Add support for several `lubridate` functions: `dweeks()`, `ddays()`,
  `dhours()`, `dminutes()`, `dseconds()`, `dmilliseconds()`, `make_date()`
  (#107).
- When a `polars` function called internally fails, the original error message
  is now displayed.
- Add support for `group_split()` (for `DataFrame` only).
- Add support for argument `relationship` in `left_join()`, `right_join()`,
  `full_join()` and `inner_join()` (#106).
tidypolars 0.6.0
`tidypolars` requires `polars` >= 0.15.0.
Breaking changes and deprecations
- `as_polars()` is deprecated and will be removed in 0.7.0. Use `as_polars_lf()`
  or `as_polars_df()` instead.
- `as_polars()` doesn't have an argument `with_string_cache` anymore. When set
  to `TRUE`, this enabled the string cache globally, which could lead to
  undesirable side effects.
- `to_r()` is deprecated and will be removed in 0.7.0. Use `as.data.frame()` or
  `as_tibble()` instead. This used to silently return a `LazyFrame` if the
  input was a `LazyFrame`. It now automatically collects the `LazyFrame` (#88).
- `pull()` now automatically collects an input `LazyFrame` (#89).
New features
- Add support for argument `.keep` in `mutate()` (#80).
- Add support for `group_vars()` and `group_keys()` (#81).
- Experimental support for `rowwise()`. For now, this is limited to a few
  functions: `mean()`, `median()`, `min()`, `max()`, `sum()`, `all()`, `any()`.
  `rowwise()` and `group_by()` cannot be used at the same time (#40).
- All functions that return a polars Data/LazyFrame now add the class
  `"tidypolars"` to the output (#86).
- Support `which.min()`, `which.max()`, `dplyr::n()`.
- Support `.data[[` and `.env[[` in addition to `.data$` and `.env$`. Better
  error messages when the objects specified in `.data` or `.env` don't exist.
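A sketch of the experimental `rowwise()` support, assuming `tidypolars` >= 0.6.0 and that `max()` is among the row-wise translations listed above. The columns `a`, `b` and the output name `row_max` are made up for illustration.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

pl_df <- as_polars_df(data.frame(a = c(1, 4), b = c(3, 2)))

# With rowwise(), max() is computed across columns within each row
res <- pl_df |>
  rowwise() |>
  mutate(row_max = max(a, b)) |>
  as.data.frame()
```

Without `rowwise()`, `max(a, b)` would instead be a single value over both whole columns (4 here).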
Bug fixes
- `pull()` now errors when `var` is of length > 1.
tidypolars 0.5.0
`tidypolars` requires `polars` >= 0.12.0.
Breaking changes
- `across()` now errors if the argument `.cols` is not provided (either named or
  unnamed). Calling `across()` without `.cols` was deprecated in `dplyr` 1.1.0.
- It is no longer possible to use `!` in `arrange()` to sort by decreasing order,
  for compatibility with `dplyr::arrange()`. Use `-` or `desc()` instead.
New features
- `summarize()` now works on ungrouped data and returns a 1-row output.
- It is now possible to use `desc(x1)` in `arrange()` to sort in decreasing
  order of `x1` (this is equivalent to `-x1`).
- Add support for argument `names_prefix` in `pivot_longer()`.
- Add support for arguments `names_prefix` and `names_sep` in `pivot_wider()`.
- Add support for `tidyr::uncount()`.
- All `*_join()` functions now work when `by` is a specification created by
  `dplyr::join_by()`. Note that this is limited to equality joins for now.
- You can now use the "embrace" operator `{{ }}` to pass unquoted column names
  (among other things) as arguments of custom functions. See the "Programming
  with dplyr" vignette for some examples.
- `bind_cols_polars()` now works with two `LazyFrame`s, but not more.
- Add support for argument `.name_repair` in `bind_cols_polars()` (#74).
- Support for `.env$` and `.data$` pronouns in expressions of `filter()`,
  `mutate()` and `summarize()`.
- Support named vectors in the argument `pattern` of `str_replace_all()`, where
  names are patterns and values are replacements.
- Using `%in%` for factor variables doesn't require enabling the string cache
  anymore.
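A sketch of the embrace operator in a custom function, assuming `tidypolars` >= 0.5.0. `mean_by()` is a hypothetical helper, not part of the package; its arguments are passed as unquoted column names and forwarded with `{{ }}`.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: `group_var` and `value_var` are unquoted column names
mean_by <- function(data, group_var, value_var) {
  data |>
    group_by({{ group_var }}) |>
    summarize(avg = mean({{ value_var }})) |>
    ungroup()
}

# Group order is not guaranteed by polars, so sort explicitly
res <- as_polars_df(mtcars) |>
  mean_by(cyl, mpg) |>
  arrange(cyl) |>
  as.data.frame()
```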
Bug fixes
- `summarize()` no longer errors when `across(everything(), ...)` is used with
  `.by`.
- All `*_join()` functions no longer error when a named vector is provided in
  the argument `by`.
- Expressions with values only are not named "literal" anymore.
Misc
- Simplify the procedure to support new functions.
tidypolars 0.4.0
`tidypolars` requires `polars` >= 0.11.0.
Breaking changes
- It is no longer possible to pass a list in `rename()`.
New features
- The argument `with_string_cache` in `as_polars()` now enables the string cache
  globally if set to `TRUE` (#54).
- Better error message in `filter()` when comparing factors to strings while the
  string cache is disabled.
- Basic support for `strptime()`. It is possible to use
  `strptime(*, strict = FALSE)` to avoid an error when the parsing of some
  characters fails.
- New argument `.by` in `filter()`, `mutate()`, and `summarize()`, and new
  argument `by` in the `slice_*()` functions. This allows operating on groups
  without using `group_by()` and `ungroup()`. See the `dplyr` vignette for
  more information (#59).
- `rename()` now accepts unquoted names for both old and new names.
- Support fixed regexes in `str_detect()` (using `fixed()`) and in `grepl()`
  (using `fixed = TRUE`).
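A sketch of per-group filtering with `.by`, assuming `tidypolars` >= 0.4.0. It keeps, for each number of cylinders in `mtcars`, the car with the highest `mpg`, without calling `group_by()` and `ungroup()`.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# max(mpg) is evaluated within each value of cyl
best_per_cyl <- as_polars_df(mtcars) |>
  filter(mpg == max(mpg), .by = cyl) |>
  arrange(cyl) |>
  as.data.frame()
```

The result is not grouped, so no `ungroup()` is needed afterwards.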
Bug fixes
- Improve robustness of sequential expressions in `mutate()` and `summarize()`
  (i.e. expressions that should be run one after the other because they depend
  on variables created in the same call) (#58).
- `relocate()` now works correctly when `.after = last_col()`.
- All functions that work on grouped data now correctly restore the group
  structure (#62).
Misc
- Error messages coming from `mutate()`, `summarize()`, and `filter()` now give
  the right function call.
- Faster tidy selection (#61).
tidypolars 0.3.0
`tidypolars` requires `polars` >= 0.10.0.
Breaking changes
- All functions starting with `pl_` have been removed in favor of the S3
  methods. For example, `pl_distinct()` doesn't exist anymore so the only way to
  use it is to load `dplyr` and to use `distinct()` on a Polars DataFrame or
  LazyFrame. This is to avoid confusion about compatibility with `dplyr` and
  `tidyr`. See #49 for a more detailed explanation.
- `pl_bind_rows()` and `pl_bind_cols()` are renamed `bind_rows_polars()` and
  `bind_cols_polars()` respectively. This is because `bind_rows()` and
  `bind_cols()` are not S3 methods (this might change in future versions of
  `dplyr`).
New features
- New function `duplicated_rows()` that is the opposite of `distinct()` (#50).
- New argument `.id` in `bind_rows_polars()`.
- `bind_rows_polars()` can now bind Data/LazyFrames that don't have the same
  schema. Columns will be upcast to common types if necessary. Columns that are
  missing from some inputs will be filled with `NA`.
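A sketch of binding frames with different schemas, assuming `tidypolars` >= 0.3.0 and that an integer column combined with a double column is upcast to double. The frames `p1` and `p2` are made up: `p2` has no `y` column and a non-integer `x`.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

p1 <- as_polars_df(data.frame(x = 1:2, y = c("a", "b")))
p2 <- as_polars_df(data.frame(x = c(2.5, 3)))

# x is upcast to a floating-point type; y is NA for the rows coming from p2
stacked <- bind_rows_polars(p1, p2) |>
  as.data.frame()
```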
Bug fixes
- `complete()` now works correctly on grouped data.
Misc
- `relig_income` and `fish_encounters` are not reexported anymore since `tidyr`
  is now imported.