Weighted Sampling, Tidyr Verbs, Robust Scaler, RAPIDS, and more

sparklyr 1.4 is now available on CRAN! To install sparklyr 1.4 from CRAN, run

 install.packages("sparklyr")

In this post, we will showcase the following much-anticipated new functionalities from the sparklyr 1.4 release: parallelized weighted sampling, tidyr verbs, the robust scaler, and RAPIDS.

Parallelized Weighted Sampling

Readers familiar with the dplyr::sample_n() and dplyr::sample_frac() functions may have noticed that both support weighted-sampling use cases on R dataframes, e.g.,

 dplyr::sample_n(mtcars, size = 3, weight = mpg, replace = FALSE)

               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Fiat 128      32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4

and

 dplyr::sample_frac(mtcars, size = 0.1, weight = mpg, replace = FALSE)

             mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Merc 450SE  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Fiat X1-9   27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

will select some random subset of mtcars using the mpg attribute as the sampling weight for each row. If replace = FALSE is set, then a row is removed from the sampling population once it gets selected, whereas when setting replace = TRUE, each row will always stay in the sampling population and can be selected multiple times.
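To make the difference between the two replace settings concrete, here is a minimal sketch that uses base R's sample() as a stand-in for the dplyr verbs. The variable names (weights, idx_no_repl, idx_repl) are illustrative only, and the rows drawn depend on the seed.

```r
# Sketch: weighted sampling with and without replacement, base R only.
# mtcars ships with R (32 rows); mpg is used as the per-row sampling weight.
set.seed(42)
weights <- mtcars$mpg

# replace = FALSE: a row leaves the population once it is selected,
# so no row index can appear twice.
idx_no_repl <- sample(nrow(mtcars), size = 3, replace = FALSE, prob = weights)
sample_no_repl <- mtcars[idx_no_repl, ]

# replace = TRUE: every row stays in the population, so the same row can be
# selected multiple times. Drawing 50 rows from 32 forces at least one repeat.
idx_repl <- sample(nrow(mtcars), size = 50, replace = TRUE, prob = weights)

print(sample_no_repl)
print(anyDuplicated(idx_no_repl) == 0) # no repeats without replacement
print(anyDuplicated(idx_repl) > 0)     # repeats are unavoidable here
```

Note that asking for more rows than the population holds is only possible with replace = TRUE; with replace = FALSE, sample() raises an error in that case.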

Now the exact same use cases are supported for Spark dataframes in sparklyr 1.4! For example: