sparklyr 1.4 is now available on CRAN! To install sparklyr 1.4 from CRAN, run install.packages("sparklyr").
In this blog post, we will showcase the following much-anticipated new functionality from the sparklyr 1.4 release:
Parallelized Weighted Sampling
Readers familiar with the dplyr::sample_n() and dplyr::sample_frac() functions may have noticed that both of them support weighted-sampling use cases on R dataframes, e.g.,
dplyr::sample_n(mtcars, size = 3, weight = mpg, replace = FALSE)
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Fiat 128      32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
and
dplyr::sample_frac(mtcars, size = 0.1, weight = mpg, replace = FALSE)
             mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Merc 450SE  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Fiat X1-9   27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
will select a random subset of mtcars using the mpg attribute as the sampling weight for each row. If replace = FALSE is set, then a row is removed from the sampling population once it is selected, whereas with replace = TRUE, each row always stays in the sampling population and can be selected multiple times.
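To make the difference between the two settings concrete, here is a minimal sketch on a local R dataframe. The seed, the sample sizes, and the helper column `model` are illustrative choices, not part of the original examples; `model` is added only so that repeated draws of the same car are easy to detect.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the draws are reproducible

# Keep the model name as a regular column so repeated draws are easy to spot
# (`model` is an illustrative helper column, not part of mtcars)
cars <- cbind(model = rownames(mtcars), mtcars)

# Without replacement: a selected row leaves the sampling population
no_repl <- sample_n(cars, size = 5, weight = mpg, replace = FALSE)
anyDuplicated(no_repl$model)        # 0: no model appears twice

# With replacement: 50 draws from only 32 rows, so repeats are unavoidable
with_repl <- sample_n(cars, size = 50, weight = mpg, replace = TRUE)
nrow(with_repl)                     # 50
anyDuplicated(with_repl$model) > 0  # TRUE
```

Note that requesting more rows than the dataframe contains only works with replace = TRUE; with replace = FALSE, dplyr raises an error because the population would be exhausted.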
Now the exact same use cases are supported for Spark dataframes in sparklyr 1.4! For example: