NEWS.md
The purpose of this release is to make FSelectorRcpp more compatible with the FSelector package. We changed some of the default behaviors, so results might differ between versions 2.* and 3.0.
- `information_gain` and `discretize` get a new parameter, `discIntegers`, to control whether integer columns should be discretized. The default value is `TRUE`, which means integers are treated like numerics. For more information, please refer to `vignette("integer-variables", package = "FSelectorRcpp")`.
- NA handling in `information_gain`: only rows with an NA in the dependent variable are removed; NAs in independent variables are removed column-wise.
- `discretize`: the `all` argument now defaults to `TRUE`.
- `customBreaksControl` for creating custom breaks in the `discretize` function.
- `discretize` can now be evaluated with data as the first argument in the formula interface: `discretize(iris, Species ~ .)` or `discretize(Species ~ ., iris)`. The form `discretize(iris, Species ~ .)` seems to be more pipe friendly.
- `discretize_transform` allows applying the discretization cut points to a new data set.
- `extract_discretize_transformer` produces a small object containing all cut points. It can also be used to transform a new data set. `extract_discretize_transformer` can be useful in ML pipelines where the training data needs to be discarded to save memory.

Bug fixes:

- Fixed build using Rcpp 0.12.12.
- `feature_search` now returns proper structure.
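
The `discretize` changes listed above can be illustrated with a minimal sketch. It assumes the functions behave as described in the notes; the train/test split, the seed, and the break values are hypothetical and chosen only for illustration.

```r
library(FSelectorRcpp)

# Both argument orders of the formula interface are described as accepted.
disc_a <- discretize(Species ~ ., iris)
disc_b <- discretize(iris, Species ~ .)

# Hypothetical train/test split (illustration only).
set.seed(1)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]
test <- iris[-idx, ]

# Learn cut points on the training data.
disc_train <- discretize(Species ~ ., train)

# Apply the cut points learned on `train` to the unseen rows.
test_disc <- discretize_transform(disc_train, test)

# Keep only the cut points so the training data can be discarded.
transformer <- extract_discretize_transformer(disc_train)
rm(train, disc_train)
test_disc2 <- discretize_transform(transformer, test)

# Custom breaks for a single column; the break values are arbitrary.
custom <- discretize(
  Species ~ Sepal.Length, iris,
  control = customBreaksControl(breaks = c(0, 2, 5, 7.5, 10))
)
table(custom$Sepal.Length)
```

Keeping only `transformer` is the memory-saving pattern mentioned above: the small object stands in for the full discretized training set when new data arrives.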
Bug fixes:

- `RTCGA.rnaseq` package is not available.

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support.
Provided functions
- `discretize()` with additional `equalsizeControl()` and `mdlControl()` - discretize a range of numeric attributes in the dataset into nominal attributes. The Minimum Description Length (MDL) method is set as the default control. The `equalsizeControl()` method is also available.
- `information_gain()` - algorithms that find ranks of importance of discrete attributes, based on their entropy with a continuous class attribute,
- `feature_search()` - a convenience wrapper for feature selection algorithms that extract valuable attributes depending on the evaluation method (called the evaluator),
- `cut_attrs()` - select attributes by their score/rank/weights, depending on a cutoff that may be specified as a percentage of the highest ranked attributes or as a number of the highest ranked attributes,
- `to_formula()` (misc) - create a `formula` object from a vector.
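
A minimal sketch of how the functions listed above might be combined on the built-in `iris` data. The cutoff `k = 0.5` and the bin count `k = 5` are illustrative assumptions, not recommended settings.

```r
library(FSelectorRcpp)

# Rank attributes by information gain with respect to the class column.
scores <- information_gain(Species ~ ., iris)
print(scores)

# Select attributes by cutoff. k = 0.5 is assumed here to mean
# "the top half of the ranked attributes" (percentage-style cutoff).
top <- cut_attrs(attrs = scores, k = 0.5)

# Build a formula object from the selected attribute names.
f <- to_formula(top, "Species")
print(f)

# Discretize with the equal-size control instead of the default MDL control.
# The number of bins (k = 5) is an arbitrary choice for illustration.
disc <- discretize(Species ~ ., iris, control = equalsizeControl(k = 5))
str(disc)
```

The resulting formula `f` can be passed to any modelling function that accepts a formula, which is why `to_formula()` is listed as a miscellaneous helper.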