NEWS.md
The purpose of this release is to make FSelectorRcpp more compatible with the FSelector package. Some default behaviors have changed, so results may differ between versions 2.* and 3.0.
- `information_gain` and `discretize` get a new parameter, `discIntegers`, that controls whether integer columns are discretized. The default is `TRUE`, which means integer columns are treated like numeric ones. For more information, see `vignette("integer-variables", package = "FSelectorRcpp")`.
- `information_gain`: only rows with `NA` in the dependent variable are removed; `NA`s in independent variables are removed column-wise.
- The default value of the `discretize` argument `all` is changed to `TRUE`.
- `customBreaksControl` for creating custom breaks in the `discretize` function.
- `discretize` can now be called with the data as the first argument in the formula interface:
    - `discretize(iris, Species ~ .)` or `discretize(Species ~ ., iris)`.
    - `discretize(iris, Species ~ .)` seems to be more pipe friendly.
- `discretize_transform` applies the discretization cut points to a new data set.
- `extract_discretize_transformer` produces a small object containing all cut points. It can also be used to transform a new data set.
    - `extract_discretize_transformer` can be useful in ML pipelines where the training data needs to be discarded to save memory.

Bug fixes:

- Fixed build using Rcpp 0.12.12.
- `feature_search` now returns proper structure.
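
As a rough sketch of the discretization changes listed for this release, the snippet below assumes FSelectorRcpp is installed and uses the built-in `iris` data. The train/new split and the variable names (`train`, `new_data`, `transformer`) are illustrative assumptions, not part of the package.

```r
library(FSelectorRcpp)

# Split iris into a "training" part and a "new data" part (illustrative only).
set.seed(123)
idx <- sort(sample.int(nrow(iris), 100))
train <- iris[idx, ]
new_data <- iris[-idx, ]

# The formula interface accepts the data either first or second.
disc_a <- discretize(train, Species ~ .)
disc_b <- discretize(Species ~ ., train)

# Apply the cut points learned on `train` to rows not seen before.
new_disc <- discretize_transform(disc_a, new_data)

# Keep only the cut points, so that `train` itself could be discarded.
transformer <- extract_discretize_transformer(disc_a)
new_disc2 <- discretize_transform(transformer, new_data)
```

The data-first form is the one that fits a pipe, since the data frame flows in as the first argument.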
Bug fixes:
- RTCGA.rnaseq package is not available.

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support.
Provided functions
- `discretize()` with additional `equalsizeControl()` and `mdlControl()` - discretizes a range of numeric attributes in the dataset into nominal attributes. The Minimum Description Length (MDL) method is the default control; `equalsizeControl()` is also available.
- `information_gain()` - algorithms that rank the importance of discrete attributes, based on their entropy with a continuous class attribute.
- `feature_search()` - a convenience wrapper for feature selection algorithms that extract valuable attributes depending on the evaluation method (called the evaluator).
- `cut_attrs()` - selects attributes by their score/rank/weights, with a cutoff specified either as a percentage of the highest ranked attributes or as a number of the highest ranked attributes.
- `to_formula()` (misc) - creates a formula object from a vector.
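
A minimal sketch of how these functions might be combined, assuming FSelectorRcpp is installed and using the built-in `iris` data. The variable names and the choices `k = 0.5` and `k = 3` are illustrative assumptions only.

```r
library(FSelectorRcpp)

# Rank the iris predictors by information gain with respect to Species.
scores <- information_gain(Species ~ ., iris)

# Keep the top half of the ranked attributes (a k below 1 is read as a fraction).
top_attrs <- cut_attrs(attrs = scores, k = 0.5)

# Build a formula from the selected attribute names and the class name.
f <- to_formula(top_attrs, "Species")

# Discretize with equal-size binning instead of the default MDL control.
binned <- discretize(Species ~ ., iris, control = equalsizeControl(k = 3))
```

The resulting formula `f` can then be passed to any modelling function that accepts a formula and a data frame.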