ordered is a parsnip extension to enable additional classification models for ordinal outcomes (e.g., “low”, “medium”, “high”). While there are several model/engine combinations in the parsnip package that can be used, this package adds:
- cumulative link (cumulative logit) ordinal regression via MASS::polr()
- generalized linear and generalized additive ordinal regression models of cumulative link, adjacent categories, continuation ratio, and stopping ratio families via VGAM::vglm() and VGAM::vgam() (Yee, 2015)
- regularized elastic net ordinal regression models of cumulative link, adjacent categories, continuation ratio, and stopping ratio families via ordinalNet::ordinalNet() (Wurm, Hanlon, and Rathouz, 2021)
- ordinal classification trees via rpartScore::rpartScore() (Galimberti, Soffritti, and Di Maso, 2012)
- latent variable ordinal forests via ordinalForest::ordfor() (Hornung, 2020)
More will be added.
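To make the first item concrete, here is a minimal sketch of the kind of model the polr engine wraps, fitted directly with MASS::polr() rather than through parsnip. The `housing` data set (shipped with MASS) and the object names `polr_fit`, `new_obs`, `pred_class`, and `pred_prob` are illustrative choices, not part of ordered.

```r
# Sketch only: a cumulative logit model fitted directly with MASS::polr().
library(MASS)

# `housing` ships with MASS. `Sat` is an ordered factor (Low < Medium < High)
# and `Freq` holds cell counts, used here as case weights.
polr_fit <- polr(
  Sat ~ Infl + Type + Cont,
  data = housing,
  weights = Freq,
  method = "logistic"
)

# Predict for a few rows: the most likely class and the class probabilities.
new_obs <- head(housing)
pred_class <- predict(polr_fit, newdata = new_obs, type = "class")
pred_prob <- predict(polr_fit, newdata = new_obs, type = "probs")
```

Each row of `pred_prob` gives one probability per outcome level. Through parsnip, the corresponding prediction types are "class" and "prob".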
There are some existing features in tidymodels packages that are useful for ordinal outcomes:
- The partykit engines for parsnip::decision_tree() and parsnip::rand_forest() use the ordered nature of the factors to train the model.
- The yardstick package has yardstick::kap() for weighted and unweighted Kappa statistics (the former being of more interest). Also, yardstick::classification_cost() can utilize more complex cost structures and uses the class probabilities for estimation.
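As a rough illustration of the difference between the two Kappa statistics, the sketch below computes both on a small made-up data frame in which every misclassification is off by one level. The data and the names `lvls`, `toy`, `kap_plain`, and `kap_linear` are invented for this example. It assumes a yardstick version in which kap() has the `weighting` argument.

```r
# Sketch only: unweighted vs. linearly weighted Kappa on toy data.
library(yardstick)

lvls <- c("low", "medium", "high")

# Hypothetical truth and predictions; each error misses by one level.
toy <- data.frame(
  truth = factor(
    c("low", "low", "medium", "medium", "high", "high"),
    levels = lvls, ordered = TRUE
  ),
  estimate = factor(
    c("low", "medium", "medium", "high", "high", "high"),
    levels = lvls, ordered = TRUE
  )
)

# Unweighted Kappa treats every disagreement as equally bad.
kap_plain <- kap(toy, truth = truth, estimate = estimate)

# Linear weighting penalizes a disagreement in proportion to its distance.
kap_linear <- kap(toy, truth = truth, estimate = estimate, weighting = "linear")
```

Because all errors here are near misses, the weighted statistic is more forgiving than the unweighted one. That is why the weighted form tends to be the more informative choice for ordinal outcomes.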
Installation
You can install the development version of ordered like so:
# install.packages("pak")
pak::pak("corybrunson/ordered", dependencies = FALSE)

Currently, ordered relies on engine registration in a parsnip fork and dial registration in the current development version.
Available models, engines, and prediction types
ordered provides new engines for several models, including all engines for ordinal_reg(), as summarized in the table. Currently only predictions of type = "class" and type = "prob" are supported.
| model | engine | class | prob |
|---|---|---|---|
| decision_tree | rpartScore | ✔ | ✖ |
| gen_additive_mod | vgam | ✔ | ✔ |
| ordinal_reg | polr | ✔ | ✔ |
| ordinal_reg | ordinalNet | ✔ | ✔ |
| ordinal_reg | vglm | ✔ | ✔ |
| rand_forest | ordinalForest | ✔ | ✔ |
Example
Here is a simple example using computational chemistry data to predict the permeability of a molecule:
library(dplyr)
library(ordered)
data(caco, package = "QSARdata")
caco_dat <-
  inner_join(caco_Outcome, caco_Dragon, by = "Molecule") %>%
  as_tibble() %>%
  select(
    class = Class,
    mol_weight = QikProp_mol_MW,
    volume = QikProp_volume,
    ClogP
  )
caco_train <- caco_dat[-(1:10), ]
caco_test <- caco_dat[ (1:10), ]
ord_rf_spec <-
  # you should really use many more trees and score sets
  rand_forest(mtry = 2, trees = 100) %>%
  set_mode("classification") %>%
  set_engine("ordinalForest", nsets = 100)
set.seed(382)
ord_rf_fit <- ord_rf_spec %>% fit(class ~ ., data = caco_train)
augment(ord_rf_fit, new_data = caco_test)
#> # A tibble: 10 × 8
#> .pred_class .pred_L .pred_M .pred_H class mol_weight volume ClogP
#> <ord> <dbl> <dbl> <dbl> <ord> <dbl> <dbl> <dbl>
#> 1 M 0.370 0.384 0.246 M 123. 445. 0.799
#> 2 M 0.250 0.533 0.217 L 290. 856. 0.534
#> 3 M 0.178 0.801 0.0212 M 519. 1576. 1.02
#> 4 M 0.221 0.736 0.0431 M 533. 1606. 1.58
#> 5 M 0.135 0.762 0.103 M 505. 1517. 1.71
#> 6 M 0.0698 0.913 0.0176 M 519. 1547. 2.27
#> 7 M 0.220 0.738 0.0417 M 517. 1600. 1.78
#> 8 M 0.109 0.868 0.0229 M 531. 1631. 2.34
#> 9 M 0.0307 0.952 0.0177 M 517. 1572. 2.81
#> 10 L 0.603 0.394 0.003 L 588. 1799. -1.85

Code of Conduct
Please note that the ordered project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
To voice support for or volunteer to contribute additional engines, please comment on this issue.