How and When to Use Ordinal Encoder

Leo Choi
Apr 15, 2021


This is a brief guide to help you decide whether an ordinal encoder is the right choice for encoding your categorical data. Near the bottom of this post, you'll find examples of how to use OrdinalEncoder from the category_encoders library in Python.

Statistical Data Measurement Scales

Statistics commonly distinguishes four main data types, or levels of measurement (not to be confused with programming-language data types):

  • Nominal
  • Ordinal
  • Interval
  • Ratio

In this post we will focus on nominal and ordinal data types.

Nominal data is a type of categorical data that has no numerical significance, no inherent order, and is not continuous. Some examples would be different car colors (red, black, blue, white, etc.) or the different animals at a zoo (chimpanzee, zebra, tiger, giraffe, etc.).

Ordinal data is similar to nominal data in that both are categorical, except that ordinal data has an added element of order. The exact difference or distance between the categories is unknown and/or cannot be measured. Examples include the answer choices in a satisfaction survey (very dissatisfied, dissatisfied, satisfied, very satisfied) or language proficiency levels (beginner, intermediate, expert). The difference in satisfaction between adjacent categories can't be quantified, but there is a clear progression or hierarchy in the data.
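To make the distinction concrete, here is a minimal, library-free Python sketch. The names SATISFACTION_LEVELS, RANK, and is_more_satisfied are illustrative and do not come from any library. It ranks survey answers by their position on the scale, which is the only information ordinal data carries. Sorting the car colors this way would require inventing an order that does not exist.

```python
# Ordinal levels listed from lowest to highest (hypothetical survey scale).
SATISFACTION_LEVELS = ["very dissatisfied", "dissatisfied", "satisfied", "very satisfied"]

# Each level's rank is its position in the list. Only the order of the ranks
# carries meaning; the gap between consecutive ranks does not.
RANK = {level: i for i, level in enumerate(SATISFACTION_LEVELS)}


def is_more_satisfied(a: str, b: str) -> bool:
    """Return True if answer `a` sits higher on the scale than answer `b`."""
    return RANK[a] > RANK[b]


responses = ["satisfied", "very dissatisfied", "very satisfied", "dissatisfied"]
ordered = sorted(responses, key=RANK.__getitem__)
print(ordered)  # ['very dissatisfied', 'dissatisfied', 'satisfied', 'very satisfied']
```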

When NOT to use Ordinal Encoder

If the data you are working with and trying to interpret already has numerical significance, you will most likely not need an ordinal encoder, or any encoder for that matter. If you can already take a meaningful mean, median, or mode of your data, you're in luck and won't need to encode it.

An ordinal encoder also should not be used if your data has no meaningful order. Going back to the car color example, there is no logical way to order these colors from smallest to largest or from worst to best. When working with nominal data, scikit-learn's OneHotEncoder or LabelEncoder should do the trick, depending on what you need: OneHotEncoder is meant for the input features, while LabelEncoder is meant for the target variable.
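For nominal features, one-hot encoding avoids implying any order by giving each category its own 0/1 column. The hand-rolled one_hot function below is a hypothetical illustration of the idea only. It is not scikit-learn's OneHotEncoder, which also handles unknown categories, sparse output, and separate fit/transform steps.

```python
def one_hot(values):
    """Hand-rolled one-hot encoding: one 0/1 column per distinct category."""
    # Column order carries no meaning; sorting just makes the output reproducible.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "black", "blue", "red"]
cats, rows = one_hot(colors)
print(cats)  # ['black', 'blue', 'red']
print(rows)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Because every row has exactly one 1, no color is numerically "larger" than another, which is what nominal data requires.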

When to Use Ordinal Encoder

As the name suggests, an ordinal encoder should be used with ordinal data. Whenever your categories are non-numerical but have a meaningful rank or order, an ordinal encoder is the way to go.

How to Use Ordinal Encoder

The encoder I will focus on here is OrdinalEncoder from the category_encoders library, which is designed to be compatible with scikit-learn (it exposes the familiar fit, transform, and fit_transform methods).

This is our sample dataframe:

[Image: the sample dataframe, df]
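The original post shows the sample dataframe only as an image. To follow along, a hypothetical stand-in can be built with pandas. The rows and the customer_id column are invented for illustration. Only the satisfaction_rating column name comes from the code in this post.

```python
import pandas as pd

# Hypothetical stand-in for the sample dataframe: the rows and the
# customer_id column are invented for illustration.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105],
    "satisfaction_rating": ["Satisfied", "Very Dissatisfied", "Neutral",
                            "Very Satisfied", "Dissatisfied"],
})
print(df)
```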

Here we create a maplist to pass to the mapping parameter; it tells the encoder which integer each category should receive. We then instantiate an OrdinalEncoder object, passing the maplist as its mapping argument:

```python
from category_encoders import OrdinalEncoder

# One dict per column: 'col' names the column, 'mapping' gives each category its rank
maplist = [{'col': 'satisfaction_rating',
            'mapping': {'Very Dissatisfied': 0, 'Dissatisfied': 1, 'Neutral': 2,
                        'Satisfied': 3, 'Very Satisfied': 4}}]
oe = OrdinalEncoder(mapping=maplist)
```

WARNING: If you do not pass mapping=, the encoder has no way of knowing the intended order of your categories. Instead, it numbers them in the order they first appear in the data, so your encoded values will most likely not follow your scale.
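To see why an unmapped encoder scrambles a rating scale, the standard-library sketch below numbers categories by first appearance. As far as I know, this matches how category_encoders' OrdinalEncoder orders a plain string column when no mapping is given. This is an illustration, not the library's implementation. The exact integers it assigns (for example, its starting value) may differ, but the ordering problem is the same.

```python
# Hypothetical responses, in the order they appear in the data.
responses = ["Satisfied", "Very Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]

# Number categories by first appearance (dicts preserve insertion order).
first_seen = {cat: i for i, cat in enumerate(dict.fromkeys(responses))}
print(first_seen)
# {'Satisfied': 0, 'Very Dissatisfied': 1, 'Neutral': 2, 'Very Satisfied': 3}

# The codes no longer respect the scale: 'Satisfied' gets a lower code
# than 'Very Dissatisfied'.
print(first_seen["Satisfied"] < first_seen["Very Dissatisfied"])  # True
```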

Now call fit_transform on the OrdinalEncoder object to encode the satisfaction_rating column:

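```python
# fit_transform returns a new, encoded DataFrame; df itself is left unchanged
df_encoded = oe.fit_transform(df)
```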
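To double-check the encoder's output, or to get the same ordinal codes without the extra dependency, pandas' Series.map can apply the same dictionary directly. This is a minimal sketch on hypothetical sample data. Unlike the encoder, it has no built-in handling for unseen categories: any value missing from the dictionary simply becomes NaN, so the sketch checks for that explicitly.

```python
import pandas as pd

# Hypothetical sample data using the same categories as the maplist.
df = pd.DataFrame({"satisfaction_rating": ["Satisfied", "Very Dissatisfied", "Neutral",
                                           "Very Satisfied", "Dissatisfied"]})

satisfaction_order = {"Very Dissatisfied": 0, "Dissatisfied": 1, "Neutral": 2,
                      "Satisfied": 3, "Very Satisfied": 4}

# Series.map looks each value up in the dict; df itself is not modified.
encoded = df.copy()
encoded["satisfaction_rating"] = df["satisfaction_rating"].map(satisfaction_order)

# A category missing from the dict (e.g. a typo) would come back as NaN.
if encoded["satisfaction_rating"].isna().any():
    raise ValueError("satisfaction_rating has values missing from the mapping")

print(encoded["satisfaction_rating"].tolist())  # [3, 0, 2, 4, 1]
```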

And voilà! You have successfully used OrdinalEncoder. Most of the problems and errors I ran into came from passing in the mapping list, so be mindful of that step. Please message me if you see any errors or have any questions or comments. Happy encoding!
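Because the mapping list is where most errors creep in, a small pre-flight check before calling fit_transform can save some debugging. The check_maplist helper below is hypothetical (it is not part of category_encoders). It assumes the list-of-dicts format with 'col' and 'mapping' keys used above. It only checks structure and coverage, so it cannot tell you whether your order is correct.

```python
import pandas as pd


def check_maplist(df: pd.DataFrame, maplist) -> list:
    """Hypothetical pre-flight check for a category_encoders-style maplist.

    Returns a list of human-readable problems (empty if none were found).
    Checks structure and coverage only; it cannot tell if the order is right.
    """
    if not isinstance(maplist, list):
        return [f"mapping must be a list of dicts, not {type(maplist).__name__}"]
    problems = []
    for entry in maplist:
        if not isinstance(entry, dict) or not {"col", "mapping"} <= set(entry):
            problems.append(f"entry needs 'col' and 'mapping' keys: {entry!r}")
            continue
        col, mapping = entry["col"], entry["mapping"]
        if col not in df.columns:
            problems.append(f"column {col!r} not found in dataframe")
            continue
        # The mapping may be a dict or a pd.Series indexed by category.
        known = set(mapping.index) if isinstance(mapping, pd.Series) else set(mapping)
        unmapped = set(df[col].dropna().unique()) - known
        if unmapped:
            problems.append(f"column {col!r} has values missing from mapping: {sorted(unmapped)}")
    return problems


# Usage: a capitalization typo ('Very satisfied') is caught before encoding.
df = pd.DataFrame({"satisfaction_rating": ["Satisfied", "Very satisfied", "Neutral"]})
maplist = [{"col": "satisfaction_rating",
            "mapping": {"Very Dissatisfied": 0, "Dissatisfied": 1, "Neutral": 2,
                        "Satisfied": 3, "Very Satisfied": 4}}]
problems = check_maplist(df, maplist)
print(problems)
```

Passing the dict on its own instead of wrapping it in a list, or misspelling a column name, is reported the same way.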

