Most datasets that data scientists or analysts work with are made up of variables that describe a set of observations.
In general, variables come in two types: categorical and quantitative.
Let's dive a little deeper into the different variable types to understand how to identify them in a dataset.
We can think of a quantitative variable as any information about an observation that can only be described with numbers.

Discrete Variables

Discrete quantitative variables are numeric values that represent counts and can only take whole-number values.
When working with discrete variables in a dataset, you might see something similar to these values:
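
|          | team_wins | num_of_players | num_goals_season | num_fouls_season |
|----------|-----------|----------------|------------------|------------------|
| Flaskers | 4         | 21             | 8                | 2                |
| Pythons  | 5         | 15             | 13               | 4                |
| Coders   | 10        | 17             | 9                | 5                |
| Julias   | 3         | 18             | 7                | 3                |
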
When inspecting a dataset for discrete variables, ask yourself whether the variable would still make sense if you added 0.5 to any of its values. If it would not (a team cannot have 4.5 wins), the variable is discrete.

Continuous Variables

Continuous quantitative variables are numeric measurements that can be expressed with decimal precision.

|          | weight | age    | height | temperature |
|----------|--------|--------|--------|-------------|
| Michael  | 61.28  | 21.5   | 76.03  | 36.21       |
| McKensey | 83.1   | 27.13  | 85.201 | 37.3        |
| Joel     | 69.7   | 34.901 | 77.34  | 36.918      |
| Barry    | 56.310 | 31.5   | 72.13  | 37.594      |

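The difference between the two quantitative types can be sketched in code. The snippet below is a rough heuristic rather than a definitive test: the helper name `looks_discrete` is hypothetical, and the sample values are copied from the `num_goals_season` and `weight` columns in the tables above.

```python
# Hypothetical helper: a rough heuristic, not a definitive test.
def looks_discrete(values):
    """Return True when every value is a whole number (a count-like column)."""
    return all(float(v).is_integer() for v in values)


# Values copied from the example tables above.
num_goals_season = [8, 13, 9, 7]
weight = [61.28, 83.1, 69.7, 56.310]

print(looks_discrete(num_goals_season))  # True
print(looks_discrete(weight))            # False
```

A continuous measurement that happens to be recorded in whole numbers would also pass this check, so the result is best treated as a hint to confirm against what the column actually measures.
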
Let’s take a look at the
Sometimes the line between discrete and continuous variables can be a little blurry.

Categorical Variables

Categorical variables differ from quantitative variables because they focus on the different ways data can be grouped rather than counted or measured.

Ordinal Variables

Have you ever worked with a column in a dataset whose values were groups that are inherently greater or lesser than one another? Such a column holds an ordinal variable.
Other examples of ordinal variables include placement in a sports competition, age ranges, and customer ratings for a product or service.
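
|           | place | company_seniority | age_group | customer_rating |
|-----------|-------|-------------------|-----------|-----------------|
| A. Jacobs | 1     | junior            | 20-25     | very_satisfied  |
| McKensey  | 3     | senior            | 35-40     | satisfied       |
| Joel      | 7     | executive         | 30-35     | satisfied       |
| Barry     | 4     | mid               | 50-55     | very_satisfied  |
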
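Because ordinal categories carry an order, it can help to make that order explicit before sorting or comparing. The sketch below uses the `company_seniority` values from the table above; the ordering junior < mid < senior < executive is an assumption made for illustration, and the names `SENIORITY_ORDER` and `seniority_rank` are hypothetical.

```python
# Assumed ordering of the seniority levels, lowest to highest.
SENIORITY_ORDER = ["junior", "mid", "senior", "executive"]

# Map each category to its rank so values can be compared and sorted.
seniority_rank = {level: rank for rank, level in enumerate(SENIORITY_ORDER)}

# Values copied from the company_seniority column above.
company_seniority = {
    "A. Jacobs": "junior",
    "McKensey": "senior",
    "Joel": "executive",
    "Barry": "mid",
}

# Sort people from least to most senior using the rank, not the label text.
by_seniority = sorted(
    company_seniority,
    key=lambda name: seniority_rank[company_seniority[name]],
)
print(by_seniority)  # ['A. Jacobs', 'Barry', 'McKensey', 'Joel']
```

The ranks only encode order, so arithmetic on them, such as averaging, is usually best avoided.
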
When working with ordinal variables, it is important to keep in mind that the differences between categories are not necessarily uniform.

Nominal Variables

Nominal categorical variables are variables with two or more categories that have no relative order.

|        | pet_type | color  | favorite_food  | adoption_city |
|--------|----------|--------|----------------|---------------|
| Fluffy | cat      | orange | fish_pate      | Sacramento    |
| Bruno  | dog      | white  | peanut_butter  | Mt. Shasta    |
| Alfie  | bird     | blue   | sunflower_seed | San Francisco |
| Bitsy  | turtle   | green  | apple          | Los Angeles   |

The number of possible values for a nominal variable can be quite large.
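One simple way to see how many categories a nominal variable has is to count its distinct values. The following is a minimal sketch using Python's standard `collections.Counter` on the `adoption_city` values from the table above; the variable names are illustrative only.

```python
from collections import Counter

# Values copied from the adoption_city column above.
adoption_city = ["Sacramento", "Mt. Shasta", "San Francisco", "Los Angeles"]

# Count how often each category appears; the keys are the distinct categories.
city_counts = Counter(adoption_city)

print(len(city_counts))           # 4 distinct categories
print(city_counts["Sacramento"])  # 1
```
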
Sometimes it can be difficult to identify a nominal variable if attributes of that variable are ordinal or quantitative.

| adoption_city | city_rel_temp |
|---------------|---------------|
| Sacramento    | cool          |
| Mt. Shasta    | coldest       |
| San Francisco | warm          |
| Los Angeles   | warmest       |

Now we can see that
A binary, or dichotomous, variable is a special type of nominal variable with only two categories.

|             | status | is_awake | is_stable |
|-------------|--------|----------|-----------|
| patient_101 | 1      | True     | No        |
| patient_304 | 0      | True     | Yes       |
| patient_107 | 1      | False    | No        |
| patient_514 | 1      | False    | Yes       |

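Binary columns can be encoded in several ways (1/0, True/False, Yes/No), but they all share exactly two distinct categories. As a rough sketch, the hypothetical helper `is_binary` below checks that property on the patient columns from the table above.

```python
# Hypothetical helper: a column is treated as binary when it has exactly
# two distinct categories, however they are encoded.
def is_binary(values):
    return len(set(values)) == 2


# Values copied from the patient table above.
patients = {
    "status": [1, 0, 1, 1],
    "is_awake": [True, True, False, False],
    "is_stable": ["No", "Yes", "No", "Yes"],
}

binary_columns = [name for name, values in patients.items() if is_binary(values)]
print(binary_columns)  # ['status', 'is_awake', 'is_stable']
```

In a small sample, a variable with more categories may show only two of them by chance, so this check works best as a starting point for a closer look.
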
Now that you’ve learned about the variable types, test your knowledge by answering the following questions: