Python Datatypes¶

All objects in Python have a datatype. If you want to know the datatype of an object, you can simply use the type() function. The main datatypes of Python are:

Integer
Float
String
Boolean
List
Dictionary

Let’s take a look at each one.

1) Integer¶

The integer is a numerical datatype. It’s a whole number (positive, negative, or zero), which means that it has no fractional or decimal part.

Examples of integers:

population
number of cities
year

population = 1000
type(population)

int

“int” is short for “integer”! 😎

2) Float¶

The float (short for “floating-point number”) represents a real number with a decimal point. It’s useful when a value is not a whole number or when fractional precision is needed.

Examples of floats:

cost of a latte
weight
distance in miles

cost_of_latte = 4.50
type(cost_of_latte) 

float

Division with / always returns a float, even when the result is mathematically a whole number:

cost_per_egg = 12/12 
type(cost_per_egg)

float
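As a small sketch of the same idea, the snippet below checks whether a float happens to hold a whole number and converts it back to an integer. The variable names other than cost_per_egg are made up for illustration.

```python
cost_per_egg = 12 / 12          # true division always yields a float

# A float knows whether it holds a whole number
is_whole = cost_per_egg.is_integer()

# Convert back to an int only when nothing would be lost
whole_cost = int(cost_per_egg) if is_whole else cost_per_egg

print(cost_per_egg, type(cost_per_egg))
print(whole_cost, type(whole_cost))
```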

Mixing Floats and Integers¶

If we want to convert a float to an integer (or vice versa), we can easily do so by wrapping the variable in int() or float(). Let’s try this out:

cost_per_apple = 3.55
n_apples = 10

print(f"Cost of apple: float {cost_per_apple} --> int {int(cost_per_apple)}")
print(f"Number of apples: int {n_apples} --> float {float(n_apples)}")

Cost of apple: float 3.55 --> int 3
Number of apples: int 10 --> float 10.0

When you “cast” (convert) a float into an integer using int(), it trims the digits after the decimal point and returns only the whole-number part. In other words, int() truncates toward zero rather than rounding: 3.55 becomes 3 and -3.55 becomes -3.
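To see how int() behaves next to other ways of getting a whole number, here is a short comparison. It uses math.floor() and round(), which are not covered above, and the variable names are illustrative.

```python
import math

positive = 3.55
negative = -3.55

truncated_pos = int(positive)        # drops the decimals: 3
truncated_neg = int(negative)        # truncates toward zero: -3
floored_neg = math.floor(negative)   # always rounds down: -4
rounded_pos = round(positive)        # rounds to the nearest integer: 4

print(truncated_pos, truncated_neg, floored_neg, rounded_pos)
```

For negative numbers, int() and math.floor() give different answers, so it is worth picking the one that matches what you mean.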

Note

The print statements above use something called f-strings. Why “f-string”? In the code above, the string inside each print statement is preceded by an “f”, which puts it in “f-string mode”. To embed an expression in the string, wrap it inside curly braces { }. F-strings are available in Python 3.6 and later, and they let you embed Python expressions inside string literals in a readable way. Before Python 3.6, you had to use %-formatting or .format() to embed expressions inside strings, which was more verbose and more error-prone.
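For comparison, here is the same message built three ways. This is only a sketch; the :.2f format specifier (two decimal places) is an extra detail not used in the print statements above.

```python
cost_per_apple = 3.55
n_apples = 10
total = n_apples * cost_per_apple

# f-string: the expression sits right where it is printed
f_version = f"{n_apples} apples cost {total:.2f}"

# str.format(): placeholders are filled in by position
format_version = "{} apples cost {:.2f}".format(n_apples, total)

# %-formatting: the oldest style
percent_version = "%d apples cost %.2f" % (n_apples, total)

print(f_version)
print(f_version == format_version == percent_version)
```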

In Python, it’s possible to mix integers and floats in an arithmetic operation. So you don’t need to worry about converting these numeric types into a common format. Let’s test it out with our variables n_apples (an integer) and cost_per_apple (a float).

total_cost = n_apples*cost_per_apple
total_cost

35.5

We can see that the output of n_apples * cost_per_apple is a float. This is because n_apples, which was originally an integer, gets converted to a float when it gets multiplied with cost_per_apple.
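A quick way to confirm this behaviour is to check the type of a few results. The n_oranges variable is made up for this sketch.

```python
n_apples = 10          # int
n_oranges = 4          # int (hypothetical extra variable)
cost_per_apple = 3.55  # float

int_result = n_apples + n_oranges          # int + int stays an int
mixed_result = n_apples * cost_per_apple   # int * float becomes a float

print(type(int_result))
print(type(mixed_result))
```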

Here are the most common arithmetic operations in Python:

Addition: gets the sum of the operands

x + y

Subtraction: gets the difference of the operands

x - y

Multiplication: gets the product of the operands

x * y

Division: produces the quotient of the operands and returns a float

x / y

Division with floor: divides the operands and rounds the result down to the nearest whole number (returns an integer when both operands are integers)

x // y

Exponent: raises the first operand to the power of the second operand

x ** y
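The operators above can be tried out with two small numbers. The values of x and y are arbitrary, and the modulo operator % (remainder) is an addition that is not in the list above.

```python
x = 7
y = 2

print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5  (always a float)
print(x // y)   # 3    (rounded down)
print(x ** y)   # 49
print(x % y)    # 1    (remainder of the division)

# Floor division rounds down, which matters for negative numbers
print(-7 // 2)  # -4

# If either operand is a float, floor division returns a float
print(7.0 // 2) # 3.0
```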

3) String¶

The string datatype is typically used to store text. We can think of a string as a “sequence of characters”, which can be alphabetic, numeric, or special characters. A string is surrounded by quotations, which can be either double quotes " " or single quotes ' '.

Examples of strings:

name of city
address
Canadian postal code

name_of_city = 'Toronto'
type(name_of_city) 

str

If a string contains an apostrophe, we can use double quotes to define the string and use a single quote character in the string.

text = "It's snowing outside"
print(text)

It's snowing outside

It’s important to note that anything surrounded by quotations is treated as a string. For example, if you wrap an integer in quotations, its datatype will be a string.

number_of_planets = "9"
type(number_of_planets)

str

Strings within Strings¶

If we want to see if a shorter string is inside a longer string, we can use the in operator.

'el' in 'Hello'

True
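One thing to keep in mind is that the in operator is case-sensitive. The sketch below shows this, along with one possible way to do a case-insensitive check; the greeting variable is made up for illustration.

```python
greeting = 'Hello'

exact_match = 'el' in greeting                    # True
wrong_case = 'EL' in greeting                     # False: case matters
ignore_case = 'EL'.lower() in greeting.lower()    # True

# .find() returns the position of the match, or -1 if there is none
position = greeting.find('el')

print(exact_match, wrong_case, ignore_case, position)
```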

Built-in Functions¶

Strings come with built-in methods that are useful when you’re analyzing data, and built-in functions such as len() also work on them. In the list below, text is a string variable.

text.upper() - converts text to all uppercase
text.lower() - converts text to all lowercase
text.capitalize() - capitalizes text (first character is made uppercase, followed by all lowercase characters)
len(text) - measures the length of a string (i.e., character count)
text.replace('t', 'a') - replaces a part of the string with another string
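Here is how these behave on a short example string. The city variable is made up for this sketch. Note that none of these calls change the original string; each one returns a new value.

```python
city = 'toronto'

print(city.upper())            # TORONTO
print(city.lower())            # toronto
print(city.capitalize())       # Toronto
print(len(city))               # 7
print(city.replace('t', 'a'))  # aoronao

# The original string is untouched
print(city)                    # toronto
```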

4) Boolean¶

A boolean is a binary datatype which can be either True or False. For those of you who are familiar with other programming languages, it’s important to note that Python’s boolean datatype must be capitalized - uppercase T for True and uppercase F for False. Booleans are often used to answer a yes/no question like “is it nighttime?” or “is the patient female?”.

Examples of booleans:

is it morning?
is the patient on meds?
does x equal y?

is_morning = False
type(is_morning)

bool

“bool” is short for boolean! 😎

Comparing Values with Boolean Expressions¶

A boolean expression evaluates a statement and results in a boolean value. For example, the operator == tests if two values are equal.

is_vegan = False
is_vegetarian = True 

is_vegan == is_vegetarian

False

You can also compare two numeric values using:

> (greater than)
< (less than)
>= (greater than or equal to)
<= (less than or equal to)

n_donuts = 10
n_muffins = 5

n_donuts >= n_muffins

True

Comparing Strings with Boolean Expressions¶

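Interestingly, you can also compare two strings. Python compares them character by character using each character’s Unicode code point. For words with the same capitalization this matches alphabetical order, so the “larger” string is the one that comes later in the alphabet. Note that all uppercase letters sort before lowercase letters.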

server = 'Anne'
host = 'Jim'

server > host

False
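Capitalization can change the result of a string comparison, because uppercase letters have smaller code points than lowercase ones. The sketch below uses ord() to show the underlying numbers and lowercases both names as one possible way to compare them alphabetically.

```python
server = 'Anne'
host = 'Jim'

print(server > host)            # False: 'A' comes before 'J'
print(ord('A'), ord('J'), ord('a'))

# A lowercase 'a' has a larger code point than an uppercase 'J'
lowercase_wins = 'anne' > 'Jim'

# Normalizing the case gives a plain alphabetical comparison
alphabetical = 'anne'.lower() > 'Jim'.lower()

print(lowercase_wins, alphabetical)
```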

5) List¶

Lists represent a collection of objects and are constructed with square brackets, separating items with commas. A list can contain a collection of one datatype:

list_of_integers = [1,2,3,4,5]

It can also contain a collection of mixed datatypes:

list_of_mixed_datatypes = ['cat', 10, 'belarus', True]

Let’s start with a simple list that captures the number of hours slept by a group of friends:

hours_slept = [10,12,5,8]

To get the length (count) of a list, you can use len().

len(hours_slept)

To get the sum of numbers in a list, you can use sum(). This will only work if all elements in the list are numeric.

sum(hours_slept)

You can get the smallest and largest values of a list using min() and max(), respectively.

min(hours_slept)

max(hours_slept)
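Putting these functions together, here is a small sketch that summarizes the list, including an average computed from sum() and len(). The variable names for the results are illustrative.

```python
hours_slept = [10, 12, 5, 8]

n_friends = len(hours_slept)          # 4
total_hours = sum(hours_slept)        # 35
least = min(hours_slept)              # 5
most = max(hours_slept)               # 12
average = total_hours / n_friends     # 8.75

print(f"{n_friends} friends slept {average} hours on average "
      f"(min {least}, max {most})")
```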

Sorting Lists¶

You can also sort the elements of a list using the .sort() method, which sorts the list in place from lowest to highest value.

hours_slept.sort()
hours_slept

[5, 8, 10, 12]

You can also flip the order of the list using .reverse(). Because the list is already sorted from lowest to highest, reversing it gives the values from highest to lowest.

hours_slept.reverse()
hours_slept

[12, 10, 8, 5]
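If you would rather keep the original list unchanged, the built-in sorted() function is an alternative worth knowing. It returns a new list, and both sorted() and .sort() accept reverse=True. A minimal sketch:

```python
hours_slept = [10, 12, 5, 8]

ascending = sorted(hours_slept)                  # new list, lowest first
descending = sorted(hours_slept, reverse=True)   # new list, highest first

print(ascending)     # [5, 8, 10, 12]
print(descending)    # [12, 10, 8, 5]
print(hours_slept)   # [10, 12, 5, 8] (original order is kept)

# .sort() changes the list itself and returns None
result = hours_slept.sort(reverse=True)
print(result, hours_slept)
```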

Lists are ordered¶

Lists are ordered, which means that the order of elements within a list is part of the list’s identity. You can have two lists with the exact same elements, but if the order of the elements is different, the lists are not the same. Let’s demonstrate this with an example.

list1 = [1,2,3,4]
list2 = [4,3,2,1]

list1 == list2

False

list1 and list2 are not equal to one another since the order of their elements is different.
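If you only care whether two lists hold the same elements regardless of order, one possible approach is to compare sorted copies of them:

```python
list1 = [1, 2, 3, 4]
list2 = [4, 3, 2, 1]

same_order = list1 == list2                     # False
same_elements = sorted(list1) == sorted(list2)  # True

print(same_order, same_elements)
```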

The Index: Accessing Elements within a List¶

You can access elements in a list by referencing their index. The index of a list starts at 0, which is probably different from what you’re used to if you come from an R or Matlab background.

Let’s say we want to go grocery shopping, and we made a list of all the items we want to buy. Each item in this list has a location (an index), starting at 0 for the first item.

A list can have negative indices too. A negative list index counts from the end of a list, so -1 refers to the last item.

We can get an individual item from a list using shopping_list[index]. Let’s test this out!

shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']

print(shopping_list[0])
print(shopping_list[1])
print(shopping_list[2])
print(shopping_list[3])
print(shopping_list[4])

apples
carrots
chocolate
bananas
onions

Now, let’s try calling each item by its negative index.

print(shopping_list[-5])
print(shopping_list[-4])
print(shopping_list[-3])
print(shopping_list[-2])
print(shopping_list[-1])

apples
carrots
chocolate
bananas
onions
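Asking for an index that does not exist raises an IndexError. The sketch below shows the valid range for our five-item list and catches the error for an index that is one past the end.

```python
shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']

# Valid positive indices run from 0 to len(list) - 1
last_index = len(shopping_list) - 1
print(last_index, shopping_list[last_index])

error_message = None
try:
    shopping_list[5]          # there is no sixth item
except IndexError as err:
    error_message = str(err)

print(error_message)
```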

Slicing a list¶

You can get a subset of a list, or “slice” it, using list indices. The expression shopping_list[m:n] returns the portion of shopping_list from index m up to, but not including, index n. Let’s see how this works.

shopping_list[1:3]

['carrots', 'chocolate']

The code above returns ‘carrots’ and ‘chocolate’, which are represented by indices 1 and 2. It didn’t return index 3 (bananas) because the second number of the slice is non-inclusive. To include index 3, we would have to update the slice to [1:4]:

shopping_list[1:4]

['carrots', 'chocolate', 'bananas']
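Slices have a few more variations that go beyond the [m:n] form above: either bound can be left out, negative indices work, and an optional third number sets the step size. A short sketch:

```python
shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']

first_two = shopping_list[:2]      # from the start up to index 2
from_third = shopping_list[3:]     # from index 3 to the end
last_two = shopping_list[-2:]      # the final two items
every_other = shopping_list[::2]   # every second item

print(first_two)     # ['apples', 'carrots']
print(from_third)    # ['bananas', 'onions']
print(last_two)      # ['bananas', 'onions']
print(every_other)   # ['apples', 'chocolate', 'onions']
```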

Finding Elements in a List¶

You can check to see if an element exists inside a list using the in operator.

'carrots' in shopping_list

True

'milk' in shopping_list

False

Iterating Over Lists¶

There are several ways to iterate over a list. The traditional approach is to use a for loop.

for item in shopping_list:
    print(item)

apples
carrots
chocolate
bananas
onions

If you also need the element’s index in your for loop, you can access it using enumerate().

for i, item in enumerate(shopping_list):
    print(f"{i+1}) {item}")

1) apples
2) carrots
3) chocolate
4) bananas
5) onions
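Instead of writing i+1, you can also tell enumerate() where to start counting with its start argument. The numbered list below is only there to collect the lines for illustration.

```python
shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']

numbered = []
for i, item in enumerate(shopping_list, start=1):
    line = f"{i}) {item}"
    numbered.append(line)
    print(line)
```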

Another way to iterate over a list is to use a list comprehension. This is a one-liner that is useful when you’re applying a simple operation to each element in your list. It returns a new list and leaves the original unchanged. For example, let’s make all elements inside shopping_list uppercase.

[item.upper() for item in shopping_list]

['APPLES', 'CARROTS', 'CHOCOLATE', 'BANANAS', 'ONIONS']
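A list comprehension can also filter elements with an if clause. As a sketch, the example below keeps only the items with more than six characters and shows the equivalent for loop; the six-character cutoff is arbitrary.

```python
shopping_list = ['apples', 'carrots', 'chocolate', 'bananas', 'onions']

# Comprehension with a condition
long_names = [item for item in shopping_list if len(item) > 6]

# The same logic written as a regular for loop
long_names_loop = []
for item in shopping_list:
    if len(item) > 6:
        long_names_loop.append(item)

print(long_names)    # ['carrots', 'chocolate', 'bananas']
```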

Lists are Mutable¶

An important feature of a list is that it’s mutable. This means that elements within a list can be added, deleted, or changed after being defined.

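To add new elements to a list, you can use .extend(), which takes a list of items to add. (For a single element, shopping_list.append('milk') does the same job.)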

shopping_list.extend(['milk'])
shopping_list

['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk']

You can also add another list like this:

more_food = ['cake', 'watermelon']
shopping_list += more_food
shopping_list

['apples',
 'carrots',
 'chocolate',
 'bananas',
 'onions',
 'milk',
 'cake',
 'watermelon']

To remove the last element of a list, you can “pop” it:

shopping_list.pop()
shopping_list

['apples', 'carrots', 'chocolate', 'bananas', 'onions', 'milk', 'cake']

If you want to remove a specific element from your list, you can use the .remove() method, which deletes the first element that matches the given value.

shopping_list.remove('carrots')
shopping_list

['apples', 'chocolate', 'bananas', 'onions', 'milk', 'cake']
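A few more details about changing lists are sketched below: .pop() hands back the element it removes, .remove() raises a ValueError if the value is missing (so it can help to check first), .insert() adds an element at a given position, and assigning to an index replaces an element. The new items are made up for illustration.

```python
shopping_list = ['apples', 'chocolate', 'bananas', 'onions', 'milk', 'cake']

last_item = shopping_list.pop()      # removes and returns 'cake'
first_item = shopping_list.pop(0)    # removes and returns 'apples'

# Guard against removing something that is not there
if 'carrots' in shopping_list:
    shopping_list.remove('carrots')

shopping_list.insert(0, 'bread')         # add at the front
shopping_list[1] = 'dark chocolate'      # replace the element at index 1

print(last_item, first_item)
print(shopping_list)
```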

6) Dictionary¶

Dictionaries are used to store data values in key:value pairs. Similar to the list, a dictionary is a collection of objects. It is also mutable, meaning that you can add, remove, or change values inside of it.

Note

If you’ve ever worked with JSON before, a dictionary is very similar to the JSON object. In fact, if you load JSON data into Python, it will be expressed as a dictionary. Similarly, you can write a Python dictionary to a JSON file.
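As a rough sketch of that round trip, the standard-library json module can turn a dictionary into a JSON string and back. The small dictionary here is just for illustration.

```python
import json

population = {'bronx': 1472654, 'brooklyn': 2736074}

# Dictionary -> JSON text
as_json = json.dumps(population)
print(as_json, type(as_json))

# JSON text -> dictionary
restored = json.loads(as_json)
print(restored == population)
```

To read from or write to a file instead of a string, the same module offers json.load() and json.dump().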

With the list, we access elements using the index. With the dictionary, we access elements using keys. Let’s take a look at an example of a dictionary which captures population information about boroughs in New York City:

population_nyc = {
    'bronx': 1472654,
    'brooklyn': 2736074,
    'manhattan': 1694251, 
    'queens': 2405464,
    'staten_island': 495747
}

type(population_nyc)

dict

“dict” is short for “dictionary”! 😎

In this dictionary, the “key” is the borough name and the “value” is the population of that borough. To get a particular value, we need to know the key of that value.

For example, let’s say we want to get the population of Manhattan. We can do so by doing this:

population_nyc['manhattan']
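Looking up a key that does not exist with square brackets raises a KeyError. The .get() method is a gentler alternative that returns None, or a default you choose, when the key is missing. The 'harlem' key below is deliberately one that is not in the dictionary.

```python
population_nyc = {
    'bronx': 1472654,
    'brooklyn': 2736074,
    'manhattan': 1694251,
    'queens': 2405464,
    'staten_island': 495747
}

manhattan = population_nyc['manhattan']          # 1694251
missing = population_nyc.get('harlem')           # None, no error
missing_default = population_nyc.get('harlem', 0)

print(manhattan, missing, missing_default)
print('harlem' in population_nyc)                # False
```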

We can get all keys of a dictionary using .keys():

population_nyc.keys()

dict_keys(['bronx', 'brooklyn', 'manhattan', 'queens', 'staten_island'])

We can get all values of a dictionary using .values():

population_nyc.values()

dict_values([1472654, 2736074, 1694251, 2405464, 495747])
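Keys and values can also be read together with .items(), which is handy in a for loop. The sketch below prints each borough, adds up the total population, and finds the most populous borough; the result variable names are illustrative.

```python
population_nyc = {
    'bronx': 1472654,
    'brooklyn': 2736074,
    'manhattan': 1694251,
    'queens': 2405464,
    'staten_island': 495747
}

# Loop over key-value pairs
for borough, population in population_nyc.items():
    print(f"{borough}: {population}")

total_population = sum(population_nyc.values())

# max() over the keys, ranked by their values
largest_borough = max(population_nyc, key=population_nyc.get)

print(total_population, largest_borough)
```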

You can add a new key-value pair to the dictionary like this:

population_nyc['long_island'] = 8063232
population_nyc

{'bronx': 1472654,
 'brooklyn': 2736074,
 'manhattan': 1694251,
 'queens': 2405464,
 'staten_island': 495747,
 'long_island': 8063232}

You can also change the value of a key like this:

population_nyc['long_island'] = 8
population_nyc

{'bronx': 1472654,
 'brooklyn': 2736074,
 'manhattan': 1694251,
 'queens': 2405464,
 'staten_island': 495747,
 'long_island': 8}

Long Island is technically not part of NYC so let’s remove it from our dictionary. We can remove the “long_island” key-value pair using .pop(key_name).

population_nyc.pop('long_island')
population_nyc

{'bronx': 1472654,
 'brooklyn': 2736074,
 'manhattan': 1694251,
 'queens': 2405464,
 'staten_island': 495747}
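Like the list version, .pop() on a dictionary returns the value it removes. Popping a key that is no longer there raises a KeyError unless you pass a default as the second argument. A minimal sketch, using a shortened dictionary:

```python
population_nyc = {
    'bronx': 1472654,
    'staten_island': 495747,
    'long_island': 8
}

removed_value = population_nyc.pop('long_island')        # returns 8
second_try = population_nyc.pop('long_island', None)     # key is gone: None

print(removed_value, second_try)
print(list(population_nyc.keys()))
```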

Practical Python for Data Science
