Pandas

This section is currently unfinished and will be updated further!

Pandas is a Python library that makes it easier to work with data.

It helps programmers store, clean, sort and analyse data, especially data that looks like a table (rows and columns), similar to a spreadsheet in Excel.

Example 1 - Reading a CSV File

Suppose there is a file named products_complete.csv that contains the following data:

ProductID,ProductName,Category,Price,Stock
1,Laptop,Electronics,750,10
2,Mouse,Accessories,25,50
3,Keyboard,Accessories,40,25
4,Monitor,Electronics,150,15
5,Printer,Electronics,120,20

We can use Pandas to load this file and display its contents using the following code:

import pandas as pd

data = pd.read_csv("products_complete.csv")  # load the CSV file into a DataFrame

print(data)
Output:
   ProductID ProductName     Category  Price  Stock
0          1      Laptop  Electronics    750     10
1          2       Mouse  Accessories     25     50
2          3    Keyboard  Accessories     40     25
3          4     Monitor  Electronics    150     15
4          5     Printer  Electronics    120     20

Explanation

The first line imports the Pandas library and gives it a shorter name, pd. This makes it easier to use later in the program without typing the full word “pandas” each time.

The third line tells Pandas to read a file called products_complete.csv. The information from this file is stored in a special data structure called a DataFrame, which works like a table with rows and columns. The variable named data now holds all of this information.

The final line shows the contents of the DataFrame on the screen. When this code is run, it will show the data from the CSV file in a table format, similar to how it would appear in a spreadsheet such as Excel.
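To try this without creating the file first, the same data can be read from a string instead. This is only a minimal sketch: the variable csv_text and the use of io.StringIO are assumptions made for illustration and are not part of the example above.

```python
import io

import pandas as pd

# Same contents as products_complete.csv, held in a string
# (an illustrative stand-in for the real file)
csv_text = """ProductID,ProductName,Category,Price,Stock
1,Laptop,Electronics,750,10
2,Mouse,Accessories,25,50
3,Keyboard,Accessories,40,25
4,Monitor,Electronics,150,15
5,Printer,Electronics,120,20
"""

# read_csv accepts a file path or a file-like object such as StringIO
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # column names taken from the header row
```

Checking the shape and the column names is a quick way to confirm that the file was read as expected before doing any calculations.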

Example 2 - Calculating the Average Price

import pandas as pd

data = pd.read_csv("products_complete.csv")  # load the CSV file into a DataFrame

averagePrice = data["Price"].mean()  # mean of the Price column

print(averagePrice)
Output:

217.0

Explanation

The fifth line selects the Price column from the DataFrame and calls its .mean() method to calculate the average (mean) price of all the products. The result is stored in a variable called averagePrice.

The final line displays the average price on the screen.
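To see what .mean() is doing, the same result can be worked out by hand: add up the prices and divide by how many there are. The sketch below builds a small DataFrame directly in code (an assumption made so that it runs without the CSV file), and manual_average is a name chosen only for this illustration.

```python
import pandas as pd

# Small DataFrame with the same prices as the CSV file
data = pd.DataFrame({"Price": [750, 25, 40, 150, 120]})

# The mean is the total of the values divided by the number of values
manual_average = data["Price"].sum() / len(data["Price"])

print(manual_average)        # 1085 / 5
print(data["Price"].mean())  # same value, calculated by Pandas
```

Both lines print 217.0.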

Example 3 - Finding the Highest Price

import pandas as pd

data = pd.read_csv("products_complete.csv")  # load the CSV file into a DataFrame

maxPrice = data["Price"].max()  # largest value in the Price column

print(maxPrice)
Output:

750

Explanation

The fifth line looks at the Price column and finds the maximum value - in other words, the highest price among all the products in the dataset. The result is stored in a variable called maxPrice.

The final line displays that highest price on the screen.

Example 4 - Finding the Lowest Price

import pandas as pd

data = pd.read_csv("products_complete.csv")  # load the CSV file into a DataFrame

minPrice = data["Price"].min()  # smallest value in the Price column

print(minPrice)
Output:

25

Explanation

The fifth line looks at the Price column and finds the minimum value - in other words, the lowest price among all the products in the dataset. The result is stored in a variable called minPrice.

The final line displays that lowest price on the screen.
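The mean, maximum and minimum from Examples 2 to 4 can also be calculated in a single step. The sketch below uses the .agg() method with a list of function names. It is one possible approach rather than the only one, and the DataFrame is built in code (an assumption) so that it runs without the CSV file.

```python
import pandas as pd

# Small DataFrame with the same prices as the CSV file
data = pd.DataFrame({"Price": [750, 25, 40, 150, 120]})

# agg() applies each named function to the column and
# returns the results together, labelled by function name
summary = data["Price"].agg(["min", "max", "mean"])

print(summary)
print(summary["max"])  # individual results can be looked up by name
```

This is convenient when several summary values are needed at once, because the column only has to be selected one time.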

Example 5 - Finding the Most and Least Expensive Products

import pandas as pd

data = pd.read_csv("products_complete.csv")  # load the CSV file into a DataFrame

mostExpensive = data.loc[data["Price"].idxmax(), "ProductName"]  # name in the highest-priced row
leastExpensive = data.loc[data["Price"].idxmin(), "ProductName"]  # name in the lowest-priced row

print(f"Most Expensive Product: {mostExpensive}")
print(f"Least Expensive Product: {leastExpensive}")
Output:
Most Expensive Product: Laptop
Least Expensive Product: Mouse

Explanation

The fifth line finds the name of the product with the highest price. The idxmax() method returns the index label of the row where the price is highest. Here that label is the same as the row number, because the DataFrame uses the default numbering that starts at 0. Then .loc[row, "ProductName"] looks up the product name in that same row.

The sixth line does the same thing but finds the name of the product with the lowest price instead. The idxmin() method returns the index label of the row where the price is lowest.

The final two lines use f-strings to display both results in a clear format.
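The value returned by idxmax() and idxmin() is an index label, which only looks like a row number when the DataFrame uses the default index. The sketch below illustrates the difference by making the product names the index. The DataFrame is built in code and the name by_name is chosen only for this illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "ProductName": ["Laptop", "Mouse", "Keyboard", "Monitor", "Printer"],
    "Price": [750, 25, 40, 150, 120],
})

# Default index: the label of the highest-priced row is 0,
# which is also its row position
print(data["Price"].idxmax())

# With product names as the index, the label is a name instead
by_name = data.set_index("ProductName")
print(by_name["Price"].idxmax())
print(by_name["Price"].idxmin())
```

This prints 0, then Laptop, then Mouse. When the product names are the index, no separate .loc lookup is needed to get the name.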