
<html> <head> </head>

Heatmaps from dataframes

A pandas dataframe can be visualized by means of a so-called heatmap. A heatmap draws one tile per data point and colors each tile according to its value, using a chosen color palette.

Many plotting libraries offer the possibility of creating heatmaps. A common one is called seaborn, which is built on top of matplotlib to enhance certain plot types.

Before the dataframe can be plotted as a heatmap, it needs to be in the right format: one row per $y$-value, one column per $x$-value and the values to be plotted in the cells. If we, for example, have a dataframe with $x$-values in one column, $y$-values in another and the values to be plotted in a third column, we can pivot the data to a new dataframe:

In [41]:
import pandas as pd

# Create a dummy dataframe to pivot
df = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    'values': [34, 74, 1, 9, -36, -24, 47, -27, 47]})

df
Out[41]:
x y values
0 1 a 34
1 1 b 74
2 1 c 1
3 2 a 9
4 2 b -36
5 2 c -24
6 3 a 47
7 3 b -27
8 3 c 47
In [44]:
# Pivot the dataframe
df_pivot = df.pivot(index='y', columns='x', values='values')
df_pivot
Out[44]:
x 1 2 3
y
a 34 9 47
b 74 -36 -27
c 1 -24 47

A heatmap can be created from a dataframe like this:

In [43]:
import seaborn as sns

# Plot the pivoted dataframe as a heatmap
sns.heatmap(df_pivot, annot=True)
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x28d74e097b8>

Note: Make sure you have installed the seaborn library before using this. Otherwise a ModuleNotFoundError will be raised. Recall that libraries are installed via the Anaconda Prompt by typing pip install <library_name>.

Heatmap parameters

seaborn.heatmap has many parameters for tweaking the appearance of the heatmap. Several of them are often needed to create a good-looking plot.

Special attention should be paid to choosing a colormap that fits the dataset well. A poorly chosen colormap can be very misleading to the reader of the plot, while a well-chosen one can convey the overall message within a few seconds.

See https://seaborn.pydata.org/generated/seaborn.heatmap.html
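The dummy data from the pivot example contains both positive and negative values, which is a case where the colormap choice matters. The following is a minimal sketch, not a recommendation for every dataset: it assumes a diverging colormap ('RdBu_r', picked here only for illustration) and sets vmin and vmax symmetrically around zero, so that the neutral middle color corresponds to a value of zero.

```python
import pandas as pd
import seaborn as sns

# Same dummy data as in the pivot example
df = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    'values': [34, 74, 1, 9, -36, -24, 47, -27, 47]})
df_pivot = df.pivot(index='y', columns='x', values='values')

# Largest absolute value in the data, used as a symmetric color limit
limit = df_pivot.abs().max().max()

# 'RdBu_r' is an illustrative choice: blue for negative, red for positive values
ax = sns.heatmap(df_pivot, cmap='RdBu_r', vmin=-limit, vmax=limit, annot=True)
```

With symmetric limits, tiles of equal magnitude but opposite sign get equally strong colors, so the sign and size of each value can be read directly from the plot.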

Some of the parameters for seaborn.heatmap:

  • annot=True for putting the value in each tile.
  • fmt to set the number of decimals for the annotated tiles. Set equal to ".0f" for 0 decimals.
  • annot_kws={'size': 10} for setting fontsize of annotated tiles to 10.
  • square=True for ensuring that tiles are square.
  • cmap=name_of_colormap for controlling the colormap (see available colormaps here: https://matplotlib.org/users/colormaps.html, be sure to choose one that fits the content of the data).
  • vmin and vmax to define the min and max values of the colormap (and colorbar).
  • cbar_kws={"orientation": "horizontal"} for orientation of the colorbar. Set to horizontal here; the default is vertical.
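The parameters listed above can be combined in a single call. The sketch below reuses the dummy data from the pivot example; the colormap name 'viridis' and the limits of -80 and 80 are arbitrary values chosen for illustration, not recommendations.

```python
import pandas as pd
import seaborn as sns

# Same dummy data as in the pivot example
df = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    'values': [34, 74, 1, 9, -36, -24, 47, -27, 47]})
df_pivot = df.pivot(index='y', columns='x', values='values')

ax = sns.heatmap(
    df_pivot,
    annot=True,                               # write the value in each tile
    fmt='.0f',                                # no decimals in the annotations
    annot_kws={'size': 10},                   # font size of the annotations
    square=True,                              # force square tiles
    cmap='viridis',                           # illustrative colormap choice
    vmin=-80, vmax=80,                        # illustrative colormap limits
    cbar_kws={'orientation': 'horizontal'},   # colorbar below the plot
)
```

sns.heatmap returns the matplotlib Axes it drew on, so titles and axis labels can be set afterwards with the usual methods, for example ax.set_title(...).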

Merge operations on dataframes

Merge operations are among the most powerful data manipulation techniques in pandas. We are only going to look at a simple example here, which performs an operation similar to Excel's VLOOKUP.

# Merge df1 and df2 on <column_to_merge_on>, keeping every row of df1 and dropping rows found only in df2 (similar to Excel VLOOKUP)
df_merged = df1.merge(df2, on='<column_to_merge_on>', how='left')

See this page of the pandas documentation for more on merging, joining and concatenating dataframes: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
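To make the left merge concrete, here is a small sketch with two made-up dataframes. The column names and values are hypothetical and only serve to show what happens to keys that exist in one dataframe but not in the other.

```python
import pandas as pd

# Hypothetical data: structural members and the material each one uses
df1 = pd.DataFrame({
    'member': ['B1', 'B2', 'B3', 'B4'],
    'material': ['S235', 'S355', 'S235', 'S460']})

# Hypothetical lookup table: yield strength per material
# ('S460' is deliberately missing, 'S275' is not used by any member)
df2 = pd.DataFrame({
    'material': ['S235', 'S275', 'S355'],
    'fy[MPa]': [235, 275, 355]})

# Left merge: every row of df1 is kept, matching columns from df2 are added
df_merged = df1.merge(df2, on='material', how='left')
df_merged
```

The result has one row per row in df1, in the same order. Member B4 gets NaN in the 'fy[MPa]' column because 'S460' is not in the lookup table, and 'S275' does not appear at all because no row in df1 refers to it. Unlike VLOOKUP, a merge duplicates rows of df1 if the lookup table contains the same key more than once; passing validate='many_to_one' to merge raises an error in that case.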

Exercise 1

The file piles.csv in the session folder contains some piles and their profile type. Use steel_profiles.csv to insert the cross-sectional parameters in each pile row. Use the merge operation described above.

Exercise 2

The file Crack_width_Seg7_y_direction.csv has results of a crack width calculation in a base slab from the ESS project.

The sectional forces from IBDAS were exported into Excel and each $(N, M)$ pair was run through our standard spreadsheet for calculating the crack width in the Quasi-Permanent load combination. The results were exported to this csv-file in order to create a presentable plot for the documentation report.

While doing the exercises below, recall that df.head() will print the first five rows of df.

Exercise 2.1

Load the file Crack_width_Seg7_y_direction.csv into a dataframe. Filter the dataframe so it only contains rows where the criterion column has max My.

Exercise 2.2

Pivot the dataframe and save it to a new dataframe. Use column 'y[m]' as index, column 'x[m]' as columns and column 'w_k[mm]' as values.

Exercise 2.3

Create a heatmap of the pivoted dataframe. Use parameters of your choice from the ones described above or in the seaborn documentation.

Be sure to choose a colormap; the default one is awful. The maximum allowable crack width for this concrete slab is $w_{k.max} = 0.30$ mm.

</html>