# Generating an excel report with python

## Pandas pivot tables to an excel sheet.

For many data analysts and business people excel is a powerful tool for reporting. But very often excel reports become cumbersome and difficult to extend, especially when it comes to gathering data from several sources. In this post we will generate an excel report using python (pandas and openpyxl). We will also look briefly at a way to create an excel template and then generate an excel report with several sheets based on this template.

We will be looking at some startups investments data.

```
1 import pandas as pd
2 import numpy as np
3
4 df = pd.read_csv('investments.csv', sep=';')
```

To make the report simple, we will be using only a subset of the columns in the csv file, and we will look at the data from 2007 for funding rounds of type: angel, seed, venture, private equity, and undisclosed.

```
1 df1 = df[['funded_year', 'funded_quarter', 'funding_round_type', 'raised_amount_usd']]
2 df1 = df1[df.funded_year >= 2007]
3 df1 = df1[df1.funding_round_type.isin(['angel', 'private_equity', 'seed', 'venture', 'undisclosed'])]
```

The data look like this:

funded_year | funded_quarter | funding_round_type | raised_amount_usd |
---|---|---|---|

2014 | 2014-Q3 | venture | 5 956 174 |

2014 | 2014-Q3 | private_equity | 81 216 295 |

2014 | 2014-Q3 | seed | 1 300 000 |

2014 | 2014-Q3 | seed | NaN |

2014 | 2014-Q2 | seed | 500 000 |

2014 | 2014-Q3 | venture | 2 500 000 |

2014 | 2014-Q3 | venture | 7 000 000 |

… | … | … | … |

We notice that the data contains some undefined values, so we have to do some cleaning

```
1 df1 = df1.fillna(0)
2 def to_int(x):
3 try:
4 x = int(x.replace(' ', ''))
5 except (AttributeError, ValueError):
6 return 0
7
8 df1.raised_amount_usd = df1.raised_amount_usd.map(to_int)
9 df1.raised_amount_usd = df1.raised_amount_usd.astype(int)
```

Now let’s create our pivot tables

```
1 table1 = pd.pivot_table(df1, index=['funding_round_type'],
2 columns=['funded_year', 'funded_quarter'],
3 values=['raised_amount_usd'],
4 aggfunc=[lambda x: len(x)],fill_value=0)
5
6 table2 = pd.pivot_table(df1, index=['funding_round_type'],
7 columns=['funded_year', 'funded_quarter'],
8 values=['raised_amount_usd'],
9 aggfunc=[np.sum],fill_value=0)
```

This will generate a 2 pivot tables similar to this:

<count> | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

raised_amount_usd | |||||||||||||||

funded_year | 2007 | 2008 | ... | 2013 | 2014 | ||||||||||

funded_quarter | 2007-Q1 | 2007-Q2 | 2007-Q3 | 2007-Q4 | 2008-Q1 | 2008-Q2 | ... | 2013-Q2 | 2013-Q3 | 2013-Q4 | 2014-Q1 | 2014-Q2 | 2014-Q3 | ||

funding_round_type | |||||||||||||||

angel | 97 | 67 | 50 | 71 | 122 | 69 | ... | 161 | 226 | 141 | 211 | 123 | 77 | ||

private_equity | 16 | 14 | 22 | 10 | 13 | 13 | ... | 244 | 187 | 132 | 101 | 131 | 68 | ||

seed | 110 | 88 | 92 | 111 | 219 | 143 | ... | 1870 | 2368 | 2097 | 1922 | 1686 | 1763 | ||

undisclosed | 395 | 268 | 255 | 243 | 362 | 354 | ... | 710 | 454 | 647 | 830 | 304 | 98 | ||

venture | 1294 | 1207 | 1297 | 1144 | 1440 | 1434 | ... | 1856 | 2065 | 2147 | 2263 | 3019 | 2905 |

5 rows × 32 columns

Now let’s generate an excel report with 2 sheets containing the results of the pivot tables

```
1 writer = pd.ExcelWriter('report.xlsx')
2 table1.to_excel(writer, 'Sheet1')
3 table2.to_excel(writer, 'Sheet2')
4 writer.save()
```

The result output

To go a bit further, let’s imagine that we want to generate the report from a template. For this we will need the python module openpyxl, because it’s the only python module that allows to modify an excel file.

The first step is to create a template for our report.

You can notice that we named an excel range `values`

to be able to update the values in that range easily from the python side. Here’s an utility function to update an excel sheet from python given a named range or cell range.

```
1 import openpyxl
2 from openpyxl import load_workbook
3
4 def update_range(worksheet, data, cell_range=None, named_range=None):
5 """
6 Updates an excel worksheet with the given data.
7 :param worksheet: an excel worksheet
8 :param data: data used to update the worksheet cell range (list, tuple, np.ndarray, pd.Dataframe)
9 :param cell_range: a string representing the cell range, e.g. 'AB12:XX23'
10 :param named_range: a string representing an excel named range
11 """
12
13 def clean_data(data):
14 if not isinstance(data, (list, tuple, np.ndarray, pd.DataFrame)):
15 raise TypeError('Invalid data, data should be an array type iterable.')
16
17 if not len(data):
18 raise ValueError('You need to provide data to update the cells')
19
20 if isinstance(data, pd.DataFrame):
21 data = data.values
22
23 elif isinstance(data, (list, tuple)):
24 data = np.array(data)
25
26 return np.hstack(data)
27
28 def clean_cells(worksheet, cell_range, named_range):
29 # check that we can access a cell range
30 if not any((cell_range, named_range) or all((cell_range, named_range))):
31 raise ValueError('`cell_range` or `named_range` should be provided.')
32
33 # get the cell range
34 if cell_range:
35 try:
36 cells = np.hstack(worksheet[cell_range])
37 except (CellCoordinatesException, AttributeError):
38 raise ValueError('The cell range provided is invalid, cell range must be in the form XX--[:YY--]')
39
40 else:
41 try:
42 cells = worksheet.get_named_range(named_range)
43 except (NamedRangeException, TypeError):
44 raise ValueError('The current worksheet {} does not contain any named range {}.'.format(
45 worksheet.title,
46 named_range))
47
48 # checking that we have cells to update, and data
49 if not len(cells):
50 raise ValueError('You need to provide cells to update.')
51
52 return cells
53
54 cells = clean_cells(worksheet, cell_range, named_range)
55 data = clean_data(data)
56
57 # check that the data has the same dimension as cells
58 if len(cells) != data.size:
59 raise ValueError('Cells({}) should have the same dimension as the data({}).'.format(len(cells), data.size))
60
61 for i, cell in enumerate(cells):
62 cell.value = data[i]
```

Now we are ready to generate our first report from the template

```
1 import openpyxl
2 from openpyxl import load_workbook
3
4 # load the excel template
5 workbook = load_workbook('template.xlsx')
6
7 # number of funding rounds by year
8 table1 = pd.pivot_table(df1, index=['funding_round_type'],
9 columns=['funded_year'],
10 values=['raised_amount_usd'],
11 aggfunc=[lambda x: len(x)], fill_value=0)
12
13 worksheet = workbook.active
14 worksheet.title = 'count_by_year'
15 update_range(workbook.active, table1.values, named_range='values')
16 # create a new excel report
17 workbook.save('report1.xlsx')
```

This will keep our template unmodified and generate a report with one excel sheet `count_by_year`

containing the values from the pivot table.

The last part of this part is to show a work around when we need to generate a report with multiple sheets based on the same template. Things get a bit complicated, because no python module allow to copy/clone a sheet and update it, I managed to find a way to do this by reloading the workbook each time we copy the template sheet, but I hope that this option will be supported by the openpyxl module in the future, for further reference there’s a ticket for this.

```
1 import openpyxl
2 from openpyxl import load_workbook
3 from copy import copy
4
5 # load the excel template
6 workbook = load_workbook('template.xlsx')
7 # create a new excel report
8 workbook.save('report2.xlsx')
9
10
11 # number of funding rounds by year
12 table1 = pd.pivot_table(df1, index=['funding_round_type'],
13 columns=['funded_year'],
14 values=['raised_amount_usd'],
15 aggfunc=[lambda x: len(x)], fill_value=0)
16
17 # sum of funding rounds by year
18 table2 = pd.pivot_table(df1, index=['funding_round_type'],
19 columns=['funded_year'],
20 values=['raised_amount_usd'],
21 aggfunc=[np.sum], fill_value=0)
22
23 def create_sheet(wb_name, sheet_name, values):
24 workbook = load_workbook(wb_name)
25 worksheet = workbook.active
26 # clone the template worksheet
27 update_range(workbook.active, values, named_range='values')
28 # add worksheet to the workbook
29 worksheet.title = sheet_name
30 workbook._add_sheet(copy.copy(worksheet))
31 # since we can't use deepcopy, we need to save the workbook and reload it
32 # otherwise we will be updating the same references of all copied sheets
33 workbook.save(wb_name)
34
35 create_sheet('report2.xlsx', 'count_by_year', table1.values)
36 create_sheet('report2.xlsx', 'sum_by_year', table2.values)
37 # we need to remove the current template sheet
38 workbook = load_workbook('report2.xlsx')
39 workbook.remove_sheet(workbook.active)
40 workbook.save('report2.xlsx')
```

The output result