This is how I organize my personal library with Notion and Python

Written by Helguera on 14 Oct 2022

Summary

Introduction
My Script
Complete Script
Conclusion

Introduction

I have always wanted to have a big collection of books, most of them related to programming, in a big library where I can consult them whenever I need them. But the truth is that this dream is going to have to wait a bit because right now, with my semi-nomadic lifestyle, it would be impossible for me to move them every time I move from place to place.

So a while ago, I decided to always buy or download books in digital format and read them on the iPad or iPhone. At first, when I had only a few, I could keep a mental picture of the books I owned and quickly refer to them. But as the library grew, I started to lose track of exactly which books I had, and in the end I stopped consulting them.

For some time now I have been using Notion to organize my whole life, both personal and professional, so I decided to use it to organize my books as well and to act as a search engine for them. But of course, I wasn't going to insert each book manually, I'm not in software development for that, so I created a small script in Python that does it for me.

My Script

The script steps are as follows:

Get the path of all books in the specified folder and subfolders. I always try to acquire the books in .epub format, so this script will only work with that format.
Extract the information of the books such as title, author, ISBN, publisher...
Generate a CSV file with the results.
Upload the file to a database in Notion, respecting the data it previously contained.

Needed libraries

epub_meta: to extract information from .epub files
openpyxl: to generate the .xlsx file
os: to walk the folder tree looking for .epub files and to execute the command that will upload the data to Notion
pandas: to convert the .xlsx file to .csv so that Csv2Notion can process it.

import epub_meta
import openpyxl
import os
import pandas as pd

Initial config

First of all, the following lines of code are needed to initialize openpyxl, set the name of the spreadsheet and the path from which to search for the .epub files. In this case, the path is the same from which the script is run.

wb = openpyxl.Workbook() 
ws1 = wb.active
ws1.title = "epubs"
path = './'

Get the books

The following for loop searches for all .epub files in the specified directory and all its subdirectories. The results are stored in the list "files_path":

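files_path = []
for r, d, f in os.walk(path):
    for file in f:
        # Match on the extension rather than anywhere in the file name
        if file.endswith('.epub'):
            files_path.append(os.path.join(r, file))
# Wrap in list(): a bare filter object would be used up by the print loop below
files_path = list(filter(lambda x: '._' not in x, files_path))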

for f in files_path:
    print(f)

The line that uses the lambda function is only necessary on macOS, as in my case, to ignore the hidden duplicate files whose names start with "._".
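As an alternative, the same search can be sketched with pathlib from the standard library. The function name find_epubs is my own choice for illustration and is not part of the original script. It matches on the file extension and skips the "._" files by looking only at the file name, not at the whole path.

```python
from pathlib import Path


def find_epubs(root):
    """Return the sorted paths of all .epub files under root.

    Files whose name starts with '._' (the hidden duplicates
    that macOS creates) are skipped.
    """
    return sorted(
        str(p)
        for p in Path(root).rglob('*.epub')
        if p.is_file() and not p.name.startswith('._')
    )
```

Calling find_epubs('./') would give a list that can be passed to the rest of the script in place of "files_path".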

Extract information

Next, it is necessary to create the headers for the columns of the .xlsx file. In my case I have decided that the following are sufficient as the information I want to store for each book.

Although the "Tags", "Priority", "Status" and "Comments" columns are not extracted from the .epub file, I want them to appear in the Notion database, so I have to add them even though the script leaves them empty (except "Status", which it sets to "Not started").

ws1.cell(row=1, column=1).value = 'Identifier'
ws1.cell(row=1, column=2).value = 'Title'
ws1.cell(row=1, column=3).value = 'Priority'
ws1.cell(row=1, column=4).value = 'Tags'
ws1.cell(row=1, column=5).value = 'Status'
ws1.cell(row=1, column=6).value = 'Author'
ws1.cell(row=1, column=7).value = 'Publisher'
ws1.cell(row=1, column=8).value = 'Comments'

Once you have the path to each book, simply iterate over them, extract the information with the help of the epub_meta library and save it in the .xlsx file, one row per book.

for index, my_file in enumerate(files_path, start=1):
    if '.epub' in my_file:
        row = index + 1  # row 1 holds the column headers
        data = epub_meta.get_epub_metadata(my_file, read_cover_image=True, read_toc=True)
        identifier = ' '.join(data.identifiers)
        # Use the title as the identifier when the book does not provide one
        ws1.cell(row=row, column=1).value = identifier if identifier != '' else data.title
        ws1.cell(row=row, column=2).value = data.title
        ws1.cell(row=row, column=3).value = ''
        ws1.cell(row=row, column=4).value = '' #' '.join(data.subject).replace('/',',')
        ws1.cell(row=row, column=5).value = 'Not started'
        ws1.cell(row=row, column=6).value = ' '.join(data.authors)
        if data.publisher:
            # Strip commas from the publisher name
            ws1.cell(row=row, column=7).value = data.publisher.replace(',','')
        ws1.cell(row=row, column=8).value = ''
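To make the mapping easier to see (and to test), the same logic can be sketched as a small function that turns the metadata of one book into a row. The name book_to_row and the sample metadata are hypothetical; the sample only stands in for what epub_meta.get_epub_metadata returns, assuming it exposes title, identifiers, authors and publisher as in the loop above.

```python
from types import SimpleNamespace


def book_to_row(data):
    """Map the metadata of one book to the eight spreadsheet columns."""
    identifier = ' '.join(data.identifiers or [])
    publisher = (data.publisher or '').replace(',', '')
    return {
        'Identifier': identifier or data.title,  # fall back to the title
        'Title': data.title,
        'Priority': '',
        'Tags': '',
        'Status': 'Not started',
        'Author': ' '.join(data.authors or []),
        'Publisher': publisher,
        'Comments': '',
    }


# Hypothetical metadata, standing in for the result of epub_meta
sample = SimpleNamespace(
    title='Example Book',
    identifiers=[],
    authors=['Jane Doe', 'John Roe'],
    publisher='Example Press, Inc.',
)
row = book_to_row(sample)
```

Because the sample has no identifiers, the "Identifier" of the resulting row is the title, which is the same fallback the loop applies.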

Generate CSV

Finally, all that remains is to save the .xlsx file and convert it to .csv:

wb.save(filename='epubs.xlsx')
df = pd.read_excel('epubs.xlsx', sheet_name=None)
df['epubs'].to_csv('output.csv', index=False, encoding='utf-8')  
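If the .xlsx file is not needed for anything else, a possible shortcut is to write the CSV directly with the csv module from the standard library. This is only a sketch of an alternative, not what the script above does; write_books_csv is a hypothetical name and the rows are assumed to be dictionaries keyed by column name.

```python
import csv

HEADERS = ['Identifier', 'Title', 'Priority', 'Tags',
           'Status', 'Author', 'Publisher', 'Comments']


def write_books_csv(rows, filename):
    """Write the header row followed by one CSV line per book.

    Columns missing from a row are written as empty strings.
    """
    with open(filename, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.DictWriter(handle, fieldnames=HEADERS, restval='')
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_books_csv([{'Identifier': 'id-1', 'Title': 'Example Book'}], 'output.csv') would produce a file with the eight headers and a single book.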

Upload data to Notion (Csv2Notion)

To upload the data to Notion I use a tool called Csv2Notion, which I found on GitHub.

There are several ways to install it; the easiest is via pip:

$ pip install --user csv2notion

In any case, I recommend consulting the official repository for more information.

The command required is as follows:

csv2notion --token [your_token] --url [your_notion_db_url] --merge output.csv --merge-only-column "Identifier" --merge-only-column "Title" --merge-only-column "Author" --merge-only-column "Publisher" --verbose

To get your token, open Notion in a web browser and inspect the cookies to find the value of the one called token_v2. I use Firefox with the Cookie-Editor extension.

Note: It is very important that you do not share this token with anyone!!!

To get the link to the database, right click on the view name and "Copy link to view".

The last step is to determine which columns of the .csv file should be merged with the existing ones. In my case, these are "Identifier", "Title", "Author" and "Publisher", the only values that are extracted from the .epub files.
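The idea behind merging on a key while updating only some columns can be illustrated in plain Python. This is a conceptual sketch of how I understand the behaviour, not Csv2Notion's actual code: merge_rows is a hypothetical function, and it assumes that books not yet in the database are added whole.

```python
MERGE_COLUMNS = ['Identifier', 'Title', 'Author', 'Publisher']


def merge_rows(existing, incoming, key='Identifier', columns=MERGE_COLUMNS):
    """Merge incoming rows into a copy of existing (a dict keyed by `key`).

    Rows already present only get the listed columns updated,
    so manual edits to other columns (e.g. 'Status') survive.
    Rows with an unseen key are added as they are (assumption).
    """
    merged = {k: dict(v) for k, v in existing.items()}
    for row in incoming:
        current = merged.get(row[key])
        if current is None:
            merged[row[key]] = dict(row)
        else:
            for column in columns:
                current[column] = row[column]
    return merged
```

With this model, a book already marked as "Reading" keeps that status after a new run, while its title, author and publisher are refreshed from the file.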

The command can be executed manually from a console, but I think it is better to integrate it directly into the Python script. To do this, save the command in a file with the ".sh" extension (csv2notion.sh in my case) in the same directory as the Python script:

#!/bin/bash

csv2notion --token [your_token] --url [your_notion_db_url] --merge output.csv --merge-only-column "Identifier" --merge-only-column "Title" --merge-only-column "Author" --merge-only-column "Publisher" --verbose

And add the following line of code at the end of the Python script to execute the new ".sh" file.

os.system('sh csv2notion.sh')
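Another option, shown here only as a sketch, is to skip the ".sh" file and call the tool with subprocess, reading the token from an environment variable so that it is never written to disk. The variable names NOTION_TOKEN and NOTION_DB_URL and the functions build_command and upload are hypothetical; the flags are the same ones used in the command above.

```python
import os
import subprocess

MERGE_COLUMNS = ['Identifier', 'Title', 'Author', 'Publisher']


def build_command(token, url, csv_file='output.csv'):
    """Build the csv2notion argument list used in this post."""
    command = ['csv2notion', '--token', token, '--url', url,
               '--merge', csv_file]
    for column in MERGE_COLUMNS:
        command += ['--merge-only-column', column]
    command.append('--verbose')
    return command


def upload():
    """Run csv2notion; raises CalledProcessError if the command fails."""
    # Hypothetical environment variable names
    token = os.environ['NOTION_TOKEN']
    url = os.environ['NOTION_DB_URL']
    subprocess.run(build_command(token, url), check=True)
```

Passing the arguments as a list avoids shell quoting problems, and check=True makes the script stop with an error if the upload fails instead of continuing silently.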

Complete Script

You can find the complete script on my GitHub by clicking here.

Conclusion

Every time we add one or more books to the folder and run the script, they will automatically appear in Notion. The most important thing is that we do not have to worry about the books already in Notion: they will be neither affected nor duplicated, as long as the primary key of the database does not change. In my case it is the ISBN ("Identifier").

I hope you found this post useful.

Javier Helguera.