from data import insight :

Pre-commit quickstart

So, I finally added pre-commit to my little library repo – it’s so much easier than I thought it would be.

Why use pre-commit?

Python has a ton of tools for improving code readability and quality, including black, isort, flake8, mypy and so on. Pre-commit is a tool that aggregates all tools you want to run before committing a file into one command.

This talk from PyOhio explains it well, and also covers adding pre-commit as the linter in your tox environment (this article just covers setting up pre-commit).

Initial steps

  1. Install it

    pip install pre-commit

  2. Create a default config file. From the top directory of your project, run:

    pre-commit sample-config > .pre-commit-config.yaml

  3. Try it out

    pre-commit run --all-files

Congratulations! You’ve just used pre-commit to remove trailing whitespace from all your files and run a couple of other checks. (You can delete any of the “standard” rules you don’t like from the initial .yaml file.)

Install additional tools

  1. Go to the pre-commit hooks list and enter the hook you want in the search box. Let’s choose isort.

  2. Copy the repo link from the pre-commit website. Sometimes, as for isort, it is the tool’s official repo. Other times, it’s a special pre-commit mirror repo.

  3. Pass the link you copied to pre-commit try-repo on the command line to get the entry for your .pre-commit-config.yaml:

    pre-commit try-repo https://github.com/PyCQA/isort

  4. The top part of the output will be the entry for the yaml file:

    [INFO] Initializing environment for https://github.com/PyCQA/isort.
    ===============================================================================
    Using config:
    ===============================================================================
    
    repos:
    -   repo: https://github.com/PyCQA/isort
        rev: e5bfb28b079d942b8d5b0ce5aa7a231a0292d14a
        hooks:
        -   id: isort
    
    ===============================================================================
    
  5. This is my own little trick: copy the output for the tool into your yaml file, but replace the git hash with a fake version number (“v0”). Your file will look something like this:

    repos:
    -   repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v4.0.1
        hooks:
        -   id: trailing-whitespace
        -   id: end-of-file-fixer
        -   id: check-yaml
        -   id: check-added-large-files
    -   repo: https://github.com/PyCQA/isort
        rev: v0
        hooks:
        -   id: isort
    
  6. Run pre-commit autoupdate and it will replace the “v0” with the human-readable current version of the tool. (Or you could pin a preferred version of your choice here as well.)

  7. You can add extra arguments (args) or extra install dependencies (additional_dependencies) to each tool in this yaml file as well. I had my arguments already set up in setup.cfg and pyproject.toml, and I left them there for now. Mypy was a smidge tricky - you have to list the stub packages you need as additional dependencies:

    -   repo: https://github.com/pre-commit/mirrors-mypy
        rev: v0.910
        hooks:
        -   id: mypy
            additional_dependencies: ['types-python-dateutil', 'types-requests']
    
  8. Add one tool at a time, running pre-commit run --all-files each time to verify that the tool is configured correctly. If a tool is slow, you can run just that one tool, for example with pre-commit run isort --all-files.
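If you add a lot of tools, steps 4 and 5 can be scripted. Here is a minimal sketch, assuming the try-repo output has the same shape as the isort example above (separator lines of "=" around the config). The helper name entry_from_try_repo and the SAMPLE_OUTPUT constant are made up for this illustration; they are not part of pre-commit.

```python
import re

# Sample text copied from the isort example above (hypothetical constant).
SAMPLE_OUTPUT = """\
[INFO] Initializing environment for https://github.com/PyCQA/isort.
===============================================================================
Using config:
===============================================================================

repos:
-   repo: https://github.com/PyCQA/isort
    rev: e5bfb28b079d942b8d5b0ce5aa7a231a0292d14a
    hooks:
    -   id: isort

===============================================================================
"""


def entry_from_try_repo(output: str, placeholder: str = "v0") -> str:
    """Pull the repo entry out of try-repo output and swap the hash for a placeholder."""
    lines = output.splitlines()
    # Assumption: the config sits between the second and third "=====" lines.
    seps = [i for i, line in enumerate(lines) if line.startswith("=====")]
    config = lines[seps[1] + 1 : seps[2]]
    # Drop blank lines and the top-level "repos:" key, since your file already has one.
    entry = [line for line in config if line.strip() and line.strip() != "repos:"]
    text = "\n".join(entry)
    # Replace whatever follows "rev:" with the fake version number.
    return re.sub(r"^(\s*rev:\s*).*$", rf"\g<1>{placeholder}", text, flags=re.MULTILINE)


print(entry_from_try_repo(SAMPLE_OUTPUT))
```

Paste the printed entry under the existing repos: key in your yaml file, then run pre-commit autoupdate as in step 6.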

Configure your project and development environment

  1. Delete the installation of all the file-checking tools from your setup or requirements files in your project. Replace all the separate tool installs with installing pre-commit, and let pre-commit take it from there. Test everything one last time. I had a Makefile to run my tools - I changed that to a single command to call pre-commit as well.

  2. When everything is happy, run pre-commit install to make it a git hook that runs every time you commit a file in this repo. (Don’t forget to check in .pre-commit-config.yaml as well.)

  3. If you run multiple coordinated builds in tox, see the talk linked above for instructions on that.

Cleaning Data for Plotly Hover tool

I’m learning plotly and trying to decide if the next version of my data processing website should use it instead of bokeh.

One bug I’ve noticed is that when putting fields in a hover tool, if the data can contain nulls, nans, or other non-float values like np.inf mixed in with numeric data, those rows are not displayed correctly. The %{customdata[#]} slot in your format string is not replaced at all. Very ugly.

This little function works around the bug, at the expense of not being able to use a format string in the hovertext template as the data is now a mix of floats and strings. In order to work around that, I do the decimal-rounding in my data cleaner as well.

from typing import Optional
import pandas as pd

def clean_data_for_hover(data: pd.Series, places: Optional[int] = None) -> pd.Series:
    """
    Takes floating data that may have nulls, nans, or infs, and reformats
    to display in a plotly hover window.

    Parameters
    -----------
    data: pd.Series of float-like (np.float32, np.float64, etc)
        The data to be displayed on hover
    places: int or None 
        The number of decimal places to round to, or None for
        no rounding

    Returns
    -------
    pd.Series
        Transformed so that all null-like and inf-like values are
        replaced with strings, and all floats are rounded to given number
        of places.
    """
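    # `is not None` so that places=0 still rounds (to whole numbers)
    if places is not None:
        result = data.round(places)
    else:
        result = data.copy()

    # None, pd.NA, np.nan...
    nulls = result.isna()

    # np.inf, -np.inf (inf * 0 is nan, so this flags them)
    infs = (result * 0).isna()

    # Cast to object first so the column can hold floats and strings together
    result = result.astype(object)
    bad = nulls | infs
    result[bad] = result[bad].apply(str)

    return result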
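
The inf check leans on a floating-point rule: inf * 0 and nan * 0 are both nan, while any finite number times 0 is 0. Here is a plain-Python sketch of the same idea for a single value, no pandas needed. The name hover_label is made up for this illustration.

```python
import math


def hover_label(value, places=None):
    """Hypothetical single-value counterpart of clean_data_for_hover."""
    if value is None:
        return "None"
    # inf * 0 and nan * 0 are nan; a finite float * 0 is 0.0
    if math.isnan(value * 0):
        return str(value)
    # Finite values stay numeric, rounded only if places was given
    return value if places is None else round(value, places)


for v in [1.234, None, float("nan"), float("inf"), float("-inf")]:
    print(repr(hover_label(v, 2)))
# prints 1.23, 'None', 'nan', 'inf', '-inf' (one per line)
```

The pandas function above applies the same test to the whole column at once.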

And here’s a little notebook cell to try it:

import pandas as pd
import numpy as np
from plotly import express as px

# Setting up the data to plot
x = pd.Series(name="x", data=[1,2,3,4,5,])
y = pd.Series(name="y", data=[1,2,3,4,5,])
bad = pd.Series(name="bad", data=[1.234, None, np.nan, np.inf, -np.inf])
fixed = pd.Series(name="fixed", data=clean_data_for_hover(bad, 2).values)
df = pd.concat([x,y,bad,fixed], axis=1)

# Plotting with plotly express api
fig = px.scatter(
     data_frame=df,
     x="x",
     y="y",
     hover_data=["bad", "fixed"]
)

# for notebook display only
from IPython.display import display,HTML
display(HTML(fig.to_html()))

You should get a plot that looks like this: