Heat-scatter plots

Author

Daniel Munro

Published

August 3, 2024

< Home (danmun.ro)

I will introduce a type of plot that I don’t recall seeing before but that I think could be useful: a scatter plot in which each point is colored by the number of other points it overlaps. This simple coloring scheme turns out to be surprisingly versatile.
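To make the coloring rule concrete, here is a minimal brute-force sketch in base R. The function name count_overlaps_naive is hypothetical. It assumes x, y, and radius are already in the same on-screen units, so two markers overlap when their centers are at most twice the radius apart. Because it compares every pair of points, it is only practical for small data sets.

```r
# Hypothetical helper: brute-force overlap counts, O(n^2) in time and memory.
# Assumes x, y, and radius share the same on-screen units.
count_overlaps_naive <- function(x, y, radius) {
    # All pairwise distances between marker centers
    d <- as.matrix(dist(cbind(x, y)))
    # Count neighbours within 2 * radius; subtract 1 to drop each point's self-pair
    as.integer(rowSums(d <= 2 * radius) - 1)
}

count_overlaps_naive(c(0, 0, 5), c(0, 0.5, 5), radius = 0.3)
# [1] 1 1 0
```

The first two points are 0.5 units apart, within 2 × 0.3, so each overlaps the other; the third is isolated.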

With very dense plots, it serves a similar function to a heatmap-colored density plot, but unlike continuous or binned density plots, it still shows the actual data points:

Opacity is often used to achieve this effect, but then color is no longer available to help distinguish density levels. Even after tuning the opacity, dense regions can oversaturate while isolated points become hard to see and may not print well:

The heat-scatter method can also be useful with sparse plots to highlight overlaps that would otherwise be difficult to notice:

For plots of intermediate density, it helps you identify clusters:
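As a sketch of how such a plot might be drawn with ggplot2 in place of an opacity-based scatter plot, each point below is mapped to a color by its overlap count instead of being made semi-transparent. The example data, the radius value, the plotting order, and the viridis palette are assumptions for illustration, not necessarily the choices behind the plots in this post. Overlaps are counted by brute force in data units, and coord_equal() keeps both axes on the same scale so a single radius applies in both directions.

```r
library(ggplot2)

# Hypothetical example data: a dense Gaussian cluster plus sparse background noise
set.seed(42)
df <- data.frame(x = c(rnorm(2000), runif(200, -4, 4)),
                 y = c(rnorm(2000), runif(200, -4, 4)))

# Assumed radius in data units; tune by eye so it roughly matches the marker size
radius <- 0.05
dists <- as.matrix(dist(df))
df$n_overlap <- as.integer(rowSums(dists <= 2 * radius) - 1)

# Draw the most-overlapped points last so dense regions stay on top
df <- df[order(df$n_overlap), ]

p <- ggplot(df, aes(x, y, color = n_overlap)) +
    geom_point(size = 1) +
    scale_color_viridis_c(name = "Overlaps") +
    coord_equal()
# print(p) renders the plot
```

Sorting by overlap count is one reasonable design choice: without it, low-count points drawn later can hide the high-count cores that the coloring is meant to reveal.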

Considerations:

You can try the R function I used to count overlaps:

library(spatstat.geom)

#' Get the number of points overlapping each point. Width and height define the plot window
#' that will contain all points, and must be in the same units as radius. x and y coordinates
#' will be transformed into these units to determine overlaps.
n_overlap <- function(x, y, radius, width, height) {
    n_overlap_get_counts <- function(x, y, radius) {
        # x, y, and radius must all be in the same visually scaled units
        pp <- ppp(x, y, window = owin(range(x), range(y)))
        # Two markers of radius r overlap when their centers are within 2r.
        # closepairs() lists each close pair in both directions, (i, j) and (j, i),
        # so counting how often each index appears as i gives its number of neighbours.
        close_pairs <- closepairs(pp, radius * 2)
        # Points with no close pairs get a count of 0
        tabulate(close_pairs$i, nbins = npoints(pp))
    }
    # Rescale data coordinates to plot units: [min, max] -> [0, width] and [0, height]
    xviz <- width * (x - min(x)) / (max(x) - min(x))
    yviz <- height * (y - min(y)) / (max(y) - min(y))
    n_overlap_get_counts(xviz, yviz, radius)
}

This uses the spatstat.geom package to efficiently find pairs of points whose centers are within twice the radius of each other, which is exactly when their markers overlap. Comparing all pairs of points directly would take O(n²) time and be slow when there are many points. You may have to guess the width and height of the plot window and adjust them until the result looks right.
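If spatstat is not available, the same neighbour counts can be obtained without comparing all pairs by bucketing points into a grid. This is a swapped-in technique, not the method used above: with square cells of side 2 × radius, any two overlapping markers fall in the same or adjacent cells, so each point only needs to be checked against points in its surrounding 3 × 3 block of cells. The function name n_overlap_grid is hypothetical, and it expects coordinates that are already rescaled to plot units, like xviz and yviz inside n_overlap.

```r
# Hypothetical spatstat-free alternative using grid bucketing.
# x, y, and radius must already be in the same plot units.
n_overlap_grid <- function(x, y, radius) {
    d <- 2 * radius                  # markers overlap when centers are within d
    cx <- floor(x / d)               # grid cell index of each point
    cy <- floor(y / d)
    # Map "cx cy" cell keys to the indices of the points in that cell
    buckets <- split(seq_along(x), paste(cx, cy))
    counts <- integer(length(x))
    for (i in seq_along(x)) {
        # Overlapping neighbours can only lie in the 3 x 3 block of cells around i
        for (dx in -1:1) {
            for (dy in -1:1) {
                cand <- buckets[[paste(cx[i] + dx, cy[i] + dy)]]  # NULL if cell empty
                cand <- cand[cand != i]                           # exclude the point itself
                if (length(cand) == 0) next
                close <- (x[cand] - x[i])^2 + (y[cand] - y[i])^2 <= d^2
                counts[i] <- counts[i] + sum(close)
            }
        }
    }
    counts
}

n_overlap_grid(c(0, 0, 5), c(0, 0.5, 5), radius = 0.3)
# [1] 1 1 0
```

The explicit R loop is slow per point, but the number of distance checks grows roughly linearly with the number of points when the data are not extremely concentrated; in a single very dense cell it degrades toward the all-pairs cost.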

Acknowledgements

I used R with the tidyverse and spatstat libraries for data generation and visualizations. My code is here.
