Heat-scatter plots
I will introduce a type of plot that I don’t recall seeing before, but that I think would be useful. It is a scatter plot in which the points are colored by the number of other points they overlap. This simple coloring scheme turns out to be surprisingly versatile.
With very dense plots, it serves a similar function to a heatmap-colored density plot, but unlike continuous or binned density plots, it still shows the actual data points:
Opacity is often used to achieve this effect, but then you can’t use colors to help distinguish density levels, and even after adjusting opacity levels you can end up with oversaturation and/or points that are difficult to see and may not print well:
The heat-scatter method can also be useful with sparse plots to highlight overlaps that would otherwise be difficult to notice:
For plots with intermediate sparseness, it helps you identify clusters:
Considerations:
This prevents the use of color to represent another property of the data.
It can be difficult to implement this in a way that is compatible with plotting libraries while still automatically determining the number of overlaps. For example, the number of overlaps depends on the size of the plot window, which might not be specified at the time the plot object is being created.
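To illustrate the window-size dependence, here is a minimal sketch in base R. It assumes a simple linear rescaling of data coordinates to plot-window units; the helper name to_window and the example values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: linearly rescale data values to plot-window units.
# Assumes v is not constant (otherwise the range is zero).
to_window <- function(v, size) size * (v - min(v)) / diff(range(v))

x <- c(0, 4, 5, 10)   # the middle two points are 1 data unit apart
radius <- 0.25        # point radius, in the same units as the window size

# Visual distance between the middle two points in a narrow and a wide window.
dist_narrow <- diff(to_window(x, size = 4)[2:3])
dist_wide   <- diff(to_window(x, size = 8)[2:3])

# Two discs of equal radius overlap when their centers are closer than 2 * radius.
overlap_narrow <- dist_narrow < 2 * radius
overlap_wide   <- dist_wide < 2 * radius
```

With these values, the two middle points overlap in the narrow window but not in the wide one, so the same data can yield different overlap counts depending on the final plot size.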
You can try the R function I used to count overlaps:
library(spatstat.geom)

#' Get number of points overlapping each point. Width and height define the plot window that
#' will contain all points, and must be in the same units as radius. x and y coordinates will
#' be transformed into these units to determine overlaps.
n_overlap <- function(x, y, radius, width, height) {
  # x, y, and radius must all be in the same visually scaled units
  n_overlap_get_counts <- function(x, y, radius) {
    pp <- ppp(x, y, window = owin(range(x), range(y)))
    close_pairs <- closepairs(pp, radius * 2)
    counts <- table(close_pairs$i)
    all_counts <- integer(npoints(pp))
    all_counts[as.integer(names(counts))] <- as.integer(counts)
    all_counts
  }
  xviz <- width * (x - range(x)[1]) / (range(x)[2] - range(x)[1])
  yviz <- height * (y - range(y)[1]) / (range(y)[2] - range(y)[1])
  n_overlap_get_counts(xviz, yviz, radius)
}
This uses the spatstat library to efficiently identify points that are close enough to overlap. Otherwise, comparing all pairs of points would have O(n²) time complexity and be slow when there are many points. You may have to guess the width and height of the plot window and adjust until it looks right.
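For comparison, the all-pairs approach mentioned above can be sketched in base R. This is not the function I used: n_overlap_brute is a hypothetical name, the sketch assumes the coordinates are already in visually scaled units and that radius is positive, and it treats points as overlapping only when their centers are strictly closer than twice the radius. It could serve as a sanity check on small inputs.

```r
# Hypothetical brute-force helper: for each point, count how many other points
# have centers closer than 2 * radius. x, y, and radius must already be in the
# same visually scaled units, and radius must be positive.
n_overlap_brute <- function(x, y, radius) {
  d <- as.matrix(dist(cbind(x, y)))   # all pairwise distances: O(n^2) time and memory
  # Each row also counts the point itself (distance 0), so subtract 1.
  as.integer(rowSums(d < 2 * radius) - 1)
}

# Small illustrative data set: three nearby points and one isolated point.
x <- c(0, 0.4, 0.8, 4)
y <- c(0, 0, 0, 4)
counts <- n_overlap_brute(x, y, radius = 0.25)

# Map counts to a heat-style palette; palette index 1 corresponds to zero overlaps.
pal <- hcl.colors(max(counts) + 1, palette = "YlOrRd", rev = TRUE)
point_cols <- pal[counts + 1]

# Draw the plot only in an interactive session.
if (interactive()) plot(x, y, col = point_cols, pch = 19, cex = 2)
```

Here the middle point of the cluster overlaps both neighbors, the outer two overlap one point each, and the isolated point overlaps none. Because the distance matrix grows quadratically with the number of points, this is only practical for small data sets.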
Acknowledgements
I used R with the tidyverse and spatstat libraries for data generation and visualizations. My code is here.