Update sales validation/flagging algorithm

Previously, data used a set of simple heuristics to flag outlier sales for hand validation. These included:

  • Anything more than 4 SD away from the mean log price, grouped by class and township
  • The top and bottom 2.5% of each neighborhood
  • Extreme top and bottom outliers of the full sales distribution ($15M homes, $40K homes)
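
For reference, the three heuristics above could be sketched roughly as follows. This is a minimal illustration, not the existing implementation. The column names (`sale_price`, `class`, `township`, `nbhd`), the function name, and the exact dollar cutoffs are assumptions made for the example. It also assumes all sale prices are positive, so the log is defined.

```python
import numpy as np
import pandas as pd


def flag_outlier_sales(
    sales: pd.DataFrame,
    sd_cutoff: float = 4.0,             # heuristic 1: SDs from mean log price
    tail_share: float = 0.025,          # heuristic 2: share flagged in each tail
    price_floor: float = 40_000,        # heuristic 3: assumed low cutoff
    price_ceiling: float = 15_000_000,  # heuristic 3: assumed high cutoff
) -> pd.DataFrame:
    """Return a copy of `sales` with one boolean column per heuristic."""
    out = sales.copy()
    out["log_price"] = np.log(out["sale_price"])

    # 1. More than `sd_cutoff` SDs from the mean log price, by class and township.
    #    Groups with a single sale have an undefined SD and are never flagged.
    grp = out.groupby(["class", "township"])["log_price"]
    deviation = (out["log_price"] - grp.transform("mean")).abs()
    out["flag_sd"] = deviation > sd_cutoff * grp.transform("std")

    # 2. Top and bottom `tail_share` of sale prices within each neighborhood.
    nbhd = out.groupby("nbhd")["sale_price"]
    low = nbhd.transform(lambda s: s.quantile(tail_share))
    high = nbhd.transform(lambda s: s.quantile(1 - tail_share))
    out["flag_nbhd_tail"] = (out["sale_price"] < low) | (out["sale_price"] > high)

    # 3. Extreme values relative to the full sales distribution.
    out["flag_extreme"] = (out["sale_price"] <= price_floor) | (
        out["sale_price"] >= price_ceiling
    )

    # A sale goes to hand validation if any heuristic fires.
    flag_cols = ["flag_sd", "flag_nbhd_tail", "flag_extreme"]
    out["flag_any"] = out[flag_cols].any(axis=1)
    return out
```

Keeping one column per heuristic makes it easy to see which rule caught a given sale, and to tune a single rule without touching the others.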

This year, we should refine and expand this set of heuristics to flag additional sales. In particular, we should flag the right tail of the distribution more generously. The north tri has more extremely high-value property than elsewhere in the county, so it is critical that we get a good, representative, validated sample of these sales.