"Segmented Linear" Calibration for ESPHome

UPDATE (2023-08-19)

This article is no longer necessary as of Mat931's PR!

Now you can have the behavior of what I called "segmented linear" by specifying the "exact" method:

filters:
  - calibrate_linear:
      method: exact
      datapoints:
          - 10 -> 12
          - 55 -> 50
          - 100 -> 105

Hooray!

The original article below is kept for posterity.

ORIGINAL

ESPHome provides a few functions to help calibrate the measurements being reported by the sensors you set up. But I am finding them to be inadequate.

Inadequate how?

The two most relevant functions are calibrate_linear and calibrate_polynomial.

calibrate_linear is pretty straightforward. You give it at least two pairs of numbers (raw reading -> correct value), and it fits a best-fit straight line through them and uses that line to correct every reading. Let's use an example:

If a watt meter is reporting 10W when it should be 12W, and 100W when it's supposed to be 105W, then you can use this syntax to adjust for the difference:

    filters:
      - calibrate_linear:
          - 10 -> 12
          - 100 -> 105

But what if the sensor's inaccuracy isn't a linear relationship? In the above example, the sensor is consistently measuring a bit too low. But what if it also reported 55W when it should be 50W? The relationship is no longer linear, and the best-fit line will miss every one of the known points. A bug was filed for this, but it was closed without a fix: the documentation was simply updated to describe the limitation.
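To make the "incorrect at all known points" claim concrete, here is a minimal sketch of an ordinary least-squares line fit. It assumes that "best-fit" means least squares; Line and fit_least_squares are hypothetical names for illustration, not ESPHome's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not ESPHome code): a straight line y = slope * x + intercept.
struct Line {
    float slope;
    float intercept;
    float apply(float x) const { return slope * x + intercept; }
};

// Ordinary least-squares fit through (raw, expected) pairs.
Line fit_least_squares(const std::vector<std::pair<float, float>> &points) {
    assert(points.size() >= 2);

    // Mean of the raw readings and of the expected values.
    float mean_x = 0.0f, mean_y = 0.0f;
    for (const auto &p : points) {
        mean_x += p.first;
        mean_y += p.second;
    }
    mean_x /= points.size();
    mean_y /= points.size();

    // slope = covariance(x, y) / variance(x)
    float sxy = 0.0f, sxx = 0.0f;
    for (const auto &p : points) {
        sxy += (p.first - mean_x) * (p.second - mean_y);
        sxx += (p.first - mean_x) * (p.first - mean_x);
    }
    assert(sxx > 0.0f);  // the raw readings must not all be identical

    float slope = sxy / sxx;
    return Line{slope, mean_y - slope * mean_x};
}
```

Fitting 10 -> 12, 55 -> 50, and 100 -> 105 this way gives a slope of about 1.033 and an intercept of about -1.167, so a raw 10W comes out as roughly 9.2W, 55W as roughly 55.7W, and 100W as roughly 102.2W. None of the three known points is reproduced.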

calibrate_polynomial isn't necessarily suitable either. There is no guarantee that the polynomial equation would be a perfect fit for the known data points. It would be more correct than a linear calibration, but I think that the least a calibration can do is to exactly match the known points.

As stated in the bug, a "piecewise linear fit" would be preferred, and I agree. At least, I agree with my interpretation of it: apply calibrate_linear's behavior separately between each adjacent pair of data points.

Let's implement that!

ESPHome uses a Python-to-C++ mix of code that I don't have the time to get into right now. In the interest of expedience for my own needs, I am just writing a pure C++ function. I also foresee a point of ambiguity that will create some bikeshedding when I do eventually create a PR for ESPHome. The ambiguity is this:

How to handle values outside your known measurements?

Say I have data points at raw readings of 10, 55, and 100 watts. What should be done for readings outside of that range, like 5 or 500 watts?

I think it's fine to take the nearest pair and extend out its linear slope to handle this. But if you have several points, there can appear to be a non-linear relationship. To better match that, it can be valid to say that you want to use the nearest n pairs and get an average slope from that (much like how calibrate_linear works right now), or even switch to calibrate_polynomial.

Let's just implement the simplest one: extending the linear slope of the nearest pair.

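#include <vector>

// mapping holds {raw, corrected} rows sorted by raw value.
// It is taken by const reference to avoid copying it on every call.
float calibrate_segmented_linear(const std::vector<std::vector<float>> &mapping, float x) {
    float res = x;
    // At least two points are needed to define a segment; otherwise pass x through.
    if (mapping.size() < 2) return res;
    if (x < mapping[0][0]) {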
        // Less than mapping
        // Use the left-most pair.
        float before_a = mapping[0][0];
        float after_a = mapping[0][1];
        float before_b = mapping[1][0];
        float after_b = mapping[1][1];
        float before_diff = before_b - before_a;
        float after_diff = after_b - after_a;
        float diff = before_a - x;
        float ratio = diff / before_diff;
        res = after_a - (ratio * after_diff);
    } else if (x > mapping[mapping.size() - 1][0]) {
        // More than mapping
        // Use the right-most pair.
        size_t i = mapping.size() - 1;
        float before_a = mapping[i-1][0];
        float after_a = mapping[i-1][1];
        float before_b = mapping[i][0];
        float after_b = mapping[i][1];
        float before_diff = before_b - before_a;
        float after_diff = after_b - after_a;
        float diff = x - before_b;
        float ratio = diff / before_diff;
        res = after_b + (ratio * after_diff);
    } else {
        // Within mapping
        // Find and use the pair that x sits between.
        for (size_t i = 1; i < mapping.size(); i++) {
            float before_a = mapping[i-1][0];
            float after_a = mapping[i-1][1];
            float before_b = mapping[i][0];
            float after_b = mapping[i][1];
            if (x <= before_b) {
                float before_diff = before_b - before_a;
                float after_diff = after_b - after_a;
                float diff = x - before_a;
                float ratio = diff / before_diff;
                res = after_a + (ratio * after_diff);
                break;
            }
        }
    }
    return res;
}
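As a sanity check, here is a hand calculation of each branch for the three example points (10 -> 12, 55 -> 50, 100 -> 105). The inputs 30, 5, and 500 are arbitrary values picked for illustration.

```latex
\begin{aligned}
x = 30  &: \quad 12 + \frac{30 - 10}{55 - 10}\,(50 - 12) = 12 + \tfrac{20}{45} \cdot 38 \approx 28.9 \\
x = 5   &: \quad 12 - \frac{10 - 5}{55 - 10}\,(50 - 12) = 12 - \tfrac{5}{45} \cdot 38 \approx 7.8 \\
x = 500 &: \quad 105 + \frac{500 - 100}{100 - 55}\,(105 - 50) = 105 + \tfrac{400}{45} \cdot 55 \approx 593.9
\end{aligned}
```

The first line is the "within mapping" branch, the second extends the left-most segment downwards, and the third extends the right-most segment upwards. The known points themselves map exactly: 10 gives 12, 55 gives 50, and 100 gives 105.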

That's it! Just save the function in a header file such as calibration.h in your ESPHome config directory and include it like this:

esphome:
  name: my_device
  includes:
    - calibration.h

It can then be used in a lambda filter like so:

sensor:
  - platform: a_supported_watt_meter
    update_interval: 1s
    power:
      name: "My Power"
      accuracy_decimals: 1
      filters:
        - lambda: |-
            static std::vector<std::vector<float>> mapping = {
                {10.0, 12.0},
                {55.0, 50.0},
                {100.0, 105.0},
            };
            return calibrate_segmented_linear(mapping, x);

This is not the cleanest code, but it is functional. It would be nice to still be able to use the - 10.0 -> 12.0 syntax, but the lambda gets the job done.

Optimizations can be made such as doing a binary search when x is within the mapping values, but the longest mapping list I have is only 11 items. I don't think binary search is worthwhile when n is so small.
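For longer lists, the binary search could look roughly like the sketch below. It is meant to behave the same as the function above, but locates the segment with std::upper_bound. The name calibrate_segmented_linear_bsearch is hypothetical, and the sketch assumes the mapping is sorted by raw value, has at least two rows, and contains no duplicate raw values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: segmented linear calibration with a binary search.
// Assumes mapping rows are {raw, corrected}, sorted by raw value,
// with at least two rows and no duplicate raw values.
float calibrate_segmented_linear_bsearch(const std::vector<std::vector<float>> &mapping, float x) {
    assert(mapping.size() >= 2);

    // Find the first row whose raw value is strictly greater than x.
    auto upper = std::upper_bound(
        mapping.begin(), mapping.end(), x,
        [](float value, const std::vector<float> &row) { return value < row[0]; });

    // Clamp the index so that out-of-range inputs reuse the edge segment,
    // which extends its slope just like the linear-scan version.
    std::size_t hi = static_cast<std::size_t>(upper - mapping.begin());
    hi = std::min(std::max<std::size_t>(hi, 1), mapping.size() - 1);

    const std::vector<float> &a = mapping[hi - 1];
    const std::vector<float> &b = mapping[hi];
    float slope = (b[1] - a[1]) / (b[0] - a[0]);
    return a[1] + (x - a[0]) * slope;
}
```

Because one clamped index covers the below-range, in-range, and above-range cases, the three branches collapse into a single interpolation formula. With only a handful of rows, the linear scan is likely just as fast.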

A real-world use-case

This calculation method happens to be the way that the EPA defines their AQI: as a piecewise linear function of the measured pollutant concentration. For PM2.5, that is the mass concentration in µg/m³, and we can use calibrate_segmented_linear to calculate PM2.5 AQI like so:

lambda: |-
    static std::vector<std::vector<float>> mapping = {
        {0, 0},
        {12.1, 50},
        {35.5, 100},
        {55.5, 150},
        {150.5, 200},
        {250.5, 300},
        {350.5, 400},
        {500.5, 500},
    };
    return calibrate_segmented_linear(mapping, id(pm_2_5).state);

Now that's a whole lot cleaner than this manual if-else logic!