Humidity Readings - A Rabbit Hole
I've been getting sensors like this that can measure temperature and relative humidity. Coupled with microcontroller boards like the Raspberry Pi Pico W running ESPHome, I'm able to create small USB-powered wireless thermometer/hygrometer units that I put in various rooms/locations.
With the combination of Home Assistant / InfluxDB / Grafana, I can record and visualize historical data:
If you like #datahoarding, this is great fun... right?
Side note: I'm focusing on humidity readings for this post because calibrating humidity has been a rabbit hole while calibrating the temperature has been more straightforward. That said, the tool I provide will calibrate temperature as well.
Descent into Madness
There's an adage called Segal's law that states:
A man with a watch knows what time it is. A man with two watches is never sure.
Well, I have way more than one sensor and they certainly don't agree with each other. Here are several of them installed very closely together on a single breadboard to ensure they are measuring the same environment.
Here's their data on a timeline:
They've formed two groups of measurements, but which group is correct?
... Is either group correct?
Not a great situation. There's no point in hoarding data that isn't even accurate. So what can be done? Follow me on my journey!
Step 1: How do others calibrate?
While searching for humidity calibration methods, I encountered an unexpected ally: cigar enthusiasts! Cigars must be stored in a particular humidity range (65-75%), so they have an interest in having a properly calibrated humidity sensor.
According to discussion boards and in videos, the technique is this:
Use table salt (NaCl) and make it damp - like wet snow.
Seal it in a container along with your humidity sensor.
Wait several hours for the salt to regulate the humidity in the box to about 75%.
Observe the humidity reported by the sensor.
Tweak the sensor up or down by the appropriate amount to make it say 75%.
Done!
I couldn't use a lid with my Tupperware container (like in the video) because I need to run a power cord, so this is my setup:
<Tangent>
The photo above shows that I sealed the sensors and salt in a Ziploc bag. I want to take a quick aside to mention that it took me weeks of container iterations to settle on that method of having a well-sealed container. The bag's only hole is a corner I cut to run a power cord through, but that corner was subsequently taped back shut.
I also tried using a Qi wireless receiver to avoid even needing to cut a hole for the power cord at all, but the heat generated by wireless charging proved to be too much.
All of my prior container "designs" had leaks bad enough to cause inconsistent readings, such as my earliest approach of holding plastic wrap down with a rubber band:
Don't do that. Try using a Ziploc bag instead.
Anyway, back to talking about the salt calibration method!
</Tangent>
This method works fine for cigars because table salt happens to regulate a humidity (75%) that fits the ideal range for cigars (65-75%). So, this single-point calibration is good enough.
What's the issue?
I am looking for accurate measurements across the entire spectrum that I'd encounter at home. The single-point calibration method has a major weakness: you don't know if the sensor's inaccuracy is uniform across the spectrum.
In other words, if at 75% the sensor reads 70%, then that's a simple +5 to calibrate it. But would that sensor report 30% when the reality is 35%? Spoiler: no.
Step 2: Can I calibrate with more than one point?
Different salts maintain different humidity levels. Let's add a second point of reference. I chose to use Magnesium Chloride (MgCl2) because at 33% it allows covering a reasonable spectrum alongside NaCl's 75%. Compared to other low-humidity salt options, MgCl2 is also easier to get. I could find it on Amazon while others were special enough to only be sold by dedicated science supply stores.
Using more than two points of reference would be nice. But for practical purposes, I am going to limit to only using NaCl and MgCl2... at least for now.
So how does the data look?
We can see that all of the sensors tend to read higher on the low end and lower on the high end. The "F" and "G" sensors happen to be too high (almost 40% instead of 33%) while being spot on at 75%, while the other sensors are spot on at 33% but deviate to about 71% when they should report 75%.
ESPHome provides a few calibration functions, and a basic one we can use here is calibrate_linear. So I can just add something like this to the .yaml config:
filters:
  - calibrate_linear:
      - 39.386 -> 33.613
      - 75.231 -> 75.500
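To see what the second point buys us, here's a small Python sketch that compares a single-point offset with the straight line through the two datapoints above. It is illustrative only: the function names are made up for this example, and it is not ESPHome's implementation.

```python
def two_point_line(raw1, true1, raw2, true2):
    """Return (slope, intercept) of the line through (raw1, true1) and (raw2, true2)."""
    slope = (true2 - true1) / (raw2 - raw1)
    intercept = true1 - slope * raw1
    return slope, intercept


# Datapoints from the config above: (raw reading, reference humidity).
LOW = (39.386, 33.613)
HIGH = (75.231, 75.500)

slope, intercept = two_point_line(*LOW, *HIGH)

# Single-point alternative: a constant offset chosen to be right at the high point.
offset = HIGH[1] - HIGH[0]


def linear(raw):
    """Two-point correction."""
    return slope * raw + intercept


def offset_only(raw):
    """Single-point correction."""
    return raw + offset


# At the low salt's raw reading, the line recovers the reference value,
# while the constant offset is still about six points too high.
print(round(linear(LOW[0]), 3))       # about 33.613
print(round(offset_only(LOW[0]), 3))  # about 39.655
```

The slope comes out greater than 1, which matches the observation that these sensors compress the range: too high at the low end, about right at the high end.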
Easy, right? But wait, that wasn't so bad... was the rabbit hole just learning how to seal the container? Unfortunately not. 🙃
Relative Humidity
The humidity percentages I've been talking about are a "relative humidity" value. They are relative to the current temperature. So if anything, I've only defined the calibration for humidity specifically at my room temperature (77.5°F / 25.278°C). If I were to seriously calibrate the humidity readings, I would have to use a formula that considers both temperature and humidity.
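As a rough illustration of how much the "relative" part matters, here's a sketch using the Magnus approximation for saturation vapor pressure. The coefficients (6.112 hPa, 17.62, 243.12 °C) are one commonly used set and are an assumption on my part, not something taken from the sensors' datasheets. The function names are made up for this example.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def rh_after_temperature_change(rh_percent, temp_from_c, temp_to_c):
    """RH of the same air (same water vapor pressure) at a different temperature."""
    vapor_pressure = rh_percent / 100.0 * saturation_vapor_pressure_hpa(temp_from_c)
    return 100.0 * vapor_pressure / saturation_vapor_pressure_hpa(temp_to_c)


# Air at 50% RH and 25 °C, warmed to 30 °C without adding or removing moisture.
# The result is well below 50, even though the amount of water in the air is unchanged.
print(round(rh_after_temperature_change(50.0, 25.0, 30.0), 1))
```

So a humidity figure only means something alongside the temperature it was measured at, which is why the calibration has to take temperature as an input.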
Step 3: Gather humidity values at multiple temperatures
Fortunately, calibrating using salts still works here. As I linked earlier, there exist tables that define the temperature-to-humidity relationship for each salt.
I went with this table because it gives more precision (to the hundredths place).
Celsius | NaCl | MgCl2 |
10° | 75.67% | 33.47% |
15° | 75.61% | 33.30% |
20° | 75.47% | 33.07% |
25° | 75.29% | 32.78% |
30° | 75.09% | 32.44% |
35° | 74.87% | 32.05% |
While the humidity level changes with temperature, it changes by very little.
As for what kind of temperatures I could realistically produce with any sense of long-term stability without laboratory equipment... I opted for room temperature and the refrigerator, which are 25 and 4°C, respectively. So how do the sensors vary depending on temperature?
Most of their slopes don't match the Expected slope, so we need to apply a temperature-dependent adjustment.
Step 4: Interpolations
We need to make a few interpolations to be able to identify the adjustments needed (for each sensor) to derive the correct humidity at any given temperature. To do so, we consider:
For any given temperature, what are the expected humidities for NaCl and MgCl2?
For any given temperature, what are the incorrect humidities reported by the sensor for NaCl and MgCl2?
With those two figured out, what is the necessary correction slope to address humidities that aren't 33 or 75%?
1. Reference Humidities at any Arbitrary Temperature
The original humidity table I used only provides humidities at 5° intervals. I let Google Sheets derive polynomial trendlines to match this data.
For a given temperature x, the humidity with MgCl2 would be calculated with this:
$$33.67 - (7.98 \times 10^{-3})x - (1.09 \times 10^{-3})x^2 - (9.71 \times 10^{-9})x^3$$
For NaCl, it'd be this:
$$75.51 + 0.0396x - (2.65 \times 10^{-3})x^2 + (2.84 \times 10^{-5})x^3$$
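A quick way to sanity-check the trendlines is to evaluate them against the table. The sketch below uses the coefficients as they appear in the ESPHome lambda later in this post (the function names are hypothetical). The polynomial with the constant term near 33.7 tracks the MgCl2 column, and the one with the 75.51 constant tracks NaCl.

```python
def mgcl2_expected(x):
    """Cubic trendline for the MgCl2 column (x in degrees Celsius)."""
    return 33.67 - 7.98e-3 * x - 1.09e-3 * x**2 - 9.71e-9 * x**3


def nacl_expected(x):
    """Cubic trendline for the NaCl column (x in degrees Celsius)."""
    return 75.51 + 0.0396 * x - 2.65e-3 * x**2 + 2.84e-5 * x**3


# Reference table from above: Celsius -> (NaCl %, MgCl2 %).
TABLE = {
    10: (75.67, 33.47),
    15: (75.61, 33.30),
    20: (75.47, 33.07),
    25: (75.29, 32.78),
    30: (75.09, 32.44),
    35: (74.87, 32.05),
}

# Print the fit error for each row; every value should be a small fraction of 1% RH.
for temp_c, (nacl, mgcl2) in TABLE.items():
    nacl_error = nacl_expected(temp_c) - nacl
    mgcl2_error = mgcl2_expected(temp_c) - mgcl2
    print(temp_c, round(nacl_error, 3), round(mgcl2_error, 3))
```

One caveat: the table starts at 10°C, so evaluating either trendline at refrigerator temperature is an extrapolation rather than an interpolation.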
2. Reported Humidities at any Arbitrary Temperature
As mentioned before, I will only be trying two temperature points: 4 and 25°C. We will just have a linear relationship between the two measurements.
Unfortunately, it'd be awkward to try to use ESPHome's calibrate_linear because it isn't available as a function call. Instead, we can use a segmented_linear implementation I had written in the past. You can read about it here!
Yes, it was intended for inputting three or more points, but it will behave identically to calibrate_linear when given two points and is available as a simple function call.
3. Corrected Humidity outside of 33 or 75%.
Let this just be a linear relationship based on the offsets that need to be applied to each sensor at 33 and 75% humidity. We will use segmented_linear here as well.
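The segmented_linear implementation is linked rather than shown here, so the following Python sketch is only a guess at its behavior: piecewise-linear interpolation over (x, y) points, with the first and last segments extended past the ends so that two points behave like one straight line. The example values are made up.

```python
def segmented_linear(mapping, x):
    """Piecewise-linear lookup over (x, y) pairs; extrapolates with the end segments."""
    points = sorted(mapping)
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Walk the segments in order and stop at the first one whose right end
    # is at or beyond x. If none qualifies, the loop ends on the last segment.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            break
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical sensor readings over the low-humidity salt: (temperature, humidity)
# at refrigerator and room temperature.
low_salt_readings = [(4.0, 41.52), (25.0, 36.9)]

# Halfway between the two temperatures gives the halfway reading (about 39.21).
print(segmented_linear(low_salt_readings, 14.5))
```

With exactly two points there is only one segment, so every input is mapped through the same line, which is the behavior the text relies on.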
Step 5: Implementation
The calibration options built into ESPHome are inadequate for this task, but ESPHome does allow calling C++ code in a lambda. So let's make a custom function! What are our inputs?
Temperature.
- Correct temperature, post-calibration.
Humidity.
- Original reading, pre-calibration.
Function to calculate the expected humidity for a given temperature.
- Need at least two of these at different humidity levels to have an adjustment slope/curve on the spectrum.
- For my needs, I will just implement support for two points.
Function to calculate the reported humidity for a given temperature.
- Needs to be a matching set to coincide with the same respective expected humidities.
So we need two floats and four lambdas.
float calibrated_humidity(
    float temp,
    float hum,
    const std::function<float(float)> &expected1,
    const std::function<float(float)> &expected2,
    const std::function<float(float)> &measured1,
    const std::function<float(float)> &measured2) {
  // ...
}
We'll also need helper functions to derive the linear fit line:
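#include <cmath>
#include <functional>
#include <utility>

// Pearson correlation coefficient of the two points (x1, y1) and (x2, y2).
// Note the parameter order: both x values first, then both y values.
float correlation_coefficient(float x1, float x2, float y1, float y2) {
  float avg_x = (x1 + x2) / 2;
  float avg_y = (y1 + y2) / 2;
  float numerator = (x1 - avg_x) * (y1 - avg_y) + (x2 - avg_x) * (y2 - avg_y);
  float denominator = sqrt(pow(x1 - avg_x, 2) + pow(x2 - avg_x, 2)) *
                      sqrt(pow(y1 - avg_y, 2) + pow(y2 - avg_y, 2));
  return numerator / denominator;
}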
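// Line through (x1, y1) and (x2, y2), returned as (slope, intercept).
// Parameters are ordered x1, x2, y1, y2 to match correlation_coefficient()
// and the way calibrated_humidity() passes its values in.
std::pair<float, float> linear_fit(float x1, float x2, float y1, float y2) {
  float correlation_coefficient_value = correlation_coefficient(x1, x2, y1, y2);
  float slope = correlation_coefficient_value * (y2 - y1) / (x2 - x1);
  float intercept = y1 - slope * x1;
  return std::make_pair(slope, intercept);
}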
So all together, our calibration call would be...
float calibrated_humidity(
    float temp,
    float hum,
    const std::function<float(float)> &expected1,
    const std::function<float(float)> &expected2,
    const std::function<float(float)> &measured1,
    const std::function<float(float)> &measured2) {
  // Fit the correction line for the current temperature.
  std::pair<float, float> pair = linear_fit(
      measured1(temp),
      measured2(temp),
      expected1(temp),
      expected2(temp));
  float slope = pair.first;
  float intercept = pair.second;
  // Apply the line to the raw humidity reading.
  return (slope * hum) + intercept;
}
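To check the arithmetic away from the device, here's an illustrative Python translation (not part of calibration.h). It assumes the intent is a straight line through the points (measured1, expected1) and (measured2, expected2) at the current temperature. With only two points the correlation coefficient is always +1 or -1, so the sketch uses the plain two-point slope. All the lambdas and numbers below are made up.

```python
def calibrated_humidity(temp, hum, expected1, expected2, measured1, measured2):
    """Map a raw humidity reading onto the line defined at this temperature."""
    x1, x2 = measured1(temp), measured2(temp)  # what the sensor reports
    y1, y2 = expected1(temp), expected2(temp)  # what it should report
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope * hum + intercept


# Hypothetical sensor: reads 39% over the low salt and 71% over the high salt
# at 25 °C, and drifts upward as it gets colder.
expected1 = lambda t: 33.0
expected2 = lambda t: 75.0
measured1 = lambda t: 39.0 - 0.2 * (t - 25.0)
measured2 = lambda t: 71.0 - 0.4 * (t - 25.0)

# At 25 °C a raw reading of 55% sits halfway between 39 and 71,
# so it lands halfway between 33 and 75.
print(calibrated_humidity(25.0, 55.0, expected1, expected2, measured1, measured2))
```

At 4°C the same sensor's low-salt reading is different, but feeding that reading in at 4°C still comes back as 33%, which is the whole point of making the measured values functions of temperature.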
Let's put all of that into a calibration.h file. We can then use it in an ESPHome YAML configuration like so:
esphome:
  name: my_sensor
  includes:
    - calibration.h
# ...
      - &hum_calibrate
        lambda: |-
          static auto expected1 = [](float x) -> float {
            return 33.67 - ((7.98 * pow(10, -3)) * x) - ((1.09 * pow(10, -3)) *
              pow(x, 2)) - ((9.71 * pow(10, -9)) * pow(x, 3));
          };
          static auto expected2 = [](float x) -> float {
            return 75.51 + (0.0396 * x) - ((2.65 * pow(10, -3)) * pow(x, 2)) +
              ((2.84 * pow(10, -5)) * pow(x, 3));
          };
          static auto measured1 = [](float x) -> float {
            return -0.22*x + 42.4;
          };
          static auto measured2 = [](float x) -> float {
            return -0.453*x + 71.4;
          };
          return calibrated_humidity(
            id(my_calibrated_temperature).state,
            x, expected1, expected2, measured1, measured2
          );
# ...
Step 6: Calibrating other sensors
I'll admit, the salt test calibration process is tedious and annoying and I don't want to do it again. But I do want to calibrate my other sensors. I can take the salt-calibrated sensors as my source of truth and place the additional sensors near them so they measure the same environment. From there, it'd be a matter of calibrating the other sensors according to what the "true" sensors are reading.
But how can I confidently do this? Going back to how I use both the temperature and humidity readings to properly calibrate the humidity, I'd need at least four data points:
Low humidity @ Low temperature
High humidity @ Low temperature
Low humidity @ High temperature
High humidity @ High temperature
The problem is I don't have control over the humidity anymore, since I don't want to redo the salt test. It comes down to what the weather gives me, and to waiting long enough for the weather to vary. Ultimately, I'm not in a hurry and the setup would be set-and-forget, so I can allow for weeks of data to be gathered.
Procedure
Let's gather lots of data, and I will use my garage environment since it would have the widest swings (of at least the temperature). The data we care about are the calibrated temperatures and humidities from the calibrated sensors and raw readings from the uncalibrated sensors. The data will have a wave pattern on a daily cadence. Here are the calibrated sensors:
You can see they pretty much agree with each other. The humidity graph has a wider delta between lines due to the sensors having a rated humidity margin of error of ±1.0-1.8% depending on model.
Now let's add a couple of uncalibrated humidity readings that we want to calibrate:
The two new lines don't agree with the calibrated lines. We have our work cut out for us.
We will want to:
Choose two humidity points (from the calibrated sensors only), a low and a high.
- You might be tempted to find the widest difference between low and high, but by its nature, it is an outlier. An outlier will mean we only find one temperature for that humidity at the one time it occurs.
- We are looking for examples of low and high temperatures existing with the same humidity level, so it is a necessity to choose humidity levels that occur more than once.
- I think a reasonable method is to calculate the standard deviation and use the -1 and +1 stddev humidity values.
Find all timestamps that have the chosen humidity points.
Get the lowest and highest observed temperatures among those timestamps, for each humidity point.
Once we have the timestamps for those temperatures, we need to get the respective humidity values at those timestamps from the sensors we want to calibrate.
Now we know both the expected and observed humidity levels for two different temperature levels at two different expected humidities. This is enough data to feed into my calibrated_humidity() function mentioned above.
- One difference is that the expected lambdas were created to support the salt test, where the expected humidity changes depending on the temperature. But in this approach, we are matching based on the humidity. Naturally, the humidity level becomes a constant. Therefore, the two expected lambdas can just return a static value (either the low or high humidity, respectively).
Enter those values into the YAML like so:
lambda: |-
  static auto expected1 = [](float x) -> float {
    return 38.002;
  };
  static auto expected2 = [](float x) -> float {
    return 48.856;
  };
  static auto measured1 = [](float x) -> float {
    static std::vector<std::vector<float>> mapping = {
      // {Temperature, Humidity}
      {27.068, 40.548},
      {34.536, 40.816},
    };
    return segmented_linear(mapping, x);
  };
  static auto measured2 = [](float x) -> float {
    static std::vector<std::vector<float>> mapping = {
      // {Temperature, Humidity}
      {20.859, 49.383},
      {30.523, 48.517},
    };
    return segmented_linear(mapping, x);
  };
  return calibrated_humidity(
    id(temperature).state,
    x, expected1, expected2, measured1, measured2
  );
Now the newly-calibrated sensors fit in with the rest!
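The selection steps in the procedure above can be sketched in Python roughly as follows. This is a simplified illustration, not the tool's actual code: the sample layout, the function name, the use of the population standard deviation and the matching tolerance are all assumptions, and the data is made up.

```python
from statistics import mean, pstdev


def pick_calibration_points(samples, tolerance=0.25):
    """Pick the coldest and warmest samples near mean -/+ 1 stddev of reference humidity.

    samples: time-aligned dicts with "ref_temp", "ref_hum" and "raw_hum" keys.
    Returns {target_humidity: [(temp, raw_hum) coldest, (temp, raw_hum) warmest]}.
    """
    hums = [s["ref_hum"] for s in samples]
    center, spread = mean(hums), pstdev(hums)
    result = {}
    for target in (center - spread, center + spread):
        # All moments when the reference sensors reported (roughly) this humidity.
        matches = [s for s in samples if abs(s["ref_hum"] - target) <= tolerance]
        if len(matches) < 2:
            raise ValueError(f"not enough samples near {target:.2f}% humidity")
        coldest = min(matches, key=lambda s: s["ref_temp"])
        warmest = max(matches, key=lambda s: s["ref_temp"])
        # What the uncalibrated sensor said at those two moments.
        result[target] = [
            (coldest["ref_temp"], coldest["raw_hum"]),
            (warmest["ref_temp"], warmest["raw_hum"]),
        ]
    return result


# Tiny made-up data set: reference humidity is 40% twice and 50% twice.
samples = [
    {"ref_temp": 20.0, "ref_hum": 40.0, "raw_hum": 42.0},
    {"ref_temp": 30.0, "ref_hum": 40.0, "raw_hum": 42.5},
    {"ref_temp": 22.0, "ref_hum": 50.0, "raw_hum": 51.0},
    {"ref_temp": 32.0, "ref_hum": 50.0, "raw_hum": 50.5},
]
points = pick_calibration_points(samples)
for target, pairs in sorted(points.items()):
    print(round(target, 3), pairs)
```

Each target humidity becomes the constant in an expected lambda, and its two (temperature, humidity) pairs become the mapping in the matching measured lambda.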
Step 7: But that's so much data to process...
I agree! No one should be doing that by hand, especially if you intend to calibrate multiple sensors. That's why I wrote a tool to do it for me (and you!).
I've written two ways of using the script:
Let it query your InfluxDB directly (from_influx.py).
Give it some CSV files with the same data we need (from_csv.py).
From CSV
The CSV way is the easier place for us to start understanding the tool.
There are four collections of data we need to pass in:
Reference Temperatures
Reference Humidities
Uncalibrated Temperatures
Uncalibrated Humidities
The command would look like this:
python from_csv.py \
  --reference_temperature_csv ref_temp.csv \
  --reference_humidity_csv ref_hum.csv \
  --uncalibrated_temperature_csv uncal_temp.csv \
  --uncalibrated_humidity_csv uncal_hum.csv
All of the CSV files must be in the format SENSOR_NAME,TIMESTAMP,VALUE. The value unit you use here doesn't matter as long as you understand that the output will be in the same units (which might not be the unit you need in your ESPHome config). The timestamp must be parsable by:
datetime.strptime(input, '%Y-%m-%dT%H:%M:%SZ')
To be explicit, the tool is expecting lots of values. In my case, I'm using a 30-second interval. For a hypothetical week of data gathered, it would amount to 20,160 data points per file, per sensor. The script will interpolate between the gaps, but it's always better to have smaller gaps.
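Here's a minimal sketch of reading that format and filling a gap by linear interpolation. It is not the tool's actual code: the function names are hypothetical, the sample rows are made up, and it assumes the CSV has no header row.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# One week of samples at a 30-second interval.
SAMPLES_PER_WEEK = 7 * 24 * 60 * 60 // 30  # 20,160


def read_rows(csv_text):
    """Parse SENSOR_NAME,TIMESTAMP,VALUE rows into (name, datetime, float) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name, timestamp, value = row
        rows.append((name, datetime.strptime(timestamp, TIMESTAMP_FORMAT), float(value)))
    return rows


def interpolate(rows, when):
    """Linearly interpolate one sensor's value at `when` between neighbouring rows."""
    rows = sorted(rows, key=lambda r: r[1])
    for (_, t0, v0), (_, t1, v1) in zip(rows, rows[1:]):
        if t0 <= when <= t1:
            fraction = (when - t0) / (t1 - t0)  # timedelta / timedelta -> float
            return v0 + (v1 - v0) * fraction
    raise ValueError("timestamp outside the recorded range")


sample = """sensor_c_hum,2023-08-31T07:00:00Z,44.8
sensor_c_hum,2023-08-31T07:00:30Z,45.0
"""
rows = read_rows(sample)
# Halfway between the two samples, so roughly 44.9.
print(interpolate(rows, datetime(2023, 8, 31, 7, 0, 15)))
```

The wider the gap between two rows, the more the interpolated value is a guess, which is why a short sampling interval helps.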
From InfluxDB
The tool can query the database directly to grab the same data as we'd have passed into the CSV. But naturally, in this case, you'd have to specify the entity IDs of the sensors as well as the date range you want to sample.
python from_influx.py \
  --start_time "2023-08-31 07:00:00" --end_time "2023-09-03 07:00:00" \
  --reference_temperature_sensors sensor_a_temp,sensor_b_temp \
  --reference_humidity_sensors sensor_a_hum,sensor_b_hum \
  --uncalibrated_temperature_sensors sensor_c_temp,sensor_d_temp \
  --uncalibrated_humidity_sensors sensor_c_hum,sensor_d_hum
The start and end times expect UTC.
--output_to_csv exists as well, to simply output the CSV files that you could pass into from_csv.py.
You also need to create an influx_config.py file to contain the details needed for the tool to connect to your database instance.
INFLUX_ORG='your_org'
INFLUX_BUCKET='your_bucket' # probably "homeassistant"
INFLUX_TOKEN='your_token'
INFLUX_URL='http://your_db_ip:8086'
Example output
Here's what you can expect to see for a given uncalibrated sensor. You can then copy and paste the calibrate_linear and lambda sections into the appropriate spots in your sensor's ESPHome YAML configuration.
Sensor: sensor_c_temp
========== Temperature Calibration ==========
calibrate_linear:
  method: exact
  datapoints:
    - 21.985 -> 20.100
    - 23.958 -> 22.200
    - 25.942 -> 24.300
    - 27.808 -> 26.400
    - 29.633 -> 28.500
    - 31.490 -> 30.600
    - 33.600 -> 32.700
    - 35.722 -> 34.800
=========== Humidity Calibration ============
lambda: |-
  static auto expected1 = [](float x) -> float {
    return 38.065;
  };
  static auto expected2 = [](float x) -> float {
    return 48.819;
  };
  static auto measured1 = [](float x) -> float {
    static std::vector<std::vector<float>> mapping = {
      {23.116, 36.839}, {31.777, 38.114}
    };
    return segmented_linear(mapping, x);
  };
  static auto measured2 = [](float x) -> float {
    static std::vector<std::vector<float>> mapping = {
      {20.678, 44.844}, {28.367, 46.836}
    };
    return segmented_linear(mapping, x);
  };
  return calibrated_humidity(
    id(temperature).state,
    x, expected1, expected2, measured1, measured2
  );
Assumptions made
To avoid complicating the logic, I've made some assumptions about what's being provided to the tool.
The sets of sensors given for uncalibrated_temperature_sensors and uncalibrated_humidity_sensors are identical: same length, same sensors.
- This means that your sensors are expected to be temperature+humidity combo sensors. This is true for me so I did not dive in further. If you are trying to calibrate sensors that are only one of these types... try to fake some dummy data. I would duplicate the data from another sensor and just change the name in the CSV, then use the from_csv.py route.
  - Make sure the dummy name matches the format of the sensor you want to calibrate, see the next assumption below.
- Note that this restriction does not apply to the reference sensors. The tool will just average them together before using the data.
Your sensor names are consistent between temperature and humidity. To match up the temperature and humidity entities for each sensor, I am assuming that the list of temperature entities and list of humidity entities would alpha-sort into the same order.
- The respective sensors at each index of both lists should end up referring to the same actual sensor.
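To make the two assumptions concrete, here's a small Python sketch of pairing by sorted name and averaging the reference sensors. It is not the tool's actual code. The function names and the second set of entity names are made up for illustration.

```python
def pair_sensors(temperature_entities, humidity_entities):
    """Pair temperature and humidity entities by their position after sorting."""
    if len(temperature_entities) != len(humidity_entities):
        raise ValueError("temperature and humidity lists must be the same length")
    return list(zip(sorted(temperature_entities), sorted(humidity_entities)))


def average_references(series_by_sensor):
    """Average several time-aligned reference series sample by sample."""
    columns = zip(*series_by_sensor.values())
    return [sum(values) / len(values) for values in columns]


# Consistent naming: each pair refers to the same physical sensor.
paired = pair_sensors(
    ["sensor_d_temp", "sensor_c_temp"],
    ["sensor_c_hum", "sensor_d_hum"],
)
print(paired)

# Inconsistent naming: sorting lines the garage temperature up with the attic humidity.
mismatched = pair_sensors(
    ["temp_1_garage", "temp_2_attic"],
    ["attic_hum", "garage_hum"],
)
print(mismatched)

# Two reference sensors, two samples each, averaged per sample.
print(average_references({"sensor_a_temp": [20.0, 21.0], "sensor_b_temp": [22.0, 23.0]}))
```

If your entity names look like the second case, renaming them in the CSV before running from_csv.py is probably the least painful fix.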