
Centralized Monitoring of DHT22 Sensor Data – From Raspberry Pi to the Elastic Stack

Keeping an eye on the environment inside your server room – using temperature and humidity values from DHT22 sensors collected via Raspberry Pis. That’s exactly what I implemented in this project, including integration into an existing ELK platform.

What started as a simple Plotly-based visualization of sensor data quickly evolved into a more robust setup. Sensor integration became part of a full-fledged Elastic Stack architecture that I had previously set up for centralized logging of servers, firewalls, and network devices.

This post is not a detailed how-to, but a compact field report with hands-on insights:

  • Why we chose the Elastic Stack – despite alternatives like Grafana and InfluxDB
  • Challenges encountered during setup and migration
  • Lessons learned and best practices for similar projects

Why Elastic?

Choosing the Elastic Stack was no coincidence: I already had solid experience with Elasticsearch, Filebeat, Kibana, and related components – and had set up the existing environment myself.

Alongside traditional logs from servers, firewalls, and switches, I wanted to integrate external metrics like sensor data. Integrating DHT22 sensors was a natural next step within an already functional platform.

Especially in heterogeneous homelab environments with many small but relevant data sources, ELK offers key advantages:

  • Central management and configuration via Fleet
  • Unified logging of servers, firewalls, and network components
  • Flexible data modeling and structured search in Elasticsearch
  • Simple integration of custom data sources via Custom Pipelines
  • Powerful visualization and alerting with Kibana

From CSV to Usable JSON – Transforming Sensor Data

The temperature and humidity readings from the DHT22 sensors were stored locally on each Raspberry Pi in plain CSV files – separated by location (e.g., Location A and Location B). The file format was a classic tabular structure:

Timestamp,Temperature,Humidity
24-02-05 14:10,22.5C,51.2%
24-02-05 14:15,22.4C,51.3%
24-02-05 14:20,22.4C,51.5%

Why CSV Isn’t Enough

For direct ingestion into Elasticsearch, this format posed several problems:

  • Not machine-friendly timestamps: A value like 24-02-05 14:10 uses a two-digit year and carries no timezone, so it can’t be reliably parsed as a date field – a problem for time series data.

  • Non-standardized field names: Temperature and Humidity are human-readable, but they don’t follow any schema convention (such as ECS-style lowercase names), which complicates automated mapping.

  • Embedded units: Values like 22.4C or 51.2% need to be cleaned before further use.

  • No JSON or NDJSON format: Filebeat and other Elastic agents prefer structured data formats.

Feeding raw CSVs into the Elastic Agent would likely cause parsing errors or result in unusable data. A preprocessing step was required.

The Solution: Convert CSV to JSON via Cronjob

To make the data usable, I wrote a lightweight Bash script that runs every 12 minutes via cron. It reads the CSV line by line, picks up only new entries by tracking the last processed line in a position file (.pos), strips the units from temperature and humidity, and converts each row into newline-delimited JSON:

*/12 * * * * /[..]/csv2json.sh >> /var/log/csv2json_cron.log 2>&1

Given a source file like this:

Timestamp,Temperature,Humidity
24-02-05 14:10,22.5C,51.2%
24-02-05 14:15,22.4C,51.3%
24-02-05 14:20,22.4C,51.5%

It produces JSON output like this:

{"timestamp":"2024-02-05T14:10:00+02:00","temperature":22.5,"humidity":51.2}
{"timestamp":"2024-02-05T14:15:00+02:00","temperature":22.4,"humidity":51.3}
{"timestamp":"2024-02-05T14:20:00+02:00","temperature":22.4,"humidity":51.5}

The resulting files are saved in a designated directory and automatically picked up by the Elastic Agent upon any changes. The next step was to push the values into Elasticsearch using a custom integration and ingest pipeline – including appropriate mapping and Kibana visualization.

Bash Logic Overview – JSON Export via Function

The core of the script is a function that detects new CSV rows, cleans them, and appends them as JSON:

convert_csv_to_ndjson_append() {
  local input_file="$1"
  local output_file="$2"

  # Position file that remembers the last processed CSV line between runs
  state_file="${output_file}.pos"
  last_line=1
  if [[ -f "$state_file" ]]; then
    last_line=$(<"$state_file")
  fi

  current_line=1

  # Skip the CSV header and iterate over the data rows
  tail -n +2 "$input_file" | while IFS=',' read -r timestamp temp humid; do
    ((current_line++))

    # Only handle rows that were not processed in a previous run
    if (( current_line <= last_line )); then
      continue
    fi

    # Strip the units so the values become plain numbers
    clean_temp=$(echo "$temp" | tr -d 'C')
    clean_humid=$(echo "$humid" | tr -d '%')

    # Expand the two-digit year and convert to an ISO 8601 timestamp
    iso_timestamp=$(date -d "20$timestamp" --iso-8601=seconds 2>/dev/null)
    if [ -z "$iso_timestamp" ]; then
      iso_timestamp="20$timestamp"
    fi

    # Append one JSON object per line (NDJSON) and persist the progress
    echo "{\"timestamp\":\"$iso_timestamp\",\"temperature\":$clean_temp,\"humidity\":$clean_humid}" >> "$output_file"
    echo "$current_line" > "$state_file"
  done
}
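
In the script, this function is then called once per location – the paths below are placeholders, not the real ones:

# Hypothetical calls for both locations; the real paths differ.
convert_csv_to_ndjson_append "/home/pi/sensors/location_a.csv" "/srv/sensor-json/location_a.json"
convert_csv_to_ndjson_append "/home/pi/sensors/location_b.csv" "/srv/sensor-json/location_b.json"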

This function bridges raw CSV data to structured JSON – efficient, reliable, and extensible. Here's a breakdown:


Progress Tracking with .pos File

state_file="${output_file}.pos"
last_line=$(<"$state_file")

Stores the last processed line number in a .pos file to ensure only new lines are handled during the next run.


Cleaning Raw Data

clean_temp=$(echo "$temp" | tr -d 'C')
clean_humid=$(echo "$humid" | tr -d '%')

Removes units from temperature and humidity so they can be stored as numeric values in Elasticsearch – essential for proper dashboard use in Kibana.


Generating ISO Timestamps

iso_timestamp=$(date -d "20$timestamp" --iso-8601=seconds)

Since the CSV timestamp uses a two-digit year (24-02-05 14:10), the script prepends 20 and lets date convert it to ISO 8601 format – crucial for Elasticsearch/Kibana timelines.
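
For illustration, the conversion on a GNU/Linux system looks roughly like this (the UTC offset in the result depends on the system’s configured timezone):

$ date -d "2024-02-05 14:10" --iso-8601=seconds
2024-02-05T14:10:00+02:00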


The output files contain newline-delimited JSON (NDJSON), the preferred format for Filebeat and other Elastic components – each line is an independent JSON object that can be individually processed. The next step: an ingest pipeline and index mapping in the Elastic Stack.

Elastic Agent Integration and Processing in Elasticsearch

Once the Bash script produces NDJSON files, the Elastic Agent takes over. Inside the assigned policy, two instances of the "Custom Logs (Filestream)" integration are configured – each pointing to one of the JSON files from Location A or B. This separation ensures the data is written into distinct datasets.

Keeping the locations in separate datasets not only keeps things tidy, it also enables side-by-side comparison of both locations in the Kibana dashboard.

Processing is handled via a dedicated ingest pipeline that normalizes the timestamp field and maps the data into Elasticsearch fields. The pipeline and index mapping were intentionally kept simple – covering just timestamp, temperature, and humidity – to maximize flexibility.
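
The production pipeline isn’t reproduced here, but a minimal sketch of what such a pipeline and mapping can look like is shown below. The pipeline name, template name, index pattern, and the assumption that each NDJSON line arrives in the message field are illustrative, not the actual configuration:

# Hypothetical ingest pipeline: expand the NDJSON from the message field,
# parse the ISO timestamp into @timestamp, and store the readings as numbers.
# (Authentication and TLS options for curl are omitted for brevity.)
curl -X PUT "https://localhost:9200/_ingest/pipeline/dht22-sensors" \
  -H 'Content-Type: application/json' -d'
{
  "description": "Normalize DHT22 sensor readings",
  "processors": [
    { "json":    { "field": "message", "add_to_root": true } },
    { "date":    { "field": "timestamp", "formats": ["ISO8601"] } },
    { "convert": { "field": "temperature", "type": "float" } },
    { "convert": { "field": "humidity", "type": "float" } }
  ]
}'

# Hypothetical index template with a deliberately small mapping:
# just the timestamp and the two numeric fields.
curl -X PUT "https://localhost:9200/_index_template/dht22-sensors" \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-dht22*"],
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp":  { "type": "date" },
        "temperature": { "type": "float" },
        "humidity":    { "type": "float" }
      }
    }
  }
}'

In a Fleet-managed setup, custom mappings usually go into the dataset’s @custom component template rather than a standalone index template, but the idea stays the same: the schema is limited to timestamp, temperature, and humidity.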

The result: clean, structured sensor data in Elasticsearch, ready for visualization, correlation, or alerting.
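
A quick way to confirm that documents are actually arriving is a search for the most recent entry – the data stream pattern is again an assumption and depends on the dataset names chosen in the integration:

# Fetch the newest document from the (assumed) dht22 data streams;
# authentication options are omitted for brevity.
curl -X GET "https://localhost:9200/logs-dht22*/_search?size=1&sort=@timestamp:desc"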

Visualization in Kibana – Comparing Location Data

With the sensor data successfully ingested into Elasticsearch, the goal was to display it clearly and comparably. Kibana dashboards provide the required flexibility.

Each location (A and B) is visualized in separate panels – using the same time axis and unified temperature/humidity scales. This allows for easy spotting of trends or anomalies between locations.

Typical visualizations include:

  • Time series: Temperature & humidity over time
  • Averages across selected time ranges

Alerts can also be configured – e.g., when temperature exceeds a threshold or when a sensor stops reporting.

Here’s a screenshot of the finished dashboard:

Completed Kibana dashboard with temperature and humidity data

Conclusion

What started as a simple Plotly visualization evolved into a robust and centralized solution based on the Elastic Stack. By combining a Bash script, Elastic Agent, custom ingest pipeline, and a structured dashboard, a maintainable infrastructure emerged – technically transparent and visually effective.

In homelab environments, the effort to systematically integrate sensor data pays off – not just for analysis, but also for quick responses to deviations or outages. The clear separation by location, use of datasets, and the potential for expansion make this setup scalable over time.

For those already working with the Elastic Stack, this project provides solid inspiration for integrating IoT data sources like DHT22 sensors into an existing infrastructure – with minimal effort and significant benefit.

This setup not only adds value in day-to-day operations but also serves as a foundation for expanding observability to other IoT devices in the future.

