IoT · Operations · Case 09
0 to 1 · B2C App · IoT · Compliance

Turning a cold chain from a liability into a monitored asset

Priya Foods managed ₹42 lakhs of perishable stock using manual temperature logs and a generator that took 15 minutes to start. Two incidents in 18 months cost them ₹8.9 lakhs. I led the product definition of ColdGuard — an IoT system that cut failure response time from minutes to seconds.
₹21L+ · Annual loss before
< 90s · Alert to technician
0 · Manual log entries
7 · Units monitored 24/7
01 · The client

Priya Foods Distribution Pvt. Ltd.

A fictional regional distributor of meat, poultry, seafood, and frozen processed foods based in Madurai, Tamil Nadu. They supply 140+ hotels, supermarkets, and restaurants across southern Tamil Nadu from a 1,200 m² warehouse operating 24 hours a day.

At any given time, approximately ₹42 lakhs of perishable stock sits across 4 refrigerated rooms (2°C–8°C), 2 deep freezers (−18°C to −22°C), and a blast chiller. Every degree of temperature exceedance is a direct financial risk. The margin between "safe" and "condemned" is measured in minutes, not hours.

02 · The situation

Three assumptions that were quietly failing

Cold chain operations are unforgiving. A refrigeration failure at 2 AM that goes undetected until the morning inspection is not a minor inconvenience — it is stock destruction. Priya Foods had built their operations around three assumptions: that someone would notice a failure quickly, that two temperature logs a day was enough for compliance, and that staff awareness alone would keep doors properly sealed.

8–15 min · Power recovery time
From grid failure to generator running — someone had to notice, walk to the generator room, and start it manually. Deep freezers begin losing product integrity after 20 minutes.
2×/day · Compliance log frequency
Temperature recorded manually at 6 AM and 6 PM. Everything between those readings was invisible to the operations team and to FSSAI on audit.
2 in 18 mo · Confirmed major incidents
One night-shift power failure undetected for 40 minutes. One silent compressor failure caught only at morning inspection. Combined loss: ₹8.9 lakhs in condemned stock.
~15/mo · Door-ajar events
Estimated from post-incident analysis. Several ran 2–3 hours during night shift. A 2 cm gap on a refrigerated room door raises internal temperature ~1°C every 8–10 minutes.

"We knew something had gone wrong when the morning shift arrived. By then the damage was already done. There was no way to know sooner."

The real issue was that the operation had no continuous signal. Temperature data existed in a notebook twice a day. A power failure was only detectable if someone happened to notice the lights go out. The cold chain — the single most critical thing keeping the business running — was effectively unmonitored between manual checks.

Total annual loss before ColdGuard was estimated at ₹21+ lakhs across power events, door incidents, compliance admin, and penalty risk — modelled from degradation curves at Tamil Nadu ambient conditions (~30°C outside), with ₹10,000–₹12,000 loss per minute in a full deep-freeze failure.

03 · The solution

Close every gap in the continuous signal

ColdGuard is an IoT monitoring and alerting system built in partnership with NexSense IoT Solutions, a Chennai-based industrial IoT integrator. It monitors temperature, humidity, door seal status, and three-phase power on every unit, every 60 seconds, and gets an alert to the on-duty technician's phone within 90 seconds of a sustained breach.

The thinking behind it was simple: close every gap in the continuous signal. If the data is there, the problem gets caught before it becomes a loss event. The alert, the response, and the FSSAI log are all just downstream of having reliable data flowing without interruption.

01 · Automatic Transfer Switch (ATS)
The Socomec ATYS G replaces the manual transfer switch. On grid failure, it switches to generator power in under 200 milliseconds, before a compressor even has time to register a loss. The 8–15 minute manual recovery window is gone.
02 · Continuous sensor monitoring
ESP32 microcontrollers paired with Sensirion SHT40 sensors (±0.2°C accuracy) and magnetic door contact sensors on every unit. One reading per minute, every minute, streamed via MQTT to the cloud. 1,440 data points per unit per day instead of 2.
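The payload each node publishes once a minute is small and flat. A hedged sketch of what one reading might look like in Python (field names mirror the database schema later in this page; the topic convention and helper name are assumptions, not the production firmware):

```python
import json
from datetime import datetime, timezone

def build_reading(unit_code: str, temp_c: float, humidity_pct: float,
                  door_open: bool, power_on: bool, voltage_v: float) -> str:
    """Serialise one per-minute reading as the JSON a node publishes over MQTT.

    Field names match the sensor_readings table; the topic scheme below
    ('coldguard/<unit>/reading') is a hypothetical convention.
    """
    payload = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "unit_code": unit_code,
        "temp_c": round(temp_c, 2),
        "humidity_pct": round(humidity_pct, 2),
        "door_open": door_open,
        "power_on": power_on,
        "voltage_v": round(voltage_v, 2),
    }
    return json.dumps(payload)

# A node would then publish with e.g. paho-mqtt:
#   client.publish(f"coldguard/{unit_code}/reading", build_reading(...), qos=1)
```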
03 · 10-minute rule alert engine
Alerts don't fire on a single out-of-range reading; that would create noise during normal loading. The engine fires only when temperature stays out of range continuously for a configured window per unit: 10 minutes for refrigerated rooms, 5 for freezers, 3 for the blast chiller.
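A minimal Python sketch of that sustained-breach rule (function name and thresholds illustrative; the production version runs per-reading inside the Lambda alert engine):

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

def sustained_breach(readings: Iterable[Tuple[datetime, float]],
                     temp_min: float, temp_max: float,
                     window: timedelta) -> Optional[datetime]:
    """Return the time the alert should fire, or None.

    A single out-of-range reading is ignored; the alert fires only once
    the unit has been continuously out of range for the whole window.
    """
    breach_start = None
    for ts, temp in readings:
        if temp_min <= temp <= temp_max:
            breach_start = None   # back in range: reset the clock
        elif breach_start is None:
            breach_start = ts     # first out-of-range reading
        elif ts - breach_start >= window:
            return ts             # continuously out of range for the window
    return None
```

A door opened for four minutes during loading produces a short out-of-range run and no alert; twelve continuous minutes above 8°C in a refrigerated room fires one.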
04 · FSSAI compliance, automated
The daily report now generates automatically at midnight and is delivered to the authority at 00:05. The compliance officer's 4 hours per week of log management drops to zero.
04 · Tech stack

Every layer chosen for an operational reason

Not familiarity or preference. Each component earns its place against a specific requirement.

Layer · Technology · Why this choice
Sensor node · ESP32 + Sensirion SHT40 · Industrial-grade MCU with built-in Wi-Fi. ±0.2°C accuracy exceeds FSSAI requirement. ~₹500/unit, runs for years. Honeywell 5816WMWH magnetic contact on each door.
Power monitoring · PZEM-004T v3.0 · Monitors all three phases. Detects voltage drop before the compressor trips, giving the ATS context and flagging power events for the alert engine.
Edge gateway · Raspberry Pi 4B + Mosquitto · Runs locally on a UPS. If the internet goes down, the gateway keeps logging locally and buffers 24 hours of data. Alerts fire locally too: the system doesn't depend on the cloud to protect stock.
Sensor → cloud · MQTT → AWS IoT Core · ISO-standard IoT protocol. Lightweight, low-latency, handles hundreds of concurrent sensor streams. AWS IoT Core manages device certificates and scales without infrastructure management.
Database · TimescaleDB · Sensor data is time-series by nature. Auto-partitions by time chunk, so a query for "last 24 hours on Unit 05" scans one chunk instead of millions of rows. ~90% compression on older data.
Alert engine · Python on AWS Lambda · Stateless, runs on every incoming reading. Implements the 10-minute rule. Calls Firebase Cloud Messaging for push notifications.
App + dashboard · React Native + React.js · Single codebase for iOS and Android. Technician app focused on alert response. Web dashboard for operations manager: live heatmap, trend charts, monthly compliance view.
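The edge gateway's store-and-forward behaviour is the part that matters most during an outage. A simplified Python sketch of the local SQLite buffer, assuming a `publish` callback that returns True on a successful cloud delivery (class name and schema hypothetical):

```python
import sqlite3
from typing import Callable

class EdgeBuffer:
    """Sketch of the gateway's store-and-forward buffer.

    Every reading is written locally first; rows are marked 'sent' only
    after a successful cloud publish, so an internet outage simply leaves
    unsent rows to drain on reconnect.
    """

    def __init__(self, path: str = "buffer.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("""CREATE TABLE IF NOT EXISTS buffer (
            id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0)""")

    def store(self, payload: str) -> None:
        self.db.execute("INSERT INTO buffer (payload) VALUES (?)", (payload,))
        self.db.commit()

    def drain(self, publish: Callable[[str], bool]) -> int:
        """Try to publish every unsent row, oldest first; return the count sent."""
        rows = self.db.execute(
            "SELECT id, payload FROM buffer WHERE sent = 0 ORDER BY id").fetchall()
        sent = 0
        for row_id, payload in rows:
            if not publish(payload):
                break  # still offline: stop and keep buffering
            self.db.execute("UPDATE buffer SET sent = 1 WHERE id = ?", (row_id,))
            sent += 1
        self.db.commit()
        return sent
```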

Architecture

Hardware
ESP32 sensor nodes ×7
Sensirion SHT40 temp/humidity
Door contact sensors ×7
PZEM-004T power monitor
Socomec ATYS G (ATS)
Edge
Raspberry Pi 4B gateway
Mosquitto MQTT broker
SQLite local buffer (24h)
UPS-backed power
Local alert logic
Cloud
AWS IoT Core
AWS SQS message queue
Lambda ingestion + alerts
TimescaleDB hypertable
FastAPI backend
Delivery
FCM push notifications
AWS SES email alerts
React Native mobile app
React.js web dashboard
FSSAI PDF at 00:05

Database logic

One reading per minute across 7 units: 10,080 rows a day, 3.7 million a year. TimescaleDB handles this through automatic time-based partitioning — the application just writes and reads ordinary SQL.

-- Core sensor readings table, converted to a TimescaleDB hypertable
CREATE TABLE sensor_readings (
  time          TIMESTAMPTZ    NOT NULL,
  unit_code     VARCHAR(10)    NOT NULL,
  temp_c        NUMERIC(5,2)   NOT NULL,
  humidity_pct  NUMERIC(5,2),
  door_open     BOOLEAN        DEFAULT false,
  power_on      BOOLEAN        DEFAULT true,
  voltage_v     NUMERIC(6,2),
  PRIMARY KEY (time, unit_code)
);

-- Partition by 7-day time chunks
SELECT create_hypertable('sensor_readings', 'time',
  chunk_time_interval => INTERVAL '7 days');

-- Enable columnar compression, then compress chunks older than 30 days
-- (~90% storage reduction)
ALTER TABLE sensor_readings SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'unit_code');
SELECT add_compression_policy('sensor_readings', INTERVAL '30 days');

-- FSSAI daily compliance — was each unit in range all day?
SELECT unit_code,
  MIN(temp_c) AS min_temp,
  MAX(temp_c) AS max_temp,
  ROUND(100.0 * COUNT(*) FILTER (
    WHERE temp_c BETWEEN su.temp_min_c AND su.temp_max_c
  ) / COUNT(*), 1) AS compliance_pct
FROM sensor_readings sr
JOIN storage_units su USING (unit_code)
WHERE time >= CURRENT_DATE - 1 AND time < CURRENT_DATE  -- sargable: prunes chunks
GROUP BY unit_code, su.temp_min_c, su.temp_max_c;
05 · Before & after
Before
Temperature logged twice a day in a notebook
Power failure: 8–15 minutes to generator, manually
Door-ajar events undetectable unless someone walked past
FSSAI report: weekly manual data entry, monthly email
Compliance officer spends 4 hrs/week on log management
Stock loss events confirmed: 2 in 18 months, ₹8.9 lakhs
After
1,440 automated readings per unit per day
Power failure: ATS switches to generator in <200ms
Door open 5+ continuous minutes → push alert fires immediately
FSSAI report generated automatically, delivered at 00:05
Compliance officer log management time: zero
Zero stock loss events in 12 months post-deployment
06 · Outcomes
₹23.8L · Annual saving

Across prevented stock loss, door incidents caught early, power events neutralised by ATS, and compliance admin time recovered.

3 mo · Payback period

Total system cost including hardware, installation, and first year of cloud infrastructure: ₹5.6 lakhs. Recovered in under one quarter.

720× · More data points/day

From 2 manual readings per unit to 1,440 automated readings. Every minute of every day is now in the compliance record.

<200ms · Power recovery

ATS switches grid to generator before most compressors register a voltage drop. The 8–15 minute manual window is eliminated entirely.

0 hrs · Manual compliance work

The compliance officer's 4 hours per week on log management, data entry, and FSSAI report preparation is fully automated.

4.2 min · Avg alert response

From breach detection to technician acknowledgement. Previously unmeasurable — there was no alert system at all.

07 · What went wrong

Building on hardware means the real world pushes back

01 · Dirty and missing sensor data
Sensors occasionally sent malformed payloads: truncated JSON, null temperature values, or wrong timestamps. We had to build a validation layer that flagged bad readings, stored them separately with a quality marker, and excluded them from compliance calculations without dropping them entirely. They still needed to exist for audit purposes.
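A sketch of what that validation layer looks like in Python (field names match the sensor payload; the plausibility range and rejection labels are illustrative):

```python
import json
from typing import Tuple

REQUIRED = ("time", "unit_code", "temp_c")

def classify_reading(raw: str) -> Tuple[dict, str]:
    """Return (record, quality) where quality is 'ok' or a rejection reason.

    Bad readings are kept with a quality marker rather than dropped, so
    they still exist for audit, but only 'ok' rows feed compliance
    calculations. The -40..40 °C plausibility range is illustrative.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return {"raw": raw}, "malformed_json"   # truncated or garbled payload
    for field in REQUIRED:
        if record.get(field) is None:
            return record, f"missing_{field}"   # null or absent required field
    if not (-40.0 <= float(record["temp_c"]) <= 40.0):
        return record, "implausible_temp"       # sensor glitch, not real data
    return record, "ok"
```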
02 · Sensors going offline mid-operation
A few ESP32 units lost Wi-Fi connectivity intermittently during peak afternoon heat. A gap in readings looked exactly like a damaged or switched-off sensor. We had to distinguish "sensor is offline" from "sensor is reading something it shouldn't" — one is a maintenance issue, the other is an emergency. We added a heartbeat check separate from temperature readings.
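The heartbeat check reduces to comparing two timestamps against separate timeouts. A minimal sketch (timeout values and status labels illustrative):

```python
from datetime import datetime, timedelta

HEARTBEAT_TIMEOUT = timedelta(minutes=3)  # illustrative thresholds
READING_TIMEOUT = timedelta(minutes=3)

def node_status(now: datetime, last_heartbeat: datetime,
                last_reading: datetime) -> str:
    """Separate 'node is offline' from 'node is up but not reporting data'.

    The heartbeat is a tiny periodic ping independent of the sensor
    readings, which makes the two failure modes distinguishable:
    no heartbeat means the node is unreachable (a maintenance issue);
    heartbeat but no readings means a sensor fault on a live node.
    """
    if now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "offline"       # Wi-Fi drop or dead node: maintenance ticket
    if now - last_reading > READING_TIMEOUT:
        return "sensor_fault"  # node alive but sensor silent: escalate
    return "healthy"
```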
03 · Secure transmission was harder than expected
Getting data securely to AWS IoT Core required device certificates, mutual TLS, and a proper certificate rotation process. Every physical device needed its own certificate — and if one expired silently, the unit would stop reporting with no obvious error. We had to build certificate expiry monitoring into the system, which wasn't in the original spec.
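The expiry monitor itself is simple once the expiry dates are known. A sketch assuming a provisioning-time registry of device ID to certificate notAfter date; in production the date would be parsed from the X.509 certificate itself (e.g. with the `cryptography` package):

```python
from datetime import datetime, timedelta
from typing import Dict, List

def expiring_certs(registry: Dict[str, datetime], now: datetime,
                   warn_within: timedelta = timedelta(days=30)) -> List[str]:
    """Return device IDs whose certificate expires within the warning window.

    Already-expired certificates are included too, since those are the
    units that have silently stopped reporting. The registry shape and
    the 30-day default are assumptions, not the production config.
    """
    return sorted(dev for dev, not_after in registry.items()
                  if not_after - now <= warn_within)
```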
04 · Query performance at scale
Once we had a few months of data across 7 units, some dashboard queries started slowing noticeably — especially trend charts scanning large time ranges. TimescaleDB's chunking solved most of it, but we also had to add a pre-aggregated hourly rollup table so the 30-day chart wasn't recalculating from raw readings on every page load.
What I learned
01
The most important product decision happened before any requirement was written. We had to agree on what "failure" actually meant. The client came in wanting to monitor temperature. What they really needed was to close every gap in the continuous signal. That reframe changed the scope significantly — and brought power monitoring and ATS integration in from day one rather than as an afterthought.
02
The 10-minute alert rule is a good example of where product thinking and technical thinking had to be in the same room. A simple implementation fires an alert the moment temperature crosses a threshold. In practice, that creates constant noise every time someone opens a door during loading, and the technician stops paying attention within a week. Getting the rule right meant understanding what actually happens on the floor, not just what the data model looks like.
03
The local resilience architecture was the hardest sell. Running alert logic on the edge gateway added complexity and cost. But the alternative was a system that could fail to protect stock at the exact moment the internet went down during a power event — which is precisely when it was needed most. Some technical decisions are really trust decisions. If the people relying on a system don't believe it will work when things go wrong, they won't rely on it when things go right either.