AI on Microcontrollers: A Deep Dive into TinyML, ESP-Skainet, and the Embedded Intelligence Revolution

The convergence of artificial intelligence and microcontrollers represents one of the most exciting developments in embedded systems. TinyML enables devices to make smart decisions without needing to send data to the cloud, which is beneficial from both efficiency and privacy perspectives. This article explores the landscape of AI on microcontrollers, examining the tools, frameworks, and practical considerations for bringing intelligence to the edge.
What is TinyML?
TinyML refers to the application of machine learning techniques on extremely resource-constrained devices, such as microcontrollers and other small embedded systems with limited memory, processing power, and energy resources. Unlike traditional machine learning that runs on powerful servers or cloud infrastructure, TinyML brings inference capabilities directly to devices that might have as little as 256KB of RAM and a few hundred MHz of processing power.
The appeal is compelling: devices can act on large volumes of sensor data locally instead of spending time and energy transmitting it elsewhere, and because inference happens on-device, user privacy is protected, since no audio or other raw data ever needs to be sent to the cloud.
ESP-Skainet: Voice Intelligence for ESP32
What is ESP-Skainet?
ESP-Skainet is Espressif's voice-interaction framework for its ESP32 series chips, enabling convenient development of wake word detection and speech command recognition applications. It's Espressif's answer to bringing voice assistant capabilities to low-power microcontrollers.
Core Components
ESP-Skainet consists of several key engines:
WakeNet (Wake Word Detection): WakeNet is designed for low-power embedded MCUs with a low memory usage of approximately 20KB and achieves a 97% wake-up performance within a one-meter distance in a quiet environment, and 95% within a three-meter distance. Espressif provides wake words such as "Hi, Lexin" and "Hi, ESP" for free, and also supports custom wake words.
MultiNet (Speech Command Recognition): MultiNet is a lightweight model that allows the ESP32 to perform offline speech recognition of multiple commands. It uses a Convolutional Recurrent Neural Network (CRNN) with Connectionist Temporal Classification (CTC), taking an audio clip's Mel-Frequency Cepstral Coefficients (MFCC) as input and outputting phonemes. MultiNet currently supports up to 200 Chinese or English speech commands, such as "Turn on the air conditioner" and "Turn on the bedroom light", and users can add their own commands without retraining the model.
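The CTC stage mentioned above can be illustrated with a toy greedy decoder: pick the most likely symbol per audio frame, collapse consecutive repeats, then drop the CTC "blank" symbol. This is a hedged, simplified sketch, not Espressif's implementation; the frame labels and blank marker below are invented for illustration.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse consecutive repeats, then remove CTC blanks.

    frame_labels: the best label per audio frame, e.g. from an
    argmax over the network's per-frame phoneme probabilities.
    """
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:  # collapse consecutive repeats
            collapsed.append(label)
        prev = label
    return [sym for sym in collapsed if sym != blank]  # drop blanks

# Hypothetical per-frame outputs for a short command fragment:
frames = ["_", "t", "t", "_", "er", "er", "_", "n", "n", "_"]
print(ctc_greedy_decode(frames))  # ['t', 'er', 'n']
```

Real decoders score full label sequences against the command list rather than taking a single greedy path, but the collapse-and-drop-blank step is the essence of CTC decoding.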
Audio Front-End (AFE): The Audio Front-End integrates AEC (Acoustic Echo Cancellation), VAD (Voice Activity Detection), BSS (Blind Source Separation), and NS (Noise Suppression), with Espressif's two-mic AFE qualified as a "Software Audio Front-End Solution" for Amazon Alexa Built-in devices.
What You Can Do with ESP-Skainet
ESP-Skainet is ideal for AIoT and smart home applications, enabling local voice control of devices. Typical applications include smart-home devices (voice-controlled switches, outlets, lamps, thermostats, and security systems), smart-office equipment (voice-controlled displays and phones), and interactive products such as educational toys or assistants for the elderly.
Hardware Requirements
To run ESP-Skainet, you need an ESP32 or ESP32-S3 development board with an integrated audio input module. Popular boards include:
ESP32-Korvo
ESP32-S3-Korvo-1 and Korvo-2
ESP-BOX
ESP32-S3-EYE
The ESP32-Korvo includes an ESP32-WROVER-B module with 16 MB of SPI flash (comfortably above the 4 MB minimum for MultiNet support) and 64 Mbit (8 MB) of pseudo-static RAM (PSRAM).
Limitations
Language support is primarily Chinese and English
Requires specific hardware with adequate RAM (ESP32 or ESP32-S3)
Requires ESP-IDF v4.4 or ESP-IDF v5.0
The wake word detection must be active before command recognition begins
TensorFlow Lite for Microcontrollers: The Foundation of TinyML
Overview
TensorFlow Lite for Microcontrollers is designed for the specific constraints of microcontroller development. It allows machine learning models to be deployed on tiny microcontrollers, bringing intelligence to billions of devices in our lives, including household appliances and Internet of Things devices, without relying on expensive hardware or a reliable internet connection.
How It Works
Because machine learning is computationally expensive, TensorFlow Lite for Microcontrollers requires a 32-bit processor, such as an ARM Cortex-M or ESP32, and the library is mostly written in C++, requiring a C++ compiler.
The workflow follows these steps:
Train the model on a computer or server
Convert to FlatBuffer format (.tflite file)
Convert to C array for embedding in firmware
Deploy using the TensorFlow Lite for Microcontrollers library
Run inference on the microcontroller
On the microcontroller, the TensorFlow Lite for Microcontrollers library uses the model to perform inference, for example classifying a previously unseen photo to determine whether it contains a cat.
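Step 3 in the workflow above (embedding the .tflite FlatBuffer in firmware) is commonly done with `xxd -i model.tflite`. A minimal Python equivalent is sketched below; the `g_model` array name and the file path are placeholders, not a fixed convention.

```python
def tflite_to_c_array(data: bytes, name: str = "g_model") -> str:
    """Render raw model bytes as a C source snippet, xxd -i style."""
    lines = []
    for i in range(0, len(data), 12):  # 12 bytes per source line
        chunk = data[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"alignas(16) const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )

# Usage (path is illustrative):
# with open("model.tflite", "rb") as f:
#     print(tflite_to_c_array(f.read()))
print(tflite_to_c_array(b"\x1c\x00\x00\x00TFL3"))
```

The `alignas(16)` qualifier reflects common practice for TFLM model arrays, since the interpreter expects the FlatBuffer to be suitably aligned in flash.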
What You Can Build
Example applications include a Hello World demonstration of the absolute basics, and person detection that captures camera data with an image sensor to detect the presence or absence of a person. Broader TinyML applications include visual and audio wake words that trigger an action when a person is detected in an image or a keyword is spoken; predictive maintenance on industrial machines, using sensors to continuously monitor for anomalous behavior; and gesture and activity detection for medical, consumer, and agricultural devices, such as gait analysis, fall detection, or animal health monitoring.
Edge Impulse: The TinyML Development Platform
What is Edge Impulse?
Edge Impulse is a cloud-based machine learning operations (MLOps) platform that simplifies building, training, and deploying machine learning models on embedded systems and edge devices such as microcontrollers, sensors, and single-board computers like the Raspberry Pi or Arduino. It addresses the challenges of fragmented software stacks and heterogeneous deployment hardware by streamlining the TinyML design cycle with various software and hardware optimizations, targeting a wide range of hardware.
Key Features
End-to-End Workflow: Edge Impulse makes it easy to collect a dataset, choose the right machine learning algorithm, train a production-grade model, and run tests to prove that it works, with the whole process quick enough to run through in a few minutes.
Data Collection: Edge Impulse can easily collect data from any sensor and development board using the Data forwarder, a small application that reads data over serial and sends it to Edge Impulse.
Model Optimization: The platform provides estimates of how the model will perform on the target device, including memory usage (RAM and flash) and latency, helping ensure the model fits within hardware constraints.
Wide Hardware Support: Edge Impulse launched with the Arduino Nano 33 BLE Sense, but models can be exported as an Arduino library to run on any Arm-based Arduino platform, including the Arduino MKR family or Arduino Nano 33 IoT, provided the board has enough RAM.
Typical Workflow
Data acquisition - Connect your device and collect labeled sensor data
Impulse design - Configure processing blocks (like MFCC for audio) and learning blocks (neural network)
Feature generation - Extract features from raw data
Training - Train the model with configurable parameters
Testing - Validate accuracy and performance
Deployment - Export as optimized library for your target hardware
A model trained using Edge Impulse can be around 18 KB in size, remarkably small for something so capable, leaving plenty of room for application code.
Other Major Tools and Frameworks
STM32Cube.AI / X-CUBE-AI
X-CUBE-AI is a package that extends the capabilities of STM32CubeMX, adding the ability to convert a pre-trained neural network into an ANSI C library that is performance-optimized for STM32 microcontrollers based on ARM Cortex-M4 and Cortex-M7 processor cores. The tool offers developers a graphical user interface, support for deep learning frameworks such as Keras and TensorFlow Lite, 8-bit quantization, and compatibility with different STM32 microcontroller series.
ARM CMSIS-NN
The Common Microcontroller Software Interface Standard (CMSIS) has a component for deploying neural networks (CMSIS-NN) that was developed hand in hand with TensorFlow Lite engineers, meaning the operations supported by ARM microcontrollers match those TensorFlow Lite supports. CMSIS-NN kernels are used at a low level by tools like STM32Cube.AI.
Common Practices and Optimization Techniques
Model Optimization
The key to running ML on microcontrollers is aggressive optimization:
Quantization: Converting 32-bit floating-point models to 8-bit integer representations, dramatically reducing memory footprint and computation requirements with minimal accuracy loss.
Pruning: Removing unnecessary connections in neural networks to reduce model size.
Knowledge Distillation: Training smaller "student" models to mimic larger "teacher" models.
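As a hedged numerical illustration of quantization, here is the standard affine int8 mapping, real_value ≈ scale × (int8_value − zero_point). Real converters such as TensorFlow Lite's derive these parameters per tensor or per channel from observed activation ranges; the range and values below are made up for the example.

```python
def quantize_params(rmin: float, rmax: float):
    """Derive scale and zero-point mapping the range [rmin, rmax] onto int8."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0.0
    scale = (rmax - rmin) / 255.0                # 255 = number of int8 steps
    zero_point = round(-128 - rmin / scale)      # int8 code representing 0.0
    return scale, zero_point

def quantize(x, scale, zp):
    """float -> int8, clamped to the representable range."""
    return max(-128, min(127, round(x / scale + zp)))

def dequantize(q, scale, zp):
    """int8 -> approximate float."""
    return scale * (q - zp)

# Example: activations known to lie in [0.0, 6.0] (a ReLU6-like range).
scale, zp = quantize_params(0.0, 6.0)
q = quantize(1.5, scale, zp)
print(q)  # -64
print(abs(dequantize(q, scale, zp) - 1.5) < scale)  # True: error below one step
```

Each value is stored in one byte instead of four, and the round trip loses at most about one quantization step of precision, which is why accuracy usually degrades only slightly.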
Hardware Selection
A 32-bit processor such as an ARM Cortex-M or ESP32 is required for TensorFlow Lite for Microcontrollers. Popular platforms include:
ARM Cortex-M4/M7: STM32 series, nRF52 series
ESP32/ESP32-S3: For Wi-Fi/Bluetooth connectivity
Arduino Nano 33 BLE Sense: Features an Arm Cortex-M4 microcontroller running at 64 MHz with 1 MB of flash memory and 256 KB of RAM, plus onboard sensors including a 9-axis IMU
Development Workflow
STM32Cube.AI is widely used because the X-CUBE-AI expansion package provides an end-to-end solution for automatic neural network model conversion, validation, and on-target performance measurement, making 32-bit ARM Cortex-M STMicroelectronics microcontrollers one of the most common TinyML platforms.
What You Can and Can't Do
What Works Well
Audio Classification: Models can recognize household sounds like running water from a faucet, and can be trained in just a few minutes with only a small amount of audio data
Simple Image Recognition: Person detection, basic object recognition
Gesture Detection: Using accelerometer/IMU data
Keyword Spotting: Wake word detection and simple voice commands
Anomaly Detection: Identifying unusual patterns in sensor data
Limitations
Memory Constraints: TinyML systems often have very flat memory hierarchies, due to small or non-existent caches and often no off-chip memory
Model Complexity: Complex vision models like modern CNNs, large language models, and multi-modal systems typically won't fit
Computation Speed: Real-time video processing remains challenging
Training: All training must happen off-device; microcontrollers can only run inference
Accuracy Trade-offs: Smaller models generally mean lower accuracy compared to cloud-based alternatives
How TinyML Works: A Technical Overview
The Inference Engine
The main application performs real-time predictions by: loading the model from a header file; registering the operations the model needs (such as fully connected and ReLU layers) with an op resolver; allocating a tensor arena, a block of memory that holds intermediate computations and tensor data during inference; and initializing the interpreter with the model, the op resolver, the tensor arena, and its size.
Memory Management
Models are typically stored in flash memory as constant C arrays. During inference, a "tensor arena" in RAM holds intermediate calculations. The challenge is balancing model complexity with available resources.
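The tensor arena can be pictured as a simple bump allocator over one fixed buffer: every tensor gets an aligned slice of a single preallocated region, and nothing is malloc'd during inference. The Python model below is purely conceptual (TFLM's real memory planner is C++ and additionally reuses space between tensors whose lifetimes don't overlap); the sizes are illustrative.

```python
class TensorArena:
    """Conceptual model of a fixed-size inference arena (bump allocator)."""

    def __init__(self, size: int, alignment: int = 16):
        self.size = size            # total bytes reserved at startup
        self.alignment = alignment  # tensors are aligned for fast access
        self.offset = 0             # next free byte in the arena

    def allocate(self, nbytes: int) -> int:
        # Round the current offset up to the next alignment boundary.
        aligned = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        if aligned + nbytes > self.size:
            raise MemoryError("tensor arena too small; increase its size")
        self.offset = aligned + nbytes
        return aligned  # this tensor's offset within the arena

arena = TensorArena(size=10 * 1024)   # e.g. a 10 KB arena
input_ofs = arena.allocate(490)       # int8 input tensor, 490 bytes
scratch_ofs = arena.allocate(2048)    # intermediate activation buffer
print(input_ofs, scratch_ofs, arena.offset)  # 0 496 2544
```

When the arena is too small, allocation fails up front rather than mid-inference, which is why TFLM examples tell you to tune the arena size until the interpreter's tensor allocation succeeds.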
Hardware Acceleration
Many modern microcontrollers include DSP instructions or dedicated ML accelerators. ARM Cortex-M processors have advanced development of hardware architectures and DSP capabilities while maintaining low cost and power consumption.
Getting Started: A Practical Guide
Prerequisites
Basic programming skills (C/C++ and Python)
Understanding of machine learning fundamentals
Familiarity with embedded systems (helpful but not required)
Recommended Hardware
For Beginners:
- ESP32 DevKit ($10-20) - great value, Wi-Fi/Bluetooth
For Voice Projects:
ESP32-S3-BOX or ESP32-Korvo boards
Boards with built-in microphones
Software Setup
Choose Your Path:
Beginner-friendly: Edge Impulse (web-based, no local setup)
ESP32 voice: Clone ESP-Skainet and install ESP-IDF v4.4 or v5.0
General TinyML: Install TensorFlow, TensorFlow Lite, and your preferred IDE
First Project: Follow a tutorial to train a model that can recognize household sounds like running water from a faucet, using Edge Impulse to collect audio data, train a simple model, and export it as a C++ library
Learning Resources
Books:
- TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers by Pete Warden and Daniel Situnayake, the standard textbook on embedded machine learning
Courses:
- Harvard's Deploying TinyML course provides hands-on experience with deploying TinyML to physical devices, teaching programming in TensorFlow Lite for microcontrollers and featuring projects based on a TinyML Program Kit that includes an Arduino board with onboard sensors and an ARM Cortex-M4 microcontroller
Online Platforms:
Edge Impulse documentation and tutorials
TensorFlow Lite for Microcontrollers examples
ESP-Skainet GitHub repository
Sample Projects to Start With
Audio wake word detection - Using ESP-Skainet or Edge Impulse
Gesture recognition - Using accelerometer data
Simple image classification - Person detection with a camera module
Anomaly detection - Monitor sensor patterns for unusual behavior
Conclusion
AI on microcontrollers represents a paradigm shift in how we think about embedded intelligence. Machine learning at the very edge enables valuable use of the 99% of sensor data that is discarded today due to cost, bandwidth or power constraints, with applications spanning health, white goods, mobility, industry, retail and agriculture.
While the technology is still maturing, the tools available today—from ESP-Skainet's voice capabilities to TensorFlow Lite's flexibility and Edge Impulse's user-friendly platform—make it possible for developers to add AI capabilities to their embedded projects without needing deep machine learning expertise.
The key is understanding the constraints: work within memory limits, optimize aggressively, and choose problems suited to edge inference. Published TinyML results are promising in terms of accuracy, execution time, power consumption, and memory footprint, though edge ML is still a young research area facing many open challenges.
Start small, experiment with existing tools and examples, and gradually build your understanding. The future of embedded intelligence is being written right now, and with the democratization of TinyML tools, anyone can contribute to it.