Beyond the Cloud: Building a Local Wake Word Detection System with ESPHome & Home Assistant

Introduction: Reclaiming Your Voice Control Privacy and Performance

In the evolving landscape of smart homes, voice control has become an indispensable convenience. However, the reliance on cloud-based assistants like Amazon Alexa or Google Assistant often comes with trade-offs: privacy concerns due to constant microphone listening, potential latency issues, and a dependency on internet connectivity. For many tech enthusiasts and privacy-conscious homeowners, these compromises are unacceptable.

This guide dives deep into building a truly local, privacy-centric wake word detection system using ESPHome and integrating it seamlessly with Home Assistant. By moving wake word processing to an edge device, we eliminate cloud eavesdropping, drastically reduce latency, and ensure your smart home remains responsive even during internet outages. You'll learn how to craft a robust, customizable voice interface that empowers you with full control over your data and devices.

Step-by-Step Setup: Building Your Local Voice Assistant Hardware

1. Hardware Selection and Assembly

The foundation of our local voice assistant is an ESP32-S3 based development board, chosen because its dual-core CPU and vector instructions are fast enough to run a small wake word model on the device itself. A module with PSRAM is strongly preferable, since the model and audio buffers need the extra memory. You'll also need a suitable microphone.

Recommended Components:

  • ESP32-S3 Development Board: Look for boards with integrated USB-C for easy flashing (e.g., Adafruit ESP32-S3 Feather or generic ESP32-S3-DevKitC-1).
  • I2S Microphone: An INMP441 MEMS microphone breakout board is highly recommended for its digital output, noise resistance, and ease of integration.
  • Optional Speaker: For audio feedback or text-to-speech, a small passive speaker driven by an I2S DAC/amplifier (e.g., MAX98357A I2S Class-D Amplifier).

Wiring Diagram (INMP441 to ESP32-S3):

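Assuming a typical INMP441 board with pins VCC, GND, SD (data), SCK (bit clock), WS (word select) and L/R (channel select). The GPIO numbers below are examples; use any free GPIOs that your board actually exposes. On many ESP32-S3 modules, GPIO26 to GPIO37 are wired to the flash/PSRAM and should be avoided.

INMP441        --   ESP32-S3
--------------------------------
VCC            --   3.3V
GND            --   GND
L/R            --   GND (selects the left channel)
SD (Data)      --   GPIO6 (or any available I2S data-in pin)
SCK (Clock)    --   GPIO4 (or any available I2S BCLK pin)
WS (L/R Clock) --   GPIO5 (or any available I2S WS pin)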

(Screenshot Placeholder: Image of wired ESP32-S3 and INMP441 on a breadboard)

2. ESPHome Firmware Creation for Wake Word Detection

Now, let's configure ESPHome to handle audio input, process it for wake word detection, and communicate with Home Assistant.

Initial ESPHome Configuration:

Create a new ESPHome device YAML file. Here's a basic structure:

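esphome:
  name: local_voice_assistant

# The chip and board are declared in their own esp32 block
esp32:
  board: esp32-s3-devkitc-1 # Adjust to your specific board
  framework:
    type: esp-idf # Required by the on-device wake word component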

# Enable Home Assistant API for seamless integration
api:

# Enable OTA for over-the-air updates
ota:
  - platform: esphome

# Enable Wi-Fi
wifi:
  ssid: "YOUR_WIFI_SSID"
  password: "YOUR_WIFI_PASSWORD"
  manual_ip:
    static_ip: 192.168.1.XXX # Assign a static IP
    gateway: 192.168.1.1
    subnet: 255.255.255.0

# Configure Logger
logger:
  level: DEBUG # Helpful for debugging audio issues

# Configure the I2S bus (clock pins shared by audio components)
i2s_audio:
  - id: i2s_in
    i2s_bclk_pin: GPIO4  # INMP441 SCK
    i2s_lrclk_pin: GPIO5 # INMP441 WS

# Configure the microphone (the INMP441 is a standard I2S mic, not PDM)
microphone:
  - platform: i2s_audio
    id: mic
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO6   # INMP441 SD
    channel: left        # Left when the mic's L/R pin is tied to GND
    pdm: false           # Set to true only for PDM microphones
    sample_rate: 16000
    bits_per_sample: 32bit

# Configure on-device wake word detection (microWakeWord)
micro_wake_word:
  models:
    - model: okay_nabu # Built-in model. See ESPHome docs for other models.
      # probability_cutoff: 0.97 # Optional: adjust sensitivity (0.0 to 1.0)
  on_wake_word_detected:
    - homeassistant.event: # Trigger a Home Assistant event
        event: esphome.wake_word_detected # Event names must start with "esphome."
        data:
          device: local_voice_assistant
          wake_word: okay_nabu
    - voice_assistant.start: # Start the Home Assistant Assist pipeline
        wake_word: !lambda return wake_word;

# Home Assistant Assist integration (streams audio to the Assist pipeline)
voice_assistant:
  microphone: mic

# Optional: Speaker (for spoken responses or audio feedback)
# Add a second bus to the existing i2s_audio list for the amplifier:
# i2s_audio:
#   - id: i2s_out
#     i2s_bclk_pin: GPIOXX  # Amplifier BCLK
#     i2s_lrclk_pin: GPIOYY # Amplifier LRC/WS
# speaker:
#   - platform: i2s_audio
#     id: media_speaker
#     i2s_audio_id: i2s_out
#     dac_type: external
#     i2s_dout_pin: GPIOZZ  # Amplifier DIN
#
# Text-to-speech audio is generated by the Assist pipeline in Home Assistant.
# To play it on the device, reference the speaker from voice_assistant:
# voice_assistant:
#   microphone: mic
#   speaker: media_speaker

Explanation:

  • i2s_audio and microphone: i2s_audio sets up the I2S bus (bit clock and word select pins), and microphone attaches the mic's data pin to that bus. Use pdm: false for a standard I2S mic such as the INMP441 and pdm: true for PDM mics.
  • micro_wake_word: This is the core component. It runs small pre-trained TensorFlow Lite Micro models (microWakeWord) directly on the ESP32-S3. ESPHome downloads the selected model and embeds it in the firmware at build time, and probability_cutoff controls sensitivity. openWakeWord is a separate engine that runs on the Home Assistant server on audio streamed from the device, so it is not what performs on-device detection.
  • on_wake_word_detected: Defines actions when the wake word is detected. Here, we fire a custom Home Assistant event and then call voice_assistant.start, which streams the command that follows the wake word to the Assist pipeline.
  • voice_assistant: This component bridges ESPHome's audio input/output directly to Home Assistant's Assist pipeline, enabling a full conversational experience.

Flash this firmware to your ESP32-S3 using the ESPHome dashboard or the command-line tool. Once flashed, the device should appear in Home Assistant's Integrations dashboard.

3. Home Assistant Integration and Automation

After your ESPHome device is online, Home Assistant will auto-discover it. Add the integration.

Configuring Home Assistant Assist Pipeline:

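Wake word detection already happens on the device, so the pipeline only needs to handle what comes after it. Navigate to Settings -> Voice assistants and add an assistant or edit an existing one.

  1. Pipeline: Choose a conversation agent (Home Assistant's built-in agent works offline), a speech-to-text engine, and a text-to-speech engine. To keep everything local, use locally hosted engines such as the Whisper and Piper add-ons.
  2. Device: Open the ESPHome device's page under Settings -> Devices & Services and select this pipeline as its assistant. If a speaker is referenced in the device's voice assistant configuration, spoken responses play on it automatically.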

(Screenshot Placeholder: Home Assistant Assist pipeline configuration screen)

Basic Automation (if not using Assist pipeline fully):

If you only want to trigger specific automations directly from a wake word event without a full conversation, you can use the custom event:

# automations.yaml
- alias: "Wake Word Detected - Turn on Living Room Lights"
  trigger:
    - platform: event
      # Must match the event fired by the device. ESPHome requires
      # custom event names to start with "esphome."
      event_type: esphome.wake_word_detected
      event_data:
        device: local_voice_assistant
  action:
    - service: light.turn_on
      target:
        entity_id: light.living_room_lights
  mode: single

Troubleshooting Your Local Voice Assistant

1. No Wake Word Detection / Poor Accuracy

  • Microphone Wiring: Double-check all connections between the mic and ESP32. Even subtle errors can prevent audio input.
  • ESPHome Logs: Connect to your ESPHome device via serial or the ESPHome dashboard logs. Look for errors from the i2s_audio, microphone, or wake word components, such as the microphone failing to initialize.
  • Microphone Gain: If the device isn't picking up sounds, raise the input level gradually. ESPHome's voice_assistant component exposes volume_multiplier and auto_gain for the audio streamed to Home Assistant; check the documentation for your release for gain options on the microphone itself. Too much gain introduces distortion.
  • Wake Word Threshold: The detection cutoff of the wake word component (probability_cutoff in ESPHome's micro_wake_word) dictates sensitivity. A lower value makes it more sensitive but increases false positives. A higher value makes it less sensitive and may miss quiet speech. Experiment to find the sweet spot for your environment.
  • Environmental Noise: Loud background noise significantly impacts detection. Try testing in a quiet environment first.
  • Model Choice: Ensure you're using an appropriate wake word model. Some are more robust than others.
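
The tuning knobs above can be sketched in ESPHome YAML. This is a minimal sketch, assuming ESPHome's documented micro_wake_word and voice_assistant components and a microphone whose id (mic) is a placeholder. The numeric values are starting points to experiment with, not recommendations, and option availability depends on your ESPHome release.

```yaml
micro_wake_word:
  models:
    - model: okay_nabu
      # Higher cutoff: fewer false triggers, more missed detections.
      # Lower cutoff: more sensitive, more false triggers.
      probability_cutoff: 0.97

voice_assistant:
  microphone: mic              # Placeholder id of your microphone
  noise_suppression_level: 2   # 0 (off) to 4 (strongest)
  auto_gain: 31dBFS            # 0dBFS (off) to 31dBFS
  volume_multiplier: 2.0       # Software gain on audio sent to Home Assistant
```

Change one value at a time and retest in the same room, otherwise it is hard to tell which setting made the difference.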

2. Home Assistant Not Responding After Wake Word

  • ESPHome-Home Assistant Connection: Verify that your ESPHome device is connected to Home Assistant. Check Settings -> Devices & Services -> ESPHome.
  • Assist Pipeline Configuration: Go to Settings -> Voice assistants and ensure the pipeline has a conversation agent, a speech-to-text engine, and a text-to-speech engine selected. Then confirm on the ESPHome device's page that this pipeline is chosen as its assistant.
  • Network Issues: Ensure your ESPHome device has a stable Wi-Fi connection and can reach your Home Assistant instance.
  • Home Assistant Logs: Check Home Assistant's logs for any errors related to the voice assistant or ESPHome integration.
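
To make those Home Assistant logs more useful, debug logging can be raised for the relevant integrations. This is a sketch for configuration.yaml, assuming the standard logger integration and the assist_pipeline debug recording option; the recording directory is only an example path.

```yaml
# configuration.yaml (Home Assistant)
logger:
  default: warning
  logs:
    homeassistant.components.esphome: debug
    homeassistant.components.assist_pipeline: debug

# Optional: save the audio each pipeline run received, for listening back
assist_pipeline:
  debug_recording_dir: /share/assist_pipeline
```

Remove the debug settings once the problem is found, since recordings contain everything the microphone streamed after the wake word.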

Advanced Configuration and Optimization

1. Custom Wake Words (Advanced)

While ESPHome offers several pre-trained microWakeWord models, you can train your own custom wake word for a truly personalized experience. This is a more advanced topic involving data collection and model training. Models built for other engines, such as openWakeWord or Picovoice's Porcupine, use different formats and cannot be dropped in directly. For microWakeWord specifically, refer to its training documentation. Once you have a trained model and its JSON manifest, point the model option in your ESPHome YAML at the manifest.

2. Multiple Wake Words for Contextual Control

You can configure multiple wake words on a single ESPHome device to trigger different actions or pipelines:

micro_wake_word:
  models:
    - model: okay_nabu
    - model: hey_jarvis
  on_wake_word_detected:
    - if:
        condition:
          # wake_word holds the detected phrase. The exact string comes from
          # the model manifest, so confirm it in the device logs.
          lambda: return wake_word == "Hey Jarvis";
        then:
          - homeassistant.event:
              event: esphome.alternate_wake_word
              data:
                device: local_voice_assistant
                wake_word: hey_jarvis
          # ESPHome script defined elsewhere in this file
          - script.execute: my_computer_control_script
        else:
          - voice_assistant.start:
              wake_word: !lambda return wake_word;

This allows specific phrases to either engage the main Assist pipeline or trigger direct, non-conversational automations.

3. Microphone Arrays and Beamforming

For larger rooms or noisy environments, a single microphone might not suffice. Consider using a microphone array (e.g., boards with 2 or 4 PDM mics) to enable beamforming and noise reduction. While ESPHome's direct support for multi-mic beamforming is evolving, some advanced users integrate external audio processing chips or custom firmware to preprocess audio before feeding it to the wake word engine. This significantly improves accuracy and range.

4. Fine-Tuning Assist Pipeline Integration

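Beyond basic commands, Home Assistant's Assist pipeline allows for complex intents. Ensure your intents are well-defined. You can also use the on_start, on_end, and on_error actions of the voice_assistant component in ESPHome to provide audio feedback. The device must be allowed to perform Home Assistant actions in the ESPHome integration's options for the calls below to work:

voice_assistant:
  microphone: mic
  on_start:
    # Play a chime sound on a Home Assistant media player
    - homeassistant.action: # Named homeassistant.service in older ESPHome releases
        action: media_player.play_media
        data:
          entity_id: media_player.local_speaker # If you have one
          media_content_id: "/local/chime_start.mp3"
          media_content_type: music
  on_end:
    - homeassistant.action:
        action: media_player.play_media
        data:
          entity_id: media_player.local_speaker
          media_content_id: "/local/chime_end.mp3"
          media_content_type: music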

Real-World Example: Offline Media & Lighting Control with 'Hey Computer'

Imagine walking into your living room and simply saying, "Hey Computer, lights on and play some jazz." This setup allows for precisely that, even if your internet is down.

ESPHome Configuration Snippet (living_room_voice.yaml):

# ... (common setup as above) ...

i2s_audio:
  - id: i2s_mic_bus
    i2s_lrclk_pin: GPIO17 # PDM clock
  - id: local_speaker_output
    i2s_bclk_pin: GPIOXX  # Speaker specific pins
    i2s_lrclk_pin: GPIOYY

microphone:
  - platform: i2s_audio
    id: living_room_mic
    i2s_audio_id: i2s_mic_bus
    adc_type: external
    pdm: true
    i2s_din_pin: GPIO18   # PDM data
    sample_rate: 16000
    bits_per_sample: 16bit

speaker:
  - platform: i2s_audio
    id: local_speaker
    i2s_audio_id: local_speaker_output
    dac_type: external
    i2s_dout_pin: GPIOZZ  # Speaker specific pin

micro_wake_word:
  models:
    # Custom 'hey computer' model manifest (file name is a placeholder)
    - model: hey_computer.json
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

# Assist responses (including text-to-speech) play on the speaker
voice_assistant:
  microphone: living_room_mic
  speaker: local_speaker
  volume_multiplier: 4.0 # Tuned for living room

Home Assistant Assist Pipeline Setup:

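Create or edit an Assist pipeline named "Living Room Voice":

  • Conversation agent: Home Assistant
  • Speech-to-text: a locally hosted engine (for example, the Whisper add-on)
  • Text-to-speech: a locally hosted engine (for example, the Piper add-on)
  • Wake word: leave unset, since detection runs on the living_room_voice device itself

Then open the living_room_voice device page and select "Living Room Voice" as its assistant. Spoken responses play on local_speaker, your ESPHome device's speaker.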

(Screenshot Placeholder: Example of Home Assistant Assist Pipeline configuration showing ESPHome device as input/output)

Home Assistant Intents & Automations:

Define an intent for media and lighting. In your configuration.yaml, the sentences go under conversation and the actions under intent_script:

# configuration.yaml
conversation:
  intents:
    PlayMusicAndLights:
      - "turn on the lights and play [some] jazz"
      - "lights on and play [some] jazz"

intent_script:
  PlayMusicAndLights:
    action:
      - service: light.turn_on
        target:
          entity_id: light.living_room_main
      - service: media_player.play_media
        target:
          entity_id: media_player.living_room_stereo # Replace with your media player
        data:
          # A Spotify playlist needs internet; use a local media source to stay offline
          media_content_id: "spotify:playlist:your_jazz_playlist_id" # Or radio URL
          media_content_type: music
    # Spoken back through the pipeline's text-to-speech engine
    speech:
      text: "Playing some jazz and turning on the lights!"

Now, when you say "Hey Computer, turn on the lights and play some jazz," your ESPHome device detects the wake word and hands off to Home Assistant's Assist pipeline, which processes the intent, turns on the lights, starts your music, and provides a verbal confirmation. With locally hosted speech-to-text and text-to-speech engines and a local media source, all of this happens without the internet.
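
To let part of the sentence vary, such as the genre, Home Assistant's custom sentences can define a slot list. The sketch below assumes a custom sentences file under config/custom_sentences/en/ and reuses the PlayMusicAndLights intent name; the file name and the genre values are illustrative only.

```yaml
# config/custom_sentences/en/music_lights.yaml
language: "en"
intents:
  PlayMusicAndLights:
    data:
      - sentences:
          - "turn on the lights and play [some] {genre}"
lists:
  genre:
    values:
      - "jazz"
      - "classical"
```

The matched slot value is then available as a template variable in the intent script:

```yaml
# configuration.yaml
intent_script:
  PlayMusicAndLights:
    action:
      - service: light.turn_on
        target:
          entity_id: light.living_room_main
    speech:
      text: "Turning on the lights and playing {{ genre }}."
```

If you define the sentences in a custom sentences file, remove the same intent from the conversation section of configuration.yaml so it is not declared twice.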

Best Practices and Wrap-up

1. Prioritize Privacy and Local Control

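The primary benefit of this setup is privacy. With on-device wake word detection and locally hosted speech-to-text and text-to-speech engines, your voice data never leaves your local network. Review the engines selected in your Assist pipeline after making changes, since choosing a cloud speech-to-text or text-to-speech service sends audio or text off your network.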

2. Optimize Performance and Responsiveness

  • Static IP: Assign a static IP address to your ESPHome device to prevent delays during IP assignment and ensure stable communication.
  • Microphone Placement: Position your microphone optimally, away from direct sources of noise (fans, vents) and ideally within a few feet of typical speaking locations.
  • ESPHome Updates: Keep your ESPHome firmware updated to benefit from performance improvements, bug fixes, and new features related to audio processing and wake word detection.
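
On the network side, a few Wi-Fi options tend to help responsiveness. This is a sketch, assuming ESPHome's standard wifi component; the secret names are placeholders that you would define in secrets.yaml, and the static IP from the earlier configuration still goes under manual_ip.

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: none # Keep the radio awake to avoid added audio latency
  fast_connect: true    # Skip the full scan and connect to the configured network directly
```

Disabling power saving raises power draw slightly, which rarely matters for a USB-powered device.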

3. Scalability and Redundancy

You can deploy multiple ESPHome voice assistant devices throughout your home, each with its own wake word model or all linked to the same Home Assistant Assist pipeline. When several devices hear the same wake word at nearly the same time, Home Assistant typically lets only the first one to report it respond, which is not necessarily the closest one. Spacing devices out, or using a different wake word per room, keeps responses predictable.
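
One way to keep several devices consistent is to share a base configuration using ESPHome's substitutions and packages features. The sketch below is an assumption about how you might lay out the files; the file names, device names, and substitution keys are hypothetical.

```yaml
# kitchen-voice.yaml (one small file per room)
substitutions:
  device_name: kitchen-voice
  friendly_name: Kitchen Voice

packages:
  # Shared microphone, wake word, and voice assistant settings
  voice_base: !include common/voice_base.yaml
```

```yaml
# common/voice_base.yaml (shared by every room)
esphome:
  name: ${device_name}
  friendly_name: ${friendly_name}
```

Pins or models that differ per room can be turned into further substitutions and overridden in each room's file.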

4. Security Considerations

  • Secure Wi-Fi: Ensure your Wi-Fi network is secured with WPA2/WPA3.
  • Home Assistant Access: Restrict access to your Home Assistant instance with strong passwords and, if exposed externally, use a VPN or Cloudflare Tunnel rather than port forwarding.
  • ESPHome Access: Limit access to your ESPHome dashboard to trusted devices.
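
On the device itself, these points translate into encrypting the API connection, protecting OTA updates, and keeping credentials out of the main file. This is a sketch, assuming ESPHome's standard api, ota, and wifi components; the secret names are placeholders defined in secrets.yaml.

```yaml
api:
  encryption:
    key: !secret api_encryption_key # 32-byte base64 key

ota:
  - platform: esphome
    password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
```

Home Assistant asks for the encryption key once when the device is added, so keep a copy of secrets.yaml somewhere safe.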

By following these guidelines, you'll not only have a robust, private, and responsive voice control system but also a deeper understanding of how to leverage ESPHome and Home Assistant for advanced smart home automations. Enjoy the power of local voice control, tailored precisely to your needs!

Written by:

NGC 224

Author bio: DIY Smart Home Creator
