Audio intelligence primitives for decoded media streams.
CMake target: icey::speech
The speech module is the audio-side intelligence branch of Icey. It is transport-agnostic on purpose: feed it decoded PlanarAudioPacket instances from any capture, file, WebRTC, or HTTP pipeline, and it emits structured events as plain json::Value.
The first landing is intentionally narrow:
- SpeechQueue is the explicit async clone boundary for decoded audio packets.
- VoiceActivityDetector performs lightweight level-based speech detection without introducing a model runtime.
- SpeechEvent and AudioRef serialize directly to JSON for existing Symple, WebSocket, or HTTP event planes.

This module is audio-only. Video intelligence belongs in the sibling vision module rather than a combined ai surface.
speech depends on base, av, and json.
av must be available so that the FFmpeg-backed packet types are present. The module builds only when icey::av is available.
The module is built around one explicit rule: keep the hot path borrowed and zero-copy until you actually cross an async boundary.
decoded PlanarAudioPacket
        │
        ├── synchronous branch (borrowed packet)
        │
        └── SpeechQueue ── clone boundary ──► worker / detector
                                              │
                                              └── SpeechEvent -> json::Value

That makes it safe to branch off a live send or record pipeline without forcing the transport path to pay for queue retention unless you ask for it.
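The borrowed-versus-cloned split can be sketched with standard-library types only. This is a minimal illustration, not the icey implementation: DemoPacket, peakLevel, and DemoCloneQueue are hypothetical names, and the drop-oldest policy on a full queue is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded audio packet; not the icey type.
struct DemoPacket {
    std::vector<float> samples;
    long long timeUsec = 0;
};

// Synchronous branch: reads the borrowed packet in place, no copy is made.
inline float peakLevel(const DemoPacket& packet) {
    float peak = 0.0f;
    for (float s : packet.samples) {
        float a = s < 0.0f ? -s : s;
        if (a > peak) peak = a;
    }
    return peak;
}

// Asynchronous branch: the queue is the only place where a packet is copied.
class DemoCloneQueue {
public:
    explicit DemoCloneQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Clone the borrowed packet into the queue.
    // Assumption for this sketch: drop the oldest entry when full.
    void push(const DemoPacket& borrowed) {
        if (items_.size() == capacity_) items_.pop_front();
        items_.push_back(borrowed); // deep copy: this is the clone boundary
    }

    // Hand the oldest owned copy to the worker; returns false when empty.
    bool pop(DemoPacket& out) {
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<DemoPacket> items_;
};
```

The point of the split is that the producer may reuse or mutate its buffer as soon as push returns, because the worker only ever sees the owned copy. Callers that stay on the synchronous branch never pay for that copy.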
- speech::SpeechQueue: AsyncPacketQueue<av::PlanarAudioPacket> specialized for speech work.
- speech::VoiceActivityDetector: lightweight energy-based VAD with hysteresis and update throttling.
- speech::SpeechEvent: structured event payload with detector metadata, audio timing, and activity levels.

#include "icy/speech/speechqueue.h"
#include "icy/speech/voiceactivitydetector.h"

speech::SpeechQueue queue(32);

speech::VoiceActivityDetector vad({
    .source = "mic/front",
    .sampleRateHint = 48000,
    .startThreshold = 0.045f,
    .stopThreshold = 0.020f,
    .minSilenceUsec = 250000,
    .updateIntervalUsec = 250000,
});

// Forward each packet leaving the queue to the detector.
queue.emitter += [&](IPacket& packet) {
    vad.process(static_cast<av::PlanarAudioPacket&>(packet));
};

// Serialize each detector event and pass it to your own publish function.
vad.Event += [&](const speech::SpeechEvent& event) {
    auto json = speech::toJson(event);
    publish(json);
};

Common placements:
If you need packet timing and ownership rules first, read PacketStream.
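
To see why the detector example uses separate start and stop thresholds plus a minimum silence duration, here is a minimal sketch of energy-based detection with hysteresis. It is not the icey implementation: DemoVadConfig, DemoVad, and rmsLevel are hypothetical names, the field semantics are assumed from their names, and update throttling is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical configuration; field names mirror the detector example,
// but their exact semantics here are assumptions.
struct DemoVadConfig {
    float startThreshold = 0.045f;        // level that turns speech on
    float stopThreshold = 0.020f;         // level below which silence is counted
    std::int64_t minSilenceUsec = 250000; // silence needed to turn speech off
};

// Root-mean-square level of one block of samples in [-1, 1].
inline float rmsLevel(const std::vector<float>& samples) {
    if (samples.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : samples) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / samples.size()));
}

class DemoVad {
public:
    explicit DemoVad(DemoVadConfig config) : config_(config) {
        assert(config_.stopThreshold <= config_.startThreshold);
    }

    // Feed one level measurement with its timestamp; returns the speech state.
    bool update(float level, std::int64_t timeUsec) {
        if (!active_) {
            // Inactive: only a level at or above startThreshold starts speech.
            if (level >= config_.startThreshold) {
                active_ = true;
                silenceSinceUsec_ = -1;
            }
            return active_;
        }
        if (level >= config_.stopThreshold) {
            silenceSinceUsec_ = -1;       // still loud enough: reset the timer
        } else if (silenceSinceUsec_ < 0) {
            silenceSinceUsec_ = timeUsec; // first quiet block: start the timer
        } else if (timeUsec - silenceSinceUsec_ >= config_.minSilenceUsec) {
            active_ = false;              // quiet for long enough: stop speech
            silenceSinceUsec_ = -1;
        }
        return active_;
    }

    bool active() const { return active_; }

private:
    DemoVadConfig config_;
    bool active_ = false;
    std::int64_t silenceSinceUsec_ = -1;  // -1 means no silence in progress
};
```

A level between the two thresholds never starts speech but does keep it alive, which stops the state from flapping when the signal hovers near a single cutoff. In a pipeline you would call rmsLevel on each packet's samples and pass the result to update together with the packet timestamp.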