TAGS: Python, ESP32-S3, ChromaDB, Docker, LLM

AutoDiary: Automated Daily Summarization from Images and Audio

AutoDiary is a wearable AI device that automatically captures images and voice notes throughout your day, transcribes and describes them using AI, and stores them as searchable memories. Caregivers or users can simply ask questions like "What did I do this morning?" and get instant, human-readable answers. It is built on an ESP32-S3 microcontroller and uses a privacy-first, fully local architecture that combines computer vision, speech recognition, and retrieval-augmented generation.

CATEGORY: EDGE AI


The Problem

Over 55 million people worldwide live with dementia, with Alzheimer's accounting for 60–70% of cases. Existing assistive tools require active user engagement, which makes them impractical for people with severe cognitive decline.

Key Features

Multimodal capture: images via camera, voice via microphone, triggered by a simple button press
AI-powered understanding: automatic image description and speech transcription
Semantic memory search: natural language queries using Retrieval-Augmented Generation (RAG)
Private by design: all data processed and stored locally, no cloud dependency
Two form factors: smart glasses and a pendant wearable
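The semantic memory search feature follows the general RAG pattern: embed the question, rank stored memories by similarity, and pass the best matches to the language model as context. The sketch below shows only that retrieval-and-prompt step. It is an illustration under assumptions, not AutoDiary's code: the project lists Qwen3-0.6B embeddings stored in ChromaDB, whereas here a hypothetical bag-of-words `embed` with cosine similarity stands in so the example runs with the standard library alone. The names `embed`, `retrieve`, `build_prompt`, and the sample memories are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list so that filler words do not dominate the toy ranking.
STOPWORDS = {"a", "an", "the", "i", "my", "of", "and", "on", "in", "with",
             "did", "do", "what", "this", "to", "is", "was", "have"}

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k memories most similar to the query (ties keep insertion order)."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt for the answering model from retrieved memories."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Memories:\n{lines}\n\nQuestion: {query}\nAnswer using only the memories above."

# Usage with made-up diary entries.
memories = [
    "08:15 Image: a bowl of oatmeal and a cup of tea on the kitchen table",
    "10:30 Voice note: I walked in the park with my daughter",
    "14:00 Image: a television showing a cricket match",
]
question = "Who did I walk with in the park?"
print(build_prompt(question, retrieve(question, memories, k=1)))
```

Swapping the toy `embed` for a real embedding model and the sorted list for a vector store changes the quality and scale of retrieval, but not the shape of the pipeline.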

System Architecture

The system is built across three layers:

Wearable device: ESP32-S3 microcontroller with camera, microphone, and Wi-Fi
Backend server: Flask-based processing pipeline with vision AI, speech-to-text, and vector search
Web interface: chat-based diary with timeline, media viewer, and query input
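One way to picture the backend layer is as a function that turns one button-press capture (an image, an audio clip, or both) into a single text memory that can be embedded and searched. The sketch below assumes that shape; `Memory`, `process_capture`, and the `describe`/`transcribe` callables are hypothetical names, and the real vision (Gemma-3 4B) and speech (Whisper Small) models are replaced by stub functions. The Flask upload endpoint that would call this is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

@dataclass
class Memory:
    """Hypothetical record for one capture event."""
    timestamp: datetime
    description: str  # text from the vision model; empty if no image
    transcript: str   # text from speech-to-text; empty if no audio

    def as_text(self) -> str:
        """Flatten the capture into one searchable line."""
        parts = [f"[{self.timestamp:%H:%M}]"]
        if self.description:
            parts.append(f"Saw: {self.description}")
        if self.transcript:
            parts.append(f"Said: {self.transcript}")
        return " ".join(parts)

def process_capture(
    image: Optional[bytes],
    audio: Optional[bytes],
    timestamp: datetime,
    describe: Callable[[bytes], str],
    transcribe: Callable[[bytes], str],
) -> Memory:
    """Run only the models whose input is present and bundle the results."""
    description = describe(image) if image else ""
    transcript = transcribe(audio) if audio else ""
    return Memory(timestamp, description, transcript)

# Usage with stub models standing in for the real vision and speech models.
memory = process_capture(
    image=b"<jpeg bytes>",
    audio=b"<wav bytes>",
    timestamp=datetime(2025, 1, 10, 8, 15),
    describe=lambda img: "a cup of tea on a table",
    transcribe=lambda wav: "I made tea",
)
print(memory.as_text())
```

Passing the models in as callables keeps the pipeline testable without loading any model weights.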

Performance Highlights

Image processing completes in under 9 seconds end-to-end, with speech transcription hitting 91% accuracy and vision descriptions at 93% accuracy. The pendant form factor sustains 6–7 hours of battery life, while memory retrieval responds in under 1 second.

Technologies Used

Hardware: Seeed XIAO ESP32-S3, OV2640 camera, I2S MEMS microphone
AI Models: Gemma-3 4B (vision), Whisper Small (speech), Qwen3-0.6B (embeddings)
Backend: Python, Flask, ChromaDB, SQLite
Deployment: Docker
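A question like "What did I do this morning?" needs a time filter as well as semantic search, and SQLite is a natural fit for that kind of timeline metadata. The sketch below uses Python's built-in `sqlite3` module. The `memories` table schema and the helpers `open_diary`, `add_memory`, and `memories_between` are hypothetical and not taken from the project.

```python
import sqlite3
from datetime import datetime

def open_diary(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a diary database with a hypothetical memories table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               captured_at TEXT NOT NULL,  -- ISO-8601 timestamp
               kind TEXT NOT NULL,         -- 'image' or 'audio'
               text TEXT NOT NULL          -- description or transcript
           )"""
    )
    return conn

def add_memory(conn: sqlite3.Connection, captured_at: datetime, kind: str, text: str) -> None:
    """Insert one memory using a parameterized query."""
    conn.execute(
        "INSERT INTO memories (captured_at, kind, text) VALUES (?, ?, ?)",
        (captured_at.isoformat(), kind, text),
    )
    conn.commit()

def memories_between(conn: sqlite3.Connection, start: datetime, end: datetime) -> list[tuple[str, str, str]]:
    """Return memories in [start, end), oldest first.

    String comparison works because ISO-8601 timestamps in one format
    sort in chronological order.
    """
    rows = conn.execute(
        "SELECT captured_at, kind, text FROM memories "
        "WHERE captured_at >= ? AND captured_at < ? ORDER BY captured_at",
        (start.isoformat(), end.isoformat()),
    )
    return rows.fetchall()

# Usage: a "this morning" window (06:00 to 12:00) over made-up entries.
conn = open_diary()
day = datetime(2025, 1, 10)
add_memory(conn, day.replace(hour=8, minute=15), "image", "A bowl of oatmeal on the kitchen table")
add_memory(conn, day.replace(hour=10, minute=30), "audio", "I walked in the park with my daughter")
add_memory(conn, day.replace(hour=14), "image", "A cricket match on television")
morning = memories_between(conn, day.replace(hour=6), day.replace(hour=12))
for captured_at, kind, text in morning:
    print(captured_at, kind, text)
```

The rows returned for a time window could then be handed to the answering model as context, alongside or instead of the vector-search results.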

Team

Built as a final-year B.Tech project at Walchand Institute of Technology, Solapur by Padmanabh Kulkarni, Rithik Purohit, and Krishna Shingan, under the supervision of Dr. R. S. Khamitkar.

Links

Docker Image: https://hub.docker.com/r/padmanabh10/autodiary
YouTube Demo: https://youtu.be/_2bnRmLbbPM?si=zZsU6XkTSu2Mi8pu