Padmanabh Kulkarni
Python · ESP32-S3 · ChromaDB · Docker · LLM

AutoDiary: Automated Daily Summarization from Images and Audio

AutoDiary is a wearable AI device that automatically captures images and voice notes throughout your day, transcribes and describes them using AI, and stores them as searchable memories. Caregivers or users can ask questions like "What did I do this morning?" and get instant, human-readable answers. It is built on an ESP32-S3 microcontroller with a privacy-first, fully local architecture that combines computer vision, speech recognition, and retrieval-augmented generation.

CATEGORY: EDGE AI
The Problem

Over 55 million people worldwide live with dementia, with Alzheimer's accounting for 60–70% of cases. Existing assistive tools require active user engagement, which is impractical for people with severe cognitive decline.

fig1.jpg

Key Features

  • Multimodal capture: images via camera, voice via microphone, triggered by a simple button press

  • AI-powered understanding: automatic image description and speech transcription

  • Semantic memory search: natural language queries using Retrieval-Augmented Generation (RAG)

  • Private by design: all data processed and stored locally, no cloud dependency

  • Two form factors: smart glasses and a pendant wearable
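
The semantic memory search feature can be illustrated with a minimal sketch. It is not the project's actual code: a bag-of-words vector stands in for the embedding model, a plain Python list stands in for the vector store, and the memory entries, function names, and prompt wording are all hypothetical. It only shows the general retrieve-then-prompt shape of RAG.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memories: list, k: int = 2) -> list:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list) -> str:
    """Assemble the retrieved memories into context for a language model."""
    context = "\n".join(f"[{m['time']}] {m['text']}" for m in hits)
    return f"Answer using only these diary entries:\n{context}\n\nQuestion: {query}"


# Hypothetical diary entries, invented for illustration only.
MEMORIES = [
    {"time": "08:10", "text": "Image: a bowl of oatmeal and a cup of tea on the kitchen table"},
    {"time": "10:30", "text": "Voice note: walked to the park with my daughter this morning"},
    {"time": "15:45", "text": "Image: a television showing a cricket match in the living room"},
]

query = "Who did I walk to the park with?"
hits = retrieve(query, MEMORIES, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

In a real deployment the toy `embed` function would be replaced by a learned embedding model and the list by a vector database, so that queries match on meaning rather than on shared words.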

System Architecture

fig3_2.png

The system is built across three layers:

  1. Wearable device: ESP32-S3 microcontroller with camera, microphone, and Wi-Fi

  2. Backend server: Flask-based processing pipeline with vision AI, speech-to-text, and vector search

  3. Web interface: chat-based diary with timeline, media viewer, and query input
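
The flow across these layers can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: the `Capture` type, the table schema, and the function names are hypothetical, the two model calls are placeholders that return dummy text, and an in-memory SQLite database stands in for the backend's storage.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Capture:
    """One upload from the wearable (hypothetical shape)."""
    kind: str          # "image" or "audio"
    captured_at: str   # ISO 8601 timestamp
    payload: bytes     # raw JPEG or audio bytes


def describe_image(payload: bytes) -> str:
    """Placeholder for the vision model call."""
    return f"(image description placeholder, {len(payload)} bytes)"


def transcribe_audio(payload: bytes) -> str:
    """Placeholder for the speech-to-text call."""
    return f"(transcript placeholder, {len(payload)} bytes)"


def process(capture: Capture) -> str:
    """Route a capture to the matching model and return its text."""
    if capture.kind == "image":
        return describe_image(capture.payload)
    if capture.kind == "audio":
        return transcribe_audio(capture.payload)
    raise ValueError(f"unsupported capture kind: {capture.kind!r}")


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with an assumed, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "captured_at TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, capture: Capture) -> int:
    """Backend step: turn a capture into text and persist it."""
    text = process(capture)
    cur = conn.execute(
        "INSERT INTO memories (kind, captured_at, text) VALUES (?, ?, ?)",
        (capture.kind, capture.captured_at, text),
    )
    conn.commit()
    return cur.lastrowid


def timeline(conn: sqlite3.Connection) -> list:
    """Web-interface step: list memories in chronological order."""
    return conn.execute(
        "SELECT captured_at, kind, text FROM memories ORDER BY captured_at"
    ).fetchall()


# Made-up captures, deliberately ingested out of order.
conn = open_store()
ingest(conn, Capture("audio", "2025-01-01T10:30:00", b"\x00" * 16))
ingest(conn, Capture("image", "2025-01-01T08:10:00", b"\xff" * 32))
rows = timeline(conn)
for row in rows:
    print(row)
```

Keeping the model calls behind small functions like `describe_image` and `transcribe_audio` means the storage and timeline code does not need to change if a model is swapped.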

Performance Highlights

Image processing completes in under 9 seconds end-to-end, speech transcription reaches 91% accuracy, and vision descriptions reach 93% accuracy. The pendant form factor sustains 6–7 hours of battery life, and memory retrieval responds in under 1 second.
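
The write-up does not say how transcription accuracy was measured. One common convention, assumed here purely for illustration, is to report 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model output, divided by the reference length. The sentences below are invented examples, not project data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word out of eight.
reference = "i took my medicine after breakfast at nine"
hypothesis = "i took my medicine after breakfast at night"

wer = word_error_rate(reference, hypothesis)
accuracy = 1.0 - wer
print(f"WER = {wer:.3f}, accuracy = {accuracy:.3f}")
```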

Technologies Used

  • Hardware: Seeed XIAO ESP32-S3, OV2640 camera, I2S MEMS microphone

  • AI Models: Gemma-3 4B (vision), Whisper Small (speech), Qwen3-0.6B (embeddings)

  • Backend: Python, Flask, ChromaDB, SQLite

  • Deployment: Docker

Team

Built as a final-year B.Tech project at Walchand Institute of Technology, Solapur by Padmanabh Kulkarni, Rithik Purohit, and Krishna Shingan, under the supervision of Dr. R. S. Khamitkar.

Links

Docker Image: https://hub.docker.com/r/padmanabh10/autodiary
YouTube Demo: https://youtu.be/_2bnRmLbbPM?si=zZsU6XkTSu2Mi8pu