Knowledge Seeker

Project start: 2025-04-16

Project description

Knowledge Seeker is a tool for transcribing, indexing, and retrieving information from video recordings. As project leader, I coordinate the development of a system that combines AI speech-to-text processing with semantic search. Users can locate specific information in large video collections and get generated answers to their queries, grounded in the indexed content, through a RAG (Retrieval-Augmented Generation) architecture.
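
The retrieve-then-generate flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the chunk fields (`video`, `start`, `text`) and function names are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption); a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[dict]) -> str:
    """Assemble the retrieved chunks, with their timestamps, into a prompt for an LLM."""
    context = "\n".join(f"[{h['video']} @ {h['start']:.0f}s] {h['text']}" for h in hits)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical transcript chunks with time metadata.
chunks = [
    {"video": "lecture1.mp4", "start": 12.0, "text": "vector databases store embeddings for similarity search"},
    {"video": "lecture1.mp4", "start": 95.0, "text": "docker packages services into containers"},
    {"video": "lecture2.mp4", "start": 30.0, "text": "whisper converts speech to text with timestamps"},
]

question = "how are embeddings stored for search"
hits = retrieve(question, chunks, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a full pipeline the prompt would then be sent to a language model; that call is left out here because it depends on the provider and model chosen.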

Preliminary architecture logic

[Figure: preliminary architecture logic diagram]

Main functionalities

Transcription of video recordings to text with preservation of time metadata
Processing transcriptions through chunking and generating embeddings
Vector database for storing and efficiently searching embeddings
User interface enabling both simple and semantic content searching
RAG (Retrieval-Augmented Generation) system for generating responses to user queries
Deployment on DigitalOcean for scalability and availability
Data export in JSON format, with streaming to a user-facing API
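
As a rough illustration of the chunking and JSON export steps listed above, the sketch below merges consecutive transcript segments into chunks while keeping their time metadata. The segment shape (`start`, `end`, `text`), the `chunk_segments` helper, and the `max_chars` limit are assumptions for the example, not the project's confirmed implementation.

```python
import json


def chunk_segments(segments: list[dict], max_chars: int = 80) -> list[dict]:
    """Merge consecutive timestamped segments into chunks of at most max_chars.

    Each chunk keeps the start of its first segment and the end of its last one.
    A single segment longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], None
    for seg in segments:
        text = seg["text"].strip()
        if current and len(current["text"]) + 1 + len(text) <= max_chars:
            # Segment still fits: extend the current chunk and move its end time.
            current["text"] += " " + text
            current["end"] = seg["end"]
        else:
            if current:
                chunks.append(current)
            current = {"start": seg["start"], "end": seg["end"], "text": text}
    if current:
        chunks.append(current)
    return chunks


# Hypothetical transcription output: one dict per spoken segment.
segments = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the course."},
    {"start": 4.2, "end": 9.8, "text": "Today we cover vector search."},
    {"start": 9.8, "end": 15.1, "text": "First, a short recap of embeddings."},
]

chunks = chunk_segments(segments, max_chars=60)
exported = json.dumps(chunks, indent=2)  # JSON export of the chunks
print(exported)
```

Because every chunk carries `start` and `end`, a search hit can link back to the exact moment in the video. Each chunk's text would then be passed to an embedding model and stored in the vector database.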

Development Roadmap

Integration with additional data sources (documents, presentations, audio)
Enhancement of RAG mechanisms with advanced filtering and re-ranking techniques
Implementation of components for automatic verification and updating of the knowledge base
Optimization of indexing and search processes for larger datasets
Development of an API enabling integration with external applications

Skills

Python
Docker
DigitalOcean
LLMs (Large Language Models)
Natural Language Processing
Vector Databases
RAG (Retrieval-Augmented Generation)
REST API
Streamlit
JSON/Embeddings
Whisper (Speech-to-Text)
PostgreSQL
Microservice Architecture
Qdrant/Weaviate

Technologies used in the project

OpenAI API
Whisper for audio transcription
Qdrant/Weaviate as vector database
LangChain for RAG implementation
FastAPI for backend services
Streamlit for user interface
Docker for containerization
DigitalOcean for hosting