Kaivo AI - Building an Agentic EdTech Platform with RAG and Multimodal AI

Posted Jan 26, 2025

By Ayaan Sharif

5 min read

Kaivo AI: Building an Agentic EdTech Platform with RAG and Multimodal AI

As education evolves in the digital age, I’ve been working on Kaivo AI, an ambitious personal project that aims to revolutionize how students and educators interact with educational content. This AI-powered platform integrates agentic workflows and multimodal capabilities to create a truly intelligent educational assistant.

Project Vision

Kaivo AI isn’t just another chatbot for education—it’s designed to be an intelligent agent that can understand, process, and generate educational content across multiple formats. The platform can:

Understand Complex Academic Content: Process textbooks, research papers, videos, and audio lectures
Generate Contextual Learning Materials: Create summaries, quizzes, and explanations tailored to individual learning styles
Provide Interactive Tutoring: Act as a personalized tutor that adapts to student needs
Support Multiple Learning Modalities: Work with text, images, audio, and video content

Technical Architecture

Core Technologies

Backend Infrastructure:

FastAPI: High-performance API framework for handling requests
Redis: Caching and session management for real-time interactions
LangGraph: Orchestrating complex agentic AI workflows

AI & ML Stack:

Google AI (Gemini): Primary language model for reasoning and generation
RAG Pipelines: For dynamic content retrieval and contextual responses
Multimodal Processing: Handling text, images, and audio inputs

Frontend:

React.js: Modern, responsive user interface
Real-time Updates: WebSocket connections for interactive learning

Agentic AI Workflows

The heart of Kaivo AI lies in its agentic architecture. Unlike traditional AI systems that simply respond to queries, Kaivo AI can plan, execute, and adapt its responses based on educational goals:

  
# Simplified example of our agentic workflow
from langgraph import StateGraph, Node

class EducationalAgent:
    def __init__(self):
        self.graph = StateGraph()
        self.setup_workflow()
    
    def setup_workflow(self):
        # Define the educational workflow nodes
        self.graph.add_node("analyze_query", self.analyze_student_query)
        self.graph.add_node("retrieve_content", self.retrieve_educational_content)
        self.graph.add_node("generate_explanation", self.generate_personalized_explanation)
        self.graph.add_node("create_practice", self.create_practice_problems)
        self.graph.add_node("assess_understanding", self.assess_student_understanding)
        
        # Define the workflow edges
        self.graph.add_edge("analyze_query", "retrieve_content")
        self.graph.add_edge("retrieve_content", "generate_explanation")
        self.graph.add_edge("generate_explanation", "create_practice")
    
    def analyze_student_query(self, state):
        """Analyze what the student is asking and their learning level"""
        query = state['student_query']
        # Use Gemini to understand the educational context
        analysis = self.gemini_model.analyze_educational_query(query)
        return {**state, 'query_analysis': analysis}
    
    def retrieve_educational_content(self, state):
        """Retrieve relevant content using RAG"""
        relevant_content = self.rag_pipeline.retrieve(
            query=state['student_query'],
            context=state['query_analysis']
        )
        return {**state, 'retrieved_content': relevant_content}

RAG Implementation for Education

Our RAG (Retrieval-Augmented Generation) system is specifically designed for educational content:

Content Indexing: Educational materials are processed and stored in vector databases
Contextual Retrieval: Relevant information is retrieved based on student queries and learning context
Dynamic Generation: Custom explanations are generated using the retrieved content
Continuous Learning: The system learns from student interactions to improve responses

  
class EducationalRAG:
    def __init__(self):
        self.vector_store = WeaviateVectorStore()
        self.gemini = GeminiModel()
        
    def process_educational_material(self, content, content_type):
        """Process and index educational content"""
        # Extract key concepts and create embeddings
        concepts = self.extract_educational_concepts(content)
        embeddings = self.create_multimodal_embeddings(content, content_type)
        
        # Store in vector database with educational metadata
        self.vector_store.store(
            content=content,
            embeddings=embeddings,
            metadata={
                'concepts': concepts,
                'difficulty_level': self.assess_difficulty(content),
                'content_type': content_type,
                'learning_objectives': self.extract_objectives(content)
            }
        )
    
    def generate_personalized_explanation(self, query, student_profile):
        """Generate explanations tailored to student level"""
        relevant_content = self.vector_store.similarity_search(query)
        
        prompt = f"""
        Based on the following educational content, create a personalized explanation for:
        Student Level: {student_profile['level']}
        Learning Style: {student_profile['style']}
        Question: {query}
        
        Content: {relevant_content}
        
        Provide a clear, engaging explanation with examples.
        """
        
        return self.gemini.generate(prompt)

Key Features in Development

1. Intelligent Content Summarization

The platform can take complex academic materials and create summaries appropriate for different learning levels:

Beginner: Simple explanations with basic vocabulary
Intermediate: More detailed with key concepts highlighted
Advanced: Comprehensive analysis with connections to related topics

2. Interactive Learning Paths

Using agentic AI, Kaivo creates personalized learning journeys:

Assesses current knowledge level
Identifies learning gaps
Creates step-by-step learning paths
Adapts based on progress and understanding

3. Multimodal Content Processing

The platform can work with various content types:

Text: Research papers, textbooks, articles
Images: Diagrams, charts, infographics
Audio: Lecture recordings, podcasts
Video: Educational videos, demonstrations

4. Real-time Tutoring

Acting as an intelligent tutor, the system can:

Answer questions in context
Provide hints without giving away answers
Generate practice problems
Offer encouragement and motivation

Current Progress and Challenges

What’s Working:

✅ Core RAG pipeline for text-based content
✅ Basic agentic workflows with LangGraph
✅ FastAPI backend with Redis caching
✅ React frontend with real-time updates
✅ Integration with Gemini for text generation

Current Challenges:

Multimodal Integration: Expanding beyond text to handle images and audio effectively
Personalization: Building robust student profiling and adaptive learning
Scalability: Preparing for multiple concurrent users
Content Quality: Ensuring generated educational content is accurate and pedagogically sound

Future Roadmap

Short Term (Next 3 months):

Complete multimodal content processing
Implement advanced student profiling
Add collaborative learning features
Beta testing with early users

Long Term (6-12 months):

Mobile application development
Integration with popular LMS platforms
Advanced analytics and progress tracking
Support for multiple languages

Technical Insights

Performance Optimization:

Caching Strategy: Using Redis to cache frequently accessed content and user sessions
Async Processing: Leveraging FastAPI’s async capabilities for better performance
Smart Batching: Processing multiple requests efficiently

AI Model Management:

Model Selection: Choosing the right model for each task (Gemini for reasoning, specialized models for specific domains)
Prompt Engineering: Carefully crafted prompts for educational contexts
Response Quality: Implementing feedback loops to improve AI responses

Open Source and Community

While Kaivo AI is currently a personal project, I’m planning to open-source key components:

Educational RAG pipeline templates
Agentic workflow patterns for education
Frontend components for educational interfaces

Conclusion

Building Kaivo AI has been an incredible learning experience in combining cutting-edge AI technologies with real-world educational needs. The intersection of agentic AI, RAG systems, and multimodal processing opens up exciting possibilities for personalized education.

The goal isn’t to replace human educators but to augment their capabilities and provide students with 24/7 access to intelligent, adaptive learning support.

Tech Stack: FastAPI, Redis, React.js, Google AI (Gemini), LangGraph, RAG, Weaviate, Python

Interested in educational AI or have thoughts on agentic systems? I’d love to connect and discuss ideas!

Projects, AI

This post is licensed under CC BY 4.0 by the author.