How It Works

Understand the technology behind hej!'s AI-powered chat experience.

hej! uses a combination of web crawling, vector embeddings, and large language models to create intelligent chatbots that understand your website's content. Here's how these components work together.

1. Intelligent Website Crawling

When you provide your website URL, our crawler (powered by Playwright) visits your site and discovers all accessible pages. Unlike simple crawlers, our AI does the following (a simplified crawl-loop sketch appears after the list):

  • Intelligently selects pages - Uses an LLM to identify the most relevant pages for your knowledge base, avoiding duplicate content and boilerplate
  • Extracts clean content - Removes navigation, ads, and other noise to focus on the valuable information
  • Captures screenshots - Takes visual snapshots for context and verification
  • Respects rate limits - Crawls responsibly without overwhelming your server
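
To make the crawl step concrete, here is a minimal Python sketch of a breadth-first crawl loop using Playwright's sync API. The function name, page limit, and politeness delay are illustrative assumptions rather than hej!'s actual implementation; the LLM-based page selection and content cleaning described above would run on the pages this loop collects.

```python
from playwright.sync_api import sync_playwright

def crawl_site(start_url: str, max_pages: int = 20) -> dict[str, str]:
    """Breadth-first crawl that collects the rendered text of each page."""
    pages: dict[str, str] = {}
    queue = [start_url]
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        while queue and len(pages) < max_pages:
            url = queue.pop(0)
            if url in pages:
                continue
            page.goto(url, wait_until="networkidle")
            pages[url] = page.inner_text("body")                # raw text; cleaning happens later
            page.screenshot(path=f"snapshot_{len(pages)}.png")  # visual snapshot for context
            # Queue same-site links discovered on this page.
            for href in page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)"):
                if href.startswith(start_url) and href not in pages:
                    queue.append(href)
            page.wait_for_timeout(1000)  # simple politeness delay (rate limiting)
        browser.close()
    return pages
```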

2. Vector Embedding & Indexing

Each page's content is transformed into vector embeddings using OpenAI's embedding models. These vectors capture the semantic meaning of your content (see the indexing sketch after the list):

  • Text chunking - Content is split into optimally sized chunks for accurate retrieval
  • Semantic embeddings - Each chunk is converted to a high-dimensional vector representing its meaning
  • ChromaDB storage - Vectors are stored in a purpose-built vector database for fast similarity search
  • Metadata preservation - Original text, URLs, and page titles are stored alongside embeddings
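
As a rough illustration of this step, the sketch below chunks a page, embeds the chunks, and stores them in ChromaDB. The chunking is deliberately naive, and the model name, collection name, and function signature are assumptions for the example, not hej!'s internals.

```python
import chromadb
from openai import OpenAI

client = OpenAI()           # assumes OPENAI_API_KEY is set in the environment
chroma = chromadb.Client()  # in-memory instance; fine for a sketch
collection = chroma.create_collection("site_knowledge")

def index_page(url: str, title: str, text: str, chunk_size: int = 1000) -> None:
    """Chunk a page, embed each chunk, and store vectors plus metadata."""
    # Naive fixed-size chunking; a real chunker respects sentence boundaries.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],
        embeddings=[item.embedding for item in response.data],
        documents=chunks,                                          # original text preserved
        metadatas=[{"url": url, "title": title} for _ in chunks],  # page metadata
    )
```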

Example: How "pricing" queries work

"What are your prices?" → [0.23, -0.14, 0.87, ...] → pricing-page.html

The question is embedded as a vector, and a similarity search maps it to the indexed chunks from the pricing page.

3. Semantic Search & Retrieval

When a user asks a question, we find the most relevant content from your knowledge base (a retrieval sketch follows the list):

  • Query embedding - The user's question is converted to a vector
  • Similarity search - We find chunks with the closest semantic meaning (not just keyword matching)
  • Context assembly - Top matching chunks are combined to form the retrieval context
  • Tool calling - For complex queries, the AI can invoke multiple searches and tools
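
Continuing the sketch above (reusing its `client` and `collection`), retrieval might look like the following. The name `retrieve_context` and the `top_k` parameter are hypothetical, and the multi-search tool calling mentioned in the last bullet is omitted for brevity.

```python
def retrieve_context(question: str, top_k: int = 4) -> str:
    """Embed the user's question and fetch the semantically closest chunks."""
    query_vector = client.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    # Similarity search, not keyword matching: nearest vectors win.
    results = collection.query(query_embeddings=[query_vector], n_results=top_k)
    # Context assembly: join the top matches into one reference block.
    return "\n\n".join(results["documents"][0])
```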

4. AI Response Generation

The retrieved context is combined with your custom system prompt and sent to the LLM (a minimal generation sketch follows the list):

  • System prompt - Defines the AI's personality, knowledge boundaries, and behavior
  • Context injection - Retrieved content is provided as reference material
  • Streaming response - Answers are streamed in real-time for a responsive experience
  • Tool execution - Custom tools can be called mid-response for dynamic data
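
A minimal version of this step, building on the retrieval sketch, is shown below. The model name is a placeholder, tool execution is omitted, and printing stands in for streaming tokens to the chat widget.

```python
def answer(question: str, system_prompt: str) -> str:
    """Generate a streamed answer grounded in retrieved context."""
    context = retrieve_context(question)
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            # Context injection: retrieved chunks are passed as reference material.
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        stream=True,  # tokens arrive incrementally for a responsive experience
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # stream to the user as tokens arrive
        parts.append(delta)
    return "".join(parts)
```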

The Complete RAG Pipeline

User Query → Embed Query → Vector Search → Retrieve Context → Generate Response → Stream to User

This pattern is known as Retrieval-Augmented Generation (RAG).
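
Putting the sketches together, an end-to-end run of the pipeline could look like this (the URL and prompt are placeholders):

```python
# Index once, then answer questions against the knowledge base.
for url, text in crawl_site("https://example.com").items():
    index_page(url, title=url, text=text)

answer("What are your prices?", system_prompt="You are a helpful support assistant.")
```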
