How It Works

Understand the technology behind hej!'s AI-powered chat experience.

hej! uses a combination of web crawling, vector embeddings, and large language models to create intelligent chatbots that understand your website's content. Here's how these components work together.

1. Intelligent Website Crawling

When you provide your website URL, our crawler (powered by Playwright) visits your site and discovers all accessible pages. Unlike simple crawlers, our AI does the following (a simplified crawl-loop sketch appears after the list):

  • Intelligently selects pages - Uses an LLM to identify the most relevant pages for your knowledge base, avoiding duplicate content and boilerplate
  • Extracts clean content - Removes navigation, ads, and other noise to focus on the valuable information
  • Captures screenshots - Takes visual snapshots for context and verification
  • Respects rate limits - Crawls responsibly without overwhelming your server
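
To make the crawl step concrete, here is a minimal Python sketch of a breadth-first crawl loop using Playwright's sync API. The function name, page limit, and politeness delay are illustrative assumptions rather than hej!'s actual implementation; the LLM-based page selection and content cleaning described above would run on the pages this loop collects.

```python
from playwright.sync_api import sync_playwright

def crawl_site(start_url: str, max_pages: int = 20) -> dict[str, str]:
    """Breadth-first crawl that collects the rendered text of each page."""
    pages: dict[str, str] = {}
    queue = [start_url]
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        while queue and len(pages) < max_pages:
            url = queue.pop(0)
            if url in pages:
                continue
            page.goto(url, wait_until="networkidle")
            pages[url] = page.inner_text("body")                # raw text; cleaning happens later
            page.screenshot(path=f"snapshot_{len(pages)}.png")  # visual snapshot for context
            # Queue same-site links discovered on this page.
            for href in page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)"):
                if href.startswith(start_url) and href not in pages:
                    queue.append(href)
            page.wait_for_timeout(1000)  # simple politeness delay (rate limiting)
        browser.close()
    return pages
```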

2. Vector Embedding & Indexing

Each page's content is transformed into vector embeddings using OpenAI's embedding models. These vectors capture the semantic meaning of your content (see the indexing sketch after the list):

  • Text chunking - Content is split into optimally sized chunks for accurate retrieval
  • Semantic embeddings - Each chunk is converted to a high-dimensional vector representing its meaning
  • ChromaDB storage - Vectors are stored in a purpose-built vector database for fast similarity search
  • Metadata preservation - Original text, URLs, and page titles are stored alongside embeddings
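
As a rough illustration of this step, the sketch below chunks a page, embeds the chunks, and stores them in ChromaDB. The chunking is deliberately naive, and the model name, collection name, and function signature are assumptions for the example, not hej!'s internals.

```python
import chromadb
from openai import OpenAI

client = OpenAI()           # assumes OPENAI_API_KEY is set in the environment
chroma = chromadb.Client()  # in-memory instance; fine for a sketch
collection = chroma.create_collection("site_knowledge")

def index_page(url: str, title: str, text: str, chunk_size: int = 1000) -> None:
    """Chunk a page, embed each chunk, and store vectors plus metadata."""
    # Naive fixed-size chunking; a real chunker respects sentence boundaries.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],
        embeddings=[item.embedding for item in response.data],
        documents=chunks,                                          # original text preserved
        metadatas=[{"url": url, "title": title} for _ in chunks],  # page metadata
    )
```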

Example: How "pricing" queries work

"What are your prices?" → [0.23, -0.14, 0.87, ...] → pricing-page.html

The question is embedded as a vector, and a similarity search maps it to the indexed chunks from the pricing page.

3. Semantic Search & Retrieval

When a user asks a question, we find the most relevant content from your knowledge base (a retrieval sketch follows the list):

  • Query embedding - The user's question is converted to a vector
  • Similarity search - We find chunks with the closest semantic meaning (not just keyword matching)
  • Context assembly - Top matching chunks are combined to form the retrieval context
  • Tool calling - For complex queries, the AI can invoke multiple searches and tools
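
Continuing the sketch above (reusing its `client` and `collection`), retrieval might look like the following. The name `retrieve_context` and the `top_k` parameter are hypothetical, and the multi-search tool calling mentioned in the last bullet is omitted for brevity.

```python
def retrieve_context(question: str, top_k: int = 4) -> str:
    """Embed the user's question and fetch the semantically closest chunks."""
    query_vector = client.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    # Similarity search, not keyword matching: nearest vectors win.
    results = collection.query(query_embeddings=[query_vector], n_results=top_k)
    # Context assembly: join the top matches into one reference block.
    return "\n\n".join(results["documents"][0])
```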

4. AI Response Generation

The retrieved context is combined with your custom system prompt and sent to the LLM (a minimal generation sketch follows the list):

  • System prompt - Defines the AI's personality, knowledge boundaries, and behavior
  • Context injection - Retrieved content is provided as reference material
  • Streaming response - Answers are streamed in real-time for a responsive experience
  • Tool execution - Custom tools can be called mid-response for dynamic data
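
A minimal version of this step, building on the retrieval sketch, is shown below. The model name is a placeholder, tool execution is omitted, and printing stands in for streaming tokens to the chat widget.

```python
def answer(question: str, system_prompt: str) -> str:
    """Generate a streamed answer grounded in retrieved context."""
    context = retrieve_context(question)
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            # Context injection: retrieved chunks are passed as reference material.
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        stream=True,  # tokens arrive incrementally for a responsive experience
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # stream to the user as tokens arrive
        parts.append(delta)
    return "".join(parts)
```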

The Complete RAG Pipeline

User Query → Embed Query → Vector Search → Retrieve Context → Generate Response → Stream to User

This pattern is known as Retrieval-Augmented Generation (RAG).
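
Putting the sketches together, an end-to-end run of the pipeline could look like this (the URL and prompt are placeholders):

```python
# Index once, then answer questions against the knowledge base.
for url, text in crawl_site("https://example.com").items():
    index_page(url, title=url, text=text)

answer("What are your prices?", system_prompt="You are a helpful support assistant.")
```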
