
Build a RAG System with Node.js & OpenAI: 2026 Expert Developer Guide


“Don’t just prompt, empower your AI with context.”

If you’ve ever used ChatGPT and thought, “Hmm, I wish it could answer based on my data,” you’re not alone. In this blog post, we’ll build a simplified but powerful RAG system using Node.js and OpenAI’s GPT model, perfect for developers curious to bridge the gap between raw LLM power and domain-specific intelligence.

By grounding models in verified, up-to-date datasets before they generate text, modern RAG sharply reduces the hallucination risks that characterized earlier LLM iterations. The process has also shifted from a simple linear search to an Agentic Workflow, where the system uses Semantic Intent Analysis to decode complex queries and GraphRAG to capture deep relationships in the data that traditional vector searches often miss.

The modern RAG workflow is defined by its ability to think and verify; it uses Self-Correction and Reranking to evaluate the quality of retrieved information, autonomously searching broader silos if the initial data is insufficient. This ensures that the AI doesn't just "guess" when information is missing, but actively seeks out the truth. The process concludes with Augmented Synthesis, where advanced models like GPT-5 produce responses grounded strictly in provided evidence, complete with clickable citations for total transparency.

What is a RAG System in 2026?

RAG is no longer just a trend; it is the architectural standard for factual AI. In 2026, we’ve moved beyond "Naive RAG" (simple search and retrieve) to Advanced & Agentic RAG. It works by "grounding" the model in a verified factual dataset before it ever begins to "write," sharply reducing the hallucination risks that plagued earlier LLM iterations.

The 2026 RAG Workflow: The Agentic Evolution

In today's enterprise environment, a RAG system doesn't just follow a straight line; it thinks, verifies, and iterates:

Semantic Intent Analysis:

The system doesn't just look for keywords. Using multi-vector representations, it analyzes the user's intent. If the query is vague, the 2026 RAG system asks clarifying questions.

Hybrid & Graph Retrieval:

We no longer rely solely on vector databases. We use GraphRAG, which combines vector similarity with Knowledge Graphs. This allows the AI to understand relationships (e.g., "Find the project manager for the client we signed last Tuesday") that traditional search would miss.
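To make the relationship idea concrete, here is a toy in-memory version of that kind of query. All entity names are hypothetical and the lookup is a plain array scan; a production GraphRAG setup would use a graph database (such as Neo4j) alongside the vector store:

```javascript
// Toy knowledge graph: edges connect entities via named relations.
// A vector search alone can't reliably answer "who manages the client
// we signed on a given date" — following edges can.
const edges = [
  { from: "client:acme", rel: "signedOn", to: "2026-01-13" },
  { from: "client:acme", rel: "managedBy", to: "person:dana" },
  { from: "client:globex", rel: "signedOn", to: "2025-11-02" },
  { from: "client:globex", rel: "managedBy", to: "person:lee" },
];

// Follow a single relation from an entity, or return null if absent.
function follow(from, rel) {
  const edge = edges.find((e) => e.from === from && e.rel === rel);
  return edge ? edge.to : null;
}

// Two-hop query: find the client signed on `date`, then its manager.
function managerForClientSignedOn(date) {
  const signing = edges.find((e) => e.rel === "signedOn" && e.to === date);
  return signing ? follow(signing.from, "managedBy") : null;
}
```

The two-hop traversal is the point: the answer is reached by composing relations, not by text similarity.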

Self-Correction & Reranking:

After retrieval, the system uses a "Reranker" model to evaluate the quality of the documents. If the information is insufficient, an AI Agent autonomously decides to perform a broader search or access a different data silo.

Augmented Synthesis with Attribution:

The LLM (like GPT-5 or specialized Small Language Models) generates a response based only on the provided evidence, often providing clickable citations to the source documents for 100% transparency.
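The verify-and-iterate loop described above can be sketched in a few lines of Node.js. This is only an illustration: the rerank function below is a keyword-overlap stand-in for a real cross-encoder reranker model, and the "silos" are plain arrays standing in for separate data sources:

```javascript
// Stand-in reranker: scores a doc by keyword overlap with the query.
// A production system would call a cross-encoder reranker model here.
function rerank(query, doc) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const text = doc.toLowerCase();
  const hits = terms.filter((t) => text.includes(t)).length;
  return hits / terms.length; // 0..1 relevance score
}

// Self-correcting retrieval: try each silo in turn; accept a silo only
// if its best document clears the relevance threshold, otherwise
// broaden the search to the next silo.
function agenticRetrieve(query, silos, threshold = 0.5) {
  for (const silo of silos) {
    const scored = silo
      .map((doc) => ({ doc, score: rerank(query, doc) }))
      .sort((a, b) => b.score - a.score);
    if (scored.length && scored[0].score >= threshold) {
      return scored.slice(0, 3).map((s) => s.doc);
    }
  }
  return []; // insufficient evidence — don't guess
}
```

If no silo clears the threshold, the function returns an empty array, which your pipeline can turn into an honest "I don't have enough information" response instead of a hallucinated answer.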

Prerequisites for Your Node.js RAG System

Building a RAG system in 2026 requires a balance of local resources and cloud scalability. To follow this guide, ensure your environment meets these professional benchmarks:

  • Node.js Environment: 

    Node.js v20.x or higher (LTS) is highly recommended. The latest versions provide significantly improved performance for the asynchronous overhead of multiple AI streams and better native fetch support.
  • OpenAI API Infrastructure: 

    An active OpenAI API Key from platform.openai.com. For enterprise-grade reliability, ensure you have set up usage limits to prevent unexpected costs during document indexing.
  • Vector Database (Optional but Recommended):

    While we will simulate retrieval in this guide, a production system typically requires a vector store like Pinecone, Weaviate, or a local LanceDB instance for high-speed semantic search.
  • Secure Environment Configuration: 

    Use a .env file to manage sensitive credentials. In 2026, security is paramount; never hardcode keys.
    • OPENAI_API_KEY=your_openai_key_here
    • PORT=3000
  • Core Node Packages:
    • express: The industry-standard framework for building robust APIs.
    • dotenv: For secure environment variable management.
    • openai: The official SDK (v4.0+) to interface with advanced models like GPT-4 or GPT-5.

  • System Hardware:
    • RAM: Minimum 8GB (16GB recommended). Large language model calls and complex document chunking/embedding processes are memory-intensive; insufficient RAM can lead to "Out of Memory" (OOM) errors during high-concurrency tasks.
    • CPU: A multi-core processor is essential to handle the non-blocking I/O operations that Node.js thrives on when managing multiple retrieval streams.

Hire Node.js Developers Today!

Ready to bring your server-side application vision to life? Start your journey with Zignuts expert Node.js developers.

Building the RAG System in Node.js

We’ll now walk through the code step by step to show how the pieces of the RAG pipeline come together.

Step 1: Initialize a Node.js Project

Open your terminal, navigate to the directory where you want your project, and initialize a Node.js project with "npm init -y". Then install the required dependencies with "npm install express dotenv openai", and ensure your project directory contains the following files:

  • index.js
  • rag.js
  • .env

Step 2: Initialize OpenAI and Core RAG Functions

We’ll start by importing the openai module in the rag.js file and creating an instance of the OpenAI client, authenticated using the API key from .env.

Code

// Create an OpenAI client authenticated with the key loaded from .env
const openAi = require("openai");
const OpenAi = new openAi({
    apiKey: process.env.OPENAI_API_KEY,
});

The code below simulates the document retrieval step in RAG.

Code

const retrieveDocuments = async (query) => {
    return [`Relevant doc about "${query}"`, `Another doc about "${query}"`];
};

Normally, you would fetch relevant documents from a vector database (like Pinecone, Weaviate, or FAISS).

Here, for simplicity, it just returns an array of hardcoded mock documents relevant to the input query.
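If you want to go one step beyond the mock, a common pattern is to embed your documents once, keep the vectors in memory, and rank them by cosine similarity at query time. The sketch below assumes the embedding vectors have already been produced elsewhere (for example with an OpenAI embedding model such as text-embedding-3-small); the ranking itself is plain math:

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pre-embedded documents against a query embedding and return
// the k best-matching texts. In production, the `embedding` fields
// would come from an embedding model and live in a vector database.
function retrieveTopK(queryEmbedding, corpus, k = 2) {
  return corpus
    .map((d) => ({ text: d.text, score: cosineSimilarity(queryEmbedding, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((d) => d.text);
}
```

In a real system, the query embedding is computed per request and the search is delegated to the vector store; the underlying math is the same.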

Code

const buildPrompt = async (docs, question) => {
    const context = docs.join("\n");
    return `Answer the question using the context below:\n\n${context}\n\nQuestion: ${question}`;
};

The above code takes the retrieved documents and combines them into a single string (context).

It constructs a structured prompt that clearly separates:

  • The context (background info),
  • The user question.

This format helps the language model understand that it should use the provided context to generate a more accurate answer.

Code

const generateAnswer = async (prompt) => {
    const completion = await OpenAi.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 150,
    });
    return completion.choices[0].message.content.trim();
};
module.exports = { retrieveDocuments, buildPrompt, generateAnswer }; 

This code sends the constructed prompt to the OpenAI Chat Completions API using the gpt-3.5-turbo model. In the Chat API's message format, the "user" role marks the content as user input, as opposed to a system or assistant message.

The max_tokens setting controls how long the AI's response can be. Here, we’ve capped it at 150 tokens to keep things concise and focused.

It returns the model’s reply, trimmed to remove any leading/trailing whitespace.

Step 3: Set up the Server and Create the API Endpoint

Now, back in the index.js, let's build our /ask endpoint that takes a question, runs it through the RAG pipeline (retrieve ➝ prompt ➝ generate), and returns the result to the user.

Code

const express = require("express");
const dotenv = require("dotenv");

// Load environment variables BEFORE requiring rag.js: the OpenAI
// client in rag.js reads process.env.OPENAI_API_KEY at module load time.
dotenv.config();

const { retrieveDocuments, buildPrompt, generateAnswer } = require("./rag");

const app = express();
app.use(express.json());

Import the installed modules, load the environment variables, and import the three RAG functions from rag.js. Note that dotenv.config() must run before rag.js is required, because the OpenAI client is created with process.env.OPENAI_API_KEY as soon as that module loads.

Create an Express app instance and enable JSON body parsing so the endpoint can read the req.body.question input.

Code

app.post("/ask", async (req, res) => {
    const question = req.body.question;
    const docs = await retrieveDocuments(question);
    const prompt = await buildPrompt(docs, question);
    const answer = await generateAnswer(prompt);
    res.json({ answer });
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Server running on port ${PORT}`));

This server endpoint handles the POST request and extracts the question from the request body. It then calls the RAG functions, which:

  • Gets relevant documents based on the question.
  • Creates a prompt using those docs and the question.
  • Calls OpenAI to get a response based on that prompt.

And lastly, it sends the answer back to the client as a JSON response.

Step 4: Test Your RAG System

Now you can run the server and test it:

node index.js

Then send a request using curl, Postman, or your frontend:

Code

# You can run the cURL command below in a terminal, or paste it into Postman and hit Send.
# It sends a request to your RAG system with the question: "What is RAG in AI?"
curl -X POST http://localhost:3000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAG in AI?"}'

You should get an answer like below, built with the help of retrieved documents.

{"answer": "RAG in AI stands for Retrieval-Augmented Generation. It is a model that combines the capabilities of information retrieval, question answering, and natural language generation to provide comprehensive answers to user queries."}

Architecting a High-Fidelity RAG System

While a basic RAG pipeline is a great start, professional systems in 2026 are built to handle the chaotic reality of enterprise data. To move from a "demo" to a "dependable" solution, you must implement strategies that ensure your AI doesn't just find information but understands and updates it in real-time.

1. Multi-Vector "Small-to-Big" Retrieval

In 2026, we no longer store large, bulky text chunks as single vectors. Instead, we use Multi-Vector Retrieval.

  • How it works: You create small, concise "summary" vectors or "propositions" that represent the core claims of a document. These small vectors are linked back to the full, detailed "Parent" document.
  • The Benefit: It is mathematically easier for a vector database to match a user’s short question to a 50-word summary than to a 1,000-word chapter. Once the summary is matched, your Node.js orchestrator pulls the full context for the LLM, ensuring high precision without losing detail.
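A minimal sketch of the small-to-big pattern, with simple keyword matching standing in for the vector similarity step and hypothetical document IDs:

```javascript
// Full "parent" documents, keyed by ID (contents are placeholders).
const parentStore = new Map([
  ["doc-1", "Full 1,000-word chapter on Node.js streams ..."],
  ["doc-2", "Full onboarding handbook for new clients ..."],
]);

// Small, concise summaries linked back to their parent documents.
// In production these summaries would be embedded and matched by
// vector similarity; here we use keyword overlap as a stand-in.
const summaryIndex = [
  { parentId: "doc-1", summary: "How Node.js streams handle backpressure" },
  { parentId: "doc-2", summary: "Steps to onboard a newly signed client" },
];

function smallToBigRetrieve(query) {
  const q = query.toLowerCase();
  // 1. Match the query against the small summaries.
  const hit = summaryIndex.find((s) =>
    s.summary
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 3)
      .some((w) => q.includes(w))
  );
  // 2. Expand the hit to its full parent document for the LLM.
  return hit ? parentStore.get(hit.parentId) : null;
}
```

The key design choice is the indirection: the index matches small, precise summaries, but the LLM always receives the full parent context.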

2. Event-Driven Knowledge Streaming

Traditional RAG systems are static; they only know what was in the database when you last ran the indexing script. In 2026, we use Knowledge Streaming to keep the AI's "brain" updated within seconds.

  • Technical Implementation: Using Node.js’s native EventEmitter or Webhooks, your system listens for changes in your data sources (like a new PDF uploaded to SharePoint or a message in Slack).
  • Knowledgeable Twist: As soon as an update occurs, a micro-pipeline automatically re-chunks and re-indexes only that specific piece of data. This ensures your AI never hallucinates based on "stale" or outdated information.

3. Multimodal Reasoning (Beyond Text)

Modern enterprise data is rarely just text. It’s diagrams, meeting recordings, and complex spreadsheets.

  • The 2026 Standard: By utilizing Unified Embedding Models (like CLIP or BridgeTower), Node.js applications can now retrieve images and video transcripts alongside text.
  • Practical Example: A user can ask, "Show me the slide from the Q4 meeting where we discussed the API latency spike." The system retrieves the exact image frame from a video recording and the corresponding technical logs to provide a verified, multi-sensory answer.

The 2026 "Next Level" Checklist for Your RAG System

Building a basic pipeline is only the first step. To achieve "Expert-grade" status in the current 2026 landscape, your RAG architecture must address precision, autonomy, and data ethics. At Zignuts, we recommend these five pillars for any enterprise-grade deployment:

Hybrid & Knowledge Graph Search:

Standard vector search often fails with specific terminology or complex relationship queries. By combining BM25 keyword matching with Semantic Vector search and GraphRAG, you achieve substantially higher retrieval accuracy. This ensures the system understands not just the "meaning" of a word, but the actual structural relationships between entities in your data.
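A common, model-free way to merge a BM25 ranking with a vector ranking is Reciprocal Rank Fusion (RRF): each list contributes 1/(k + rank) per document, and the sums decide the final order. A minimal sketch (k = 60 is the conventional constant):

```javascript
// Merge multiple rankings (arrays of doc IDs, best first) with RRF.
// A document ranked highly in either list rises in the fused order.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((docId, i) => {
      // rank is 1-based, hence i + 1
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

In a hybrid setup, you would feed RRF the doc-ID lists returned by your BM25 engine and your vector store, then fetch the fused top-N documents for the prompt.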

Agentic Self-Correction:

Modern systems no longer accept the first result as the absolute truth. By implementing Agentic RAG, you empower the system to evaluate its own context. If the retrieved documents are deemed irrelevant or insufficient by the model, the "Agent" autonomously triggers a multi-hop search or queries a different data silo to fill the knowledge gap before responding.

Privacy-First "Sovereign" Embeddings:

Data sovereignty is the top priority for 2026 AI. To minimize data leakage, expert developers now use Local Embedding Models (such as BGE or HuggingFace transformers) to vectorize data within the secure perimeter. This ensures that sensitive proprietary data never leaves your infrastructure; only the finalized, anonymized prompt reaches the LLM.

Dynamic Citations & Traceability:

Trust is built through transparency. Your system must return a "Traceability Matrix" with clear, clickable citations that link every claim in the AI's response to a specific source document. This is not just for user trust; it is a critical requirement for meeting 2026 AI compliance standards in sectors like Law, Finance, and Healthcare.

Context-Adaptive Chunking:

Instead of fixed-length text slicing, use AI-driven chunking that understands document structure (headers, tables, and lists). This preserves the logical flow of information, ensuring that when the system retrieves a "chunk," it contains the full context of the information, not just a fragmented sentence.
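A simple way to approximate structure-aware chunking without an AI model is to split on headings, so every chunk carries its section title as context. A sketch for markdown-style documents:

```javascript
// Split a markdown-ish document into chunks at heading boundaries,
// so each chunk keeps its section heading instead of being a
// fragment sliced at an arbitrary character count.
function chunkByHeaders(text) {
  const lines = text.split("\n");
  const chunks = [];
  let current = [];
  for (const line of lines) {
    // A new heading (# through ######) closes the previous chunk.
    if (/^#{1,6}\s/.test(line) && current.length) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length) chunks.push(current.join("\n").trim());
  return chunks.filter(Boolean);
}
```

A production chunker would also respect tables and lists and cap oversized sections, but the principle is the same: split where the document's own structure splits.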

Conclusion

Building a RAG System in 2026 is no longer about simply connecting an LLM to a folder of text files; it is about architecting a sophisticated Knowledge Runtime that prioritizes factual integrity, security, and scalability. By leveraging Node.js, you've moved from a static prompt to a dynamic, asynchronous pipeline capable of grounding AI responses in your unique, proprietary data.

However, a basic setup is just the foundation. As enterprise requirements evolve toward Agentic workflows and Graph-based retrieval, the focus shifts from "if" an AI can answer to "how accurately" it can verify its own claims. Many organizations choose to Hire Node.js developers to implement advanced features like self-correction, hybrid search, and PII masking, ensuring the system isn't just a technical upgrade, but a commitment to building trustworthy AI for highly regulated environments.

At Zignuts, we understand that the real challenge of AI isn't the code, but the data hygiene and architectural rigor required for production-grade reliability. Whether you're a startup looking to prototype or an enterprise scaling to millions of users, we specialize in building secure, future-proof AI solutions that turn your raw data into a competitive advantage. Ready to transform your organizational knowledge into an intelligent cognitive engine? Connect with the Zignuts team today, and let’s build the next generation of AI together.


Passionate developer with expertise in building scalable web applications and solving complex problems. Loves exploring new technologies and sharing coding insights.
