Loader
Before you can start indexing your documents, you need to load them into memory.
SimpleDirectoryReader
LlamaIndex.TS supports easy loading of files from folders using the `SimpleDirectoryReader` class.
It reads all files in a directory and its subdirectories.
```typescript
import { SimpleDirectoryReader } from "llamaindex/readers/SimpleDirectoryReader";
// or
// import { SimpleDirectoryReader } from "llamaindex";

const reader = new SimpleDirectoryReader();
// Recursively load every file under ../data into Document objects
const documents = await reader.loadData("../data");

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
```
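"All files from a directory and its subdirectories" amounts to a recursive walk of the folder tree. The sketch below illustrates which files such a walk covers, using only Node's built-in `node:fs`. `listFilesRecursively` is a hypothetical helper, not part of llamaindex, and this is not the reader's actual implementation.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Hypothetical helper (not a llamaindex export): return every regular file
// under `dir`, descending into subdirectories depth-first.
function listFilesRecursively(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFilesRecursively(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Usage: build a tiny temporary tree, list it, then clean up.
const root = mkdtempSync(join(tmpdir(), "loader-demo-"));
mkdirSync(join(root, "nested"));
writeFileSync(join(root, "a.txt"), "hello");
writeFileSync(join(root, "nested", "b.md"), "# world");

const found = listFilesRecursively(root)
  .map((p) => relative(root, p))
  .sort();
console.log(found); // relative paths; the separator is platform-specific
rmSync(root, { recursive: true, force: true });
```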
Currently, it supports reading `.txt`, `.pdf`, `.csv`, `.md`, `.docx`, `.htm`, `.html`, `.jpg`, `.jpeg`, `.png` and `.gif` files, but support for other file types is planned.
You can customize the reader in three ways:
- `overrideReader` overrides the reader for all file types, including unsupported ones.
- `fileExtToReader` maps a reader to a specific file extension. It can override the reader for an existing file type or add support for a new one.
- `defaultReader` sets a fallback reader for files with unsupported extensions. By default it is `TextFileReader`.
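One way to read these three options is as a precedence order for choosing each file's reader. The sketch below encodes the order as described here: `overrideReader` first, then the `fileExtToReader` entry for the file's extension, then `defaultReader`. `Reader`, `ReaderOptions` and `resolveReader` are hypothetical stand-ins so the snippet runs without llamaindex; the library's internal logic may differ in details such as how extensions are normalized.

```typescript
import { extname } from "node:path";

// Hypothetical stand-in for a llamaindex reader; only a name is needed here.
interface Reader {
  name: string;
}

// Hypothetical options object mirroring the three settings described above.
interface ReaderOptions {
  overrideReader?: Reader;
  fileExtToReader?: Record<string, Reader>;
  defaultReader?: Reader;
}

// Choose the reader for one file: overrideReader wins for every file,
// then a per-extension entry, then the fallback (undefined if none is set).
function resolveReader(filePath: string, opts: ReaderOptions): Reader | undefined {
  if (opts.overrideReader) return opts.overrideReader;
  const ext = extname(filePath).slice(1); // "report.pdf" -> "pdf", "README" -> ""
  const byExt = opts.fileExtToReader;
  if (byExt && Object.prototype.hasOwnProperty.call(byExt, ext)) {
    return byExt[ext];
  }
  return opts.defaultReader;
}

const text: Reader = { name: "TextFileReader" };
const pdf: Reader = { name: "PDFReader" };
const zip: Reader = { name: "ZipReader" };
const opts: ReaderOptions = { fileExtToReader: { pdf, zip }, defaultReader: text };

console.log(resolveReader("docs/report.pdf", opts)?.name); // PDFReader
console.log(resolveReader("archive.zip", opts)?.name); // ZipReader
console.log(resolveReader("notes.rst", opts)?.name); // TextFileReader
console.log(resolveReader("notes.rst", { ...opts, overrideReader: zip })?.name); // ZipReader
```

The own-property check keeps a file such as `x.toString` from matching inherited object keys and falls through to `defaultReader` instead.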
`SimpleDirectoryReader` supports up to 9 concurrent requests. Use the `numWorkers` option to set the number of concurrent requests; it defaults to 1, so files are read sequentially.
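To make the `numWorkers` semantics concrete, here is a generic bounded-concurrency pool: at most N tasks run at once, and N = 1 degenerates to sequential processing. `mapWithWorkers` and `fakeRead` are hypothetical helpers, not llamaindex exports, and clamping out-of-range values to the range 1 to 9 is an assumption that mirrors the limit stated above.

```typescript
// Hypothetical helper (not a llamaindex export): apply `task` to every item
// with at most `numWorkers` tasks in flight. Results keep the input order.
async function mapWithWorkers<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  numWorkers = 1,
): Promise<R[]> {
  // Assumption: clamp to 1..9, matching the limit described above.
  const limit = Math.min(Math.max(Math.floor(numWorkers), 1), 9);
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // The claim runs synchronously between awaits, so no index is taken twice.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: simulate slow file reads and record the peak number running at once.
let inFlight = 0;
let peak = 0;
async function fakeRead(name: string): Promise<string> {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return name.toUpperCase();
}

mapWithWorkers(["a.txt", "b.md", "c.csv", "d.pdf"], fakeRead, 2).then((out) => {
  console.log(out, "peak:", peak);
});
```

A worker pool rather than fixed-size batches keeps every slot busy: as soon as one file finishes, that worker picks up the next one.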
Example
```typescript
import type { Document, Metadata } from "llamaindex";
import { FileReader } from "llamaindex";
import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "llamaindex/readers/SimpleDirectoryReader";
import { TextFileReader } from "llamaindex/readers/TextFileReader";

// A custom reader for .zip files. loadDataAsContent receives the raw file
// bytes and must return the parsed documents; this one is only a stub.
class ZipReader extends FileReader {
  loadDataAsContent(fileContent: Uint8Array): Promise<Document<Metadata>[]> {
    throw new Error("Implement me");
  }
}

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  // Fallback for extensions that have no entry in fileExtToReader
  defaultReader: new TextFileReader(),
  fileExtToReader: {
    // Keep the built-in mappings and add support for .zip
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
```
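The `ZipReader` above is deliberately a stub. What a `loadDataAsContent` implementation has to do is turn the raw bytes into documents. The library-free sketch below shows that step for a hypothetical `.records` format (UTF-8 text with records separated by blank lines). `RecordsReader` and `SimpleDoc` are hypothetical stand-ins so the snippet runs without llamaindex; a real reader would extend `FileReader` and return `Document` objects instead.

```typescript
import { TextDecoder, TextEncoder } from "node:util";

// Minimal stand-in for llamaindex's Document (hypothetical, illustration only).
interface SimpleDoc {
  text: string;
  metadata: Record<string, string>;
}

// Hypothetical reader for a ".records" format. Its method mirrors
// loadDataAsContent(fileContent) from the example above, but this class
// does not extend FileReader.
class RecordsReader {
  async loadDataAsContent(fileContent: Uint8Array, fileName = "unknown"): Promise<SimpleDoc[]> {
    const text = new TextDecoder("utf-8").decode(fileContent);
    return text
      .split(/\r?\n\s*\r?\n/) // a blank line marks a record boundary
      .map((chunk) => chunk.trim())
      .filter((chunk) => chunk.length > 0)
      .map((chunk, i) => ({ text: chunk, metadata: { fileName, record: String(i) } }));
  }
}

// Usage: decode an in-memory file containing two records.
const bytes = new TextEncoder().encode("first record\nstill first\n\nsecond record\n");
new RecordsReader().loadDataAsContent(bytes, "demo.records").then((docs) => {
  for (const doc of docs) console.log(doc.metadata.record, JSON.stringify(doc.text));
});
```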