json-indexer

json-indexer is a TypeScript utility for efficient indexing of large JSON files. It allows you to parse files incrementally, minimizing memory usage while building a structured index for quick access to objects. This is particularly useful for scenarios where you need to work with massive JSON files containing arrays of objects.

Features

  • Efficient Parsing: Reads JSON files in chunks to handle large files without loading the entire content into memory.
  • Customizable Indexing: Allows you to define additional keys to include in the index.
  • Scalable: Suitable for large-scale data processing.
  • Type-Safe: Leverages TypeScript for strong typing and compile-time safety.

Installation

Install the package via npm:

npm install json-indexer

Usage

Example

Suppose you have a large JSON file (data.json) with the following structure:

{
  "shoes": [
    { "id": "1", "name": "Nike Air", "size": 42, "color": "black" },
    { "id": "2", "name": "Adidas Boost", "size": 43, "color": "white" },
    ...
  ]
}

You can use json-indexer to parse and index the shoes array like this:

import { JsonIndexer } from 'json-indexer';

// Your data type
interface Shoe {
    id: string;
    name: string;
    size: number;
    color: string;
}

// The resulting indexed data type
interface ShoeMetadata {
    // id, filePosition, and length are required
    id: string;
    filePosition: number;
    length: number;

    // Extra keys that should be added to the index
    name: string;
    size: number;
}

// Assume `file` is a File object representing your JSON file
const file = new File([/* file content */], "data.json", { 
  type: "application/json" 
});

// Create an instance of JsonIndexer
const indexer = new JsonIndexer(file);

// Build the index with additional properties
const shoeIndex = await indexer.index<ShoeMetadata>("shoes", ["name", "size"]);
/**
 * Output:
 * Map {
 *   "1" => { 
 *     id: "1",
 *     filePosition: 123,
 *     length: 456,
 *     name: "Nike Air",
 *     size: 42,
 *   },
 *   "2" => { ... }
 * }
 **/

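// Subsequent lookups: read only the slice of the file that holds one record
const metadata = shoeIndex.get('1');
if (metadata) {
    const chunk = file.slice(
        metadata.filePosition,
        metadata.filePosition + metadata.length
    );
    // Parse just this record instead of the whole file
    const record: Shoe = JSON.parse(await chunk.text());
}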
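
The lookup step above can be wrapped in a small reusable helper. What follows is a minimal sketch, not part of json-indexer: readRecord and IndexEntry are hypothetical names, and it assumes filePosition and length are byte offsets, as the file.slice call above implies.

```typescript
// Hypothetical helper, not part of json-indexer's API.
// Minimal shape of an index entry, mirroring the required fields above.
interface IndexEntry {
  id: string;
  filePosition: number;
  length: number;
}

// Reads one record back from the file using its index entry.
// Assumes filePosition and length are byte offsets into the file.
async function readRecord<R>(
  file: Blob,
  index: ReadonlyMap<string, IndexEntry>,
  id: string
): Promise<R | undefined> {
  const entry = index.get(id);
  if (!entry) return undefined; // unknown id: nothing to read

  // Blob.slice takes start and end byte offsets (end is exclusive).
  const chunk = file.slice(entry.filePosition, entry.filePosition + entry.length);
  return JSON.parse(await chunk.text()) as R;
}
```

With the index from the example, a call such as readRecord<Shoe>(file, shoeIndex, "1") would resolve to the parsed record, or to undefined when the id is not in the index.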

API Reference

JsonIndexer

A class for indexing JSON files.

Constructor

constructor(file: File, chunkSize = 1024 * 1024)
  • file (File): The JSON file to index.
  • chunkSize (number, optional): Size of each chunk read from the file (default: 1 MB).
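
To illustrate what chunkSize controls, here is a minimal sketch of reading a file in fixed-size chunks with the standard Blob.slice API. It shows the general technique only and is not JsonIndexer's actual implementation; readInChunks is a hypothetical name.

```typescript
// Hypothetical sketch of chunked reading; json-indexer's internals may differ.
async function* readInChunks(
  file: Blob,
  chunkSize = 1024 * 1024 // 1 MB, matching the documented default
): AsyncGenerator<string> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");

  // A streaming decoder keeps multi-byte UTF-8 characters intact
  // when they straddle a chunk boundary.
  const decoder = new TextDecoder("utf-8");

  for (let offset = 0; offset < file.size; offset += chunkSize) {
    // Only this slice is loaded into memory, never the whole file.
    const bytes = await file.slice(offset, offset + chunkSize).arrayBuffer();
    yield decoder.decode(bytes, { stream: true });
  }

  // Flush any bytes still buffered in the decoder.
  const tail = decoder.decode();
  if (tail) yield tail;
}
```

A consumer would iterate with for await (const text of readInChunks(file)) and process each piece as it arrives. A smaller chunkSize lowers peak memory at the cost of more read calls.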

Methods

index<T>

async index<T extends { id: string, filePosition: number, length: number }>(
    key: string,
    additionalIndexKeys: Array<RequiredAdditionalKeys<T>> = []
): Promise<Map<string, T>>
  • Generic type T must extend the base type containing id, filePosition, and length.
  • key (string): The key of the array to index (e.g., "shoes").
  • additionalIndexKeys (Array<RequiredAdditionalKeys<T>>, optional): Keys of T to include in the index, beyond the base id, filePosition, and length (default: []).
  • Returns a Promise resolving to a Map whose keys are the id values of the indexed objects and whose values are the index entries (the base metadata plus any additional keys).
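
The signature above names RequiredAdditionalKeys<T> without showing its definition. One plausible reading, offered purely as an assumption, is "the keys of T minus the three base fields". The sketch below uses the hypothetical names BaseEntry and ExtraKeys so that it is not mistaken for the library's own type.

```typescript
// Assumed, illustrative definitions; not taken from json-indexer's source.
interface BaseEntry {
  id: string;
  filePosition: number;
  length: number;
}

// Keys of T other than the three base fields.
type ExtraKeys<T extends BaseEntry> = Exclude<keyof T, keyof BaseEntry>;

interface ShoeMetadata extends BaseEntry {
  name: string;
  size: number;
}

// Accepted: both entries are extra keys of ShoeMetadata.
const shoeKeys: Array<ExtraKeys<ShoeMetadata>> = ["name", "size"];

// Rejected at compile time if uncommented: "id" is a base key and
// "colour" is not a key of ShoeMetadata at all.
// const badKeys: Array<ExtraKeys<ShoeMetadata>> = ["id", "colour"];
```

An array type like this restricts which keys may appear but cannot require that every key is present, so a separate check for missing keys would still be needed.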

Benefits

  • Memory Efficient: Processes the file in chunks, avoiding high memory usage.
  • Incremental Parsing: Supports working with large files incrementally.
  • Customizable Metadata: Add additional fields to the index for a more detailed representation of each object.
  • Flexible Type System: Generic type parameters at the method level improve type safety and reusability.

Error Handling

If you forget to include all required keys in additionalIndexKeys, the index() method will throw an error:

// This will throw an error because 'name' is required by the ShoeMetadata type
const index = await indexer.index<ShoeMetadata>("shoes", []);
// Error: Missing keys in additionalIndexKeys: name
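
Since index() is declared async, the failure surfaces as a rejected promise, which callers can handle with try/catch. The sketch below is an illustration under assumptions: tryIndex and IndexResult are hypothetical names, and it assumes only that index() rejects with a standard Error, as the message above suggests.

```typescript
// Hypothetical wrapper, not part of json-indexer.
type IndexResult<T> =
  | { ok: true; index: Map<string, T> }
  | { ok: false; message: string };

// Runs an index-building callback and converts a rejection into a value,
// so callers can branch on the outcome instead of catching.
async function tryIndex<T>(
  build: () => Promise<Map<string, T>>
): Promise<IndexResult<T>> {
  try {
    return { ok: true, index: await build() };
  } catch (err) {
    // Caught values are `unknown`; narrow before reading `.message`.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}
```

With the indexer from the usage example, this might be called as tryIndex(() => indexer.index<ShoeMetadata>("shoes", ["name", "size"])), followed by a check on result.ok before using result.index.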

License

This project is licensed under the MIT License.