Feature · @linda/parsers-*
Parse files in the browser.
PDFs, DOCXs, scans, audio. Everything parses client-side. The raw file never leaves the page; only the text you decide to send goes to the LLM.
Why browser-side?
Server-side file processing is a 700 MB Docker image, an S3 bucket, a multer endpoint, a Sharp dependency, and a privacy story you have to defend. Browser-side is none of that. Drop a PDF; pdf.js extracts text; the LLM answers; you log what you want.
The packages
| Package | Size | What it does |
|---|---|---|
| @linda/parsers | ~3 KB gz | Core registry, text / JSON / CSV parsers. |
| @linda/parsers-pdf | ~700 KB lazy | PDF text extraction via pdf.js. |
| @linda/parsers-office | ~1.1 MB lazy | DOCX (mammoth) and XLSX (SheetJS). |
| @linda/parsers-archive | ~30 KB lazy | ZIP unpack via fflate. |
| @linda/parsers-ocr | ~2–10 MB lazy | Image-to-text via tesseract.js. |
| @linda/parsers-ml | lazy | Embeddings + NER + summary via transformers.js. |
| @linda/parsers-audio | ~40 MB lazy | Whisper (WebGPU, COI-gated). |
Each @linda/parsers-* package is a thin shim (~1–3 KB gzipped). The heavy dependency is dynamically imported inside parse(), so its download cost is only paid when the parser is actually used.
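The lazy-loading pattern described above can be sketched roughly as follows. This is a minimal illustration, not Linda's actual implementation: `createLazyParser`, its `load` and `run` options, and the fake loader are hypothetical names introduced here. In a real shim the loader would be something like `() => import("pdfjs-dist")`.

```javascript
// Hypothetical helper (not part of @linda/parsers): wraps a heavy
// dependency behind a loader that only runs on the first parse() call.
function createLazyParser({ mime, load, run }) {
  let modulePromise = null; // cached so the dependency loads at most once

  return {
    mime,
    async parse(bytes) {
      if (modulePromise === null) {
        // In a real shim: load = () => import("pdfjs-dist")
        modulePromise = load().catch((err) => {
          modulePromise = null; // allow a retry after a failed load
          throw err;
        });
      }
      const mod = await modulePromise;
      return run(mod, bytes);
    },
  };
}

// Stand-in for a heavy module so the sketch runs without any dependency.
let loadCount = 0;
const fakeHeavyLoader = async () => {
  loadCount += 1;
  return { extractText: (bytes) => new TextDecoder().decode(bytes) };
};

const demoParser = createLazyParser({
  mime: "text/plain",
  load: fakeHeavyLoader,
  run: (mod, bytes) => ({ text: mod.extractText(bytes) }),
});
```

Creating `demoParser` costs nothing beyond the shim itself; the loader runs on the first `demoParser.parse(...)` call and its result is reused afterwards.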
Capability gating
Each parser declares the capabilities it requires. Audio (Whisper) needs WebGPU and a cross-origin-isolated (COI) context. On browsers without WebGPU, Linda silently skips registering that parser: no crashes, no errors.
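As a rough sketch of how such gating could work, the snippet below checks for WebGPU via `navigator.gpu` and for cross-origin isolation via the standard `crossOriginIsolated` global. The function names, the capability strings `"webgpu"` and `"coi"`, and the `Map`-based registry are assumptions made for illustration; Linda's real registry API may look different.

```javascript
// Hypothetical sketch: detect capabilities from an environment object
// (defaults to the real global scope when run in a browser).
function detectCapabilities(env = globalThis) {
  const caps = new Set();
  // navigator.gpu is the WebGPU entry point; its presence is a coarse check.
  if (env.navigator && "gpu" in env.navigator) caps.add("webgpu");
  // crossOriginIsolated is true only when COOP/COEP headers are in place.
  if (env.crossOriginIsolated === true) caps.add("coi");
  return caps;
}

// Registers the parser only if every required capability is present.
// Returns false (and does nothing else) when something is missing.
function registerIfSupported(registry, parser, caps) {
  const missing = (parser.requires ?? []).filter((cap) => !caps.has(cap));
  if (missing.length > 0) return false; // skip quietly, no throw
  registry.set(parser.mime, parser);
  return true;
}

// Example with fake environments so the sketch runs outside a browser.
const audioParser = { mime: "audio/wav", requires: ["webgpu", "coi"] };
const capableEnv = { navigator: { gpu: {} }, crossOriginIsolated: true };
const limitedEnv = { navigator: {}, crossOriginIsolated: false };
```

With `capableEnv` the audio parser is registered; with `limitedEnv` the call returns `false` and the registry is left untouched, which mirrors the "skip silently" behavior described above.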
Code
```javascript
import "@linda/parsers-pdf"; // registers the PDF parser

// User drops a PDF; Linda parses it in-browser.
// /user/files/<id>/parsed/text.md is now readable by the LLM.
linda.on("onFileUpload", async ({ file, parsed }) => {
  if (file.type === "application/pdf") {
    console.log("Parsed:", parsed.text.length, "chars");
  }
});
```
Output layout
```
/user/files/<id>/
├── info.md          # human summary
├── meta.json        # mime, size, hash, parser used
└── parsed/
    ├── text.md      # extracted text
    ├── pages.jsonl  # page-by-page (PDF)
    ├── tables.csv   # tables found
    └── ocr.txt      # OCR output (if image)
```
The model reads from the parsed artifacts. The raw bytes stay in browser memory; you can hash them, you can persist them via the durable-state hook, but they don't leave the page automatically.
🚀 See browser-side parsing live — drop a PDF and ask questions, all in the browser.