html-to-markdown

Convert HTML to Markdown, Djot, or plain text. One Rust core, 12 language bindings, identical output on every runtime. Part of the kreuzberg.dev document intelligence ecosystem.


Why html-to-markdown

  • Rust core

Single-pass DOM walk written in Rust. The same code path runs from Python, the browser, and the CLI — no per-language conversion logic.

  • 12 bindings

Rust, Python, TypeScript, Go, Ruby, PHP, Java, C#, Elixir, R, C, and WebAssembly. Every option maps one-to-one across all languages.

  • Three output formats

Markdown (CommonMark) by default, plus Djot and plain text via output_format. The same options apply to every format.

  • Metadata extraction

Document title, Open Graph, Twitter Card, JSON-LD, links, and images in one pass. Enabled by default — disable with extract_metadata: false.

  • Table extraction

Extracts HTML tables into result.tables with structured cells, row/column spans, and header flags, alongside the rendered Markdown.

  • Visitor pattern

42 element-level callbacks on the HtmlVisitor trait to skip, replace, or preserve any node. Zero cost when unused.
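
The visitor idea can be illustrated with a small toy that uses only Python's standard library. This is a sketch of the general technique, not the library's implementation or API: `Action`, `ToyVisitor`, `ToyConverter`, `DemoVisitor`, and `toy_convert` are hypothetical names, there is a single callback rather than 42, and only a handful of tags are rendered. It assumes well-formed input in which every element has an explicit closing tag (no void elements such as `<br>` or `<img>`).

```python
from enum import Enum
from html.parser import HTMLParser


class Action(Enum):
    """Hypothetical visitor decisions (not the library's real types)."""
    CONTINUE = "continue"  # use the default rendering
    SKIP = "skip"          # drop the element and everything inside it
    PRESERVE = "preserve"  # keep the element's tags as raw HTML


# Default Markdown markers for a few tags: (opening text, closing text).
MARKERS = {
    "h1": ("# ", "\n\n"),
    "p": ("", "\n\n"),
    "strong": ("**", "**"),
    "em": ("*", "*"),
    "code": ("`", "`"),
}


class ToyVisitor:
    """Return an Action, or a string to replace the whole element."""

    def visit_element(self, tag, attrs):
        return Action.CONTINUE


class ToyConverter(HTMLParser):
    def __init__(self, visitor=None):
        super().__init__(convert_charrefs=True)
        self.visitor = visitor
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped or replaced element
        self.raw_depth = 0   # > 0 while inside a preserved element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            self.skip_depth += 1
            return
        if self.raw_depth:
            self.raw_depth += 1
            self.out.append(self.get_starttag_text())
            return
        # With no visitor attached, no callback is ever invoked.
        decision = Action.CONTINUE
        if self.visitor is not None:
            decision = self.visitor.visit_element(tag, dict(attrs))
        if decision is Action.SKIP:
            self.skip_depth = 1
        elif decision is Action.PRESERVE:
            self.raw_depth = 1
            self.out.append(self.get_starttag_text())
        elif isinstance(decision, str):
            self.out.append(decision)  # replacement text
            self.skip_depth = 1        # ignore the original children
        else:
            self.out.append(MARKERS.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        elif self.raw_depth:
            self.raw_depth -= 1
            self.out.append(f"</{tag}>")
        else:
            self.out.append(MARKERS.get(tag, ("", ""))[1])

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def toy_convert(html, visitor=None):
    parser = ToyConverter(visitor)
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


class DemoVisitor(ToyVisitor):
    def visit_element(self, tag, attrs):
        if tag == "script":
            return Action.SKIP
        if tag == "kbd":
            return Action.PRESERVE
        if tag == "span" and attrs.get("class") == "ad":
            return "[ad removed]"
        return Action.CONTINUE


if __name__ == "__main__":
    sample = (
        "<h1>Title</h1><p>Hello <strong>world</strong>"
        "<script>track()</script> <kbd>Ctrl</kbd> "
        '<span class="ad">Buy now</span></p>'
    )
    print(toy_convert(sample, DemoVisitor()))
```

Everything happens during one walk over the parse events: the visitor is consulted once per element, and two depth counters are enough to drop or pass through whole subtrees without building a second tree.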


Language Support

| Language | Install | API Reference |
|---|---|---|
| Rust | cargo add html-to-markdown-rs | Reference |
| Python | pip install html-to-markdown | Reference |
| TypeScript / Node | npm install @kreuzberg/html-to-markdown | Reference |
| Go | go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3 | Reference |
| Ruby | gem install html-to-markdown | Reference |
| PHP | composer require kreuzberg-dev/html-to-markdown | Reference |
| Java | Maven dev.kreuzberg:html-to-markdown | Reference |
| C# | dotnet add package KreuzbergDev.HtmlToMarkdown | Reference |
| Elixir | {:html_to_markdown, "~> 3.4"} | Reference |
| R | install.packages("htmltomarkdown") | Reference |
| C (FFI) | Shared library + header | Reference |
| WebAssembly | npm install @kreuzberg/html-to-markdown-wasm | Reference |
| CLI | cargo install html-to-markdown-cli | CLI Guide |

Explore the Docs

  • Get Started

Install a binding and run your first convert() call.

Installation

  • Guides

Visitor pattern, table extraction, error handling.

Visitor pattern

  • Concepts

Architecture, conversion pipeline, plugin system.

Architecture

  • Reference

Options reference, generated API docs, per-language guides.

Configuration

  • CLI

Every conversion option as a command-line flag.

CLI Guide

  • Migration

Upgrading from earlier versions.

Migration


Part of kreuzberg.dev

html-to-markdown ships as a standalone library and as the HTML pipeline inside the Kreuzberg document intelligence stack.

Document intelligence core — text, tables, and metadata from 91+ file formats. Uses html-to-markdown for every HTML input.

Managed SaaS API for document extraction. Same engine, no infrastructure.

Web crawler that pairs with html-to-markdown for crawl-then-convert pipelines.

Universal LLM client — feed it the Markdown out of html-to-markdown.

306 Tree-sitter grammars on demand for code intelligence.

Community chat for kreuzberg.dev users and contributors.


Getting Help
