html-to-markdown
Convert HTML to Markdown, Djot, or plain text. One Rust core, 12 language bindings, identical output on every runtime. Part of the kreuzberg.dev document intelligence ecosystem.
Why html-to-markdown
- Rust core
Single-pass DOM walk written in Rust. The same code path runs from Python, the browser, and the CLI — no per-language conversion logic.
- 12 bindings
Rust, Python, TypeScript, Go, Ruby, PHP, Java, C#, Elixir, R, C, and WebAssembly. Every option has the same name in every language.
- Three output formats
Markdown (CommonMark) by default, plus Djot and plain text via output_format. The same options apply to every format.
- Metadata extraction
Document title, Open Graph, Twitter Card, JSON-LD, links, and images in one pass. Enabled by default — disable with extract_metadata: false.
- Table extraction
Extracts HTML tables into result.tables with structured cells, row/column spans, and header flags, alongside the rendered Markdown.
- Visitor pattern
42 element-level callbacks on the HtmlVisitor trait to skip, replace, or preserve any node. Zero cost when unused.
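To make the idea of element-level callbacks concrete, here is a rough sketch of a toy converter built with Python's standard library only. It is not html-to-markdown's API: `Node`, `TreeBuilder`, `toy_convert`, the `SKIP` sentinel, and the `visit_<tag>` naming are all hypothetical, and the real HtmlVisitor trait may be shaped quite differently. The sketch covers only the "skip" and "replace" outcomes; "preserve" is left out for brevity.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag
SKIP = object()  # hypothetical sentinel: drop the node and its whole subtree


@dataclass
class Node:
    tag: str  # element name, or "#text" for text nodes
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Build a tiny tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(Node("#text", text=data))


def render(node, visitor=None):
    """Walk the tree once, consulting the visitor before default rendering."""
    if node.tag == "#text":
        return node.text
    if visitor is not None:  # no visitor means no callback lookup at all
        hook = getattr(visitor, f"visit_{node.tag}", None)
        if hook is not None:
            outcome = hook(node)
            if outcome is SKIP:
                return ""
            if isinstance(outcome, str):  # replace the node with custom output
                return outcome
    inner = "".join(render(child, visitor) for child in node.children)
    if node.tag in ("strong", "b"):
        return f"**{inner}**"
    if node.tag in ("em", "i"):
        return f"*{inner}*"
    if node.tag == "a":
        return f"[{inner}]({node.attrs.get('href', '')})"
    if node.tag == "p":
        return inner + "\n\n"
    return inner  # unknown elements: keep their text content


def toy_convert(html, visitor=None):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return render(builder.root, visitor).strip()


class LinkAndAsideVisitor:
    """Example visitor: drop <aside> blocks, turn links into autolinks."""

    def visit_aside(self, node):
        return SKIP

    def visit_a(self, node):
        return f"<{node.attrs.get('href', '')}>"


SAMPLE = '<p>See <a href="https://example.com">the <b>docs</b></a>.</p><aside>Sponsored</aside>'

if __name__ == "__main__":
    print(toy_convert(SAMPLE))
    print(toy_convert(SAMPLE, LinkAndAsideVisitor()))
```

Returning a sentinel or a string from each callback keeps the default renderer in charge of everything the visitor does not handle, which is one plausible way to keep the no-visitor path cheap.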
Language Support
| Language | Install | API Reference |
|---|---|---|
| Rust | cargo add html-to-markdown-rs | Reference |
| Python | pip install html-to-markdown | Reference |
| TypeScript / Node | npm install @kreuzberg/html-to-markdown | Reference |
| Go | go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3 | Reference |
| Ruby | gem install html-to-markdown | Reference |
| PHP | composer require kreuzberg-dev/html-to-markdown | Reference |
| Java | Maven dev.kreuzberg:html-to-markdown | Reference |
| C# | dotnet add package KreuzbergDev.HtmlToMarkdown | Reference |
| Elixir | {:html_to_markdown, "~> 3.4"} | Reference |
| R | install.packages("htmltomarkdown") | Reference |
| C (FFI) | Shared library + header | Reference |
| WebAssembly | npm install @kreuzberg/html-to-markdown-wasm | Reference |
| CLI | cargo install html-to-markdown-cli | CLI Guide |
Explore the Docs
- Get Started
Install a binding and run your first convert() call.
- Guides
Visitor pattern, table extraction, error handling.
- Concepts
Architecture, conversion pipeline, plugin system.
- Reference
Options reference, generated API docs, per-language guides.
- CLI
Every conversion option as a command-line flag.
- Migration
Upgrading from earlier versions.
Part of kreuzberg.dev
html-to-markdown ships as a standalone library and as the HTML pipeline inside the Kreuzberg document intelligence stack.
Document intelligence core — text, tables, and metadata from 91+ file formats. Uses html-to-markdown for every HTML input.
Managed SaaS API for document extraction. Same engine, no infrastructure.
Web crawler that pairs with html-to-markdown for crawl-then-convert pipelines.
Universal LLM client — feed it the Markdown out of html-to-markdown.
306 Tree-sitter grammars on demand for code intelligence.
Community chat for kreuzberg.dev users and contributors.
Getting Help
- Bugs and feature requests — Open an issue on GitHub
- Community chat — Join the Discord
- Contributing — Read the contributor guide