Skip to content

CLI

The html-to-markdown CLI converts HTML files or URLs to Markdown from the command line.

Installation

cargo install html-to-markdown-cli

Or via Homebrew:

brew install kreuzberg-dev/tap/html-to-markdown

Basic Usage

# Convert stdin
echo '<h1>Title</h1><p>Content</p>' | html-to-markdown

# Convert a file
html-to-markdown input.html

# Convert a file and save output
html-to-markdown input.html -o output.md

# Fetch and convert a remote URL
html-to-markdown --url https://example.com > output.md

Input

Flag Description
FILE Input HTML file. Use - or omit for stdin.
--url URL Fetch HTML from a URL (alternative to file/stdin).
--user-agent UA Custom User-Agent header when using --url.
-e, --encoding ENCODING Input character encoding (default: utf-8).

Output

Flag Description
-o, --output FILE Write output to file (default: stdout).
-f, --output-format FORMAT Output format: markdown (default) or djot.

Heading Options

Flag Values Default Description
--heading-style STYLE atx, underlined, atx-closed atx How headings are formatted. atx uses # prefixes; underlined uses ===/--- for h1/h2.

List Options

Flag Values Default Description
--list-indent-type TYPE spaces, tab spaces Indentation character for nested lists.
--list-indent-width N 1–8 2 Spaces per nesting level.
-b, --bullets CHARS e.g. *+- - Characters to cycle through for unordered list markers.

Text Formatting

Flag Values Default Description
--strong-em-symbol CHAR *, _ * Symbol for bold and italic.
--newline-style STYLE backslash, spaces backslash How <br> tags are rendered.
--sub-symbol SYMBOL any string "" Wrapper symbol for <sub> text.
--sup-symbol SYMBOL any string "" Wrapper symbol for <sup> text.
--highlight-style STYLE double-equal, html, bold, none double-equal Rendering of <mark> elements.
--escape-asterisks off Escape * characters.
--escape-underscores off Escape _ characters.
--escape-misc off Escape [, ], <, >, #, etc.
--escape-ascii off Escape all ASCII punctuation (strict CommonMark).

Code Blocks

Flag Values Default Description
--code-block-style STYLE indented, backticks, tildes indented Format for multi-line code blocks.
-l, --code-language LANG any string "" Default language tag for fenced code blocks.
Flag Values Default Description
--no-autolinks off Disable autolink conversion. By default, when link text equals the href, the output is <url>. Pass this flag to emit [url](url) instead.
--default-title off Use href as link title when no title attribute exists.
--link-style STYLE inline, reference inline inline emits [text](url). reference emits [text][1] with numbered definitions at the end of the document.

Images

Flag Description
--keep-inline-images-in ELEMENTS Comma-separated element names where images stay as ![alt](src) (e.g. a,strong).
--skip-images Drop all <img> elements entirely. No ![alt](src) output, no alt-text fallback.

Tables

Flag Description
--br-in-tables Preserve line breaks in table cells as <br> rather than converting to spaces.

Whitespace

Flag Values Default Description
--whitespace-mode MODE normalized, strict normalized Whitespace handling strategy.
--strip-newlines off Remove all newlines from input before processing.

Wrapping

Flag Values Default Description
-w, --wrap off Enable output line wrapping.
--wrap-width N 20–500 80 Column width for wrapping.

Element Handling

Flag Description
--convert-as-inline Treat block elements as inline (no paragraph breaks).
--strip-tags TAGS Comma-separated tags to strip (text content preserved, no Markdown conversion).
--preserve-tags TAGS Comma-separated tags to emit verbatim as raw HTML instead of converting.
--max-depth N Silently truncate subtrees beyond this DOM nesting depth. Omit for unlimited depth.

Metadata

Flag Description
--extract-metadata Prepend a metadata comment block (title, description, Open Graph, links, images) to the Markdown output. In --json mode, populates the metadata field.

JSON Output

--json swaps the default Markdown output for a full ConversionResult object: content, metadata, tables, document, images, and warnings on a single JSON value. The flags in this section control which fields are populated.

Flag Description
--json Output a full ConversionResult as JSON instead of Markdown.
--include-structure Populate document with the parsed semantic tree. Requires --json.
--extract-inline-images Populate images with extracted data URIs and SVGs. Requires --json.
--no-content Skip Markdown rendering. content is empty, metadata and structure still populate. Requires --json.
--show-warnings Print each processing warning to stderr as Warning [<kind>]: <message>. Works with or without --json.

Preprocessing

Flag Values Default Description
-p, --preprocess off Clean up HTML before conversion.
--preset LEVEL minimal, standard, aggressive standard Preprocessing aggressiveness (requires --preprocess).
--keep-navigation off Preserve <nav> elements during preprocessing (requires --preprocess).
--keep-forms off Preserve form elements during preprocessing (requires --preprocess).

Debugging

Flag Description
--debug Output diagnostic warnings and information.

Shell Completions and Man Page

# Generate shell completions
html-to-markdown --generate-completion bash > html-to-markdown.bash
html-to-markdown --generate-completion zsh > _html-to-markdown
html-to-markdown --generate-completion fish > html-to-markdown.fish

# Generate man page
html-to-markdown --generate-man > html-to-markdown.1

Examples

# Web scraping with aggressive preprocessing
html-to-markdown page.html --preprocess --preset aggressive

# Extract full structured result as JSON
html-to-markdown input.html --json \
    --extract-metadata --include-structure \
    -o output.json

# Discord/Slack-friendly output (2-space list indents)
html-to-markdown input.html --list-indent-width 2

# Custom heading and list styles
html-to-markdown input.html \
    --heading-style atx \
    --bullets '*' \
    --list-indent-width 2

# Fetch and convert with Djot output
html-to-markdown --url https://example.com --output-format djot

CLI Flag ↔ ConversionOptions Mapping

Each CLI flag has a corresponding ConversionOptions field. Library users can cross-reference here when translating a CLI invocation to code (or vice versa).

CLI Flag ConversionOptions field Notes
--output-format FORMAT output_format "markdown" | "djot" | "plain" | "none"
--heading-style STYLE heading_style "atx" | "underlined" | "atx-closed"
--list-indent-type TYPE list_indent_type "spaces" | "tab"
--list-indent-width N list_indent_width integer
--bullets CHARS bullets string
--strong-em-symbol CHAR strong_em_symbol "*" | "_"
--newline-style STYLE newline_style "backslash" | "spaces"
--sub-symbol SYMBOL sub_symbol string
--sup-symbol SYMBOL sup_symbol string
--highlight-style STYLE highlight_style "double-equal" | "html" | "bold" | "none"
--escape-asterisks escape_asterisks boolean flag
--escape-underscores escape_underscores boolean flag
--escape-misc escape_misc boolean flag
--escape-ascii escape_ascii boolean flag
--code-block-style STYLE code_block_style "indented" | "backticks" | "tildes"
-l, --code-language LANG code_language string
--no-autolinks autolinks inverted: flag sets autolinks = false; default is true
--default-title default_title boolean flag
--link-style STYLE link_style "inline" | "reference"
--keep-inline-images-in ELEMS keep_inline_images_in comma-separated tag list
--skip-images skip_images boolean flag
--br-in-tables br_in_tables boolean flag
--whitespace-mode MODE whitespace_mode "normalized" | "strict"
--strip-newlines strip_newlines boolean flag
-w, --wrap wrap boolean flag
--wrap-width N wrap_width integer
--convert-as-inline convert_as_inline boolean flag
--strip-tags TAGS strip_tags comma-separated tag list
--preserve-tags TAGS preserve_tags comma-separated tag list
--max-depth N max_depth integer
--extract-metadata extract_metadata boolean flag
--include-structure include_document_structure boolean flag; --json only
--extract-inline-images extract_images boolean flag; --json only
-p, --preprocess preprocess boolean flag
--preset LEVEL preset "minimal" | "standard" | "aggressive"
--keep-navigation keep_navigation boolean flag
--keep-forms keep_forms boolean flag
-e, --encoding ENCODING encoding CLI only — decoded before convert()
--debug debug CLI only — diagnostic output to stderr

Flags without a ConversionOptions counterpart: FILE, --url, --user-agent, -o/--output, --json, --no-content, --show-warnings, --generate-completion, --generate-man.


Found a bug or mistake on this page?

If something here is wrong or out of date, open an issue on GitHub or contribute a fix via pull request.

Edit this page on GitHub