Basic Conversion

This guide walks through the fundamentals of converting HTML to Markdown with html-to-markdown. You will learn how to perform simple conversions, handle common HTML patterns, and apply basic options.

Quick Start

The simplest conversion takes an HTML string and returns Markdown:

Examples follow in this order: Python, TypeScript, Rust, Ruby, PHP, Go, C, Elixir.

from html_to_markdown import convert

html = "<h1>Hello</h1><p>This is <strong>fast</strong>!</p>"
markdown = convert(html)

import { convert } from '@kreuzberg/html-to-markdown';

const markdown: string = convert('<h1>Hello World</h1>');
console.log(markdown); // # Hello World

use html_to_markdown_rs::convert;

let html = "<h1>Hello</h1><p>This is <strong>fast</strong>!</p>";
let markdown = convert(html, None).unwrap();
// # Hello
//
// This is **fast**!

require 'html_to_markdown'

html = "<h1>Hello</h1><p>This is <strong>fast</strong>!</p>"
markdown = HtmlToMarkdown.convert(html)

use HtmlToMarkdown\Service\Converter;
use function HtmlToMarkdown\convert;

// Object-oriented usage
$converter = Converter::create();
$markdown = $converter->convert('<h1>Hello</h1><p>This is <strong>fast</strong>!</p>');

// Procedural helper
$markdown = convert('<h1>Hello</h1>');

package main

import (
    "fmt"
    "github.com/kreuzberg-dev/html-to-markdown/packages/go/v2/htmltomarkdown"
)

func main() {
    markdown, _ := htmltomarkdown.Convert("<h1>Hello</h1><p>World</p>")
    fmt.Println(markdown)
}

#include "html_to_markdown.h"
#include <stdio.h>

int main(void) {
    const char *html = "<h1>Hello</h1><p>World</p>";
    char *markdown = html_to_markdown_convert(html);
    if (markdown) {
        printf("%s\n", markdown);
        html_to_markdown_free_string(markdown);
    }
    return 0;
}

{:ok, markdown} = HtmlToMarkdown.convert("<h1>Hello</h1><p>World</p>")
IO.puts(markdown)

Converting HTML Fragments

You do not need to provide a complete HTML document. html-to-markdown handles fragments gracefully:

from html_to_markdown import convert

# Full document
convert("<html><body><h1>Title</h1></body></html>")

# Fragment -- works equally well
convert("<h1>Title</h1><p>Paragraph</p>")

# Single element
convert("<strong>Bold text</strong>")

# Plain text (no HTML tags) -- uses fast path
convert("Just plain text")

Common HTML Patterns

Headings

<h1>Main Title</h1>
<h2>Subtitle</h2>
<h3>Section</h3>

Converts to (with default ATX style):

# Main Title

## Subtitle

### Section
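
The ATX style shown above prefixes each heading with one `#` per level. As a rough illustration of that output format only (not of the library's internals), the sketch below renders a heading in ATX form and, for contrast, in the Setext form that Markdown also allows for levels 1 and 2. `render_heading` is a hypothetical helper introduced here; it is not part of html-to-markdown.

```python
def render_heading(level: int, text: str, style: str = "atx") -> str:
    """Render a Markdown heading (hypothetical helper, for illustration only)."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    if style == "setext" and level <= 2:
        # Setext headings underline the text: '=' for level 1, '-' for level 2.
        underline = "=" if level == 1 else "-"
        return f"{text}\n{underline * len(text)}"
    # ATX headings: one '#' per level, a space, then the text.
    return f"{'#' * level} {text}"


print(render_heading(1, "Main Title"))
print(render_heading(2, "Subtitle", style="setext"))
```

Setext has no form for levels 3 to 6, which is why the helper falls back to ATX for deeper headings.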

Paragraphs and Formatting

<p>This is a paragraph with <strong>bold</strong>, <em>italic</em>,
and <code>inline code</code>.</p>
<p>A second paragraph with a <a href="https://example.com">link</a>.</p>

Converts to:

This is a paragraph with **bold**, *italic*, and `inline code`.

A second paragraph with a [link](https://example.com).

Lists

<ul>
  <li>First item</li>
  <li>Second item
    <ul>
      <li>Nested item</li>
    </ul>
  </li>
</ul>
<ol>
  <li>Step one</li>
  <li>Step two</li>
</ol>

Converts to:

- First item
- Second item
  - Nested item

1. Step one
2. Step two
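
The nested item above is indented by two spaces per nesting level. The sketch below reproduces that layout with a hypothetical `render_list` helper. It assumes that an indent width counts spaces per nesting level, which is also how the `list_indent_width` option used later in this guide is presumed to behave; the guide does not spell this out. The helper only illustrates the output shape and is not the library's implementation.

```python
from typing import List, Tuple

# Each item is (text, children); children is a list of items in the same shape.
Item = Tuple[str, list]


def render_list(
    items: List[Item],
    indent_width: int = 2,
    depth: int = 0,
    ordered: bool = False,
) -> List[str]:
    """Render nested list items as Markdown lines (hypothetical helper)."""
    lines = []
    for number, (text, children) in enumerate(items, start=1):
        marker = f"{number}." if ordered else "-"
        lines.append(" " * (indent_width * depth) + f"{marker} {text}")
        # Children are rendered one level deeper, as bullet items.
        lines.extend(render_list(children, indent_width, depth + 1))
    return lines


items = [("First item", []), ("Second item", [("Nested item", [])])]
print("\n".join(render_list(items)))
```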

Code Blocks

<pre><code class="language-python">def hello():
    print("Hello, world!")
</code></pre>

Converts to:

```python
def hello():
    print("Hello, world!")
```

Tables

<table>
  <thead>
    <tr><th>Name</th><th>Language</th></tr>
  </thead>
  <tbody>
    <tr><td>PyO3</td><td>Python</td></tr>
    <tr><td>NAPI-RS</td><td>TypeScript</td></tr>
  </tbody>
</table>

Converts to:

| Name | Language |
| --- | --- |
| PyO3 | Python |
| NAPI-RS | TypeScript |
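
The pipe-table layout above is a header row, a delimiter row of `---` cells, and one line per body row. As a minimal sketch of that format (again an illustration, not the library's code), the hypothetical `render_table` helper below builds the same output from plain Python lists and escapes literal `|` characters, which would otherwise be read as cell boundaries.

```python
from typing import List, Sequence


def render_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render a pipe table as Markdown text (hypothetical helper)."""

    def row_line(cells: Sequence[str]) -> str:
        # Escape pipes inside cells so they are not parsed as column separators.
        escaped: List[str] = [cell.replace("|", "\\|") for cell in cells]
        return "| " + " | ".join(escaped) + " |"

    lines = [row_line(header), "| " + " | ".join(["---"] * len(header)) + " |"]
    lines.extend(row_line(row) for row in rows)
    return "\n".join(lines)


print(render_table(["Name", "Language"], [["PyO3", "Python"], ["NAPI-RS", "TypeScript"]]))
```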

Images

<img src="photo.jpg" alt="A sunset" title="Beautiful sunset">

Converts to:

![A sunset](photo.jpg "Beautiful sunset")

Blockquotes

<blockquote>
  <p>To be or not to be, that is the question.</p>
  <p>-- Shakespeare</p>
</blockquote>

Converts to:

> To be or not to be, that is the question.
>
> -- Shakespeare

Using Options

Pass a configuration object to control output formatting:

Examples follow in this order: Python, TypeScript, Rust, Ruby, PHP.

from html_to_markdown import ConversionOptions, convert

html = "<h1>Hello</h1><p>This is <strong>formatted</strong> content.</p>"
options = ConversionOptions(
    heading_style="atx",
    list_indent_width=2,
)
markdown = convert(html, options)

import { convert, ConversionOptions } from '@kreuzberg/html-to-markdown';

const options: ConversionOptions = {
  headingStyle: 'atx',
  listIndentWidth: 2,
  wrap: true,
};

const markdown = convert('<h1>Title</h1><p>Content</p>', options);

use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};

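let html = "<h1>Hello</h1><p>This is <strong>formatted</strong> content.</p>";
let options = ConversionOptions {
    heading_style: HeadingStyle::Atx,
    list_indent_width: 2,
    ..Default::default()
};
// `?` propagates the error, so this must run inside a function that returns a Result.
let markdown = convert(html, Some(options))?;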

require 'html_to_markdown'

html = "<h1>Hello</h1><p>This is <strong>fast</strong>!</p>"
markdown = HtmlToMarkdown.convert(html, heading_style: :atx, code_block_style: :fenced)

use HtmlToMarkdown\Config\ConversionOptions;
use HtmlToMarkdown\Service\Converter;

$converter = Converter::create();

$options = new ConversionOptions(
    headingStyle: 'Atx',
    listIndentWidth: 2,
);

$markdown = $converter->convert('<h1>Hello</h1>', $options);

For a complete list of all configuration options, see the Configuration Options guide.
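
When several documents share the same settings, the options object can be built once and passed to every call. The sketch below assumes only the `convert(html, options)` call shape from the Python example in this section. `convert_many` is a hypothetical helper, and the converter is passed in as a parameter so that the snippet runs without the library installed.

```python
from typing import Any, Callable, Dict, Mapping


def convert_many(
    documents: Mapping[str, str],
    convert_fn: Callable[[str, Any], str],
    options: Any,
) -> Dict[str, str]:
    """Convert each named HTML document with one shared options object.

    Hypothetical helper: `convert_fn` is expected to behave like the
    library's `convert(html, options)`.
    """
    return {name: convert_fn(html, options) for name, html in documents.items()}
```

With the library installed, a call would look like `convert_many(pages, convert, ConversionOptions(heading_style="atx"))`, where `pages` maps a name to an HTML string.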

Error Handling

Conversion fails when the input is not HTML text, for example binary data such as a PDF file. Catch the error at the call site rather than assuming the conversion succeeded:

Examples follow in this order: Python, TypeScript, Rust, Go.

from html_to_markdown import convert, ConversionError

try:
    markdown = convert(html_input)
except ConversionError as e:
    print(f"Conversion failed: {e}")

import { convert } from '@kreuzberg/html-to-markdown';

try {
  const markdown = convert(htmlInput);
} catch (error) {
  console.error('Conversion failed:', error);
}

use html_to_markdown_rs::{convert, ConversionError};

match convert(html, None) {
    Ok(markdown) => println!("{}", markdown),
    Err(ConversionError::InvalidInput(msg)) => {
        eprintln!("Invalid input: {}", msg);
    }
    Err(e) => eprintln!("Error: {}", e),
}

markdown, err := htmltomarkdown.Convert(html)
if err != nil {
    log.Fatalf("Conversion failed: %v", err)
}

Note: Binary input rejection

html-to-markdown validates input and rejects binary data such as PDF files, images, and other non-text content. If you receive an InvalidInput error, verify that your input is actually HTML text.
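
If input arrives as raw bytes (an upload or a download, say), a cheap pre-check can catch obvious non-HTML payloads before conversion is attempted. The sketch below is not the library's own validation, which this guide does not describe. `looks_binary` is a hypothetical helper that only compares the first bytes against a few well-known file signatures.

```python
# File signatures ("magic numbers") for a few common binary formats.
BINARY_SIGNATURES = (
    b"%PDF-",               # PDF
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"\xff\xd8\xff",        # JPEG
    b"GIF87a",              # GIF
    b"GIF89a",              # GIF
    b"PK\x03\x04",          # ZIP (also DOCX, XLSX, EPUB)
)


def looks_binary(data: bytes) -> bool:
    """Return True if the payload starts with a known binary file signature."""
    return data.startswith(BINARY_SIGNATURES)


print(looks_binary(b"%PDF-1.7\n..."))
print(looks_binary(b"<h1>Hello</h1>"))
```

The check deliberately avoids scanning for NUL bytes, because UTF-16 text contains them and would be flagged by mistake. A payload that passes this check can still be rejected by the library, so the error handling shown above remains necessary.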

Tips and Best Practices

Let the library handle malformed HTML. The html5ever parser implements browser-grade error recovery, so you do not need to pre-clean HTML before passing it to html-to-markdown.
Use the default options first. The defaults produce clean, CommonMark-compatible Markdown. Only customize when you have a specific need.
Reuse option objects. If you are converting multiple documents with the same settings, create the options once and pass them to each call.
Fragment conversion is safe. You can pass <div> fragments, single elements, or even plain text. No <html> wrapper is needed.
Character encoding. The library expects UTF-8 input. UTF-16 input is detected and recovered automatically; for any other encoding, convert the source to UTF-8 first.
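
For the encoding tip, the Python examples in this guide pass `convert` a `str`, so in practice the conversion step is decoding the source bytes with their real encoding before calling the library. The sketch below assumes the source encoding is already known (Windows-1252 in the example). `decode_html` is a hypothetical helper, not a library function.

```python
def decode_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes into text using the source's known encoding.

    Decoding is strict, so a wrong encoding raises UnicodeDecodeError
    instead of silently producing corrupted text.
    """
    return raw.decode(encoding, errors="strict")


legacy_bytes = "<p>Café</p>".encode("cp1252")  # bytes from a legacy source
html = decode_html(legacy_bytes, encoding="cp1252")
print(html)
```

The resulting `html` string can then be passed to `convert` as in the earlier examples.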

Next Steps

Configuration Options -- customize every aspect of the conversion
Metadata Extraction -- extract titles, links, headers, and structured data
Visitor Pattern -- programmatic customization of element conversion
CLI Usage -- convert files from the command line