Contributing¶
Thank you for your interest in contributing to html-to-markdown. This guide covers development setup, testing, code quality standards, and the pull request process.
Prerequisites¶
Core Development¶
- Rust 1.80+ (stable)
- Python 3.10+ -- for Python bindings and scripts
- uv -- Python package manager
- Task -- Task runner
- prek -- Pre-commit hooks (uv tool install prek)
Optional (Language-Specific)¶
- Node.js 18+ and pnpm 10+ -- for TypeScript/WASM bindings
- Ruby 3.2+ with bundler -- for Ruby gem
- PHP 8.4+ with Composer -- for PHP extension
- wasm-pack -- for WASM builds (cargo install wasm-pack)
Development Setup¶
Clone the repository and run the setup task:
This will:
- Install Python dependencies with uv sync
- Build the Rust extension with maturin
- Install prek hooks for commit linting and code quality
Install pre-commit hooks:
Running Tests¶
All Tests¶
This runs both Rust and Python test suites.
Rust Tests¶
Or directly:
Python Tests¶
TypeScript / WASM Tests¶
For specific packages:
pnpm run test:node # NAPI-RS bindings
pnpm run test:wasm # WebAssembly bindings
pnpm run test:ts # TypeScript package
Ruby Tests¶
Coverage¶
This generates Rust and Python coverage reports in lcov format.
Coverage thresholds:
- Rust core: 95% minimum
- Python bindings: 80% minimum
- TypeScript bindings: 80% minimum
- Ruby bindings: 80% minimum
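To check a report against these thresholds locally, the sketch below sums the LF (lines found) and LH (lines hit) records of an lcov tracefile and compares the result to a minimum. The function names and the example file path are hypothetical, and CI may compute coverage differently (for example per crate or with exclusions), so treat this as an approximation rather than the project's actual gate.

```shell
#!/bin/sh
# Sketch only: aggregate line coverage from an lcov tracefile.
# LF:<n> = instrumented lines found, LH:<n> = lines hit (one pair per source file).

lcov_line_coverage() {
  # Usage: lcov_line_coverage <lcov-file>
  # Prints the overall line coverage as a percentage with two decimals.
  awk -F: '
    /^LF:/ { found += $2 }
    /^LH:/ { hit += $2 }
    END {
      if (found == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * hit / found
    }
  ' "$1"
}

meets_threshold() {
  # Usage: meets_threshold <lcov-file> <minimum-percent>
  # Exit status 0 when coverage >= minimum, 1 otherwise.
  awk -v p="$(lcov_line_coverage "$1")" -v min="$2" \
    'BEGIN { exit !(p + 0 >= min + 0) }'
}

# Example (hypothetical path): meets_threshold coverage/rust.lcov 95
```

The same function can be reused with 80 as the minimum for the binding reports.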
Code Quality¶
Formatting and Linting¶
Rust¶
- Formatting: cargo fmt --all
- Linting: cargo clippy --workspace -- -D warnings (zero warnings enforced)
Python¶
- Formatting: ruff format (120 character line length)
- Linting: ruff check with strict rule set
- Type checking: mypy in strict mode
TypeScript¶
- Linting/Formatting: Biome
- Type checking: TypeScript 5.x in strict mode
Pre-commit Hooks¶
All Rust and Python checks run automatically on commit via prek. To run all hooks manually:
Use prek, not pre-commit
This project uses prek for pre-commit hooks, not the pre-commit tool. They are different tools -- make sure you have prek installed.
Benchmarking¶
Performance regressions greater than 5% will fail CI. Always run benchmarks before submitting performance-related changes.
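As a rough illustration of the 5% rule, the sketch below compares a baseline timing with a candidate timing and reports whether the slowdown exceeds the budget. The function names are hypothetical, and how CI actually measures and aggregates benchmark results is an assumption not covered by this guide.

```shell
#!/bin/sh
# Sketch only: compare two timings (same unit, lower is better, baseline > 0).

regression_percent() {
  # Usage: regression_percent <baseline> <candidate>
  # Prints the relative change in percent; positive means slower.
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.2f\n", 100 * (c - b) / b }'
}

exceeds_budget() {
  # Usage: exceeds_budget <baseline> <candidate> [max-percent]
  # Exit status 0 when the slowdown is strictly greater than the budget (default 5).
  awk -v b="$1" -v c="$2" -v max="${3:-5}" \
    'BEGIN { exit !(100 * (c - b) / b > max) }'
}

# Example: a run that went from 200 ms to 212 ms is 6% slower.
# exceeds_budget 200 212 && echo "regression over budget"
```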
Making Changes¶
Rust Core Changes¶
- Edit code in crates/html-to-markdown/src/
- Run Rust tests: task test:rust
- Rebuild language bindings as needed:
  - Python: task build
  - Node.js: cd crates/html-to-markdown-node && pnpm run build
  - WASM: cd crates/html-to-markdown-wasm && pnpm run build:all
- Run integration tests across affected bindings
Python API Changes¶
- Edit code in packages/python/html_to_markdown/
- Update type stubs in _rust.pyi if changing the API surface
- Run tests: task test:python
TypeScript / Node.js Changes¶
- Edit Rust code in crates/html-to-markdown-node/src/lib.rs
- Rebuild: pnpm run build
- Test: pnpm test
Adding Tests¶
- Rust tests: crates/*/src/lib.rs or crates/*/tests/
- Python tests: packages/python/tests/ (pytest patterns)
- TypeScript tests: packages/typescript/tests/ (vitest)
- Ruby specs: packages/ruby/spec/
Commit Guidelines¶
Commits must follow Conventional Commits:
feat: add support for definition lists
fix: handle nested blockquotes correctly
docs: update visitor pattern guide
refactor: simplify table cell parsing
test: add edge case tests for hOCR conversion
The prek commitlint hook enforces this format automatically.
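To check a subject line before committing, the following sketch approximates the Conventional Commits shape (type, optional scope, optional "!", then ": " and a description) with a regular expression. The list of accepted types is an assumption based on commonly used ones; the project's commitlint configuration is authoritative and may accept or reject more than this check does. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: approximate Conventional Commits check for a commit subject line.
# The type list below is an assumption, not the project's commitlint config.

is_conventional_subject() {
  # Usage: is_conventional_subject "<commit message>"
  # Only the first line (the subject) is inspected.
  printf '%s\n' "$1" | head -n 1 | grep -Eq \
    '^(feat|fix|docs|refactor|test|chore|perf|build|ci|style|revert)(\([a-z0-9._-]+\))?!?: .+'
}

# Example:
# is_conventional_subject "fix: handle nested blockquotes correctly" && echo ok
```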
Pull Request Process¶
- Fork the repository
- Create a feature branch: git checkout -b feat/amazing-feature
- Make your changes following the guidelines above
- Run tests and linting: task test and task lint
- Commit with conventional commit format
- Push and create a pull request
PR Checklist¶
- Tests pass (task test)
- Linting passes (task lint)
- New public APIs have documentation and examples
- Breaking changes include migration notes
- Coverage thresholds are maintained
CI Workflows¶
Pull requests trigger path-filtered CI workflows:
- ci-rust -- Rust core tests and linting
- ci-python -- Python binding tests
- ci-node -- Node.js binding tests
- ci-wasm -- WebAssembly binding tests
- ci-ruby -- Ruby binding tests
- ci-php -- PHP binding tests
- ci-go -- Go binding tests
- ci-java -- Java binding tests
- ci-elixir -- Elixir binding tests
- ci-validate -- Cross-cutting validation checks
All relevant workflows must pass before merging.
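Path filtering means a workflow runs only when files under certain paths change. The sketch below illustrates the idea with a purely hypothetical mapping from a changed path to a single workflow name. The real filters live in the workflow definitions, are not described in this guide, and may well trigger several workflows for one change (a core change, for instance, plausibly affects every binding).

```shell
#!/bin/sh
# Sketch only: HYPOTHETICAL mapping from a changed file path to one workflow.
# The actual path filters are defined in the CI workflow files.

workflow_for_path() {
  # Usage: workflow_for_path <repo-relative-path>
  case "$1" in
    crates/html-to-markdown/*)                       echo "ci-rust" ;;
    crates/html-to-markdown-py/*|packages/python/*)  echo "ci-python" ;;
    crates/html-to-markdown-node/*)                  echo "ci-node" ;;
    crates/html-to-markdown-wasm/*)                  echo "ci-wasm" ;;
    packages/ruby/*)                                 echo "ci-ruby" ;;
    *)                                               echo "ci-validate" ;;
  esac
}

# Example: list the workflows a branch would plausibly touch under this mapping.
# git diff --name-only main... | while read -r f; do workflow_for_path "$f"; done | sort -u
```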
Project Structure¶
html-to-markdown/
├── crates/ # Rust crates
│ ├── html-to-markdown/ # Core conversion library
│ ├── html-to-markdown-cli/ # CLI binary
│ ├── html-to-markdown-ffi/ # C FFI shared library
│ ├── html-to-markdown-node/ # NAPI-RS bindings (Node.js)
│ ├── html-to-markdown-wasm/ # wasm-bindgen (browsers)
│ ├── html-to-markdown-py/ # PyO3 bindings (Python)
│ └── html-to-markdown-php/ # ext-php-rs (PHP)
├── packages/ # Language-specific packages
│ ├── python/ # PyPI package
│ ├── typescript/ # npm TypeScript package
│ ├── ruby/ # RubyGems gem
│ ├── php/ # Composer package
│ ├── go/ # Go module
│ ├── java/ # Maven package
│ ├── csharp/ # NuGet package
│ ├── elixir/ # Hex package
│ └── r/ # R package
├── docs/ # Documentation (MkDocs)
├── examples/ # Runnable examples
├── scripts/ # Build and utility scripts
└── tools/ # Development tools (benchmark harness)
Getting Help¶
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Kreuzberg Community
License¶
By contributing, you agree that your contributions will be licensed under the MIT License.