# CommonMark Implementation Design

## 1. Overview
This design document details the technical strategy for implementing the remaining core CommonMark features in the MacroDown library. Based on the CommonMark specification and the requirements listed in the CommonMark Help page, the currently missing features are **Lists (Ordered and Unordered)**, **Thematic Breaks (Horizontal Rules)**, and **Indented Code Blocks**. 

This document explains, step by step, how to extend the existing parsing pipeline (`BlockParser`, `Converter`, and `StandardLibrary`) to fully support these constructs. The focus is strictly on extending the two-phase parsing strategy (Block Phase followed by Inline Phase/Macro Conversion), and the guidance is detailed enough for implementers who are new to the codebase.

## 2. Lists (Unordered and Ordered)

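Lists in CommonMark are container blocks whose children are List Item blocks. A list is formed from consecutive list items of the same type: unordered items must use the same bullet character (`-`, `+`, or `*`), and ordered items must use the same delimiter (`.` or `)`). Changing the bullet or delimiter starts a new list.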

### 2.1 Extending the `Block` Data Structure
To represent the specific properties of a list (whether it's ordered or unordered, and the starting number for ordered lists), we need to add fields to the `Block` struct located in `include/block.h`.

**Changes to `struct Block`:**
```cpp
    // Add to include/block.h inside struct Block:
    char list_char = 0;      // Marker character: '-', '+', '*', '.' or ')'
    int list_start = 1;      // Starting number for ordered lists (default 1)
    bool is_ordered = false; // True if it's an ordered list (1., 2.), false for unordered (*, -, +)
    size_t indent = 0;       // Indentation level required for child content
```
*Explanation*: 
- `list_char`: Records the exact character used to start the list. For unordered lists, it is the bullet (`*`, `-`, `+`). For ordered lists, it is the punctuation mark after the number (`.` or `)`).
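- `list_start`: CommonMark allows ordered lists to start at an arbitrary number (e.g., `2. Item`). We need to store this so the HTML `<ol start="2">` can be rendered.
- `is_ordered`: Distinguishes ordered from unordered lists; the converter uses it to choose between the `ol` and `ul` macros.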
- `indent`: List items define an indentation context for their children. If a list item starts with `-   ` (dash and 3 spaces), any child blocks (like nested paragraphs) must be indented by 4 spaces.
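The rest of this document refers to `BlockType::List`, `BlockType::ListItem`, `BlockType::ThematicBreak`, and `BlockType::IndentedCode`, so the `BlockType` enum needs those enumerators alongside the new fields. The following is a minimal, self-contained sketch of how `include/block.h` might look once both are in place. The `children` vector, the `Document`/`Paragraph` enumerators, and the member order are assumptions for illustration, not the actual header.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch; the real enum and struct live in include/block.h.
enum class BlockType {
    Document,
    Paragraph,
    List,          // container whose children are ListItem blocks
    ListItem,      // container for arbitrary child blocks
    ThematicBreak, // leaf block with no content
    IndentedCode,  // leaf block with literal content
};

struct Block {
    BlockType type = BlockType::Paragraph;
    bool open = true;             // false once the block is closed
    std::string literal_content;  // raw text for leaf blocks
    std::vector<std::unique_ptr<Block>> children;

    // New list-related fields described above.
    char list_char = 0;      // '-', '+', '*' or '.', ')'
    int list_start = 1;      // first number of an ordered list
    bool is_ordered = false; // ordered (1., 1)) vs unordered (-, +, *)
    std::size_t indent = 0;  // column where a list item's content starts
};
```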

### 2.2 Extending `BlockParser::matches`
The `matches` method is responsible for determining if an open block in our stack can accept the current line.

**Logic for `BlockType::List` and `BlockType::ListItem`:**
- A `List` block always matches as long as it has at least one open `ListItem` child.
- A `ListItem` block matches if the current line's indentation is greater than or equal to the `ListItem`'s `indent`. If it matches, we "consume" that indentation by advancing the `offset` by `indent`. A completely blank line also matches an open `ListItem` (list items can contain blank lines), unless the item has no content yet: a list item can begin with at most one blank line, so an empty item followed by a blank line is closed.
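A minimal sketch of the `ListItem` continuation check, assuming lines are `std::string`s, `offset` is the column already consumed by enclosing containers, and tabs are not expanded. The helper names (`count_indent`, `is_blank_from`, `list_item_matches`) and the `item_has_children` flag are hypothetical, not existing `BlockParser` members.

```cpp
#include <cstddef>
#include <string>

// Number of consecutive spaces starting at `offset` (tabs are not expanded).
std::size_t count_indent(const std::string& line, std::size_t offset) {
    std::size_t n = 0;
    while (offset + n < line.size() && line[offset + n] == ' ') ++n;
    return n;
}

// True if everything from `offset` onward is spaces or tabs.
bool is_blank_from(const std::string& line, std::size_t offset) {
    for (std::size_t i = offset; i < line.size(); ++i) {
        if (line[i] != ' ' && line[i] != '\t') return false;
    }
    return true;
}

// Continuation check for an open ListItem whose children start at
// `item_indent` columns past `offset`. On success, advances `offset`.
bool list_item_matches(const std::string& line, std::size_t& offset,
                       std::size_t item_indent, bool item_has_children) {
    if (is_blank_from(line, offset)) {
        // A list item can begin with at most one blank line, so an item
        // with no content yet is closed by a blank line.
        if (!item_has_children) return false;
        offset = line.size(); // blank lines carry no content
        return true;
    }
    if (count_indent(line, offset) >= item_indent) {
        offset += item_indent; // consume exactly the item's indentation
        return true;
    }
    return false;
}
```

A `List` block needs no equivalent check: it continues as long as its open `ListItem` accepts the line.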

### 2.3 Extending `BlockParser::process_line`
When a line doesn't match existing blocks, we check if it opens a new block. We need to add logic to detect List Items.

**Step-by-step algorithm to detect a List Item:**
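1. Determine the line's indentation, measured from the current `offset` (that is, after the indentation consumed by enclosing containers such as an outer list item). If it is 4 or more spaces, the line is not a new list item: it is an indented code block or a paragraph continuation.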
2. Look for an unordered list marker: a single `*`, `-`, or `+` followed by a space (or the end of the line).
3. Look for an ordered list marker: a sequence of 1 to 9 digits, followed by `.` or `)`, followed by a space (or the end of the line).
4. **Calculations**:
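    - Let `marker_length` be the length of the marker itself, excluding any following spaces (e.g., 1 for `-`, 2 for `1.`, 3 for `10)`).
    - Count the spaces following the marker; call this `spaces`. If `spaces` > 4, CommonMark treats the marker as followed by a single space (the remaining spaces start an indented code block inside the item), so set `spaces` to 1. If nothing but whitespace follows the marker (an empty item), also set `spaces` to 1.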
    - The new block's `indent` equals the line's starting indentation + `marker_length` + `spaces`.
5. **List vs ListItem Creation**:
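    - If the current open block (the "tip") is NOT a `List`, OR it is a `List` whose marker type (`list_char` or `is_ordered`) differs from the new marker, we must create a new `List` container block first. In the second case, close the old `List` so that the new one becomes its sibling rather than its child.
    - Set the new `List`'s `is_ordered` and `list_char` (and `list_start`, for ordered lists), append it to the current tip, and open it.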
    - Then, create a `ListItem` block, set its `indent`, `list_char`, and `is_ordered` properties, append it to the `List`, and open it.
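Steps 1 through 5 can be sketched as a stand-alone marker parser plus a type-compatibility check. This assumes spaces only (no tab expansion) and that `offset` points past the indentation consumed by enclosing containers; `ListMarker`, `parse_list_marker`, and `same_list_type` are hypothetical names whose fields mirror the new `Block` members. It should run only after the thematic-break check, since a line like `* * *` would otherwise parse as a list item.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; fields mirror the new Block members.
struct ListMarker {
    bool is_ordered = false;
    char list_char = 0;     // '-', '+', '*' or '.', ')'
    int list_start = 1;
    std::size_t indent = 0; // content column, relative to `offset`
};

std::optional<ListMarker> parse_list_marker(const std::string& line,
                                            std::size_t offset) {
    std::size_t pos = offset;
    while (pos < line.size() && line[pos] == ' ') ++pos;
    const std::size_t leading = pos - offset;
    if (leading >= 4) return std::nullopt; // indented code, not a list item

    ListMarker m;
    const std::size_t marker_start = pos;
    if (pos < line.size() &&
        (line[pos] == '-' || line[pos] == '+' || line[pos] == '*')) {
        m.list_char = line[pos++]; // unordered bullet
    } else {
        int number = 0;
        std::size_t digits = 0;
        while (pos < line.size() && digits < 9 &&
               line[pos] >= '0' && line[pos] <= '9') {
            number = number * 10 + (line[pos] - '0');
            ++pos;
            ++digits;
        }
        // 1 to 9 digits must be followed by '.' or ')'.
        if (digits == 0 || pos >= line.size() ||
            (line[pos] != '.' && line[pos] != ')'))
            return std::nullopt;
        m.is_ordered = true;
        m.list_char = line[pos++];
        m.list_start = number;
    }
    const std::size_t marker_length = pos - marker_start;

    std::size_t spaces = 0;
    while (pos + spaces < line.size() && line[pos + spaces] == ' ') ++spaces;
    const bool rest_blank = (pos + spaces == line.size());
    if (spaces == 0 && !rest_blank) return std::nullopt; // e.g. "-foo"
    if (rest_blank || spaces > 4) spaces = 1; // empty item or indented code

    m.indent = leading + marker_length + spaces;
    return m;
}

// Step 5: items belong to the same List only if the marker kind matches.
bool same_list_type(const ListMarker& open_list, const ListMarker& item) {
    return open_list.is_ordered == item.is_ordered &&
           open_list.list_char == item.list_char;
}
```

For example, `  10) bar` yields an ordered marker with `list_start` 10 and `indent` 6 (2 leading spaces, a 3-character marker, and 1 space).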

### 2.4 Converter and Standard Library
The `Converter` maps AST blocks to Macro AST nodes.
1. **Converter (`src/converter.cpp`)**:
    - In `convert_block`, when `block->type == BlockType::List`:
      - Determine the macro name: `"ol"` if `block->is_ordered`, otherwise `"ul"`.
      - Convert all child `ListItem` blocks and pack them into a `Group` node as the single argument to the macro.
    - When `block->type == BlockType::ListItem`:
      - The macro name is `"li"`.
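      - Convert its children and pack them into a `Group` node as the macro's single argument.
2. **Standard Library (`src/standard_library.cpp`)** (in the templates below, `\n` stands for a newline):
    - Define `%ul{content}`: `<ul>\n%content</ul>\n`
    - Define `%ol{content}`: `<ol>\n%content</ol>\n` *(Note: this basic `%ol` is sufficient for ordered lists that start at 1. To render `<ol start="2">` for a list whose `list_start` is not 1, either change the `Converter` to pass the start number as a first argument to `ol`, or define an intrinsic `%ol` that takes it; one of these is required for 100% compliance.)*
    - Define `%li{content}`: `<li>\n%content</li>\n` *(Note: this matches CommonMark's output for loose lists. In tight lists, CommonMark omits the `<p>` tags around an item's paragraphs and renders `<li>foo</li>`, so full compliance also requires tracking whether each list is tight.)*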
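To make the expected output concrete, here is a hypothetical stand-alone renderer that mirrors the `%ul`, `%ol`, and `%li` templates, including the optional `start` attribute discussed in the `%ol` note. The function names are illustrative, not part of `StandardLibrary`, and like the `%li` template it produces the loose-list form.

```cpp
#include <string>
#include <vector>

// Mirrors the %li template: "<li>\n" + content + "</li>\n".
std::string render_li(const std::string& content) {
    return "<li>\n" + content + "</li>\n";
}

// Mirrors %ul / %ol, adding start="N" only when an ordered list
// does not start at 1.
std::string render_list(bool is_ordered, int list_start,
                        const std::vector<std::string>& rendered_items) {
    std::string html;
    if (!is_ordered) {
        html = "<ul>\n";
    } else if (list_start == 1) {
        html = "<ol>\n";
    } else {
        html = "<ol start=\"" + std::to_string(list_start) + "\">\n";
    }
    for (const auto& item : rendered_items) html += item;
    html += is_ordered ? "</ol>\n" : "</ul>\n";
    return html;
}
```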

## 3. Thematic Breaks (Horizontal Rules)

A thematic break is a line consisting of up to three spaces of indentation followed by three or more matching characters (`-`, `_`, or `*`), with optional spaces between them and nothing else on the line.

### 3.1 BlockParser Logic
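In `BlockParser::process_line`, before checking for list items and paragraphs, we check for a thematic break. The order matters: lines such as `* * *` or `- - -` also look like list items, and CommonMark gives the thematic break precedence. One exception: if the parser supports setext headings, a `---` line directly below a paragraph is a setext heading underline, which takes precedence over a thematic break.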
1. Check the line's initial indentation. If it's >= 4 spaces, it cannot be a thematic break.
2. Scan the line starting from the first non-space character.
3. If that character is `-`, `_`, or `*`, it becomes the marker; count how many times it appears.
4. Allow spaces (and tabs) between the markers. If any other character is encountered, including a different marker character (e.g., `-*-`), it is NOT a thematic break.
5. If the marker count is >= 3 and the rest of the line contains only spaces, it IS a thematic break.
6. **Action**: Close the current paragraph (if open). Create a new `Block` of type `BlockType::ThematicBreak`, add it to the parent, and immediately close it (set `open = false` since it cannot contain children or multiline content).
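Steps 1 through 5 translate almost directly into a predicate. This sketch assumes it receives the raw line without its trailing newline; the name `is_thematic_break` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// True if `line` is a thematic break: at most 3 leading spaces, then
// 3+ copies of one marker ('-', '_' or '*') separated only by spaces/tabs.
bool is_thematic_break(const std::string& line) {
    std::size_t pos = 0;
    while (pos < line.size() && line[pos] == ' ') ++pos;
    if (pos >= 4 || pos == line.size()) return false; // code block or blank

    const char marker = line[pos];
    if (marker != '-' && marker != '_' && marker != '*') return false;

    int count = 0;
    for (; pos < line.size(); ++pos) {
        const char c = line[pos];
        if (c == marker) {
            ++count;
        } else if (c != ' ' && c != '\t') {
            return false; // any other character, including another marker
        }
    }
    return count >= 3;
}
```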

### 3.2 Converter and Standard Library
1. **Converter**: 
   - When `block->type == BlockType::ThematicBreak`: Macro name is `"hr"`. It takes no arguments.
2. **Standard Library**:
   - Define `%hr{}` to expand to `<hr />` followed by a newline.

## 4. Indented Code Blocks

Indented code blocks are triggered by lines indented by 4 or more spaces.

### 4.1 BlockParser Logic
In `BlockParser::process_line`:
1. Check whether the line, measured from the current `offset`, is indented by 4 or more spaces and is not blank.
2. Ensure the current tip is NOT a paragraph. An indented line cannot interrupt a paragraph in CommonMark; it is treated as a continuation of that paragraph instead.
3. If not in a paragraph, create a new `BlockType::IndentedCode` block.
4. The literal content of this block is the line's content *after* stripping exactly 4 spaces.
5. In `matches`: 
   - An `IndentedCode` block matches if the line has >= 4 spaces. We consume exactly 4 spaces and add the rest to `literal_content` (with a newline).
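   - It also matches blank lines. Because the parser cannot yet know whether another indented line will follow, blank lines are appended to the block as empty lines (keeping any spaces beyond the first four), and any trailing blank lines are stripped when the block is closed, since CommonMark does not include them in the code block.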
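A minimal sketch of these accumulation rules, assuming the caller opens the block only on a non-blank line indented 4+ spaces outside a paragraph, and that tabs are not expanded. `IndentedCodeBuilder` is a hypothetical helper; in the real parser the same logic would live in `matches` and in the code that closes the block. It joins lines without a trailing newline, on the assumption that the `fenced_code` template in Section 4.2 appends the final one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct IndentedCodeBuilder {
    std::vector<std::string> lines; // stored without the 4-space indent

    // Returns true if `line` continues the block (4+ spaces, or blank).
    bool add_line(const std::string& line) {
        std::size_t spaces = 0;
        while (spaces < line.size() && line[spaces] == ' ') ++spaces;
        if (spaces == line.size()) {
            // Blank line: keep any indentation beyond the first 4 columns.
            lines.push_back(spaces > 4 ? line.substr(4) : std::string());
            return true;
        }
        if (spaces < 4) return false; // block closes
        lines.push_back(line.substr(4));
        return true;
    }

    // Drops trailing blank lines, then joins the rest with '\n'.
    std::string finish() {
        while (!lines.empty() &&
               lines.back().find_first_not_of(' ') == std::string::npos)
            lines.pop_back();
        std::string content;
        for (std::size_t i = 0; i < lines.size(); ++i) {
            if (i > 0) content += '\n';
            content += lines[i];
        }
        return content;
    }
};
```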

### 4.2 Converter and Standard Library
1. **Converter**: 
   - When `block->type == BlockType::IndentedCode`: Map it to the `"fenced_code"` macro or a dedicated `"code_block"` macro. Since `fenced_code` already exists and takes two arguments (info string and content), we can reuse it by passing an empty string `""` as the first argument, and `literal_content` as the second argument.
2. **Standard Library**: 
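   - The intrinsic `fenced_code` macro already handles an empty info string correctly by emitting `<pre><code>%content\n</code></pre>`, where `\n` stands for a newline. Because the template appends that newline itself, check how the fenced-code path stores its content: if it omits the trailing newline, strip the trailing newline from `literal_content` the same way, or the rendered block will end with an extra blank line.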
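The converter side can be sketched with a hypothetical `MacroCall` stand-in for the real macro AST node. The HTML escaping shown is what CommonMark requires inside code blocks (`&`, `<`, `>`, `"`); whether the existing `fenced_code` intrinsic already escapes its content is an assumption to verify, not something this document establishes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the converter's macro-call node.
struct MacroCall {
    std::string name;
    std::vector<std::string> args;
};

// IndentedCode reuses fenced_code with an empty info string.
MacroCall convert_indented_code(const std::string& literal_content) {
    return MacroCall{"fenced_code", {"", literal_content}};
}

// Escapes the characters CommonMark escapes inside code blocks.
std::string escape_html(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Expected expansion for an empty info string, given content that has
// no trailing newline (the template supplies it).
std::string expand_code_block(const MacroCall& call) {
    return "<pre><code>" + escape_html(call.args.at(1)) + "\n</code></pre>";
}
```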

## 5. Summary of Implementation Steps

To implement these features successfully, an engineer should follow this exact sequence:
1. Modify `include/block.h` to add `list_char`, `list_start`, `is_ordered`, and `indent` to `struct Block`.
2. Update `src/block_parser.cpp` -> `matches()` to correctly handle `List` and `ListItem` continuation.
3. Update `src/block_parser.cpp` -> `process_line()` to parse:
   - Thematic Breaks (`---`, `***`, `___`), checked before list markers so that a line like `* * *` is not parsed as a list item.
   - Indented Code Blocks (4+ spaces).
   - List markers for Unordered and Ordered lists.
4. Update `src/converter.cpp` -> `convert_block()` to support `BlockType::List`, `BlockType::ListItem`, `BlockType::ThematicBreak`, and `BlockType::IndentedCode`. Map them to their respective macros (`ul`, `ol`, `li`, `hr`, and `fenced_code`).
5. Update `src/standard_library.cpp` -> `registerMacros()` to add definitions for `ul`, `ol`, `li`, and `hr`.
6. Add comprehensive unit tests in `tests/test_block_parser.cpp` to verify parsing accuracy against standard CommonMark list, hr, and indented code snippets.