# MacroDown Design Document

## 1. Overview
MacroDown is a C++ Markdown processor that extends the CommonMark syntax with a TeX-like macro system. The core philosophy is that **all** markup elements (headings, emphasis, links) are treated as syntactic sugar for macro calls. The processor parses the document into a tree of macro invocations and evaluates them using a standard library of macro definitions to produce HTML.

## 2. Architecture

The system operates in a linear pipeline:

```mermaid
graph TD
    Input[Source Text] --> BlockParser["Block Parsing (Phase 1)"]
    BlockParser --> BlockTree[Block Tree]
    BlockTree --> InlineParser["Inline Parsing (Phase 2)"]
    InlineParser --> MacroAST[Macro Syntax Tree]
    MacroAST --> Evaluator[Macro Evaluator]
    Evaluator --> HTML[Output HTML]
```

### 2.1 The Macro System
Macros are the central abstraction: every markup construct is ultimately represented as a macro call.
*   **Definition Syntax**: `%def[name]{arg1, arg2, ...}{body}`
    *   Example: `%def[my_macro]{t1, t2}{It’s a %em{%t1} that is %t2.}`
*   **Call Syntax**: `%name{arg1}{arg2}...`
    *   Example: `%my_macro{test}{good}`
*   **Expansion**: The Evaluator recursively expands macros until only text remains.
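
To make the expansion step concrete, the sketch below substitutes parameter references in a macro body with their bound arguments. It is only an illustration under stated assumptions: `expandBody` is a hypothetical helper, parameter names are assumed to consist of letters, digits, and underscores, and nested calls such as `%em{...}` are copied through unchanged so that a later pass can expand them.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: replaces each %param reference in `body` with the
// argument bound to it. Names that are not parameters (e.g. nested macro
// calls) are copied through unchanged for a later expansion pass.
std::string expandBody(const std::string& body,
                       const std::map<std::string, std::string>& bindings)
{
    std::string out;
    std::size_t i = 0;
    while(i < body.size())
    {
        if(body[i] != '%')
        {
            out += body[i++];
            continue;
        }
        // Read the identifier that follows '%'.
        std::size_t j = i + 1;
        while(j < body.size() &&
              (std::isalnum(static_cast<unsigned char>(body[j])) || body[j] == '_'))
        {
            ++j;
        }
        const std::string name = body.substr(i + 1, j - i - 1);
        const auto found = bindings.find(name);
        if(found != bindings.end())
        {
            out += found->second;          // bound parameter: substitute
        }
        else
        {
            out += body.substr(i, j - i);  // not a parameter: keep as written
        }
        i = j;
    }
    return out;
}
```

With `t1` bound to `test` and `t2` bound to `good`, the body `a %em{%t1} that is %t2.` becomes `a %em{test} that is good.`, and the remaining `%em` call is left for the next round of expansion.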

### 2.2 CommonMark Parsing Strategy
We follow the two-phase strategy described in the CommonMark spec's appendix, "A parsing strategy", and add a transformation step:
1.  **Phase 1 (Block Structure)**: Analyze the document line-by-line to construct a tree of Blocks (Paragraphs, Lists, Blockquotes). This handles nesting and indentation.
2.  **Phase 2 (Inline Structure)**: Walk the Block tree and parse the text content of leaf blocks into Inline elements (Emphasis, Links, Code).
3.  **Transformation**: Convert the CommonMark Block/Inline tree into the unified **Macro AST**.
    *   `# Heading` $\rightarrow$ `%h1{Heading}`
    *   `*Emphasis*` $\rightarrow$ `%em{Emphasis}`

### 2.3 Custom Markups
The system supports user-defined custom markups that map to macros. The content of the markup is determined by a regular expression.
*   **Prefix Markup**: Starts with a specific character (e.g., `#tag`) and captures text matching a regex pattern. By default, it ends at a whitespace or punctuation boundary (except `_`, `-`, `@`, and `.`).
    *   Example: `#tag` $\rightarrow$ `%tag_macro{tag}`
*   **Delimited Markup**: Starts and ends with the same character (e.g., `:highlight:`) and captures text matching a regex pattern. No whitespace is allowed inside.
    *   Example: `:highlight:` $\rightarrow$ `%highlight_macro{highlight}`
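
The default boundary rule for prefix markups can be sketched as follows. This is an ASCII-only illustration, not the actual matcher: `endsPrefixMarkup` and `matchPrefixMarkup` are hypothetical names, the regex pattern is ignored here, and bytes of multi-byte UTF-8 sequences are treated as ordinary content.

```cpp
#include <cctype>
#include <string>

// True if `c` ends a prefix markup under the default rule: whitespace or
// punctuation, except '_', '-', '@' and '.'. ASCII-only for brevity.
bool endsPrefixMarkup(char c)
{
    if(c == '_' || c == '-' || c == '@' || c == '.')
    {
        return false;
    }
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isspace(u) != 0 || std::ispunct(u) != 0;
}

// Hypothetical helper: if `text` has `prefix` at byte offset `pos`, returns
// the macro call for the captured run (e.g. "#tag" becomes "%tag_macro{tag}").
// Returns an empty string when there is no match or nothing was captured.
std::string matchPrefixMarkup(const std::string& text, std::size_t pos,
                              const std::string& prefix,
                              const std::string& macro_name)
{
    if(pos > text.size() || text.compare(pos, prefix.size(), prefix) != 0)
    {
        return "";
    }
    const std::size_t start = pos + prefix.size();
    std::size_t end = start;
    while(end < text.size() && !endsPrefixMarkup(text[end]))
    {
        ++end;
    }
    if(end == start)
    {
        return "";
    }
    return "%" + macro_name + "{" + text.substr(start, end - start) + "}";
}
```

For the input `see #tag, ok` with the match attempted at offset 4, the capture stops at the comma and the result is `%tag_macro{tag}`.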

## 3. Data Structures

### 3.1 AST Nodes
The final tree consists of a unified `Node` type using `std::variant` to hold different data types:

```cpp
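#include <memory>
#include <string>
#include <variant>
#include <vector>

struct Node;  // forward declaration: Macro and Group hold pointers to Node

struct Text {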
    std::string content;
};

struct Macro {
    std::string name;
    std::vector<std::unique_ptr<Node>> arguments;
    bool is_special = false;
};

struct Group {
    std::vector<std::unique_ptr<Node>> children;
};

struct Node {
    using Data = std::variant<Text, Macro, Group>;
    Data data;

    // Call function on each node in the tree (pre-order traversal).
    // The callback function takes const Node& as an argument.
    template<typename Callback>
    void forEach(Callback f) const;
};
```

The `forEach` method provides a way to iterate over all nodes in the tree, including children of `Group` nodes and arguments of `Macro` nodes.
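
One possible implementation of `forEach` uses `std::visit` to descend into `Macro` arguments and `Group` children. This is a sketch rather than the project's actual code: the types are repeated so the block compiles on its own, null pointers are skipped as a defensive assumption, and the callback is copied on each recursive call, so any state it accumulates should be captured by reference.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct Node;

struct Text { std::string content; };

struct Macro
{
    std::string name;
    std::vector<std::unique_ptr<Node>> arguments;
    bool is_special = false;
};

struct Group { std::vector<std::unique_ptr<Node>> children; };

struct Node
{
    using Data = std::variant<Text, Macro, Group>;
    Data data;

    template<typename Callback>
    void forEach(Callback f) const
    {
        f(*this);  // pre-order: visit this node before its descendants
        std::visit([&f](const auto& value)
        {
            using T = std::decay_t<decltype(value)>;
            if constexpr(std::is_same_v<T, Macro>)
            {
                for(const auto& arg : value.arguments)
                {
                    if(arg) { arg->forEach(f); }
                }
            }
            else if constexpr(std::is_same_v<T, Group>)
            {
                for(const auto& child : value.children)
                {
                    if(child) { child->forEach(f); }
                }
            }
            // Text is a leaf: nothing to descend into.
        }, data);
    }
};
```

A caller can, for example, count nodes with `root.forEach([&count](const Node&) { ++count; });`.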


### 3.2 Block Structure (Intermediate)
During Phase 1, we use a structure mirroring CommonMark blocks to maintain state (open/closed blocks, list types).

```cpp
#include <memory>
#include <string>
#include <vector>

enum class BlockType {
    Document,
    Quote,
    List,
    ListItem,
    FencedCode,
    IndentedCode,
    HtmlBlock,
    Paragraph,
    Heading,
    ThematicBreak,
    // ... potentially others
};

struct Block {
    BlockType type;
    std::vector<std::unique_ptr<Block>> children; // For container blocks
    std::string literal_content; // For leaf blocks (raw text to be parsed later)
    int level = 0; // For headings
    // ... metadata for parsing state
};
```

### 3.3 Unicode Strategy
*   **Library**: Use [uni-algo](https://github.com/uni-algo/uni-algo) for all Unicode-related operations.
*   **Storage**: All text will be stored in `std::string` assuming **UTF-8** encoding.
*   **Operations**:
    *   **Iteration**: Use `uni::iter::utf8` for safe code point traversal.
    *   **Properties**: Use `uni::is_space` and `uni::is_punct` (or equivalent category checks) to comply with CommonMark's definitions of whitespace and punctuation.
    *   **Optimization**: Byte-by-byte scanning will still be used for performance when looking for ASCII-only delimiters (`%`, `{`, `}`, `[`, `]`).
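
Byte-wise scanning for these delimiters is safe on UTF-8 text because every byte of a multi-byte sequence has its high bit set, so an ASCII byte can never appear inside an encoded non-ASCII code point. A minimal sketch (the function name `findNextDelimiter` is an assumption):

```cpp
#include <string>
#include <string_view>

// Returns the byte offset of the next macro-syntax delimiter at or after
// `from`, or std::string_view::npos if there is none. No UTF-8 decoding is
// needed because all five delimiters are ASCII.
std::size_t findNextDelimiter(std::string_view text, std::size_t from = 0)
{
    return text.find_first_of("%{}[]", from);
}
```

The returned value is a byte offset, not a code point index, which matches how `std::string` storage is addressed elsewhere in the design.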

### 3.4 Custom Markup Definitions
Users can define custom markups that are transformed into macros during the inline parsing phase.

```cpp
struct PrefixMarkup {
    std::string prefix;      // The trigger character(s)
    std::string macro_name;  // Target macro to transform into
    std::string pattern;     // Regex pattern for the marked-up text
};

struct DelimitedMarkup {
    std::string delimiter;   // The character used for start and end
    std::string macro_name;  // Target macro to transform into
    std::string pattern;     // Regex pattern for the content between delimiters
};
```
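
As an illustration of how such a definition might be applied, the sketch below matches a `DelimitedMarkup` at the start of a string using `std::regex`. The helper `matchDelimited` is hypothetical, it assumes the delimiter contains no regex metacharacters (real code would need to escape it), and it recompiles the regex on every call, which a real implementation would likely cache.

```cpp
#include <optional>
#include <regex>
#include <string>

struct DelimitedMarkup
{
    std::string delimiter;   // The character used for start and end
    std::string macro_name;  // Target macro to transform into
    std::string pattern;     // Regex pattern for the content between delimiters
};

// Hypothetical helper: tries to match `markup` at the very start of `text`.
// On success returns the macro call text; otherwise std::nullopt.
// Throws std::regex_error if `markup.pattern` is not a valid regex.
std::optional<std::string> matchDelimited(const DelimitedMarkup& markup,
                                          const std::string& text)
{
    // Layout: delimiter, captured content, delimiter.
    const std::regex re(markup.delimiter + "(" + markup.pattern + ")" +
                        markup.delimiter);
    std::smatch m;
    // match_continuous anchors the search at the beginning of `text`.
    if(!std::regex_search(text, m, re, std::regex_constants::match_continuous))
    {
        return std::nullopt;
    }
    return "%" + markup.macro_name + "{" + m[1].str() + "}";
}
```

With the definition `{":", "highlight_macro", "[A-Za-z0-9_]+"}`, the input `:highlight: rest` yields `%highlight_macro{highlight}`, while `:has space:` does not match because the pattern excludes whitespace.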

## 4. Component Design

### 4.0 Top-level Interface (`MacroDown`)
The `MacroDown` class provides a simplified two-step interface for rendering documents, as required by the specification.

*   **Step 1: Parse** (`parse`): Takes a source string and returns a single root `Node` (the syntax tree).
*   **Step 2: Render** (`render`): Takes the root `Node` and produces the final HTML string using the internal `Evaluator`.
*   **Configuration**: Allows defining custom markups via `definePrefixMarkup` and `defineDelimitedMarkup`.

`MacroDown` automatically initializes the standard library of macros.

### 4.1 Block Parser (`BlockParser`)
*   **Input**: Line iterator.
*   **Mechanism**:
    *   Maintains a stack of "Open" blocks.
    *   For each line, determines which open blocks match the line's indentation/markers.
    *   Closes unmatched blocks and opens new ones.
    *   Adds text to the currently open leaf block.
*   **Output**: Root `Block`.

### 4.2 Inline Parser (`InlineParser`)
*   **Input**: `literal_content` string from a Block.
*   **Mechanism**:
    *   Scans for delimiters (`*`, `_`, `[`, `` ` ``, `!`).
    *   **Crucially**: Scans for the Macro start character `%`.
    *   **Custom Markups**: Scans for user-defined prefix and delimited markups.
    *   Uses the "Delimiter Stack" algorithm from CommonMark spec to resolve emphasis nesting.
*   **Output**: A list of `Node`s (Text and Macro nodes) representing the block's text.
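
Scanning for a macro call once `%` is found could look like the sketch below. It is a simplified illustration: `MacroCall` and `scanMacroCall` are hypothetical names, arguments are returned as raw text rather than parsed `Node`s, macro names are assumed to consist of letters, digits, and underscores, and escaped braces are not handled.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Raw result of scanning one macro call; arguments are kept as unparsed text.
struct MacroCall
{
    std::string name;
    std::vector<std::string> arguments;
    std::size_t end = 0;  // offset one past the last consumed byte
};

// Hypothetical scanner: parses "%name{arg1}{arg2}..." starting at `pos`.
// Braces inside an argument must balance. Returns std::nullopt if `pos`
// does not start a well-formed call.
std::optional<MacroCall> scanMacroCall(const std::string& text, std::size_t pos)
{
    if(pos >= text.size() || text[pos] != '%')
    {
        return std::nullopt;
    }
    std::size_t i = pos + 1;
    MacroCall call;
    while(i < text.size() &&
          (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
    {
        call.name += text[i++];
    }
    if(call.name.empty())
    {
        return std::nullopt;
    }
    // Each argument is a balanced {...} group directly after the previous one.
    while(i < text.size() && text[i] == '{')
    {
        int depth = 0;
        std::size_t j = i;
        for(; j < text.size(); ++j)
        {
            if(text[j] == '{') { ++depth; }
            else if(text[j] == '}' && --depth == 0) { break; }
        }
        if(j == text.size())
        {
            return std::nullopt;  // unbalanced braces
        }
        call.arguments.push_back(text.substr(i + 1, j - i - 1));
        i = j + 1;
    }
    call.end = i;
    return call;
}
```

For `%my_macro{test}{good}` this yields the name `my_macro` and the arguments `test` and `good`. For `%em{%b{x}} tail` the single argument is `%b{x}`, which would then be parsed recursively.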

### 4.3 Evaluator (`Evaluator`)
*   **Environment**: A map `std::map<std::string, MacroDefinition>`.
*   **Mechanism**:
    *   Traverses the `Node` tree.
    *   If it's a `Text` node, append to output.
    *   If it's a `Macro` node:
        *   Look up definition.
        *   Bind arguments.
        *   Parse the definition body (for a user-defined macro) or execute the C++ callback (for an intrinsic).
        *   Recursively evaluate the result.
    *   If it's a `Group` node, recursively evaluate all children.
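
The traversal above can be sketched with `std::visit` over the `Node` variant. This is a reduced illustration, not the planned implementation: it supports intrinsic callbacks only, the `Intrinsic` alias and `define` method are assumptions, arguments are evaluated eagerly before the callback runs, the callback's result is not re-evaluated, and unknown macros expand to an empty string.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct Node;
struct Text { std::string content; };
struct Macro
{
    std::string name;
    std::vector<std::unique_ptr<Node>> arguments;
    bool is_special = false;
};
struct Group { std::vector<std::unique_ptr<Node>> children; };
struct Node { std::variant<Text, Macro, Group> data; };

// Assumption: an intrinsic receives its already-evaluated arguments.
using Intrinsic = std::function<std::string(const std::vector<std::string>&)>;

class Evaluator
{
public:
    void define(const std::string& name, Intrinsic callback)
    {
        environment[name] = std::move(callback);
    }

    std::string evaluate(const Node& node) const
    {
        // Dispatch to the overload matching the node's active alternative.
        return std::visit(
            [this](const auto& value) { return this->evaluateValue(value); },
            node.data);
    }

private:
    std::string evaluateValue(const Text& text) const
    {
        return text.content;
    }

    std::string evaluateValue(const Group& group) const
    {
        std::string out;
        for(const auto& child : group.children)
        {
            out += evaluate(*child);
        }
        return out;
    }

    std::string evaluateValue(const Macro& macro) const
    {
        const auto found = environment.find(macro.name);
        if(found == environment.end())
        {
            return "";  // assumption: unknown macros expand to nothing
        }
        std::vector<std::string> args;
        for(const auto& arg : macro.arguments)
        {
            args.push_back(evaluate(*arg));
        }
        return found->second(args);
    }

    std::map<std::string, Intrinsic> environment;
};
```

Registering `em` as a callback that wraps its first argument in `<em>` tags and evaluating a `Macro` node named `em` with a single `Text` argument `hi` produces `<em>hi</em>`.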

## 5. The Standard Library
The system will boot with a "Prelude" of defined macros to support Markdown features.

| Markdown Element | Macro Signature | HTML Expansion |
| :--- | :--- | :--- |
| Heading 1 | `%h1{content}` | `<h1>%content</h1>` |
| Paragraph | `%p{content}` | `<p>%content</p>` |
| Emphasis | `%em{content}` | `<em>%content</em>` |
| Strong | `%strong{content}` | `<strong>%content</strong>` |
| Link | `%link{url}{text}` | `<a href="%url">%text</a>` |
| Image | `%img{url}{alt}` | `<img src="%url" alt="%alt" />` |
| List Item | `%li{content}` | `<li>%content</li>` |
| Unordered List | `%ul{content}` | `<ul>%content</ul>` |
| Code | `%code{content}` | `<code>%content</code>` |
| Blockquote | `%quote{content}` | `<blockquote>%content</blockquote>` |

## 6. Implementation Plan

### Phase 1: Core Setup
*   CMake build system.
*   `Node` type (a `std::variant` of `Text`, `Macro`, and `Group`).
*   Basic `Evaluator` for text-only nodes.

### Phase 2: Macro Engine
*   Implement `%def`.
*   Implement parsing of macro calls (`%name{arg1}{arg2}...`).
*   Unit tests for macro expansion logic.

### Phase 3: Block Parsing
*   Implement the "Container Block" and "Leaf Block" logic.
*   Handle simple paragraphs and ATX headings (`#`).
*   Convert these Blocks into `%p` and `%h1` macros.
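
For the ATX heading step, a line-level recognizer might look like the sketch below. It follows the CommonMark rule that a heading is one to six `#` characters, indented by at most three spaces and followed by a space, a tab, or the end of the line. The helper names are hypothetical, the optional closing `#` sequence is not stripped, and the macro names `%h2` through `%h6` are an assumption extrapolated from `%h1`.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: recognizes an ATX heading line and returns
// {level, content}. Simplified: a closing "#" sequence is not stripped.
std::optional<std::pair<int, std::string>> parseAtxHeading(const std::string& line)
{
    std::size_t i = 0;
    while(i < line.size() && i < 3 && line[i] == ' ')
    {
        ++i;  // up to three spaces of indentation are allowed
    }
    int level = 0;
    while(i < line.size() && line[i] == '#')
    {
        ++level;
        ++i;
    }
    if(level < 1 || level > 6)
    {
        return std::nullopt;
    }
    // The '#' run must be followed by a space, a tab, or the end of the line.
    if(i < line.size() && line[i] != ' ' && line[i] != '\t')
    {
        return std::nullopt;
    }
    const std::size_t first = line.find_first_not_of(" \t", i);
    if(first == std::string::npos)
    {
        return std::make_pair(level, std::string());  // empty heading
    }
    const std::size_t last = line.find_last_not_of(" \t");
    return std::make_pair(level, line.substr(first, last - first + 1));
}

// Renders the macro call for a heading, e.g. level 1 gives "%h1{...}".
std::string headingToMacroCall(int level, const std::string& content)
{
    return "%h" + std::to_string(level) + "{" + content + "}";
}
```

Because `#tag` has no space after the `#`, it is not recognized as a heading here, which leaves it available to a `#` prefix markup during inline parsing.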

### Phase 4: Inline Parsing
*   Implement `InlineParser` to handle text.
*   Add support for `*em*` and `**strong**` mapping to macros.
*   Integrate `%macro` parsing within normal text.

### Phase 5: Standard Library & HTML
*   Implement the C++ callbacks or default definitions for the standard library macros.
*   Finalize `main.cpp` CLI.

## 7. Build System
*   **CMake**: 3.24+

## 8. Coding Conventions

*   **File Names**: `snake_case` (e.g., `macro_engine.h`, `block_parser.cpp`).
*   **Classes and Types**: `PascalCase` (e.g., `Macro`, `BlockType`).
*   **Variables**: `snake_case` (e.g., `literal_content`, `is_special`).
*   **Global Constants**: `UPPER_CASE` (e.g., `MAX_RECURSION_DEPTH`).
*   **Functions**:
    *   `camelCase` for multi-word names (e.g., `evaluateMacro`).
    *   `lowercase` for single-word names (e.g., `type()`, `evaluate()`).
*   **Indentation**: Indent by 4 spaces. Place the opening brace on its own line, unless the braces are empty.
*   **Space before parenthesis**: None.
    Example:
    ```c++
    if(...)
    {
        ...;
    }

    void f() {}
    ```