MultiLevel Pattern Miner
A modular Python engine for mining repeated structural patterns from unstructured text and turning them into LLM-ready specs, validation bundles, and review dashboards.
MultiLevel Pattern Miner identifies recurring document structures across six hierarchical levels, from phrase to document, in Markdown and plain text. It mines those patterns into reusable libraries, compiles them into machine-readable specifications, and supports downstream validation and dashboard-based review of new drafts.
Key Features
- Mines recurring structural patterns across phrase, line, paragraph, chunk, section, and document levels
- Supports Markdown and plain-text parsing with heading, list, and table awareness
- Exports mined libraries as YAML or JSON with source locations and example excerpts
- Compiles mined patterns into LLM-ready bridge specs for templates, prompts, and validation
- Validates draft content against compiled structural, lexical, and policy checks
- Renders static HTML dashboards for batch review, filtering, and CSV export
Status: Active
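The multi-level mining idea can be illustrated with a minimal sketch. Everything here is hypothetical (the function names `line_signature` and `mine_line_patterns` are not the project's actual API): lines are mapped to coarse structural tokens, and recurring token windows are counted as candidate patterns.

```python
from collections import Counter

def line_signature(line: str) -> str:
    """Map a line to a coarse structural token (illustrative only)."""
    stripped = line.strip()
    if stripped.startswith("#"):
        return "HEADING"
    if stripped.startswith(("-", "*")):
        return "LIST_ITEM"
    if not stripped:
        return "BLANK"
    return "TEXT"

def mine_line_patterns(text: str, window: int = 3) -> Counter:
    """Count recurring windows of structural tokens across a document."""
    sigs = [line_signature(line) for line in text.splitlines()]
    return Counter(tuple(sigs[i:i + window]) for i in range(len(sigs) - window + 1))

doc = "# Title\n- a\n- b\n\n# Next\n- c\n- d\n"
patterns = mine_line_patterns(doc)
```

A real miner works at six levels rather than one, but the principle is the same: normalize structure, then count repeats.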
Knowledge Platform
A local-first, modular Python desktop application for structured knowledge work, centered on typed graphs, runtime-loadable modules, and SQLite-backed workspaces.
Knowledge Platform treats the graph as the primary artifact instead of files or folders. It provides a typed graph engine, SQLite persistence, and a config-driven plugin architecture that lets modules contribute graph types, services, projections, and UI widgets. The current alpha release ships with an Outline module for building hierarchical document structures on top of the shared platform.
Key Features
- Uses typed graph schemas to validate nodes, edges, and attributes before writes are persisted
- Stores workspaces locally in SQLite with no cloud dependency or server requirement
- Loads modules from configuration or entry points rather than hard-wiring them into the application shell
- Separates UI, services, core graph logic, domain schemas, and persistence behind explicit contracts
- Includes an Outline module with tree projection, ordered parent-child structure, inline editing, and context actions
- Provides a tested Python package with desktop UI, developer documentation, and module-extension guidance
Status: Alpha
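Schema-validated writes can be sketched in a few lines. This is a toy model under assumed names (`NodeType`, `GraphStore`, `add_node` are illustrative, not the platform's real contracts): a node is rejected before persistence if its type's required attributes are missing.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NodeType:
    name: str
    required_attrs: frozenset

@dataclass
class GraphStore:
    """Toy schema-checked store; real persistence would sit behind this."""
    schema: dict  # type name -> NodeType
    nodes: dict = field(default_factory=dict)

    def add_node(self, node_id: str, type_name: str, attrs: dict) -> None:
        node_type = self.schema[type_name]
        missing = node_type.required_attrs - attrs.keys()
        if missing:
            raise ValueError(f"missing attrs: {sorted(missing)}")
        self.nodes[node_id] = {"type": type_name, **attrs}

schema = {"outline_item": NodeType("outline_item", frozenset({"title"}))}
store = GraphStore(schema)
store.add_node("n1", "outline_item", {"title": "Intro"})
```

Validating before the write, rather than after, is what keeps a typed graph consistent regardless of which module issued the mutation.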
Open Knowledge Systems
A technical manual and Python reference implementation for turning Markdown-based content into structured, machine-readable knowledge assets for validation, graph projection, and publication workflows.
Open Knowledge Systems combines a design-level manuscript with a working ETL slice for schema-aware content pipelines. The repository focuses on normalizing Markdown chapters with YAML front matter into stable chunk records, projecting those records into graph-friendly JSON for Neo4j-oriented workflows, and publishing the surrounding architecture and implementation guidance as a documentation site and book-style outputs.
Key Features
- Publishes a technical manual covering content modeling, ETL, graph design, retrieval, validation, and publication architecture
- Parses Markdown files with YAML front matter into normalized chunk records based on level-two heading boundaries
- Generates deterministic IDs, document slugs, source metadata, and ordered section records for downstream processing
- Builds graph-oriented JSON projections with chapter, topic, author, schema, and relationship data for Neo4j-style loading
- Ships a CLI for corpus transformation into chunk and graph outputs
- Supports documentation and book publishing with MkDocs, notebooks, EPUB, and PDF-oriented build scripts
Status: Active
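The chunking step described above can be sketched directly: split a Markdown body on level-two heading boundaries and emit ordered records with deterministic IDs. The function name `chunk_by_h2` and the record fields are assumptions for illustration, not the repository's actual schema.

```python
import hashlib

def chunk_by_h2(markdown: str, doc_slug: str) -> list:
    """Split a Markdown body on '## ' boundaries into ordered chunk records."""
    sections, title, lines = [], None, []
    for line in markdown.splitlines() + ["## __END__"]:  # sentinel flushes the last section
        if line.startswith("## "):
            if title is not None:
                sections.append((title, "\n".join(lines).strip()))
            title, lines = line[3:].strip(), []
        else:
            lines.append(line)
    records = []
    for order, (sec_title, body) in enumerate(sections):
        # Deterministic ID: same slug + title always yields the same chunk ID
        chunk_id = hashlib.sha1(f"{doc_slug}:{sec_title}".encode()).hexdigest()[:12]
        records.append({"id": chunk_id, "doc": doc_slug, "order": order,
                        "title": sec_title, "body": body})
    return records

records = chunk_by_h2("Preamble\n## One\nalpha\n## Two\nbeta", "guide")
```

Because the IDs are content-derived rather than random, re-running the pipeline over an unchanged corpus produces identical records, which is what makes downstream graph loading stable.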
DITA Package Processor
A deterministic, modular Python pipeline for analyzing DITA packages, generating validated migration plans, and executing them safely through bounded, plugin-driven handlers.
DITA Package Processor transforms DITA package migration into an explicit, contract-based workflow: discover -> normalize -> plan -> execute. It scans package structure and relationships, normalizes findings into planning contracts, generates deterministic action plans, and applies those plans through dry-run-first execution with optional explicit writes.
Key Features
- End-to-end deterministic pipeline for discovery, normalization, planning, and execution
- Read-only discovery of maps, topics, media, relationships, and classification evidence
- Contract-first planning with schema validation for reproducible, auditable plans
- Dry-run by default, with explicit `--apply` required for filesystem mutation
- Plugin architecture for custom discovery patterns, planning action emission, and execution handlers
- Execution/materialization reporting for preflight validation and post-run traceability
Status: Active
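The dry-run-first execution model can be sketched as follows. This is a minimal illustration, not the processor's real handler interface; `Action` and `execute` are assumed names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str
    src: str
    dest: str

def execute(plan: list, apply: bool = False) -> list:
    """Report every planned action; touch the filesystem only when apply=True."""
    report = []
    for action in plan:
        prefix = "" if apply else "would-"
        report.append(f"{prefix}{action.kind}: {action.src} -> {action.dest}")
        # When apply=True, a real handler registry would dispatch the mutation here.
    return report

plan = [Action("move", "topics/a.dita", "concepts/a.dita")]
dry_run_report = execute(plan)
```

The same plan object drives both modes, so the dry-run report is an exact preview of what an `--apply` run would do.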
DITA ETL Pipeline
A composable Python ETL pipeline for converting Markdown, HTML, and DOCX source content into structured DITA 1.3 XML, with built-in assessment, format-specific extraction, typed stage contracts, and documentation-focused output assembly.
DITA ETL Pipeline processes mixed-format documentation through four modular stages: Assess, Extract, Transform, and Load. It evaluates source files up front, converts them into intermediate DocBook, classifies and transforms them into DITA topics, and assembles final output bundles including a DITA map, topics, assets, and assessment artifacts for review.
Key Features
- Runs a four-stage pipeline with explicit Assess, Extract, Transform, and Load boundaries
- Converts Markdown, HTML, and DOCX inputs into structured DITA 1.3 XML
- Uses typed, validated contracts between stages for predictable pipeline behavior
- Supports pluggable extractor strategies, including Pandoc-based and Oxygen-based DOCX handling
- Generates assessment outputs such as inventories, duplicate detection maps, conversion plans, and HTML reports
- Applies configurable topic-type classification rules and composes final `ditamap`, topic, and asset output bundles
Status: Active
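Typed stage contracts can be sketched as immutable results handed from one stage to the next. The names below (`StageResult`, `run_pipeline`, the `docbook:`/`dita:` payload tags) are illustrative stand-ins, not the pipeline's real types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageResult:
    stage: str
    payload: tuple  # immutable so downstream stages cannot mutate upstream output

def run_pipeline(paths: list) -> list:
    """Chain Assess -> Extract -> Transform -> Load, each stage consuming
    the previous stage's typed result."""
    assessed = StageResult("assess", tuple(sorted(paths)))
    extracted = StageResult("extract", tuple(f"docbook:{p}" for p in assessed.payload))
    transformed = StageResult("transform",
                              tuple(p.replace("docbook:", "dita:") for p in extracted.payload))
    loaded = StageResult("load", transformed.payload)
    return [assessed, extracted, transformed, loaded]

stages = run_pipeline(["b.md", "a.md"])
```

Freezing each boundary object is one simple way to make stage behavior predictable: a stage can only read its input and emit a new result.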
Markdown Validator
A rule-based Python validator for checking Markdown documents against declarative JSON rules, with support for front-matter policies, XPath-based body checks, workflows, and batch reporting.
Markdown Validator scans Markdown files used in static-site documentation workflows and evaluates them against reusable JSON rule sets. It validates YAML front matter and rendered document structure, supports conditional workflow chains, and can be used from both a CLI and a Python API for single-file checks, directory-wide scans, and CI validation gates.
Key Features
- Validates YAML front-matter metadata such as required keys, expected values, regex matches, and date freshness
- Evaluates rendered Markdown body content with XPath queries for headings, structure, text extraction, and node counts
- Supports declarative JSON rule sets with `Required` and `Suggested` levels plus remediation messages
- Runs optional workflow chains for conditional validation logic across multiple rule results
- Provides a CLI for single-file and recursive directory validation with text, JSON, and CSV output formats
- Includes an interactive REPL for developing and testing rules against loaded Markdown documents
- Exposes a Python API for embedding validation into other tooling and automation pipelines
Status: Active
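A front-matter policy check can be sketched as data-driven rule evaluation. The rule shape and the `check_front_matter` helper below are assumptions for illustration, not the validator's actual JSON schema.

```python
import re

def check_front_matter(front_matter: dict, rules: list) -> list:
    """Evaluate declarative rules against parsed YAML front matter; return failures."""
    failures = []
    for rule in rules:
        value = front_matter.get(rule["key"])
        ok = value is not None and re.fullmatch(rule["pattern"], str(value))
        if not ok:
            failures.append({"key": rule["key"], "level": rule["level"],
                             "message": rule["message"]})
    return failures

rules = [{"key": "status", "pattern": "draft|published", "level": "Required",
          "message": "status must be draft or published"}]
```

Keeping the rules as plain data is what lets the same rule set drive single-file checks, directory scans, and CI gates without code changes.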
Novel Testbed
A modular Python narrative compiler that segments prose, builds scene-level contracts, infers reader-state changes, and assesses whether a novel’s structure actually moves.
Novel Testbed treats fiction as a testable system. It turns raw or annotated Markdown into structured narrative modules, compiles those modules into YAML contracts, uses optional OpenAI-backed inference to populate reader-state transitions, and validates whether each scene, exposition block, or transition produces meaningful structural change.
Key Features
- Segments raw prose into compiler-friendly Markdown with chapter and module boundaries
- Parses annotated Markdown into scene, exposition, and transition modules with source fingerprints
- Generates blank YAML contracts for manual narrative specification and review
- Uses LLM-backed inference to populate `pre_state`, `post_state`, and expected scene-level changes
- Assesses contracts against narrative rules and exports machine-readable JSON reports
- Ships with a four-stage CLI workflow: `segment`, `parse`, `infer`, and `assess`
Status: Active (Alpha)
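The core assessment question, does a scene produce structural change, reduces to comparing reader-state snapshots. A minimal sketch, assuming a contract shape with `pre_state` and `post_state` dictionaries (the real contract format is richer):

```python
def scene_moves(contract: dict) -> bool:
    """A scene 'moves' if at least one reader-state key changes across it."""
    pre, post = contract["pre_state"], contract["post_state"]
    return any(pre.get(k) != post.get(k) for k in set(pre) | set(post))

contract = {"pre_state": {"tension": "low"}, "post_state": {"tension": "high"}}
static = {"pre_state": {"tension": "low"}, "post_state": {"tension": "low"}}
```

A scene whose pre- and post-states are identical is exactly the kind of structurally inert module the assessor is built to flag.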
Let’s Talk
Interested in contributing or adapting a solution for your own content workflow?