MultiLevel Pattern Miner
A modular Python engine for mining repeated structural patterns from unstructured text and turning them into LLM-ready specs, validation bundles, and review dashboards.
MultiLevel Pattern Miner identifies recurring document structures across six hierarchical levels, from phrase to document, in Markdown and plain text. It mines those patterns into reusable libraries, compiles them into machine-readable specifications, and supports downstream validation and dashboard-based review of new drafts.
Key Features
- Mines recurring structural patterns across phrase, line, paragraph, chunk, section, and document levels
- Supports Markdown and plain-text parsing with heading, list, and table awareness
- Exports mined libraries as YAML or JSON with source locations and example excerpts
- Compiles mined patterns into LLM-ready bridge specs for templates, prompts, and validation
- Validates draft content against compiled structural, lexical, and policy checks
- Renders static HTML dashboards for batch review, filtering, and CSV export
Status: Active
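The multi-level mining idea can be illustrated with a minimal sketch. Everything here is hypothetical (the function names `line_signature` and `mine_line_patterns` are not the project's actual API): lines are mapped to coarse structural tokens, and recurring token windows are counted as candidate patterns.

```python
from collections import Counter

def line_signature(line: str) -> str:
    """Map a line to a coarse structural token (illustrative only)."""
    stripped = line.strip()
    if stripped.startswith("#"):
        return "HEADING"
    if stripped.startswith(("-", "*")):
        return "LIST_ITEM"
    if not stripped:
        return "BLANK"
    return "TEXT"

def mine_line_patterns(text: str, window: int = 3) -> Counter:
    """Count recurring windows of structural tokens across a document."""
    sigs = [line_signature(line) for line in text.splitlines()]
    return Counter(tuple(sigs[i:i + window]) for i in range(len(sigs) - window + 1))

doc = "# Title\n- a\n- b\n\n# Next\n- c\n- d\n"
patterns = mine_line_patterns(doc)
```

A real miner works at six levels rather than one, but the principle is the same: normalize structure, then count repeats.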
Knowledge Platform
A local-first, modular Python desktop application for structured knowledge work, centered on typed graphs, runtime-loadable modules, and SQLite-backed workspaces.
Knowledge Platform treats the graph as the primary artifact instead of files or folders. It provides a typed graph engine, SQLite persistence, and a config-driven plugin architecture that lets modules contribute graph types, services, projections, and UI widgets. The current alpha release ships with an Outline module for building hierarchical document structures on top of the shared platform.
Key Features
- Uses typed graph schemas to validate nodes, edges, and attributes before writes are persisted
- Stores workspaces locally in SQLite with no cloud dependency or server requirement
- Loads modules from configuration or entry points rather than hard-wiring them into the application shell
- Separates UI, services, core graph logic, domain schemas, and persistence behind explicit contracts
- Includes an Outline module with tree projection, ordered parent-child structure, inline editing, and context actions
- Provides a tested Python package with desktop UI, developer documentation, and module-extension guidance
Status: Alpha
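Schema-validated writes can be sketched in a few lines. This is a toy model under assumed names (`NodeType`, `GraphStore`, `add_node` are illustrative, not the platform's real contracts): a node is rejected before persistence if its type's required attributes are missing.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NodeType:
    name: str
    required_attrs: frozenset

@dataclass
class GraphStore:
    """Toy schema-checked store; real persistence would sit behind this."""
    schema: dict  # type name -> NodeType
    nodes: dict = field(default_factory=dict)

    def add_node(self, node_id: str, type_name: str, attrs: dict) -> None:
        node_type = self.schema[type_name]
        missing = node_type.required_attrs - attrs.keys()
        if missing:
            raise ValueError(f"missing attrs: {sorted(missing)}")
        self.nodes[node_id] = {"type": type_name, **attrs}

schema = {"outline_item": NodeType("outline_item", frozenset({"title"}))}
store = GraphStore(schema)
store.add_node("n1", "outline_item", {"title": "Intro"})
```

Validating before the write, rather than after, is what keeps a typed graph consistent regardless of which module issued the mutation.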
Open Knowledge Systems
A technical manual and Python reference implementation for turning Markdown-based content into structured, machine-readable knowledge assets for validation, graph projection, and publication workflows.
Open Knowledge Systems combines a design-level manuscript with a working ETL slice for schema-aware content pipelines. The repository focuses on normalizing Markdown chapters with YAML front matter into stable chunk records, projecting those records into graph-friendly JSON for Neo4j-oriented workflows, and publishing the surrounding architecture and implementation guidance as a documentation site and book-style outputs.
Key Features
- Publishes a technical manual covering content modeling, ETL, graph design, retrieval, validation, and publication architecture
- Parses Markdown files with YAML front matter into normalized chunk records based on level-two heading boundaries
- Generates deterministic IDs, document slugs, source metadata, and ordered section records for downstream processing
- Builds graph-oriented JSON projections with chapter, topic, author, schema, and relationship data for Neo4j-style loading
- Ships a CLI for corpus transformation into chunk and graph outputs
- Supports documentation and book publishing with MkDocs, notebooks, EPUB, and PDF-oriented build scripts
Status: Active
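The chunking step described above can be sketched directly: split a Markdown body on level-two heading boundaries and emit ordered records with deterministic IDs. The function name `chunk_by_h2` and the record fields are assumptions for illustration, not the repository's actual schema.

```python
import hashlib

def chunk_by_h2(markdown: str, doc_slug: str) -> list:
    """Split a Markdown body on '## ' boundaries into ordered chunk records."""
    sections, title, lines = [], None, []
    for line in markdown.splitlines() + ["## __END__"]:  # sentinel flushes the last section
        if line.startswith("## "):
            if title is not None:
                sections.append((title, "\n".join(lines).strip()))
            title, lines = line[3:].strip(), []
        else:
            lines.append(line)
    records = []
    for order, (sec_title, body) in enumerate(sections):
        # Deterministic ID: same slug + title always yields the same chunk ID
        chunk_id = hashlib.sha1(f"{doc_slug}:{sec_title}".encode()).hexdigest()[:12]
        records.append({"id": chunk_id, "doc": doc_slug, "order": order,
                        "title": sec_title, "body": body})
    return records

records = chunk_by_h2("Preamble\n## One\nalpha\n## Two\nbeta", "guide")
```

Because the IDs are content-derived rather than random, re-running the pipeline over an unchanged corpus produces identical records, which is what makes downstream graph loading stable.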
DITA Package Processor
A deterministic, modular Python pipeline for analyzing DITA packages, generating validated migration plans, and executing them safely through bounded, plugin-driven handlers.
DITA Package Processor transforms DITA package migration into an explicit, contract-based workflow: discover -> normalize -> plan -> execute. It scans package structure and relationships, normalizes findings into planning contracts, generates deterministic action plans, and applies those plans through dry-run-first execution with optional explicit writes.
Key Features
- End-to-end deterministic pipeline for discovery, normalization, planning, and execution
- Read-only discovery of maps, topics, media, relationships, and classification evidence
- Contract-first planning with schema validation for reproducible, auditable plans
- Dry-run by default, with explicit `--apply` required for filesystem mutation
- Plugin architecture for custom discovery patterns, planning action emission, and execution handlers
- Execution/materialization reporting for preflight validation and post-run traceability
Status: Active
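The dry-run-first execution model can be sketched as follows. This is a minimal illustration, not the processor's real handler interface; `Action` and `execute` are assumed names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str
    src: str
    dest: str

def execute(plan: list, apply: bool = False) -> list:
    """Report every planned action; touch the filesystem only when apply=True."""
    report = []
    for action in plan:
        prefix = "" if apply else "would-"
        report.append(f"{prefix}{action.kind}: {action.src} -> {action.dest}")
        # When apply=True, a real handler registry would dispatch the mutation here.
    return report

plan = [Action("move", "topics/a.dita", "concepts/a.dita")]
dry_run_report = execute(plan)
```

The same plan object drives both modes, so the dry-run report is an exact preview of what an `--apply` run would do.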
DITA ETL Pipeline
A composable Python ETL pipeline for converting Markdown, HTML, and DOCX source content into structured DITA 1.3 XML, with built-in assessment, format-specific extraction, typed stage contracts, and documentation-focused output assembly.
DITA ETL Pipeline processes mixed-format documentation through four modular stages: Assess, Extract, Transform, and Load. It evaluates source files up front, converts them into intermediate DocBook, classifies and transforms them into DITA topics, and assembles final output bundles including a DITA map, topics, assets, and assessment artifacts for review.
Key Features
- Runs a four-stage pipeline with explicit Assess, Extract, Transform, and Load boundaries
- Converts Markdown, HTML, and DOCX inputs into structured DITA 1.3 XML
- Uses typed, validated contracts between stages for predictable pipeline behavior
- Supports pluggable extractor strategies, including Pandoc-based and Oxygen-based DOCX handling
- Generates assessment outputs such as inventories, duplicate detection maps, conversion plans, and HTML reports
- Applies configurable topic-type classification rules and composes final `ditamap`, topic, and asset output bundles
Status: Active
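Typed stage contracts can be sketched as immutable results handed from one stage to the next. The names below (`StageResult`, `run_pipeline`, the `docbook:`/`dita:` payload tags) are illustrative stand-ins, not the pipeline's real types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageResult:
    stage: str
    payload: tuple  # immutable so downstream stages cannot mutate upstream output

def run_pipeline(paths: list) -> list:
    """Chain Assess -> Extract -> Transform -> Load, each stage consuming
    the previous stage's typed result."""
    assessed = StageResult("assess", tuple(sorted(paths)))
    extracted = StageResult("extract", tuple(f"docbook:{p}" for p in assessed.payload))
    transformed = StageResult("transform",
                              tuple(p.replace("docbook:", "dita:") for p in extracted.payload))
    loaded = StageResult("load", transformed.payload)
    return [assessed, extracted, transformed, loaded]

stages = run_pipeline(["b.md", "a.md"])
```

Freezing each boundary object is one simple way to make stage behavior predictable: a stage can only read its input and emit a new result.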
Markdown Validator
A rule-based Python validator for checking Markdown documents against declarative JSON rules, with support for front-matter policies, XPath-based body checks, workflows, and batch reporting.
Markdown Validator scans Markdown files used in static-site documentation workflows and evaluates them against reusable JSON rule sets. It validates YAML front matter and rendered document structure, supports conditional workflow chains, and can be used from both a CLI and a Python API for single-file checks, directory-wide scans, and CI validation gates.
Key Features
- Validates YAML front-matter metadata such as required keys, expected values, regex matches, and date freshness
- Evaluates rendered Markdown body content with XPath queries for headings, structure, text extraction, and node counts
- Supports declarative JSON rule sets with `Required` and `Suggested` levels plus remediation messages
- Runs optional workflow chains for conditional validation logic across multiple rule results
- Provides a CLI for single-file and recursive directory validation with text, JSON, and CSV output formats
- Includes an interactive REPL for developing and testing rules against loaded Markdown documents
- Exposes a Python API for embedding validation into other tooling and automation pipelines
Status: Active
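A front-matter policy check can be sketched as data-driven rule evaluation. The rule shape and the `check_front_matter` helper below are assumptions for illustration, not the validator's actual JSON schema.

```python
import re

def check_front_matter(front_matter: dict, rules: list) -> list:
    """Evaluate declarative rules against parsed YAML front matter; return failures."""
    failures = []
    for rule in rules:
        value = front_matter.get(rule["key"])
        ok = value is not None and re.fullmatch(rule["pattern"], str(value))
        if not ok:
            failures.append({"key": rule["key"], "level": rule["level"],
                             "message": rule["message"]})
    return failures

rules = [{"key": "status", "pattern": "draft|published", "level": "Required",
          "message": "status must be draft or published"}]
```

Keeping the rules as plain data is what lets the same rule set drive single-file checks, directory scans, and CI gates without code changes.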
Novel Testbed
A modular Python narrative compiler that segments prose, builds scene-level contracts, infers reader-state changes, and assesses whether a novel’s structure actually moves.
Novel Testbed treats fiction as a testable system. It turns raw or annotated Markdown into structured narrative modules, compiles those modules into YAML contracts, uses optional OpenAI-backed inference to populate reader-state transitions, and validates whether each scene, exposition block, or transition produces meaningful structural change.
Key Features
- Segments raw prose into compiler-friendly Markdown with chapter and module boundaries
- Parses annotated Markdown into scene, exposition, and transition modules with source fingerprints
- Generates blank YAML contracts for manual narrative specification and review
- Uses LLM-backed inference to populate `pre_state`, `post_state`, and expected scene-level changes
- Assesses contracts against narrative rules and exports machine-readable JSON reports
- Ships with a four-stage CLI workflow: `segment`, `parse`, `infer`, and `assess`
Status: Active (Alpha)
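The core assessment question, does a scene produce structural change, reduces to comparing reader-state snapshots. A minimal sketch, assuming a contract shape with `pre_state` and `post_state` dictionaries (the real contract format is richer):

```python
def scene_moves(contract: dict) -> bool:
    """A scene 'moves' if at least one reader-state key changes across it."""
    pre, post = contract["pre_state"], contract["post_state"]
    return any(pre.get(k) != post.get(k) for k in set(pre) | set(post))

contract = {"pre_state": {"tension": "low"}, "post_state": {"tension": "high"}}
static = {"pre_state": {"tension": "low"}, "post_state": {"tension": "low"}}
```

A scene whose pre- and post-states are identical is exactly the kind of structurally inert module the assessor is built to flag.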
Let’s Talk
Interested in contributing or adapting a solution for your own content workflow?