Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.

Doc Processing in YSH - Notation, Query, Templating

This is a slogan for "maximalist YSH" design:

Documents, Objects, and Tables - HTML, JSON, and CSV

This design doc is about the first part - documents and document processing.

† from a paper about the C# language

Table of Contents
Intro
Use Cases for HTML Processing
Operations

Intro

Let's sketch a design for 3 aspects of doc processing:

  1. HTM8 Notation - A subset of HTML5 meant for easy implementation, with regular languages.
  2. A subset of CSS for querying
  3. Templating in the Markaby style (a bit like Lisp, but unlike JSX templates)
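
To make "easy implementation, with regular languages" concrete, here is a minimal sketch in Python (not YSH) of a lexer where every token kind is one regular expression. The token names and patterns are assumptions made for illustration; they are not the actual HTM8 definition.

```python
import re

# Toy token grammar: each alternative is a regular expression, so the whole
# lexer is one regular language. Hypothetical patterns, NOT the HTM8 spec.
TOKEN_RE = re.compile(r"""
    (?P<comment>   <!--.*?-->                               )
  | (?P<end_tag>   </[a-zA-Z][a-zA-Z0-9-]*\s*>              )
  | (?P<start_tag> <[a-zA-Z][a-zA-Z0-9-]*(?:\s[^<>]*)?/?>   )
  | (?P<char_ref>  &(?:[a-zA-Z]+|\#[0-9]+);                 )
  | (?P<text>      [^<&]+                                   )
  | (?P<invalid>   .                                        )
""", re.VERBOSE | re.DOTALL)


def lex(html):
    """Yield (token_kind, text) pairs that together cover the whole input."""
    for m in TOKEN_RE.finditer(html):
        # lastgroup is the name of the alternative that matched
        yield m.lastgroup, m.group()


if __name__ == '__main__':
    for kind, text in lex('<p class="x">hi &amp; bye</p>'):
        print(kind, repr(text))
```

Because the final `invalid` alternative matches any single character, concatenating the token texts always reproduces the input, which is what lets a processor copy through the parts of a document it doesn't touch.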

The basic goal is to write ad hoc HTML processors.

YSH programs should loosely follow the style of the DOM API in web browsers, e.g. document.querySelectorAll('table#mytable') and the doc fragments it returns.

Note that the DOM API is not available in node.js or Deno by default, much less any alternative lightweight JavaScript runtimes.

I believe we can include something that's simpler, and just as powerful, in YSH.
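
As a rough illustration of how small such a query layer can be, here is a Python sketch (not YSH) of a `query()` function that handles only the selector forms `tag`, `#id`, and `tag#id`, and returns the outer HTML of each match. The function name, the regexes, and the list of void elements are all assumptions for this sketch; it expects balanced tags and double-quoted `id` attributes.

```python
import re

TAG_RE = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>')
ID_RE = re.compile(r'(?:^|\s)id="([^"]*)"')
VOID = {'br', 'hr', 'img', 'input', 'link', 'meta'}  # never have end tags


def query(html, selector):
    """Return outer HTML of elements matching 'tag', '#id', or 'tag#id'."""
    want_tag, _, want_id = selector.partition('#')
    results = []
    stack = []  # entries: (tag, start_offset, matched_selector)
    for m in TAG_RE.finditer(html):
        closing, tag, attrs = m.group(1), m.group(2).lower(), m.group(3)
        if closing:
            # Pop back to the nearest open tag with the same name
            while stack:
                open_tag, start, matched = stack.pop()
                if open_tag == tag:
                    if matched:
                        results.append((start, html[start:m.end()]))
                    break
        elif tag not in VOID:  # void elements are ignored in this sketch
            id_match = ID_RE.search(attrs)
            elem_id = id_match.group(1) if id_match else None
            matched = ((not want_tag or want_tag == tag) and
                       (not want_id or want_id == elem_id))
            stack.append((tag, m.start(), matched))
    # Inner elements close first, so sort by start offset for document order
    return [frag for _, frag in sorted(results)]


if __name__ == '__main__':
    page = '<div><table id="mytable"><tr><td>1</td></tr></table></div>'
    print(query(page, 'table#mytable'))
```

Returning source fragments rather than tree nodes keeps the sketch close to the "doc fragments" idea: a result is just a slice of the original text.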

Use Cases for HTML Processing

These use cases should give people a concrete idea of what the design is for.

  1. making Oils cross-ref.html
  2. table language - md-ul-table
  3. safe HTML subset, e.g. for publishing user results on continuous build

Design goals:

Operations

Constructors:

doc {  # prints valid HTM8
  p {
    echo 'hi'
  }
  p {
    'hi'  # I think I want to turn on this auto-quote feature
  }
  raw '<b>bold</b>'
}

And then, to capture the output instead of printing it:

doc (&mydoc) {  # captures the output, and creates a value.Obj
  p {
    'hi'  # I think I want to turn on this auto-quote feature
    "hi $x"
  }
}

This is the same pattern as the table constructor.
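
For readers who haven't seen Markaby-style templating, here is a minimal Python analogue (not YSH): nested blocks become nested tags, plain strings are escaped, and `raw` passes markup through. The `Doc` class and its method names are hypothetical; they only mirror the shape of the `doc { }` blocks above.

```python
import html
from contextlib import contextmanager


class Doc:
    """Hypothetical builder: captures fragments, like doc (&mydoc) { ... }."""

    def __init__(self):
        self.parts = []

    @contextmanager
    def tag(self, name, **attrs):
        attr_str = ''.join(
            ' %s="%s"' % (k, html.escape(str(v))) for k, v in attrs.items())
        self.parts.append('<%s%s>' % (name, attr_str))
        yield self
        self.parts.append('</%s>' % name)

    def text(self, s):
        self.parts.append(html.escape(s))  # escaped by default

    def raw(self, s):
        self.parts.append(s)  # caller vouches for this markup

    def __str__(self):
        return ''.join(self.parts)


if __name__ == '__main__':
    x = 'a < b'
    d = Doc()
    with d.tag('p'):
        d.text('hi')
    with d.tag('p'):
        d.text('hi %s' % x)
    d.raw('<b>bold</b>')
    print(d)  # the "printing" form is just str() of the captured value
```

The design point this tries to show is that the printing form and the capturing form can share one implementation: build a value, then either keep it or write it out.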

Module:

source $LIB_YSH/doc.ysh

doc (&d) {
}
doc {
}
doc('<p>')

This can have both __invoke__ and __call__

var results = d.query('#a')

# The doc could be __invoke__ ?
d query '#a' {
}

doc query (d, '#a') {
  for result in (results) {
    echo hi
  }
}

# we create (old, new) pairs?
# this performs an operation like:
# d.outerHTML = outerHTML
var d = d.replace(pairs)
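
One possible reading of the `(old, new)` pairs, sketched in Python (not YSH): each "old" is identified by its `(start, end)` span in the source text, and all replacements are applied in a single left-to-right pass. The span representation and the name `replace_spans` are assumptions; the doc above leaves the form of the pairs open.

```python
def replace_spans(html, pairs):
    """Return a new string with each (start, end) span replaced.

    pairs: iterable of ((start, end), new_text); spans must not overlap.
    """
    out = []
    pos = 0
    for (start, end), new_text in sorted(pairs):
        if start < pos:
            raise ValueError('overlapping spans')
        out.append(html[pos:start])  # unchanged text before this span
        out.append(new_text)
        pos = end
    out.append(html[pos:])  # unchanged tail
    return ''.join(out)


if __name__ == '__main__':
    page = '<p>a</p><p>b</p>'
    # Replace the second paragraph; the first is copied through untouched
    print(replace_spans(page, [((8, 16), '<h1>b</h1>')]))
```

Applying all pairs in one pass means offsets computed by a query stay valid, which is harder to guarantee if fragments are replaced one at a time.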

Safe HTML subset

d query (tags= :|a p div h1 h2 h3|) {
  case (_frag.tag) {
    a {
      # get a list of all attributes
      var attrs = _frag.getAttributes()
    }
  }
}

If you want to take user HTML, then you first use an HTML5 -> HTM8 converter.
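
To show the allowlist idea end to end, here is a small Python sketch (not YSH) built on the standard `html.parser` module. The tag list is taken from the query above; the attribute policy (only `href` on `a`, only `http`/`https` URLs) is an assumption added for this sketch, and a real sanitizer would need a more careful policy.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {'a', 'p', 'div', 'h1', 'h2', 'h3'}  # same tags as the query
ALLOWED_ATTRS = {'a': {'href'}}  # assumption: everything else is dropped


def _safe_attr(tag, name, value):
    if name not in ALLOWED_ATTRS.get(tag, ()):
        return False
    if name == 'href':  # assumption: reject javascript: and other schemes
        return (value or '').startswith(('http://', 'https://'))
    return True


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, keep its text content
        attr_str = ''.join(
            ' %s="%s"' % (k, html.escape(v or ''))
            for k, v in attrs if _safe_attr(tag, k, v))
        self.out.append('<%s%s>' % (tag, attr_str))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(html.escape(data))  # re-escape all text


def sanitize(text):
    s = Sanitizer()
    s.feed(text)
    s.close()
    return ''.join(s.out)


if __name__ == '__main__':
    print(sanitize('<p onclick="x()">hi <b>there</b></p>'))
```

Disallowed tags are unwrapped rather than deleted along with their contents, so user text survives even when its markup does not.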

Generated on Sun, 05 Jan 2025 23:28:55 -0500