Inspecting YAML frontmatter in Markdown files with Nushell
Markdown is widely used when working with text-based generative artificial intelligence.
YAML is suitable for more structured data.
The two can be combined as follows.
---
title: Blog about Nushell and structured data
date: 2024-09-19
tags:
- Nushell
- data
---
The body text goes here.
Imagine the above text is in a file called blog-1.md.
The part between the --- lines is the frontmatter in YAML. Everything below the second --- line is the Markdown content.
Then in Nushell I create a small custom command like so:
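# Split piped-in text into a record with `frontmatter` and `content` fields.
def frontmatter [] {
  lines |
  split list '---' |
  do { |lst|
    # The first chunk is the YAML block; parse it and convert values such as dates.
    let fm = $lst.0 |
      to text | from yaml |
      into value | into record
    # Everything after the first chunk is treated as the body text.
    {
      frontmatter: $fm
      content: ($lst | skip 1 | to text | str trim)
    }
  } $in
}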
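The command above relies on how split list treats the leading --- line, and a --- used as a horizontal rule in the body would also be treated as a separator. As a rough sketch of an alternative, the variant below looks up the positions of the first two --- lines itself. The name frontmatter-strict is my own invention for illustration, and unlike the original it does not run into value, so dates stay plain strings.

```nu
# Hypothetical variant; the name `frontmatter-strict` is an assumption, not part of Nushell.
def frontmatter-strict [] {
  let rows = $in | lines
  # Positions of every line that consists of exactly `---`.
  let fences = $rows | enumerate | each { |row| if $row.item == '---' { $row.index } }

  # No leading fence, or no closing one: treat the whole input as content.
  if ($rows.0? != '---') or (($fences | length) < 2) {
    return {frontmatter: {}, content: ($rows | str join "\n" | str trim)}
  }

  let close = $fences | get 1
  # Lines strictly between the first and second fence.
  let fm = $rows | skip 1 | take ($close - 1) | str join "\n" | from yaml
  # Everything after the second fence, later `---` lines included.
  let body = $rows | skip ($close + 1) | str join "\n" | str trim
  {frontmatter: $fm, content: $body}
}
```

It is called the same way, for example open blog-1.md | frontmatter-strict. Files without any frontmatter come back with an empty frontmatter record instead of raising an error.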
With this I can do:
open blog-1.md | frontmatter
And I get
╭─────────────┬────────────────────────────────────────────────────╮
│             │ ╭───────┬────────────────────────────────────────╮ │
│ frontmatter │ │ title │ Blog about Nushell and structured data │ │
│             │ │ date  │ 11 hours ago                           │ │
│             │ │       │ ╭───┬─────────╮                        │ │
│             │ │ tags  │ │ 0 │ Nushell │                        │ │
│             │ │       │ │ 1 │ data    │                        │ │
│             │ │       │ ╰───┴─────────╯                        │ │
│             │ ╰───────┴────────────────────────────────────────╯ │
│ content     │ The body text goes here.                           │
╰─────────────┴────────────────────────────────────────────────────╯
This can be used in other pipelines, e.g. when publishing:
let body = open $path | frontmatter | get content
let meta = open $path | frontmatter | get frontmatter
But also for finding, for example, posts that are not yet published:
open *.md | each { frontmatter } | get frontmatter | where draft == true
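One caveat with the one-liner above: if some posts have no draft key at all, the where draft == true comparison may raise a missing-column error. A possible workaround, sketched here under the assumption that the frontmatter command from this post is already defined, is to keep the file path next to the metadata and use optional access on the key. The field names path and meta are my own choices for illustration.

```nu
# Assumes the `frontmatter` command defined earlier in this post is in scope.
# `path` and `meta` are illustrative field names, not anything Nushell requires.
let posts = glob *.md | each { |path| {path: $path, meta: (open --raw $path | frontmatter | get frontmatter)} }

# `draft?` yields null for posts without the key, so those rows are filtered out.
$posts | where { |row| $row.meta.draft? == true } | get path
```

Keeping the path in each row also makes it possible to see which files the matches came from.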
and slightly more complicated things like counting the most frequently used values of a list-valued key, here tools:
glob **/*.md | par-each { open $in | frontmatter | get frontmatter.tools? } | flatten | uniq -c | sort-by count
giving
╭────┬───────────┬───────╮
│  # │ value     │ count │
├────┼───────────┼───────┤
│  0 │ latex     │     1 │
│  1 │ fontforge │     1 │
│  2 │ groff     │     1 │
│  3 │ LLM       │     1 │
│  4 │ Nushell   │     1 │
│  5 │ markdown  │     1 │
│  6 │ Marvin    │     1 │
│  7 │ drupal    │     1 │
│  8 │ pandoc    │     2 │
│  9 │ llm       │     2 │
│ 10 │ nvim      │     3 │
│ 11 │ aichat    │     3 │
│ 12 │ ChatGPT   │     4 │
│ 13 │ Nushell   │    13 │
╰────┴───────────┴───────╯
Note the ? after .tools in the above command. It is there because not all of my Markdown files have a tools key in the frontmatter. With the ?, files without a tools key are simply ignored instead of breaking the pipeline.
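To see the effect of ? in isolation, here is a minimal sketch using a made-up record in place of a real post's frontmatter:

```nu
# A record without a `tools` key, standing in for a post's frontmatter.
let meta = {title: "No tools here"}

# Without `?`, a missing key is an error:
# $meta | get tools

# With `?`, the missing key yields null instead of an error.
$meta | get tools? | describe
# nothing
```

Since each and par-each drop null results, posts without the key simply contribute nothing to the count.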
Much more can be done with this, of course, not least by combining it with LLMs.
Final note:
Doing
glob **/*.md | par-each { open $in | frontmatter }
is more robust than
open **/*.md | par-each { frontmatter }
when some files have lots of weird characters in them.