Lazy Blogging

Writing was already tedious enough; time to improve that!

When I last left off, this blog had been created, and it was both quite plain and quite manual. I had already mentioned an idea I wanted to implement but put off, as it wasn't necessary for the MVP: automatic replacements.

YAML Front Matter

First, though, was the slugging issue. "What slugging issue?" you ask? Well, the automatic conversion of titles to slugs works for now, but eventually I'll give two posts the same title. My solution is to add a YAML-encoded front matter block at the top of each post containing various attributes of the post; currently it only provides a slug to use.
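
A minimal sketch of the idea, assuming each post opens with a `---`-delimited block of flat `key: value` lines. The `parseFrontMatter`, `slugify` and `slugFor` helpers are hypothetical, `slugify` is only a stand-in for the existing title conversion, and a real implementation would hand the block to a proper YAML parser rather than this line-based split:

```typescript
interface FrontMatter {
  slug?: string;
  [key: string]: string | undefined;
}

// Hypothetical helper: split a leading "---" block of flat "key: value" lines
// from the post body. Not a YAML parser; nested values are not supported.
function parseFrontMatter(raw: string): { attributes: FrontMatter; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  if (!match) return { attributes: {}, body: raw };
  const attributes: FrontMatter = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not key/value pairs
    attributes[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { attributes, body: raw.slice(match[0].length) };
}

// Stand-in for the title-to-slug conversion: lowercase, runs of other
// characters become "-", leading/trailing dashes trimmed.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Prefer the front matter slug; fall back to the title otherwise.
function slugFor(title: string, raw: string): string {
  return parseFrontMatter(raw).attributes.slug ?? slugify(title);
}

console.log(slugFor("Lazy Blogging", "---\nslug: lazy-blogging-2\n---\nWriting was...")); // lazy-blogging-2
console.log(slugFor("Lazy Blogging", "Writing was...")); // lazy-blogging
```

Two posts titled "Lazy Blogging" can then coexist, as long as one of them overrides its slug.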

Summoning Cthulhu

From automatically linking HTML tags to their documentation to automatic <abbr> insertions: the whole works! I do suppose the fact that you're reading this and can see all these abbreviations, links, etc. is a spoiler for my success.

Yeah, you read that right: I threw regex at this, but only in moderation. I know what you're thinking: famous last words. I started off by writing a few replacements of each type:

CSS: [Cascading Style Sheets, https://developer.mozilla.org/en-US/docs/Web/CSS]
MVP: [Minimum Viable Product]
UI: [User Interface]

and

yt-dlp: [https://github.com/ytdl-org/youtube-dl, youtube-dl fork, "`$0`"]
\```javascript null```: [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/null]
\```html <(\w*).*?>```: [https://developer.mozilla.org/en-US/docs/Web/HTML/Element/$1]
\```css ([a-z-]*):.*?```: [https://developer.mozilla.org/en-US/docs/Web/CSS/$1]

As you can see, the link replacements initially used more regex, and the order of arguments differs between the two files, which I'd be handling manually in the code.

Curious how I escaped the above keys? Say hello to the zero-width space (U+200B).
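
A small illustration of why this works, assuming the zero-width space goes between the backticks: the character is invisible, but it still breaks up the run of three backticks, so a pattern shaped like the link keys above no longer matches the displayed text. The `escapeBackticks` helper and the `fence` pattern are hypothetical:

```typescript
const ZWSP = "\u200B"; // zero-width space: invisible, but still a real character

// Hypothetical helper: put a zero-width space after every backtick that is
// directly followed by another, so no run of consecutive backticks survives.
function escapeBackticks(text: string): string {
  return text.replace(/`(?=`)/g, "`" + ZWSP);
}

// A pattern shaped like the link keys above: ```lang content```
const fence = /```(\w+) (.*?)```/;
const key = "```css ([a-z-]*):.*?```";

console.log(fence.test(key));                  // true
console.log(fence.test(escapeBackticks(key))); // false
```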

The first step was to actually perform the parsing and normalization: each file was read and parsed, then, based on its name, a remap function was declared that would convert the position-based strings into a consistent interface:

interface ReplaceInfo {
  regex: string
  type: string
  newContent?: string
  title?: string
  url?: string
}
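
A sketch of that normalization, assuming each file parses to an object mapping every key to its array of positional arguments. The file names, the `remaps` table and the `type` values are hypothetical; the argument orders follow the two examples above (abbreviations as `[title, url?]`, links as `[url, title?, newContent?]`):

```typescript
interface ReplaceInfo {
  regex: string
  type: string
  newContent?: string
  title?: string
  url?: string
}

// Each parsed file maps a key (used as the regex) to positional arguments.
type RawDictionary = Record<string, string[]>;
type Remap = (regex: string, args: string[]) => ReplaceInfo;

// Hypothetical file names and type values; one remap per argument order.
const remaps: Record<string, Remap> = {
  // abbreviations: [title, url?]
  "abbreviations.yaml": (regex, [title, url]) => ({ regex, type: "abbreviation", title, url }),
  // links: [url, title?, newContent?]
  "links.yaml": (regex, [url, title, newContent]) => ({ regex, type: "link", url, title, newContent }),
};

function normalize(fileName: string, dictionary: RawDictionary): ReplaceInfo[] {
  const remap = remaps[fileName];
  if (!remap) throw new Error(`no remap declared for ${fileName}`);
  return Object.keys(dictionary).map((regex) => remap(regex, dictionary[regex]));
}

const links = normalize("links.yaml", {
  "yt-dlp": ["https://github.com/ytdl-org/youtube-dl", "youtube-dl fork", "`$0`"],
});
console.log(links[0].title, links[0].newContent); // youtube-dl fork `$0`
```

Keeping the argument order in one small function per file means the rest of the pipeline only ever sees `ReplaceInfo`.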

Now there was an issue I needed to plan for, which is most easily explained by example:

MPEG-TS vs TS

When the text MPEG-TS is abbreviated, it should take priority over the abbreviation for TS, which, as you can see with MPEG-TS, is working. To achieve this, I needed to remove any match that was fully contained within another match, i.e. nested inside it. Sounds awfully like one of those overlapping-interval challenges, right?

Thankfully this code wasn't for an interview, so after collecting all the matches (keeping a reference to the regex that produced each one), I brute-forced the detection and filtering of these fully contained matches.
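
A sketch of that brute-force approach, assuming every dictionary key is used directly as a regex source. The `Match` shape and both helper names are hypothetical; `removeNested` compares every pair of matches, which is O(n²) but likely fine for the handful of matches in a single post:

```typescript
interface Match {
  start: number;  // index of the first matched character
  end: number;    // index just past the last matched character
  source: string; // the regex that produced the match, kept for later steps
}

// Collect every non-empty match of every pattern, remembering its pattern.
function collectMatches(text: string, patterns: string[]): Match[] {
  const matches: Match[] = [];
  for (const source of patterns) {
    const re = new RegExp(source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) {
        re.lastIndex++; // avoid looping forever on an empty match
        continue;
      }
      matches.push({ start: m.index, end: m.index + m[0].length, source });
    }
  }
  return matches;
}

// Brute force: drop any match that lies entirely inside another match.
// For two identical spans, only the earlier one in the list is kept.
function removeNested(matches: Match[]): Match[] {
  return matches.filter((inner, i) =>
    !matches.some((outer, j) => {
      if (j === i) return false;
      const contains = outer.start <= inner.start && inner.end <= outer.end;
      const identical = outer.start === inner.start && outer.end === inner.end;
      return contains && (!identical || j < i);
    })
  );
}

const kept = removeNested(collectMatches("MPEG-TS vs TS", ["MPEG-TS", "TS"]));
console.log(kept.map((k) => `${k.source}@${k.start}`).join(", ")); // MPEG-TS@0, TS@11
```

The TS inside MPEG-TS (index 5) is dropped, while the standalone TS at index 11 survives.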

Replacing

Now onto the final task: taking all the matches and updating the raw markdown text accordingly. The most complicated subtask here was the manual templating of my argument strings, replacing $n with the n-th capture group of the match. I could replace() each $n string with the substring() of the raw markdown, using the match's indices array (available when the RegExp is created with the d flag).
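
A sketch of that templating step, assuming the regex is built with the `d` flag so that each match carries an `indices` array of `[start, end]` pairs, one per capture group (index 0 being the whole match). The `expandTemplate` name is hypothetical, and the cast only exists so the example type-checks even when the TypeScript `lib` setting predates ES2022:

```typescript
type GroupIndices = Array<[number, number] | undefined>;

// Hypothetical helper: replace each "$n" in `template` with the text of the
// n-th capture group, sliced out of `source` using the match's indices.
function expandTemplate(template: string, source: string, match: RegExpExecArray): string {
  const indices = (match as RegExpExecArray & { indices?: GroupIndices }).indices;
  if (!indices) throw new Error("build the RegExp with the 'd' flag");
  return template.replace(/\$(\d+)/g, (placeholder: string, digits: string) => {
    const span = indices[Number(digits)];
    // Groups that don't exist or didn't participate leave "$n" untouched.
    return span ? source.substring(span[0], span[1]) : placeholder;
  });
}

const markdown = "Style it with `css color: red`.";
const cssKey = new RegExp("`css ([a-z-]*):.*?`", "gd");
const cssMatch = cssKey.exec(markdown);
if (cssMatch) {
  // https://developer.mozilla.org/en-US/docs/Web/CSS/color
  console.log(expandTemplate("https://developer.mozilla.org/en-US/docs/Web/CSS/$1", markdown, cssMatch));
}
```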

Lastly, I needed to generate the replacement string based on the type of replacement (link, abbreviation, or raw) and perform a substring()-based replacement on the entire raw markdown string. To keep the indices accurate, I sorted the matches by their starting index and performed the replacements from last to first, which took only a single call to sort().
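
A sketch of that final pass, assuming each match has already been reduced to a start/end pair plus its (templated) text, and that no partially overlapping matches remain. The `PendingReplacement` shape and the markup produced by `render` are hypothetical stand-ins, and titles are not HTML-escaped here:

```typescript
interface PendingReplacement {
  start: number;
  end: number;
  type: "link" | "abbreviation" | "raw"; // assumed names for the three types
  text: string;   // matched text, or the already-templated newContent
  title?: string;
  url?: string;
}

// Hypothetical markup for each type (titles are not HTML-escaped here).
function render(r: PendingReplacement): string {
  switch (r.type) {
    case "link":
      return `[${r.text}](${r.url ?? ""})`;
    case "abbreviation": {
      const abbr = `<abbr title="${r.title ?? ""}">${r.text}</abbr>`;
      return r.url ? `[${abbr}](${r.url})` : abbr;
    }
    case "raw":
      return r.text;
  }
}

function applyReplacements(markdown: string, replacements: PendingReplacement[]): string {
  // A single sort, latest start first: each splice only shifts text after
  // its own start, so the indices of the matches still to come stay valid.
  const ordered = replacements.slice().sort((a, b) => b.start - a.start);
  let result = markdown;
  for (const r of ordered) {
    result = result.substring(0, r.start) + render(r) + result.substring(r.end);
  }
  return result;
}

console.log(applyReplacements("MPEG-TS vs TS", [
  { start: 11, end: 13, type: "abbreviation", text: "TS", title: "Transport Stream" },
  { start: 0, end: 7, type: "abbreviation", text: "MPEG-TS", title: "MPEG Transport Stream" },
]));
// <abbr title="MPEG Transport Stream">MPEG-TS</abbr> vs <abbr title="Transport Stream">TS</abbr>
```

Going front to back instead would require shifting every later index by the length difference of each replacement; the descending order avoids that bookkeeping entirely.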

With that, as you can read, the code was finished, and all that remained was slowly building up these replacement dictionaries as I went through the existing posts and added new ones.