andyreagan.com

How to publish Org mode blog to HTML

Create blog page

First, run the python script create_blog_page.py with:

python3 create_blog_page.py

which will

make the blog.org file,
make all of the tag_ files,
clean up all links that have extra brackets, and
reformat the backlinks tray.

Publish from org mode

Next, use C-c C-e P p to publish the "website" project. ~~If the cache needs to be cleared, do so by running rm -r ~/.org-timestamps.~~ To force a full republish that ignores the cache, use C-u M-x org-publish-current-project.

Copy static

Copy the presentations and teaching folders wholesale, along with our custom index page (and we'll re-copy the images and other static files while we're at it):

rsync -av --delete teaching presentations static index.html ~/public_html/

Clean HTML, strip images, fix links and title tag

Now 4 steps!

Run the cleaning script:

python3 process_html.py

Strip and shrink the images. Do this from the static directory so it doesn't process everything in the teaching resources:

cd ~/public_html/static
find . -type f \( -iname "*.jpeg" -o -iname "*.jpg" -o -iname "*.png" -o -iname "*.heic" \) | while IFS= read -r img; do
  echo "Processing: $img"
  # Create temporary output file name
  output="${img%.*}.grainy.${img##*.}"
  # Apply ImageMagick transformations
  magick "$img" -strip -resize '1280x720>' -quality 80 +noise Gaussian -attenuate 70% -colorspace Gray -auto-orient "$output"
  # Replace original with processed version
  mv "$output" "$img"
  echo "Completed: $img"
done

Fix links:

cd ~/public_html
find . -type f -name "*.html" -exec sed -i '' 's/Library\/CloudStorage\/SynologyDrive-OnDemandSync\/org\///g' {} \;
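
For reference, the same find-replace can be sketched in Python, which avoids the difference between BSD sed (sed -i '') and GNU sed (sed -i). This is only a sketch under assumptions: the function name strip_sync_prefix is hypothetical and is not part of process_html.py; the prefix string is copied from the sed command above.

```python
from pathlib import Path

# Synced-folder prefix that leaks into exported links (copied from the sed command).
SYNC_PREFIX = "Library/CloudStorage/SynologyDrive-OnDemandSync/org/"


def strip_sync_prefix(root: Path, prefix: str = SYNC_PREFIX) -> int:
    """Remove `prefix` from every .html file under `root`.

    Returns the number of files that were actually rewritten.
    (Hypothetical helper, not the author's script.)
    """
    changed = 0
    for page in root.rglob("*.html"):
        if not page.is_file():
            continue  # skip directories that happen to end in .html
        text = page.read_text(encoding="utf-8")
        cleaned = text.replace(prefix, "")
        if cleaned != text:
            page.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

It would be called as strip_sync_prefix(Path.home() / "public_html"). Running it a second time changes nothing, since the prefix is already gone.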

Add a static, but non-empty, title element:

cd ~/public_html
find . -type f -name "*.html" -exec sed -i '' 's/<head>/<head><title>andyreagan.com<\/title>/g' {} \;
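
One caveat with the sed version: running it twice on the same files inserts the title twice, because <head> still matches. A minimal Python sketch of an idempotent variant follows. The function name add_title is hypothetical, and it assumes each page contains a literal <head> tag, as the sed command does.

```python
SITE_TITLE = "andyreagan.com"


def add_title(html: str, title: str = SITE_TITLE) -> str:
    """Insert a static <title> right after <head>, at most once.

    (Hypothetical helper; mirrors the sed find-replace above.)
    """
    tag = f"<title>{title}</title>"
    if f"<head>{tag}" in html:
        return html  # the static title is already in place
    # Only the first <head> is touched; a page should have just one.
    return html.replace("<head>", f"<head>{tag}", 1)
```

Applying add_title to its own output returns the page unchanged, so it is safe to re-run over ~/public_html.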

Preview by launching a web server from the public_html folder and opening http://localhost:8000:

cd ~/public_html
python3 -m http.server

Publish it!

Publish online:

rsync -avz --delete --exclude="@eaDir" ~/public_html/ [email protected]:/volume1/web/

That's the internal network IP, for when I'm home. The Tailscale IP is 100.120.245.106.

Before Org mode, there was Pelican + GH pages; see the blog post Github pages with Pelican and the archived repo.

Before Pelican, there was WordPress (it's still online…).

Blog TODO items

DONE Fix links now from Drive folder

State "DONE" from "WAITING" [2025-05-21 Wed 09:34]
State "IN-PROGRESS" from "TODO" [2025-05-21 Wed 09:34]

Remove the synced-folder path from the links; this can be done globally. It's an easy find-replace that removes a single string:

find . -type f -name "*.html" -exec sed -i '' 's/Library\/CloudStorage\/SynologyDrive-OnDemandSync\/org\///g' {} \;

DONE Image size is too large (even on desktop)

State "DONE" from "WAITING" [2025-05-21 Wed 11:54]
State "IN-PROGRESS" from "TODO" [2025-05-21 Wed 11:54]

We can try the fix on the test site. Setting max-width: 100% on the images does the trick.

DONE Insert <head> title element for page titles

CLOSED: [2025-05-21 Wed 09:48]

State "DONE" from "WAITING" [2025-05-21 Wed 09:48]
State "IN-PROGRESS" from "TODO" [2025-05-21 Wed 09:48]

While having the title at the top of the page like

#+TITLE: Test page

does get put into the HTML page title, it also gets put into the page itself. This clashes with how I use the first heading: the first heading is the title (not everything is a list). We need a heading to link to, so using only #+TITLE doesn't work.

Workaround number 1 is to insert a <title> element by find-replace, just one more hack. I would prefer not putting the title into the exported page at all, and then we could have a title on every page? A quick test shows that a find-replace of <head> with <head><title>Manual title</title> works because the manual title comes first. Let's just do that and move on!

TODO Include the created timestamp in the footer

We previously had Emacs write a CREATED and an UPDATED timestamp to the top of files. org-publish works with a DATE at the top, so we'll use that instead of our UPDATED, but we also want to show CREATED as the time the post was created. For practical purposes, all of the posts are going to have pretty recent UPDATED dates, given all of the editing. I swapped UPDATED for DATE like this:

find . -type f -name "*.org" -exec sed -i '' 's/#+UPDATED: /#+DATE:/g' {} \;

And updated our auto-insert to use DATE like:

(defun org-add-timestamps-to-file ()
  "Add a CREATED timestamp if missing and refresh the DATE timestamp in Org files."
  (when (and (derived-mode-p 'org-mode)
             (not buffer-read-only))
    (save-excursion
      (goto-char (point-min))
      ;; Insert a CREATED timestamp only if the file does not have one yet
      (unless (re-search-forward "^#\\+CREATED:" nil t)
        (goto-char (point-min))
        (insert (format "#+CREATED: %s\n" (format-time-string "[%Y-%m-%d %a %H:%M]"))))
      ;; Update the DATE timestamp, or add one after the first line
      (goto-char (point-min))
      (if (re-search-forward "^#\\+DATE:" nil t)
          (progn
            ;; Replace the rest of the line without touching the kill ring
            (delete-region (point) (line-end-position))
            (insert (format " %s" (format-time-string "[%Y-%m-%d %a %H:%M]"))))
        (goto-char (point-min))
        (forward-line)
        (insert (format "#+DATE: %s\n" (format-time-string "[%Y-%m-%d %a %H:%M]")))))))

How to write a new blog post

Start a file named in the format YYYY-MM-DD-[short-name].org, write and save it, and then follow the steps in How to publish Org mode blog to HTML above. You can double-check the filename format in the create_blog_page.py script.

Add a line containing [random] at the end so that the navigation links get created.
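
As a quick way to check a new post's filename against the YYYY-MM-DD-[short-name].org format, here is a small sketch. The regular expression and the function name parse_post_filename are assumptions for illustration; the authoritative check is whatever create_blog_page.py actually does.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed pattern: four-digit year, two-digit month and day, then any short name.
POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.org$")


def parse_post_filename(name: str) -> Optional[Tuple[date, str]]:
    """Return (post date, short name) for a valid post filename, else None."""
    match = POST_NAME.match(name)
    if match is None:
        return None
    year, month, day, short_name = match.groups()
    try:
        posted = date(int(year), int(month), int(day))
    except ValueError:
        return None  # e.g. month 21: day and month were probably swapped
    return posted, short_name
```

For example, parse_post_filename("2025-05-21-image-sizes.org") gives the date 2025-05-21 and the short name "image-sizes", while a name with the day and month swapped (such as 2025-21-05-x.org) is rejected.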

Notes

These are some notes from when I was still developing the process with Org mode.

notes.org:

this one is special (all notes in one big file)
blow out all the headers under Areas as their own pages
slugify the headers into page names in public_html/notes/[slug].html
copy in images/etc
have a special viewer? that would be JS only
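
The slugify step in the list above could look roughly like this. The exact slug rules (lowercase, ASCII only, hyphen-separated) are an assumption, and both function names are hypothetical.

```python
import re
import unicodedata


def slugify(header: str) -> str:
    """Turn a header into a lowercase, hyphen-separated ASCII slug."""
    # Split accented characters into base letter + accent, then drop non-ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", header)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def notes_page_path(header: str) -> str:
    """Path of the page generated for one header under Areas."""
    return f"public_html/notes/{slugify(header)}.html"
```

With these rules, the header "Personal projects" would end up at public_html/notes/personal-projects.html. Two headers that slugify to the same string would collide, which this sketch does not handle.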

YYYY-MM-DD-slug.org:

these are posts, put into posts pages
[filename].org -> [filename].html

*.org:

these are "pages"
some can be linked from header
everything is in the root of the public_html
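
The three kinds of files described above (the big notes file, date-prefixed posts, and plain pages) can be told apart by name alone. A minimal sketch, where the post pattern and the function names classify and html_name are assumptions rather than part of the real scripts:

```python
import re
from pathlib import Path

# Assumed pattern for posts: a date prefix followed by a slug.
POST_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-.+\.org$")


def classify(org_file: str) -> str:
    """Return 'notes', 'post', 'page', or 'other' based on the filename."""
    name = Path(org_file).name
    if name == "notes.org":
        return "notes"  # the one big file that gets split into pages
    if POST_PATTERN.match(name):
        return "post"
    if name.endswith(".org"):
        return "page"
    return "other"


def html_name(org_file: str) -> str:
    """[filename].org -> [filename].html"""
    return Path(org_file).with_suffix(".html").name
```

The order of the checks matters: notes.org would otherwise be classified as an ordinary page.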

How different is this from the standard publish-to-HTML command? Not super different?

Here's the complex example from org docs, doesn't look so bad!

One file or many files

My concern about blowing out the main file into many smaller files is that the links will be broken from within Emacs unless we delete all of the smaller files again. Will we be messing with Emacs?

Now I need to think about how I want to explode the notes here.

Need to preserve emacs experience of linking and such (this is the core software to use the tool).
Need to pull in drafts pages into these notes.
Want a copy of these notes (with links!) to persist into the publish step.
Make the vis page work as well by pulling in JS into the page.

If we do explode the whole thing into separate files, we need to preserve the ability to search! Already this may be strained with blog pages.

Beyond this piece, think about how/when/where we need to integrate a database.
Maybe it makes sense just to have plain text files for everything.

Including html

#+HTML: <span class="inline-html">Inline HTML content</span>

or as a block (note that #+BEGIN_HTML does not exist, it's #+BEGIN_EXPORT html):

#+BEGIN_EXPORT html
<div class="example">
  <p>Your HTML content here</p>
</div>
#+END_EXPORT

Converting images

HEIC files don't display inline in Emacs or on the web. Use ImageMagick to convert them (brew install imagemagick):

magick IMG_1440.HEIC IMG_1440.JPG
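
To convert a whole folder of HEIC files at once, the same magick call can be generated per file. This is a sketch under assumptions: the function names are hypothetical, the .JPG suffix follows the example above, and it presumes the magick command from ImageMagick is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def heic_conversion_commands(folder: Path) -> List[List[str]]:
    """Build one `magick SRC DST` command per .heic file in `folder`."""
    commands = []
    for src in sorted(folder.iterdir()):
        if src.is_file() and src.suffix.lower() == ".heic":
            dst = src.with_suffix(".JPG")
            commands.append(["magick", str(src), str(dst)])
    return commands


def convert_all(folder: Path) -> None:
    """Run the conversions; the original HEIC files are left in place."""
    if shutil.which("magick") is None:
        raise RuntimeError("ImageMagick's magick command was not found on PATH")
    for command in heic_conversion_commands(folder):
        subprocess.run(command, check=True)
```

Building the command list separately from running it makes it easy to print the commands first as a dry run.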

Block quotes

Highlight the text then use C-c C-, q.

Include code

Block:

#+BEGIN_SRC [language]

#+END_SRC

Inline: wrap the text in equals signs, =like this=.