Your configs suck? Try a real programming language. | beepb00p
Or yet another rant about YAML
In this post, I’ll try to explain why I find most config formats frustrating to use and
In this section, I’m mostly referring to JSON/YAML/TOML/ini files, which are the most common config formats I encounter.
I’ll refer to such configs as plain configs. Not sure if there is a better name for it, please let me know!
An incomplete list of my frustrations:
bits of configs can’t be reused
For example, while YAML, in theory, supports reusing/including bits of the config (they call it anchors),
Usually, you just don’t have any means of reusing parts of your config and have to copy-paste.
can’t contain any logic
This is considered a positive by many, but I would argue that when you can’t define temporary variables, helper functions, substitute strings or concatenate lists, it’s a bit fucked up.
The workarounds (if present) are usually pretty horrible and impose cognitive overhead. Programming language constructs are reinvented from scratch:
GitHub Actions uses a custom syntax for that
In addition, they’ve got their own set of functions to manipulate the variables.
scoping
E.g. there are several custom scopes for the env directive in GitHub Actions.
can’t be validated
You can validate the config syntax itself (e.g. check JSON for correctness), but you can’t do semantic checks.
This is kind of a consequence of not having logic in the config files.
Very few programs bother with that and usually, your program crashes because of something that would be trivial to catch with any simple type system.
YAML simply stands out with its implicit conversions and portability issues
There are enough rants about it, so I’ll just leave a link to a good one: “YAML: probably not so great after all”.
Summary: we spend time learning useless syntax, instead of productive work.
So what happens when people encounter these problems?
you write a program that ‘evaluates’ the config
Often, you end up reimplementing an interpreter for a simple functional language in the process.
you write a program to validate the config
For the most part, it’s boilerplate for type checking. Not only are you working on a solved problem, you also end up with mediocre error messages.
All this stuff is unpleasant and distracts you from your main objective.
Perhaps you can see where I’m going with this.
The idea is to write your config in your target programming language.
Then, you simply import/evaluate your config file and voilà – you’re done. That’s it.
Toy example:
config.py
Using the config:
I find it pretty neat.
includes: trivial, use imports
You can even import the very package you’re configuring.
logic
You have your language’s syntax and libraries available to use.
Of course, one could go crazy and make it incomprehensible.
validation
You can keep validation logic right in the config, so it would be checked at the time of loading.
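As a rough sketch of the three points above (reuse, logic, validation) in one config file: the `Backup` type, the paths and the `validate` helper are all made up for illustration, not taken from any real tool.

```python
from pathlib import Path
from typing import NamedTuple


class Backup(NamedTuple):
    # Hypothetical config entry: what to back up and where to put it.
    source: Path
    target: Path
    keep_days: int


# Logic: a shared variable and a helper replace copy-pasted path prefixes.
BACKUP_ROOT = Path('/mnt/backups')


def backup(name: str, keep_days: int = 30) -> Backup:
    return Backup(
        source=Path('/home/user') / name,
        target=BACKUP_ROOT / name,
        keep_days=keep_days,
    )


BACKUPS = [
    backup('documents'),
    backup('photos', keep_days=365),
]


def validate(backups: list) -> None:
    # Semantic checks that a plain config format cannot express.
    targets = [b.target for b in backups]
    if len(set(targets)) != len(targets):
        raise ValueError('two backups share the same target directory')
    for b in backups:
        if b.keep_days <= 0:
            raise ValueError(f'{b.source}: keep_days must be positive')


# Runs whenever the config is imported or exec()'d, i.e. at load time.
validate(BACKUPS)
```

A mistake such as a duplicated target then fails when the config is loaded, with a message you chose, rather than somewhere deep inside the program.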
Are there any problems with that approach? Sure:
interoperability
Okay, maybe if your program is in Python it makes sense. But what if it isn’t, or you later rewrite it in another language (e.g. a compiled one, like C++)?
If you’ll be running your software somewhere without an interpreter, then sure, good point.
In case of Python specifically, it’s present in most modern OS distributions. So you might get away with the following:
in the main() function, build the config, convert to JSON and dump to the stdout
This step is possible with no boilerplate due to Python’s dynamic nature.
Yep, you will still have to manually deserialize the config in the C++ code. But I think that’s at least not worse than only using JSON and editing it manually.
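A minimal sketch of the Python side of that idea, reusing the toy `Person` config from earlier. The `to_json` helper and the top-level `people` key are my own choices for illustration; the consuming program would run the script and parse its stdout.

```python
import json
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    age: int


PEOPLE = [
    Person('Ann', 22),
    Person('Roger', 15),
    Person('Judy', 49),
]


def to_json() -> str:
    # _asdict() keeps the field names; dumping the tuples directly
    # would serialize them as anonymous JSON arrays.
    return json.dumps({'people': [p._asdict() for p in PEOPLE]})


def main() -> None:
    # The consumer (e.g. a C++ program) runs this script and reads stdout.
    print(to_json())


if __name__ == '__main__':
    main()
```

The config stays editable as Python, while the other language only ever sees plain JSON.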
general-purpose programming languages are harder to reason about
This is somewhat subjective. Personally, I’d be more likely overwhelmed by an overly verbose plain config. I’d always prefer a neat and compact DSL.
A large factor here is code style: I’m sure you can make your config file readable in almost any programming language,
The biggest issues are probably security and termination checking.
security
I.e. if your config executes arbitrary code, then it may steal your passwords, format your hard drive, etc.
If your configs are supplied by third parties you don’t trust, then I agree that plain configs are safer.
In addition, this is something that can potentially be solved by sandboxing. Whether it’s worth the effort depends on the nature of your project, but for something like a CI executor you need it anyway.
Also, note that using a plain config format doesn’t necessarily save you from trouble. See “YAML: insecure by default”.
termination checking
Even if you don’t care about security, you don’t want your config to hang the program.
Personally, I’ve never run into such issues, but here are some potential workarounds for that:
using a subset of the language might help, for example, Bazel
Does anyone know examples of conservative static analysis tools that check for termination in general-purpose languages?
note that using a plain config doesn’t mean it won’t loop infinitely
your config can take a very long time to evaluate, while technically taking finite time to complete
See “Why Dhall advertises the absence of Turing-completeness”
While an Ackermann function is a contrived example,
Some reasons I find Python specifically enjoyable for writing config files:
However, you can achieve a similarly pleasant experience in most modern programming languages (provided they are dynamic enough).
Some projects that allow for using code as configuration:
setuptools, the standard way of installing Python packages
Allows using both setup.cfg and setup.py files. That way if you can’t achieve something solely with plain config, you can fix this in setup.py.
Jupyter, interactive computing tool
Uses a python file to configure the export.
Emacs: famously uses Elisp for its configuration
While I’m not a fan of Elisp at all, it does make Emacs very flexible and it’s possible to achieve any configuration you want.
On the other hand, if you’ve ever read other people’s Emacs setups, you can see it also demonstrates how things can get out of hand when you allow
Bazel uses a subset of Python for describing build rules
While it’s deliberately restricted to ensure termination checking and determinism, configuring Bazel is orders of magnitude more pleasant than any other build system I’ve used.
Nix: language designed specifically for the Nix package manager
While a completely new language feels like overkill, it’s still nicer to work with than plain configs.
Dhall: language designed specifically for config files
Dhall advertises itself as “JSON + functions + types + imports”. And indeed, it looks great, and solves most of the issues I listed.
One downside is that it’s not widespread yet. If you don’t have bindings for your target language, you’d end up parsing JSON again.
But again, if your program is written in JavaScript and doesn’t interact with other languages, why don’t you just make the config JavaScript?
Some ways I’ve found to minimize the frustration while using plain configs:
write as little in config files as possible
This typically applies to CI pipeline configs (e.g. GitLab/CircleCI/GitHub Actions) or Dockerfiles.
Often such configs are bloated with shell commands, which makes it impossible to run locally without copying line by line.
prefer helper shell scripts and call them from your pipeline
It is a bit frustrating since it introduces indirection and scatters code around.
Sometimes you can get away if your pipeline is short, so use your own judgment.
Let the CI only handle setting up a VM/container for you, caching the dependencies, and publishing artifacts.
generate instead of writing manually
The downside is that the generated config may diverge if edited manually.
You can add a warning comment saying that the config is autogenerated, with a link to the generator, and make the config file read-only to discourage manual editing.
In addition, if you’re running CI, you can make the consistency check a part of the pipeline itself.
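The "generate, then check in CI" idea could look roughly like the sketch below. Everything in it is hypothetical: the job names, the `./ci/run-tests` script, the `generate_ci.py` file name and the JSON layout do not correspond to any particular CI system.

```python
import json
from pathlib import Path

HEADER_KEY = '_comment'  # JSON has no comments, so the warning lives in a key


def render() -> str:
    # Hypothetical pipeline description: one test job per Python version.
    jobs = {
        f'test-py{version}': {
            'image': f'python:{version}',
            'script': ['./ci/run-tests'],
        }
        for version in ('3.10', '3.11', '3.12')
    }
    config = {
        HEADER_KEY: 'AUTOGENERATED by generate_ci.py, do not edit',
        'jobs': jobs,
    }
    return json.dumps(config, indent=2, sort_keys=True) + '\n'


def write(path: Path) -> None:
    path.write_text(render())


def is_up_to_date(path: Path) -> bool:
    # The consistency check: the committed file must match the generator.
    return path.exists() and path.read_text() == render()
```

A pipeline step can then call `is_up_to_date` on the committed file and fail the build if someone edited it by hand or forgot to regenerate it.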
(commandline) flags are great for configuration
Overall, I agree, but there are still cases when using flags isn’t feasible.
It’s also prone to leaking secrets (keys/tokens/passwords) – both in your shell history and via ps.
Xmonad: config is the executable
Interesting approach, but not always feasible, e.g. you might not have the compiler installed.
A followup question, which I don’t have an answer for: why is it that way?
Open to all feedback, and feel free to share your config pain and how you are solving it!
suggest that using a real programming language (i.e. a general-purpose one, like Python) is often a feasible and more pleasant alternative for writing configs.
¶1 Most modern config formats suck
some software like Github Actions doesn’t support it
.gitconfig
uses a custom syntax for merging the configs
Have fun learning a new language you never wanted to!
for loop: build matrices and ‘excludes’ always give me a headache
if statement: e.g. when in CircleCI
Typically you’ll have to write a supplementary program to check your configs and remember to call it before passing to a program.
¶2 Workarounds
Often they end up using a ‘real’ (i.e. general purpose, Turing complete) programming language anyway:
¶3 Use a real programming language
I’ll have Python in mind here, but the same idea can be applied to any dynamic enough language (e.g. JavaScript/Ruby/etc).
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

PEOPLE = [
    Person('Ann'  , 22),
    Person('Roger', 15),
    Person('Judy' , 49),
]
from pathlib import Path
config = {}
exec(Path('config.py').read_text(), config)
people = config['PEOPLE']
print(people)
[Person(name='Ann', age=22), Person(name='Roger', age=15), Person(name='Judy', age=49)]
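As an alternative to calling exec() by hand, the standard library’s `runpy.run_path` also executes a file and returns its globals as a dict. This is only a sketch of that variation, not the approach used above; the `load_config` helper is a name I made up, and the config is written to a temporary directory so the example is self-contained.

```python
import runpy
import tempfile
from pathlib import Path

CONFIG_SOURCE = '''
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

PEOPLE = [
    Person('Ann', 22),
    Person('Roger', 15),
]
'''


def load_config(path: Path) -> dict:
    # run_path executes the file and returns its resulting globals.
    return runpy.run_path(str(path))


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'config.py'
    config_path.write_text(CONFIG_SOURCE)
    people = load_config(config_path)['PEOPLE']

print([p.name for p in people])
```

Either way, the loaded values are ordinary Python objects, so the rest of the program does not need to know how the config was produced.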
Let’s see how it helps us with the problems I described:
So you can define a DSL for configuration, which will be imported and used in the config file.
For example, something like pathlib alone can save you massive amounts of config duplication.
But personally I’d rather accept the potential for abuse than be restricted.
Mature static analysis tools (e.g. Flow/ESLint for JS, pylint/mypy for Python) can be used to aid you.
¶Downsides
Modern FFI is tedious and linking against your config is going to be pretty tricky.
popen()
), read the raw JSON and process
even for people not familiar with the language at all.
However, often it’s not the case, and the user controls their own config.
that means that if you truly care about malicious inputs, you want to sandbox anyway.
¶Why Python?
¶Who else does it?
a general purpose language for configuration.
However, at least it makes writing configs pleasant.
¶4 What if you don’t have a choice?
And yeah, there are ways to debug, but they have a pretty slow feedback loop.
But, as an upside, you can lint (e.g. shellcheck) your pipeline scripts, and make it easier to run locally.
¶5 Extra links
¶6 —
I’m sure Ansible/CircleCI or GitHub Actions are developed by talented engineers who have considered the pros and cons of using YAML.
Do the pros really outweigh the cons?