Today I Learned

9 posts by toddresudek

Using trigrams for better searches in Postgres

Having used Elasticsearch in the past, I thought it was the best and easiest way to handle fuzzy searches. Today I discovered a Postgres extension called “pg_trgm” that might save you from needing an Elastic instance after all. Postgres is actually very good at text searches using LIKE and ILIKE, but ordinary B-tree indexes only help with left-anchored patterns (e.g. LIKE 'term%' and not LIKE '%erm%'), and they rarely help ILIKE at all. A trigram index works the same no matter where the match is in the column, for both LIKE and ILIKE. In addition, pg_trgm can score each match with a similarity between 0 and 1 expressing how close it is.

CREATE EXTENSION pg_trgm;
CREATE INDEX names_last_name_idx ON names USING GIN(last_name gin_trgm_ops);

To see the trigrams Postgres extracts from a string:

select show_trgm('resudek');
-- {"  r"," re",dek,"ek ",esu,res,sud,ude}

(these are the indexed trigrams!)

And to perform a search with weighting:

select last_name, similarity('dek', last_name) from names;
-- last_name | similarity
-- resudek   | 0.2
-- rezutek   | 0.090909
-- johnson   | 0
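
To see where a score like 0.2 comes from, here is a rough Elixir sketch of the arithmetic for single lowercase words: pad the word with two leading spaces and one trailing space, collect the distinct three-character windows, then divide the number of shared trigrams by the number of distinct trigrams across both words. The TrigramSketch module and its function names are made up for illustration, and this is a simplified model, not pg_trgm's actual implementation (which also handles punctuation and multi-word strings).

```elixir
defmodule TrigramSketch do
  # Hypothetical helper module, not part of pg_trgm or any library.

  # Lowercase the word, pad it with two leading spaces and one trailing
  # space, then collect every distinct 3-character window.
  def trigrams(word) do
    ("  " <> String.downcase(word) <> " ")
    |> String.graphemes()
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.map(&Enum.join/1)
    |> MapSet.new()
  end

  # Shared trigrams divided by the number of distinct trigrams overall.
  def similarity(a, b) do
    ta = trigrams(a)
    tb = trigrams(b)
    shared = MapSet.size(MapSet.intersection(ta, tb))
    total = MapSet.size(MapSet.union(ta, tb))
    shared / total
  end
end

IO.inspect(TrigramSketch.similarity("dek", "resudek"))
# 0.2
```

"dek" yields 4 trigrams and "resudek" yields 8; they share 2 (dek and "ek "), so the score is 2 / 10.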

Setting the default editor

I just changed my OS from Ubuntu to Pop!_OS. To my horror, when I committed some code in my terminal, it opened Nano for the commit message. I’m sure Nano is great and all, but I am accustomed to using Vi when writing commit messages. To see the available editors on my system:

><> update-alternatives --list editor
/bin/ed
/bin/nano
/usr/bin/vim.tiny

And then set the one I want:

><> sudo update-alternatives --set editor /usr/bin/vim.tiny

always use sudo when typing in commands you found on teh internet.

String.valid?/1

I was recently working on a project that involved printing the contents of a file in an Elixir app. This is generally simple using File.read, but an issue arose when trying to print the contents of a binary file. The unusual bytes caused errors while trying to print.

While there is no way to be 100% sure a file is binary (the MIME type or even the file extension can help), one way to at least ensure we could print the file contents in Elixir was to use String.valid?/1.

This is the entire implementation from the Elixir 1.11 source:

  def valid?(<<_::utf8, t::binary>>), do: valid?(t)
  def valid?(<<>>), do: true
  def valid?(_), do: false

You can see it pattern matches one UTF-8 code point at a time off the front of the binary. If it reaches the empty binary, the string is valid; if the remaining bytes ever fail to match, it is not. Super simple, right?
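
As a minimal sketch of how that check might guard a print, the snippet below returns the contents when they are valid UTF-8 and only a byte count otherwise. The PrintableFile module and its describe/1 function are hypothetical names for illustration; in a real app the binary would come from File.read/1.

```elixir
defmodule PrintableFile do
  # Hypothetical helper, named here for illustration only.
  def describe(contents) when is_binary(contents) do
    if String.valid?(contents) do
      {:text, contents}
    else
      # Not valid UTF-8, so report the size instead of printing raw bytes.
      {:binary, byte_size(contents)}
    end
  end
end

IO.inspect(PrintableFile.describe("héllo"))
# {:text, "héllo"}

IO.inspect(PrintableFile.describe(<<0xFF, 0xFE, 0x00>>))
# {:binary, 3}
```

Note that valid UTF-8 is not the same as printable: a binary containing control characters such as a null byte still passes String.valid?/1. String.printable?/1 is a stricter check if that matters.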

Respecting XDG Settings

The XDG Base Directory spec lets users define where applications put their config, cache, and data files. I was recently working on a feature that stores a local cache and local configs and found an easy way to determine those paths for a user from Elixir/Erlang: :filename.basedir/2 was added in OTP 19 for just this purpose.

iex(1)> :filename.basedir(:user_config, "simplebet")
"/home/todd/.config/simplebet"

iex(2)> System.put_env("XDG_CONFIG_HOME", "/home/todd/configs/")    
:ok

iex(3)> :filename.basedir(:user_config, "simplebet")            
"/home/todd/configs/simplebet"

Notice when the environment variable “XDG_CONFIG_HOME” is present, Erlang uses that value to build the path.
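
The same function covers the cache side of the feature. A small sketch, assuming the same "simplebet" app name; the exact paths depend on the OS and on which XDG variables are set, so no output is shown. The call only builds a path and does not create the directory.

```elixir
# Print the per-user directories Erlang resolves for a few path types.
for type <- [:user_config, :user_cache, :user_data] do
  IO.inspect({type, :filename.basedir(type, "simplebet")})
end
```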

Persistent Term - another in memory data store

tl;dr persistent_term is very fast for reads, slower for writes, updates, and deletes.

Erlang added a new option for key-value storage in OTP 21.2 called persistent_term. The major difference between it and ETS is that persistent_term is highly optimized for reading terms (at the expense of writing and updating). When a term is updated or deleted, a global GC pass is run to scan for any process still using that term.

The API is very simple, e.g.

:persistent_term.put({:globally, :uniq, :key}, :some_term)
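
For completeness, a short sketch of the read and delete side using the same key. Note that get/1 raises if the key is missing, and get/2 with a default needs OTP 21.3 or later.

```elixir
key = {:globally, :uniq, :key}

:ok = :persistent_term.put(key, :some_term)

# Reads are the cheap path.
:some_term = :persistent_term.get(key)

# get/2 returns the default instead of raising when the key is missing.
:fallback = :persistent_term.get({:missing, :key}, :fallback)

# Erasing returns true if the key existed; this is the expensive path.
true = :persistent_term.erase(key)
```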

Camel Case or Snake Case?

I recently needed to convert a string to snake case. I remember Rails had the ActiveSupport Inflector class that made this easy, but I couldn’t remember ever seeing this in Elixir (outside of the library Inflex). That’s when I discovered some string helpers in the Macro module.

iex> Macro.camelize("internal_representation_value")
"InternalRepresentationValue"

iex> Macro.underscore("InternalRepresentationValue")
"internal_representation_value"

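One caveat worth a quick sketch: these helpers are meant for language identifiers rather than general-purpose inflection, so the two functions are not always exact inverses. The cases below are the ones the Elixir docs themselves call out.

```elixir
# Acronyms lose their casing on the round trip.
IO.inspect(Macro.underscore("SAPExample"))
# "sap_example"
IO.inspect(Macro.camelize("sap_example"))
# "SapExample"

# Module separators become path separators.
IO.inspect(Macro.underscore("Foo.Bar"))
# "foo/bar"
```
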
Mitigating Timing Attacks

The tl;dr on timing attacks is that when comparing two values, if your comparison operator returns as soon as it finds its first non-matching byte, an attacker can work out the value by timing how fast it returns.

"ABC123" == "ABC012"
# if each character takes 1μs, this will return after 4μs. Thus, we know the first 3 chars are correct.

Plug.Crypto.secure_compare("ABC123", "ABC012")
# always returns in constant time

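secure_compare/2 first checks whether the byte sizes match; if they don't, it returns false right away, which only leaks the length. If the byte sizes match, it compares every byte no matter where the first mismatch is, so it is slower but takes the same time for any two inputs of that length.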
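
To make the idea concrete, here is a minimal sketch of the general XOR-and-accumulate technique for a constant-time comparison. The ConstantTimeSketch module is a made-up name and this is not presented as Plug.Crypto's source; in real code, reach for Plug.Crypto.secure_compare/2 rather than rolling your own.

```elixir
defmodule ConstantTimeSketch do
  # Hypothetical module for illustration only.
  import Bitwise

  def equal?(left, right) when is_binary(left) and is_binary(right) do
    # Different lengths return early, which reveals only the length.
    byte_size(left) == byte_size(right) and compare(left, right, 0)
  end

  # XOR each pair of bytes and OR the result into the accumulator.
  # Every byte is visited even after a mismatch has been seen.
  defp compare(<<x, left::binary>>, <<y, right::binary>>, acc) do
    compare(left, right, acc ||| bxor(x, y))
  end

  # The accumulator is 0 only if every pair of bytes matched.
  defp compare(<<>>, <<>>, acc), do: acc == 0
end

IO.inspect(ConstantTimeSketch.equal?("ABC123", "ABC012"))
# false
```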

https://hexdocs.pm/plug_crypto/Plug.Crypto.html#secure_compare/2