Tokenization for language modeling: BPE vs. Unigram Language Modeling (2020)