Direct Nash Optimization: Teaching language models to self-improve