Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models