The study focuses on pre-trained language models for text-to-code generation. The researchers explore how sequence tokens can be adapted and represented differently depending on the modality they belong to, natural language or code, and experiment with separating the embedding spaces of the two modalities during further pre-training. The approach yields consistent improvements in text-to-code generation across two models and two test sets.
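
To make the idea of modality-separated embedding spaces concrete, here is a minimal sketch (not the authors' implementation) in which the natural-language and code segments of a sequence draw their token embeddings from two different tables while the rest of the model stays shared. The class name, sizes, and the `code_mask` convention are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ModalitySeparatedEmbedding(nn.Module):
    """Look up token embeddings from a text table or a code table,
    chosen per position by a boolean modality mask."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)  # natural-language tokens
        self.code_embed = nn.Embedding(vocab_size, d_model)  # code tokens

    def forward(self, token_ids: torch.Tensor, code_mask: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) ids from a shared vocabulary
        # code_mask: (batch, seq_len) True where the token belongs to the code segment
        text_vecs = self.text_embed(token_ids)
        code_vecs = self.code_embed(token_ids)
        return torch.where(code_mask.unsqueeze(-1), code_vecs, text_vecs)


if __name__ == "__main__":
    emb = ModalitySeparatedEmbedding(vocab_size=32_000, d_model=64)
    ids = torch.randint(0, 32_000, (2, 10))
    # Assume the first 6 positions are the natural-language prompt, the rest are code.
    mask = torch.zeros(2, 10, dtype=torch.bool)
    mask[:, 6:] = True
    print(emb(ids, mask).shape)  # torch.Size([2, 10, 64])
```

The point of the sketch is only that the same vocabulary id can map to a different vector depending on which modality it appears in; how the separation is initialised and trained during further pre-training is described in the paper itself.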

 

Publication date: 8 Feb 2024
Project Page: https://github.com/huawei-noah/noah-research/tree/master/NLP/text2code_mrpt
Paper: https://arxiv.org/pdf/2402.05783