Building on the shoulders of giants: transfer learning in NLP

Transfer learning lets new data science work build on existing successful models. It is well established in computer vision, where open source deep learning models trained on ImageNet are commonly used as a base for training more specialised models. Applying the same methodology in NLP (Natural Language Processing) means next generation machine comprehension models do not have to start from a blank canvas.

Open source machine comprehension models are now a viable, widely used alternative to building new projects from scratch.

Transfer learning usually follows one of three paths: 1) re-train all weights of the existing model architecture, 2) freeze some layers and train the others, or 3) freeze the entire base model and train only a new task-specific layer on top. The easiest starting point is usually the last option, particularly when your training data set is small. The other two approaches typically come into play during later-stage model testing and tuning.
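The three paths can be sketched framework-agnostically. This is a minimal illustration, not any library's real API: the layer names are invented, and in practice the `trainable` flag would map to `requires_grad` on parameters in PyTorch or `layer.trainable` in Keras.

```python
def set_trainable(layers, strategy):
    """Return a {layer_name: trainable} map for one of the three strategies."""
    names = list(layers)
    if strategy == "retrain_all":        # 1) re-train every weight
        frozen = set()
    elif strategy == "freeze_lower":     # 2) freeze the early layers, tune the rest
        frozen = set(names[: len(names) // 2])
    elif strategy == "freeze_all":       # 3) freeze the whole base model
        frozen = set(names)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return {name: name not in frozen for name in names}

# Illustrative layer names only; a real encoder has many more.
base_layers = ["embeddings", "encoder_1", "encoder_2", "encoder_3"]
plan = set_trainable(base_layers, "freeze_all")
# With the base fully frozen, only a newly added task head would be trained.
```

Under the third strategy the base model acts as a fixed feature extractor, which is why it works well with small labelled data sets: there are far fewer weights left to fit.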

BERT (Bidirectional Encoder Representations from Transformers) is a popular model for machine comprehension. Developed by Google AI, it also has several specialised and refined variants (e.g. RoBERTa, a robustly optimised retraining of BERT). BERT was pre-trained on a large amount of unlabelled text, including Wikipedia and BookCorpus (over 3 billion words in total). A quick web search will turn up guides to applying BERT to numerous NLP tasks such as spam detection or chatbots.
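One reason BERT handles arbitrary text is its WordPiece tokenizer, which splits unknown words into known sub-word pieces. The following is a toy greedy longest-match sketch of the idea; the tiny vocabulary is invented for illustration, whereas real BERT ships a vocabulary of roughly 30,000 entries.

```python
# Toy vocabulary; "##" marks a piece that continues a word, as in WordPiece.
VOCAB = {"play", "##ing", "##ed", "the", "un", "##known", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedily split a word into the longest matching sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the "##" prefix
            if sub in vocab:
                match = sub
                break
            end -= 1  # no match: try a shorter substring
        if match is None:
            return ["[UNK]"]  # nothing in the vocab covers this word
        pieces.append(match)
        start = end
    return pieces

wordpiece("playing")  # -> ['play', '##ing']
```

Because every word either decomposes into known pieces or falls back to `[UNK]`, the model never meets a token outside its fixed vocabulary.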

Beyond BERT there are many more open source NLP base models. Hugging Face is particularly active in providing access to models, libraries and APIs. As they state on their site, “solving NLP, one commit at a time”.

OpenAI's GPT-3 (Generative Pre-trained Transformer 3) is also a popular starting point. It has 175 billion parameters. The sheer size of such models naturally limits the ability of NLP practitioners to import and build on them directly, so APIs are the natural interface.

Choosing an approach to an NLP challenge depends on many factors, including, but not limited to: the language involved, time periods, technical constraints, the size of the data sets and the specifics of the use case. For many of those use cases, however, building on the shoulders of giant NLP models is a sound strategy.