This project customizes BERT so that the output of a specific intermediate layer of pre-trained BERT can be used for a target task. The number of hidden layers is a user-specified parameter. If it is larger than 12, layers 1 through 12 are initialized with the pre-trained BERT weights, while layers 13 and above are randomly initialized.
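The initialization rule can be illustrated with a small, dependency-free sketch. It is not this project's actual code: the function name `plan_layer_init` and the constant `PRETRAINED_LAYERS` are hypothetical, and the sketch assumes that when the parameter is 12 or smaller, every requested layer takes its weights from pre-trained BERT.

```python
# Hypothetical sketch of the layer initialization rule, not the project's code.
PRETRAINED_LAYERS = 12  # number of layers available in pre-trained BERT (from the description)


def plan_layer_init(num_hidden_layers, pretrained_layers=PRETRAINED_LAYERS):
    """Return a list of (layer_number, source) pairs, with 1-based layer numbers.

    source is "pretrained" for layers that can be copied from pre-trained BERT
    and "random" for layers beyond what the pre-trained model provides.
    """
    if num_hidden_layers < 1:
        raise ValueError("num_hidden_layers must be at least 1")
    plan = []
    for layer in range(1, num_hidden_layers + 1):
        # Assumption: layers up to the pre-trained depth reuse pre-trained weights.
        source = "pretrained" if layer <= pretrained_layers else "random"
        plan.append((layer, source))
    return plan


if __name__ == "__main__":
    # With 14 hidden layers, layers 13 and 14 are randomly initialized.
    for layer, source in plan_layer_init(14):
        print(f"layer {layer:2d}: {source}")
```

Under these assumptions, a value such as 6 yields a model whose final output is the output of the 6th pre-trained layer, while a value such as 14 stacks two randomly initialized layers on top of the 12 pre-trained ones.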