Detailed explanation of using BERT (hands-on)

BERT can essentially be regarded as a new word2vec. For an existing task, simply treat the output of BERT the way you would treat word2vec embeddings, and build your own model on top of it.

1. Download BERT

Of the pretrained models offered for download, the first four are English models, Multilingual is a multilingual model, and the last is a Chinese model (character level only). Uncased means that all letters are converted to lowercase, while Cased preserves case.

The BERT source code is available on GitHub, in the google-research/bert repository.

### To run the demo for this article, download the BERT-Base, Chinese model and put it in the root directory

2. Load BERT

The official source code includes a demo of how to use BERT. The demo is built on TPUEstimator, which makes debugging awkward. In fact, loading BERT is very simple.

Here is the code:

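import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)
from bert import modeling

# Config file of the downloaded BERT model
bert_config = modeling.BertConfig.from_json_file("chinese_L-12_H-768_A-12/bert_config.json")
# Create the BERT inputs: batch_size 64, max_seq_length 128
input_ids = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="input_ids")
input_mask = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="input_mask")
segment_ids = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="segment_ids")

# Create the BERT model
model = modeling.BertModel(
    config=bert_config,
    is_training=False,  # set to True when fine-tuning (enables dropout)
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids,
    use_one_hot_embeddings=False)  # True is faster on TPU, False is faster on CPU or GPU

# Checkpoint that holds the pretrained BERT parameters
init_checkpoint = "chinese_L-12_H-768_A-12/bert_model.ckpt"
use_tpu = False
# Collect all trainable variables in the graph
tvars = tf.trainable_variables()

# Load BERT: match graph variables to the variables stored in the checkpoint
(assignment_map, initialized_variable_names) = modeling.get_assignment_map_from_checkpoint(
    tvars, init_checkpoint)
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

# Log which variables are initialized from the checkpoint
tf.logging.info("**** Trainable Variables ****")
for var in tvars:
    init_string = ""
    if var.name in initialized_variable_names:
        init_string = ", *INIT_FROM_CKPT*"
    tf.logging.info("  name = %s, shape = %s%s", var.name, var.shape, init_string)

with tf.Session() as sess:
    # Running the initializers is what actually loads the checkpoint values
    sess.run(tf.global_variables_initializer())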

The code above is extracted from the official source code.

The following code can also load the model:
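To make the call to modeling.get_assignment_map_from_checkpoint less opaque, here is a rough pure-Python sketch of the idea behind it: graph variables are matched to checkpoint variables by name, after dropping the ":0" style suffix. This is a simplified illustration working on plain name lists, not the library code itself; `build_assignment_map` and the variable name `my_task/output_weights` are hypothetical.

```python
import re
from collections import OrderedDict

def build_assignment_map(graph_var_names, checkpoint_var_names):
    """Match graph variables to checkpoint variables by name (illustrative only)."""
    # Graph variable names carry an output suffix such as ":0"; checkpoint names do not.
    stripped = set()
    for name in graph_var_names:
        m = re.match(r"^(.*):\d+$", name)
        stripped.add(m.group(1) if m else name)

    assignment_map = OrderedDict()
    initialized = set()
    for name in checkpoint_var_names:
        if name not in stripped:
            continue  # checkpoint variable that the current graph does not build
        assignment_map[name] = name
        initialized.add(name)
        initialized.add(name + ":0")
    return assignment_map, initialized

# "my_task/output_weights" stands for a hypothetical layer added on top of BERT.
graph_vars = ["bert/embeddings/word_embeddings:0", "my_task/output_weights:0"]
ckpt_vars = ["bert/embeddings/word_embeddings", "cls/predictions/output_bias"]
amap, initialized = build_assignment_map(graph_vars, ckpt_vars)
print(dict(amap))
print("my_task/output_weights:0" in initialized)
```

Variables that are missing from the checkpoint, such as layers you add yourself, are left out of the map and keep their normal random initialization.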

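import tensorflow as tf  # TensorFlow 1.x API
from bert import modeling

pathname = "chinese_L-12_H-768_A-12/bert_model.ckpt"  # model checkpoint path
bert_config = modeling.BertConfig.from_json_file("chinese_L-12_H-768_A-12/bert_config.json")  # config file path
configsession = tf.ConfigProto()
configsession.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=configsession)
input_ids = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="input_ids")
input_mask = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="input_mask")
segment_ids = tf.placeholder(shape=[64, 128], dtype=tf.int32, name="segment_ids")

with sess.as_default():
    model = modeling.BertModel(
        config=bert_config,
        is_training=False,  # set to True when fine-tuning (enables dropout)
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=False)
    saver = tf.train.Saver()
    # Initialize first, then restore. Initializing after the restore would
    # overwrite the loaded BERT parameters. This differs from the first demo.
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, pathname)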


This is the usual way of loading a model in TensorFlow, so it should look familiar.

3. Use the model

Getting the output of the BERT model is very simple. There are two methods: model.get_sequence_output() and model.get_pooled_output().

output_layer = model.get_sequence_output()  # output for every token, shape [batch_size, seq_length, embedding_size]; use this for seq2seq or NER

output_layer = model.get_pooled_output()  # output for the whole sentence, shape [batch_size, embedding_size]
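The relation between the two outputs can be sketched in plain Python. In the BERT source, the pooled output is computed from the sequence output by taking the vector at the first position ([CLS]) and passing it through a dense layer with a tanh activation. The sketch below mimics that on nested lists with tiny, made-up dimensions; `pooled_from_sequence` is a hypothetical name, and the identity weights stand in for weights that are learned in the real model.

```python
import math

def pooled_from_sequence(sequence_output, weights, bias):
    """sequence_output: [batch][seq][hidden] -> pooled output: [batch][hidden]."""
    pooled = []
    for example in sequence_output:
        first = example[0]  # vector at the [CLS] position
        row = []
        for j in range(len(bias)):
            z = sum(first[i] * weights[i][j] for i in range(len(first))) + bias[j]
            row.append(math.tanh(z))
        pooled.append(row)
    return pooled

# Toy sizes: batch_size=2, seq_length=3, hidden size=2.
seq = [[[0.5, -1.0], [9.0, 9.0], [9.0, 9.0]],
       [[0.0, 2.0], [9.0, 9.0], [9.0, 9.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for the learned dense weights
pooled = pooled_from_sequence(seq, identity, [0.0, 0.0])
print(len(pooled), len(pooled[0]))
```

Only the first position of each example contributes, which is why the pooled output suits sentence-level tasks while token-level tasks need the full sequence output.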

So what does BERT's input look like? Consider the following code:

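def convert_single_example(max_seq_length, tokenizer, text_a, text_b=None):
  tokens_a = tokenizer.tokenize(text_a)
  tokens_b = None
  if text_b:
    tokens_b = tokenizer.tokenize(text_b)  # for Chinese this mainly splits the text into characters
  if tokens_b:
    # With a second sentence, the combined length must not exceed max_seq_length - 3,
    # because [CLS], [SEP], [SEP] are added.
    # _truncate_seq_pair is the helper defined in the official run_classifier.py.
    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
  else:
    # With a single sentence only [CLS] and [SEP] are added,
    # so the length must not exceed max_seq_length - 2.
    if len(tokens_a) > max_seq_length - 2:
      tokens_a = tokens_a[0:(max_seq_length - 2)]

  # Convert to BERT input. Note that type_ids below is called segment_ids in the source code.
  # (a) Two sentences:
  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
  # (b) One sentence:
  #  tokens:   [CLS] the dog is hairy . [SEP]
  #  type_ids: 0     0   0   0  0     0 0
  # "type_ids" tells the first sentence from the second one:
  # 0 for the first sentence, 1 for the second. Their embeddings are added to the token vectors.
  # This is not strictly necessary, since [SEP] already separates the sentences, but type_ids make learning easier.

  tokens = []
  segment_ids = []
  tokens.append("[CLS]")
  segment_ids.append(0)
  for token in tokens_a:
    tokens.append(token)
    segment_ids.append(0)
  tokens.append("[SEP]")
  segment_ids.append(0)
  if tokens_b:
    for token in tokens_b:
      tokens.append(token)
      segment_ids.append(1)
    tokens.append("[SEP]")
    segment_ids.append(1)
  input_ids = tokenizer.convert_tokens_to_ids(tokens)  # map tokens to ids
  # Create the mask: 1 for real tokens
  input_mask = [1] * len(input_ids)
  # Pad with 0 up to max_seq_length
  while len(input_ids) < max_seq_length:
    input_ids.append(0)
    input_mask.append(0)
    segment_ids.append(0)
  assert len(input_ids) == max_seq_length
  assert len(input_mask) == max_seq_length
  assert len(segment_ids) == max_seq_length
  return input_ids, input_mask, segment_ids  # these feed the input_ids, input_mask, segment_ids used when creating the BERT model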

The code above converts a single sample, and its comments explain each step in detail. The parameters are:

  • max_seq_length: the maximum length of each sample, that is, the maximum number of tokens.
  • tokenizer: a module provided in the BERT source code. Its main job is to split a sentence into tokens and map the tokens to ids.
  • text_a: sentence a.
  • text_b: sentence b (optional).
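The conversion code calls _truncate_seq_pair without showing it. As a minimal sketch of what that helper is expected to do, modeled on the helper of the same name in the official run_classifier.py, the version below trims the longer of the two token lists one token at a time until the pair fits. It is named `truncate_seq_pair` here to mark it as an illustration, and it modifies the lists in place.

```python
def truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Shorten the pair in place until len(tokens_a) + len(tokens_b) <= max_length."""
    while len(tokens_a) + len(tokens_b) > max_length:
        # Always trim the longer list, so a short sentence keeps more of its tokens.
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()

# With max_seq_length = 9 the budget is 9 - 3 = 6 tokens,
# leaving room for [CLS], [SEP], [SEP].
a = list("abcdefg")  # 7 toy tokens
b = list("xyz")      # 3 toy tokens
truncate_seq_pair(a, b, 9 - 3)
print(a, b)
```

Trimming from the end of the longer list keeps the start of both sentences, which is a simple heuristic rather than the only possible choice.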

4. Things to note

  • 1. The BERT model has a maximum length for the input sequence. For the Chinese model it is 512 tokens (max_position_embeddings in bert_config.json).
  • 2. When using model.get_sequence_output() to get the vector of each token, note that the first and last vectors belong to [CLS] and [SEP]. Keep this in mind when doing NER or seq2seq.
  • 3. The BERT model needs a lot of memory. If you run out of memory when running the demo for this article, try reducing batch_size and max_seq_length.
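For the second point, a minimal sketch of dropping the [CLS] and [SEP] vectors before a token-level task such as NER. It works on plain lists, assumes a single-sentence input laid out as [CLS] tokens [SEP] followed by padding, and uses input_mask to find where the real tokens end; `strip_special_tokens` is a hypothetical helper name.

```python
def strip_special_tokens(token_vectors, input_mask):
    """Drop the [CLS] vector, the final [SEP] vector and all padding positions.

    Assumes a single-sentence input: [CLS] tok_1 ... tok_n [SEP] pad pad ...
    """
    real_length = sum(input_mask)  # number of non-padding positions
    return token_vectors[1:real_length - 1]

# Toy data: 6 positions, of which 4 are real ([CLS], two tokens, [SEP]) and 2 are padding.
vectors = [[0.0], [0.1], [0.2], [0.3], [0.0], [0.0]]
mask = [1, 1, 1, 1, 0, 0]
print(strip_special_tokens(vectors, mask))
```

For sentence pairs there is an additional [SEP] in the middle, so this slice would not be enough on its own.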