Hi,
the BertForSequenceClassification class seems to have a linear activation at the end of the head.
See here: https://github.com/huggingface/transformers/blob/3323146e904a1092def4e8527de9d2a7479c1c14/src/transformers/modeling_bert.py#L1351
IMO, for binary classification it should have a sigmoid function at the end, and for multi-class (one-of-N) classification there should be a softmax at the end. Why is that not the case?
Thanks
Philip
Hi @PhilipMay
This is not a bug. At L1351 the pooled output is passed through the classification head (a linear layer) to get the logits.
`CrossEntropyLoss` does not require a softmax; it applies log-softmax internally, so it computes the loss directly from the logits.
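Here is a minimal PyTorch sketch of that point (the tensors are made-up values, just for illustration):

```python
import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()

# Raw logits straight from a linear layer, shape (batch_size, num_labels)
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])
labels = torch.tensor([0, 1])

# No softmax needed: CrossEntropyLoss applies log-softmax internally
loss = loss_fct(logits, labels)
print(loss.item())
```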
`BertForSequenceClassification` returns the logits, so you can then apply softmax on the returned logits to get the class scores.
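For example (a rough sketch; the exact return type of `forward` has changed across transformers versions, so treat the indexing of `outputs` as an assumption):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

inputs = tokenizer("This library is great!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs[0]  # logits come first when no labels are passed
probs = torch.softmax(logits, dim=-1)  # class scores summing to 1
print(probs)
```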