Understanding Span Extraction Loss in BERT

Consider the sentence “The quick brown fox jumps over the lazy dog.” BERT might identify that “quick” and “brown” are part of a span that describes a quality of something (the fox), while “jumps” and “lazy” belong to different spans that describe actions or attributes.

To help BERT learn how to do this, we use a loss function called Span Extraction Loss. This loss function tells BERT whether it’s doing a good job of identifying the right spans in a given text. For example, if BERT predicts that “quick” and “jumps” belong to the same span when they actually belong to different ones, Span Extraction Loss will penalize BERT for that mistake.

Here’s an example of how this works in practice: let’s say we have the text “The cat sat on the mat.” We want to identify spans that describe actions (like “sat”) and objects (“cat” and “mat”). BERT represents each candidate span as a pair of token indices, so with the tokens numbered 0 through 6, it might output (2, 2) for the action “sat”, and (1, 1) and (5, 5) for the objects “cat” and “mat”.
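To make the span representation concrete, here is a minimal sketch. It uses plain whitespace tokenization rather than BERT’s actual WordPiece tokenizer, and the `gold_spans` labels are hypothetical annotations for this example, not output from a real model:

```python
# Minimal sketch: whitespace tokenization (NOT BERT's WordPiece tokenizer),
# with spans expressed as inclusive (start, end) token-index pairs.
text = "The cat sat on the mat."
tokens = text.replace(".", " .").split()
# tokens: ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']

# Hypothetical gold annotations for this sentence.
gold_spans = {
    "action": [(2, 2)],          # "sat"
    "object": [(1, 1), (5, 5)],  # "cat", "mat"
}

def span_text(span):
    """Recover the text covered by an inclusive token-index span."""
    start, end = span
    return " ".join(tokens[start:end + 1])

print(span_text(gold_spans["action"][0]))  # prints "sat"
```

Working in token indices rather than character offsets matches how BERT actually scores spans: the model assigns a score to each token position, not to each character.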

Now let’s say we have another text: “The cat chased the mouse.” We want BERT to identify spans for the action (“chased”) and the objects (“cat” and “mouse”). With the tokens numbered 0 through 5, BERT might output (2, 2) for “chased”, and (1, 1) and (4, 4) for “cat” and “mouse”.

In both cases, Span Extraction Loss compares BERT’s predictions to a set of ground-truth spans (provided by human annotators). In the standard formulation, the model produces one probability distribution over token positions for the start of the span and another for the end, and the loss is the cross-entropy of the gold start and end positions under those distributions. If BERT puts high probability on the correct positions, it gets rewarded with a low loss value. But if its probability mass lands far from the gold span, Span Extraction Loss penalizes it with a high loss value.
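The cross-entropy computation can be sketched in a few lines of plain Python. The toy logits below are made up for illustration; the averaging of start and end losses mirrors the common span-extraction setup, but this is a simplified sketch, not BERT’s actual implementation:

```python
import math

def cross_entropy(logits, gold_index):
    """Negative log-probability of the gold position under a softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold_index]

def span_extraction_loss(start_logits, end_logits, gold_start, gold_end):
    # Average the cross-entropy of the start position and the end position.
    return 0.5 * (cross_entropy(start_logits, gold_start)
                  + cross_entropy(end_logits, gold_end))

# Toy logits over 6 token positions; the gold span is (2, 2), i.e. "sat".
good = span_extraction_loss([0.1, 0.2, 4.0, 0.1, 0.0, 0.0],
                            [0.0, 0.1, 3.5, 0.2, 0.1, 0.0], 2, 2)
bad  = span_extraction_loss([4.0, 0.2, 0.1, 0.1, 0.0, 0.0],
                            [3.5, 0.1, 0.0, 0.2, 0.1, 0.0], 2, 2)
print(good < bad)  # confident-and-correct scores a lower loss -> True
```

The `good` prediction puts most of its probability mass on the gold positions and so incurs a small loss; the `bad` prediction is confident about the wrong positions and is penalized heavily.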

Over time, as BERT learns from these examples and tries to minimize its losses, it becomes better at identifying the right spans in new texts that it hasn’t seen before. And that’s how Span Extraction Loss helps BERT learn to locate the right spans in text.
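The “minimize its losses over time” part can be illustrated with a toy gradient-descent loop. This is not real BERT training, just plain gradient descent on one start-position distribution, using the well-known gradient of softmax cross-entropy (`softmax(logits) - one_hot(gold)`):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def ce(logits, gold):
    """Cross-entropy of the gold position under a softmax."""
    return -math.log(softmax(logits)[gold])

# Start from uninformative logits over 6 positions; the gold index is 2.
logits, gold, lr = [0.0] * 6, 2, 1.0
losses = [ce(logits, gold)]
for _ in range(20):
    p = softmax(logits)
    # Gradient of cross-entropy w.r.t. the logits: softmax(logits) - one_hot(gold)
    logits = [x - lr * (pi - (1.0 if i == gold else 0.0))
              for i, (x, pi) in enumerate(zip(logits, p))]
    losses.append(ce(logits, gold))

print(losses[-1] < losses[0])  # loss shrinks as training proceeds -> True
```

Each update shifts probability mass toward the gold position, so the loss keeps falling; real training does the same thing, except the gradients flow all the way back through BERT’s parameters.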
