Text Classification Pipeline¶

The TransformersSharp.TextClassificationPipeline class provides a high-level interface for performing text classification tasks using pre-trained models from the Hugging Face Transformers library. It simplifies the process of classifying text by handling tokenization, model inference, and decoding.

What is a Text Classification Pipeline?¶

A text classification pipeline is designed to classify input text into predefined categories. It is commonly used for tasks like:

Sentiment analysis
Topic classification
Spam detection

Key Features of Text Classification Pipelines:¶

Pre-trained Models: Leverages state-of-the-art models like BERT, DistilBERT, and others.
Batch Processing: Supports single and batch inputs for efficient processing.
Confidence Scores: Provides confidence scores for each classification label.

Using the TextClassificationPipeline Class¶

The TextClassificationPipeline class in TransformersSharp provides methods to classify text. Below are examples of how to use it.

Classifying a Single Input¶

using TransformersSharp.Pipelines;

var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");
var result = pipeline.Classify("I love programming!");
foreach (var (label, score) in result)
{
    Console.WriteLine($"Label: {label}, Score: {score}");
}

Equivalent Python Code:

from transformers import pipeline

pipeline = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = pipeline("I love programming!")
for item in result:
    print(f"Label: {item['label']}, Score: {item['score']}")

Classifying Batch Inputs¶

using TransformersSharp.Pipelines;

var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");
var inputs = new List<string> { "I love programming!", "I hate bugs!" };
var results = pipeline.ClassifyBatch(inputs);
foreach (var (label, score) in results)
{
    Console.WriteLine($"Label: {label}, Score: {score}");
}

Equivalent Python Code:

from transformers import pipeline

pipeline = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
inputs = ["I love programming!", "I hate bugs!"]
results = pipeline(inputs)
for item in results:
    print(f"Label: {item['label']}, Score: {item['score']}")

Accessing the Tokenizer¶

The TextClassificationPipeline class provides access to the associated tokenizer through the Tokenizer property. This allows users to preprocess inputs or decode outputs manually if needed.

Example: Accessing the Tokenizer¶

using TransformersSharp.Pipelines;

var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");
var tokenizer = pipeline.Tokenizer;
var inputIds = tokenizer.Tokenize("I love programming!");
Console.WriteLine(string.Join(", ", inputIds.ToArray()));

Equivalent Python Code:

from transformers import pipeline

pipeline = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
tokenizer = pipeline.tokenizer
input_ids = tokenizer("I love programming!", return_tensors="pt")["input_ids"]
print(input_ids.tolist())