Text Classification Pipeline
The `TransformersSharp.TextClassificationPipeline` class provides a high-level interface for text classification with pre-trained models from the Hugging Face Transformers library. It handles tokenization, model inference, and post-processing of the model's outputs into labels and scores.
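Here is a concrete picture of the post-processing step. For a single-label classifier such as the SST-2 DistilBERT checkpoint used below, the Python Transformers pipeline by default applies a softmax to the model's logits and maps each class index to a name through the model config's `id2label` table. The sketch mimics that step in plain Python. The logit values, the `id2label` mapping, and the `softmax`/`postprocess` helpers are illustrative assumptions, not output from or code of a real model or library.

```python
import math

def softmax(logits):
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Turn one input's raw logits into {label, score} dicts, highest score first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"label": id2label[i], "score": probs[i]} for i in ranked]

# Hypothetical logits and label table for a two-class sentiment model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
for item in postprocess([-2.0, 3.0], id2label):
    print(f"Label: {item['label']}, Score: {item['score']:.4f}")
```

Because softmax scores sum to 1, the top score reads as the model's confidence in its chosen label relative to the alternatives.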
What is a Text Classification Pipeline?
A text classification pipeline is designed to classify input text into predefined categories. It is commonly used for tasks like:
- Sentiment analysis
- Topic classification
- Spam detection
Key Features of Text Classification Pipelines
- Pre-trained Models: Works with pre-trained models from the Hugging Face Hub, such as BERT and DistilBERT checkpoints fine-tuned for classification.
- Batch Processing: Accepts either a single string or a batch of strings, so many inputs can be classified in one call.
- Confidence Scores: Returns a confidence score alongside each predicted label.
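One common way to use the confidence score is to accept a prediction only when the top score clears a threshold and otherwise treat the input as uncertain. The sketch below operates on results in the list-of-dicts shape returned by the Python pipeline. The `confident_label` helper, its `threshold` parameter, and the sample scores are hypothetical placeholders, not real model output.

```python
def confident_label(result, threshold=0.8):
    """Return the highest-scoring label, or None if its score is below threshold."""
    best = max(result, key=lambda item: item["score"])
    return best["label"] if best["score"] >= threshold else None

# Placeholder result in the Python pipeline's shape (made-up scores, not model output).
sample = [{"label": "POSITIVE", "score": 0.62}, {"label": "NEGATIVE", "score": 0.38}]
print(confident_label(sample))                 # None: 0.62 is below the default 0.8
print(confident_label(sample, threshold=0.5))  # POSITIVE
```

The right threshold depends on the model and on the cost of a wrong label, so it is best chosen on held-out data rather than fixed in advance.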
Using the TextClassificationPipeline Class
The `TextClassificationPipeline` class in TransformersSharp provides methods to classify text. The following examples show how to use it.
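Classifying a Single Input
```csharp
using System;
using TransformersSharp.Pipelines;

// Load a DistilBERT checkpoint fine-tuned on SST-2 for sentiment analysis.
var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");
var result = pipeline.Classify("I love programming!");

// Print each predicted label with its confidence score.
foreach (var (label, score) in result)
{
    Console.WriteLine($"Label: {label}, Score: {score}");
}
```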
Equivalent Python Code:
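```python
from transformers import pipeline

# Use a distinct name so the imported pipeline() factory is not shadowed.
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love programming!")

# A single string returns a list of {"label", "score"} dicts.
for item in result:
    print(f"Label: {item['label']}, Score: {item['score']}")
```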
Classifying Batch Inputs
```csharp
using System;
using System.Collections.Generic;
using TransformersSharp.Pipelines;

var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");

// Classify several inputs in a single call.
var inputs = new List<string> { "I love programming!", "I hate bugs!" };
var results = pipeline.ClassifyBatch(inputs);

foreach (var (label, score) in results)
{
    Console.WriteLine($"Label: {label}, Score: {score}");
}
```
Equivalent Python Code:
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
inputs = ["I love programming!", "I hate bugs!"]

# Passing a list classifies every input; results follow the input order.
results = classifier(inputs)
for item in results:
    print(f"Label: {item['label']}, Score: {item['score']}")
```
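When the Python pipeline is called, it also accepts a `batch_size` argument (for example, `classifier(inputs, batch_size=2)`). This argument controls how many inputs are sent through the model together. Conceptually, batching splits the input list into consecutive groups while keeping the results in input order. The `chunked` helper below is a hypothetical plain-Python illustration of that grouping, not the pipeline's internal implementation.

```python
from typing import Iterator, List, Sequence

def chunked(inputs: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size inputs, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(inputs), batch_size):
        yield list(inputs[start:start + batch_size])

texts = ["I love programming!", "I hate bugs!", "Tests pass.", "Build failed."]
for batch in chunked(texts, batch_size=3):
    print(batch)  # first a group of three inputs, then the remaining one
```

Larger batches usually improve throughput on a GPU but use more memory. The best size depends on the hardware and input lengths.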
Accessing the Tokenizer
The `TextClassificationPipeline` class exposes its associated tokenizer through the `Tokenizer` property. You can use it to preprocess inputs or decode outputs manually when needed.
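To make "preprocessing" concrete, the toy sketch below shows the general shape of what a BERT-style tokenizer produces. Text is lowercased (as an "uncased" model expects) and split into tokens, wrapped in `[CLS]`/`[SEP]` markers, and mapped to integer IDs, with unknown tokens falling back to `[UNK]`. Everything here is an assumption for illustration. The vocabulary and its IDs are made up, and the real DistilBERT tokenizer uses a WordPiece vocabulary of roughly 30k subword entries rather than whole-word lookup.

```python
import re

# Toy vocabulary with made-up IDs (not the real DistilBERT vocabulary).
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "i": 3, "love": 4, "programming": 5, "!": 6}

def toy_tokenize(text):
    # Lowercase, then split into runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def encode(text, vocab=VOCAB):
    """Wrap tokens in [CLS]/[SEP] and map them to IDs, using [UNK] for unknown words."""
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

def decode(ids, vocab=VOCAB):
    """Map IDs back to token strings."""
    inverse = {i: tok for tok, i in vocab.items()}
    return [inverse[i] for i in ids]

ids = encode("I love programming!")
print(ids)          # [1, 3, 4, 5, 6, 2]
print(decode(ids))  # ['[CLS]', 'i', 'love', 'programming', '!', '[SEP]']
```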
Example: Accessing the Tokenizer
```csharp
using System;
using TransformersSharp.Pipelines;

var pipeline = TextClassificationPipeline.FromModel("distilbert-base-uncased-finetuned-sst-2-english");

// Retrieve the pipeline's tokenizer and convert text to input IDs.
var tokenizer = pipeline.Tokenizer;
var inputIds = tokenizer.Tokenize("I love programming!");
Console.WriteLine(string.Join(", ", inputIds.ToArray()));
```
Equivalent Python Code: