Case Study · NLP & AI Research

From 0.653 to 0.782 BLEU with a Prefix-Tuned Encoder-Decoder

Addressing data scarcity in low-resource chatbot domains with Parameter-Efficient Fine-Tuning (PEFT).

2024 Dataset: 4,489 Q&A Pairs
0.85%
Parameters Updated (5.9M)
+20%
BLEU Score Increase
89.67
chrF Score Achieved
3
Optimal Prefix Tokens

Background

Chatbot development for specialized, low-resource domains is often hindered by scarce training data and informal linguistic variation, both of which degrade the relevance and quality of model responses.

In the Indonesian context, this scarcity is compounded by wide linguistic diversity: formal and informal registers alongside numerous regional dialects. Traditional systems struggle to capture these nuances and frequently fail to recognize user intent when phrasing deviates from standard patterns.


The Low-Resource Challenge

🔍

Data Scarcity

Specialized institutional domains, such as academic services at FILKOM UB, lack the large, diverse datasets needed to train generative models from scratch.

🗣️

Morphological Complexity

Indonesian dialectal variations and informal structures mean the same intent can be expressed in very different morphological and syntactic forms.

⚠️

Overfitting Risks

Standard fine-tuning involves updating all model parameters, which is computationally expensive and prone to overfitting in low-data regimes.

Architectural Gap

Standard Transformer decoders struggle to maintain generation quality and context alignment in zero-shot or few-shot scenarios unless they receive explicit guidance.


The Proposed Solution

I designed a Parameter-Efficient Fine-Tuning (PEFT) strategy built on a novel prefix-tuned encoder-decoder architecture. It couples a semantic model (the encoder) with a generative model (the decoder), so the system can be adapted to a new domain without retraining either model in full.

Architecture: Semantic Encoder (IndoSBERT-Large) → Semantic Vector → MLP Adapter (projects the vector to a continuous prefix) → 3 Prefix Tokens → Generative Decoder (GPT-2 Medium)

Decoder input: [Prefix Tokens] + Target Context
Decoder output: Contextually Generated Answer

1. Semantic Encoder (IndoSBERT-Large)

A pretrained encoder maps structurally diverse user inputs to a fixed-dimensional semantic vector, so that dialectal variations sharing the same intent land close together.

2. MLP Adapter Mapping

A Multi-Layer Perceptron (MLP) adapter projects the semantic vector into continuous, learned prefix vectors that serve as soft prompts for the decoder.

3. Generative Decoder (GPT-2 Medium)

These continuous prefixes are prepended to the input embeddings of the frozen GPT-2 Medium decoder, conditioning its auto-regressive generation on the semantic intent of the query.
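
The three steps above can be sketched as a data flow. This is a minimal illustration, not the project's implementation: the two-layer adapter, the tanh activation, the random weights, and the toy dimensions (8 instead of the real models' hidden sizes) are all assumptions made so the example runs with the standard library alone. Random vectors stand in for the IndoSBERT-Large output and for the GPT-2 token embeddings.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_prefix(semantic_vec, layer1, layer2, n_prefix, d_model):
    """Project one sentence vector to n_prefix continuous prefix embeddings."""
    hidden = [math.tanh(h) for h in linear(semantic_vec, *layer1)]
    flat = linear(hidden, *layer2)  # length: n_prefix * d_model
    # Split the flat output into n_prefix vectors of width d_model.
    return [flat[i * d_model:(i + 1) * d_model] for i in range(n_prefix)]


# Toy sizes (assumed) so the sketch runs instantly.
D_ENC, D_HID, D_MODEL, N_PREFIX = 8, 16, 8, 3
rng = random.Random(0)
layer1 = init_layer(D_ENC, D_HID, rng)
layer2 = init_layer(D_HID, N_PREFIX * D_MODEL, rng)

# Stand-ins for the encoder output and the decoder's token embeddings.
semantic_vec = [rng.gauss(0, 1) for _ in range(D_ENC)]
token_embeds = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(5)]

prefix = make_prefix(semantic_vec, layer1, layer2, N_PREFIX, D_MODEL)
decoder_input = prefix + token_embeds  # prefixes first, then the text tokens

print(len(prefix), len(decoder_input), len(decoder_input[0]))  # 3 8 8
```

In this sketch only `layer1` and `layer2` would receive gradient updates during training; the encoder and decoder stand-ins are treated as fixed, which mirrors the small trainable share reported for the project.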


Semantic Encoding & Clustering

The core advantage of IndoSBERT-Large in this architecture is its ability to place semantically identical queries (for example, a formal query and its regional-dialect equivalent) in the same tight cluster. Token-level matching tends to fail here, because an informal phrasing can share few surface tokens with the formal one.

Traditional Approach

Who is the dean of my faculty?
Who's lead my faculty?
Semantic gap: Maps to distinct vectors

Proposed Approach (IndoSBERT)

Who is the dean of my faculty?
Who's lead my faculty?
Shared Semantic Vector Space
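
One common way to quantify "shared semantic vector space" is cosine similarity between sentence embeddings. The sketch below uses hand-written 4-dimensional vectors as hypothetical stand-ins for real embeddings; the values are invented for illustration and are not outputs of IndoSBERT-Large.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (illustrative values only).
formal = [0.90, 0.10, 0.30, 0.00]     # "Who is the dean of my faculty?"
informal = [0.80, 0.20, 0.35, 0.05]   # "Who's lead my faculty?"
unrelated = [0.00, 0.90, 0.10, 0.70]  # a query with a different intent

sim_same_intent = cosine_similarity(formal, informal)
sim_other_intent = cosine_similarity(formal, unrelated)

print(f"same intent:  {sim_same_intent:.3f}")
print(f"other intent: {sim_other_intent:.3f}")
```

With a well-behaved sentence encoder, the first score should be close to 1 and clearly higher than the second, which is the property the downstream adapter relies on.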

Ablation Study & Hyperparameter Optimization

Hyperparameter tuning and ablation studies were used to find the best configuration. In these experiments, removing the pretrained semantic encoder or changing the prefix length lowered BLEU, with the largest drop (0.782 to 0.501) occurring when the encoder was removed.

Model A: Full Benchmark

IndoSBERT-Large + GPT-2 Medium (3 Prefix Tokens).

0.782
BLEU Score

Model B: Semantic Ablation

Encoder removed. Prefixes are randomly initialized trainable vectors.

0.501
BLEU Score

Model C: Pretraining Ablation

Encoder architecture maintained but trained entirely from scratch.

0.583
BLEU Score

Grid Search Highlights

Encoder | Decoder | Prefix Tokens | BLEU | chrF
IndoSBERT-Large | GPT2-Medium (Indo) | 3 | 0.782 | 89.672
IndoSBERT-Large | GPT2-Large (Indo) | 3 | 0.757 | 86.971
IndoSBERT-Large | GPT2-Medium (Indo) | 5 | 0.724 | 85.872
Transformer (Baseline) | Transformer | 0 | 0.653 | 77.654
IndoBERT-Large | GPT2-Medium (Indo) | 0 | 0.591 | 70.941
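
The rows above can be ranked programmatically. The sketch below copies the five reported configurations into a small record type (`Run` is a hypothetical name, not part of the project) and selects the best one by BLEU. It also checks whether chrF produces the same ordering.

```python
from typing import NamedTuple


class Run(NamedTuple):
    encoder: str
    decoder: str
    prefix_tokens: int
    bleu: float
    chrf: float


# Values copied from the grid search highlights.
runs = [
    Run("IndoSBERT-Large", "GPT2-Medium (Indo)", 3, 0.782, 89.672),
    Run("IndoSBERT-Large", "GPT2-Large (Indo)", 3, 0.757, 86.971),
    Run("IndoSBERT-Large", "GPT2-Medium (Indo)", 5, 0.724, 85.872),
    Run("Transformer (Baseline)", "Transformer", 0, 0.653, 77.654),
    Run("IndoBERT-Large", "GPT2-Medium (Indo)", 0, 0.591, 70.941),
]

best = max(runs, key=lambda r: r.bleu)
by_bleu = sorted(runs, key=lambda r: r.bleu, reverse=True)
by_chrf = sorted(runs, key=lambda r: r.chrf, reverse=True)

print(best.encoder, best.decoder, best.prefix_tokens)
print("metrics agree on ranking:", by_bleu == by_chrf)
```

For these five rows the two metrics rank the configurations identically, so the choice of the 3-token IndoSBERT-Large + GPT2-Medium setup does not depend on which metric is used.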

Results Achieved

Compared with a baseline Transformer trained from scratch, the proposed prefix-tuned architecture improved both metrics while updating under 1% of the parameters:

0.653 → 0.782
BLEU Score Improvement (~20%) over standard baseline
77.65 → 89.67
chrF Score Improvement, reflecting better handling of morphological complexity
0.85%
Only 5.9 million out of ~695 million parameters updated
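
The headline percentages follow directly from the reported figures. The short check below recomputes them; `relative_gain` is a helper defined here for illustration, and the parameter total uses the approximate 695 million stated above.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


bleu_gain = relative_gain(0.653, 0.782)   # baseline vs. proposed BLEU
chrf_gain = relative_gain(77.65, 89.67)   # baseline vs. proposed chrF
trainable_share = 5.9e6 / 695e6 * 100.0   # updated / total parameters

print(f"BLEU gain:       {bleu_gain:.1f}%")
print(f"chrF gain:       {chrf_gain:.1f}%")
print(f"Trainable share: {trainable_share:.2f}%")
```

The BLEU gain comes out just under 20%, the chrF gain is roughly 15.5%, and the trainable share rounds to 0.85%.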

Want to dive deeper into the research?

The full academic paper is available and details the ablation studies, the cosine similarity analysis of the embeddings, and the hyperparameter tuning phases.

Read the Full Paper →