Case Study — Sistem Monitoring Operasional Cerdas

Background

Developing chatbots for specialized domains is often hindered by data scarcity and informal linguistic variation. In such low-resource domains, both factors significantly degrade the relevance and quality of model responses.

In the Indonesian context, this data scarcity is intensified by vast linguistic diversity, spanning formal and informal registers as well as numerous regional dialects. Capturing these semantic nuances is difficult for traditional systems, which frequently fail to recognize user intent when phrasing deviates from standard linguistic patterns.

The Low-Resource Challenge

🔍

Data Scarcity

Specialized institutional domains, such as academic services at FILKOM UB, lack the large, diverse datasets needed to train generative models from scratch.

🗣️

Morphological Complexity

Indonesian dialectal variations and informal structures mean the same intent can be expressed in vastly different syntactic forms.

⚠️

Overfitting Risks

Standard fine-tuning involves updating all model parameters, which is computationally expensive and prone to overfitting in low-data regimes.

⚡

Architectural Gap

Standard Transformer decoders struggle to maintain generation quality and context alignment in zero-shot or few-shot scenarios unless they receive explicit guidance.

The Proposed Solution

I designed a Parameter-Efficient Fine-Tuning (PEFT) strategy built on a novel prefix-tuned encoder-decoder architecture. It pairs a semantic model (the encoder) with a generative model (the decoder), so the system can be adapted to the target domain without full retraining.

Semantic Encoder (IndoSBERT-Large) → Semantic Vector → MLP Adapter (projects the vector to a continuous prefix of 3 prefix tokens) → Generative Decoder (GPT-2 Medium)

Input: [Prefix Tokens] + Target Context
Output: Contextually Generated Answer

Semantic Encoder (IndoSBERT-Large)

A pretrained encoder maps structurally diverse user inputs to a single fixed-dimensional semantic vector, so that dialectal variations sharing the same intent land close together.

MLP Adapter Mapping

A trainable Multi-Layer Perceptron (MLP) adapter projects the semantic vector into continuous prefix vectors that serve as soft prompts.
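The projection step can be sketched in plain Python to make the shapes concrete. This is a minimal illustration, not the project's implementation: the toy dimensions, the single hidden layer, the tanh activation, and the names `make_linear`, `linear`, and `mlp_adapter` are all assumptions introduced here. Only the count of three prefix tokens comes from the case study.

```python
import math
import random

def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(x, layer):
    """Apply a dense layer to a single input vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp_adapter(semantic_vec, hidden_layer, output_layer, num_prefix, dec_dim):
    """Project one sentence vector into `num_prefix` decoder-sized prefix vectors."""
    hidden = [math.tanh(h) for h in linear(semantic_vec, hidden_layer)]  # assumed activation
    flat = linear(hidden, output_layer)  # length: num_prefix * dec_dim
    return [flat[i * dec_dim:(i + 1) * dec_dim] for i in range(num_prefix)]

# Toy sizes (assumptions); the real encoder and decoder are far wider.
ENC_DIM, HIDDEN_DIM, DEC_DIM, NUM_PREFIX = 8, 16, 6, 3

rng = random.Random(0)
hidden_layer = make_linear(ENC_DIM, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, NUM_PREFIX * DEC_DIM, rng)

# Stand-in for the vector a sentence encoder would return.
semantic_vec = [rng.gauss(0.0, 1.0) for _ in range(ENC_DIM)]
prefix = mlp_adapter(semantic_vec, hidden_layer, output_layer, NUM_PREFIX, DEC_DIM)
print(len(prefix), len(prefix[0]))  # 3 6
```

The key design point is that the output layer produces `NUM_PREFIX * DEC_DIM` values, which are then reshaped into one vector per prefix position, each matching the decoder's embedding width.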

Generative Decoder (GPT-2 Medium)

These continuous semantic prefixes are prepended to the input of the frozen GPT-2 Medium decoder, conditioning its auto-regressive text generation on the semantic intent.
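A small sketch of how such a decoder input might be assembled is shown here. It assumes the prefix vectors are concatenated in front of the target token embeddings and that prefix positions are excluded from the loss; the helper `build_decoder_inputs` and the toy values are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, and whether the original training code uses it is an assumption.

```python
IGNORE_INDEX = -100  # assumption: label value that the loss function skips

def build_decoder_inputs(prefix_vectors, token_embeddings, token_ids):
    """Prepend prefix vectors to token embeddings and mask prefix positions in the labels."""
    inputs = prefix_vectors + token_embeddings
    labels = [IGNORE_INDEX] * len(prefix_vectors) + list(token_ids)
    attention_mask = [1] * len(inputs)  # every position, prefix included, is attended to
    return inputs, labels, attention_mask

# Toy example: 3 prefix vectors and 4 target tokens, each embedding of width 2.
prefix_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
token_ids = [17, 4, 92, 8]
token_embeddings = [[float(t), 0.0] for t in token_ids]  # stand-in for an embedding lookup

inputs, labels, attention_mask = build_decoder_inputs(
    prefix_vectors, token_embeddings, token_ids
)
print(len(inputs))  # 7
print(labels)       # [-100, -100, -100, 17, 4, 92, 8]
```

Because the prefixes live in embedding space rather than in the vocabulary, they have no token ids of their own, which is why the labels at those positions are masked rather than predicted.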

Semantic Encoding & Clustering

The core advantage of IndoSBERT-Large in this architecture is its ability to cluster semantically equivalent queries, whether phrased formally or in a regional dialect, into tight, distinct groups. Token-by-token processing struggles here because informal variants share few surface tokens with their formal counterparts.

Traditional Approach

"Who is the dean of my faculty?" and "Who's lead my faculty?" map to distinct vectors (semantic gap).

Proposed Approach (IndoSBERT)

"Who is the dean of my faculty?" and "Who's lead my faculty?" map into a shared semantic vector space.
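Closeness in a shared vector space is commonly measured with cosine similarity. The sketch below shows the computation on invented 4-dimensional vectors; the numbers are hypothetical and were not produced by IndoSBERT or any other model. They only illustrate that two phrasings of one intent should score near 1, while an unrelated query scores much lower.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
formal_query = [0.90, 0.10, 0.30, 0.05]     # "Who is the dean of my faculty?"
informal_query = [0.85, 0.15, 0.35, 0.00]   # "Who's lead my faculty?"
unrelated_query = [0.05, 0.90, 0.10, 0.40]  # a query with a different intent

same_intent = cosine_similarity(formal_query, informal_query)
different_intent = cosine_similarity(formal_query, unrelated_query)
print(round(same_intent, 3), round(different_intent, 3))
```

With a real sentence encoder, the same function would be applied to the encoder's output vectors instead of hand-written lists.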

Ablation Study & Hyperparameter Optimization

Rigorous hyperparameter tuning and ablation studies were conducted to determine the best configuration. The results indicate that both explicit semantic guidance from a pretrained encoder and a prefix length of three tokens matter: removing the encoder, training it from scratch, or increasing the prefix length to five tokens each lowered BLEU.

Model A: Full Benchmark

IndoSBERT-Large + GPT-2 Medium (3 Prefix Tokens).

0.782

BLEU Score

Model B: Semantic Ablation

Encoder removed. Prefixes are randomly initialized trainable vectors.

0.501

BLEU Score

Model C: Pretraining Ablation

Encoder architecture retained but trained entirely from scratch, without pretrained weights.

0.583

BLEU Score

Grid Search Highlights

Model Encoder	Model Decoder	Prefix Tokens	BLEU	chrF
IndoSBERT-Large	GPT2-Medium (Indo)	3	0.782	89.672
IndoSBERT-Large	GPT2-Large (Indo)	3	0.757	86.971
IndoSBERT-Large	GPT2-Medium (Indo)	5	0.724	85.872
Transformer (Baseline)	Transformer	0	0.653	77.654
IndoBERT-Large	GPT2-Medium (Indo)	0	0.591	70.941

Results Achieved

Evaluated against a standard baseline Transformer model trained from scratch, the proposed prefix-tuned architecture improved on both metrics while updating only a small fraction of its parameters:

0.653 → 0.782

BLEU score improvement (~20% relative) over the baseline Transformer

77.65 → 89.67

chrF score improvement (~15% relative), reflecting better handling of morphological variation

0.85%

Only 5.9 million of ~695 million parameters are updated
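The headline figures can be checked with a few lines of arithmetic. The snippet uses the BLEU and chrF values from the grid search table and the parameter counts stated above; the helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0

bleu_gain = relative_gain(0.653, 0.782)      # baseline Transformer vs. proposed model
chrf_gain = relative_gain(77.654, 89.672)
trainable_share = 5.9e6 / 695e6 * 100.0      # updated parameters as a share of the total

print(f"BLEU: +{bleu_gain:.1f}%  chrF: +{chrf_gain:.1f}%  trainable: {trainable_share:.2f}%")
# BLEU: +19.8%  chrF: +15.5%  trainable: 0.85%
```

Note that these are relative gains; in absolute terms BLEU rises by 0.129 and chrF by about 12 points.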

Want to dive deeper into the research?

The full academic paper is available, detailing the ablation studies, cosine similarity embeddings, and hyperparameter tuning phases.
