Education Data Science

Classifying Agency Speech in South African Truth and Reconciliation Commission Hearing Transcripts Using a Hybrid BiLSTM-Transformer Model with Multi-Head Attention

Project Year
2025
Abstract

This paper uses a hybrid BiLSTM-transformer model with syntactic, polarity, and contextual feature extraction to classify witness utterances as expressing low, mixed, high, or no agency in hearings held during the 1996-97 South African Truth and Reconciliation Commission (SA TRC). The SA TRC addressed gross violations of human rights committed under apartheid, seeking to establish a historical account of those violations and to provide redress and healing for affected victims. Using a small dataset of 281 labeled witness utterances, the hybrid model achieves an F1 score of 0.7228 across all labels and a prediction accuracy of 0.78-0.84 on the agency-relevant labels (low, mixed, high). The model outperforms a baseline fine-tuned RoBERTa model, especially on the agency-relevant labels. The paper also improves model interpretability through attention visualization, contributing to psycholinguistics and affective science, fields in which emotional constructions can be deconstructed into their linguistic features. Finally, the paper demonstrates a novel use of NLP to draw insights from individual-level testimony, an analysis that was not feasible in its historical context because of resource and technology constraints.
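The abstract reports a single F1 score across all four labels alongside per-label results, but it does not state how the all-label F1 is averaged. The sketch below is an illustration only, not the paper's evaluation code. It computes one-vs-rest F1 for each label, then both a macro average (every label counts equally) and a support-weighted average (labels count by frequency). The label strings and the toy predictions are hypothetical.

```python
from collections import Counter

# Hypothetical label names, assumed from the four classes in the abstract.
LABELS = ["low", "mixed", "high", "not"]


def per_label_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, scoring each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Mean of per-label F1 scores weighted by each label's true count."""
    scores = per_label_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    total = sum(support[label] for label in labels)
    return sum(scores[label] * support[label] for label in labels) / total


# Toy example (hypothetical data, not drawn from the SA TRC dataset).
y_true = ["low", "high", "mixed", "not", "high", "low"]
y_pred = ["low", "high", "low", "not", "mixed", "low"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6167
```

On a small, imbalanced dataset such as 281 utterances, the macro and weighted averages can diverge noticeably (here 0.6167 versus about 0.6556), so stating which average is reported makes the all-label score easier to compare with the RoBERTa baseline.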

EDS Students

Tracy Li
Class: 2025
Areas of interest: Education in emergencies (EiE), humanitarian response, international education policy, early childhood education, low-tech innovations, teacher training