
SpeAKN: Speak for ALS with Korean NLP

2023 — AI-Powered Communication System for ALS Patients
Hankuk University of Foreign Studies • HUFS AI Project
Figure 1: 'Look-to-Speak' Application Interface
Assistive Technology Research
Korean NLP · Eye-Tracking · Speech Recognition · Transformer Models · GRU Networks · Assistive Technology

Project Overview

SpeAKN (Speak for ALS with Korean NLP) is an innovative AI-powered communication system designed specifically for ALS patients who have lost their ability to speak. The system combines eye-tracking technology with advanced Korean natural language processing to provide contextually appropriate response suggestions, enabling meaningful communication for patients with progressive motor function decline.

Figure 2: Overall model architecture. Two AI models were used: the first processes voice data, and the second takes the sentence output of the first as input to generate the final response sentences.

Problem Statement

ALS (Amyotrophic Lateral Sclerosis) is a neurodegenerative disease in which motor nerve cells in the brain and spinal cord progressively deteriorate. Over time, ALS patients lose their ability to communicate using natural language, with 80-95% of patients requiring alternative communication methods known as AAC (Augmentative and Alternative Communication).

Existing solutions like "Look to Speak" require users to make numerous selections before reaching their intended response, and users cannot pre-input their desired responses. Our AI-TRACKING system addresses these limitations by combining eye-tracking technology with artificial intelligence to provide contextually relevant response options.

Technical Architecture

System Pipeline

The SpeAKN system operates through a two-stage AI pipeline followed by an eye-tracking selection step (a minimal sketch of the chaining follows the list):

  1. Speech-to-Text (Wav2Text): Converts incoming voice questions to text using advanced speech recognition
  2. Context-Aware Response Generation (Text2Text): Analyzes the textual context and generates appropriate response options
  3. Eye-Tracking Selection: Patients select their intended response using eye movements
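
As a minimal sketch of how the two stages could be chained, assuming hypothetical `transcribe` and `generate` interfaces around the trained Wav2Text and Text2Text models (the project's actual API is not shown in this write-up):

```python
from typing import List

def suggest_responses(audio,
                      wav2text_model,
                      text2text_model,
                      num_options: int = 4) -> List[str]:
    """Two-stage pipeline: spoken question -> question text -> candidate answers.

    `wav2text_model.transcribe` and `text2text_model.generate` are assumed,
    illustrative interfaces, not the project's actual API.
    """
    # Stage 1 (Wav2Text): transcribe the incoming spoken question.
    question_text = wav2text_model.transcribe(audio)

    # Stage 2 (Text2Text): generate contextually appropriate answer candidates.
    candidates = text2text_model.generate(question_text, num_options=num_options)

    # The candidates are then shown on screen for eye-tracking selection.
    return candidates
```
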
Figure 6: Overall architecture of the SpeAKN model, composed of the Wav2Text and Text2Text models, each built on transformer encoder and self-attention layers. Unlike existing models, a GRU layer was added to focus on learning Korean word order.
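
As a rough PyTorch sketch of the layer stack described in Figure 6 (layer sizes and counts are illustrative placeholders, not the values used in SpeAKN), a transformer encoder with self-attention is followed by a GRU layer intended to capture Korean word order:

```python
import torch
import torch.nn as nn

class Text2TextEncoder(nn.Module):
    """Illustrative encoder: transformer self-attention followed by a GRU.

    Layer sizes are placeholders, not the values used in SpeAKN.
    """
    def __init__(self, vocab_size: int, d_model: int = 256, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # GRU layer added on top of the transformer output to model word order.
        self.gru = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)          # (batch, seq, d_model)
        x = self.transformer(x)            # self-attention over the sequence
        output, _ = self.gru(x)            # recurrent pass for ordering cues
        return output
```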

Data and Training

Dataset

This project utilized datasets from the National Institute of Korean Language.

Model Optimization

We experimentally compared optimization approaches, including the AdamW and Sophia optimizers:

Figure 3: X-axis represents training epochs, Y-axis represents training time. Shows that the AdamW Optimizer's training time stabilizes during the learning process.

The AdamW Optimizer demonstrated superior performance with faster learning progression and better stabilization during training compared to the Sophia Optimizer.
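
For reference, a minimal, self-contained sketch of an AdamW training step in PyTorch; the model and hyperparameters are toy placeholders, not the SpeAKN configuration:

```python
import torch
import torch.nn as nn

# Toy stand-in for the SpeAKN model; the real architecture is omitted here.
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# One illustrative training step on random data.
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()
```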

Architecture Innovations

Our SpeAKN model employs several key innovations, most notably the addition of a GRU layer (chosen over an LSTM, as shown in Figure 4) to better learn Korean word order:

Figure 4: (Top) LSTM layer, (Bottom) GRU layer. Moving rightward shows the learned hidden-state parameters at each step. Compared to the LSTM layer, the red parameters (indicating vanishing gradients) in the GRU layer have disappeared by H5.
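
Beyond the gradient behavior shown in Figure 4, one concrete difference is that a GRU has three gates to the LSTM's four, so it carries fewer parameters for the same hidden size; a quick PyTorch check (sizes are illustrative):

```python
import torch.nn as nn

hidden = 256
lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
gru = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(f"LSTM parameters: {count_params(lstm):,}")  # weights for 4 gates
print(f"GRU parameters:  {count_params(gru):,}")   # weights for 3 gates
```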

Audio Processing Optimization

To handle varying audio lengths efficiently, we analyzed the distribution of audio data lengths:

Figure 5: Histogram of audio data lengths. The maximum length is 220,000 samples while the average is about 25,000, roughly ten times smaller. Setting the audio data length to 100,000 samples helps avoid the curse of dimensionality.

We set the standardized audio length to 100,000 samples to balance computational efficiency with data preservation, avoiding the curse of dimensionality while maintaining essential information.
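
A minimal sketch of this length standardization, assuming raw waveforms arrive as 1-D NumPy arrays; the 100,000-sample target follows from the distribution in Figure 5:

```python
import numpy as np

TARGET_LEN = 100_000  # standardized length chosen from the length distribution

def standardize_length(waveform: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Truncate long clips and zero-pad short ones to a fixed length."""
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))

# Example: a clip shorter than the target gets zero-padded on the right.
clip = np.random.randn(25_000)
print(standardize_length(clip).shape)  # (100000,)
```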

Results and Performance

Initial Performance Metrics

Model Component | Train MSE | Validation MSE | Input Type     | Output Quality
Speech2Text     | 10.039    | 12.569         | Voice/Question | Moderate accuracy
Text2Text       | 11.234    | 13.788         | Text/Answer    | Requires improvement

Attention Visualization

To verify model learning effectiveness, we visualized attention patterns:

Figure 7: Visualizing Attention - Analysis showed the model needed improvement in focusing on important linguistic features.
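
A minimal sketch of the kind of heat map used for this check, with a random placeholder attention matrix standing in for the weights that would come from the model's self-attention layers:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder tokens and attention weights; real values come from the trained model.
input_tokens = ["아픈", "곳", "은", "없", "어요", "?"]
output_tokens = ["네", "괜찮", "아요"]
attention = np.random.rand(len(output_tokens), len(input_tokens))
attention /= attention.sum(axis=1, keepdims=True)  # normalize each output step

fig, ax = plt.subplots()
ax.imshow(attention, cmap="viridis")
ax.set_xticks(range(len(input_tokens)))
ax.set_xticklabels(input_tokens)
ax.set_yticks(range(len(output_tokens)))
ax.set_yticklabels(output_tokens)
ax.set_xlabel("Input (question) tokens")
ax.set_ylabel("Output (answer) tokens")
plt.show()
```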

Data Quality Analysis

Our analysis revealed important insights about the training data:

Figure 8: Pie chart showing the data actually used for training - Only 14.3% of the original dataset consisted of meaningful question-answer pairs.
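
A hedged sketch of how such a filter might look, assuming each record carries hypothetical `question` and `answer` fields (the actual dataset schema is not documented here):

```python
def is_meaningful_pair(record: dict) -> bool:
    """Keep only records with a non-empty question and a non-empty answer."""
    question = (record.get("question") or "").strip()
    answer = (record.get("answer") or "").strip()
    return bool(question) and bool(answer)

records = [
    {"question": "아픈 곳은 없어요?", "answer": "괜찮아요"},
    {"question": "아픈 곳은 없어요?", "answer": ""},   # dropped: empty answer
    {"question": "", "answer": "네"},                  # dropped: empty question
]
usable = [r for r in records if is_meaningful_pair(r)]
print(f"{len(usable) / len(records):.1%} of records kept")  # 33.3% in this toy example
```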

Korean Language Challenges

We identified specific challenges related to Korean language processing:

Figure 9: Bar chart showing tokens appearing more than 5,000 times. Korean particles such as '이' and '는' appear frequently, presenting unique processing challenges.
Figure 10: Pie chart showing the proportion of tokens by frequency of appearance - 53.8% of tokens appeared only once, suggesting the need for better handling of rare tokens.
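
A minimal sketch of the frequency analysis behind Figures 9 and 10, using Python's `collections.Counter` on an already-tokenized corpus; the tiny corpus here is a placeholder:

```python
from collections import Counter

# Placeholder tokenized corpus; in the project this comes from the Korean tokenizer.
tokenized_corpus = [["아픈", "곳", "은", "없", "어요"], ["네", "괜찮", "아요"]]

counts = Counter(tok for sent in tokenized_corpus for tok in sent)

frequent = {tok: n for tok, n in counts.items() if n > 5000}   # Figure 9 threshold
singletons = sum(1 for n in counts.values() if n == 1)          # tokens seen once
print(f"Tokens above 5,000 occurrences: {len(frequent)}")
print(f"Share of tokens appearing only once: {singletons / len(counts):.1%}")
```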

Eye-Tracking Implementation

The final system integrates eye-tracking technology for user interaction. When a question like "아픈 곳은 없어요?" (Do you have any pain?) is processed, the system generates contextually appropriate response options that patients can select through eye movements.
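
A hedged sketch of the selection step, assuming gaze coordinates come from an eye-tracking library and the candidate responses are laid out as equal-width columns on screen (the actual Look_to_Speak layout may differ):

```python
from typing import List, Optional, Tuple

def select_response(gaze_xy: Tuple[float, float],
                    responses: List[str],
                    screen_width: float = 1920.0) -> Optional[str]:
    """Map a gaze point to the response whose screen column it falls in.

    The layout (equal-width vertical columns) is an illustrative assumption.
    """
    x, _ = gaze_xy
    if not 0 <= x < screen_width:
        return None                     # gaze off screen: no selection
    column = int(x / (screen_width / len(responses)))
    return responses[column]

# Illustrative candidate answers to "아픈 곳은 없어요?" (Do you have any pain?).
options = ["괜찮아요", "조금 아파요", "많이 아파요"]
print(select_response((1500.0, 540.0), options))  # gaze on the right -> third option
```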

Demo video: Real-time demonstration of the eye-tracking interface and communication workflow.

Technical Implementation Details

Key Features

Technical Stack

Team Collaboration

Development Team

  • June Lee — Project Lead, NLP Architecture
  • Ohjoon Kwon — Speech Processing
  • Youngjin Jeong — Model Optimization

Research Team

  • Chaeyeon Kim — Data Analysis
  • Jeongmin Lee — Eye-Tracking Implementation
  • Faculty Advisors — Clinical Guidance

Impact and Future Directions

Clinical Significance

SpeAKN represents a significant advancement in assistive technology for ALS patients, pairing eye-tracking selection with context-aware Korean response generation so that patients can communicate with fewer selections than existing tools require.

Future Enhancements

Open Source Contribution

The eye-tracking implementation is available as an open-source project: https://github.com/junhyk-lee/Look_to_Speak

Research Publications

This work contributes to the growing body of research in assistive technology and demonstrates the potential of AI-powered solutions for improving quality of life for patients with neurodegenerative diseases.