Legal Case Outcome Classifier [Machine Learning Project]

Project type

NLP Machine Learning

Date

Jan 2024 - May 2024

Location

Singapore

🏛️ Every year, there are 350,000 legal cases in Singapore, each with a textual document written to record its background information and outcome.

💡 Can we find some patterns in the legal documents, which can help us determine the case outcome given the background and meta-information of the case?

This machine learning project records an attempt at predicting the outcome of an appeal from the case description and other meta-information about the case.

📊 Data Collection - Our team scraped ~4k legal documents from the websites of local courts. After preprocessing, which filtered on criteria such as the availability of an explicit outcome, ~2k textual documents were kept.

🧑🏻‍💻 Feature Engineering - Several text representations, such as TF-IDF scores, word2vec and SBERT embeddings, were computed for the final set of training data. Meta-information about each case, such as court level, coram and areas of law, was one-hot encoded and included as additional features.
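
As a rough illustration of the step above, here is a minimal sketch that builds TF-IDF vectors and appends one-hot encoded meta-information using only the Python standard library. The function names, the whitespace tokeniser, the smoothed IDF formula and the toy documents are all assumptions made for this example; they are not taken from the project's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def one_hot(values):
    """Return (categories, one-hot rows) for a list of categorical values."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


# Toy stand-ins for case descriptions and a single meta-information field.
docs = [
    "appeal dismissed with costs",
    "appeal allowed in part",
    "application dismissed",
]
court_levels = ["High Court", "Court of Appeal", "High Court"]

vocab, text_features = tfidf_vectors(docs)
levels, meta_features = one_hot(court_levels)

# Concatenate textual and one-hot features into one row per case.
features = [t + m for t, m in zip(text_features, meta_features)]
```

Each row of `features` holds the TF-IDF weights followed by the one-hot columns, so terms that appear in fewer documents receive a larger weight than common ones with the same count.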

💽 Data Processing - Alongside feature engineering, we checked our dataset for class balance, linear separability and the curse of dimensionality, and applied principal component analysis (PCA) to address the high dimensionality of our legal text dataset.
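
To make the PCA step concrete, here is a minimal sketch of dimensionality reduction via singular value decomposition with NumPy. The function name `pca_reduce`, the synthetic feature matrix and the choice of 10 components are assumptions for illustration only; the project's actual settings are not stated here.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]


# Synthetic stand-in for a high-dimensional text feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

X_reduced, ratio = pca_reduce(X, n_components=10)
```

Inspecting `ratio` shows how much variance each retained component explains, which is one common way to decide how many components to keep.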

🏋️ Model Training - Each type of textual feature obtained from feature engineering was used independently, with the one-hot encoded meta-information appended to it as additional features, to train different baseline linear models and neural networks.
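
The training setup described above could be sketched as follows: one baseline linear model per textual feature type, each with the same one-hot block appended. This is only an illustration under stated assumptions. The data is synthetic, the feature names `tfidf` and `sbert` are placeholders, and the logistic regression fitted by gradient descent stands in for whichever baseline models the project actually used.

```python
import numpy as np


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit binary logistic regression by batch gradient descent."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)  # gradient of the mean log loss
    return w


def predict(w, X):
    """Return 0/1 predictions from fitted weights."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return (X @ w > 0).astype(int)


rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # synthetic binary outcomes

# Synthetic stand-ins for two textual feature sets, each shifted by the label.
text_features = {
    "tfidf": rng.normal(size=(n, 20)) + y[:, None] * 1.0,
    "sbert": rng.normal(size=(n, 16)) + y[:, None] * 1.0,
}
# Synthetic one-hot encoded meta-information with three categories.
meta_onehot = np.eye(3)[rng.integers(0, 3, size=n)]

train, test = slice(0, 150), slice(150, n)
scores = {}
for name, X_text in text_features.items():
    # Each textual feature set is used on its own, plus the one-hot block.
    X = np.hstack([X_text, meta_onehot])
    w = train_logreg(X[train], y[train])
    scores[name] = float((predict(w, X[test]) == y[test]).mean())
```

Keeping the feature sets separate in this way makes it possible to compare held-out accuracy across text representations while the meta-information stays constant.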

For more information, please refer to our code and slides.
