Legal Case Outcome Classifier [Machine Learning Project]

Project type

NLP Machine Learning Project

Date

Jan 2024 - May 2024

Location

Singapore

🏛️ Legal Case Outcome Prediction Project

Overview:
This machine learning project explores whether the outcome of a legal case can be predicted from its textual description and meta-information. Singapore handles approximately 350,000 legal cases annually, each documented with a case background and an outcome, and our goal was to identify patterns that could help predict appeal outcomes. The project combined legal text data with case metadata and applied a range of machine learning techniques to build models that predict case outcomes from these inputs.

Roles:
Lead Data Scientist:
- Directed the data collection and preprocessing efforts, ensuring the dataset was cleaned and aligned with the project’s goals.
- Oversaw the feature engineering process, experimenting with text-representation techniques such as TF-IDF, word2vec, and SBERT embeddings, and encoding meta-information such as court level and area of law.
- Applied principal component analysis (PCA) to address the dataset's high dimensionality, and analyzed class balance and class separability.
- Supervised model training with both traditional linear models and neural networks, refining each model based on its performance and the feature set used.
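The feature-engineering work described above compared TF-IDF, word2vec, and SBERT. As a rough illustration of the simplest of these, the following is a minimal standard-library sketch of TF-IDF, not the project's code; the tokenizer, the helper names `tokenize` and `tfidf`, and the example sentences are hypothetical. It assumes the smoothed weighting idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 row normalization, which matches the defaults of scikit-learn's `TfidfVectorizer`. word2vec and SBERT rely on pretrained models and are not sketched here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, keep runs of letters only. Real legal text
    # would need care with citations, section numbers and abbreviations.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocab, matrix), where matrix[i][j] is the TF-IDF weight of
    vocab[j] in docs[i]: raw term count times smoothed IDF, then each row
    scaled to unit L2 norm."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: a term found in every document still gets weight 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by 0 for empty docs
        matrix.append([v / norm for v in row])
    return vocab, matrix


# Two made-up case summaries, for illustration only.
cases = [
    "Appeal dismissed; the sentence was not manifestly excessive.",
    "Appeal allowed; the conviction was set aside.",
]
vocab, X = tfidf(cases)
# "dismissed" (in one document) outweighs "appeal" (in both) within cases[0].
print(X[0][vocab.index("dismissed")] > X[0][vocab.index("appeal")])  # True
```

Terms shared by every case, such as "appeal", receive the minimum IDF, so the distinguishing words dominate each case's vector.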

Skills Learnt:
- Hybrid Feature Engineering Approach: Introduced a hybrid feature engineering strategy combining traditional text-based features (e.g., TF-IDF, word2vec) with advanced contextual embeddings (e.g., SBERT), an approach that, to our knowledge, had not previously been applied in this domain.
- Dimensionality Reduction with PCA: Applied principal component analysis (PCA) to counter the curse of dimensionality in the legal text features, which improved both model performance and efficiency.
- Meta-Information Integration: Integrated key meta-information (e.g., court level, coram, area of law) into the feature set using one-hot encoding, which substantially enriched the predictive power of the models.
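The hybrid strategy and the one-hot metadata encoding described above can be sketched as follows. This is an illustrative assumption, not the project's pipeline: the field names `court_level`, `coram`, and `area_of_law`, the category values, the placeholder text vector, and the helper functions are all hypothetical. Coram is treated here as a single category string; a panel of several judges might instead call for one indicator per judge.

```python
# Hypothetical metadata fields; the project's actual schema is not stated.
META_FIELDS = ("court_level", "coram", "area_of_law")


def fit_one_hot(records: list[dict[str, str]], fields=META_FIELDS) -> list[tuple[str, str]]:
    """Learn a fixed, sorted list of (field, value) columns from training records."""
    return sorted({(f, r[f]) for r in records for f in fields if f in r})


def one_hot(record: dict[str, str], columns: list[tuple[str, str]]) -> list[float]:
    # 1.0 where the record has that field value, else 0.0. Values never seen
    # during fitting (or missing fields) yield all zeros for that field.
    return [1.0 if record.get(f) == v else 0.0 for f, v in columns]


def hybrid_features(text_vec: list[float], record: dict[str, str],
                    columns: list[tuple[str, str]]) -> list[float]:
    # Concatenate the text representation (e.g. TF-IDF, averaged word2vec,
    # or an SBERT sentence embedding) with the one-hot metadata block.
    return list(text_vec) + one_hot(record, columns)


# Made-up training metadata, for illustration only.
train_meta = [
    {"court_level": "Court of Appeal", "coram": "Judge A", "area_of_law": "Criminal"},
    {"court_level": "High Court", "coram": "Judge B", "area_of_law": "Contract"},
]
cols = fit_one_hot(train_meta)
vec = hybrid_features([0.2, 0.7], train_meta[0], cols)  # [0.2, 0.7] is a placeholder text vector
print(vec)  # [0.2, 0.7, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```

Fitting the columns on training data only, and mapping unseen values to all zeros, keeps the feature vector the same length at training and prediction time.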

Self-Learnt Skills:
- Advanced Feature Engineering for Legal Text: Gained expertise in sophisticated techniques such as SBERT and word2vec for transforming legal documents into machine-readable representations, and learned the nuances of handling domain-specific text.
- Dimensionality Reduction: Developed a solid understanding of PCA and its application to large, high-dimensional datasets such as legal text, which allowed me to optimize model training and reduce overfitting.
- Legal Domain Understanding: Collaborated closely with legal experts to understand the significance of meta-information in case outcomes, improving my ability to select and preprocess relevant features for machine learning tasks.
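The PCA step mentioned above can be illustrated with a small pure-Python sketch: center the features, form the sample covariance matrix, and extract the leading eigenvectors by power iteration with deflation. This is a generic textbook formulation offered as an assumption, not the project's implementation, and the function names are hypothetical. Power iteration assumes the leading eigenvalues are distinct, and a dense d-by-d covariance matrix is impractical for large vocabularies; for real TF-IDF matrices an SVD-based routine such as scikit-learn's `PCA`, or `TruncatedSVD` for sparse input, would normally be used.

```python
import math
import random


def center(X: list[list[float]]) -> list[list[float]]:
    # Subtract each column's mean so the principal components pass through the origin.
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc: list[list[float]]) -> list[list[float]]:
    # Sample covariance matrix (d x d) of already-centered data.
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def power_iteration(C: list[list[float]], iters: int = 500, seed: int = 0):
    """Approximate the dominant eigenpair of a symmetric matrix C."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in C]  # positive, non-zero starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # C maps v to zero; nothing left to extract
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigval = sum(vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, C))
    return eigval, v


def pca(X: list[list[float]], k: int):
    """Return (components, projected): the top-k principal directions and
    the centered data projected onto them."""
    Xc = center(X)
    C = covariance(Xc)
    components = []
    for _ in range(k):
        lam, v = power_iteration(C)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    projected = [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in Xc]
    return components, projected


# Points on the line y = 2x: one component captures all the variance.
comps, proj = pca([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], k=1)
print([round(x, 4) for x in comps[0]])  # [0.4472, 0.8944]
```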

Results:
- Comprehensive Dataset for Legal Outcome Prediction: Curated a robust dataset of ~2,000 cases with both textual data and meta-information, which served as a solid foundation for model training.
- Multiple Model Development: Trained baseline models using linear approaches and neural networks, with varying degrees of success. Adding meta-information and applying dimensionality reduction improved the models' predictive power.
- Insights for Further Work: The project demonstrated the potential of machine learning in legal prediction tasks, highlighting key areas for further research, such as fine-tuning model performance and exploring more advanced neural architectures for better accuracy.
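The results mention baseline linear models. The source does not say which ones, so the following sketch assumes logistic regression trained by batch gradient descent; the learning rate, epoch count, optional L2 penalty, label encoding (1 = appeal allowed, 0 = appeal dismissed), and toy data are all hypothetical. In a pipeline like the one described, each row of `X` would be a hybrid feature vector, possibly after PCA.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: list[list[float]], y: list[int], lr: float = 0.1,
                 epochs: int = 1000, l2: float = 0.0) -> tuple[list[float], float]:
    """Fit weights w and bias b by batch gradient descent on the mean
    log-loss plus an optional (l2 / 2) * ||w||^2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error p - y is the gradient of log-loss w.r.t. the logit.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float], threshold: float = 0.5) -> int:
    # Hypothetical label encoding: 1 = appeal allowed, 0 = appeal dismissed.
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy one-feature data (e.g. a single PCA score per case), for illustration only.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]
w, b = train_logreg(X_toy, y_toy)
print([predict(w, b, x) for x in X_toy])  # [0, 0, 1, 1]
```

A library implementation such as scikit-learn's `LogisticRegression` would normally be preferred; writing the update out simply shows that each step moves the weights against the average prediction error multiplied by the features.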

Personal 

Favorite Comedian: Conan O'Brien
Favorite Architecture: CapitaSpring

Favorite City: Singapore/Hefei
Favorite Chinese Food: Chinese Beef Vermicelli
Favorite Indian Food: Chicken Korma

There will be more to come if I am not lazy...


 
