About Me

Professional portrait of Ujjwal Singh Rao, Data Scientist and Vice President at MSCI

Greetings! I'm Ujjwal (@brightertiger), and I extend a warm welcome to my website.

As a data scientist with over 10 years of professional experience, I have worked on projects spanning big data analytics, predictive modeling, machine learning, deep learning, and natural language processing. I graduated from the Indian Institute of Technology (IIT) Kharagpur in 2013. In my free time, I take part in Kaggle competitions as @brightertiger, contribute to open source projects on GitHub, and share knowledge with the machine learning community.

If you come across any collaboration opportunities, don't hesitate to get in touch!

Professional Experience

MSCI Vice President

2024 - Present

I am a member of the Data Extraction team, where I develop Retrieval-Augmented Generation (RAG) pipelines built on Large Language Models (LLMs) to extract data and information from financial documents.

HERE Maps Lead Data Scientist

2023 - 2024

I was a member of the Map Observables team, tasked with constructing Self-Driving Maps for BMW's Urban Cruise Control. My work involved:

  • Tackling global-scale challenges by harnessing petabytes of data to create high-definition maps for autonomous driving. I improved key performance indicators such as false positives, false negatives, and accuracy by more than 50% compared to legacy systems.
  • Applying machine learning models, including XGBoost, to fuse observations from diverse input sources such as dashcams and overhead imagery. This process infers the accurate location and attributes of road signs.
  • Crafting graph-based solutions to counteract positional observation drift in the drive-based data sources used for map content. This reduced false positives by around 5% compared to radial search-based clustering.
  • Constructing a question-answering engine using LLAMA over extensive product and data requirement documents for data validations. This tool let users search these documents efficiently and extract details, significantly improving productivity.
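To illustrate the graph-based clustering idea from the list above, here is a minimal sketch. It is not the production method: it assumes 2-D positions, links any two observations closer than a hypothetical `max_gap`, and treats each connected component as one road sign. The function name, the threshold, and the sample coordinates are all made up for illustration.

```python
from itertools import combinations
from math import hypot


def cluster_observations(points, max_gap):
    """Group 2-D points into connected components.

    Two points are linked when they lie within max_gap of each other;
    a cluster is every point reachable through a chain of such links.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if hypot(x1 - x2, y1 - y2) <= max_gap:
            # Merge the two components that i and j belong to.
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sightings of one sign drifting along a drive, plus a distant second sign.
observations = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (4.5, 0.0), (20.0, 0.0)]
clusters = cluster_observations(observations, max_gap=2.0)
```

In this toy data the first four sightings chain into a single cluster, whereas a radial search of the same 2.0 radius around the first sighting would reach only the first two and split the drifting sign into several detections.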

Gojek Tech Senior Data Scientist

2021 - 2023

I was a member of the Care Tech team, where I leveraged machine learning, deep learning, and natural language processing techniques to extract insights and facilitate automation. This involved analyzing customer service interactions across diverse channels such as email, in-app requests, chat, Twitter, and more. My work involved:

  • Facilitating AI/ML-driven intent detection through the implementation of multilingual NLP models. I developed intent classification models based on XLM-RoBERTa to support various languages, including Bahasa and English, achieving an accuracy rate exceeding 80%. Additionally, I deployed these models into production using TorchScript and MLflow.
  • Constructing named entity recognition (NER) models based on IndoBERT, utilizing open-source IndoNLU datasets. These models were designed to identify entities such as food, quantity, date, and chit-chat within text utterances.
  • Enhancing the search experience for help center articles by incorporating tags to encompass semantic diversity in search queries. I implemented a TF-IDF and Logistic Regression pipeline to extract pertinent keywords for each article, contributing to an improved search functionality.
  • Establishing a pipeline for issue discovery to identify emerging themes in service tickets and app reviews. I implemented topic modeling with the BERTopic and pyLDAvis libraries. Additionally, I trained sentence transformer models using SetFit for better results.
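As a rough illustration of the keyword-tagging idea in the list above, the sketch below ranks the terms of each article by a plain TF-IDF score. It covers only the TF-IDF half of the pipeline (the Logistic Regression step is omitted), and the function name, the scoring formula without smoothing, and the sample articles are assumptions made for this example.

```python
import math
from collections import Counter


def top_keywords(docs, k=2):
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by score (descending), then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical help-center articles, already lowercased and tokenized.
articles = [
    "how to cancel a food order".split(),
    "how to change a payment method".split(),
    "how to contact a driver".split(),
]
keywords = top_keywords(articles, k=2)
```

Words shared by every article ("how", "to", "a") score zero, so only the terms that distinguish an article surface as candidate tags.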

American Express 2014 - 2021

7+ years of progressive experience

Senior Data Scientist 2018 - 2021

I was part of the data science team working on the Natural Language Understanding (NLU) layer of the AskAmex chatbot. My work involved:

  • Training transformer-based models (such as BERT, DistilBERT, and RoBERTa) for intent classification. I removed label noise from training datasets using various robust machine learning techniques, which led to a 5% increase in prediction accuracy.
  • Building human-in-the-loop (HITL) pipelines for collecting labeled data at a minimal cost. I used weak supervision and active learning strategies to filter relevant data points for annotation. I built various interactive tools to help data labelers work efficiently. I introduced best practices and quality checks in the annotation pipelines to ensure high-quality output.
  • Collaborating with product teams to improve customer experience. I built interactive tools to visualize the performance of servicing journeys. These tools helped identify the edge cases that often lead to automation failures. I introduced tracking of sentiment-level KPIs (in addition to automation) to capture channel performance holistically.
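The active learning step in the list above can be pictured with a small sketch. This one uses margin sampling, which is only one of several possible strategies and is an assumption here: it sends to annotators the utterances whose top two predicted class probabilities are closest. The function name, utterance IDs, and probabilities are hypothetical.

```python
def smallest_margin(predictions, budget):
    """Pick the `budget` items whose top two class probabilities are closest."""

    def margin(probs):
        top, runner_up = sorted(probs, reverse=True)[:2]
        return top - runner_up

    # A small margin means the model is torn between two intents.
    ranked = sorted(predictions, key=lambda item: margin(predictions[item]))
    return ranked[:budget]


# Hypothetical utterance IDs with model probabilities over three intents.
predictions = {
    "utt-1": [0.90, 0.05, 0.05],
    "utt-2": [0.40, 0.35, 0.25],
    "utt-3": [0.55, 0.30, 0.15],
    "utt-4": [0.34, 0.33, 0.33],
}
to_label = smallest_margin(predictions, budget=2)
```

With a labeling budget of two, the confident prediction "utt-1" is skipped and the two most ambiguous utterances are queued for annotation.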

Data Scientist 2017 - 2018

I was part of the data science team working on an offer recommendation engine for the mobile app and website. My work involved:

  • Building factorization machine models to predict click-through rate. I built Spark-based feature engineering pipelines to process terabytes of clickstream data for training these models. The models were part of the final stacked ensemble that was deployed in production.
  • Optimizing impression caps on offers to drive higher overall engagement on the channel. I built XGBoost models to analyze the sensitivity of click-through rate with respect to impressions. I used the partial dependence plots from these models to identify the impression cap that maximized the F-beta score.

Senior Data Analyst 2015 - 2017

I was part of the modeling team working on up-sell and cross-sell targeting via email campaigns. My work involved:

  • Building artificial neural network-based models. These were binary classification models which predicted the probability of an existing customer taking up a more premium product. These models replaced the legacy logistic regression models by delivering better performance while simultaneously driving operational efficiency.
  • Migrating the legacy data transformation and feature engineering pipelines from SAS to Python to support the production deployment of the above-mentioned neural network models. I also enabled automated re-training pipelines to address data drift.

Data Analyst 2014 - 2015

I joined the customer marketing team focusing on international markets (non-US). I worked on:

  • Targeting strategy for dynamic email campaigns in partnership with Movable Ink. The focus was to increase customer spending on small merchants in the UK. I analyzed transaction data to understand the location and category preferences of the customers. The analysis generated content-based recommendations displayed to the customer via dynamic emails. The open and click rates for these campaigns were significantly higher than the long-term average.
  • Supporting a joint venture with Gurunavi. Amex partnered with Gurunavi to offer dining recommendations to customers in Japan. I designed customer segments by clustering spending patterns across various industry verticals. The customer segments mapped to different personas, each of which received an exclusive set of restaurant recommendations.

Education

Georgia Institute of Technology

Master of Science in Analytics

2020 - 2025

Open Source & GitHub (@brightertiger)

Active contributor to the machine learning and data science community through open source projects and repositories.

GitHub Profile: github.com/brightertiger

Explore my repositories, contributions, and open source projects in machine learning, data science, and AI.

Kaggle Achievements (@brightertiger)

Competitions Master

3 Gold, 12 Silver and 4 Bronze medals across various machine learning competitions as @brightertiger

  • Ranked 3rd / 1621 in Jigsaw Multilingual Toxic Comment Classification Challenge
  • Ranked 6th / 3308 in SIIM-ISIC Melanoma Detection Challenge
  • Ranked 16th / 3943 in TalkingData AdTracking Fraud Detection Challenge

Technical Articles

Multi-Agent LLM System using Google-ADK

A comprehensive walkthrough of building a multi-agent system with Google's Agent Development Kit (ADK) and the Model Context Protocol (MCP) for website crawling, address extraction, and geocoding applications.

Technical Skills

Programming Languages

Python, R, SQL, Apache Spark

Machine Learning Frameworks

scikit-learn, XGBoost, PyTorch, TensorFlow, Hugging Face Transformers

Deployment & DevOps

Docker, FastAPI, Flask, MLflow, AWS SageMaker, AWS Lambda, GitHub Actions

Data Visualization

Matplotlib, Seaborn, Plotly, Streamlit