Rishabh Garg

Master's in Computer Engineering at University of Illinois Urbana-Champaign

rg18@illinois.edu | rishabhgarg.1998@gmail.com

About

Welcome to my professional space!

I'm Rishabh Garg, a graduate student pursuing a Master's degree in Computer Engineering at the University of Illinois Urbana-Champaign with a focus in Data Science & Machine Learning, expected to graduate in summer 2024.

Leveraging my experience as a Data Scientist at HP Inc., I have honed my skills in analytics and problem-solving by implementing cutting-edge analytics to drive actionable insights, optimize performance, and deliver substantial cost savings.

I hold a Bachelor's degree (B.Tech.) in Electrical and Electronics Engineering from Manipal Institute of Technology, where my coursework spanned Data Structures, Algorithms, Linux Shell Scripting, Web Development, and Embedded Systems.

As I approach the culmination of my academic journey, I am eager to explore full-time opportunities across the globe in Data Science, Machine Learning, and Software Development. I am actively seeking roles aligned with my expertise, and I'd love to connect and contribute to impactful projects.

Beyond coding, I find joy in activities like table tennis and bowling. I'm an avid Formula 1 fan, drawn to its speed and teamwork. Staying updated on AI breakthroughs complements my diverse interests.

Education

GPA: 3.6/4.0

Specialized Coursework: Deep Learning, Pattern Recognition, Applied Parallel Programming, Statistical Inference for Engineers and Data Scientists, Database Systems, Artificial Intelligence, Technology Entrepreneurship, Finance for Engineering Management, and Human-Computer Interaction

Provided mentorship, guidance, and administrative and logistical support to a cohort of 35 master's students in the CS411 - Database Systems course

Credential ID: 723850

Learned the fundamentals of machine learning algorithms and gained hands-on experience with programming exercises and activities in a 10-week program

Specialized Coursework: Deep Learning, Neural Networks, Unsupervised and Supervised Machine Learning Algorithms, and Statistics

GPA: 3.9/4.0

Additional Coursework: Basic Data Structures and Algorithms, Basic Web Design, and Linux Shell Scripting

Work Experience

Elevated remote complaint resolution in the EMEA and Greater Asia & India HP markets by an impressive 42%, employing Natural Language Processing (NLP) and a Bi-directional Encoder Representations from Transformers (BERT) model-based approach. The integration of Large Language Models (LLMs) into the customer complaint resolution workflow proved instrumental in eliminating unnecessary engineer visits and parts consumption, addressing gaps in information or domain knowledge among call center agents
Tech Stack: Python, SQL, Vertica DB, MySQL DB, AWS Redshift DB, Large Language Models (LLMs), PowerBI, Microsoft Azure, MLOps, MLflow, Flask API, PySpark

Implemented an advanced text preprocessing and translation pipeline inspired by encoding-decoding techniques, enhancing the summarization, filtering, and management of non-English case notes. The new technique demonstrated remarkable efficiency, reducing third-party translation API service costs by 70%, ensuring reliability, and achieving lower latency

Utilized MLflow on Microsoft Azure and finely curated data from the implemented pipeline to construct, train, and validate BERT base uncased and DistilBERT base uncased models. Employed various text manipulation techniques, boosting accuracy from 78% to an impressive 84%

Continuously trained different versions of the BERT model with hyperparameter tuning and ensured smooth integration of code into the main codebase. Worked closely with the IT team and fellow data scientists to deliver code changes to staging and pre-production environments, and carried out testing before deploying the solution into production

Retrained and served an improved DistilBERT model to replace an existing solution for the India market after model-drift monitoring revealed a decline in accuracy over the previous three months.

Spearheaded the expansion into the EMEA and Greater Asia & India markets, resulting in a monumental savings of $2.5 million in a single quarter. Recognized by the executive leadership team for the swift global market expansion

Devised an AI-driven fraud and compliance solution to identify illegitimate warranty claims, leveraging a dataset of 10+ million order details, including geography, product information, and timestamps. Specifically targeting fraud detection through rotating and sequentially editing serial numbers, the implementation is poised to deliver nearly $2 million in annual savings

Additional responsibilities encompass peer code reviews, managing customer and client requests, and disseminating knowledge within other analytics teams

Automated the monitoring and analysis of 5000+ call center agents and 150+ site performances by implementing Principal Component Analysis (PCA) and K-Means Clustering algorithms on multidimensional datasets containing features such as communication skill ratings, professionalism ratings, technical knowledge, and ease-of-effort scores. This led to the generation of accurate training and coaching insights
Tech Stack: Python, SQL, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Vertica DB, MySQL DB, PowerBI, Tableau, Flask API, Plotly, PyTorch, TensorFlow
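A minimal sketch of the PCA + K-Means approach described above; the feature columns, synthetic data, and cluster count here are illustrative assumptions, not the actual production pipeline.

```python
# Sketch: cluster agent-performance records after PCA dimensionality reduction.
# Feature names and k=3 are hypothetical stand-ins for the real dataset.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the multidimensional agent dataset:
# columns ~ communication, professionalism, technical knowledge, ease of effort
X = rng.normal(size=(500, 4))

# Standardize, reduce dimensionality, then cluster agents
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=2).fit_transform(X_scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)

# Each cluster can then be profiled to target training and coaching
print(X_reduced.shape, np.unique(labels))
```

Standardizing before PCA matters here because rating scales may differ per feature; clustering in the reduced space keeps the cluster profiles interpretable.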

Collaborated with Subject Matter Experts (SMEs) worldwide on 100K+ survey cases and massive operational datasets for analysis. Provided holistic insights by evaluating not only survey metrics but also operational metrics such as Turn Around Time (TAT) and Re-Repair rate, resulting in a 20% increase in net promoter score and savings of $500K in operational costs over three quarters

Offered stakeholders, business team members, and site managers real-time insights through a PowerBI dashboard and Flask API for automated email notifications. Additionally, presented a white paper on this innovative data science strategy at the Global Data Science Knowledge Discovery Platform 2021

Established an early warning device failure detection tool to accurately forecast potential breakdowns in devices for 500+ global clients. Monitored features such as page print count, intervention count, age, and usage for Random Forest and Decision Trees models. Pioneered the implementation of the tool across 24 sectors and 1.2 million devices, generating $450K in annual savings


Internships


Material Kit Preparation: Orchestrated the preparation and design of an ergonomic material kit for the assembly line. Devised and implemented a meticulous kit dispatch schedule to optimize Work in Progress (WIP) on the shop floor, ensuring efficiency in production processes
Tech Stack: Microsoft Excel, Microsoft PowerPoint, AutoCAD, TinkerCAD

Chemical Abnormality Reduction: Conducted a thorough investigation into the consumption of chemicals on the assembly lines. Devised and implemented an advanced Excel tracker, strategically installed on the shop floor, to enhance visibility into inventory volume and daily chemical consumption. Analyzed critical factors such as chemical lead time, lag time, and Turn Around Time (TAT) to optimize the availability of chemicals in inventory, mitigating excess storage, expiry risks, and minimizing non-consumption

Universal Fixture for RF Coils: Devised a universal test fixture for four Radio Frequency (RF) coils by studying and testing transmitter and receiver locations and functionality. This initiative aimed to reduce the number of testing coils on the assembly line

Selected as one of only four on-campus interns for GE Healthcare's prestigious Operations Management Leadership Program (OMLP)

Interned at the manufacturing plant of MEAI, conducting in-depth analysis and acquiring hands-on experience in the design of Printed Circuit Boards (PCBs), manufacturing and installation of Alternators, and the design of Electronic Power Steering (EPS) motors

Interned at ABB India Limited, gaining hands-on experience on the shop floor in the manufacturing of 3-phase induction motors. Observed and learned installation techniques for rotor, end-to-end assembly, testing, painting, labeling, and packaging of motors in various sizes

Skills

Projects

Developed a machine learning model trained on drone data
Tech Stack: Python, MATLAB, Microsoft Excel, Time Series Analysis

Conducted in-depth feature analysis by extracting trend, seasonality, and residuals from 12 time series features. Further employed advanced techniques such as Canonical Correlation Analysis (CCA) and Vector Autoregression (VAR) to determine variable correlations over time

Analyzed residuals with the Auto-Correlation Function (ACF), Partial ACF, Empirical Mode Decomposition (EMD), and hypothesis testing to efficiently implement data compression via PCA and autoencoders
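The residual autocorrelation check described above can be sketched as follows; the synthetic residual series and lag count are assumptions for demonstration, not the project's drone data.

```python
# Sketch: sample autocorrelation function (ACF) applied to model residuals.
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var for k in range(nlags + 1)])

rng = np.random.default_rng(1)
residuals = rng.normal(size=200)  # hypothetical stand-in for model residuals
r = acf(residuals, nlags=10)
# Lag 0 is always exactly 1; well-behaved (white-noise) residuals stay near 0 at other lags
print(r[0])
```

If significant autocorrelation remains at nonzero lags, the residuals still carry structure and the decomposition or model order needs revisiting.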

This endeavor involved accurate forecasting of gold prices based on Kaggle's dataset, and conducted a comparative analysis of three distinct machine learning models: the XGBoost regressor, Random Forest regressor, and Decision Tree regressor
Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, and model performance metrics such as R², MAE, along with hyperparameter tuning using Grid Search

Conducted Exploratory Data Analysis (EDA) on a decade's worth of data, incorporating features like S&P, Oil prices, Silver prices, and Euro-USD currency rates recorded monthly. Additionally, assessed correlations between features and engaged in feature engineering to identify the most influential variables for the model

Trained the machine learning models on the pre-processed data, leveraging grid search specifically for the Random Forest model to optimize parameters like the number of trees, features, maximum depth, and samples in each node

Evaluated and ranked the models based on their performance, meticulously examining the variations in predicted output to gain insights into the model behavior

This project not only delivered accurate predictions for gold prices but also offered valuable insights into the comparative performance of XGBoost, Random Forest, and Decision Trees in the context of financial forecasting
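The grid search step described above can be sketched as follows; the synthetic features (stand-ins for the S&P, oil, silver, and EUR-USD series) and the parameter grid values are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: Grid Search over Random Forest hyperparameters with cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 4))  # ~a decade of monthly observations, 4 features
y = X @ np.array([0.5, -0.2, 0.8, 0.1]) + rng.normal(scale=0.1, size=120)

# Hypothetical grid over trees, depth, and features per split
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "max_features": ["sqrt", None],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Cross-validated scoring inside the grid search keeps the hyperparameter choice honest; the held-out test metrics (R², MAE) are then computed only once on the winning configuration.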

Illustrated the prowess of Generative AI in dialogue summarization through the application of the FLAN-T5 model
Tech Stack: Python, PyTorch, Transformers, Tokenizers, FLAN-T5 (LLM)

Implemented zero, one, and few-shot inference strategies, showcasing the model's adaptability to varying degrees of contextual information

Utilized prompt engineering techniques to enhance the model's responsiveness to specific input cues, refining its ability to generate coherent and relevant summaries

The results underscored the FLAN-T5 model's efficacy in summarizing dialogues, offering a nuanced understanding of its capabilities across diverse inference scenarios and underlining its potential for real-world applications
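The zero/one/few-shot strategy above can be sketched as prompt assembly; the dialogues and instruction wording here are made-up examples, and the actual FLAN-T5 call via the Transformers library is omitted to keep the sketch self-contained.

```python
# Sketch: building zero-, one-, and few-shot prompts for a dialogue summarizer.
def build_prompt(dialogue, examples=()):
    """Prepend k worked examples (k=0 -> zero-shot) before the target dialogue."""
    parts = []
    for ex_dialogue, ex_summary in examples:
        parts.append(f"Dialogue:\n{ex_dialogue}\nSummary: {ex_summary}\n")
    # The trailing "Summary:" cue prompts the model to complete the summary
    parts.append(f"Dialogue:\n{dialogue}\nSummary:")
    return "\n".join(parts)

# Hypothetical worked example used as the in-context shot
shots = [("A: Lunch at noon? B: Sure, see you then.",
          "They agree to meet for lunch at noon.")]
zero_shot = build_prompt("A: Can you send the report? B: Yes, by Friday.")
one_shot = build_prompt("A: Can you send the report? B: Yes, by Friday.", shots)
print(one_shot)
```

Few-shot prompts simply extend `shots` with more example pairs; the model's context window bounds how many shots fit.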

Led the project on Database Design for an Activity and Recreational Club, architecting a robust database system. This initiative aimed to streamline financial and activity management for a recreational club boasting a membership exceeding 2000 individuals
Tech Stack: Python, SQL, Database Design, MySQL Server, GitHub, Git, Flask API, React.js

Developed a functional and scalable database structure tailored to the unique needs of the recreational club, ensuring optimal data organization and management

Focused on financial modules to facilitate transparent and efficient financial operations, contributing to a more streamlined budgeting and reporting process

The implemented database played a pivotal role in enhancing overall efficiency, providing the club with a comprehensive tool for managing its diverse activities and financial transactions seamlessly

The primary objective was to elevate the performance of the ResNet-18 model on the PASCAL dataset by implementing various data augmentation techniques, achieving a significant boost of approximately 6.5% in accuracy
Tech Stack: Python, ResNet-18, Diffusion Models, PASCAL dataset, Data Augmentation Techniques

Constructed a diverse PASCAL dataset, ensuring comprehensive representation of image features and categories to enhance the model's ability to generalize

Implemented six sophisticated augmentation techniques, including the diffusion-based diffusion mixup alongside mixup, AutoAugment, RandAugment, CutMix, and Cutout, injecting diversity into the dataset for improved model robustness

Focused on optimizing the ResNet-18 model, a widely used convolutional neural network architecture, using hyperparameter tuning to leverage the augmented dataset effectively

The augmentation techniques applied not only added variety to the dataset but also aided in mitigating overfitting, contributing to the observed performance boost

The achieved 6.5% improvement in model accuracy underscores the effectiveness of diffusion-based augmentation strategies in enhancing image data for more robust and accurate deep learning models
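One of the listed techniques, mixup, can be sketched in a few lines; the image shapes, class count, and beta parameter are illustrative assumptions rather than the project's actual settings.

```python
# Sketch: mixup augmentation - blend two images and their one-hot labels.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples with a Beta(alpha, alpha)-sampled weight lambda."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
# Hypothetical one-hot labels for a 20-class PASCAL-style setup
lbl_a = np.eye(20)[3]
lbl_b = np.eye(20)[7]
mixed_img, mixed_lbl = mixup(img_a, lbl_a, img_b, lbl_b, rng=rng)
print(mixed_img.shape, round(float(mixed_lbl.sum()), 6))
```

Because the blended label still sums to 1, the model trains against a soft target, which is exactly the regularizing effect that helps mitigate overfitting.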

Certifications