Naga Pratyusha Duvuri

Data Science & AI Program

Turing College

Big Data Processing and Applications, Data Engineering

University of Oulu

Bachelor of Technology in Information Technology

Pragati Engineering College

2016-2020

8.69 CGPA

Skills

Machine Learning

Supervised & Unsupervised Learning
Predictive Modeling & Forecasting
Classification & Clustering
Feature Engineering
Statistical Inference
Model Tuning (GridSearchCV)
Model Evaluation (Accuracy, ROC-AUC, Precision/Recall)

Programming & Tools

Python
SQL
Django
FastAPI
GitHub
Jira
RESTful APIs
Postman
SoapUI
Power BI (Basic)
Excel

Cloud Platforms

Oracle Cloud (OCI)
Microsoft Azure

Version Control & Workflow Tools

Git
Jupyter Notebook

Exploratory Data Analysis (EDA)

Data Wrangling & Cleaning
Descriptive & Inferential Statistics
Correlation Analysis
Outlier Detection
Feature Selection
Visualization (Matplotlib, Seaborn)
Cloud-based Data Processing

AI & Deep Learning

Large Language Models (LLMs)
Generative AI

A bit about me

I’m a Data Scientist and Python Developer with over 4 years of experience in software engineering, AI/ML development, and middleware integration. I specialize in building scalable, production-ready machine learning solutions, with a strong focus on large language models (LLMs), AI model optimization, and end-to-end data science workflows.

Throughout my career at Capgemini and G2i Inc., I have delivered impactful AI solutions that improve model performance, optimize enterprise systems, and drive data-driven decision-making. I have hands-on experience across Microsoft Azure and Oracle Cloud, including MLOps pipelines, API integration, and deployment of machine learning models in complex enterprise environments.

I am certified in Microsoft Azure, Oracle Cloud, and Generative AI, and I thrive at the intersection of AI innovation and practical implementation, helping organizations harness the power of data to automate processes and unlock actionable insights.

Key Highlights:

Expertise in machine learning, data preprocessing, model evaluation, and deployment.
Strong experience with cloud platforms, MLOps, and enterprise-grade AI solutions.
Hands-on work with LLMs, APIs, and AI-driven automation.
Recognized for enhancing model performance and optimizing systems for real-world business impact.

Work Experience

G2i Inc — Data Scientist (Freelancer)

October 2024 – Present

Improved LLM output quality by 30% through systematic evaluation, structured scoring frameworks, and targeted fine-tuning recommendations.
Reviewed and optimized AI-generated code across Python, SQL, Java, C, and networking scripts, increasing accuracy and maintainability for enterprise applications.
Identified defects and performance bottlenecks in generative AI workflows, reducing model response inconsistencies by 25%.
Collaborated with engineering teams using GitHub, Python, SQL, REST APIs, and LLM evaluation tools to enhance model alignment and code generation reliability.

Capgemini — Associate Consultant

September 2020 – April 2024

SCUBI Maintenance Support — Cloud & Batch Processing

Developed and maintained Python-based batch processes deployed on Oracle Cloud (OCI), improving system stability and supporting large-scale employment benefits data processing.
Automated recurring operational tasks and optimized batch scheduling, resulting in a 25% reduction in job runtime and a 40% improvement in pipeline reliability.
Monitored and validated cloud-based batch jobs, resolving failures and increasing data flow consistency across environments.
Collaborated with cross-functional teams using Azure, OCI, Python, SQL, Git, and automation tools to streamline production workflows.

Middleware Engineering — Atradius Credit Insurance

Designed and maintained enterprise integrations using Oracle SOA Suite, OSB, BPM, and JMS, enabling secure and efficient communication across insurance systems.
Built RESTful and SOAP web services that supported critical business workflows and improved integration performance by 35%.
Created BPMN workflows integrated with SOA services and backend databases, automating decision-making and reducing manual processing time.
Conducted performance tuning using Oracle Enterprise Manager (OEM), increasing throughput and reducing latency in middleware operations.

Projects

Travel Insurance Purchase Prediction

Machine Learning Project

Built a predictive ML model to estimate the likelihood of customers purchasing travel insurance using demographic and travel-related features. Performed extensive EDA, handled missing values, created visualizations, and engineered features to improve model signal quality. Trained multiple models including Logistic Regression, Decision Tree, Random Forest, and XGBoost, using GridSearchCV for hyperparameter tuning. Developed an ensemble classifier using VotingClassifier, achieving higher accuracy and more stable performance than individual models. Identified key factors influencing insurance purchase through feature importance analysis, enabling actionable business insights.
Technologies: Python, Pandas, Scikit-learn, XGBoost, Matplotlib, Seaborn, Jupyter Notebook

Home Credit Default Risk Prediction

End-to-End ML Project

Built a complete ML pipeline to predict loan default probability using real-world credit bureau and financial datasets. Performed advanced preprocessing including missing value treatment, categorical encoding, feature engineering, class imbalance handling, and outlier removal. Conducted in-depth EDA involving distribution analysis, correlation heatmaps, feature visualization, and socioeconomic insights. Trained and optimized XGBoost, LightGBM, and Gradient Boosting models, achieving a ROC-AUC of 0.77 using LightGBM. Deployed the model on Google Cloud Platform (GCP) with an HTTP endpoint for real-time predictions.
Technologies: Python, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, Seaborn, Matplotlib, GCP, Jupyter Notebook
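
For context on the ROC-AUC figure above, here is a minimal sketch of how the metric can be computed from labels and predicted scores. It uses the pairwise-ranking formulation in plain Python. The function name `roc_auc` and the toy data are illustrative assumptions, not the project's actual code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = default) and predicted default probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this toy data the result is 0.75, because three of the four positive-negative pairs are ranked correctly. The quadratic pairwise loop is written for clarity; on a dataset of realistic size a library implementation would be the practical choice.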

AI Interview Simulator Web App

LLM-Powered Application

Developed an interactive Streamlit web application that simulates job interviews using AI-generated questions and personalized feedback. Integrated OpenAI/Gemini LLM APIs to generate tailored interview questions based on job title, job description, and uploaded resume. Implemented advanced prompt engineering techniques including Zero-Shot, Few-Shot, Chain-of-Thought, Role-Based, and Self-Critique prompting. Added automated candidate scoring with AI-driven strengths, weaknesses, model answers, and performance analysis. Built customization options such as difficulty levels, creativity controls, question skipping, and raw LLM output debugging. Implemented document parsing using PyPDF2 and python-docx for extracting resume content. Designed the app with a modular structure for scalability and maintainability.
Technologies: Python, Streamlit, OpenAI/Gemini API, PyPDF2, python-docx, Regex, Virtual Environments
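
The role-based and few-shot prompting mentioned above can be sketched as a small prompt builder. This is an illustrative sketch only: `build_interview_prompt`, its parameters, and the example question are hypothetical, not the app's actual prompts, and no LLM API is called here.

```python
from typing import List, Tuple


def build_interview_prompt(
    job_title: str,
    job_description: str,
    examples: List[Tuple[str, str]],
    difficulty: str = "medium",
) -> str:
    """Assemble a role-based, few-shot prompt asking for one interview question.

    `examples` holds (job title, sample question) pairs shown to the model
    as demonstrations; an empty list yields a zero-shot prompt.
    """
    lines = [
        # Role-based framing: tell the model who it is acting as.
        f"You are an experienced interviewer hiring for the role: {job_title}.",
        f"Job description: {job_description.strip()}",
        f"Difficulty: {difficulty}",
        "",
    ]
    # Few-shot demonstrations, one block per example pair.
    for example_title, example_question in examples:
        lines.append(f"Role: {example_title}")
        lines.append(f"Question: {example_question}")
        lines.append("")
    # The final, unanswered slot the model is expected to complete.
    lines.append(f"Role: {job_title}")
    lines.append("Question:")
    return "\n".join(lines)


prompt = build_interview_prompt(
    job_title="Data Scientist",
    job_description="Build and evaluate ML models for credit risk.",
    examples=[("Backend Engineer", "How would you design a rate limiter?")],
)
```

The returned string would then be sent to whichever LLM API is in use. Keeping prompt assembly in a pure function like this makes each prompting style easy to test without network calls.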

Accenture North America Data Analytics and Visualization Job Simulation on Forage

Data Analytics and Visualization Project

In May 2024, I completed the Accenture North America Data Analytics and Visualization Job Simulation on Forage, working on a project for a hypothetical social media client. I cleaned and modeled the client's data so that it was accurate and ready for analysis, then analyzed it to uncover trends and patterns that could inform strategic decisions. I presented the findings through clear visualizations and reports, translating complex data into actionable business recommendations.

Research on Application of Artificial Intelligence in Medical Education

Data Management Project

In my final year, I carried out a research project on the application of Artificial Intelligence in Medical Education, with a focus on distance learning. Running from December 2019 to May 2020, the project involved developing an Operational Data Management application using Python and Django. The platform supported interaction between students, trainers, and administrators, enabling efficient information sharing and knowledge exchange. The project aimed to use AI to encourage adoption of medical education technologies among healthcare providers, with the broader goal of improving patient outcomes and operational efficiency. The experience strengthened my technical skills and deepened my understanding of AI's potential in the healthcare sector.

Spam SMS Filtering Using Machine Learning

Data Quality Improvement Project

My mini project, "Spam SMS Filtering using Machine Learning," ran from February to May 2019 and focused on enhancing data quality through real-time classification of spam SMS messages. The goal was to identify and filter spam as soon as it arrives on a mobile device, including zero-hour attacks: newly created spam messages that traditional filters might miss. Using machine learning algorithms, the project produced a model that accurately distinguishes legitimate messages from spam, even when the spam content has not been seen before. The project underscored the importance of adaptive, proactive spam detection for communication security and user experience.
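
A minimal sketch of this kind of classifier, assuming a multinomial Naive Bayes model with add-one smoothing. The project description does not name its algorithm, so the model choice, the class name `NaiveBayesSpamFilter`, and the toy messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: List[Tuple[str, str]]) -> "NaiveBayesSpamFilter":
        # Assumes the training data contains at least one "spam" and one "ham" message.
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.class_counts: Counter = Counter()
        for text, label in messages:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def _log_score(self, tokens: List[str], label: str) -> float:
        total_messages = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_messages)
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            # Smoothing gives unseen words a small non-zero probability,
            # so a previously unseen message can still be scored.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(self.vocabulary)))
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        spam = self._log_score(tokens, "spam")
        ham = self._log_score(tokens, "ham")
        return "spam" if spam > ham else "ham"


# Hypothetical training messages, far smaller than any real SMS corpus.
training = [
    ("Win a free prize now", "spam"),
    ("Free cash claim your prize", "spam"),
    ("Are we still meeting for lunch", "ham"),
    ("Call me when you get home", "ham"),
]
spam_filter = NaiveBayesSpamFilter().fit(training)
```

With this toy data, `spam_filter.predict("Claim your free prize")` returns `"spam"` and `spam_filter.predict("See you at lunch")` returns `"ham"`, even though neither message appears in the training set.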

Certifications

Oracle Cloud Infrastructure Foundations 2021
Oracle Cloud Infrastructure 2024 Generative AI Certified Professional
Oracle Cloud Infrastructure 2023 Application Integration Professional
Microsoft Azure Certifications: AZ-900, AZ-104, AZ-400
NPTEL Certifications: Programming, Data Structures and Algorithms Using Python; Enhancing Soft Skills and Personality
Coursera Certifications: IoT Programming; Advanced Machine Learning with TensorFlow

Publications

Research on Application of Artificial Intelligence in Medical Education
The International Journal of Analytical and Experimental Modal Analysis
Artificial Intelligence (AI) is revolutionizing medical education by enhancing distance learning, virtual inquiry systems, and teaching efficiency. This research explores how AI-driven solutions, machine learning, and intelligent tutor systems improve medical training and personalize learning. Our findings highlight AI’s potential to bridge traditional education with digital innovation in healthcare.