Overview
- Responsibilities
- Contributed to the development of key data integration and advanced analytics solutions leveraging Apache
- Participated in Agile ceremonies, including daily Scrum meetings and sprint planning
- Performed data transformations in Hive, using partitions and buckets for performance improvements
- Created Hive external tables on the MapReduce output before partitioning
- Developed business-specific Custom UDFs in Hive and Pig
- Built scalable and deployable machine learning models
- Utilized Sqoop for data ingestion; used the analytics libraries scikit-learn and Spark MLlib
- Extensively used Python data science packages, including pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn, and NLTK
- Performed exploratory data analysis to identify trends and clusters
- Built models using regression, tree-based ensemble methods, time-series forecasting, KNN, clustering, and Isolation Forest
- Maintained RDDs using Spark SQL
- Communicated and coordinated with other departments to collect business requirements
- Tackled a highly imbalanced fraud dataset using under-sampling with ensemble methods, oversampling, and cost-sensitive algorithms
- Improved fraud-prediction performance by using random forest and gradient boosting for feature selection with scikit-learn
- Used Python to preprocess data and surface insights
- Iteratively rebuilt models to accommodate changes in the data, refining them over time
- Created and published multiple dashboards and reports to Tableau Server
- Extensively used SQL queries for legacy data retrieval jobs
- Migrated the Django application database from MySQL to PostgreSQL
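The custom Hive UDF work above can also be expressed as a Python streaming script invoked through Hive's `TRANSFORM` clause. This is a minimal sketch, not the actual UDFs from the role; the column names (`user_id`, `category`, `amount`) are hypothetical:

```python
import sys

def clean_record(line):
    """Normalize one tab-separated Hive row: trim and upper-case the category column."""
    user_id, category, amount = line.rstrip("\n").split("\t")
    return "\t".join([user_id, category.strip().upper(), amount])

if __name__ == "__main__":
    # Hive pipes each row of the SELECT to this script's stdin;
    # transformed rows are emitted on stdout, one per line.
    for row in sys.stdin:
        print(clean_record(row))
```

In Hive the script would be registered with `ADD FILE` and called via `SELECT TRANSFORM(user_id, category, amount) USING 'python clean_record.py' AS (...)`.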
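The exploratory-data-analysis step above typically combines per-group summaries with a simple outlier screen. A sketch with pandas, using a hypothetical transaction table in place of the real data:

```python
import pandas as pd

# Hypothetical transaction data standing in for the real dataset.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "amount": [10.0, 12.0, 50.0, 55.0, 300.0],
})

# Per-group summary statistics: a first look at central tendency and spread.
summary = df.groupby("region")["amount"].agg(["mean", "median", "std"])

# Flag candidate outliers with a simple z-score threshold.
z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
outliers = df[z.abs() > 1.5]
```

Here the 300.0 transaction is the only row exceeding the z-score cutoff, which is the kind of signal that would then be inspected for trends or cluster structure.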
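Of the modeling techniques listed above, Isolation Forest is the anomaly-detection piece. A minimal scikit-learn sketch on synthetic data (the real features and parameters are not in the source):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# A tight cluster of normal points plus one extreme anomaly at index 100.
normal = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
X = np.vstack([normal, [[10.0, 10.0]]])

model = IsolationForest(random_state=0).fit(X)
scores = model.score_samples(X)  # lower score = more anomalous
```

Because isolation trees separate the distant point in very few splits, it receives the lowest anomaly score of the whole sample.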
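The under-sampling approach to the imbalanced fraud dataset can be sketched in plain NumPy: drop randomly chosen majority-class rows until the classes are balanced. This is an illustrative helper, assuming binary labels with 1 as the rare fraud class:

```python
import numpy as np

def undersample(X, y, random_state=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumes binary labels where 1 is the rare (fraud) class.
    """
    rng = np.random.default_rng(random_state)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    # Keep a random majority subset the same size as the minority class.
    keep = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    idx = np.concatenate([minority_idx, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

For example, a 98:2 dataset comes back as a balanced 2:2 sample; in practice this is usually paired with an ensemble trained on several such resamples.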
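The random-forest feature-selection step above usually means ranking features by impurity-based importance and keeping those above a baseline. A scikit-learn sketch on synthetic data where only the first feature carries signal (the threshold choice is an assumption, not from the source):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-in: only the first feature carries signal, the rest are noise.
signal = rng.normal(size=n)
noise = rng.normal(size=(n, 4))
X = np.column_stack([signal, noise])
y = (signal > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Keep features whose importance beats the uniform baseline (1 / n_features).
keep = np.flatnonzero(forest.feature_importances_ > 1.0 / X.shape[1])
```

The informative feature dominates the importance ranking, so the downstream fraud model would be retrained on `X[:, keep]`.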
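The MySQL-to-PostgreSQL migration above starts with repointing Django's `DATABASES` setting at the new backend. A sketch of the relevant `settings.py` fragment, with placeholder credentials and database name:

```python
# settings.py -- switch the default database backend to PostgreSQL.
# NAME, USER, PASSWORD, and HOST below are placeholders, not real values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "app_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```

The data itself is then moved separately, for example by exporting from the old database with `manage.py dumpdata` and loading into the new one with `manage.py loaddata` after running migrations.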