Overview
• Collect, clean, process, visualize, and analyze data in Python before modeling
• Develop machine learning (ML) and deep learning (DL) models in Python that use computer vision, NLP, and OCR to extract information from document formats such as PDF, JPEG, and PNG
• Train models on NVIDIA GPUs (Tesla T4, GeForce GTX 1060) and Amazon SageMaker GPU instances, and fine-tune them to improve accuracy
• Apply object detection algorithms such as Faster R-CNN, YOLO, and SSD, and use the Tesseract OCR engine for text recognition and extraction
• Tag raw text with part-of-speech (POS) labels and Inside-Outside-Beginning (IOB) labels using Python
• Develop DL models using architectures such as BERT, RoBERTa, DistilBERT, RNN, LSTM, and GRU for text classification and sentiment analysis
• Develop named entity recognition (NER) models using DL and libraries such as spaCy, NLTK, and Flair to extract targeted text from documents
• Developed a Python synonym recommendation application backed by 300-dimensional GloVe embeddings
• Implemented online (incremental) learning to update existing ML and DL models on new data
• Use TensorRT to optimize trained models for run-time (inference) performance
• Manage databases with MySQL and MongoDB