Data Scientist Internship - Natural Language Processing
Cambia Health - Seattle, WA

Internship Overview

This internship position is scheduled to begin in May/June 2018.

Responsibilities & Requirements

Cambia's Data and Technology Solutions Department delivers innovative data and technology products, services, and solutions that will help drive Cambia to its 2020 vision of person-focused health care transformation. We are looking for a passionate, talented, and inventive Data Scientist intern to help build industry-leading speech and language solutions. Together with a highly multi-disciplinary team of scientists, engineers, strategic partners, and subject domain experts, you will work on building a real product with natural language processing and machine learning at its core.

Essential Functions of the NLP Data Scientist Internship:

* Utilize statistical natural language processing to mine unstructured data and create insights
* Build and optimize cutting-edge natural language understanding systems such as conversational agents (chatbots)
* Build core in-house NLP components and analytical tools such as document clustering, topic analysis, text classification, named entity recognition, sentiment analysis, and part-of-speech tagging methods for unstructured and semi-structured data
* Identify and deploy existing machine learning, natural language processing, and information retrieval techniques and systems for knowledge management and discovery, such as using Electronic Medical Records (EMR) data, progress notes, and discharge summaries to identify admitting diagnosis, reason for consultation, clinical history, etc.
* Identify ways to analyze consumers' experiences from various communication channels and improve customer satisfaction
* Cluster and analyze large amounts of user-generated content and process data in large-scale environments on Amazon AWS using tools such as EC2, EMR, MapReduce, and PySpark
* Integrate the NLP pipeline into the production environment, ensure its scalability, and apply the knowledge gained to other projects, modeling, and work practices
* Design novel algorithms, which may involve data cleaning, feature selection, statistical modeling, data clustering and classification, text processing, and other machine learning techniques, to solve complex healthcare problems presented by healthcare organizations
* Collaborate with different functional teams within Cambia and externally to find solutions to problems in healthcare

Key Qualifications and Experience:

* Currently enrolled in an undergraduate or graduate degree program focused on Big Data, Computer Science, Data Analytics, Engineering, Math, Statistics, Science, or a related field (preference will be given to graduate students)
* Candidates who have completed their degree in the last six months are also encouraged to apply
* Strong analytic and problem-solving skills, including the ability to apply quantitative analysis techniques to business situations, including forecasting, descriptive statistics, statistical inference, and multivariate modeling techniques
* Experience with a good range of NLP techniques, including text processing, tokenization, POS tagging, parsing, annotation, regular expressions, language modeling, etc.
* Ability to develop prototypes by manipulating and analyzing complex, high-volume, high-dimensionality data from various sources
* Expertise in producing, processing, evaluating, and utilizing unstructured/semi-structured data
* Proficiency in open-source NLP and machine learning toolkits such as Stanford CoreNLP, NLTK, Gensim, Mallet, OpenNLP, LingPipe, cTAKES, scikit-learn, NumPy, LIBSVM, MLlib, Theano, TensorFlow, etc.
* Solid background in statistical learning and clustering techniques for NLP such as HMM, CRF, SVM, MaxEnt, LDA, LSI, and K-Means
* Must have ML/NLP algorithm implementation experience as well as the ability to modify standard algorithms, e.g., change objective functions, work out the math, and implement the result
* Practical ability to visualize data, communicate about data, and utilize data effectively
* Proficiency in SQL relational databases and/or NoSQL databases
* Ability to think creatively and to work well both as part of a team and as an individual contributor
* Eager to learn new algorithms, new application areas, and new tools
* Excellent oral and written communication skills to effectively interface and communicate with a broad array of internal and external contacts, including leadership
* Strong programming skills in at least one object-oriented programming language, e.g., Java, Python, C++, Scala, etc.
* Fluency with Linux/Unix