Data Engineer Python PyTest PySpark

PRI Technology

Bethlehem, PA

Full Time

Paid

Job Description

Data Engineer Python PyTest PySpark

My name is Bill Stevens, and I have a new six-month-plus hybrid Data Engineer, Data Analytics and Data Quality opportunity with a major firm located in Bethlehem, Pennsylvania, that could be of interest to you. Please review my specification below. I am available at any time to speak with you, so please feel free to call me. The work schedule will be hybrid: three days a week in the office and two days remote.

The firm will NOT consider a fully remote candidate. The ideal candidate should also hold a green card or be a U.S. citizen.

This position pays $75.00 per hour on a W-2 hourly basis or $85.00 per hour on a Corp basis. The Corp rate is for independent contractors only, not third-party firms.

Description:
The firm is seeking an experienced Data Engineer to be part of its Data and Analytics organization. You will play a key role in building and delivering best-in-class data and analytics solutions aimed at creating value and impact for the organization and its customers. As a member of the data engineering team, you will help develop and deliver high-quality Data Products backed by best-in-class engineering. You will collaborate with analytics, business and IT partners to enable these solutions.

The Qualified Candidate will:
Architect, build, and maintain scalable and reliable data pipelines, with robust data quality built into each pipeline, for consumption by the analytics and BI layers.
Design, develop and implement low-latency, high-availability, and performant data applications and recommend & implement innovative engineering solutions.
Design, develop, test and debug code in Python, SQL, PySpark and Bash as per the firm’s standards.
Design and implement a data quality framework and apply it to critical data pipelines to make the data layer robust and trustworthy for downstream consumers.
Design and develop an orchestration layer for data pipelines written in SQL, Python and PySpark.
Apply and provide guidance on software engineering techniques such as design patterns, code refactoring, framework design, code reusability, code versioning, performance optimization, and continuous integration and deployment (CI/CD) to make the data analytics team robust and efficient.
Perform all job functions consistent with the firm’s policies and procedures, including those which govern handling PHI and PII.
Work closely with various IT and business teams to understand system opportunities and constraints in order to make the fullest use of the firm’s Enterprise Data Infrastructure.
Develop relationships with business team members by being proactive, displaying an increasing understanding of the business processes and by recommending innovative solutions.
Communicate project output in terms of customer value, business objectives, and product opportunity.

The Qualified Candidate should possess:
5+ years of experience, with a bachelor’s or master’s degree in computer science, engineering, applied mathematics or a related field.
Extensive hands-on development experience in Python, SQL and Bash.
Extensive experience in performance optimization of data pipelines.
Extensive hands-on experience working with cloud data warehouse and data lake platforms such as Databricks, Redshift or Snowflake.
Familiarity with building and deploying scalable data pipelines and data solutions using Python, SQL and PySpark.
Extensive experience in all stages of software development and expertise in applying software engineering best practices.
Experience developing and implementing a data quality framework, either homegrown or built on open-source frameworks such as Great Expectations, Soda or Deequ.
Extensive experience developing an end-to-end orchestration layer for data pipelines using frameworks such as Apache Airflow, Prefect or Databricks Workflows.
Familiarity with RESTful web services (REST APIs) for integrating with other services.
Familiarity with API gateways such as Apigee for securing web service endpoints.
Familiarity with concurrency and parallelism.
Familiarity with data pipelines and the ML development cycle.
Experience creating and configuring continuous integration/continuous deployment (CI/CD) pipelines to build and deploy applications in various environments, using DevOps best practices to migrate code to the production environment.
Ability to investigate and repair application defects regardless of component (front-end, business logic, middleware, or database) to improve code quality and consistency, reduce delays, and identify any bottlenecks or gaps in the implementation.
Ability to write unit tests in Python using a unit test library such as pytest.

Additional Qualifications (nice to have, NOT required):
Experience using and implementing data observability platforms such as Monte Carlo Data, Metaplane, Soda, Bigeye or similar products.
Expertise in debugging issues in a cloud environment by monitoring logs on the VM or using AWS features such as CloudWatch.
Experience with DevOps tech stack such as Jenkins and Terraform.
Experience with the concept of observability in software, and with tools such as Splunk, Zenoss, Datadog or similar.
Ability to learn and adapt to new concepts and frameworks and create proofs of concept using newer technologies.
Ability to use agile methodology throughout the development lifecycle and provide updates on a regular basis, escalating issues or delays in a timely manner.

The interview process will include an initial telephone or Zoom screening.

Please let me know your interest in this position and your availability to interview and start, and include a copy of your recent resume.
