
Big Data Developer

Alis Software

Austin, TX
Full Time
Paid
  • Responsibilities
    •  Administer and maintain various Linux systems

      Hadoop cluster capacity planning for the project: estimation of the number of servers, storage, memory, CPUs, cores, network, racks and switches.

      Set up SSH on all servers of the cluster and make sure the servers can connect to each other (see the sketch below).

      Installation of Java JDK 1.6.0 on all servers

      Apply RHEL (Red Hat Enterprise Linux) patches to all Big Data servers

      Reboot the servers and restart all Big Data services in support of Red Hat Linux patching.
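
      A minimal sketch of the passwordless SSH setup referenced above, assuming hypothetical hostnames (node01, node02, node03) and a key generated on the admin node:

        # Generate a key pair once on the admin node, then push it to every server
        ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa -N ""
        for host in node01 node02 node03; do          # placeholder hostnames
            ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
            ssh -o BatchMode=yes "$host" hostname     # confirm key-based login works
        done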

       

    • Install, configure and deploy Big Data applications on Linux systems in a virtualized environment.

      Installation of the Hortonworks Data Platform (HDP 2.2) and the Hortonworks Hadoop distribution (HDFS and YARN 2.6.0) on Linux servers in the Test, UAT and Production environments.

      Installation of Ambari Server and Ambari Agents (Ambari 1.7.0) for managing Hadoop servers and services.

      Configuration of HDFS, YARN and Zookeeper properties for the operation of Hadoop.

      Integration of Kerberos and LDAP with Hadoop for authentication and authorization of servers and file systems.

      Setting up Kerberos principals and creating keytabs.
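
      A minimal sketch of creating a service principal and keytab with kadmin; the realm, principal name and keytab path are placeholders:

        # Create a principal with a random key and export it to a keytab
        kadmin -p admin/admin -q "addprinc -randkey nn/node01.example.com@EXAMPLE.COM"
        kadmin -p admin/admin -q "xst -k /etc/security/keytabs/nn.service.keytab nn/node01.example.com@EXAMPLE.COM"
        klist -kt /etc/security/keytabs/nn.service.keytab   # verify the keytab contents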

    • HDFS: Managing and monitoring HDFS in the Big Data (Hadoop) cluster, cluster performance tuning, cluster backup, managing and reviewing log files, and setting up alerts for various services and ecosystem components.

      Commissioning and decommissioning of servers in the Hadoop cluster as needed, based on business requirements.

      Cluster performance tuning, such as adding memory (RAM) and cores to each of the servers in the cluster.

      Assign ownership and permissions for various HDFS data folders and files based on the criticality of the data.

      Analyze the Hadoop server log files when Hadoop services are not running on the cluster or on individual servers, and fix the issue.

      Configuration of high availability for Hadoop services such as the NameNode (active and standby) and Resource Manager (active and standby).

      Back up cluster files using 'distcp' during maintenance and recovery activities to guard against unexpected disasters.

      Modify various default properties in the application properties files (.xml) to tune the applications and run them with good performance.

      Write bash scripts to alert if services such as the NameNode, Resource Manager, Node Manager, Zookeeper, and Quorum Journal Manager are not functioning on their respective servers.

      Create a bash script to alert when HDFS space usage on each DataNode exceeds the maximum limit, scheduled through crontab (see the sketch below).

      Create a bash script that sends a daily report of the data size of each application running in Hadoop (i.e., in HDFS) to analyze performance.
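
      A minimal sketch of the DataNode space alert, assuming a hypothetical threshold, mail address and script path:

        #!/bin/bash
        # Alert when any "DFS Used%" figure reported by HDFS exceeds the threshold
        THRESHOLD=85                                        # assumed percentage limit
        hdfs dfsadmin -report | awk '/DFS Used%/ {gsub(/%/,""); print $3}' |
        while read -r used; do
            used=${used%.*}                                 # drop the decimal part
            if [ "$used" -gt "$THRESHOLD" ]; then
                echo "HDFS usage at ${used}% exceeds ${THRESHOLD}%" \
                    | mail -s "HDFS space alert" hadoop-admin@example.com
            fi
        done

        # Example crontab entry (hourly):
        # 0 * * * * /opt/scripts/hdfs_space_alert.sh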

    • FLUME: Installation and configuration of Flume and Sqoop, and management of all Flume nodes. This includes configuration of sources, channels, and sinks in Flume agents, as well as recovery of Flume nodes and Flume jobs.

      Installation of the Flume application (Flume 1.5.2) on Linux servers, first in the Test and UAT environments, followed by the Production environment.

      Configuration of Flume sources, channels and sinks, and related properties, for the various Flume agents that connect the Kafka and Hadoop servers for processing streaming and batch jobs.

      Configuration of the system log file for each Flume server and of the log files for each of the Flume agents running on each server.

      Write a monitoring script to alert if any Flume node or Flume agent is not running at any point in time, and schedule it using crontab (see the sketch below).

      Recovery: Analyze the log files of each Flume agent and restart those agents to make sure applications are up and running.

      Perform CMRs (Change Management Requests) to add a new set of Flume agents on the Flume servers for new releases.
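
      A minimal sketch of a Flume agent watchdog, assuming a hypothetical agent name, config path and log location:

        #!/bin/bash
        # Restart the Flume agent if its JVM process is no longer running
        AGENT="agent1"                                  # placeholder agent name
        CONF_DIR="/etc/flume/conf"
        CONF_FILE="$CONF_DIR/$AGENT.conf"

        if ! pgrep -f "flume.node.Application.*$AGENT" > /dev/null; then
            echo "$(date) Flume agent $AGENT is down, restarting" >> /var/log/flume/watchdog.log
            nohup flume-ng agent --conf "$CONF_DIR" --conf-file "$CONF_FILE" --name "$AGENT" \
                >> /var/log/flume/$AGENT.out 2>&1 &
        fi

        # Example crontab entry (every 5 minutes):
        # */5 * * * * /opt/scripts/flume_watchdog.sh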

       

    • KAFKA: Install and deploy Kafka; manage Kafka brokers, Zookeeper, broker recovery and topics. This also involves activities such as topic partitioning, replication and offset management.

      Installation of the Kafka streaming application (Kafka 0.10.0.2) on Linux servers in the Test, UAT and Production environments.

      Configuration of producers and consumers on the Kafka brokers (servers) to pull data events from Flume and push them to the Spark application.

      Create Kafka topics on the Kafka brokers to hold and process the various application data that comes from the Flume servers (see the sketch below).

      Configure the partitions for each Kafka topic on the Kafka brokers.

      Assign the replication factor (i.e., the number of copies of each partition) for each partition to make Kafka fault tolerant.

      Configure Zookeeper on each of the Kafka brokers to maintain the liveness of Kafka.

      Configure offsets to make sure each event is stored in sequential order in the Kafka partitions and to achieve data continuity over the lifecycle of the streaming process.

      Kafka broker recovery: Write an automation script that watches the status of Kafka and Zookeeper and restarts them as soon as they go down.

      Monitor the disk space of all partitions on each Kafka broker.
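
      A minimal sketch of topic creation for a Kafka 0.10-era deployment; the Zookeeper host, topic name, partition count and replication factor are placeholders:

        # Create a topic with explicit partitioning and replication, then inspect its assignment
        kafka-topics.sh --create \
            --zookeeper zk01.example.com:2181 \
            --topic app-events \
            --partitions 6 \
            --replication-factor 3

        kafka-topics.sh --describe --zookeeper zk01.example.com:2181 --topic app-events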

    • SPARK: Spark cluster setup on the YARN cluster manager; monitoring and recovery of Spark applications.

      Perform Spark cluster capacity planning, allocating the right amount of resources to schedule the Spark jobs in the cluster.

      Installation of the Spark application (Spark 1.6.8) on Linux servers in the Test, UAT and Production environments.

      Configuration of various Spark properties to run in YARN cluster mode.

      Schedule batch and streaming jobs on the various Spark servers using crontab (see the sketch below).

      Monitor both the streaming jobs and the batch jobs that run in the Spark cluster and make sure the jobs are running with good performance.

      Perform CMRs (Change Management Requests) to add a new set of Spark jobs to the cluster in the Test, UAT and Production environments.

      Migration of Spark jobs from the Spark cluster in the Dallas data center to the new Spark cluster in the San Antonio data center.

      Expand the existing environment by adding Spark servers to the cluster when more jobs are deployed for the business.

      Create Splunk dashboard reports on the Spark jobs, such as the number of jobs currently running, the number of jobs failing daily, and how much memory and how many cores are currently in use.

      As part of efficient monitoring, create:

      - A bash script that reports performance-related errors and exceptions for each Spark application running every day.

      - A bash script that reports jobs running concurrently in duplicate, for analyzing job performance.

      - A bash script for reporting latency between the Spark servers.
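
      A minimal sketch of scheduling a Spark batch job in YARN cluster mode; the class name, jar path, resource sizing and wrapper script are placeholders:

        # Submit a batch job to YARN with explicit memory, core and retry settings
        spark-submit \
            --master yarn \
            --deploy-mode cluster \
            --class com.example.BatchJob \
            --num-executors 10 \
            --executor-memory 4g \
            --executor-cores 2 \
            --conf spark.yarn.maxAppAttempts=2 \
            /opt/jobs/batch-job.jar

        # Example crontab entry (daily at 02:00):
        # 0 2 * * * /opt/scripts/run_batch_job.sh >> /var/log/spark/batch-job.log 2>&1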

       

    • Upgrade and patching of Big Data applications such as Kafka, Flume and Sqoop.

      Test the compatibility of the latest HDP (Hortonworks Data Platform) version with the current versions of the Spark, Kafka and Flume applications in the Test environment.

      Perform the upgrade of HDP from HDP 2.2 to HDP 2.6 and of Java from JDK 1.6.0 to JDK 1.7.0.

      Perform the upgrade of Ambari 1.7.0 to Ambari 2.5 (see the sketch below).

      Stop the applications/services/agents before the upgrade and restart them afterwards.

      Health check of the applications after the upgrade activity in the Production environment.
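
      A minimal sketch of the Ambari upgrade flow on RHEL, assuming the new Ambari repository has already been configured:

        # Upgrade the Ambari server, its database schema, and the agents on each managed node
        ambari-server stop
        yum clean all
        yum upgrade -y ambari-server
        ambari-server upgrade          # upgrades the Ambari database schema
        ambari-server start

        # On each managed node:
        yum upgrade -y ambari-agent
        ambari-agent restart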

    • Resolve customer-facing issues that have been escalated from our Production Support team.

      Increase the disk space on the Hadoop/Flume/Kafka/Spark servers when the Big Data servers are running out of space.

      Balance the Kafka cluster by adjusting both the partition sizing and the replication factor on each Kafka broker (see the sketch below).

      Analyze the Spark cluster performance when the support team complains of slow-running jobs.

      Implement backfill activities for Spark jobs if any data was not processed on previous dates.

      Work with the Cassandra database team to check and fix slow data read and write operations between Spark and Cassandra.

      Troubleshoot various bottlenecks and network issues in the Hadoop and Spark clusters.
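
      A minimal sketch of rebalancing topic partitions across brokers with the stock Kafka tool; the Zookeeper host, file names and broker ids are placeholders:

        # Generate a proposed assignment for the listed topics across brokers 1-4
        # topics.json example content: {"topics":[{"topic":"app-events"}],"version":1}
        kafka-reassign-partitions.sh --zookeeper zk01.example.com:2181 \
            --topics-to-move-json-file topics.json \
            --broker-list "1,2,3,4" --generate

        # Save the proposed assignment it prints into reassignment.json, then apply and verify it
        kafka-reassign-partitions.sh --zookeeper zk01.example.com:2181 \
            --reassignment-json-file reassignment.json --execute
        kafka-reassign-partitions.sh --zookeeper zk01.example.com:2181 \
            --reassignment-json-file reassignment.json --verify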

    • Contribute to the design, development, and execution of application changes.

      Perform research and evaluation of proposed technologies and present the ideas concisely so the architecture works in an efficient way.

      Plan to replace Flume with Apache NiFi to make the data ingestion platform fault tolerant.

      Modify/adjust spark-submit parameters such as memory, cores and maximum retries so that each application runs efficiently.

  • Education Requirements

    • Bachelor’s degree in computer science, computer information systems, information technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor’s degree in one of the aforementioned subjects.