BIG DATA SOLUTIONS ENGINEER (JAVA, PYTHON, SCALA, HBASE, HIVE, MAPREDUCE, ETL, KAFKA, MONGO, POSTGRES)
Experience: 5 to 7 Years
Industry: Education / Training
Functional Area: Production / Service Engineering / Manufacturing / Maintenance
JOB DESCRIPTION
We are looking for a highly experienced Java/Hadoop Big Data Engineer with experience working with large-scale, distributed data pipelines. Responsibilities span the full data engineering lifecycle: architecture and design, data analysis, software development, QA, capacity planning, and management of the analytics environment as a whole.
JOB RESPONSIBILITIES
* Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time (see the sketch after this list).
* Collaborate with other teams to design, develop, and deploy data tools that support both operations and product use cases.
* Perform offline analysis of large data sets using components from the Hadoop ecosystem.
* Evaluate and advise on technical aspects of open work requests in the product backlog with the project lead.
* Own product features from the development phase through to production deployment.
* Evaluate big data technologies and prototype solutions to improve our data processing architecture.
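As a rough illustration of the real-time ingestion work described above, here is a minimal sketch of a Kafka consumer in Java. The topic name, broker address, and consumer group are hypothetical placeholders chosen for the example; they are not details of this role or of any specific system.

// Minimal sketch of a real-time ingestion consumer. The broker address,
// topic name, and group id below are illustrative placeholders only.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventIngestor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "pipeline-demo");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // Poll the topic and hand each record to downstream processing.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                                      record.offset(), record.key(), record.value());
                }
            }
        }
    }
}

In a production pipeline, each record would feed a stream processor or a data store rather than being printed.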
CANDIDATE PROFILE
* BE in Computer Science or a related area
* 5-7 years' software development experience
* Minimum 2 years' experience on a Big Data platform
* Proficiency with Java, Python, Scala, HBase, Hive, MapReduce, ETL, Kafka, MongoDB, PostgreSQL, visualization technologies, etc.
* Flair for data, schemas, and data modeling, and for bringing efficiency to the big data lifecycle
* Understanding of automated QA needs related to Big Data
* Understanding of various visualization platforms (D3.js and others)
* Proficiency with agile or lean development practices
* Strong object-oriented design and analysis skills
* Excellent technical and organizational skills
* Excellent written and verbal communication skills
SKILLS & REQUIREMENTS
Top skill sets / technologies for the ideal candidate:
* Programming languages -- Java (must), Python, Scala, Ruby, Node.js
* Batch processing -- Hadoop MapReduce, Cascading/Scalding, Apache Spark
* Stream processing -- Apache Storm, Akka, Spark Streaming
* NoSQL -- HBase, MongoDB, Cassandra, Riak
* ETL tools -- DataStage, Informatica
* Code/Build/Deployment -- git, hg, svn, maven, sbt, jenkins, bamboo
Technologies that we use include:
• R
• Java
• Hadoop/MapReduce
• Flume
• Storm
• Kafka
• MemSQL
• Pig
• Hive
• ETL/ELT