Big Data Analytics
Apache Hadoop:
- Design distributed systems that manage big data
- Use HDFS and MapReduce to store and analyse data at scale
- Write Apache Spark scripts to process data on a Hadoop cluster in more complex ways
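The MapReduce model behind Hadoop jobs can be illustrated with a minimal pure-Python sketch (no Hadoop required, toy data): a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "data at scale"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"], counts["data"])  # prints: 2 2
```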
Hive:
- Query and manage large datasets that reside in HDFS
- Create Hive data warehouses and databases, and load different file formats into them
- Write SQL-like queries in Hive's native language, Hive Query Language (HQL)
- Query complex structures (arrays, maps, structs) to manipulate data
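As an illustration of the HQL involved, the small Python helper below renders the kind of DDL and query statements used; the table name, columns, and HDFS path are hypothetical examples.

```python
def external_table_ddl(table, location, fmt="PARQUET"):
    # Render HQL that maps an external Hive table onto files in HDFS
    # (table name, columns, and path are hypothetical examples)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} "
        f"(id INT, items ARRAY<STRING>) "
        f"STORED AS {fmt} LOCATION '{location}'"
    )

ddl = external_table_ddl("orders", "/user/hive/warehouse/orders")

# Querying a complex (array) column by flattening it with LATERAL VIEW explode
query = ("SELECT id, item FROM orders "
         "LATERAL VIEW explode(items) t AS item")

print(ddl)
```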
Apache Spark:
- Load distributed datasets from a variety of storage systems
- Work with RDDs and perform transformations and actions
- Query and manipulate DataFrames and Datasets
- Build applications with sbt (the Scala build tool)
- Create stream-processing applications
- Build machine learning models with MLlib
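The distinction between transformations and actions can be sketched without a cluster. The toy class below is a hypothetical local stand-in for an RDD, not Spark's API: transformations are recorded lazily, and only an action such as `collect` triggers computation.

```python
class ToyRDD:
    """Local stand-in for a Spark RDD (hypothetical class, not Spark's API):
    transformations are lazy; actions run the recorded pipeline."""
    def __init__(self, data, ops=()):
        self._data, self._ops = data, ops

    def map(self, fn):          # transformation: just record the step
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, fn):       # transformation: just record the step
        return ToyRDD(self._data, self._ops + (("filter", fn),))

    def collect(self):          # action: execute all recorded steps
        items = self._data
        for kind, fn in self._ops:
            items = (map if kind == "map" else filter)(fn, items)
        return list(items)

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # prints: [0, 4, 16, 36, 64]
```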
Apache Kafka:
- Build real-time streaming pipelines to move data between systems
- Integrate Kafka with Spark Structured Streaming
- Write Spark applications that consume data from Kafka brokers
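Kafka's core abstraction, an append-only log that each consumer group reads from its own committed offset, can be modeled in a few lines of plain Python (a toy in-memory sketch, not the Kafka client API):

```python
class ToyTopic:
    """In-memory stand-in for a Kafka topic (toy model, not the real API):
    producers append to a log; each consumer group tracks its own offset."""
    def __init__(self):
        self.log = []       # the append-only message log
        self.offsets = {}   # consumer group -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)  # commit the new offset
        return batch

topic = ToyTopic()
for event in ("click", "view", "click"):
    topic.produce(event)

print(topic.consume("spark-app"))  # prints: ['click', 'view', 'click']
print(topic.consume("spark-app"))  # prints: [] (offset already committed)
```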
Power BI:
- Clean, stage, and integrate data sources with Power BI
- Abstract data complexities and provide users with intuitive, self-service BI capabilities
- Build business logic and client analysis solutions with the DAX language
Data analysis with pandas:
- Analyse and manipulate tabular data
- Group, merge, and aggregate DataFrames
- Fix missing and invalid values in data
- Query, filter, and sort DataFrames
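These pandas operations fit in a short sketch; the column names and values below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sales data with a missing value
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, None, 5, 7],
})
lookup = pd.DataFrame({"region": ["north", "south"],
                       "manager": ["Ann", "Bo"]})

sales["units"] = sales["units"].fillna(0)         # fix missing values
filtered = sales[sales["units"] > 0]              # filter rows
merged = filtered.merge(lookup, on="region")      # merge two DataFrames
totals = merged.groupby("region")["units"].sum()  # group and aggregate
print(totals.to_dict())  # prints: {'north': 15.0, 'south': 7.0}
```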
Deep learning with PyTorch:
- Explore and prepare datasets
- Work with multiple data types and choose the best algorithms for the model
- Build models using multiple algorithms
- Train and evaluate models, and make predictions
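A minimal version of that train/evaluate/predict loop, on a hypothetical toy dataset (y = 2x) with a single linear layer:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy dataset: learn y = 2x
X = torch.linspace(0, 1, 20).unsqueeze(1)
y = 2 * X

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(model(X), y).item()  # evaluate before training
for _ in range(500):                        # training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(X), y).item()    # evaluate after training

with torch.no_grad():                       # make a prediction
    pred = model(torch.tensor([[0.5]])).item()
print(final_loss < initial_loss)  # prints: True
```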
SSDT (SQL Server Data Tools), SSIS, SSAS:
- Build data marts, OLAP cubes, and tabular models
- Load and cleanse data with SQL Server Integration Services
- Manipulate and analyse data using MDX and DAX queries
- Create KPIs and digital dashboards
- Implement time-based analysis