Scala and PySpark specialization certification courses started

 

Scala & Spark Specialization

 

Data science is a promising field where you have to continuously update your skill set by learning new techniques, algorithms, and tools. As the learning journey never ends, we are always looking for the best resources to pick up these new skills. We should be thankful for great MOOC providers like Coursera, edX, and Udacity, whose main intention is to provide high-quality content that explains the core concepts in a standardized way and walks learners step by step toward mastering those skills.

In this post, we are going to share two well-known data science specialization certifications, offered by edX and Coursera.

  1. Data Science and Engineering with Apache Spark
  2. Functional Programming in Scala

Each of these specializations is a series of courses that starts from the basics and builds up to an advanced level. It generally takes around 5 to 6 months to work through a specialization completely. All the course videos and reference materials are free of cost, but if you intend to earn the specialization certificate, it will cost you a decent amount of dollars.

Data Science and Engineering with Apache Spark

Image Credit: edx.org

The Data Science and Engineering with Spark XSeries, created in partnership with Databricks, will teach students how to perform data science and data engineering at scale using Spark, a cluster computing system well-suited for large-scale machine learning tasks. It will also present an integrated view of data processing by highlighting the various components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. Internal details of Spark and distributed machine learning algorithms will be covered, which will provide students with intuition about working with big data and developing code for a distributed environment.

This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra, and calculus are prerequisites for two of the courses in this series.

1. Introduction to Apache Spark

About this course:

Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

This statistics and data analysis course will teach you the basics of working with Spark and will provide you with the necessary foundation for diving deeper into Spark. You’ll learn about Spark’s architecture and programming model, including commonly used APIs. After completing this course, you’ll be able to write and debug basic Spark applications. This course will also explain how to use Spark’s web user interface (UI), how to recognize common coding errors, and how to proactively prevent errors. The focus of this course will be Spark Core and Spark SQL.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.

What you’ll learn:

  • Basic Spark architecture
  • Common operations
  • How to avoid coding mistakes
  • How to debug your Spark program
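
To give a feel for the kind of program this first course works toward, here is a minimal sketch in Scala (the actual exercises use PySpark, and the data below is invented for illustration) touching Spark Core and Spark SQL, the two components the course focuses on:

```scala
import org.apache.spark.sql.SparkSession

object IntroSparkSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkSession; in the course labs the notebook environment provides this for you.
    val spark = SparkSession.builder()
      .appName("intro-spark-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Spark Core: a word count over a small in-memory collection of lines.
    val lines = sc.parallelize(Seq("spark is fast", "spark is concise"))
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // Spark SQL: the same data as a DataFrame, queried declaratively.
    import spark.implicits._
    val df = counts.toDF("word", "cnt")
    df.createOrReplaceTempView("word_counts")
    spark.sql("SELECT word, cnt FROM word_counts WHERE cnt > 1").show()

    spark.stop()
  }
}
```

The PySpark version of the same program is almost line-for-line identical, which is why experience with one API carries over to the other.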

2. Distributed Machine Learning with Apache Spark

About this course:

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

What you’ll learn:

  • The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
  • Exploratory data analysis, feature extraction, supervised learning, and model evaluation
  • Application of these principles using Spark
  • How to implement distributed algorithms for fundamental statistical models.
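
As a rough illustration of the spark.ml pipeline style this course describes, here is a sketch in Scala with an invented two-feature dataset; a real pipeline would add a train/test split and proper evaluation:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object SparkMlPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ml-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny invented dataset: two numeric features and a binary label.
    val training = Seq(
      (1.0, 0.2, 3.1),
      (0.0, 1.5, 0.4),
      (1.0, 0.1, 2.8),
      (0.0, 1.9, 0.2)
    ).toDF("label", "f1", "f2")

    // Assemble raw columns into the single feature vector spark.ml expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

    // Chain feature extraction and the learner into one pipeline, then fit it.
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
    model.transform(training).select("label", "probability", "prediction").show()

    spark.stop()
  }
}
```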

3. Big Data Analysis with Apache Spark

About this course:

Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.

This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to deliver against these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), and previous experience with Spark equivalent to Introduction to Apache Spark, is required.

What you’ll learn:

  • How to use Apache Spark to perform data analysis
  • How to use parallel programming to explore data sets
  • Apply log mining, textual entity recognition and collaborative filtering techniques to real-world data questions.
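
The log-mining assignments mentioned above essentially come down to filtering and aggregating a large text dataset in parallel. A minimal Scala analogue (the course itself uses PySpark, and the log lines below are invented) could look like this:

```scala
import org.apache.spark.sql.SparkSession

object LogMiningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-mining").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Invented log lines; in the assignments these come from a large file via sc.textFile(...).
    val logs = sc.parallelize(Seq(
      "INFO starting job", "ERROR disk full", "INFO job done", "ERROR timeout"
    ))

    // Keep only error lines and cache them, since the filtered RDD is reused by two actions.
    val errors = logs.filter(_.startsWith("ERROR")).cache()

    println(s"error count: ${errors.count()}")
    errors.take(10).foreach(println)

    spark.stop()
  }
}
```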

4. Advanced Apache Spark for Data Science and Data Engineering

About this course:

Gain a deeper understanding of Spark by learning about its APIs, architecture, and common use cases.  This statistics and data analysis course will cover material relevant to both data engineers and data scientists.  You’ll learn how Spark efficiently transfers data across the network via its shuffle, details of memory management, optimizations to reduce the compute costs, and more.  Learners will see several use cases for Spark and will work to solve a variety of real-world problems using public datasets.  After taking this course, you should have a thorough understanding of how Spark works and how you can best utilize its APIs to write efficient, scalable code.  You’ll also learn about a wide variety of Spark’s APIs, including the APIs in Spark Streaming.

What you’ll learn:

  • Common use cases for Spark
  • Details of internals like the shuffle, Spark SQL’s Catalyst Optimizer, and Project Tungsten
  • A deep architectural overview
  • Spark Streaming
  • Spark ML
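
One concrete way to peek at the internals listed above (Catalyst's optimized plans, Tungsten's physical operators, and the shuffle introduced by wide operations) is simply to ask Spark to explain a query. A small illustrative sketch in Scala, not taken from the course material:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object InternalsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("internals-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val sales = Seq(("books", 12.0), ("games", 30.0), ("books", 7.5)).toDF("category", "amount")

    // A groupBy aggregation forces a shuffle (exchange) between stages.
    val totals = sales.groupBy("category").agg(sum("amount").as("total"))

    // explain(true) prints the parsed, analyzed, optimized (Catalyst) and physical plans,
    // where the exchange/shuffle and whole-stage code generation show up.
    totals.explain(true)
    totals.show()

    spark.stop()
  }
}
```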

5. Advanced Distributed Machine Learning with Apache Spark

About this course:

Building on the core ideas presented in Distributed Machine Learning with Spark, this course covers advanced topics for training and deploying large-scale learning pipelines. You will study state-of-the-art distributed algorithms for collaborative filtering, ensemble methods (e.g., random forests), clustering and topic modeling, with a focus on model parallelism and the crucial tradeoffs between computation and communication.

After completing this course, you will have a thorough understanding of the statistical and algorithmic principles required to develop and deploy distributed machine learning pipelines. You will further have the expertise to write efficient and scalable code in Spark, using MLlib and the spark.ml package in particular.

What you’ll learn:

  • Training and deploying large-scale learning pipelines for various supervised and unsupervised settings
  • Model parallelism and tradeoffs between computation and communication in distributed settings
  • Collaborative filtering, decision trees, random forests, clustering, topic modeling, hyperparameter tuning
  • Application of these principles using Spark, focusing on the spark.ml package.
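
Collaborative filtering is one of the topics listed above, and spark.ml ships a distributed ALS implementation for it. The following is only a rough sketch with an invented ratings table, not the course's own code:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("als-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Invented (user, item, rating) triples; real data would come from a large ratings file.
    val ratings = Seq(
      (1, 10, 4.0f), (1, 20, 2.0f),
      (2, 10, 5.0f), (2, 30, 3.5f),
      (3, 20, 1.0f), (3, 30, 4.5f)
    ).toDF("userId", "itemId", "rating")

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(5)      // size of the latent factor vectors
      .setMaxIter(5)
      .setRegParam(0.1)

    val model = als.fit(ratings)
    // Recommend two items for every user seen during training.
    model.recommendForAllUsers(2).show(truncate = false)

    spark.stop()
  }
}
```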

Functional Programming in Scala

Image Credit: coursera.org

This Specialization provides a hands-on introduction to functional programming using the widespread programming language, Scala. It begins with the basic building blocks of the functional paradigm, first showing how to use these blocks to solve small problems, before building up to combining these concepts to architect larger functional programs. You’ll see how the functional paradigm facilitates parallel and distributed programming, and through a series of hands-on examples and programming assignments, you’ll learn how to analyze data sets small and large, from parallel programming on multicore architectures to distributed programming on a cluster using Apache Spark. A final capstone project will allow you to apply the skills you learned by building a large data-intensive application using real-world data.

1. Functional Programming Principles in Scala

About this course:

Functional programming is becoming increasingly widespread in the industry. This trend is driven by the adoption of Scala as the main programming language for many applications. Scala fuses functional and object-oriented programming in a practical package. It interoperates seamlessly with both Java and Javascript. Scala is the implementation language of many important frameworks, including Apache Spark, Kafka, and Akka. It provides the core infrastructure for sites such as Twitter, Tumblr and also Coursera.

In this course you will discover the elements of the functional programming style and learn how to apply them usefully in your daily programming tasks. You will also develop a solid foundation for reasoning about functional programs, by touching upon proofs of invariants and the tracing of execution symbolically. The course is hands-on; most units introduce short programs that serve as illustrations of important concepts and invite you to play with them, modifying and improving them. The course is complemented by a series of programming projects as homework assignments.

Learning Outcomes:

  • understand the principles of functional programming,
  • write purely functional programs, using recursion, pattern matching, and higher-order functions,
  • combine functional programming with objects and classes,
  • design immutable data structures,
  • reason about properties of functions,
  • understand generic types for functional programs
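
To make those outcomes concrete, here is a tiny self-contained Scala sketch (mine, not from the course assignments) that touches recursion, pattern matching, higher-order functions, and an immutable data structure:

```scala
object FpPrinciplesSketch {
  // An immutable algebraic data type for simple arithmetic expressions.
  sealed trait Expr
  case class Num(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // Recursion plus pattern matching to evaluate an expression tree.
  def eval(e: Expr): Int = e match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
  }

  // A higher-order function: returns a function that applies `f` twice.
  def twice(f: Int => Int): Int => Int = x => f(f(x))

  def main(args: Array[String]): Unit = {
    println(eval(Add(Num(1), Add(Num(2), Num(3)))))  // 6
    println(twice(_ + 10)(1))                        // 21
    println(List(1, 2, 3).map(twice(_ * 2)))         // List(4, 8, 12)
  }
}
```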

2. Functional Program Design in Scala

About this course:

In this course you will learn how to apply the functional programming style in the design of larger applications. You’ll get to know important new functional programming concepts, from lazy evaluation to structuring your libraries using monads. We’ll work on larger and more involved examples, from state space exploration to random testing to discrete circuit simulators. You’ll also learn some best practices on how to write good Scala code in the real world.

Several parts of this course deal with the question of how functional programming interacts with mutable state. We will explore the consequences of combining functions and state. We will also look at purely functional alternatives to mutable state, using infinite data structures or functional reactive programming.

Learning Outcomes:

  • recognize and apply design principles of functional programs,
  • design functional libraries and their APIs,
  • competently combine functions and state in one program,
  • understand reasoning techniques for programs that combine functions and state,
  • write simple functional reactive applications.
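
As a small taste of the lazy evaluation ideas mentioned above, here is an illustrative sketch of my own (it uses LazyList, available from Scala 2.13; on older versions Stream plays the same role):

```scala
object LazyEvalSketch {
  // An infinite, lazily evaluated sequence of Fibonacci numbers.
  // Elements are only computed when something downstream demands them.
  lazy val fibs: LazyList[BigInt] =
    BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }

  def main(args: Array[String]): Unit = {
    // Nothing past the first ten elements is ever computed.
    println(fibs.take(10).toList)  // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

    // Lazy vals defer (and cache) an expensive computation until first use.
    lazy val expensive: Int = { println("computing..."); 42 }
    println("before access")
    println(expensive)             // "computing..." is printed only here
  }
}
```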

3. Parallel programming

About the Course:
With every smartphone and computer now boasting multiple processors, the use of functional ideas to facilitate parallel programming is becoming increasingly widespread. In this course, you’ll learn the fundamentals of parallel programming, from task parallelism to data parallelism. In particular, you’ll see how many familiar ideas from functional programming map perfectly to the data parallel paradigm. We’ll start with the nuts and bolts of how to effectively parallelize familiar collections operations, and we’ll build up to parallel collections, a production-ready data parallel collections library available in the Scala standard library. Throughout, we’ll apply these concepts through several hands-on examples that analyze real-world data, using popular algorithms like k-means clustering.
Learning Outcomes:
  • reason about task and data parallel programs,
  • express common algorithms in a functional style and solve them in parallel,
  • competently microbenchmark parallel code,
  • write programs that effectively use parallel collections to achieve performance
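
A minimal sketch of the data-parallel collections the course builds toward is shown below; note that on Scala 2.13+ the parallel collections live in the separate scala-parallel-collections module, while on 2.12 the .par conversion is built in:

```scala
// On Scala 2.13+ this import requires the scala-parallel-collections dependency;
// on Scala 2.12 the .par conversion is available without it.
import scala.collection.parallel.CollectionConverters._

object ParallelSketch {
  // A deliberately CPU-bound predicate so the parallel speedup is observable.
  def isPrime(n: Int): Boolean =
    n > 1 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

  def main(args: Array[String]): Unit = {
    val numbers = (2 to 200000).toVector

    // Sequential count of primes.
    val seqCount = numbers.count(isPrime)

    // The same computation on a parallel collection: .par splits the work
    // across a task pool and combines the partial counts.
    val parCount = numbers.par.count(isPrime)

    println(s"sequential = $seqCount, parallel = $parCount")  // same result
  }
}
```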

4. Big Data Analysis with Scala and Spark:

About this course:

Manipulating big data distributed over a cluster using functional concepts is rampant in the industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we’ll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We’ll cover Spark’s programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we’ll learn when important issues related to distribution, such as latency and network communication, should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes:

  • read data from persistent storage and load it into Apache Spark,
  • manipulate data with Spark and Scala,
  • express algorithms for data analysis in a functional style,
  • recognize how to avoid shuffles and recomputation in Spark
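
The last outcome above, avoiding shuffles and recomputation, usually comes down to choices such as preferring reduceByKey over groupByKey and caching results that are reused. A rough sketch in Scala with toy data:

```scala
import org.apache.spark.sql.SparkSession

object ShuffleAwareSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shuffle-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

    // reduceByKey combines values on each partition before shuffling,
    // so far less data crosses the network. Cache it because it is reused below.
    val viaReduce = pairs.reduceByKey(_ + _).cache()

    println(viaReduce.count())            // first action computes and caches the result
    viaReduce.collect().foreach(println)  // second action reuses the cached partitions

    // For contrast, groupByKey ships every individual value across the network before summing.
    val viaGroup = pairs.groupByKey().mapValues(_.sum)
    println(viaGroup.collect().toMap)     // same totals, but with a heavier shuffle

    spark.stop()
  }
}
```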

5. Functional Programming in Scala Capstone

About this course:

In the final capstone project you will apply the skills you learned by building a large data-intensive application using real-world data.

 

Follow us:

FACEBOOK | QUORA | TWITTER | REDDIT | FLIPBOARD | LINKEDIN | MEDIUM | GITHUB

 

I hope you liked today’s post. If you have any questions then feel free to comment below.  If you want me to write on one specific topic then do tell it to me in the comments below.

If you want to share your experience or opinions, say hello to hello@dataaspirant.com.

THANKS FOR READING…..


Dataaspirant April 2016 Newsletter



Blog Posts:

1. Step by step kaggle competition tutorial:

In this article we are going to see how to go through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest. The goal is to classify a crime occurrence knowing the time and place it happened.

Read Complete Post:  datanice Blog

2. Introduction to machine learning: 

It is an attempt to make things more intelligent. Most of us have come across terms like “Artificial Neural Networks”, it is an attempt to replicate the working of the human brain. Even something like this is not necessarily always complex. At its heart, it is just multiplication and differentiation. Yes, Maths at it again but it’s rather what you learned at school, no different (This coming from a guy who is petrified of maths)

Read Complete Post: medium.com/@mukulmalik

3. Baidu research chief Andrew Ng fixed on self-taught computers, self-driving cars:

Artificial-intelligence whiz Andrew Ng hangs his hat these days at a nondescript building in Sunnyvale that serves as the Silicon Valley outpost of the Chinese search giant Baidu.

Read Complete Post: Seattle times

4. Misleading modelling: overfitting, cross-validation, and the bias-variance trade-off:

In this post you will get to grips with what is perhaps the most essential concept in machine learning: the bias-variance trade-off. The main idea here is that you want to create models that are as good at prediction as possible but that are still applicable to new data (i.e. they are generalizable). The danger is that you can easily create models that overfit to the local noise in your specific dataset, which isn’t too helpful and leads to poor generalizability since the noise is random and therefore different in each dataset. Essentially, you want to create models that capture only the useful components of a dataset. On the other hand, models that generalize very well but are too inflexible to generate good predictions are the other extreme you want to avoid (this is called underfitting).

Read Complete Post: Cambridge coding

5. Association rules and the apriori algorithm:

When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one’s needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways.

Read Complete Post: Annalyzin

6. Churn prediction with PySpark using MLlib and ML packages:

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals.

The prediction process is heavily data driven and often utilizes advanced machine learning techniques. In this post, we’ll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models – all with PySpark and its machine learning frameworks. We’ll also discuss the differences between two Apache Spark version 1.6.0 frameworks, MLlib and ML.

Read Complete Post: mapr

7. Data scientist keeps ranking at the top of every best jobs list:

Data scientist is at, or near, the top of just about every “best jobs” survey, report, or study released in the past few years. Harvard Business Review named it the sexiest job of the 21st century. And with a median base salary of $96,000, data scientist and some engineering specialties are in a very small group of high-paying jobs that don’t require a medical or law degree. However, you’ll likely need more than a bachelor’s degree, as you’ll find out later in this article.

Read Complete Post: goodcall

8. Understand your machine learning data with descriptive statistics in Python:

Looking at the raw data can reveal insights that you cannot get any other way. It can also plant seeds that may later grow into ideas on how to better preprocess and handle the data for machine learning tasks.

Read Complete Post: machinelearningmastery

9. Time series interventions and contribution:

This article illustrates principles of an analysis of the President George W. Bush’s job approval from January 2001 through Sep 2004 with disposable income excluded from the statistical model. To see a version complete with code and its description, visit bicorner.com.

Presidents with a job approval rating of less than 50 percent are unlikely to be re-elected. During June, Bush’s job approval rating averaged 47 percent in five major polls.

Read Complete Post: gladwinanalytics

10. Deep neural networks and creative deep learning art:

Are deep neural networks creative? It seems like a reasonable question. Google’s “Inceptionism” technique transforms images, iteratively modifying them to enhance the activation of specific neurons in a deep net. The images appear trippy, transforming rocks into buildings or leaves into insects. Another neural generative model, introduced by Leon Gatys of the University of Tubingen in Germany, can extract the style from one image (say a painting by Van Gogh), and apply it to the content of another image (say a photograph).

Read Complete Post: kdnuggets

Video Courses:

  1. Introduction to python for data science
  2. Data exploration with kaggle scripts

That’s all for the April 2016 newsletter. Please leave your suggestions about the newsletter in the comments section. To get all dataaspirant newsletters, you can visit the monthly newsletter page.

Follow us:

FACEBOOK | QUORA | TWITTER | REDDIT | FLIPBOARD | LINKEDIN | MEDIUM | GITHUB


Four Coursera data science Specializations start this month



Starting is the biggest step to achieving your dreams. This is 200% true for people who want to learn data science. The very first question that comes to mind for data science beginners is where to start. If you are trying to find the answer to this question, you are on the right track: you can find your answer in Coursera Specializations.

What Coursera Specializations will offer:

Coursera Data science Specializations and courses teach the fundamentals of interpreting data, performing analyses, and understanding and communicating actionable insights. Topics of study for beginning and advanced learners include qualitative and quantitative data analysis, tools and methods for data manipulation, and machine learning algorithms.

 

Big Data Specialization


About This Specialization:

In this Specialization, you will develop a robust set of skills that will allow you to process, analyze, and extract meaningful information from large amounts of complex data. You will install and configure Hadoop with MapReduce, use Spark, Pig and Hive, perform predictive modelling with open source tools, and leverage graph analytics to model problems and perform scalable analytical tasks. In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you learned by building your own tools and models to analyze big data in the context of retail, sports, current events, or another area of your choice.

COURSE 1:  Introduction to Big Data

Course Started on: Oct 26, Ends on: Nov 23

About the Course:

What’s the “hype” surrounding the Big Data phenomenon? Who are these mysterious data scientists everyone is talking about? What kinds of problem-solving skills and knowledge should they have? What kinds of problems can be solved by Big Data technology? After this short introductory course you will have answers to all these questions. Additionally, you will start to become proficient with the key technical terms and big data tools and applications to prepare you for a deep dive into the rest of the courses in the Big Data specialization. Each day, our society creates 2.5 quintillion bytes of data (that’s 2.5 followed by 18 zeros). With this flood of data the need to unlock actionable value becomes more acute, rapidly increasing demand for Big Data skills and qualified data scientists.
Hands-On Assignment Hardware and Software Requirements

Software: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+; VirtualBox 5+, VMWare Workstation 9+ or VMWare Fusion 7+

Hardware: Quad-core processor (VT-x or AMD-V support recommended), 8 GB RAM, 20 GB free disk space

COURSE 2:  Hadoop Platform and Application Framework

Course Started on: Oct 20, Ends on: Nov 30

About the Course:

Are you looking for hands-on experience processing big data? After completing this course, you will be able to install, configure and implement an Apache Hadoop stack ranging from basic “Big Data” components to MapReduce and Spark execution frameworks. Moreover, in the exercises for this course you will solve fundamental problems that would require more computing power than a single computer. You will apply the most important Hadoop concepts in your solutions and use distributed/parallel processing in the Hadoop application framework. Get ready to be empowered to manipulate and analyze the significance of big data!

Course Link:  Hadoop Platform and Application Framework

COURSE 3: Introduction to Big Data Analytics

Course Starts : November 2015

About the Course:

Do you have specific business questions you want answered? Need to learn how to interpret results through analytics? This course will help you answer these questions by introducing you to HBase, Pig and Hive. In this course, you will take a real Twitter data set, clean it, bring it into an analytics engine, and create summary charts and drill-down dashboards. After completing this course, you will be able to utilize BigTable, distributed data store, columnar data, noSQL, and more!

Course Link: Introduction to Big Data Analytics

 

COURSE 4: Machine Learning With Big Data

Course Starts : December 2015

About the Course:

Want to learn the basics of large-scale data processing? Need to make predictive models but don’t know the right tools? This course will introduce you to open source tools you can use for parallel, distributed and scalable machine learning. After completing this course’s hands-on projects with MapReduce, KNIME and Spark, you will be able to train, evaluate, and validate basic predictive models. By the end of this course, you will be building a Big Data platform and utilizing several different tools and techniques.

Course Link:  Machine Learning With Big Data

 

COURSE 5:  Introduction to Graph Analytics

Course Starts : January 2016

About the Course:

Want to understand your data network structure and how it changes under different conditions? Curious to know how to identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model, store, retrieve and analyze graph-structured data. After completing this course, you will be able to model a problem into a graph database and perform analytical tasks over the graph in a scalable manner. Better yet, you will be able to apply these techniques to understand the significance of your data sets for your own projects.

Course Link:  Introduction to Graph Analytics

 

Machine Learning Specialization


About This Specialization:

This Specialization provides a case-based introduction to the exciting, high-demand field of machine learning. You’ll learn to analyze large and complex datasets, build applications that can make predictions from data, and create systems that adapt and improve over time. In the final Capstone Project, you’ll apply your skills to solve an original, real-world problem through implementation of machine learning algorithms.

COURSE 1: Machine Learning Foundations: A Case Study Approach

Course Started on: Oct 26, Ends on: Dec 14

About the Course:
Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices based on house-level features, analyze sentiment from user reviews, retrieve documents of interest, recommend products, and search for images. Through hands-on practice with these use cases, you will be able to apply machine learning methods in a wide range of domains. This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms. Together, these pieces form the machine learning pipeline, which you will use in developing intelligent applications.
Learning Outcomes: By the end of this course, you will be able to:
-Identify potential applications of machine learning in practice.
-Describe the core differences in analyses enabled by regression, classification, and clustering.
-Select the appropriate machine learning task for a potential application.
-Apply regression, classification, clustering, retrieval, recommender systems, and deep learning.
-Represent your data as features to serve as input to machine learning models.
-Assess the model quality in terms of relevant error metrics for each task.
-Utilize a dataset to fit a model to analyze new data.
-Build an end-to-end application that uses machine learning at its core.
-Implement these techniques in Python.

COURSE 2: Regression

Starts November 2015
About the Course:
Case Study: Predicting Housing Prices

In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,…). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression. In this course, you will explore regularized linear regression models for the task of prediction and feature selection. You will be able to handle very large sets of features and select between models of various complexity. You will also analyze the impact of aspects of your data — such as outliers — on your selected models and predictions. To fit these models, you will implement optimization algorithms that scale to large datasets.
Learning Outcomes: By the end of this course, you will be able to:
-Describe the input and output of a regression model.
-Compare and contrast bias and variance when modeling data.
-Estimate model parameters using optimization algorithms.
-Tune parameters with cross validation.
-Analyze the performance of the model.
-Describe the notion of sparsity and how LASSO leads to sparse solutions.
-Deploy methods to select between models.
-Exploit the model to form predictions.
-Build a regression model to predict prices using a housing dataset.
-Implement these techniques in Python.
Course Link: Regression

COURSE 3: Classification

Starts December 2015
About the Course:
Case Study: Analyzing Sentiment

In our second case study, analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,…). This task is an example of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification. In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with some of the most successful techniques, including logistic regression, boosted decision trees and kernelized support vector machines. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale. You will implement these techniques on real-world, large-scale machine learning tasks.
Learning Objectives: By the end of this course, you will be able to:
-Describe the input and output of a classification model.
-Tackle both binary and multiclass classification problems.
-Implement a logistic regression model for large-scale classification.
-Create a non-linear model using decision trees.
-Improve the performance of any model using boosting.
-Construct non-linear features using kernels.
-Describe the underlying decision boundaries.
-Build a classification model to predict sentiment in a product review dataset.
-Implement these techniques in Python.
Course Link: Classification

COURSE 4: Clustering & Retrieval

Starts February 2016
About the Course:
Case Studies: Finding Similar Documents

A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover? In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce.
Learning Outcomes: By the end of this course, you will be able to:
-Create a document retrieval system using k-nearest neighbors.
-Describe how k-nearest neighbors can also be used for regression and classification.
-Identify various similarity metrics for text data.
-Cluster documents by topic using k-means.
-Perform mixed membership modeling using latent Dirichlet allocation (LDA).
-Describe how to parallelize k-means using MapReduce.
-Examine mixtures of Gaussians for density estimation.
-Fit a mixture of Gaussian model using expectation maximization (EM).
-Compare and contrast initialization techniques for non-convex optimization objectives.
-Implement these techniques in Python.

COURSE 5:  Recommender Systems & Dimensionality Reduction

Starts March 2016
About the Course:
Case Study: Recommending Products

How does Amazon recommend products you might be interested in purchasing? How does Netflix decide which movies or TV shows you might want to watch? What if you are a new user, should Netflix just recommend the most popular movies? Who might you form a new link with on Facebook or LinkedIn? These questions are endemic to most service-based industries, and underlie the notion of collaborative filtering and the recommender systems deployed to solve these problems. In this fourth case study, you will explore these ideas in the context of recommending products based on customer reviews. In this course, you will explore dimensionality reduction techniques for modeling high-dimensional data. In the case of recommender systems, your data is represented as user-product relationships, with potentially millions of users and hundreds of thousands of products. You will implement matrix factorization and latent factor models for the task of predicting new user-product relationships. You will also use side information about products and users to improve predictions.
Learning Outcomes: By the end of this course, you will be able to:
-Create a collaborative filtering system.
-Reduce dimensionality of data using SVD, PCA, and random projections.
-Perform matrix factorization using coordinate descent.
-Deploy latent factor models as a recommender system.
-Handle the cold start problem using side information.
-Examine a product recommendation application.
-Implement these techniques in Python.

Data Science at Scale Specialization


About This Specialization:
This Specialization covers intermediate topics in data science. You will gain hands-on experience with scalable SQL and NoSQL data management solutions, data mining algorithms, and practical statistical and machine learning concepts. You will also learn to visualize data and communicate results, and you’ll explore legal and ethical issues that arise in working with big data. In the final Capstone Project, developed in partnership with the digital internship platform Coursolve, you’ll apply your new skills to a real-world data science project.

COURSE 1:  Data Manipulation at Scale: Systems and Algorithms

Upcoming session: Oct 26 — Nov 30
About the Course:
Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making — we are drowning in it. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales. In this course, you will learn the landscape of relevant systems, the principles on which they rely, their tradeoffs, and how to evaluate their utility against your requirements. You will learn how practical systems were derived from the frontier of research in computer science and what systems are coming on the horizon. Cloud computing, SQL and NoSQL databases, MapReduce and the ecosystem it spawned, Spark and its contemporaries, and specialized systems for graphs and arrays will be covered. You will also learn the history and context of data science, the skills, challenges, and methodologies the term implies, and how to structure a data science project. At the end of this course, you will be able to:
Learning Goals:
1. Describe common patterns, challenges, and approaches associated with data science projects, and what makes them different from projects in related fields.
2. Identify and use the programming models associated with scalable data manipulation, including relational algebra, mapreduce, and other data flow models.
3. Use database technology adapted for large-scale analytics, including the concepts driving parallel databases, parallel query processing, and in-database analytics
4. Evaluate key-value stores and NoSQL systems, describe their tradeoffs with comparable systems, the details of important examples in the space, and future trends.
5. “Think” in MapReduce to effectively write algorithms for systems including Hadoop and Spark. You will understand their limitations, design details, their relationship to databases, and their associated ecosystem of algorithms, extensions, and languages, and you will write programs in Spark.
6. Describe the landscape of specialized Big Data systems for graphs, arrays, and streams.

COURSE 2:  Practical Predictive Analytics: Models and Methods

Upcoming session: Oct 26 — Nov 30
About the Course:
Statistical experiment design and analytics are at the heart of data science. In this course you will design statistical experiments and analyze the results using modern methods. You will also explore the common pitfalls in interpreting statistical arguments, especially those associated with big data. Collectively, this course will help you internalize a core set of practical and effective machine learning methods and concepts, and apply them to solve some real world problems.
Learning Goals: After completing this course, you will be able to:
1. Design effective experiments and analyze the results
2. Use resampling methods to make clear and bulletproof statistical arguments without invoking esoteric notation
3. Explain and apply a core set of classification methods of increasing complexity (rules, trees, random forests), and associated optimization methods (gradient descent and variants)
4. Explain and apply a set of unsupervised learning concepts and methods
5. Describe the common idioms of large-scale graph analytics, including structural query, traversals and recursive queries, PageRank, and community detection.
COURSE 3: Communicating Data Science Results

About the Course:
Producing numbers is not enough; effective data scientists know how to interpret the numbers and communicate findings accurately to stakeholders to inform business decisions. Visualization is a relatively recent field of research in computer science that links perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will design effective visualizations and develop skills in recognizing and avoiding poor visualizations. Just because you can get the answer using big data doesn’t mean you should. In this course you will have the opportunity to explore the ethical considerations around big data and how these considerations are beginning to influence policy and practice.
Learning Goals: After completing this course, you will be able to:
1. Design and critique visualizations
2. Explain the state-of-the-art in privacy, ethics, governance around big data and data science
3. Explain the role of open data and reproducibility in data science.
Data Analysis and Interpretation Specialization

About This Specialization:

The Data Analysis and Interpretation Specialization takes you from data novice to data analyst in just four project-based courses. You’ll learn to apply basic data science tools and techniques, including data visualization, regression modeling, and machine learning. Throughout the Specialization, you will analyze research questions of your choice and summarize your insights. In the final Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. These instructors are here to create a warm and welcoming place at the table for everyone. Everyone can do this, and we are building a community to show the way.

COURSE 1:  Data Management and Visualization

Upcoming session: Oct 26 — Nov 30
About the Course:
Have you wanted to describe your data in more meaningful ways? Interested in making visualizations from your own data sets? After completing this course, you will be able to manage, describe, summarize and visualize data. You will choose a research question based on available data and engage in the early decisions involved in quantitative research. Based on a question of your choosing, you will describe variables and their relationships through frequency tables, calculate statistics of center and spread, and create graphical representations. By the end of this course, you will be able to:
-use a data codebook to decipher a data set
-identify questions or problems that can be tackled by a particular data set
-determine the data management steps that are needed to prepare data for analysis
-write code to execute a variety of data management and data visualization techniques

Course Link:  Data Management and Visualization

COURSE 2: Data Analysis Tools

Current session: Oct 22 — Nov 30
About the Course:
Do you want to answer questions with data? Interested in discovering simple methods for answering these questions? Hypothesis testing is the tool for you! After completing this course, you will be able to:
-identify the right statistical test for the questions you are asking
-apply and carry out hypothesis tests
-generalize the results from samples to larger populations
-use Analysis of Variance, Chi-Square, Test of Independence and Pearson correlation
-present your findings using statistical language.
Course Link: Data Analysis Tools

COURSE 3:  Regression Modeling in Practice

Starts November 2015
About the Course:
What kinds of statistical tools can you use to test your research question in more depth? In this course, you will go beyond basic data analysis tools to develop multiple linear regression and logistic regression models to address your research question more thoroughly. You will examine multiple predictors of your outcome and identify confounding variables. In this course you will be introduced to additional Python libraries for regression modeling. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate residual variability. Finally, through blogging, you will present the story of your regression model using statistical language.

COURSE 4: Machine Learning for Data Analysis

Starts January 2016
About the Course:
Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

Follow us:

FACEBOOK | QUORA | TWITTER | REDDIT | FLIPBOARD | LINKEDIN | MEDIUM | GITHUB

I hope you liked today’s post. If you have any questions then feel free to comment below. If you want me to write on one specific topic then do tell it to me in the comments below.

If you want to share your experience or opinions, say hello to hello@dataaspirant.com.

THANKS FOR READING…..
