dataaspirant-june2016-newsletter

 


 

Blog Posts:

[1] Top 10 Machine Learning Algorithms

[2] Explaining Deep Learning 

[3] Wide & Deep Learning: Better Together with TensorFlow

[4] Building intelligent applications with deep learning and TensorFlow

[5] k-Nearest Neighbors (k-NN)

[6] Hierarchical clustering

[7] Access your data in Amazon Redshift and PostgreSQL with Python and R

[8] Why E-Commerce Can’t Afford to Ignore Machine Learning

[9] Introduction to Apache Spark – Tutorial & Quick Start

[10] Creative Data Engineering Can Drive Data Science Insights: A Datapalooza Dispatch

[11] 19 Worst Mistakes at Data Science Job Interviews

Video Channel:

[1] Convolutional Neural Networks

Books:

[1] R Programming Book

[2] Neural Networks and Deep Learning

That’s all for the June 2016 newsletter. Please leave your suggestions about the newsletter in the comment section. To see all dataaspirant newsletters, visit the monthly newsletter page.

Follow us:

FACEBOOK | QUORA | TWITTER | REDDIT | FLIPBOARD | LINKEDIN | MEDIUM | GITHUB

Home | About | Data scientists Interviews | For beginners | Join us | Monthly newsletter

Scala and Pyspark specialization certification courses started

 


 

Data science is a promising field in which you have to continuously update your skill set by learning new techniques, algorithms, and tools. Because the learning journey never ends, we are always looking for the best resources for picking up these new skills. We should be thankful for MOOC providers such as Coursera, edX, and Udacity, whose main aim is to provide high-quality content that explains the core concepts in a structured way, so that learners can master these skills step by step.

In this post, we share two well-known data science specializations offered by edX and Coursera:

  1. Data Science and Engineering with Apache Spark
  2. Functional Programming in Scala

Each of these two specializations is a series of courses that runs from the basics to an advanced level. It generally takes around 5 to 6 months to complete a specialization. All the course videos and reference materials are free of charge, but if you want the specialization certificate you will have to pay for it.

Data Science and Engineering with Apache Spark

Image Credit: edx.org

The Data Science and Engineering with Spark XSeries, created in partnership with Databricks, will teach students how to perform data science and data engineering at scale using Spark, a cluster computing system well-suited for large-scale machine learning tasks. It will also present an integrated view of data processing by highlighting the various components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. Internal details of Spark and distributed machine learning algorithms will be covered, which will provide students with intuition about working with big data and developing code for a distributed environment.

This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra, and calculus are prerequisites for two of the courses in this series.

1. Introduction to Apache Spark

About this course:

Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

This statistics and data analysis course will teach you the basics of working with Spark and will provide you with the necessary foundation for diving deeper into Spark. You’ll learn about Spark’s architecture and programming model, including commonly used APIs. After completing this course, you’ll be able to write and debug basic Spark applications. This course will also explain how to use Spark’s web user interface (UI), how to recognize common coding errors, and how to proactively prevent errors. The focus of this course will be Spark Core and Spark SQL.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.

What you’ll learn:

  • Basic Spark architecture
  • Common operations
  • How to avoid coding mistakes
  • How to debug your Spark program

2. Distributed Machine Learning with Apache Spark

About this course:

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

What you’ll learn:

  • The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
  • Exploratory data analysis, feature extraction, supervised learning, and model evaluation
  • Application of these principles using Spark
  • How to implement distributed algorithms for fundamental statistical models.

3. Big Data Analysis with Apache Spark

About this course:

Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.

This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to deliver against these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), and previous experience with Spark equivalent to Introduction to Apache Spark is required.

What you’ll learn:

  • How to use Apache Spark to perform data analysis
  • How to use parallel programming to explore data sets
  • Apply log mining, textual entity recognition and collaborative filtering techniques to real-world data questions.

4. Advanced Apache Spark for Data Science and Data Engineering

About this course:

Gain a deeper understanding of Spark by learning about its APIs, architecture, and common use cases.  This statistics and data analysis course will cover material relevant to both data engineers and data scientists.  You’ll learn how Spark efficiently transfers data across the network via its shuffle, details of memory management, optimizations to reduce the compute costs, and more.  Learners will see several use cases for Spark and will work to solve a variety of real-world problems using public datasets.  After taking this course, you should have a thorough understanding of how Spark works and how you can best utilize its APIs to write efficient, scalable code.  You’ll also learn about a wide variety of Spark’s APIs, including the APIs in Spark Streaming.

What you’ll learn:

  • Common use cases for Spark
  • Details of internals like the shuffle, Spark SQL’s Catalyst Optimizer, and Project Tungsten
  • A deep architectural overview
  • Spark Streaming
  • Spark ML

5. Advanced Distributed Machine Learning with Apache Spark

About this course:

Building on the core ideas presented in Distributed Machine Learning with Spark, this course covers advanced topics for training and deploying large-scale learning pipelines. You will study state-of-the-art distributed algorithms for collaborative filtering, ensemble methods (e.g., random forests), clustering and topic modeling, with a focus on model parallelism and the crucial tradeoffs between computation and communication.

After completing this course, you will have a thorough understanding of the statistical and algorithmic principles required to develop and deploy distributed machine learning pipelines. You will further have the expertise to write efficient and scalable code in Spark, using MLlib and the spark.ml package in particular.

What you’ll learn:

  • Training and deploying large-scale learning pipelines for various supervised and unsupervised settings
  • Model parallelism and tradeoffs between computation and communication in distributed settings
  • Collaborative filtering, decision trees, random forests, clustering, topic modeling, hyperparameter tuning
  • Application of these principles using Spark, focusing on the spark.ml package.

Functional Programming in Scala

Image Credit: coursera.org

This Specialization provides a hands-on introduction to functional programming using the widespread programming language, Scala. It begins with the basic building blocks of the functional paradigm, first showing how to use these blocks to solve small problems, before building up to combining these concepts to architect larger functional programs. You’ll see how the functional paradigm facilitates parallel and distributed programming, and through a series of hands-on examples and programming assignments, you’ll learn how to analyze data sets small to large; from parallel programming on multicore architectures, to distributed programming on a cluster using Apache Spark. A final capstone project will allow you to apply the skills you learned by building a large data-intensive application using real-world data.

1. Functional Programming Principles in Scala

About this course:

Functional programming is becoming increasingly widespread in the industry. This trend is driven by the adoption of Scala as the main programming language for many applications. Scala fuses functional and object-oriented programming in a practical package. It interoperates seamlessly with both Java and Javascript. Scala is the implementation language of many important frameworks, including Apache Spark, Kafka, and Akka. It provides the core infrastructure for sites such as Twitter, Tumblr and also Coursera.

In this course you will discover the elements of the functional programming style and learn how to apply them usefully in your daily programming tasks. You will also develop a solid foundation for reasoning about functional programs, by touching upon proofs of invariants and the tracing of execution symbolically. The course is hands-on; most units introduce short programs that serve as illustrations of important concepts and invite you to play with them, modifying and improving them. The course is complemented by a series of programming projects as homework assignments.

Learning Outcomes:

  • understand the principles of functional programming,
  • write purely functional programs, using recursion, pattern matching, and higher-order functions,
  • combine functional programming with objects and classes,
  • design immutable data structures,
  • reason about properties of functions,
  • understand generic types for functional programs

2. Functional Program Design in Scala

About this course:

In this course you will learn how to apply the functional programming style in the design of larger applications. You’ll get to know important new functional programming concepts, from lazy evaluation to structuring your libraries using monads. We’ll work on larger and more involved examples, from state space exploration to random testing to discrete circuit simulators. You’ll also learn some best practices on how to write good Scala code in the real world.

Several parts of this course deal with the question of how functional programming interacts with mutable state. We will explore the consequences of combining functions and state. We will also look at purely functional alternatives to mutable state, using infinite data structures or functional reactive programming.

Learning Outcomes:

  • recognize and apply design principles of functional programs,
  • design functional libraries and their APIs,
  • competently combine functions and state in one program,
  • understand reasoning techniques for programs that combine functions and state,
  • write simple functional reactive applications.

3. Parallel programming

About this course:

With every smartphone and computer now boasting multiple processors, the use of functional ideas to facilitate parallel programming is becoming increasingly widespread. In this course, you’ll learn the fundamentals of parallel programming, from task parallelism to data parallelism. In particular, you’ll see how many familiar ideas from functional programming map perfectly to the data parallel paradigm. We’ll start with the nuts and bolts of how to effectively parallelize familiar collections operations, and we’ll build up to parallel collections, a production-ready data parallel collections library available in the Scala standard library. Throughout, we’ll apply these concepts through several hands-on examples that analyze real-world data, such as popular algorithms like k-means clustering.

Learning Outcomes:

  • reason about task and data parallel programs,
  • express common algorithms in a functional style and solve them in parallel,
  • competently microbenchmark parallel code,
  • write programs that effectively use parallel collections to achieve performance

4. Big Data Analysis with Scala and Spark

About this course:

Manipulating big data distributed over a cluster using functional concepts is rampant in the industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we’ll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We’ll cover Spark’s programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we’ll learn when important issues related to the distribution like latency and network communication should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes:

  • read data from persistent storage and load it into Apache Spark,
  • manipulate data with Spark and Scala,
  • express algorithms for data analysis in a functional style,
  • recognize how to avoid shuffles and recomputation in Spark

5. Functional Programming in Scala Capstone

About this course:

In the final capstone project you will apply the skills you learned by building a large data-intensive application using real-world data.

 


 

I hope you liked today’s post. If you have any questions, feel free to comment below. If you want me to write on a specific topic, tell me in the comments below.

If you want to share your experience or opinions, say hello at hello@dataaspirant.com.

THANKS FOR READING…..


dataaspirant-april2016-newsletter


Blog Posts:

1. Step by step Kaggle competition tutorial:

In this article we are going to see how to go through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest. The goal is to classify a crime occurrence knowing the time and place it happened.

Read Complete Post:  datanice Blog

2. Introduction to machine learning: 

It is an attempt to make things more intelligent. Most of us have come across terms like “Artificial Neural Networks”, an attempt to replicate the working of the human brain. Even something like this is not necessarily always complex. At its heart, it is just multiplication and differentiation. Yes, maths is at it again, but it is no different from what you learned at school (this coming from a guy who is petrified of maths).

Read Complete Post: medium.com/@mukulmalik

3. Baidu research chief Andrew Ng fixed on self-taught computers, self-driving cars:

Artificial-intelligence whiz Andrew Ng hangs his hat these days at a nondescript building in Sunnyvale that serves as the Silicon Valley outpost of the Chinese search giant Baidu.

Read Complete Post: Seattle times

4. Misleading modelling: overfitting, cross-validation, and the bias-variance trade-off:

In this post you will get to grips with what is perhaps the most essential concept in machine learning: the bias-variance trade-off. The main idea here is that you want to create models that are as good at prediction as possible but that are still applicable to new data (i.e. they are generalizable). The danger is that you can easily create models that overfit to the local noise in your specific dataset, which isn’t too helpful and leads to poor generalizability since the noise is random and therefore different in each dataset. Essentially, you want to create models that capture only the useful components of a dataset. On the other hand, models that generalize very well but are too inflexible to generate good predictions are the other extreme you want to avoid (this is called underfitting).

Read Complete Post: Cambridge coding

5. Association rules and the apriori algorithm:

When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one’s needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways.

Read Complete Post: Annalyzin
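To make the idea of buying patterns concrete, here is a minimal sketch of the two basic quantities behind association rules, support and confidence. The baskets and the helper names (`support`, `confidence`, `frequent_pairs`) are invented for illustration and are not taken from the linked post. The pair counting covers only the first step of an Apriori-style search, not the full algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up purely for illustration.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"vegetables", "chicken", "rice"},
    {"beer", "salsa"},
    {"vegetables", "rice"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent appears in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

With these baskets, every basket that contains chips also contains beer, so the rule "chips implies beer" has a confidence of 1.0 even though only 40% of baskets contain both items.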

6. Churn prediction with PySpark using MLlib and ML packages:

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals.

The prediction process is heavily data driven and often utilizes advanced machine learning techniques. In this post, we’ll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models – all with PySpark and its machine learning frameworks. We’ll also discuss the differences between two Apache Spark version 1.6.0 frameworks, MLlib and ML.

Read Complete Post: mapr

7. Data scientist keeps ranking top every best jobs list:

Data scientist is at, or near, the top of just about every “best jobs” survey, report, or study released in the past few years. Harvard Business Review named it the sexiest job of the 21st century. And with a median base salary of $96,000, data scientist and some engineering specialties are in a very small group of high-paying jobs that don’t require a medical or law degree. However, you’ll likely need more than a bachelor’s degree, as you’ll find out later in this article.

Read Complete Post: goodcall

8. Understand your machine learning data with descriptive statistics in Python:

Looking at the raw data can reveal insights that you cannot get any other way. It can also plant seeds that may later grow into ideas on how to better preprocess and handle the data for machine learning tasks.

Read Complete Post: machinelearningmastery

9. Time series interventions and contribution:

This article illustrates the principles of an analysis of President George W. Bush’s job approval from January 2001 through September 2004, with disposable income excluded from the statistical model. To see a version complete with code and its description, visit bicorner.com.

Presidents with a job approval rating of less than 50 percent are unlikely to be re-elected. During June, Bush’s job approval rating averaged 47 percent in five major polls.

Read Complete Post: gladwinanalytics

10. Deep neural networks creative deep learning art:

Are deep neural networks creative? It seems like a reasonable question. Google’s “Inceptionism” technique transforms images, iteratively modifying them to enhance the activation of specific neurons in a deep net. The images appear trippy, transforming rocks into buildings or leaves into insects. Another neural generative model, introduced by Leon Gatys of the University of Tubingen in Germany, can extract the style from one image (say a painting by Van Gogh), and apply it to the content of another image (say a photograph).

Read Complete Post: kdnuggets

Video Courses:

  1. Introduction to python for data science
  2. Data exploration with kaggle scripts

That’s all for the April 2016 newsletter. Please leave your suggestions about the newsletter in the comment section. To see all dataaspirant newsletters, visit the monthly newsletter page.


dataaspirant-march2016-newsletter


March-2016

Blog Posts:

1. Cricket survival analysis:

Survival Analysis is used in areas where the time duration of a sample of observations is analysed until an event of death occurs. Survival analysis is applied to mechanical engineering to predict systems failures and in medical sciences to predict patient outcomes.

In this post I’ll be using Survival Analysis for a more lighthearted application–to analyze the career lengths of Cricket players.

Read Complete Post on: blog.yhat.com

2. A complete tutorial to learn data science with Python from scratch:

Python was originally a general-purpose language. Over the years, with strong community support, it gained dedicated libraries for data analysis and predictive modeling.

Due to the lack of resources on Python for data science, I decided to create this tutorial to help others learn Python faster. In this tutorial, we will take bite-sized information about how to use Python for data analysis, chew it till we are comfortable, and practice it at our own end.

Read Complete Post on: analyticsvidhya

3. Markov Chains Explained Visually:

Markov chains, named after Andrey Markov, are mathematical systems that hop from one “state” (a situation or set of values) to another. For example, if you made a Markov chain model of a baby’s behavior, you might include “playing,” “eating,” “sleeping,” and “crying” as states, which together with other behaviors could form a “state space”: a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probability of hopping, or “transitioning,” from one state to any other state, e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.

Read Complete Post on: setosa
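The baby example can be sketched in a few lines of Python. The transition probabilities below are invented for illustration (the linked post does not give these numbers), and `next_state` and `simulate` are hypothetical helper names. Each row of the table holds the probabilities of moving from one state to every other state, so each row sums to 1.

```python
import random

# Hypothetical transition probabilities; the numbers are made up.
# TRANSITIONS[a][b] is the probability of hopping from state a to state b.
TRANSITIONS = {
    "playing":  {"playing": 0.5, "eating": 0.2, "sleeping": 0.2, "crying": 0.1},
    "eating":   {"playing": 0.3, "eating": 0.2, "sleeping": 0.4, "crying": 0.1},
    "sleeping": {"playing": 0.3, "eating": 0.3, "sleeping": 0.3, "crying": 0.1},
    "crying":   {"playing": 0.1, "eating": 0.4, "sleeping": 0.3, "crying": 0.2},
}


def next_state(current, rng):
    """Pick the next state using only the current state (the Markov property)."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, steps, seed=None):
    """Return the start state followed by `steps` sampled transitions."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain


if __name__ == "__main__":
    print(" -> ".join(simulate("playing", 10, seed=42)))
```

Passing a `seed` makes a run repeatable, which helps when comparing the effect of different transition tables.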

4. Data scientist role:

The age of big data is upon us, and it’s here to stay. With more data being collected than ever before, extracting value from this data is only going to become more intricate and demanding as time goes on. The logic behind the big data economy is shaping our personal lives in ways that we probably can’t even conceive or predict; every electronic move that we make produces a statistic and insight into our life.

As participants in the consumer economy, we are mined for data when we connect to any website or electronic service, and a data scientist is there to collect, clean, analyse and predict the data that we provide by using a combination of computer science, statistical analysis and intricate business knowledge.

As we can see, this responsibility is a combination of multiple skill sets and expertise compared to a typical Big Data Developer or Business Analyst.

Read Complete Post on: globalbigdataconference

5. An introduction to deep learning from perceptrons to deep networks:

In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyond the academic world with major players like Google, Microsoft, and Facebook creating their own research teams and making some impressive acquisitions.

Some of this can be attributed to the abundance of raw data generated by social network users, much of which needs to be analyzed, as well as to the cheap computational power available via GPGPUs.

Read Complete Post on: toptal

6. Books for statistics, mathematics, and data science:

The selection process of data scientists at Google gives higher priority to candidates with strong background in statistics and mathematics. Not just Google, other top companies (Amazon, Airbnb, Uber etc) in the world also prefer candidates with strong fundamentals rather than mere know-how in data science.

If you too aspire to work for such top companies in future, it is essential for you to develop a mathematical understanding of data science. Data science is simply the evolved version of statistics and mathematics, combined with programming and business logic. I’ve met many data scientists who struggle to explain predictive models statistically.

More than just deriving accuracy, it is important to understand and interpret every metric and calculation behind that accuracy. Remember, every single ‘variable’ has a story to tell. So, if not anything else, try to become a great story explorer!

Read Complete Post on: Analyticsvidhya

7. Hadoop installation using Ambari:

Apache Hadoop has become a de facto software framework for reliable, scalable, distributed, and large-scale computing. Unlike other computing systems, it brings computation to data rather than sending data to computation. Hadoop was created in 2006 at Yahoo by Doug Cutting, based on papers published by Google. As Hadoop has matured, many new components and tools have been added to its ecosystem over the years to enhance its usability and functionality: Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop, to name a few.

Read Complete Post on: Edupristine

Videos:

  1. How to use data to make a hit tv show
  2. Solving  Big Problems with Julia
  3. Best Data visualization

That’s all for the March 2016 newsletter. Please leave your suggestions about the newsletter in the comment section. To see all dataaspirant newsletters, visit the monthly newsletter page.


python datamining packages virtual environment setup in ubuntu


Virtual Environment:

A virtual environment is an isolated Python environment that lets you use different Python modules and versions without affecting the system-wide installation.

Let’s understand why a virtual environment is needed in project development. Python projects routinely depend on different Python libraries, and this does not always end happily. Often a Python application will not run on a new machine because the library versions installed there differ from the ones it was developed against. For example, suppose that during the development phase our application used a function from pandas (the Python data analysis library) version 0.18.0 that did not exist in pandas 0.17.1. If the new machine has pandas 0.17.1 installed, the application will not run there because of the version difference.
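The version mismatch described above can be detected up front instead of surfacing as a failure at run time. The following is a minimal sketch; `parse_version` and `meets_minimum` are hypothetical helper names, and the parsing is deliberately simple: it assumes purely numeric, dot-separated versions such as 0.18.0 and ignores suffixes like "rc1".

```python
def parse_version(text):
    """Turn '0.18.0' into (0, 18, 0) so that versions compare numerically.

    Non-numeric parts (for example 'rc1') are skipped in this simple sketch.
    """
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # The situation from the text: code written against pandas 0.18.0,
    # but the new machine has pandas 0.17.1.
    print(meets_minimum("0.17.1", "0.18.0"))
    # With pandas installed, the same check could be run on the real version:
    #   import pandas
    #   meets_minimum(pandas.__version__, "0.18.0")
```

Comparing tuples of integers rather than raw strings matters here, because as strings "0.9.0" sorts after "0.18.0" even though it is the older version.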

To overcome this, we use a dedicated Python environment that contains everything a Python project (application) needs in order to run in an organised, isolated fashion.

Using a virtual environment is therefore the recommended way of working with Python projects, regardless of how many you might be busy with.

On Ubuntu, creating a virtual environment is easy with virtualenv (a tool to create isolated Python environments).

About virtualenv:

Virtualenv helps solve project dependency conflicts by creating isolated environments which can contain all the goodies Python programmers need to develop their projects. A virtual environment created using this tool includes a fresh copy of the Python binary itself as well as a copy of the entire Python standard library.

Installing virtualenv:


$ sudo pip install virtualenv

We have now installed virtualenv. Next, let’s create a folder (the environment) where we will install the Python data mining packages.

Create virtual environment:


$ virtualenv dataaspirant_venv

virtualenv dataaspirant_venv will create a folder in the current directory which will contain the Python executable files, and a copy of the pip library which you can use to install other packages. The name of the virtual environment (in this case, it was dataaspirant_venv) can be anything; omitting the name will place the files in the current directory instead.

This creates a copy of Python in whichever directory you ran the command in, placing it in a folder named dataaspirant_venv.

Using the virtual environment:

To use the virtual environment, it needs to be activated:


$ source dataaspirant_venv/bin/activate

The name of the current virtual environment will now appear on the left of the prompt (e.g. (dataaspirant_venv)Your-Computer:your_project UserName$) to let you know that it’s active. From now on, any package that you install using pip will be placed in the dataaspirant_venv folder, isolated from the global Python installation.
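Besides looking at the prompt, you can ask the interpreter itself whether it is running inside a virtual environment. The sketch below is one way to do that; `in_virtual_environment` and `describe_interpreter` are hypothetical helper names. It relies on two conventions: older virtualenv releases set `sys.real_prefix`, while the standard-library venv module and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtual_environment():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases add sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases point sys.prefix at the environment
    # while sys.base_prefix keeps pointing at the base installation.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def describe_interpreter():
    """Collect the interpreter path and prefix for a quick sanity check."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtual_environment(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

When the environment is active, the reported executable and prefix should point inside the dataaspirant_venv folder.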

Python Data mining packages Installation:

So let's start installing the Python data mining packages one by one.

Numpy installation Command:


$ pip install numpy

Scipy Installation Command:


$ pip install scipy

Matplotlib Installation Command:


$ pip install matplotlib

Ipython Installation Command:


$ pip install "ipython[all]"

Pandas Installation Command:


$ pip install pandas

Statsmodels Installation Command:


$ pip install statsmodels

Scikit-learn Installation Command:


$ pip install scikit-learn
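Once the commands above have finished, a quick way to confirm that the packages landed in the active environment is to ask Python whether it can locate each one. This is a minimal sketch; the function name find_missing is an illustrative assumption, and note that two import names differ from their pip names (scikit-learn is imported as sklearn, ipython as IPython).

```python
import importlib.util

# Import names for the packages installed above.
PACKAGES = [
    "numpy",
    "scipy",
    "matplotlib",
    "IPython",
    "pandas",
    "statsmodels",
    "sklearn",
]


def find_missing(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(PACKAGES)
    if missing:
        print("not installed in this environment:", ", ".join(missing))
    else:
        print("all data mining packages are available")
```

Because find_spec only locates a package without importing it, the check stays fast even for large libraries.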

Running Script File:

While the virtual environment is active, simply go to the directory that contains your Python script and run it in the usual way.


$ python script_file.py

Deactivate the virtual environment:

If you are done working in the virtual environment for the moment, you can deactivate it.


$ deactivate

This puts you back to the system’s default Python interpreter with all its installed libraries.

To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf dataaspirant_venv.)

Reference Links:

[ 1 ] http://docs.python-guide.org/en/latest/dev/virtualenvs/

[ 2 ] https://en.wikipedia.org/wiki/Virtual_environment_software/


I hope you liked today's post. If you have any questions, feel free to comment below. If you want me to write on a specific topic, tell me in the comments below.

If you want to share your experience or opinions, you can say hello to hello@dataaspirant.com.

THANKS FOR READING…..


dataaspirant-Nov2015-newsLetter

Blog Posts:

1. Great resources for learning data mining concepts and techniques:

With today’s tools, anyone can collect data from almost anywhere, but not everyone can pull the important nuggets out of that data. Whacking your data into Tableau is an OK start, but it’s not going to give you the business critical insights you’re looking for. To truly make your data come alive you need to mine it. Dig deep. Play around. And tease out the diamond in the rough.

Read Complete Post on: blog.import.io

2.  Interactive Data Science with R in Apache Zeppelin Notebook:

The objective of this blog post is to help you get started with Apache Zeppelin notebook for your R data science requirements. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with Scala(with Apache Spark), Python(with Apache Spark), SparkSQL, Hive, Markdown, Shell and more.

Read Complete Post on: sparkiq-labs

3. How to install Apache Hadoop 2.6.0 in Ubuntu:

Let’s get started towards setting up a fresh Multinode Hadoop (2.6.0) cluster.

Read Complete Post on: pingax

4. Running scalable Data Science on Cloud with R & Python:

So, why do we even need to run data science on cloud? You might raise this question that if a laptop can pack 64 GB RAM, do we even need cloud for data science? And the answer is a big YES for a variety of reasons. Here are a few of them.

Read Complete Post on: analyticsvidhya

5. How to Choose Between Learning Python or R First:

If you’re interested in a career in data, and you’re familiar with the set of skills you’ll need to master, you know that Python and R are two of the most popular languages for data analysis. If you’re not exactly sure which to start learning first, you’re reading the right article.

When it comes to data analysis, both Python and R are simple (and free) to install and relatively easy to get started with. If you’re a newcomer to the world of data science and don’t have experience in either language, or with programming in general, it makes sense to be unsure whether to learn R or Python first.

Read Complete Post on: Udacity Blog

LinkedIn Posts:

  1. 5 Best Machine Learning APIs for Data Science
  2. Big Data Top Trends In 2016
  3. Big Data: 4 Things You Can Do With It, And 3 Things You Can’t

Videos:

1. Machine Learning: Going Deeper with Python and Theano

2. Current State of Recommendation Systems

3. Pandas From The Ground Up

Library:

TensorFlow Google Machine Learning Library:

About TensorFlow:

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

Get Started TensorFlow

 

That's all for the November 2015 newsletter. Please leave your suggestions on the newsletter in the comment section. To get all dataaspirant newsletters, you can visit the monthly newsletter page. Please subscribe to our blog so that you get our newsletter in your inbox every month.


DataAspirant Sept-Oct2015 newsletter


Hi dataaspirant lovers, we are sorry for not publishing the dataaspirant September newsletter. To make up for it, the October newsletter includes the September ingredients too. We rounded up the best blogs for anyone interested in learning more about data science. Whether you are experienced in data science or someone who's just heard of the field, these blogs provide enough detail and context for you to understand what you're reading. We also collected some videos. Hope you enjoy the October dataaspirant newsletter.

 

Blog Posts:

1 . How to do a Logistic Regression in R:

Regression is the statistical technique that tries to explain the relationship between a dependent variable and one or more independent variables. There are various kinds of it, such as simple linear, multiple linear, polynomial, logistic, Poisson, etc.

Read Complete post on: datavinci

2 . Introduction of Markov State Modeling:

Modeling and prediction problems occur in different domain and data situations. One type of situation involves sequence of events.

For instance, you may want to model the behaviour of customers on your website, looking at the pages they land on or enter by, the links they click, and so on. You may want to do this to understand common issues and needs, and then redesign your website to address them. On the other hand, you may want to promote certain sections or products on the website and want to understand the right page architecture and layout. In another example, you may be interested in predicting the next medical visit of a patient based on previous visits, or the next product a customer purchases based on previous products.

Read Complete post on: edupristine
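To make the idea of modeling a sequence of events concrete, here is a minimal sketch that estimates first-order transition probabilities between pages from a few visit sequences. It is an illustration only, not the method from the linked post; the function name transition_probabilities and the page names in the sample sessions are made up.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from lists of visited pages."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair each page with the page visited right after it.
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        probabilities[current] = {
            page: n / total for page, n in following_counts.items()
        }
    return probabilities


if __name__ == "__main__":
    # Hypothetical click paths, one list per visitor session.
    sessions = [
        ["home", "product", "cart"],
        ["home", "product", "home"],
        ["home", "search", "product", "cart"],
    ]
    for page, following in transition_probabilities(sessions).items():
        print(page, following)
```

Each row of the result sums to 1, so it can be read directly as "given the visitor is on this page, where do they go next".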

3 . Five ways to improve the way you use Hadoop:

Apache Hadoop is an open source framework designed to distribute the storage and processing of massive data sets across virtually limitless servers. Amazon EMR (Elastic MapReduce) is a particularly popular service from Amazon that is used by developers trying to avoid the burden of set up and administration, and concentrate on working with their data.

Read Complete post on: cloudacademy

4. What is deep learning and why is it getting so much attention:

Deep learning is probably one of the hottest topics in Machine learning today, and it has shown significant improvement over some of its counterparts. It falls under a class of unsupervised learning algorithms and uses multi-layered neural networks to achieve these remarkable outcomes.

Read Complete post on: analyticsvidhya

5. Facebook data collection and photo network visualization with Gephi and R:

The first thing to do is get the Facebook data. Before being allowed to pull it from R, you’ll need to make a quick detour to developers.facebook.com/apps, register as a developer, and create a new app. Name and description are irrelevant, the only thing you need to do is go to Settings → Website → Site URL and fill in http://localhost:1410/ (that’s the port we’re going to be using). The whole process takes ~5 min and is quite painless

Read Complete post on: kateto

6. Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data:

The set of data storage and processing technologies that define the Apache Hadoop ecosystem are expansive and ever-improving, covering a very diverse set of customer use cases used in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop—making it faster, easier to work with, and more secure.

Read Complete post on: cloudera

7. Rapid Development & Performance in Spark For Data Scientists:

Spark is a cluster computing framework that can significantly increase the efficiency and capabilities of a data scientist’s workflow when dealing with distributed data. However, deciding which of its many modules, features and options are appropriate for a given problem can be cumbersome. Our experience at Stitch Fix has shown that these decisions can have a large impact on development time and performance. This post will discuss strategies at each stage of the data processing workflow which data scientists new to Spark should consider employing for high productivity development on big data.

Read Complete post on: multithreaded

8. NoSQL: A Dog with Different Fleas:

The NoSQL movement is about providing performance, scale, and flexibility, where cost is sometimes part of the reasoning (e.g. the Oracle Tax). Yet databases like MySQL, which provide all the Oracle features, are often considered before choosing NoSQL. With respect to NoSQL flexibility, this can also be a Pandora's box. In other words, schema-less modeling has been shown to be a serious complication to data management. I was at the MongoDB Storage Engine Summit this year, and the number one ask to the storage engine providers was "how to discover schema in a schema-less architecture?" In other words, managing models over time is a serious matter to consider too.

Read Complete post on: deepis

9. Apache Spark: Sparkling star in big data firmament:

The underlying data needed to be used to gain right outcomes for all above tasks is comparatively very large. It cannot be handled efficiently (in terms of both space and time) by traditional systems. These are all big data scenarios. To collect, store and do computations on this kind of voluminous data we need a specialized cluster computing system. Apache Hadoop has solved this problem for us.

Read Complete post on: edupristine

10. Sqoop vs. Flume – Battle of the Hadoop ETL tools:

Apache Hadoop is synonymous with big data for its cost-effectiveness and its scalability for processing petabytes of data. Data analysis using Hadoop is just half the battle won. Getting data into the Hadoop cluster plays a critical role in any big data deployment. Data ingestion is important in any big data project because the volume of data is generally in petabytes or exabytes. Hadoop Sqoop and Hadoop Flume are the two tools in Hadoop which are used to gather data from different sources and load them into HDFS. Sqoop in Hadoop is mostly used to extract structured data from databases like Teradata, Oracle, etc., and Flume in Hadoop is used to source data which is stored in various sources and deals mostly with unstructured data.

Read Complete post on: dezyre

 

Videos:

1. Spark and Spark Streaming at Uber :

2. How To Stream Twitter Data Into Hadoop Using Apache Flume:

 

That's all for the October 2015 newsletter. Please leave your suggestions on the newsletter in the comment box. To get all dataaspirant newsletters, you can visit the monthly newsletter page. Please subscribe to our blog so that you get our newsletter in your inbox every month.

 
