Home | About | Data scientists Interviews | For beginners | Join us | Monthly newsletter


Blog Posts:

1. Cricket Survival Analysis:

Survival analysis is used in areas where the time until an event (such as death or failure) is analysed for a sample of observations. It is applied in mechanical engineering to predict system failures and in medical sciences to predict patient outcomes.

In this post I’ll be using survival analysis for a more lighthearted application: to analyze the career lengths of cricket players.
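To illustrate the core idea (a rough sketch, not the method from the linked post), here is a minimal Kaplan–Meier estimator in plain Python; the estimated survival probability drops at each observed event time:

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate: list of (time, survival probability).

    durations -- observed times (e.g. career length in years)
    events    -- 1 if the event occurred, 0 if the observation is censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)          # everyone leaves the risk set at their time
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        if deaths[t]:                   # survival only drops at event times
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data: careers of 1, 2 and 3 years; the third observation is censored
print(kaplan_meier([1, 2, 3], [1, 1, 0]))
```

With this toy data the curve steps down to 2/3 after the first event and to 1/3 after the second; the censored observation shrinks the risk set without adding a step.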

Read Complete Post on:

2. Complete tutorial to learn data science with Python from scratch:

Python was originally a general-purpose language. But over the years, with strong community support, it gained dedicated libraries for data analysis and predictive modeling.

Due to the lack of resources on Python for data science, I decided to create this tutorial to help many others learn Python faster. In this tutorial, we will take bite-sized pieces of information about how to use Python for data analysis, chew on them till we are comfortable, and practice them on our own.

Read Complete Post on: analyticsvidhya

3. Markov Chains Explained Visually:

Markov chains, named after Andrey Markov, are mathematical systems that hop from one “state” (a situation or set of values) to another. For example, if you made a Markov chain model of a baby’s behavior, you might include “playing,” “eating,” “sleeping,” and “crying” as states, which together with other behaviors could form a “state space”: a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probability of hopping, or “transitioning,” from one state to any other state—e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.
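A tiny Python sketch of such a chain, using the baby states from the example above (the transition probabilities here are invented for illustration, not taken from the post):

```python
import random

# State space from the baby example; every row of probabilities sums to 1.
transitions = {
    "playing":  {"playing": 0.4, "eating": 0.2, "sleeping": 0.2, "crying": 0.2},
    "eating":   {"playing": 0.3, "eating": 0.2, "sleeping": 0.4, "crying": 0.1},
    "sleeping": {"playing": 0.3, "eating": 0.4, "sleeping": 0.2, "crying": 0.1},
    "crying":   {"playing": 0.2, "eating": 0.3, "sleeping": 0.3, "crying": 0.2},
}

def next_state(current):
    """Hop to the next state using the current state's transition row."""
    row = transitions[current]
    return random.choices(list(row), weights=list(row.values()))[0]

# Simulate a short walk through the state space
state = "playing"
walk = [state]
for _ in range(5):
    state = next_state(state)
    walk.append(state)
print(" -> ".join(walk))
```

The key Markov property is visible in `next_state`: the next hop depends only on the current state, not on how the chain got there.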

Read Complete Post on: setosa

4. Data Scientist Role:

The age of big data is upon us, and it’s here to stay. With more data being collected than ever before, extracting value from this data is only going to become more intricate and demanding as time goes on. The logic behind the big data economy is shaping our personal lives in ways that we probably can’t even conceive or predict; every electronic move that we make produces a statistic and insight into our life.

As participants in the consumer economy, we are mined for data when we connect to any website or electronic service, and a data scientist is there to collect, clean and analyse the data that we provide, and to make predictions from it, using a combination of computer science, statistical analysis and intricate business knowledge.

As we can see, this role combines multiple skill sets and areas of expertise compared to a typical Big Data Developer or Business Analyst.

Read Complete Post on: globalbigdataconference

5. An introduction to deep learning from perceptrons to deep networks:

In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyond the academic world with major players like Google, Microsoft, and Facebook creating their own research teams and making some impressive acquisitions.

Some of this can be attributed to the abundance of raw data generated by social network users, much of which needs to be analyzed, as well as to the cheap computational power available via GPGPUs.
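Since the linked post starts from perceptrons, here is a minimal single perceptron in plain Python (a sketch of the classic perceptron learning rule, not code from the post), trained on the linearly separable OR function:

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    """Classic perceptron learning rule for 2-input binary samples."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred            # 0 when correct, +/-1 when wrong
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# OR gate: linearly separable, so the perceptron converges
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_samples)
print([predict(w, b, x1, x2) for (x1, x2), _ in or_samples])  # [0, 1, 1, 1]
```

Deep networks stack many such units with non-linear activations and train them by backpropagation rather than this simple error-driven update.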

Read Complete Post on: toptal

6. Books for statistics, mathematics and data science:

The selection process for data scientists at Google gives higher priority to candidates with a strong background in statistics and mathematics. Not just Google: other top companies in the world (Amazon, Airbnb, Uber, etc.) also prefer candidates with strong fundamentals rather than mere know-how in data science.

If you too aspire to work for such top companies in the future, it is essential to develop a mathematical understanding of data science. Data science is simply the evolved version of statistics and mathematics, combined with programming and business logic. I’ve met many data scientists who struggle to explain predictive models statistically.

More than just deriving accuracy, it is important to understand and interpret every metric and calculation behind that accuracy. Remember, every single ‘variable’ has a story to tell. So, if nothing else, try to become a great story explorer!

Read Complete Post on: Analyticsvidhya

7. Hadoop installation using Ambari:

Apache Hadoop has become the de-facto software framework for reliable, scalable, distributed, large-scale computing. Unlike other computing systems, it brings computation to the data rather than sending data to the computation. Hadoop was created in 2006 at Yahoo by Doug Cutting, based on papers published by Google. As Hadoop has matured over the years, many new components and tools have been added to its ecosystem to enhance its usability and functionality: Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop, to name a few.

Read Complete Post on: Edupristine


  1. How to use data to make a hit TV show
  2. Solving big problems with Julia
  3. Best data visualization

That’s all for the March 2016 newsletter. Please leave your suggestions about the newsletter in the comments section. To see all dataaspirant newsletters, you can visit the monthly newsletter page.


Python data mining packages virtual environment setup in Ubuntu



Virtual Environment:

A virtual environment is an isolated Python environment which allows you to use different Python modules and versions without messing up the system installation.

Let’s understand virtual environments through their role in project development. In the world of Python projects it’s normal to use different Python libraries (wrappers) to get our work done, but it won’t be a happy ending all the time. Often we face environment issues where our Python application won’t run on new machines. This is because of dependency issues with the Python libraries on the new machine (on which we try to run our Python application). For a better understanding, suppose during the development phase of our Python application we used a pandas (the Python data analysis library) 0.18.0 function which was not there in pandas 0.17.1, and the new machine has pandas 0.17.1 installed. The application won’t run on the new machine because of the version difference.
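For instance (a hypothetical guard, not part of the original post), you can compare version strings numerically to detect such a mismatch before calling a newer API:

```python
def parse_version(version):
    """Turn a version string like '0.18.0' into (0, 18, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

installed = "0.17.1"   # version found on the new machine
required = "0.18.0"    # version the application was developed against
if parse_version(installed) < parse_version(required):
    print("pandas %s is too old; need at least %s" % (installed, required))
```

Comparing tuples of integers avoids the classic string-comparison trap where "0.9.0" would sort after "0.10.0".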

To overcome this, we create a Python environment which contains everything that a Python project (application) needs in order to run in an organised, isolated fashion.

So using a virtual environment is the recommended way of working with Python projects, regardless of how many you might be busy with.

In Ubuntu, creating a virtual environment is easy using virtualenv (a tool to create isolated Python environments).

About virtualenv:

Virtualenv helps solve project dependency conflicts by creating isolated environments which can contain all the goodies Python programmers need to develop their projects. A virtual environment created using this tool includes a fresh copy of the Python binary itself as well as a copy of the entire Python standard library.

Installing virtualenv:

$ sudo pip install virtualenv

We have now successfully installed virtualenv. Next, let’s create a virtual environment in which we will install the Python data mining packages.

Create virtual environment:

$ virtualenv dataaspirant_venv

virtualenv dataaspirant_venv will create a folder in the current directory which will contain the Python executable files, and a copy of the pip library which you can use to install other packages. The name of the virtual environment (in this case, it was dataaspirant_venv) can be anything; omitting the name will place the files in the current directory instead.

This creates a copy of Python in whichever directory you ran the command in, placing it in a folder named dataaspirant_venv.

Using the virtual environment:

To use the virtual environment, it needs to be activated:

$ source dataaspirant_venv/bin/activate

The name of the current virtual environment will now appear on the left of the prompt (e.g. (dataaspirant_venv)Your-Computer:your_project UserName$) to let you know that it’s active. From now on, any package that you install using pip will be placed in the dataaspirant_venv folder, isolated from the global Python installation.

Python Data mining packages Installation:

So let’s start installing the data mining Python packages one by one.

Numpy installation Command:

$ pip install numpy

Scipy Installation Command:

$ pip install scipy

Matplotlib Installation Command:

$ pip install matplotlib

Ipython Installation Command:

$ pip install ipython[all]

Pandas Installation Command:

$ pip install pandas

Statsmodel Installation Command:

$ pip install statsmodels

Scikit-learn Installation Command:

$ pip install scikit-learn
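After installing the packages above, a quick way to confirm they all import inside the active environment is a small helper like this (a sketch, not from the original post; note that scikit-learn imports as sklearn):

```python
import importlib.util

def check_packages(names):
    """Map each package name to True/False depending on whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(
    ["numpy", "scipy", "matplotlib", "IPython", "pandas", "statsmodels", "sklearn"]
)
for name, installed in status.items():
    print("%-12s %s" % (name, "OK" if installed else "MISSING"))
```

`importlib.util.find_spec` checks whether a module can be found without actually importing it, so this runs quickly even for heavy packages.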

Running Script File:

When the virtualenv is active, simply go to the directory where your Python script file is and run it in the usual way:

$ python your_script.py

Deactivate the virtual environment:

If you are done working in the virtual environment for the moment, you can deactivate it:

$ deactivate

This puts you back to the system’s default Python interpreter with all its installed libraries.

To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf dataaspirant_venv.)


I hope you liked today’s post. If you have any questions, feel free to comment below. If you want me to write on one specific topic, then do tell me in the comments below.

If you want to share your experience or opinions, you can say hello to us.

