Home | About | Data scientists Interviews | For beginners | Join us | Monthly newsletter


Blog Posts:

1. Step by step Kaggle competition tutorial:

In this article we are going to see how to go through a Kaggle competition step by step. The contest explored here is the San Francisco Crime Classification contest. The goal is to classify a crime occurrence knowing the time and place it happened.

Read Complete Post:  datanice Blog

2. Introduction to machine learning: 

Machine learning is an attempt to make things more intelligent. Most of us have come across terms like “Artificial Neural Networks”, which are an attempt to replicate the working of the human brain. Even something like this is not necessarily complex. At its heart, it is just multiplication and differentiation. Yes, maths is at it again, but it is no different from what you learned at school (this coming from a guy who is petrified of maths).

Read Complete Post:
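
To make the "multiplication and differentiation" remark concrete, here is a minimal sketch that fits a single weight by gradient descent. The toy data, the learning rate and the number of steps are all assumptions made up for illustration; they are not taken from the linked post.

```python
# Toy data following y = 3 * x (invented for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def loss_gradient(w):
    """Derivative of the mean squared error of the model y = w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def fit(learning_rate=0.01, steps=200):
    """Start from w = 0 and repeatedly step against the gradient."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w)
    return w

w = fit()
print(round(w, 3))  # prints 3.0
```

The prediction is a multiplication (`w * x`) and the learning step uses a derivative (`loss_gradient`). A real neural network repeats the same two ingredients over many weights.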

3. Baidu research chief Andrew Ng fixed on self-taught computers, self-driving cars:

Artificial-intelligence whiz Andrew Ng hangs his hat these days at a nondescript building in Sunnyvale that serves as the Silicon Valley outpost of the Chinese search giant Baidu.

Read Complete Post: Seattle times

4. Misleading modelling: overfitting, cross-validation, and the bias-variance trade-off:

In this post you will get to grips with what is perhaps the most essential concept in machine learning: the bias-variance trade-off. The main idea here is that you want to create models that are as good at prediction as possible but that are still applicable to new data (i.e. they are generalizable). The danger is that you can easily create models that overfit to the local noise in your specific dataset, which isn’t too helpful and leads to poor generalizability since the noise is random and therefore different in each dataset. Essentially, you want to create models that capture only the useful components of a dataset. On the other hand, models that generalize very well but are too inflexible to generate good predictions are the other extreme you want to avoid (this is called underfitting).

Read Complete Post: Cambridge coding
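
The trade-off described above can be sketched with a k-nearest-neighbour regressor, where k controls how flexible the model is. This is only an illustration under stated assumptions: the signal `y = 2 * x`, the noise level, the sample sizes and the choice of k-NN are all made up here and are not the method used in the linked post.

```python
import random

def make_data(n, rng):
    """Noisy samples of the made-up signal y = 2 * x on [0, 1]."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 2 * x + rng.gauss(0, 0.3)))
    return data

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN predictions on the given data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(40, rng)
test = make_data(40, rng)

# Columns: k, error on the training set, error on unseen data.
for k in (1, 5, 40):
    print(k, round(mse(train, train, k), 3), round(mse(train, test, k), 3))
```

With k = 1 the model memorises every training point, noise included, so its training error is zero (overfitting). With k = 40 it predicts the same average everywhere (underfitting). An intermediate k will typically do best on the unseen data.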

5. Association rules and the Apriori algorithm:

When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one’s needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways.

Read Complete Post: Annalyzin
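
Buying patterns like these are usually measured with support and confidence. Here is a minimal sketch; the baskets are invented for illustration, and `frequent_pairs` only shows the core Apriori idea of pruning infrequent items before combining them. It is not a full implementation of the algorithm.

```python
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"milk", "bread"},
    {"beer", "bread"},
    {"milk", "bread", "chips"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Estimated chance that a basket holding antecedent also holds consequent."""
    return support(antecedent | consequent) / support(antecedent)

def frequent_pairs(min_support):
    """Apriori idea: only pair up items that are frequent on their own."""
    items = sorted({item for basket in baskets for item in basket})
    frequent_items = [i for i in items if support({i}) >= min_support]
    return [
        set(pair)
        for pair in combinations(frequent_items, 2)
        if support(set(pair)) >= min_support
    ]

print(support({"beer", "chips"}))       # prints 0.4
print(frequent_pairs(min_support=0.4))  # the pairs {beer, chips} and {bread, milk}
```

In this toy data, two of the three baskets containing beer also contain chips, so `confidence({"beer"}, {"chips"})` is about 0.67.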

6. Churn prediction with PySpark using the MLlib and ML packages:

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals.

The prediction process is heavily data driven and often utilizes advanced machine learning techniques. In this post, we’ll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models – all with PySpark and its machine learning frameworks. We’ll also discuss the differences between two Apache Spark version 1.6.0 frameworks, MLlib and ML.

Read Complete Post: mapr

7. Data scientist keeps ranking top every best jobs list:

Data scientist is at, or near, the top of just about every “best jobs” survey, report, or study released in the past few years. Harvard Business Review named it the sexiest job of the 21st century. And with a median base salary of $96,000, data scientist and some engineering specialties are in a very small group of high-paying jobs that don’t require a medical or law degree. However, you’ll likely need more than a bachelor’s degree, as you’ll find out later in this article.

Read Complete Post: goodcall

8. Understand your machine learning data with descriptive statistics in Python:

Looking at the raw data can reveal insights that you cannot get any other way. It can also plant seeds that may later grow into ideas on how to better preprocess and handle the data for machine learning tasks.

Read Complete Post: machinelearningmastery
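
As a small taste of what descriptive statistics look like in Python, here is a sketch using only the standard-library `statistics` module. The `ages` column is invented for illustration, and the linked post may well use different tools.

```python
import statistics

# Hypothetical column of raw values, invented for illustration.
ages = [22, 25, 25, 31, 38, 45, 61]

summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the mean (about 35.29) sits above the median (31), a quick hint that the column is skewed by a few large values.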

9. Time series interventions and contribution:

This article illustrates the principles of an analysis of President George W. Bush’s job approval rating from January 2001 through September 2004, with disposable income excluded from the statistical model. To see a version complete with code and its description, visit

Presidents with a job approval rating of less than 50 percent are unlikely to be re-elected. During June, Bush’s job approval rating averaged 47 percent in five major polls.

Read Complete Post: gladwinanalytics

10. Deep neural networks creative deep learning art:

Are deep neural networks creative? It seems like a reasonable question. Google’s “Inceptionism” technique transforms images, iteratively modifying them to enhance the activation of specific neurons in a deep net. The images appear trippy, transforming rocks into buildings or leaves into insects. Another neural generative model, introduced by Leon Gatys of the University of Tubingen in Germany, can extract the style from one image (say a painting by Van Gogh), and apply it to the content of another image (say a photograph).

Read Complete Post: kdnuggets

Video Courses:

  1. Introduction to python for data science
  2. Data exploration with kaggle scripts

That’s all for the April 2016 newsletter. Please leave your suggestions for the newsletter in the comment section. To see all dataaspirant newsletters, visit the monthly newsletter page.



Blog Posts:

1. Cricket survival analysis:

Survival analysis studies how much time passes before an event, such as death or failure, occurs for each observation in a sample. It is applied in mechanical engineering to predict system failures and in medical sciences to predict patient outcomes.

In this post I’ll be using Survival Analysis for a more lighthearted application–to analyze the career lengths of Cricket players.

Read Complete Post on:
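
One common tool in survival analysis is the Kaplan-Meier estimator. Here is a minimal sketch of it, with hypothetical career lengths invented for illustration. Whether the linked post uses this estimator is an assumption.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: how long each subject was followed.
    observed:  True if the event happened, False if the subject was censored
               (for a cricketer: still playing when the data was collected).
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        at_risk = sum(d >= t for d in durations)
        events = sum(d == t and o for d, o in zip(durations, observed))
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical career lengths in years, invented for illustration.
durations = [2, 3, 3, 5, 8, 8, 10, 12]
observed = [True, True, False, True, True, True, False, True]

for time, probability in kaplan_meier(durations, observed):
    print(time, round(probability, 3))
```

Censored players still count as "at risk" up to the time they leave the data, which is why the estimate uses them without treating them as retirements.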

2. Complete tutorial to learn data science with Python from scratch:

Python was originally a general-purpose language. Over the years, with strong community support, it has gained dedicated libraries for data analysis and predictive modeling.

Due to the lack of resources on Python for data science, I decided to create this tutorial to help others learn Python faster. In this tutorial, we will take bite-sized pieces of information about how to use Python for data analysis, chew on them until we are comfortable, and practice them on our own.

Read Complete Post on: analyticsvidhya

3. Markov Chains Explained Visually:

Markov chains, named after Andrey Markov, are mathematical systems that hop from one “state” (a situation or set of values) to another. For example, if you made a Markov chain model of a baby’s behavior, you might include “playing”, “eating”, “sleeping”, and “crying” as states, which together with other behaviors could form a “state space”: a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probability of hopping, or “transitioning”, from one state to any other state, e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.

Read Complete Post on: setosa
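
The baby example can be sketched in a few lines of Python. The transition probabilities below are hypothetical numbers invented for illustration; only the four state names come from the excerpt.

```python
import random

# Hypothetical transition probabilities; each row sums to 1.
transitions = {
    "playing":  {"playing": 0.5, "eating": 0.2, "sleeping": 0.2, "crying": 0.1},
    "eating":   {"playing": 0.3, "eating": 0.1, "sleeping": 0.5, "crying": 0.1},
    "sleeping": {"playing": 0.4, "eating": 0.3, "sleeping": 0.2, "crying": 0.1},
    "crying":   {"playing": 0.1, "eating": 0.4, "sleeping": 0.3, "crying": 0.2},
}

def step(state, rng):
    """Sample the next state using the current state's row of probabilities."""
    row = transitions[state]
    return rng.choices(list(row), weights=list(row.values()))[0]

def simulate(start, n_steps, seed=0):
    """Hop n_steps times, starting from the given state."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

print(simulate("playing", 10))
```

The next state depends only on the current one (`path[-1]`), never on the earlier history. That is the defining property of a Markov chain.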

4. Data scientist Role:

The age of big data is upon us, and it’s here to stay. With more data being collected than ever before, extracting value from this data is only going to become more intricate and demanding as time goes on. The logic behind the big data economy is shaping our personal lives in ways that we probably can’t even conceive or predict; every electronic move that we make produces a statistic and insight into our life.

As participants in the consumer economy, we are mined for data when we connect to any website or electronic service, and a data scientist is there to collect, clean, analyse and predict the data that we provide by using a combination of computer science, statistical analysis and intricate business knowledge.

As we can see, this responsibility is a combination of multiple skill sets and expertise compared to a typical Big Data Developer or Business Analyst.

Read Complete Post on: globalbigdataconference

5. An introduction to deep learning from perceptrons to deep networks:

In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyond the academic world with major players like Google, Microsoft, and Facebook creating their own research teams and making some impressive acquisitions.

Some of this can be attributed to the abundance of raw data generated by social network users, much of which needs to be analyzed, as well as to the cheap computational power available via GPGPUs.

Read Complete Post on: toptal

6. Books for statistics mathematics data science:

The selection process of data scientists at Google gives higher priority to candidates with strong background in statistics and mathematics. Not just Google, other top companies (Amazon, Airbnb, Uber etc) in the world also prefer candidates with strong fundamentals rather than mere know-how in data science.

If you too aspire to work for such top companies in future, it is essential for you to develop a mathematical understanding of data science. Data science is simply the evolved version of statistics and mathematics, combined with programming and business logic. I’ve met many data scientists who struggle to explain predictive models statistically.

More than just deriving accuracy, understanding and interpreting every metric and calculation behind that accuracy is important. Remember, every single “variable” has a story to tell. So, if nothing else, try to become a great story explorer!

Read Complete Post on: Analyticsvidhya

7. Hadoop installation using Ambari:

Apache Hadoop has become a de facto software framework for reliable, scalable, distributed, and large-scale computing. Unlike other computing systems, it brings computation to the data rather than sending data to the computation. Hadoop was created in 2006 at Yahoo by Doug Cutting, based on a paper published by Google. As Hadoop has matured over the years, many new components and tools have been added to its ecosystem to enhance its usability and functionality: HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop, to name a few.

Read Complete Post on: Edupristine


  1. How to use data to make a hit tv show
  2. Solving  Big Problems with Julia
  3. Best Data visualization

That’s all for the March 2016 newsletter. Please leave your suggestions for the newsletter in the comment section. To see all dataaspirant newsletters, visit the monthly newsletter page.


DataAspirant July 2015 newsletter


For this July newsletter, our team rounded up the best blogs for anyone interested in learning more about data science. Whether you are experienced in data science or have only just heard of the field, these blogs provide enough detail and context for you to understand what you’re reading. We hope you enjoy the July dataaspirant newsletter.

Blog Posts:

1. Can deep learning help find the perfect girl:

When a Machine Learning PhD student at University of Montreal starts using Tinder, he soon realizes that something is missing in the dating app – the ability to predict to which girls he is attracted. Harm de Vries applies Deep Learning to assist in the pursuit of the perfect match.

Read Complete Post On : kdnuggets Blog… 

2. Data scientists to follow on GitHub:

The lives of people on GitHub don’t appear as tempting as those you would observe on other platforms, but if you love coding, programming, and data science, you’ll surely enjoy the company of 9 million users on this platform!

Following influencers is usually a good practice. It has helped me in multiple ways.

Read Complete Post On :

3. What is data science? Why study data science?

“Data Science is extraction of knowledge from data”.

“By combining aspects of statistics, computer science, applied mathematics, and visualization, data science can turn the vast amounts of data the digital age generates into new insights and new knowledge”.

Read Complete Post On :  welcomedata

4. What is deep learning?

Imagine a baby coming into the world. From the moment she opens her eyes, there is a world of learning to absorb. Every action will create an imprint on her brain, and she will not stop learning until the moment she leaves the world. She develops because of her environment, because of the things she studied, the books she read, the work she carried out, the people she talked with, etc.

Now, imagine an artificial neural network – that learns and draws its own conclusions. Similar in some ways to a human brain, but far easier to program, and with a limitless potential for growth. It analyses arrays of data – looking at inputs and outputs. If a cat is thrown into some water, it won’t be too happy. A deep learning array could work that one out without breaking a sweat.

Read Complete Post On :

5. Introduction to big data with apache spark:

With the advent of new technologies, there has been an increase in the number of data sources. Web server logs, machine log files, user activity on social media, recordings of users’ clicks on websites, and many other data sources have caused an exponential growth of data. Individually this content may not be very large, but when taken across billions of users, it produces terabytes or petabytes of data. For example, Facebook is collecting 500 terabytes (TB) of data every day with more than 950 million users. Such a massive amount of data, which is not only structured but also unstructured and semi-structured, falls under the umbrella known as Big Data.

Read Complete Post On : rideondata blog

6. Data manipulation primitives in R and Python:

Both R and Python are incredibly good tools to manipulate your data, and their integration is becoming increasingly important. The latest tool for data manipulation in R is dplyr, whilst Python relies on pandas.

This blog post shows you the fundamental primitives to manipulate your dataframes using both libraries, highlighting their major advantages and disadvantages.

Read Complete Post On : blog

7. Brain vs deep learning singularity:

This blog post is complex as it arcs over multiple topics in order to unify them into a coherent framework of thought. I have tried to make this article as readable as possible, but I might have not succeeded in all places. Thus, if you find yourself in an unclear passage it might become clearer a few paragraphs down the road where I pick up the thought again and integrate it with another discipline.

Read Complete Post On : timdettmers blog

8. Classification trees using R:

The goal of classification trees is to predict or explain responses on a categorical dependent variable, such as species of trees or customer segmentation classes. As such, the available techniques have much in common with the techniques used in the more traditional methods of Discriminant Analysis, Cluster Analysis, Nonparametric Statistics, and Nonlinear Estimation. The flexibility of classification trees makes them a very attractive analysis option. However, as with all methods, I do not recommend their use to the exclusion of more traditional methods. Indeed, when the typically more stringent theoretical and distributional assumptions of more traditional methods are met, the traditional methods may be preferable. But as an exploratory technique, or as a technique of last resort when traditional methods fail, classification trees are, in the opinion of many researchers, unsurpassed.

Read Complete Post On :
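
The linked post works in R. Purely as an illustration of how a classification tree picks a split, here is a minimal Python sketch that scores candidate thresholds with Gini impurity. The leaf measurements and species labels are invented, and the choice of Gini as the criterion is an assumption.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Find the split 'value <= threshold' with the lowest weighted impurity."""
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical leaf lengths (cm) and tree species, invented for illustration.
lengths = [2.0, 2.5, 3.0, 6.0, 6.5, 7.0]
species = ["birch", "birch", "birch", "oak", "oak", "oak"]
print(best_threshold(lengths, species))  # prints (3.0, 0.0)
```

A full tree repeats this search on each side of the split, and across every predictor, until the groups are pure enough or a stopping rule is reached.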


1. Sparkling pandas

2. Data mining technique clustering for bank data set

3. How to Become a Data Scientist

4. How to Interview a Data Scientist

5. Scalable, Distributed, Machine Learning for Big Data


That’s all for the July 2015 newsletter. Please leave your suggestions for the newsletter in the comment box so that we can improve next month’s newsletter. To see all dataaspirant newsletters, visit the monthly newsletter page. Subscribe to our blog to get our newsletter in your inbox every month.

