An Important Skill in Machine Learning & Data Analysis

What is Exploratory Data Analysis?

EDA is a visual and statistical process that lets us take a first look at the data before the formal analysis begins. It lets us test the assumptions we hold about the data, confirming or refuting our prior beliefs and biases. It lays the foundation for the analysis, so that our results rest on what the data actually shows rather than on what we expected to find. In a way, it’s a quality check for our predictions.
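As a rough illustration of testing a prior belief against the data, here is a minimal sketch using Pandas. The column name, the values, and the threshold of 120 are all invented for this example and do not come from the course.

```python
import pandas as pd

# Toy values invented for illustration; not taken from any real dataset.
df = pd.DataFrame({"age": [23, 35, 41, 29, 230]})

# Statistical glimpse: count, mean, spread, min, max, and quartiles.
summary = df["age"].describe()
print(summary)

# Prior belief to test: every recorded age is below 120.
# Any rows returned here contradict that belief and deserve a closer look.
violations = df[df["age"] >= 120]
print(violations)
```

Here the summary alone already hints at a problem, because the maximum sits far above the rest of the values, and the explicit check pins down which row is responsible.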

Exploratory Data Analysis is the first course in the Machine Learning Program that introduces learners to a broad range of Machine Learning concepts, applications, challenges, and solutions, while using interesting real-life datasets.

Why is it important?

As any data scientist would agree, the most challenging part of any data analysis is obtaining good-quality data to work with. Nothing is served to us on a silver platter, and data comes in different shapes and formats. It can be structured or unstructured, it may contain errors or be biased, it may have missing fields, and it may be stored in a format different from what an untrained eye would perceive. For example, imported data very often contains a time stamp. To a human, it is an understandable format that can be interpreted. A machine, however, has to be told what it means, and the data usually needs to be parsed and converted into numeric values first. There are also different date-time conventions depending on the country (e.g., Canada versus the USA), metric versus imperial units, and many other features of the data that need to be recognized before we start the analysis. Therefore, the first step before performing any analysis is to get really well acquainted with your data!
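The issues described above (ambiguous dates, missing fields, mixed units) can be sketched in a few lines of Pandas. This is only an illustration under stated assumptions: the column names and values are hypothetical, and the day-first versus month-first formats stand in for whichever conventions a real dataset happens to use.

```python
import pandas as pd

# Hypothetical raw records, invented purely for illustration.
raw = pd.DataFrame({
    "recorded_at": ["03/04/2021", "15/04/2021", None],
    "distance_miles": [10.0, None, 2.5],
})

# Count missing fields per column before doing anything else.
missing_per_column = raw.isna().sum()

# "03/04/2021" is 3 April under a day-first convention and 4 March under
# a month-first one, so the expected format is stated explicitly.
day_first = pd.to_datetime(raw["recorded_at"], format="%d/%m/%Y")
month_first = pd.to_datetime("03/04/2021", format="%m/%d/%Y")

# Turn the parsed dates into plain numbers a model can work with.
raw["month"] = day_first.dt.month

# Convert imperial units to metric so all distances share one system.
raw["distance_km"] = raw["distance_miles"] * 1.609344

print(missing_per_column)
print(day_first.iloc[0], month_first)
```

Stating the format explicitly is a deliberate choice here: letting the parser guess can silently swap day and month for dates where both readings are valid.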

The Secret to Becoming an Expert with Data

IBM Exploratory Data Analysis for Machine Learning

This course will teach you to ‘see’ and ‘feel’ the data, as well as to transform it into an analysis-ready format. It is an introductory-level course, so no prior knowledge is required, and it is a good starting point if you are interested in getting into the world of Machine Learning. All you need is a computer with internet access, your curiosity, and an eagerness to learn and to apply what you have learned.

If you live in Canada, you might be interested in gasoline prices in different cities. If you are an insurance actuary, you need to analyze the financial risks you take on based on your clients' information. Whatever the case, you will be able to do your own analysis and confirm or disprove existing information.

What's Included?

The course contains videos and reading material, as well as many interactive practice labs where learners can explore and apply the skills they have learned. You will use the Python language in Jupyter Notebook, in a cloud-based Skills Network environment that is pre-set for you with the packages and libraries ready to use. The course introduces some of the most common data analysis and visualization libraries, such as Pandas, Seaborn, and Matplotlib, and uses them to demonstrate various EDA techniques on real-life datasets.

Start Now:

IBM Exploratory Data Analysis for Machine Learning

About The Author

Svitlana (Lana) Kramar

I am a Data Science Intern at IBM and a Master's student in Data Science and Analytics at the University of Calgary. I enjoy travelling and learning new languages and cultures, and I love sharing my passion for Data Science.