Python for Data Science - November 2-6, 2020



Duration: 5 days


Live Virtual Class

  • Overview

    This course introduces the Python language to students who want to use it as a tool for data science work. The goal is to become proficient enough in Python to use powerful data science packages such as Pandas and matplotlib.

    The course is a comprehensive introduction to Python programming, with a focus on using the Pandas library to store data in DataFrames and on plotting portions of that data with matplotlib. In addition to data visualization, you will learn how to use Pandas to import and filter data. The course also covers typical data science skills such as data interpretation and analysis.
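    As a rough taste of the import-and-filter workflow described above, here is a minimal sketch. The inline CSV text, the column names, and the helper functions load_sales and filter_year are hypothetical and are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
CSV_TEXT = """city,year,sales
Austin,2019,120
Austin,2020,150
Boston,2019,90
Boston,2020,80
"""


def load_sales(csv_source):
    """Read CSV data (a path or file-like object) into a DataFrame."""
    return pd.read_csv(csv_source)


def filter_year(df, year):
    """Keep only the rows whose 'year' column equals the given year."""
    return df[df["year"] == year].reset_index(drop=True)


sales = load_sales(io.StringIO(CSV_TEXT))
sales_2020 = filter_year(sales, 2020)
print(sales_2020)
```

    The Boolean mask df["year"] == year is the usual Pandas filtering idiom; reset_index(drop=True) is used here only so the filtered rows are renumbered from zero.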

  • Who Should Take This Course


    This course is suitable for data analysts, data scientists, data engineers, and developers.


    Students should have basic proficiency in at least one programming language. Prerequisite skills include an understanding of data types, Boolean logic, control flow, and the basics of collections such as arrays or hash tables. Experience using Excel for data manipulation is helpful.

  • Course Outline

    Getting Around in Python

    · Using Python at the Command Line

    · Running the Interactive Shell

    · Using Jupyter Notebooks

    Jupyter Notebook Basics

    · Cell Types

    · Edit and Command Mode

    · Running Cells

    · Output

    · Restarting the Kernel

    · Exporting the Notebook

    · Cell and Line Magics

    Python Basics

    · Comments, Indenting, print()

    · Variables

    · Types

    · Operators

    · Control Flow


    · Lists

    · Tuples

    · Sets

    · Dictionaries


    · List and Set Type Comprehensions

    · Comprehensions as Generator Expressions

    Functions and Lambda Expressions

    · Built-in Functions

    · User-Defined Functions

    · Anonymous Inline Functions

    Using Modules

    · Importing and Selective Importing

    · Properties

    · Methods

    · random and math Modules

    Data Sources and Formats

    · CSV, TSV

    · JSON

    · SQL

    · Others: XML, YAML, Splunk

    Using NumPy

    · ndarray

    · Indexing and Slicing

    · Masking and Broadcasting

    · Sorting

    Pandas Basics

    · Why Pandas?

    · Series

    · DataFrames

    · Populating DataFrames

    · Importing CSV, Excel, SQL Data

    · DataFrame Columns and Cells

    · DataFrame Retrieval

    Pandas and Data Analysis

    · Functions on DataFrames

    · Mapping

    · Using Lambdas

    · Sorting

    · Statistics

    · Merging and Concatenating DataFrames

    · Data Cleaning

    · Data Analysis

    · Groupby

    · Aggregate Functions

    Data Visualization

    · Plotting with matplotlib

    · Enhancing Visualizations with seaborn
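
    To give a sense of the Groupby and Aggregate Functions topics listed under Pandas and Data Analysis, the following is a minimal sketch. The orders data, its column names, and the helper summarize_by_region are hypothetical and are not part of the course materials.

```python
import pandas as pd

# Hypothetical order data built in memory for illustration.
orders = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "East"],
        "amount": [100.0, 250.0, 50.0, 150.0, 25.0],
    }
)


def summarize_by_region(df):
    """Group rows by region and compute sum, mean, and count of amount."""
    return df.groupby("region")["amount"].agg(["sum", "mean", "count"])


summary = summarize_by_region(orders)
print(summary)
```

    Passing a list of function names to agg produces one output column per aggregate, indexed by the grouping key.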