Pandas - DataFrame

By xngo on March 3, 2020

Pandas's dataframe is a 2-dimensional labelled data structure with columns of potentially different types. Think of it as a spreadsheet table or a SQL table. However, it requires 1 of the columns to be an index column.

For the example below, the Date column will be used as an index column.

Date Stock Open High Low Close Volume
2016-09-29 KESM 7.92 7.98 7.92 7.97 149400
2016-09-30 KESM 7.96 7.97 7.84 7.9 29900
2016-10-04 KESM 7.8 7.94 7.8 7.93 99900
2016-10-05 KESM 7.93 7.95 7.89 7.93 77500
2016-10-06 KESM 7.93 7.93 7.89 7.92 130600
2016-10-07 KESM 7.91 7.94 7.91 7.92 103000

Feed data to DataFrame

In the example below, I use read_csv() to read data from a csv file. However, I use StringIO() to simulate my data string as a file. Then, I specify that the index column is the Date column.

#!/usr/bin/python3
 
from io import StringIO
import pandas as pd
 
data="""Date,Stock,Open,High,Low,Close,Volume
2016-09-29,KESM,7.92,7.98,7.92,7.97,149400
2016-09-30,KESM,7.96,7.97,7.84,7.9,29900
2016-10-04,KESM,7.8,7.94,7.8,7.93,99900
2016-10-05,KESM,7.93,7.95,7.89,7.93,77500
2016-10-06,KESM,7.93,7.93,7.89,7.92,130600
2016-10-07,KESM,7.91,7.94,7.91,7.92,103000"""
 
df = pd.read_csv(StringIO(data), index_col='Date', parse_dates=True)
 
# Print data frame.
print(df)
 
# Print data type of all columns.
print(df.dtypes)
           Stock  Open  High   Low  Close  Volume
Date                                             
2016-09-29  KESM  7.92  7.98  7.92   7.97  149400
2016-09-30  KESM  7.96  7.97  7.84   7.90   29900
2016-10-04  KESM  7.80  7.94  7.80   7.93   99900
2016-10-05  KESM  7.93  7.95  7.89   7.93   77500
2016-10-06  KESM  7.93  7.93  7.89   7.92  130600
2016-10-07  KESM  7.91  7.94  7.91   7.92  103000
 
Stock      object
Open      float64
High      float64
Low       float64
Close     float64
Volume      int64
dtype: object

Access DataFrame data

You can access each column by calling:

  • df['Stock']
  • df['Open']
  • df['High']
  • df[' Low']
  • df['Close']
  • df['Volume']
# Print Low column values.
print(df['Low']['2016-09-30'])
print(df['Low'][1])
print(df['Low'])
7.84
7.84
 
Date
2016-09-29    7.92
2016-09-30    7.84
2016-10-04    7.80
2016-10-05    7.89
2016-10-06    7.89
2016-10-07    7.91
Name: Low, dtype: float64

However, for the index column(Date), you have to use df.index.

# Print index name.
print(df.index.name)
 
# Print index value at specific position.
print(df.index[3])
 
# Print index values.
print(df.index)
Date
 
2016-10-05 00:00:00
 
DatetimeIndex(['2016-09-29', '2016-09-30', '2016-10-04', '2016-10-05',
               '2016-10-06', '2016-10-07'],
              dtype='datetime64[ns]', name='Date', freq=None)

Prepare DataFrame for candlestick_ochl()

The candlestick_ochl(ax, quotes, ...) function can create candlestick chart but the quotes parameter requires time elements to be in float days format. So, we need to convert the Dates index column into float days format using date2num.

#!/usr/bin/python3
from io import StringIO
import pandas as pd
 
import matplotlib.dates as mdates
 
## Load data to DataFrame.
data='''Date,Stock,Open,High,Low,Close,Volume
2016-09-29,KESM,7.92,7.98,7.92,7.97,149400
2016-09-30,KESM,7.96,7.97,7.84,7.9,29900
2016-10-04,KESM,7.8,7.94,7.8,7.93,99900
2016-10-05,KESM,7.93,7.95,7.89,7.93,77500
2016-10-06,KESM,7.93,7.93,7.89,7.92,130600
2016-10-07,KESM,7.91,7.94,7.91,7.92,103000'''
 
df = pd.read_csv(StringIO(data), index_col='Date', parse_dates=True)
 
## Convert dates to be in float days format.
##  This is required for candlestick_ochl().
df.index = df.index.map(mdates.date2num)
 
## Re-order columns.
quotes = df[['Open', 'Close', 'High', 'Low', 'Volume']]
 
# Print the outcome.
print(quotes)
          Open  Close  High   Low  Volume
736236.0  7.92   7.97  7.98  7.92  149400
736237.0  7.96   7.90  7.97  7.84   29900
736241.0  7.80   7.93  7.94  7.80   99900
736242.0  7.93   7.93  7.95  7.89   77500
736243.0  7.93   7.92  7.93  7.89  130600
736244.0  7.91   7.92  7.94  7.91  103000

Github

  • https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe-load-data.py
  • https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe-access-data.py
  • https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe.py

About the author

Xuan Ngo is the founder of OpenWritings.net. He currently lives in Montreal, Canada. He loves to write about programming and open source subjects.