Pandas's dataframe is a 2-dimensional labelled data structure with columns of potentially different types. Think of it as a spreadsheet table or a SQL table. However, it requires 1 of the columns to be an index column.
For the example below, the Date column will be used as an index column.
Date | Stock | Open | High | Low | Close | Volume |
---|---|---|---|---|---|---|
2016-09-29 | KESM | 7.92 | 7.98 | 7.92 | 7.97 | 149400 |
2016-09-30 | KESM | 7.96 | 7.97 | 7.84 | 7.9 | 29900 |
2016-10-04 | KESM | 7.8 | 7.94 | 7.8 | 7.93 | 99900 |
2016-10-05 | KESM | 7.93 | 7.95 | 7.89 | 7.93 | 77500 |
2016-10-06 | KESM | 7.93 | 7.93 | 7.89 | 7.92 | 130600 |
2016-10-07 | KESM | 7.91 | 7.94 | 7.91 | 7.92 | 103000 |
Feed data to DataFrame
In the example below, I use read_csv() to read data from a csv file. However, I use StringIO() to simulate my data string as a file. Then, I specify that the index column is the Date column.
#!/usr/bin/python3 from io import StringIO import pandas as pd data="""Date,Stock,Open,High,Low,Close,Volume 2016-09-29,KESM,7.92,7.98,7.92,7.97,149400 2016-09-30,KESM,7.96,7.97,7.84,7.9,29900 2016-10-04,KESM,7.8,7.94,7.8,7.93,99900 2016-10-05,KESM,7.93,7.95,7.89,7.93,77500 2016-10-06,KESM,7.93,7.93,7.89,7.92,130600 2016-10-07,KESM,7.91,7.94,7.91,7.92,103000""" df = pd.read_csv(StringIO(data), index_col='Date', parse_dates=True) # Print data frame. print(df) # Print data type of all columns. print(df.dtypes)
Stock Open High Low Close Volume Date 2016-09-29 KESM 7.92 7.98 7.92 7.97 149400 2016-09-30 KESM 7.96 7.97 7.84 7.90 29900 2016-10-04 KESM 7.80 7.94 7.80 7.93 99900 2016-10-05 KESM 7.93 7.95 7.89 7.93 77500 2016-10-06 KESM 7.93 7.93 7.89 7.92 130600 2016-10-07 KESM 7.91 7.94 7.91 7.92 103000 Stock object Open float64 High float64 Low float64 Close float64 Volume int64 dtype: object
Access DataFrame data
You can access each column by calling:
- df['Stock']
- df['Open']
- df['High']
- df[' Low']
- df['Close']
- df['Volume']
# Print Low column values. print(df['Low']['2016-09-30']) print(df['Low'][1]) print(df['Low'])
7.84 7.84 Date 2016-09-29 7.92 2016-09-30 7.84 2016-10-04 7.80 2016-10-05 7.89 2016-10-06 7.89 2016-10-07 7.91 Name: Low, dtype: float64
However, for the index column(Date), you have to use df.index.
# Print index name. print(df.index.name) # Print index value at specific position. print(df.index[3]) # Print index values. print(df.index)
Date 2016-10-05 00:00:00 DatetimeIndex(['2016-09-29', '2016-09-30', '2016-10-04', '2016-10-05', '2016-10-06', '2016-10-07'], dtype='datetime64[ns]', name='Date', freq=None)
Prepare DataFrame for candlestick_ochl()
The candlestick_ochl(ax, quotes, ...) function can create candlestick chart but the quotes parameter requires time elements to be in float days format. So, we need to convert the Dates index column into float days format using date2num.
#!/usr/bin/python3 from io import StringIO import pandas as pd import matplotlib.dates as mdates ## Load data to DataFrame. data='''Date,Stock,Open,High,Low,Close,Volume 2016-09-29,KESM,7.92,7.98,7.92,7.97,149400 2016-09-30,KESM,7.96,7.97,7.84,7.9,29900 2016-10-04,KESM,7.8,7.94,7.8,7.93,99900 2016-10-05,KESM,7.93,7.95,7.89,7.93,77500 2016-10-06,KESM,7.93,7.93,7.89,7.92,130600 2016-10-07,KESM,7.91,7.94,7.91,7.92,103000''' df = pd.read_csv(StringIO(data), index_col='Date', parse_dates=True) ## Convert dates to be in float days format. ## This is required for candlestick_ochl(). df.index = df.index.map(mdates.date2num) ## Re-order columns. quotes = df[['Open', 'Close', 'High', 'Low', 'Volume']] # Print the outcome. print(quotes)
Open Close High Low Volume 736236.0 7.92 7.97 7.98 7.92 149400 736237.0 7.96 7.90 7.97 7.84 29900 736241.0 7.80 7.93 7.94 7.80 99900 736242.0 7.93 7.93 7.95 7.89 77500 736243.0 7.93 7.92 7.93 7.89 130600 736244.0 7.91 7.92 7.94 7.91 103000
Github
- https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe-load-data.py
- https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe-access-data.py
- https://github.com/xuanngo2001/python-examples/blob/master/pandas/pandas-dataframe.py