Overview
In mobile device, disk space is limited. Therefore, it is important to reduce the size of every files in your applications. In my case, for my Android app iTransit, the culprit is the SQLite database file size. It is 254MB. It contains all the time schedules of Montreal bus and metro.
Compact SQLite
By default, SQLite doesn't automatically compact itself. This means that even if you delete a big table, the database file size will not shrink. The space occupied by the big table is simply marked as available for reuse. This in turn allow you to recover deleted data later on, if needed.
To ask SQLite to compact itself automatically, run the following parameters:
PRAGMA auto_vacuum = FULL; VACUUM;
Convert to the correct data type
When I imported all my CSV files into SQLite, all columns are defined as TEXT. Then, I converted each column to the correct data type. The database file shrink to 174MB from 254MB, saving 80MB. TEXT field requires more space than INTEGER field.
List total row of every tables.
Let's find out which tables are the biggest. So, we can focus on reducing them. The following Bash script will show a list of total row of every tables
# NOTE: Change MY_DATABASE_FILE.db to your database filename. db_file="MY_DATABASE_FILE.db" row_count_results="" while IFS='' read -r table_name || [[ -n "${table_name}" ]]; do row_count=$(echo -e "SELECT COUNT(*) FROM ${table_name}" | sqlite3 "${db_file}") row_count_results="${row_count_results}${table_name}|${row_count}\n" done < <( echo ".tables" | sqlite3 "${db_file}" | xargs -n1 echo ) echo -e "${row_count_results}" | sort -t '|' -k2,2n | awk NF |\ awk -F '|' '{printf "%-15s%9'"'"'d\n", $1, $2}'
Here is an example of the output from my database file:
agency 1 feed_info 1 fare_attributes 2 calendar 39 calendar_dates 58 fare_rules 226 routes 226 frequencies 867 stops 9,208 trips 44,985 stop_times 1,705,117
My biggest table is stop_times
, with 1.7 million rows. So, I will focus to reduce this table.
Change design to reduce number of rows
Each bus and metro departure time is stored as 1 row in stop_times
table. Here is an excerpt of the data of stop_times
table.
trip_id departure_time stop_id stop_sequence ---------- -------------- ---------- ------------- 186813279 06:00:00 51095 1 186813280 06:54:00 51095 36 186813281 07:00:00 51095 1 186813282 07:54:00 51095 36 186813283 08:00:00 51095 1 186813284 08:56:00 51095 36
For 6AM, we 1 row for 0-minute and another 1 for 54-minute. Why not merging both rows into 1. And ditch the number of seconds part. I don't need to display the seconds.
To merge rows, I have to:
- Breakdown
departure_time
into 2 columns:hour
andminutes
- Merge rows using
GROUP BY hour
and concatenate all minutes into 1 row usingGROUP_CONCAT()
function.
The stop_times
table shrink to 650K rows from 1.7 millions rows. For step-by-step instructions, see Merge rows in SQLite.
Final size
I also delete useless tables and columns. My final database file shrink to 34MB from 254MB.