Reduce SQLite database size for Android

By xngo on March 4, 2019

Overview

In mobile device, disk space is limited. Therefore, it is important to reduce the size of every files in your applications. In my case, for my Android app iTransit, the culprit is the SQLite database file size. It is 254MB. It contains all the time schedules of Montreal bus and metro.

Compact SQLite

By default, SQLite doesn't automatically compact itself. This means that even if you delete a big table, the database file size will not shrink. The space occupied by the big table is simply marked as available for reuse. This in turn allow you to recover deleted data later on, if needed.

To ask SQLite to compact itself automatically, run the following parameters:

PRAGMA auto_vacuum = FULL;
VACUUM;

Convert to the correct data type

When I imported all my CSV files into SQLite, all columns are defined as TEXT. Then, I converted each column to the correct data type. The database file shrink to 174MB from 254MB, saving 80MB. TEXT field requires more space than INTEGER field.

List total row of every tables.

Let's find out which tables are the biggest. So, we can focus on reducing them. The following Bash script will show a list of total row of every tables

# NOTE: Change MY_DATABASE_FILE.db to your database filename.
db_file="MY_DATABASE_FILE.db"
row_count_results=""
while IFS='' read -r table_name || [[ -n "${table_name}" ]]; do
 
    row_count=$(echo -e "SELECT COUNT(*) FROM ${table_name}" | sqlite3 "${db_file}")
    row_count_results="${row_count_results}${table_name}|${row_count}\n"
done < <( echo ".tables" | sqlite3 "${db_file}" | xargs -n1 echo )
 
echo -e "${row_count_results}" | sort -t '|' -k2,2n | awk NF |\
        awk -F '|' '{printf "%-15s%9'"'"'d\n", $1, $2}'

Here is an example of the output from my database file:

agency                 1
feed_info              1
fare_attributes        2
calendar              39
calendar_dates        58
fare_rules           226
routes               226
frequencies          867
stops              9,208
trips             44,985
stop_times     1,705,117

My biggest table is stop_times, with 1.7 million rows. So, I will focus to reduce this table.

Change design to reduce number of rows

Each bus and metro departure time is stored as 1 row in stop_times table. Here is an excerpt of the data of stop_times table.

trip_id     departure_time  stop_id     stop_sequence
----------  --------------  ----------  -------------
186813279   06:00:00        51095       1            
186813280   06:54:00        51095       36           
186813281   07:00:00        51095       1            
186813282   07:54:00        51095       36           
186813283   08:00:00        51095       1            
186813284   08:56:00        51095       36           

For 6AM, we 1 row for 0-minute and another 1 for 54-minute. Why not merging both rows into 1. And ditch the number of seconds part. I don't need to display the seconds.

To merge rows, I have to:

  1. Breakdown departure_time into 2 columns: hour and minutes
  2. Merge rows using GROUP BY hour and concatenate all minutes into 1 row using GROUP_CONCAT() function.

The stop_times table shrink to 650K rows from 1.7 millions rows. For step-by-step instructions, see Merge rows in SQLite.

Final size

I also delete useless tables and columns. My final database file shrink to 34MB from 254MB.

About the author

Xuan Ngo is the founder of OpenWritings.net. He currently lives in Montreal, Canada. He loves to write about programming and open source subjects.