searchcode.com's SQLite database is probably bigger than yours


-rw-r--r-- 1 searchcode searchcode 6.4T Feb 17 04:30 searchcode.db

History

SQLite "database is locked"

dbRead, _ := connectSqliteDb(".", "dbname")
defer dbRead.Close()
dbRead.SetMaxOpenConns(runtime.NumCPU()) // readers can fan out

dbWrite, _ := connectSqliteDb(".", "dbname")
defer dbWrite.Close()
dbWrite.SetMaxOpenConns(1) // a single writer, so writes never contend

SQLite "database is locked"

func connectSqliteDb(location, name string) (*sql.DB, error) {
	db, err := sql.Open("sqlite", fmt.Sprintf("%s.db?_busy_timeout=5000", path.Join(location, name)))
	if err != nil {
		return nil, err
	}

	_, err = db.Exec(`
pragma journal_mode = wal;
pragma synchronous = normal;
pragma temp_store = memory;
pragma mmap_size = 268435456;
pragma foreign_keys = on;
pragma busy_timeout = 5000;`)
	if err != nil {
		slog.Warn("pragma issue", "err", err.Error())
	}

	return db, nil
}
You can just do things, such as lowercase SQL

SQLite cross compiling

GOOS=linux GOARCH=amd64 go build -ldflags="-s -w"

https://github.com/mattn/go-sqlite3

https://modernc.org/sqlite <---- use this
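mattn/go-sqlite3 wraps the C library through cgo, so cross-compiling it needs a C cross-compiler for the target machine; modernc.org/sqlite is a pure-Go translation, so the Go toolchain alone is enough. A sketch of the difference (the C compiler name is illustrative):

```
# mattn/go-sqlite3: cgo required, so you also need a C toolchain for the target
CGO_ENABLED=1 CC=x86_64-linux-gnu-gcc GOOS=linux GOARCH=amd64 go build -ldflags="-s -w"

# modernc.org/sqlite: pure Go, cross-compiles with nothing but the Go toolchain
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w"
```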

SQLC: The BEST ORM for Go

https://sqlc.dev/
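sqlc inverts the usual ORM flow: you write plain SQL with a name annotation, and it generates type-safe Go methods. A minimal sketch of a query file (table and query names are illustrative, not searchcode's real schema):

```sql
-- name: GetBetween :many
SELECT id, content FROM code
WHERE id >= ? AND id < ?;

-- name: Insert :exec
INSERT INTO code (content) VALUES (?);
```

From a file like this, `sqlc generate` emits methods such as `GetBetween` and `Insert` on a `Queries` struct, plus a `WithTx` helper for running them inside a transaction.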


Conversion

Converting 6 TB of data...

increment := 10_000
for i := 0; i < totalRowCount; i += increment {
	between, _ := db.GetBetween(i, i+increment)

	tx, _ := sqliteDb.BeginTx(context.Background(), nil)
	withTx := db.WithTx(tx)

	for _, b := range between {
		_ = withTx.Insert(context.Background(), insertParams{
			...
		})
	}

	// commit each batch; a defer tx.Rollback() here would not run
	// until the surrounding function returns, piling up open
	// transactions on every iteration
	_ = tx.Commit()
}
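The batching above can be sketched without the database calls; the batchRanges helper below is hypothetical (not from the talk) and just shows which [start, end) windows the loop visits:

```go
package main

import "fmt"

// batchRanges lists the [start, end) row windows the conversion loop
// walks: steps of increment until totalRowCount is covered.
func batchRanges(totalRowCount, increment int) [][2]int {
	var out [][2]int
	for i := 0; i < totalRowCount; i += increment {
		out = append(out, [2]int{i, i + increment})
	}
	return out
}

func main() {
	// 25,000 rows in batches of 10,000 -> three windows
	for _, r := range batchRanges(25_000, 10_000) {
		fmt.Println(r[0], r[1])
	}
}
```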
				

Compression...

Compress and Uncompress

select uncompress(content) from code;
insert into code (content) values (compress(?));
PRAGMA compression = 'zip';

None of this exists in stock SQLite; compression needs an extension, or the filesystem.

sqlite-zstd "WARNING: I wouldn't trust it with my data (yet)."

BTRFS

sudo apt update
sudo apt install btrfs-progs

# help identify all the disks
lsblk

# format
sudo mkfs.btrfs /dev/disk/by-id/DISK_ID

# make mount point
sudo mkdir -p /mnt/MY_DISK

# mount with zstd compression
sudo mount -o compress=zstd:3 /dev/disk/by-id/DISK_ID /mnt/MY_DISK

# add to fstab so it persists, same compression level
echo '/dev/disk/by-id/DISK_ID /mnt/MY_DISK btrfs compress=zstd:3 0 2' | sudo tee -a /etc/fstab

Compression Results

48 hours to transfer DB to new server!

$ compsize /mnt/data/searchcode.db
Processed 1 file, 16481352 regular extents (16481360 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       76%      4.8T         6.3T         6.3T       
none       100%      4.3T         4.3T         4.3T       
zstd        23%      470G         1.9T         1.9T

Backups.

b2 upload_file searchcode ./searchcode_analyse.db searchcode_analyse.db

Future state https://litestream.io/
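Litestream tails the WAL and continuously replicates it to object storage, which would replace the manual uploads above. A sketch of its config file (bucket and paths are made up):

```
# /etc/litestream.yml
dbs:
  - path: /mnt/data/searchcode.db
    replicas:
      - type: s3
        bucket: searchcode-backups
        path: searchcode.db
```

Then `litestream replicate` runs as a daemon.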

Results...

It just works.

I was a little worried at first, but SQLite now handles a lot more than the previous setup could cope with.

More flexible, can create more databases to handle other things, just another file.

Databases can be treated like tables.

Thank You!

Presentation located at https://boyter.org/ or just go to boyter.org and I will link it up once I fix hugo.

Or read the blog post https://boyter.org/posts/searchcode-bigger-sqlite-than-you/