Backup a MiaRec Cluster
The following section describes the process of backing up a MiaRec database and audio files.
A few considerations about backing up data:

- Backups should not be stored on the instances you are backing up. The database backup and audio file backup should be stored off-site. This guide describes storing them securely in Amazon S3.
- Backups are only valid at the time they are taken; for that reason, the backup process should run periodically to maintain up-to-date records. At a minimum, it is recommended to back up daily.
Prepare AWS Configuration
To support the offload of MiaRec backups, two S3 buckets must be provisioned: one bucket will serve as the destination for database backups, and the other as remote storage for audio files.

Why can't audio files and database backups share the same bucket? Database backups should be stored using the WORM (write-once-read-many) model to prevent corruption or tampering, whereas audio files need to be periodically modified or removed depending on retention policies. WORM storage requires S3 Object Lock, which is defined at the bucket level, hence the separate buckets.

More information about S3 Object Lock can be found at https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html
AWS Config for database backups
Create a bucket
An S3 bucket with Object Lock enabled must be created as a destination target for database backups.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Choose Create bucket from the console top menu to create a new Amazon S3 bucket.
- On the Create bucket setup page, perform the following actions:
  - For General configuration, provide a unique name for the new bucket in the Bucket name box, and select the AWS cloud region where the new S3 bucket will be created from the AWS Region dropdown list.
  - For Block Public Access settings for this bucket, select Block all public access to ensure that all public access to this bucket and its objects is blocked.
  - (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the S3 bucket. You can track storage costs and other criteria by tagging your bucket.
  - (Recommended) For Default encryption, select Enable under Server-side encryption and choose Amazon S3-managed keys (SSE-S3).
  - For Advanced settings, choose Enable under Object Lock to enable the Object Lock feature. Choose I acknowledge that enabling Object Lock will permanently allow objects in this bucket to be locked to confirm.
- Choose Create bucket to create your new Amazon S3 bucket.

Result

The S3 bucket will be created and displayed in the console.
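The same bucket can alternatively be provisioned from the AWS CLI. A minimal sketch, assuming the bucket name miarec-db-backup and the us-east-1 region (other regions also require a --create-bucket-configuration LocationConstraint option):

# Create the bucket with Object Lock enabled (Object Lock can only be turned on at creation time)
aws s3api create-bucket \
    --bucket miarec-db-backup \
    --region us-east-1 \
    --object-lock-enabled-for-bucket

# Block all public access to the bucket and its objects
aws s3api put-public-access-block \
    --bucket miarec-db-backup \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true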
Set Object Lock Retention policy
An Object Lock retention policy must be configured to guarantee data integrity for a defined period while still allowing items to be removed eventually.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Click on the name of the newly created Amazon S3 bucket.
- Select the Properties tab from the console menu to access the bucket properties.
- In the Object Lock section, choose Edit to configure the default settings applied to S3 objects that are uploaded without an Object Lock configuration.
- Within the Object Lock configuration section:
  - Choose Enable under Default retention.
  - Select Compliance so that a protected object version cannot be overwritten or deleted by any user, including the root account user. Once an S3 object is locked in Compliance mode, its retention mode cannot be reconfigured and its retention period cannot be shortened.
  - Define the Default retention period.
- Click Save changes to apply the configuration changes.

Result

The Object Lock configuration will be modified.
Note that the Object Lock retention configuration does not delete files after the specified retention period; it only prevents deletion of the files during that period. To delete old backup files, you must use Amazon S3 lifecycle policies, as sketched below.
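A minimal lifecycle sketch, assuming the bucket name miarec-db-backup and a 45-day expiration (choose a value no shorter than your Object Lock retention period). Because an Object Lock bucket is versioned, the rule expires current versions into delete markers and then permanently removes the noncurrent versions:

aws s3api put-bucket-lifecycle-configuration \
    --bucket miarec-db-backup \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "expire-old-backups",
            "Status": "Enabled",
            "Filter": {},
            "Expiration": { "Days": 45 },
            "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
        }]
    }'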
Delete files in S3 bucket when Object lock is present
It is possible to "delete" an object that is currently under Object Lock in the same manner you normally delete a file: select the object, click Delete, then confirm the action.

This will report a successful delete; however, this is misleading. By toggling Show versions, you can see that a delete marker was applied and the previous version is still available for download.

This is only temporary: after the Object Lock retention period expires, the object will be deleted.
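The same behavior can be demonstrated from the AWS CLI. A short sketch, assuming the bucket miarec-db-backup and a hypothetical object key example.backup:

# "Deleting" an object in a versioned, Object Lock-protected bucket only adds a delete marker
aws s3 rm s3://miarec-db-backup/example.backup

# Listing versions shows the delete marker and the still-recoverable previous version
aws s3api list-object-versions --bucket miarec-db-backup --prefix example.backup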
IAM policy for access to the database backup bucket
An IAM policy has to be created and assigned to an IAM user so that objects can be added to the S3 bucket by that IAM user.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Policies menu, choose Create Policy to create a new IAM policy.
- Select the JSON tab, copy the following access policy, and paste it into the JSON field. Do not forget to replace miarec-db-backup with your bucket name!
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-db-backup"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-db-backup/*"
            ]
        }
    ]
}
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- Review the policy, choose a descriptive name and description for it, and click the Create policy button.
Result
The policy will be created and ready to be assigned.
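If you prefer the CLI, the same policy can be created with aws iam create-policy. A sketch, assuming the JSON above has been saved locally as db-backup-policy.json and that the policy name MiaRecDbBackup is a placeholder of your choosing:

aws iam create-policy \
    --policy-name MiaRecDbBackup \
    --policy-document file://db-backup-policy.json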
IAM User for database backup bucket
An IAM user has to be created that can be used later to push database backups to the S3 bucket.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Users menu, choose Add User to create a new IAM user.
- For Details, choose a User name and enable Programmatic access.
- For Permissions, select Attach existing policies directly and then select the previously created policy from the list. Use the search box to find the policy by name.
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- For Review, confirm the settings and click Create user.
- On the Complete screen, copy the Access key ID and Secret access key and store them in a secure place. These will be used later to push database backups to S3.
Result
The user will be added, and the access key and secret access key will be available to use to access the S3 bucket.
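For reference, the equivalent CLI steps. A sketch, assuming a hypothetical user name miarec-db-backup-user and the policy ARN returned when the policy was created (the account ID below is a placeholder):

# Create the user
aws iam create-user --user-name miarec-db-backup-user

# Attach the previously created policy
aws iam attach-user-policy \
    --user-name miarec-db-backup-user \
    --policy-arn arn:aws:iam::123456789012:policy/MiaRecDbBackup

# Generate the access key pair -- store the output in a secure place
aws iam create-access-key --user-name miarec-db-backup-user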
S3 bucket for audio files
Create a bucket
An S3 bucket must be created as a storage target for the audio files.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Choose Create bucket from the console top menu to create a new Amazon S3 bucket.
- On the Create bucket setup page, perform the following actions:
  - For General configuration, provide a unique name for the new bucket in the Bucket name box, and select the AWS cloud region where the new S3 bucket will be created from the AWS Region dropdown list.
  - For Block Public Access settings for this bucket, select Block all public access to ensure that all public access to this bucket and its objects is blocked.
  - (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the S3 bucket. You can track storage costs and other criteria by tagging your bucket.
  - For Default encryption, select Enable under Server-side encryption and choose one of the available encryption key types. If you don't know what to choose, use Amazon S3-managed keys (SSE-S3).
- Choose Create bucket to create your new Amazon S3 bucket.

Result

The S3 bucket will be created and displayed in the console.
IAM policy for access to the audio bucket
An IAM policy has to be created and assigned to an IAM user so that objects can be added to the S3 bucket by that IAM user.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Policies menu, choose Create Policy to create a new IAM policy.
- Select the JSON tab, copy the following access policy, and paste it into the JSON field. Do not forget to replace miarec-audio-storage with your bucket name!
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-audio-storage"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-audio-storage/*"
            ]
        }
    ]
}
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource. You can track cost and other criteria by tagging your resource.
- Review the policy, choose a descriptive name and description for it, and click the Create policy button.

Result

The policy will be created and ready to be assigned.
IAM User for audio bucket
An IAM user has to be created that can be used to relocate audio files from MiaRec to S3 storage. We recommend using a separate user account rather than granting the same user access to both the database backup and audio file buckets.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Users menu, choose Add User to create a new IAM user.
- For Details, choose a User name and enable Programmatic access.
- For Permissions, select Attach existing policies directly and then select the previously created policy from the list. Use the search box to find the policy by name.
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- For Review, confirm the settings and click Create user.
- On the Complete screen, copy the Access key ID and Secret access key and store them in a secure place. These will be used later to configure a storage target in the MiaRec application.

Result

The user will be added, and the access key and secret access key will be available to use to access the S3 bucket.
Database backup using the pg_dump utility

The following commands need to be executed from the database instance.

In an all-in-one configuration, the PostgreSQL database service coexists with other MiaRec components on the same server instance. However, in larger deployments (i.e., decoupled architecture) the database service runs on a dedicated server, in which case it should be listed in the [db] group in your Ansible inventory.
How do I know I am on the right server? You can check for the postgresql
service:
[centos@miarecdb ~]$ systemctl | grep postgresql
postgresql-12.service loaded active running PostgreSQL 12 database server
[centos@miarecdb ~]$
Install AWS CLI
The awscli package is required to transfer database backups to an S3 bucket.
Install unzip
sudo yum install -y unzip
Install aws-cli
For the latest version of the AWS CLI, use the following command block:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Verification
[centos@miarecdb ~]$ aws --version
aws-cli/2.7.25 Python/3.9.11 Linux/3.10.0-1160.el7.x86_64 exe/x86_64.centos.7 prompt/off
Manually back up the MiaRec database using pg_dump

A backup of the MiaRec database can be executed on demand using the pg_dump utility. It is a good idea to run this on initial deployment to verify operation. The pg_dump utility is installed as part of the postgresql package, so it does not need to be installed separately.

Execute the pg_dump utility:
sudo -iu postgres pg_dump -F c -f /tmp/miarecdb-$(date "+%Y.%m.%d-%H.%M.%S").backup miarecdb
Let's break down those options:

- sudo -iu postgres instructs the shell to execute the trailing command on behalf of user postgres; this is required for authentication to the database.
- pg_dump calls the pg_dump utility.
- -F c sets the output file format, in this case a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.
- -f /path/to/outputfile sends output to the specified path. This directory must be writable by the postgres user; a directory like /tmp is a suitable destination.
- $(date "+%Y.%m.%d-%H.%M.%S") will be replaced with the current timestamp, like 2022.08.02-12.13.14.
- miarecdb is the target database for the dump; this should always be miarecdb.
Verification
An archive will be produced at the specified path:
[centos@miarecdb ~]$ ls -l /tmp
-rw-rw-r--. 1 postgres postgres 1102336 Aug 2 16:35 miarecdb-2022.08.02-12.13.14.backup
[centos@miarecdb ~]$
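Optionally, you can confirm that the archive is restorable by listing its table of contents with pg_restore; this reads the archive without restoring any data. A sketch, assuming the file name from the example above:

sudo -iu postgres pg_restore --list /tmp/miarecdb-2022.08.02-12.13.14.backup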
Manually copy database backup to S3
Set AWS credentials
Use the previously created user credentials for the database backup bucket. Using environment variables, these values will be set for this session only and will not be retained after the session ends.

export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
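Before copying the backup, you can verify that the credentials grant access to the bucket; the s3:ListBucket permission in the policy above allows this. A quick sketch, assuming the bucket name miarec-db-backup:

aws s3 ls s3://miarec-db-backup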
Execute Copy to S3
aws s3 cp /path/to/source s3://{s3-bucket-name}
Let's break down those options:

- aws s3 cp calls the AWS CLI utility and instructs it to copy a file to S3.
- /path/to/source is the path and file name of the source database dump file.
- s3://{s3-bucket-name} sets the destination of the S3 transfer; this should be the bucket created in an earlier step.

Example: copy a file from the /tmp directory to the S3 bucket miarec-db-backup.

aws s3 cp /tmp/miarecdb-2022.08.17-21.05.24.backup s3://miarec-db-backup
Result
The object will be copied to the defined bucket:
[centos@miarecdb ~]$ aws s3 cp /tmp/miarecdb-2022.08.25-19.47.14.backup s3://miarec-db-backup
upload: ../../tmp/miarecdb-2022.08.25-19.47.14.backup to s3://miarec-db-backup/miarecdb-2022.08.25-19.47.14.backup
[centos@miarecdb ~]$
Automate creation and transfer of database backups using crontab
The pg_dump and S3 transfer actions can be automated by creating a bash script and scheduling the execution of that script using the crontab utility.

Since this is an automated script, the best practice is to create a user with just the permissions needed to execute the script.
Create a user account backup with specific privileges
Add user backup
sudo useradd backup
Give backup privileges to run pg_dump as user postgres by modifying the sudoers file:

sudo visudo

Add the following line:

backup ALL=(postgres) NOPASSWD:/usr/bin/bash,/usr/bin/pg_dump
Verification
You should be able to assume user backup, verify the aws version, and execute pg_dump as user postgres. Any other commands will prompt for a password and fail to execute with Permission denied.
[centos@miarecdb ~]$ sudo -iu backup
...
[backup@miarecdb ~]$ /usr/local/bin/aws --version
aws-cli/2.7.25 Python/3.9.11 Linux/3.10.0-1160.el7.x86_64 exe/x86_64.centos.7 prompt/off
...
[backup@miarecdb ~]$ sudo -Hiu postgres pg_dump --version
pg_dump (PostgreSQL) 12.12
Create Bash Script
Create secret file
This file will only be accessible by the backup user and superusers; it will contain the credentials generated in the steps above.
sudo -u backup vi /home/backup/.backup_secret
Insert the following, and be sure to change the information for your deployment.
FILEPREFIX=<backup_prefix>
BUCKETNAME=<S3_BUCKET_NAME>
AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
- FILEPREFIX: file name prefix used to name all backup files in AWS S3 storage. This should be unique to each instance; a suggestion is to include the $HOSTNAME variable.
- BUCKETNAME: name of the S3 bucket where database backups will be stored.
- AWS_ACCESS_KEY_ID: the AWS Access Key ID generated earlier.
- AWS_SECRET_ACCESS_KEY: the AWS Secret Access Key generated earlier.
Verify
[centos@miarecdb ~]$ sudo -u backup cat /home/backup/.backup_secret
FILEPREFIX=miarecdb-$HOSTNAME
BUCKETNAME=miarec-db-backup
AWS_ACCESS_KEY_ID=....
AWS_SECRET_ACCESS_KEY=....
[centos@miarecdb ~]$
Create script
sudo vi /usr/local/bin/miarec_backup.sh
Insert the following
#!/bin/bash

# Read variables from the secret file
set -o allexport
source ~/.backup_secret
set +o allexport

BACKUPDIR=/tmp
TMPFILE=miarecdb.backup
DATE=$(date "+%Y.%m.%d-%H.%M.%S")

# Print error messages to stderr
echoerr() { echo "$@" 1>&2; }

# Generate DB dump
backup_db (){
    echo "Dumping database to $BACKUPDIR/$TMPFILE"
    sudo -Hiu postgres pg_dump -F c -f $BACKUPDIR/$TMPFILE miarecdb
    if [ $? -eq 0 ]
    then
        echo "The database dump was successful!"
    else
        echoerr "There was a problem with the database dump, stopping"
        exit 1
    fi
}

# Copy the dump to the S3 bucket under a timestamped name
relocate_s3 (){
    echo "Moving files to S3"
    /usr/local/bin/aws s3 cp $BACKUPDIR/$TMPFILE s3://$BUCKETNAME/$FILEPREFIX-$DATE.backup
    if [ $? -eq 0 ]
    then
        echo "Backup was successfully transferred to AWS S3: s3://$BUCKETNAME/$FILEPREFIX-$DATE.backup"
    else
        echoerr "There was a problem with the transfer to S3, stopping"
        exit 1
    fi
}

backup_db
relocate_s3
echo "Completed in ${SECONDS}s"
Make the script executable
sudo chmod u+x /usr/local/bin/miarec_backup.sh
Change ownership to backup user
sudo chown backup:backup /usr/local/bin/miarec_backup.sh
Result
[centos@miarecdb ~]$ ls -l /usr/local/bin/
total 652
...
-rwxr--r--. 1 backup backup 902 Aug 24 17:32 miarec_backup.sh
Verify
Manually call the script on behalf of user backup:
[centos@miarecdb ~]$ sudo -iu backup /usr/local/bin/miarec_backup.sh
Dumping database to /tmp/miarecdb.backup
The database dump was successful!
Moving files to S3
upload: ../../tmp/miarecdb.backup to s3://miarec-db-backup/miarecdb-miarecdb.example.com-2022.08.24-17.36.40.backup
Backup was successfully transferred to AWS S3: s3://miarec-db-backup/miarecdb-miarecdb.example.com-2022.08.24-17.36.40.backup
Completed in 2s
Create a crontab job to execute the bash script

sudo crontab -u backup -e

An editor will open (vi by default). The file being edited has one job per line; empty lines are allowed, and comments start with a hash symbol (#).
Insert the following
0 1 * * * /usr/local/bin/miarec_backup.sh
Let's break down those options:

- 0 1 * * * is the cron expression that determines when the job runs, in this case 1:00 AM every day. Help with creating cron expressions can be found at https://crontab.guru/
- /usr/local/bin/miarec_backup.sh is the location of the script.

A variation that captures the script output in a log file is sketched below.
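By default, cron mails any script output to the local user, where it is rarely seen. The output can instead be appended to a log file. A sketch, assuming the hypothetical log path /var/log/miarec_backup.log, which must be pre-created and writable by the backup user:

0 1 * * * /usr/local/bin/miarec_backup.sh >> /var/log/miarec_backup.log 2>&1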
Verification
Display cron jobs:
[centos@miarecdb ~]$ sudo crontab -u backup -l
0 1 * * * /usr/local/bin/miarec_backup.sh
An archive will be produced at the specified path every night at 1:00 AM.

[centos@miarecdb ~]$ ls -l /tmp
-rw-rw-r--. 1 postgres postgres 1102336 Aug 2 16:35 miarecdb.backup
[centos@miarecdb ~]$

Navigate to the Amazon AWS Console and check for the presence of new backup files in the S3 bucket.
Audio File Remote Storage
The MiaRec admin portal offers a feature to automatically relocate files to an external storage target, in this case an AWS S3 bucket. This function moves an audio file from the local file system to an S3 bucket and updates the file path in the database for each call. Note that this function moves the file rather than creating a copy of it. Redundancy for audio files is achieved by the nature of the Amazon S3 service, which provides 99.999999999% durability and 99.99% availability. Optionally, automatic replication can be enabled, which asynchronously copies objects across S3 buckets. To enable such replication, check the Amazon S3 User Guide.
Create a Storage Target
- Navigate to Administration > Storage > Storage Target.
- Select Add.
- Populate the fields that are appropriate for your deployment:
  - Name: unique identifier for this storage target.
  - Type: Amazon S3.
  - S3 Bucket: bucket name defined earlier.
  - AWS Access Key ID and AWS Secret Access Key: access keys created for the IAM user earlier.
  - Region: bucket region defined earlier.
- Select Save and Test.
- Verify all tests pass.
Schedule Relocate Audio Files Job
- Navigate to Administration > Storage > Relocate Audio Files.
- Select Add.
- Define the Access Scope, Mode, and Destination storage target (defined in the previous step):
  - Access Scope: in most cases this will be Unrestricted; separate relocation jobs can be scheduled for individual tenants if needed.
  - Mode: Incremental; the system will only target files it has not previously relocated.
  - Destination storage target: defined in the previous step, this is where files will be moved.
- Apply any filter criteria (optional).
- Select a Schedule to execute.
- Select Save and Start.
Verification
Calls will be relocated to the external storage target and the file path will be updated in the database.
Verify Relocation Job
See job run results at Administration > Storage > Relocate Audio Files.
Verify File Path
File Path can be displayed in Full Call details.
- Select a recording from the recordings tab and select More Detail.
- At the bottom of the page, select Full call details.
- The Files section should display the path reflecting the external storage.