Backup a MiaRec Cluster
The following section describes the process of backing up a MiaRec database and audio files.
A few considerations about backing up data:

- Backups should not be stored on the instances you are backing up. The database backup and audio file backup should be stored off-site. This guide describes storing them securely in Amazon S3.
- Backups are only valid at the time they are taken; for that reason, the backup process should run periodically to maintain up-to-date records. At a minimum, it is recommended to back up daily.
Prepare AWS Configuration
To support the offload of MiaRec backups, two S3 buckets must be provisioned: one bucket will serve as the destination for database backups, and the other as remote storage for audio files.

Why can't audio files and database backups share the same bucket? Database backups should be stored using the WORM (write-once-read-many) model to prevent corruption or tampering, whereas audio files need to be periodically modified or removed depending on retention policies. WORM storage requires S3 Object Lock, which is defined at the bucket level, hence the separate buckets.

More information about S3 Object Lock can be found at https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html
AWS Config for database backups
Create a bucket
An S3 bucket with Object Lock enabled must be created as a destination target for database backups.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Choose Create bucket from the console top menu to create a new Amazon S3 bucket.
- On the Create bucket setup page, perform the following actions:
  - For General configuration, provide a unique name for the new bucket in the Bucket name box, and select the AWS cloud region where the new S3 bucket will be created from the AWS Region dropdown list.
  - For Block Public Access settings for this bucket, select Block all public access to ensure that all public access to this bucket and its objects is blocked.
  - (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the S3 bucket. You can track storage costs and other criteria by tagging your bucket.
  - (Recommended) For Default encryption, select Enable under Server-side encryption and choose Amazon S3-managed keys (SSE-S3).
  - For Advanced settings, choose Enable under Object Lock to enable the Object Lock feature. Choose I acknowledge that enabling Object Lock will permanently allow objects in this bucket to be locked to confirm.
- Choose Create bucket to create your new Amazon S3 bucket.

Result

The S3 bucket will be created and displayed in the console.
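The same bucket can alternatively be provisioned from the AWS CLI. A minimal sketch, assuming the bucket name miarec-db-backup and the us-east-1 region (other regions also require a --create-bucket-configuration LocationConstraint option):

# Create the bucket with Object Lock enabled (Object Lock can only be turned on at creation time)
aws s3api create-bucket \
    --bucket miarec-db-backup \
    --region us-east-1 \
    --object-lock-enabled-for-bucket

# Block all public access to the bucket and its objects
aws s3api put-public-access-block \
    --bucket miarec-db-backup \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true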
Set Object Lock Retention policy
An Object Lock retention policy must be configured to guarantee data integrity for a defined period while still allowing items to be removed eventually.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Click on the name of the newly created Amazon S3 bucket.
- Select the Properties tab from the console menu to access the bucket properties.
- In the Object Lock section, choose Edit to configure the default settings applied to S3 objects that are uploaded without an Object Lock configuration.
- Within the Object Lock configuration section:
  - Choose Enable under Default retention.
  - Select Compliance so that a protected object version cannot be overwritten or deleted by any user, including the root account user. Once an S3 object is locked in Compliance mode, its retention mode cannot be reconfigured and its retention period cannot be shortened.
  - Define the Default retention period.
- Click Save changes to apply the configuration changes.

Result

The Object Lock configuration will be modified.
Note that the Object Lock retention configuration does not delete files after the specified retention period; it only prevents deletion of the files during that period. To delete old backup files, you must use Amazon S3 lifecycle policies, as sketched below.
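A minimal lifecycle sketch, assuming the bucket name miarec-db-backup and a 45-day expiration (choose a value no shorter than your Object Lock retention period). Because an Object Lock bucket is versioned, the rule expires current versions into delete markers and then permanently removes the noncurrent versions:

aws s3api put-bucket-lifecycle-configuration \
    --bucket miarec-db-backup \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "expire-old-backups",
            "Status": "Enabled",
            "Filter": {},
            "Expiration": { "Days": 45 },
            "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
        }]
    }'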
Delete files in S3 bucket when Object lock is present
It is possible to "delete" an object that is currently under Object Lock in the same manner you normally delete a file: select the object, click Delete, then confirm the action.

This will report a successful delete; however, this is misleading. By toggling Show versions, you can see that a delete marker was applied and the previous version is still available for download.

This is only temporary: after the Object Lock retention period expires, the object will be deleted.
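The same behavior can be demonstrated from the AWS CLI. A short sketch, assuming the bucket miarec-db-backup and a hypothetical object key example.backup:

# "Deleting" an object in a versioned, Object Lock-protected bucket only adds a delete marker
aws s3 rm s3://miarec-db-backup/example.backup

# Listing versions shows the delete marker and the still-recoverable previous version
aws s3api list-object-versions --bucket miarec-db-backup --prefix example.backup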
IAM policy for access to the database backup bucket
An IAM policy has to be created and assigned to an IAM user so that objects can be added to the S3 bucket by that IAM user.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Policies menu, choose Create Policy to create a new IAM policy.
- Select the JSON tab, copy the following access policy, and paste it into the JSON field. Do not forget to replace miarec-db-backup with your bucket name!
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-db-backup"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-db-backup/*"
            ]
        }
    ]
}
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- Review the policy, choose a descriptive name and description for it, and click the Create policy button.
Result
The policy will be created and ready to be assigned.
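If you prefer the CLI, the same policy can be created with aws iam create-policy. A sketch, assuming the JSON above has been saved locally as db-backup-policy.json and that the policy name MiaRecDbBackup is a placeholder of your choosing:

aws iam create-policy \
    --policy-name MiaRecDbBackup \
    --policy-document file://db-backup-policy.json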
IAM User for database backup bucket
An IAM user has to be created that can be used later to push database backups to the S3 bucket.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Users menu, choose Add User to create a new IAM user.
- For Details, choose a User name and enable Programmatic access.
- For Permissions, select Attach existing policies directly and then select the previously created policy from the list. Use the search box to find the policy by name.
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- For Review, confirm the settings and click Create user.
- On the Complete screen, copy the Access key ID and Secret access key and store them in a secure place. These will be used later to push database backups to S3.
Result
The user will be added, and the access key and secret access key will be available to use to access the S3 bucket.
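For reference, the equivalent CLI steps. A sketch, assuming a hypothetical user name miarec-db-backup-user and the policy ARN returned when the policy was created (the account ID below is a placeholder):

# Create the user
aws iam create-user --user-name miarec-db-backup-user

# Attach the previously created policy
aws iam attach-user-policy \
    --user-name miarec-db-backup-user \
    --policy-arn arn:aws:iam::123456789012:policy/MiaRecDbBackup

# Generate the access key pair -- store the output in a secure place
aws iam create-access-key --user-name miarec-db-backup-user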
S3 bucket for audio files
Create a bucket
An S3 bucket must be created as a storage target for the audio files.

From the Amazon S3 console at https://console.aws.amazon.com/s3/:

- Choose Create bucket from the console top menu to create a new Amazon S3 bucket.
- On the Create bucket setup page, perform the following actions:
  - For General configuration, provide a unique name for the new bucket in the Bucket name box, and select the AWS cloud region where the new S3 bucket will be created from the AWS Region dropdown list.
  - For Block Public Access settings for this bucket, select Block all public access to ensure that all public access to this bucket and its objects is blocked.
  - (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the S3 bucket. You can track storage costs and other criteria by tagging your bucket.
  - For Default encryption, select Enable under Server-side encryption and choose one of the available encryption key types. If you don't know what to choose, use Amazon S3-managed keys (SSE-S3).
- Choose Create bucket to create your new Amazon S3 bucket.

Result

The S3 bucket will be created and displayed in the console.
IAM policy for access to the audio bucket
An IAM policy has to be created and assigned to an IAM user so that objects can be added to the S3 bucket by that IAM user.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Policies menu, choose Create Policy to create a new IAM policy.
- Select the JSON tab, copy the following access policy, and paste it into the JSON field. Do not forget to replace miarec-audio-storage with your bucket name!
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-audio-storage"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::miarec-audio-storage/*"
            ]
        }
    ]
}
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource. You can track cost and other criteria by tagging your resource.
- Review the policy, choose a descriptive name and description for it, and click the Create policy button.

Result

The policy will be created and ready to be assigned.
IAM User for audio bucket
An IAM user has to be created that can be used to relocate audio files from MiaRec to S3 storage. We recommend using a separate user account rather than granting the same user access to both the database backup and audio file buckets.

From the Amazon IAM console at https://console.aws.amazon.com/iam/:

- From the Users menu, choose Add User to create a new IAM user.
- For Details, choose a User name and enable Programmatic access.
- For Permissions, select Attach existing policies directly and then select the previously created policy from the list. Use the search box to find the policy by name.
- (Optional) For Tags, use the Add tag button to create and apply user-defined tags to the resource.
- For Review, confirm the settings and click Create user.
- On the Complete screen, copy the Access key ID and Secret access key and store them in a secure place. These will be used later to configure a storage target in the MiaRec application.

Result

The user will be added, and the access key and secret access key will be available to use to access the S3 bucket.
Database backup using the pg_dump utility

The following commands need to be executed from the database instance.

In an all-in-one configuration, the PostgreSQL database service coexists with other MiaRec components on the same server instance. However, in larger deployments (i.e., decoupled architecture) the database service runs on a dedicated server, in which case it should be listed in the [db] group in your Ansible inventory.
How do I know I am on the right server? You can check for the postgresql
service:
[centos@miarecdb ~]$ systemctl | grep postgresql
postgresql-12.service loaded active running PostgreSQL 12 database server
[centos@miarecdb ~]$
Install AWS CLI
The awscli package is required to transfer database backups to an S3 bucket.
Install unzip
sudo yum install -y unzip
Install aws-cli
For the latest version of the AWS CLI, use the following command block:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Verification
[centos@miarecdb ~]$ aws --version
aws-cli/2.7.25 Python/3.9.11 Linux/3.10.0-1160.el7.x86_64 exe/x86_64.centos.7 prompt/off
Manually back up the MiaRec database using pg_dump

A backup of the MiaRec database can be executed on demand using the pg_dump utility. It is a good idea to run this on initial deployment to verify operation. The pg_dump utility is installed as part of the postgresql package, so it does not need to be installed separately.

Execute the pg_dump utility:
sudo -iu postgres pg_dump -F c -f /tmp/miarecdb-$(date "+%Y.%m.%d-%H.%M.%S").backup miarecdb
Let's break down those options:

- sudo -iu postgres instructs the shell to execute the trailing command on behalf of user postgres; this is required for authentication to the database.
- pg_dump calls the pg_dump utility.
- -F c sets the output file format, in this case a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.
- -f /path/to/outputfile sends output to the specified path. This directory must be writable by the postgres user; a directory like /tmp is a suitable destination.
- $(date "+%Y.%m.%d-%H.%M.%S") will be replaced with the current timestamp, like 2022.08.02-12.13.14.
- miarecdb is the target database for the dump; this should always be miarecdb.
Verification
An archive will be produced at the specified path:
[centos@miarecdb ~]$ ls -l /tmp
-rw-rw-r--. 1 postgres postgres 1102336 Aug 2 16:35 miarecdb-2022.08.02-12.13.14.backup
[centos@miarecdb ~]$
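Optionally, you can confirm that the archive is restorable by listing its table of contents with pg_restore; this reads the archive without restoring any data. A sketch, assuming the file name from the example above:

sudo -iu postgres pg_restore --list /tmp/miarecdb-2022.08.02-12.13.14.backup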
Manually copy database backup to S3
Set AWS credentials
Use the previously created user credentials for the database backup bucket. Using environment variables, these values will be set for this session only and will not be retained after the session ends.

export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
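Before copying the backup, you can verify that the credentials grant access to the bucket; the s3:ListBucket permission in the policy above allows this. A quick sketch, assuming the bucket name miarec-db-backup:

aws s3 ls s3://miarec-db-backup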
Execute Copy to S3
aws s3 cp /path/to/source s3://{s3-bucket-name}
Let's break down those options:

- aws s3 cp calls the AWS CLI utility and instructs it to copy a file to S3.
- /path/to/source is the path and file name of the source database dump file.
- s3://{s3-bucket-name} sets the destination of the S3 transfer; this should be the bucket created in an earlier step.

Example: copy a file from the /tmp directory to the S3 bucket miarec-db-backup.

aws s3 cp /tmp/miarecdb-2022.08.17-21.05.24.backup s3://miarec-db-backup
Result
The object will be copied to the defined bucket:
[centos@miarecdb ~]$ aws s3 cp /tmp/miarecdb-2022.08.25-19.47.14.backup s3://miarec-db-backup
upload: ../../tmp/miarecdb-2022.08.25-19.47.14.backup to s3://miarec-db-backup/miarecdb-2022.08.25-19.47.14.backup
[centos@miarecdb ~]$
Automate creation and transfer of database backups using crontab
The pg_dump and S3 transfer actions can be automated by creating a bash script and scheduling the execution of that script using the crontab utility.

Since this is an automated script, the best practice is to create a user with just the permissions needed to execute the script.
Create a user account backup with specific privileges
Add user backup
sudo useradd backup
Give backup privileges to run pg_dump as user postgres by modifying the sudoers file:

sudo visudo

Add the following line:

backup ALL=(postgres) NOPASSWD:/usr/bin/bash,/usr/bin/pg_dump
Verification
You should be able to assume user backup, verify the aws version, and execute pg_dump as user postgres. Any other commands will prompt for a password and fail to execute with Permission denied.
[centos@miarecdb ~]$ sudo -iu backup
...
[backup@miarecdb ~]$ /usr/local/bin/aws --version
aws-cli/2.7.25 Python/3.9.11 Linux/3.10.0-1160.el7.x86_64 exe/x86_64.centos.7 prompt/off
...
[backup@miarecdb ~]$ sudo -Hiu postgres pg_dump --version
pg_dump (PostgreSQL) 12.12
Create Bash Script
Create secret file
This file will only be accessible by the backup user and superusers; it will contain the credentials generated in the steps above.
sudo -u backup vi /home/backup/.backup_secret
Insert the following, and be sure to change the information for your deployment.
FILEPREFIX=<backup_prefix>
BUCKETNAME=<S3_BUCKET_NAME>
AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
- FILEPREFIX: file name prefix used to name all backup files in AWS S3 storage. This should be unique to each instance; a suggestion is to include the $HOSTNAME variable.
- BUCKETNAME: name of the S3 bucket where database backups will be stored.
- AWS_ACCESS_KEY_ID: the AWS Access Key ID generated earlier.
- AWS_SECRET_ACCESS_KEY: the AWS Secret Access Key generated earlier.
Verify
[centos@miarecdb ~]$ sudo -u backup cat /home/backup/.backup_secret
FILEPREFIX=miarecdb-$HOSTNAME
BUCKETNAME=miarec-db-backup
AWS_ACCESS_KEY_ID=....
AWS_SECRET_ACCESS_KEY=....
[centos@miarecdb ~]$
Create script
sudo vi /usr/local/bin/miarec_backup.sh
Insert the following
#!/bin/bash

# Read variables from the secret file
set -o allexport
source ~/.backup_secret
set +o allexport

BACKUPDIR=/tmp
TMPFILE=miarecdb.backup
DATE=$(date "+%Y.%m.%d-%H.%M.%S")

# Print error messages to stderr
echoerr() { echo "$@" 1>&2; }

# Generate DB dump
backup_db (){
    echo "Dumping database to $BACKUPDIR/$TMPFILE"
    sudo -Hiu postgres pg_dump -F c -f $BACKUPDIR/$TMPFILE miarecdb
    if [ $? -eq 0 ]
    then
        echo "The database dump was successful!"
    else
        echoerr "There was a problem with the database dump, stopping"
        exit 1
    fi
}

# Copy the dump to the S3 bucket under a timestamped name
relocate_s3 (){
    echo "Moving files to S3"
    /usr/local/bin/aws s3 cp $BACKUPDIR/$TMPFILE s3://$BUCKETNAME/$FILEPREFIX-$DATE.backup
    if [ $? -eq 0 ]
    then
        echo "Backup was successfully transferred to AWS S3: s3://$BUCKETNAME/$FILEPREFIX-$DATE.backup"
    else
        echoerr "There was a problem with the transfer to S3, stopping"
        exit 1
    fi
}

backup_db
relocate_s3
echo "Completed in ${SECONDS}s"
Make the script executable
sudo chmod u+x /usr/local/bin/miarec_backup.sh
Change ownership to backup user
sudo chown backup:backup /usr/local/bin/miarec_backup.sh
Result
[centos@miarecdb ~]$ ls -l /usr/local/bin/
total 652
...
-rwxr--r--. 1 backup backup 902 Aug 24 17:32 miarec_backup.sh
Verify
Manually call the script on behalf of user backup:
[centos@miarecdb ~]$ sudo -iu backup /usr/local/bin/miarec_backup.sh
Dumping database to /tmp/miarecdb.backup
The database dump was successful!
Moving files to S3
upload: ../../tmp/miarecdb.backup to s3://miarec-db-backup/miarecdb-miarecdb.example.com-2022.08.24-17.36.40.backup
Backup was successfully transferred to AWS S3: s3://miarec-db-backup/miarecdb-miarecdb.example.com-2022.08.24-17.36.40.backup
Completed in 2s
Create a crontab job to execute the bash script

sudo crontab -u backup -e

An editor will open (vi by default). The file being edited has one job per line; empty lines are allowed, and comments start with a hash symbol (#).
Insert the following
0 1 * * * /usr/local/bin/miarec_backup.sh
Let's break down those options:

- 0 1 * * * is the cron expression that determines when the job runs, in this case 1:00 AM every day. Help with creating cron expressions can be found at https://crontab.guru/
- /usr/local/bin/miarec_backup.sh is the location of the script.

A variation that captures the script output in a log file is sketched below.
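By default, cron mails any script output to the local user, where it is rarely seen. The output can instead be appended to a log file. A sketch, assuming the hypothetical log path /var/log/miarec_backup.log, which must be pre-created and writable by the backup user:

0 1 * * * /usr/local/bin/miarec_backup.sh >> /var/log/miarec_backup.log 2>&1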
Verification
Display cron jobs:
[centos@miarecdb ~]$ sudo crontab -u backup -l
0 1 * * * /usr/local/bin/miarec_backup.sh
An archive will be produced at the specified path every night at 1:00 AM.

[centos@miarecdb ~]$ ls -l /tmp
-rw-rw-r--. 1 postgres postgres 1102336 Aug 2 16:35 miarecdb.backup
[centos@miarecdb ~]$

Navigate to the Amazon AWS Console and check for the presence of new backup files in the S3 bucket.
Audio File Remote Storage
The MiaRec admin portal offers a feature to automatically relocate files to an external storage target, in this case an AWS S3 bucket. This function moves an audio file from the local file system to an S3 bucket and updates the file path in the database for each call. Note that this function moves the file rather than creating a copy of it. Redundancy for audio files is achieved by the nature of the Amazon S3 service, which provides 99.999999999% durability and 99.99% availability. Optionally, automatic replication can be enabled, which asynchronously copies objects across S3 buckets. To enable such replication, check the Amazon S3 User Guide.
Create a Storage Target
- Navigate to Administration > Storage > Storage Target.
- Select Add.
- Populate the fields that are appropriate for your deployment:
  - Name: unique identifier for this storage target.
  - Type: Amazon S3.
  - S3 Bucket: bucket name defined earlier.
  - AWS Access Key ID and AWS Secret Access Key: access keys created for the IAM user earlier.
  - Region: bucket region defined earlier.
- Select Save and Test.
- Verify all tests pass.
Schedule Relocate Audio Files Job
- Navigate to Administration > Storage > Relocate Audio Files.
- Select Add.
- Define the Access Scope, Mode, and Destination storage target (defined in the previous step):
  - Access Scope: in most cases this will be Unrestricted; separate relocation jobs can be scheduled for individual tenants if needed.
  - Mode: Incremental; the system will only target files it has not previously relocated.
  - Destination storage target: defined in the previous step, this is where files will be moved.
- Apply any filter criteria (optional).
- Select a Schedule to execute.
- Select Save and Start.
Verification
Calls will be relocated to the external storage target and the file path will be updated in the database.
Verify Relocation Job
See job run results at Administration > Storage > Relocate Audio Files.
Verify File Path
File Path can be displayed in Full Call details.
- Select a recording from the recordings tab and select More Detail.
- At the bottom of the page, select Full call details.
- The Files section should display the path reflecting the external storage.