Load testing: Recording production transactions and replaying on a test server

Load testing not always the same as stress testing. Stress testing seeks to maximize the load on a server to see when and where it breaks. In theory, you can escalate the load until you isolate those failure points, even to a point where the stress exerted on the server may far exceed anything that would be experienced on production

A production load test seeks to replicate actual, real world production loads and then run on them on a test server. The goal here, typically, is to isolate any regression bugs or issues that may have been introduced in a build. Typically, a staging environment is created where production loads can automatically be replayed. By recording production loads once e.g. peak load, normal load the recorded loads can be stored then replayed, at will, multiple times, for example against each new build of the database, and against multiple target databases, simulating different environments

This makes this process, use case particularly well adapted to continuous integration. We can build a new staging environment continuously, nightly or even on each new commit, and then once created we can start replaying production loads against it and monitor the performance. If we get any alerts or more than a typical number of alerts we can see on production, that may be an indication of a “breaking change” that would need to be cleared before the next build is approved for production.

This process isn’t exclusive of stress testing. Stress tests can also be run against the newly created staging database consecutively or even concurrently, if multiple staging databases have been created.

Production load recording concept and technologies

The initial challenge is to record a production load in a way that allows it to be persisted and replayed, ideally multiple times if needed. That way we can make a few “recordings” e.g. peak load, normal load, off-peak load and then persist them to a location, medium that they can be consumed, on demand to execute production load testing on any staging server at any time.

To record production loads we will use the transaction log, and specifically ApexSQL Log, a 3rd part SQL Server transaction log reader, that can read transactions from online logs, backed up or detached log files, convert the log data into their actual transactions and persist the transactions to file as a T-SQL script that can be opened and re-executed any number of times on any number of targets.

To read the transaction log and convert the transactions into SQL files, we will use ApexSQL Log to open a transaction log backup file, covering the period we want, read the transactions for the period we set (see next) and write to file.

To simulate/approximate the load, we are going to read and pump transactions to a file every minute. This won’t replicate the cadence of our production server perfectly but it will be sufficient for this example. To improve on this, you can reduce the iteration period to as close to 1 second as possible, depending on the throughput of your system. You can also write the transactions to an XML file or even a database, with a timestamp and replay them at the exact same time, on a different day, as they were executed. This would require some PowerShell scripting that is beyond the scope of this article

ApexSQL Log has a key feature that makes this not only possible, but easy: Continuous auditing. This will track the LSN of the last transaction read, so the next time ApexSQL Log “wakes up” in an automated and iterative process/cycle, it will pick up exactly where it left off. This means there will be no gaps of missing transactions or overlaps with duplicate transactions.

Schematic overview

We’ll provide some schematics to illustrate the use case described above. We’ll start on the production side where we will record the production load.

In the image below we’ll illustrate the production side of the task. We start with a production database. We’ll create a batch file that will call ApexSQL Log, on a schedule of 1x every minute, to record/replicate the load on our production server. The process will result in a collection of SQL files, one for every period of the production load that we will record, in our case one minute intervals. We’ll create this set of files once and we’ll be able to run them many times and on many different environments, if needed, to replicate the period of time, in production, that we want to simulate when we replay it

Next, let’s look at the staging side of things where we will replay the production load …

Here, we are envisioning a Continuous integration (CI) sort of process where our staging database is cyclically destroyed and recreated, perhaps every night or even with every new commit to source control.

The image shows the existing staging database destroyed, then re-created, then processed with a single PowerShell script, to load all of our redo SQL files, in the order that they were created, waiting 1 minute in between each file, to match the cadence in which they were written, and executing them against the staging database.

Steps

To review our steps we’ll do the following

  1. Configure ApexSQL Log to read the transaction log for our Source (aka Publisher) database, using the Continuous auditing feature to make sure we set break points and read seamless blocks of transactions with no gaps or duplicates
  2. Create a batch file to automate the previous task
  3. Create a scheduled job to run the batch file every minute for the period we want to record the transaction load e.g. 1 day
  4. Create a PowerShell script so at some later point we can iterate through all of our scripts and run them at the same cadence they were recorded, simulating the production load

Recording a load

Let’s begin by opening ApexSQL Log and connecting to our production SQL Server. We’ll use a transaction log backup that has transactions for the period that we are looking for.

To generate the redo script, run through the following screens

  1. Open ApexSQL Log
  2. Click on the New button

  3. Supply the login credentials to the Source (Publisher in our example) database Next
  4. In the Select data sources, we’ll click Next which will use the Online transaction log by default

  5. For Select output select Undo/Redo, this will allow us to create a redo/replay script

  6. Select Continuous auditing which will remember the last LSN read to ensure we don’t miss transactions or have duplicates. This will create a tracking file that will be automatically saved and appended to the batch file that will be created at the end of this process. Then click Next

  7. In this form we’ll specify Redo as our Output option, to create the replay script (vs an undo/reverse script) and use the Batch script feature to copy the Windows Shell script to the batch file

Automating and scheduling the job

Now that the replication task is created and automated, via the Batch script feature we want to schedule it to start at midnight and run all day at 1 minute intervals. This will produce thousands of individual SQL script files that will contain all of the transactions that were run on our Publisher database. We can do that by creating a SQL Server Agent Job to run every minute and execute our batch file

Reading and replaying a load at a later date

Now that we’ve successfully recorded a load and persisted it to file in one minute increments, that part of the task is complete

What we need now is to create a script that will iterate through all of our files and execute them at the same cadence, one minute in our case, that they were created in.

This job should be able to accept the SQL Server and Database name of our staging database (we’ll assume Integrated security is used so we can forego user name and password). For our example, we’ll just hardcode the SQL Server and database names

It should be able to be initiated with one-click or included as part of a continuous process, called from another routine/script and finally, it should throw some sort of return code if the process experienced any sort of error, or failed.

Let us now create a PowerShell script to loop through all of these files, open them and execute them in turn. Specifically, …

  1. Load the SQL Server module
  2. Get a list of the generated files in the order that they were created
  3. Execute the sql files in the order in which they were created
Import-Module -Name SqlServer -WarningAction SilentlyContinue

$files=Get-ChildItem C:\redo*.sql

Foreach ($file in $files) {

Invoke-SQLcmd -inputfile $file.FullName -serverinstance "Reporting"
-database "WorldWideImporter"

Start-Sleep -s 60

}

Summary

There you have it. We used ApexSQL Log to read the Publisher transaction file, over a period of time, and persist all of the transactions to file in a manner that would allow us later to open and rerun them, at a similar cadence, simulating the production load, on our staging server. And we did it all with less than 10 lines of code

August 13, 2018