Oracle Databases and CI/CD Pipelines
This article contains a general discussion about how Oracle Databases can fit into Continuous Integration/Continuous Delivery (CI/CD) pipelines.
This is not an article about how to build the pipeline itself. It focuses on the Oracle database piece of the pipeline, and the features and approaches you might use to meet those requirements. If you were looking for a how-to guide, you've come to the wrong place.
- Context Is Everything
- Containers (Docker)
- CloneDB
- Pluggable Databases (PDBs)
- Recovery/Flashback
- Other Options
- Recreating Deterministic Test Data
- Feedback/Questions
Context Is Everything
When we talk about CI/CD pipelines, we are often talking about a series of environments that allow an application to be pushed from initial development to production in an automated manner. This automation may include automation of unit testing and integration testing. In some cases these environments only exist for the lifespan of the testing, so we see people take a "build and burn" attitude towards them.
There are a number of ways to approach CI/CD, and of course a number of technologies you can use to create pipelines. We don't often think of Oracle databases for short-lived environments, due to the complexity and length of time associated with Oracle installation, database creation and data loads, but that doesn't mean it's not possible.
We also have to consider what constitutes a meaningful test environment. For some tests we might feel that nothing other than an exact duplicate of production is valid. For other tests we might be happy with a cut-down environment to test basic functionality. When judging a solution, you have to consider the implications of testing against a non-production-like environment.
In this article we'll just be looking at a variety of solutions to achieve some of the goals of creating short-lived environments that give repeatable results. There is no right or wrong way. As always, it depends on what you are trying to achieve. We will mostly focus on the speed of setup for a new test, and how to guarantee a consistent starting point for all tests, allowing us to get deterministic results from our tests. These are not recommendations for "normal" databases.
Containers (Docker)
It's impossible to talk about CI/CD pipelines without mentioning containers, as they are at the heart of most existing pipelines. Oracle databases work inside containers, and it's possible this is the best use case for an Oracle database inside a container.
Containers allow you to build and burn environments really easily and quickly, but the traditional approach to building a Docker image is to have a software-only Oracle installation in the read-only image, with the database created on the first run of a container based on that image. This makes the startup time of the container unacceptably long. So what alternatives do we have?
The first, and possibly simplest option is to use a persistent volume based on a host directory. In the initial setup we create a container based on a normal Oracle database image, but the database files are placed in the persistent volume. Once the database creation is complete, we destroy the container, leaving the database files in the persistent volume. This will be our "testing" database, and we want every test to begin at the same starting point, so we need to protect these database files. This might mean taking a manual copy of them, or using a storage snapshot to keep a copy of them. The typical testing scenario would look something like this.
- Fire up a new database container using the external volume. Since the database files are already present, this will probably take a couple of minutes.
- Do the testing.
- Destroy the database container.
- Overwrite the database files with the original copy to reset the start point, ready for the next test.
The efficiency of this approach depends on how quickly we can replace the original copy of the files on the host server. If we are using storage snapshots, this could be really quick. If we are using a conventional file copy, the speed will depend on how big the seed database is, and how quick our storage is.
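As a rough illustration, a minimal shell sketch of this cycle is shown below. The image name and paths are assumptions, based on an image built using Oracle's docker-images scripts, where the datafiles live under "/opt/oracle/oradata" inside the container.

```bash
#!/bin/bash
# Hypothetical test cycle for the persistent volume approach.
DATA_DIR=/u01/test_db/oradata        # Live datafiles (assumed path).
SEED_DIR=/u01/test_db/oradata_seed   # Pristine copy of the seed database.

# 1) Fire up a new database container using the external volume.
docker run -d --name test-db \
  -p 1521:1521 \
  -v ${DATA_DIR}:/opt/oracle/oradata \
  oracle/database:19.3.0-ee

# ... wait for the database to open, then run the tests ...

# 2) Destroy the database container.
docker rm -f test-db

# 3) Reset the datafiles from the seed copy, ready for the next test.
rm -rf ${DATA_DIR}
cp -r ${SEED_DIR} ${DATA_DIR}
```

If storage snapshots are available, the final file copy could be replaced by a snapshot restore, making the reset almost instant.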
An alternative to this is to build the seed database into the read-only image. This will also allow the database to start quickly as the database is pre-created, and of course each time we run a container we will get a fresh copy of the database, as it looked when the image was first created. This gives us the consistent starting point for all testing, allowing our test results to be deterministic, but it does come with some issues.
- Every system we test will need its own image. This can be simplified somewhat by using build arguments to define the database, but if we are testing 10 different systems, we will need 10 different images.
- The initial image will be very big. We typically try to keep images small, but having them contain the database files will make them very large indeed.
- The database files in the image are read-only, so changed blocks need to be written to the copy-on-write layer in the container. That's not a problem for short-lived containers, but the longer the container lives and the more data that is changed, the bigger the copy-on-write layer will grow. Leave it long enough and all the datafiles will be resident in the copy-on-write layer, making the container consume roughly double the space of the initial image.
- Because of the nature of how the image and copy-on-write location are handled, this is probably only viable for a small database.
The folks who work on utPLSQL use this second method for their test environments.
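As a hedged sketch, one way to bake a seed database into an image is to create the database in a running container, then commit the result. This assumes no volume is mounted, so the datafiles under "/opt/oracle/oradata" stay in the container filesystem and are captured by the commit. The image and tag names are hypothetical.

```bash
#!/bin/bash
# Hypothetical build of a self-contained seed image.
docker run -d --name seed-build oracle/database:19.3.0-ee
# ... wait for database creation to complete, then load the seed data ...
docker stop seed-build
docker commit seed-build my-app/oracle-seed:1.0
docker rm seed-build

# Every container started from the committed image begins life with an
# identical, pre-created copy of the seed database.
docker run -d --name test-db -p 1521:1521 my-app/oracle-seed:1.0
```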
Depending on your needs, either of these approaches is a viable solution. I guess the biggest question we should ask ourselves is, is testing in a Docker container representative of how things will run on our production system? If we are testing in Docker, then running production on an Exadata machine, how do we feel about that? Clearly performance testing would be silly, but for functional testing we might find this an acceptable compromise.
Some resources related to Oracle in containers are included in the links at the end of this article.
CloneDB
CloneDB was introduced in the 11.2.0.2 database patchset. Rather than using the traditional RMAN database duplication, CloneDB uses dNFS technology to instantly fire up a clone using an existing backup of a database as the data store. The clone uses copy-on-write technology, so only changed blocks need to be stored locally, while the unchanged data is referenced directly from the backup datafile copies. This drastically increases the speed of cloning a system and means that several separate clones can function against a single set of backup datafile copies, thus saving considerable amounts of space.
This approach would allow a "build and burn" approach to test databases, allowing them to be cloned almost instantly and removed just as fast. Here are some things to consider when using CloneDB.
- CloneDB requires NFS for the copy-on-write location, but the backups can sit on any type of storage.
- The backup files must be image copy backups, not backup sets.
- The backup files must remain in place and unchanged for the lifespan of the instance. Remember, they are referenced for any unchanged blocks.
- A single set of backup datafile copies can be used for many instances.
- Here the infrastructure would be fixed, but the test databases would be transient.
Because every new clone is based on the backup image copies, each one will always start from the same point in time. Refreshing a database from production is simply a case of dropping any CloneDB instances, bringing across a fresh set of image copy backups and firing the instances up again.
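To make the moving parts concrete, here is a heavily abbreviated sketch of firing up a clone. The database name, paths and datafile list are assumptions; a real clone needs an entry per datafile, and Oracle supplies the clonedb.pl script to generate scripts of this form. It assumes a pfile containing CLONEDB=TRUE, image copy backups under "/backup" and an NFS copy-on-write location under "/nfs/clone1".

```bash
sqlplus / as sysdba <<EOF
-- Start the clone instance with CLONEDB=TRUE set in the pfile.
STARTUP NOMOUNT PFILE='/u01/clone1/initCLONE1.ora'

-- Build a controlfile that points at the image copy backups.
CREATE CONTROLFILE REUSE SET DATABASE "CLONE1" RESETLOGS
  LOGFILE GROUP 1 '/nfs/clone1/redo01.log' SIZE 100M
  DATAFILE '/backup/system01.dbf', '/backup/sysaux01.dbf', '/backup/users01.dbf'
  CHARACTER SET AL32UTF8;

-- Remap each backup datafile to a sparse copy-on-write file on NFS.
EXECUTE DBMS_DNFS.CLONEDB_RENAMEFILE('/backup/system01.dbf', '/nfs/clone1/system01.dbf');
EXECUTE DBMS_DNFS.CLONEDB_RENAMEFILE('/backup/sysaux01.dbf', '/nfs/clone1/sysaux01.dbf');
EXECUTE DBMS_DNFS.CLONEDB_RENAMEFILE('/backup/users01.dbf', '/nfs/clone1/users01.dbf');

ALTER DATABASE OPEN RESETLOGS;
EOF
```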
Once again, you have to ask yourself if running a test environment using CloneDB and NFS is representative of your production system.
Some resources related to CloneDB are included in the links at the end of this article.
Pluggable Databases (PDBs)
The multitenant architecture gives a range of options for producing new test pluggable databases.
The creation of an empty pluggable database from the seed is quite quick, so if your requirement is to start with an empty database, this could be an option for the deployment of a new database in your pipeline.
More likely, you will want a test database with some data in place to begin with. If the database is sufficiently small, you could create your starter pluggable database, leave it in read-only mode and use it as the source to clone fresh copies of your database for testing.
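A minimal sketch of that approach is shown below, assuming a hypothetical read-only source PDB called PDB_SEED_APP and Oracle Managed Files, so no file name conversion clause is needed.

```bash
sqlplus / as sysdba <<EOF
-- One-off: leave the pre-built source PDB open read-only,
-- so it can act as a clone source.
ALTER PLUGGABLE DATABASE pdb_seed_app OPEN READ ONLY;

-- Each test run gets a fresh clone of the seed.
CREATE PLUGGABLE DATABASE pdb_test1 FROM pdb_seed_app;
ALTER PLUGGABLE DATABASE pdb_test1 OPEN;

-- Burn it once the tests are complete.
ALTER PLUGGABLE DATABASE pdb_test1 CLOSE;
DROP PLUGGABLE DATABASE pdb_test1 INCLUDING DATAFILES;
EOF
```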
For larger pluggable databases you have the option of using Snapshot Copy PDBs. Snapshot copies take advantage of the storage system to create sparse PDBs, where the snapshot is used as read-only source data and changed blocks are written to a copy-on-write location. This is similar to the CloneDB option discussed above. In fact, you can optionally use dNFS for snapshot copy PDBs, so you are using CloneDB under the hood. Snapshot copy PDBs are described here.
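For the snapshot copy variant the syntax difference is small. The hypothetical names below match the previous sketch, and suitable storage, or dNFS with CLONEDB=TRUE, is assumed.

```bash
sqlplus / as sysdba <<EOF
-- Unchanged blocks are read from the source files. Changes are written
-- to a copy-on-write location, so the clone is sparse and fast to create.
CREATE PLUGGABLE DATABASE pdb_test2 FROM pdb_seed_app SNAPSHOT COPY;
ALTER PLUGGABLE DATABASE pdb_test2 OPEN;
EOF
```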
There is also an option to use Split Mirror Clone PDBs. In this case ASM splits a mirror, effectively making a point-in-time media copy from a parent PDB. From that point forward, changes to the parent PDB do not get reflected in the clone PDB. Split Mirror Clone PDBs are described here.
Some resources related to the multitenant architecture are included in the links at the end of this article.
Recovery/Flashback
An alternative to constantly dropping and recreating a database is to use some form of point-in-time recovery between tests, so each set of tests gets the same clean start. The typical testing scenario would look something like this.
- Connect to the existing test database.
- Do the testing.
- Do a point-in-time recovery (PITR) or a flashback database to the time before the testing, ready for the next test. This could be for the whole CDB, or at the PDB level, depending on the requirement, as sketched below.
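Here is a minimal sketch of the flashback variant, using a guaranteed restore point at the CDB level. The restore point name is an assumption, and a PDB-level rewind would use FLASHBACK PLUGGABLE DATABASE instead.

```bash
# Before the tests: mark the clean starting point.
sqlplus / as sysdba <<EOF
CREATE RESTORE POINT before_test GUARANTEE FLASHBACK DATABASE;
EOF

# ... run the tests ...

# After the tests: rewind the database, ready for the next run.
sqlplus / as sysdba <<EOF
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_test;
ALTER DATABASE OPEN RESETLOGS;
DROP RESTORE POINT before_test;
EOF
```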
Here are some resources related to database recovery and flashback database.
- Multitenant : Backup and Recovery of a Container Database (CDB) and a Pluggable Database (PDB) in Oracle Database 12c Release 1 (12.1) Onward
- Flashback Database in Oracle Database 10g Onward
- Multitenant : Flashback Pluggable Database (PDB) in Oracle Database 12c Release 2 (12.2) Onward
Other Options
There are a number of other options that could be used to create temporary databases, but they may be too slow to meet our need to create new environments quickly.
- Cloning an existing database Virtual Machine (VM).
- Creating new test databases using conventional RMAN clones.
- There are a number of third party products for data virtualization that may be useful for quickly creating test instances.
Recreating Deterministic Test Data
An important point of automated testing is being able to predict the result of a test. This presents a problem where databases are concerned, because of two conflicting requirements.
- We want our test environment to be representative of our production environment, which is itself changing constantly.
- We want our test system to be static, so our tests are deterministic. We want to know the expected outcome of each test.
Refreshing a database from production is relatively easy, regardless of the infrastructure you are using, but what if that affects your testing?
It probably makes sense to automate the creation of specific test data for the scenarios you are testing. In simple systems, this could be as easy as running scripts that contain DML to set up the test data. In more complex systems, especially those involving third party apps, you may be forced to use the applications themselves to create data for your test scenarios. In these situations you may be able to take advantage of tools like UiPath and Leapwork to automate the manual data entry operations.
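In the simple case, that might be no more than a reset script run before each test. The connection details and table names below are purely hypothetical.

```bash
sqlplus test_user/test_pass@//localhost:1521/pdb_test1 <<EOF
-- Clear down anything left by previous runs (child tables first).
DELETE FROM orders;
DELETE FROM customers;

-- Load a small, known data set, so test results are predictable.
INSERT INTO customers (id, name) VALUES (1, 'Test Customer 1');
INSERT INTO orders (id, customer_id, amount) VALUES (100, 1, 9.99);
COMMIT;
EOF
```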
Feedback/Questions
I am not an expert at CI/CD, so when I wrote this article I passed it out to some people for feedback, to make sure I had not missed the mark. Here are some of the points raised and my responses.
- Q: What’s the level of technical skill required to set this up?
- A: Quite a lot. Some of the Oracle features discussed here will not be very familiar to many DBAs, and possibly out of the reach of many developers unless they want to invest a lot of time. Containers have become a staple of software development, but are still new to many DBAs, so there is a learning curve here. With the exception of using something like Flashback Database, which should be understood by all DBAs, many of the solutions could be taxing.
- Q: What resources are we talking if we wanted to do this setup in the cloud?
- A: Most of the database cloud services focus on replacing traditional relatively static databases. Yes, you can create and drop databases at will, but they don't focus on speed of provisioning the service, and they give you little in the way of managing the "seed" data. My current attitude would be this would require Infrastructure as a Service (IaaS), with the DBAs and developers taking on the role of setting this up and managing it. I don't think the current batch of container cloud services bring anything to the mix to help you here, but I am not an expert so I could be wrong.
- Q: What can’t I do if I’m using something like autonomous databases?
- A: I think the best you can hope for is the recovery approach. This is not a criticism of this service, or any other cloud service. They just weren't designed for this purpose.
- Q: How do I manage my test data?
- A: Yes. That's really hard. Keeping the testing environment up to date with production, whilst still retaining reasonable and deterministic test cases can be a challenge. I touched on this briefly above, and that is about all I'm qualified to say.
- Q: Do we need this for unit testing or only larger scale system/integration testing?
- A: It depends. I work on one project where all database calls are replaced by hard-coded stubs for unit tests, so no actual database calls are made. It's a small project and this makes unit testing easier, and allows it to scale really well. Many developers could unit test at the same time without using up masses of resources on lots of databases. Most importantly, all the tests are repeatable. Clearly this would be meaningless for integration testing. This is why a number of approaches have been presented. You clearly have to approach the problem differently when comparing the unit testing of a small utility and the integration testing of a large application that works on lots of data.
- Q: Could this be used to run tests in parallel by having multiple environments spun up at once?
- A: Some of the options, like the self-contained Docker image, lend themselves to having multiple environments running simultaneously. Others are more like the traditional Dev/Test/UAT/Live approach. A lot depends on the requirements of the database and, importantly, the application layer. If you are testing a monolithic application layer, it is unlikely firing up 1,000 databases in Docker containers will be useful to you.
- Q: How do you manage database patching and specifically upgrades, what impact does that have on your test setup?
- A: For all the options listed, database upgrades and patches would require a recreation of the seed database. Either a new copy of production that was upgraded, or a manual upgrade of the existing seed database. This would then form the new seed used for subsequent testing until the next patch/upgrade. Management of this would be interesting.
- Q: Is approaching database testing and deployment like it’s any other type of software a mistake?
- A: Clearly there are a lot of problems that are very data-specific. It is very different to firing up an app server container and dropping in a new version of an application. Some people would argue that another engine might be easier to deal with, but most of this is really a matter of scale. I can do a convincing demo of a two-table application in any engine, but what matters is how *you* need to use it. Most CI/CD pipelines are presented and discussed from the application layer perspective, and they tend to skirt around the data side of things. I believe this is why many developers like NoSQL databases. It allows them to delay the hard decisions.
- Q: At what point in your deployment does this add the most value?
- A: For large databases I think you will go for a more traditional Dev/Test/UAT/Live approach, and the pipeline will be more focused on automating and improving the efficiency of that. In this case, we are focusing more on the deployment mechanism itself and the integration testing later on in the testing cycle.
- Q: CI/CD would work for small self-contained applications or loosely coupled applications, but how would it work for larger monolithic applications, especially third party applications?
- A: I think every application would have to be dealt with on a case-by-case basis. If you are in total control of your environment, CI/CD becomes a lot easier, as you can pick the technology stacks that work for you, and design your applications with this in mind. If you are working with legacy and/or third party applications, you will be focusing on improving the flow through a more traditional environment setup. One size does not fit all.
- Q: What do you do with legacy systems?
- A: See answer to the previous question.
- Q: What scale does the option best apply to, 1 deploy a month or thousands a day?
- A: The "thousand deployments a day" approach is focused on the development of loosely coupled applications. If you could create a subset of a database that allows you to test that piece of the application and house it in a Docker image, this would work for that type of scenario. Having said that, I think that's a really small proportion of the customer base for your typical Oracle development. I suspect the requirements would be much more limited for many people. As mentioned before, many people will focus on improving the flow in traditional environment setups.
- Q: Is this the dirty secret of development that no one wants to talk about?
- A: It really does feel that way. Over the last few years I've questioned a lot of people about this and I'm yet to hear a satisfactory answer beyond, "Yeah. Data is difficult."
- Q: In "traditional" software development, i.e. separate from the database, you are actively trying to reduce state and inter-dependency. For databases they represent the antithesis of this. They are all state and inter-dependency. Comment?
- A: Yes. When I discuss Oracle in containers I say the database is the ultimate in "state", which you are trying to cram into a container that is focused on being stateless. It's no surprise these are odd bed-fellows. Even so, this doesn't mean you can't take advantage of containers, and use that knowledge to alter how you approach more traditional systems. In many cases, providing APIs as web services can allow these types of environments to co-exist. Neither is right or wrong. Neither is the best or worst. They often serve different purposes.
- Q: A person could do pretty well for themselves as a consultant just in this domain, right?
- A: A person could drive themselves insane trying!
For more information see:
- Docker/Container Articles
- GitHub : My Docker Builds
- Direct NFS (DNFS) CloneDB in Oracle Database 11g Release 2 (Patchset 11.2.0.2 onward)
- Multitenant Articles
- Multitenant : Backup and Recovery of a Container Database (CDB) and a Pluggable Database (PDB) in Oracle Database 12c Release 1 (12.1) Onward
- Flashback Database in Oracle Database 10g Onward
- Multitenant : Flashback Pluggable Database (PDB) in Oracle Database 12c Release 2 (12.2) Onward
- SQLcl : Automating Your SQL and PL/SQL Deployments using the SQLcl implementation of Liquibase
Hope this helps. Regards Tim...