Solving Carbonate Chemistry Problems in R: An Introduction using CO2SYS

Objectives

Review equations for aqueous carbonate chemistry, emphasizing points of interest for Earth scientists
Give an overview of graphical representations of the carbonate system
Introduce packages that can solve carbonate chemistry using R
Learn to make (and evaluate) several plots that commonly appear in the literature

Before we start

Make sure you have the following packages installed in R:

library(seacarb)
library(ggplot2)
library(plotly)

A review of carbonate equilibria

“From two, six”

At minimum, the aqueous carbonate system has six variables governed by four possible equations, leaving two degrees of freedom. You may have heard the expression “from two, six”; if any two parameters are known, the system of equations can be re-arranged to solve the remaining four (Zeebe and Wolf-Gladrow, 2001). For most systems, inorganic carbon will be distributed among CO$_{2(aq)}$, HCO$^-_{3}$, and CO$^{2-}_{3}$:

\[ CO_{2(aq)} + H_2O \rightleftharpoons HCO^-_{3} + H^+\] \[HCO^-_{3} \rightleftharpoons CO^{2-}_{3} +H^+ \] We can see that reactions among inorganic carbon species involve proton equivalents. That brings us to four variables: the three species and pH. The last two quantities are alkalinity and dissolved inorganic carbon (DIC). DIC, sometimes called total carbon, is the sum of carbon across all species:

\[ DIC = CO_{2(aq)} + HCO_3^- + CO_3^{2-}\]

The definition of alkalinity is a bit harder to nail down but it is related to charge balance within the system (See section 1.2 in Zeebe and Wolf-Gladrow, 2001 for a full discussion). A typical definition is:

\[ Alkalinity = HCO_3^- + 2CO_3^{2-} + B(OH)_4^{-} + OH^- - H^+ + ...\] where the ellipses denote minor components. In systems like alkaline lakes and seawater, the carbonate system is by far the most important contributor to total alkalinity (but still, don’t assume it’s the only contributor!). This brings us to six variables with four equations.

Connections with the atmosphere and carbonate minerals

Two more relationships commonly appear in the Earth Sciences. The first is the exchange of CO$_2$ between gaseous (g) and dissolved (aq) phases across an air-water interface:

\[ CO_{2(g)} \rightleftharpoons CO_{2(aq)}\]

Another important concept is the saturation state for carbonate minerals such as calcite and aragonite. In seawater, the aragonite saturation state can be written as:

\[ \Omega_{arag} = \frac{[Ca^{2+}]\times[CO_3^{2-}]}{{K_{sp}}^*}\]

where brackets denote concentration and K$_{sp}$ is the stochiometric solubility product.

Although these two parameters don’t reduce the degrees of freedom, they relate aqueous chemistry to the atmosphere and solid Earth. CO$_2(g)$ is an important greenhouse gas, so we often use it instead of CO$_{2(aq)}$ because it better depicts the relationship between an aqueous system and Earth’s climate. The saturation state determines whether a mineral will dissolve ($\Omega$ <1) or precipitate ($\Omega$ >1) within fluid of a given composition. Carbonate saturation states also influence the rate of precipitation/dissolution reactions and may indicate how easy it is for an organism to build a hard skeleton. These concepts are very useful in trying to link sedimentological, diagenetic, and paleontological records to past environments.

Visualizing the carbonate system

If the equations in the preceding sections left you feeling a bit cold, you are in good company. Most scientists rely on graphical techniques to study the carbonate system rather than just staring at the raw equations. Among the most common depictions are a family of contour plots known collectively as fan diagrams or Deffeyes diagrams. In a fan diagram, the horizontal and vertical axes are independent variables representing two degrees of freedom within the carbonate system. The z axis has a third variable which is rendered in 2D using shading or contour lines. Take a look at the diagram below which shows pH (z) as a function of DIC (x) and total alkalinity (y):

A well-designed fan diagram can quickly tell us how a particular reaction or process will affect carbonate chemistry. For example, you have probably heard that CO$_2$ released by burning fossil fuels will make its way into the ocean, leading to ocean acidification. Can we explain this process using the plot above? Take another look at the definitions of DIC and alkalinity; DIC includes CO$_2$ whereas alkalinity does not. In the absence of other factors, adding pure CO$_2$ to water will push its composition along a path parallel to the red arrow, towards lower pH contours. Although we might have been able to deduce the answer directly from equations for carbonate equilibria, using a graphical approach is a handy way to synthesize information–especially as we consider more complicated processes.

CO2SYS and the seacarb library

There are several programs that can solve aqueous carbonate chemistry. The first program we will explore, CO2SYS (Lewis et al., 1998), has some key advantages for us. First, it can be called in R using the seacarb package (Lavigne and Gattuso, 2010). This will allow us to run many calculations very quickly, and to combine it with other tools in R such as graphical packages. Second, CO2SYS makes it easy to select key parameters like dissociation constants from a list of published studies. The drawback is that CO2SYS is really meant for modern seawater-like solutions. It is probably inappropriate for high salinity brines, freshwater lakes, or ancient oceans with radically different chemistry. We will look at other programs to tackle these problems in the future.

The carb() function

Once we have the seacarb package installed, we can solve some simple problems using the carb() function. We simply provide two parameters from the carbonate system and the carb() function solves the remaining four. We will also need to provide some additional information like temperature, salinity, and pressure as these affect dissociation constants. In the code below, we provide DIC and alkalinity and then query the pH that corresponds with those conditions:

library(seacarb)

# Set DIC and total alkalinity.

  alkalinity = 2.35e-3       # Total alkalinity [moles]
  DIC        = 2.05e-3       # DIC [moles]  
  
# Solve the carbonate system for the given conditions
  
  output  <- carb(flag = 15, alkalinity, DIC, S=35, T=25, P = 0, k1k2="r", pHscale="T")
  
# Print the calculated pH value

  output$pH

## [1] 8.010483

Let’s break down the inputs in the function call. The first input, the flag, tells seacarb what combination of variables to expect. The flag can take on the following values:

flag = 1 pH and CO2 given
flag = 2 CO2 and HCO3 given
flag = 3 CO2 and CO3 given
flag = 4 CO2 and ALK given
flag = 5 CO2 and DIC given
flag = 6 pH and HCO3 given
flag = 7 pH and CO3 given
flag = 8 pH and ALK given
flag = 9 pH and DIC given
flag = 10 HCO3 and CO3 given
flag = 11 HCO3 and ALK given
flag = 12 HCO3 and DIC given
flag = 13 CO3 and ALK given
flag = 14 CO3 and DIC given
flag = 15 ALK and DIC given
flag = 21 pCO2 and pH given
flag = 22 pCO2 and HCO3 given
flag = 23 pCO2 and CO3 given
flag = 24 pCO2 and ALK given
flag = 25 pCO2 and DIC given

The second and third inputs are the two independent parameters for the carbonate system, in this case total alkalinity and DIC. Note that the function call is sensitive to the order in which we type the independent variables. If you are having problems, check the order in which the variables appear in the list above.

The remaining inputs in the function call specify other parameters such as the salinity (S, in ppt), temperature (T, in degrees C), the pH scale (pHscale), the hydrostatic pressure P (in bars, defined as P =0 at the water surface). In addition, the exact relationship between variables (e.g., k1 and k2 as a function of temperature) varies between published sources. The carb() function lets us pick a relationship from a pre-defined list of published sources; in the example above, k1k2 = “r” tells seacarb to use dissociation constants from Roy et al., (1993). The upside of this system is that it makes it easy to keep track of the half-dozen publications that have go into making a single calculation. However, the downside is that it’s not easy to use a relationship that isn’t already on the list. (Note: we have not specified the concentrations of Boron or phosphate, although it is possible to do so.)

The full list of publications, options, defaults, etc. in seacarb can be found in the official documentation:

https://cran.r-project.org/web/packages/seacarb/seacarb.pdf

The carb() function is covered on pages 24-26.

Using vectors with seacarb()

In the previous section we provided seacarb with a single pair of DIC and alkalinity conditions. However, the real strength of seacarb (and R-based approaches in general) is that we can handle large datasets that would be tedious by other methods. Let’s modify our code to include three pairs of DIC and alkalinity “measurements” stored in vectors:

library(seacarb)

# Set DIC and total alkalinity.

  alkalinity <- c(2.35e-3, 2.35e-3, 2.35e-3)        # Vector of total alkalinity values [moles]
  DIC        <- c(2.05e-3, 2.15e-3, 2.25e-3)        # Vector of DIC values [moles]  

# Solve the carbonate system for the given conditions

  output  <- carb(flag = 15, alkalinity, DIC, S=35, T=25, P = 0, k1k2="r", pHscale="T")

# Print the calculated pH value

  output$pH

## [1] 8.010483 7.825954 7.594557

We can see that carb() easily handles vector inputs! The name of the game now is to format our data– and our scientific questions–to take advantage of this.

Another common format you will see in R is that of a data frame. A data frame is very similar to an Excel spreadsheet with rows, columns, and headers. In fact, if you import an excel document into R it will automatically be interpreted as a data frame. R has some very simple syntax that allows us to pull columns or rows from a data frame and turn them into vectors. Try the script below:

library(seacarb)

# Set DIC and total alkalinity.

  alkalinity <- c(2.35e-3, 2.35e-3, 2.35e-3)        # Vector of total alkalinity values [moles]
  DIC        <- c(2.05e-3, 2.15e-3, 2.25e-3)        # Vector of DIC values [moles]  

# A data frame with two columns: DIC and alkalinity
  
  inputs  <- data.frame(alkalinity,DIC)

# Solve the carbonate system for the given conditions

  output  <- carb(flag = 15, inputs$alkalinity, inputs$DIC, S=35, T=25, P = 0, k1k2="r", pHscale="T")

# Print the calculated pH value

  output$pH

## [1] 8.010483 7.825954 7.594557

In this example, we create a data frame by combining the DIC and alkalinity vectors into a data frame called “inputs”. “Inputs” has two columns and three rows; note that R automatically applies column headers (“alkalinity” and “DIC”) based on the names of the parent vectors. If you run this script in R studio, you should see a new tab called “inputs” in your Global Environment pane (top right). Clicking on “inputs” should open a new window with a spreadsheet view. This is a nice quality-of-life feature in R studio that can be very useful for troubleshooting.

The rest of the script shows the syntax for calling up information out of a data frame. The syntax inputs$alkalinity tells R to look in the data frame named “inputs” and find a column named “alkalinity”. R will then interpret the “alkalinity” column as a vector of the appropriate length. We can see that this yields the same results as calling on individual vectors in the previous example.

Building fan diagrams with seacarb

Let’s set our sights higher: we want to program R to quickly and easily make fan diagrams. Take another look at the fan diagram in Figure 1. We already know that seacarb takes vectorized inputs to handle multiple calculations. A reasonable plan would be to 1) make a grid of control points in DIC-alkalinity space, 2) pass them through seacarb to get the pH value (our z variable) for each point, and 3) use a graphics package to turn the points into a contour plot. Ideally, we would like to add options so that we can easily re-size the grid by changing the endpoints. We would also like to adjust the spacing between points by adding or subtracting nodes.

Making a grid in R can be accomplished with two functions: seq() and expand.grid. The seq() function creates a sequence of evenly-spaced numbers and stores them as a vector. It only needs three inputs: the first number (from =), the last number (to =), and the step size between entries. The expand.grid() function takes two vectors and makes a data frame consisting of every possible combination. The result can be plotted as a series of points with unique x and y values:

DIC_range  =  seq(from = 1.9e-3, to = 2.4e-3, by = 5e-5)  # limits and spacing for DIC axis [moles]
TA_range   =  seq(from = 2.1e-3, to = 2.5e-3, by = 1e-4)  # limits and spacing for alkalinity axis [moles]

grid <- expand.grid(DIC = DIC_range, TA = TA_range)

Sometimes it’s preferable to set the number of nodes (entries) along an axis and then let the computer find their spacing. We can solve the spacing, ($\Delta$), for n nodes as follows:

\[ \Delta = \frac{max - min}{n -1}\] Usually it’s best to treat each axis seperately with it’s own maximum, minimum, and spacing.

For a grid with $m * n$ nodes, the output of the expand.grid() function is a data frame with two columns and $m * n$ rows. This is exactly the configuration that we need to pull columns into carb(). Once carb() has solved the other variables in the carbonate system, we can pick one to render on the z-axis of our plot. The code below produces gridded points in DIC-alkalinity space colored by their pH values:

library(seacarb)
library(ggplot2)

## Set the max, min, and number of nodes for each axis:

  DIC_min    =   500e-6              # minimum value for the DIC axis [moles]
  DIC_max    =   9000e-6              # maximum value for the DIC axis [moles]
  DIC_nodes  =        11              # number of DIC nodes 

  TA_min     =   500e-6              # minimum value for the total alkalinity axis [moles]
  TA_max     =   9000e-6              # maximum value for the total alkalinity axis [moles]
  TA_nodes   =        17              # number of total alkalinity nodes 

## Compute grid

  DIC_range  =   seq( from = DIC_min, to = DIC_max, 
                    by = (DIC_max - DIC_min)/(DIC_nodes-1))

  TA_range   =   seq( from = TA_min, to = TA_max, 
                    by = (TA_max - TA_min)/(TA_nodes-1))

  grid      <-   expand.grid(DIC = DIC_range, TA = TA_range)

## Run grid through seacarb

  output  <- carb(flag = 15, grid$TA, grid$DIC, 
                  S=35, T=25, P = 0, k1k2="r", pHscale="T")

## Plot results using the ggplot2 library

  plot<- ggplot() +
         geom_point(data =output, 
         mapping = aes(x = output$DIC*10^3, 
                       y = output$ALK*10^3, 
                       colour = output$pH), 
         size = 4, alpha = 1, 
         fill = "black") +
    
    scale_color_gradientn(colors = c("red", "yellow", "green"),
                            name = "pH") +
    
    theme_classic()+
    
 # Add labels and tidy up the plot for export.
    
    labs (x = 'DIC (mmol)', y = 'Total alkalinity (mmol)', 
          title = 'pH as a function of DIC and alkalinity') +
   
    theme(plot.title = element_text(size = rel(1.2), hjust = .5)) + 
    theme(axis.title  = element_text(size = rel(1), hjust = .5)) 
  
  plot

From here it is only a small step to turn the grid into contour lines using graphics packages such as gglpot2 and plotly. We won’t discuss the syntax of these packages here, but there are many excellent guides online such as this one.

(Side note: you may be wondering why some of the cells are greyed out in the upper left. My guess is that they are empty because of how seacarb calculates total alkalinity, which is the sum of carbonate alkalinity plus minor components. If you do not specify boron and phosphate, seacarb will assume their concentrations according to some default values for modern seawater. I think the algorithm gets confused when total alkalinity:DIC ratio exceeds 2:1, which implies substantial contributions from non-carbonate species. Some of the other software we will use should bypass this limitation.)

Worked example: where did all the corals go after Triassic-Jurassic extinction?

Let’s try a real world example. Many Phanerozoic mass extinctions coincide with evidence for volcanic activity. One potential kill mechanism is ocean acidification triggered by volcanic CO$_2$ emissions, which has several negative effects on coral survival. Martindale et al., (2012) investigate whether ocean acidification could have caused the “coral gap” in the Early Jurassic where corals are rarely preserved in the fossil record. First, they summarize evidence for how modern corals respond under low aragonite saturations ($\Omega_{arag}$). A central part of their analysis is a fan diagram exploring $\Omega_{arag}$ (z) as a function of DIC (x) and pCO$_2$ (their Figure 1). Let’s see if we can reproduce their analysis in R.

The first challenge is conceptual: their fan diagram is in DIC-pCO$_2$ space while we have been operating in DIC-alkalinity space. Furthermore, their z variable is $\Omega_{arag}$ while ours is pH. Just remember that all fan diagrams operate on the same principle: “from two, six”. The x and y axes are treated as independent variables which uniquely determine a third carbonate variable, z. We can see this by examining the expression for aragonite saturation state $\Omega_{arag}$ used in the paper:

\[ \Omega_{arag} = \frac{[Ca^{2+}]\times[CO_3^{2-}]}{{K_{sp}}^*}\] where brackets denote concentration and $K_{sp}$ is the stochiometric solubility product. If we treat $Ca^{2+}$ and $K_{sp}$ as constants, then $\Omega_{arag}$ is a function of $[CO_3^{2-}]$ only. The plot follows the same logic as other fan diagrams.

The underlying logic makes it relatively painless to switch the axes on our fan diagrams. In the code below, we have changed the flag value to tell carb() to expect pCO$_2$ followed by DIC. CO2sys also calculates $\Omega_{arag}$ assuming [$Ca^{2+}$] and $K_{sp}$ values appropriate for modern seawater. We have also used the plotly package to draw contours and apply a simple red-yellow-green colormap:

library(seacarb)
library(plotly)

## Set the max, min, and number of nodes for each axis:

  DIC_min    =   1500e-6              # minimum value for the DIC axis [moles]
  DIC_max    =   6000e-6              # maximum value for the DIC axis [moles]
  DIC_nodes  =        51              # number of DIC nodes 
  
  pCO2_min   =      1000              # minimum value for the total alkalinity axis [moles]
  pCO2_max   =      8000              # maximum value for the total alkalinity axis [moles]
  pCO2_nodes =        51              # number of total alkalinity nodes 

## Contour aesthetics

  min_level  =         0              # lowest pH contour
  max_level  =         7              # highest pH contour
  spacing    =         1              # contour spacing

## Compute grid

  DIC_range  =   seq( from = DIC_min, to = DIC_max, 
                    by = (DIC_max - DIC_min)/(DIC_nodes-1))

  pCO2_range =   seq( from = pCO2_min, to = pCO2_max, 
                    by = (pCO2_max - pCO2_min)/(pCO2_nodes-1))

  grid      <-   expand.grid(DIC = DIC_range, pCO2 = pCO2_range)

## Run grid through seacarb

output  <- carb(flag = 25, grid$pCO2, grid$DIC, 
                S=35, T=25, P = 0, k1k2="r", pHscale="T")

## Plot results using the plotly library

  x <- list(title = "DIC (mmol)")
  y <- list(title = "pCO2 (ppm)")
  title<- list(text = "Omega_arag by DIC and pCO2", x = .55)


  plot <- plot_ly(x = c(output$DIC*1000), y = c(output$pCO2), z = c(output$OmegaAragonite),
                type = "contour",
                colors = colorRamp(c("red","yellow","green")),
                contours = list(start = min_level, end = max_level, size = spacing, 
                                coloring = 'heatmap', showlabels = TRUE),  
                line = list(smoothing = 0.85, color = "black", size = 5)) 

  plot %>% hide_colorbar() %>% 
         layout(xaxis = x, yaxis = y, title = title )

One takeaway from this exercise is that the choice of axes is up to the researcher. However, good choices take into account what variables can be constrained and what hypothesis being tested. A contour plot usually implies that the z variable is a function of x and y. By placing$\Omega$ on the z axis, Martindale et al., (2012) suggest that it is a function of pCO$_2$ and the size of the oceanic carbon reservoir. This is, of course, completely intentional and is designed to mirror the proposed cause-effect relationship in their hypothesis.

Summary

Although fan diagrams in the literature show a wide range of variables on their axes, they are all based on the equations of carbonate equilibria and their two degrees of freedom. This underlying similarity, along with user-friendly packages in R, permits flexible scripting that can produce many variations in fan diagrams. My hope is that these scripts will aid your research and help you think critically about diagrams you encounter in this course.

References:

Lavigne, H., and J. Gattuso. “Seacarb: seawater carbonate chemistry with R.” (2010).

Martindale, Rowan C., et al. “Constraining carbonate chemistry at a potential ocean acidification event (the Triassic–Jurassic boundary) using the presence of corals and coral reefs in the fossil record.” Palaeogeography, Palaeoclimatology, Palaeoecology 350 (2012): 114-123.

Zeebe, Richard E., and Dieter Wolf-Gladrow. CO2 in seawater: equilibrium, kinetics, isotopes. No. 65. Gulf Professional Publishing, 2001.