1 Introduction

The Arctic Ocean is connected to the Atlantic Ocean on both sides of Greenland and to the Pacific Ocean through the Bering Strait (Talley et al. 2011). Water exchange between the Atlantic and Arctic oceans is important for both the regional climate and the global thermohaline circulation (Aagaard et al. 1985; Dickson et al. 2008). Understanding the Arctic Ocean changes and predicting its future evolution are among the key topics of climate research.

The main water mass at the Arctic intermediate depth, the Atlantic water (AW), originates from the North Atlantic. Warm and saline AW enters the Nordic Seas in the North Atlantic Current, and then travels northward as two branches of the Norwegian Atlantic Current (Orvik and Niiler 2002). Part of the eastern branch enters the Barents Sea, and then flows into the intermediate and deeper layers of the Arctic Ocean at the St. Anna Trough in the northern Kara Sea (Schauer et al. 2002a; Karcher and Oberhuber 2002; Maslowski et al. 2004). The remaining part of the eastern branch and the western branch approach the Fram Strait as the West Spitsbergen Current (WSC). Part of the WSC enters the Arctic Ocean, supplying the warm AW layer (Rudels and Friedrich 2000; Schauer et al. 2008; Beszczynska-Moeller et al. 2012), and the remaining part of the WSC recirculates in the Fram Strait and becomes part of the southward flowing East Greenland Current (EGC) (Marnela et al. 2013; Hattermann et al. 2016; Wekerle et al. 2017). The AW circulates mainly cyclonically in the Arctic basins and is aligned with the bathymetry (Rudels et al. 1994; Karcher et al. 2003). It plays an important role in the heat budget of the Arctic Ocean. After leaving the Arctic Ocean in the EGC, it supplies the dense waters that feed the Atlantic overturning circulation (Rudels and Friedrich 2000; Karcher et al. 2011). Although the AW lies at intermediate depth and is not in direct contact with the Arctic Ocean surface, a warming trend in the AW layer in the Arctic Ocean has been observed (Polyakov et al. 2012, 2013), with implications for accelerating sea ice decline in a warming climate (Polyakov et al. 2004, 2010, 2017; Ivanov et al. 2018).

Climate models are useful tools for studying Arctic climate dynamics and predicting future climate changes. Although the performance of AW layer simulations in forced ocean-ice models has been assessed thoroughly in the Arctic Ocean Model Intercomparison Project (AOMIP, Proshutinsky and Kowalik 2007; Proshutinsky et al. 2011) and the Coordinated Ocean-ice Reference Experiments, phase II project (CORE-II, Griffies et al. 2009) by Holloway et al. (2007) and Ilicak et al. (2016), a comprehensive assessment of the AW layer simulated in state-of-the-art climate models, especially its long-term trend, has not been undertaken. Thus, the performance of AW layer simulations in coupled climate models remains unknown.

Previous studies have shown that ocean general circulation models (OGCMs) used in the AOMIP and CORE-II projects commonly produce an AW layer that is too thick and too deep (Holloway et al. 2007; Ilicak et al. 2016). The common problem is believed to be related to the configuration of OGCMs, such as model resolution, vertical mixing schemes, and eddy–topography interaction parameterization. Unlike ocean-ice models forced with atmospheric reanalysis fields used in these studies, climate models allow active atmosphere-ice-ocean coupling, which has advantages such as the avoidance of unphysical freshwater fluxes associated with surface salinity restoring (Griffies et al. 2009). However, ocean models in fully coupled systems might have large biases because of uncertainties in atmospheric and land models and biases in the representation of interaction between ocean and atmosphere. Therefore, it is necessary to evaluate AW layer simulations in state-of-the-art climate models to understand their performance.

The Coupled Model Intercomparison Project Phase 5 (CMIP5) constitutes a useful platform for studying climate change and for assessing the performance of state-of-the-art climate models. In this study we focus on the assessment of the AW layer in the Arctic Ocean in CMIP5 historical simulations, for which observations are available for comparison. The paper is structured as follows. The data and methodology are introduced in Sect. 2. Section 3 presents the results, and Sect. 4 provides discussions and the derived conclusions.

2 Data and methodology

This study uses monthly ocean potential temperature and salinity output from the historical simulations (1950–2005) of 41 CMIP5 coupled models, which are available from the Earth System Grid Federation (ESGF) (http://pcmdi9.llnl.gov/esgf-web-fe/). The number of ensemble realizations differs among the CMIP5 models; for simplicity, the first ensemble member from each model is selected in this study. Table 1 lists the name and resolution of each model used in this study. The horizontal resolution of most models is nominally 1°. Averaged over the 41 models, the horizontal grid has 341 × 267 points. The MIROC4h model (Sakamoto et al. 2012) has the highest horizontal resolution, with 1280 × 912 grid points.

Table 1 Information regarding the CMIP5 models and PHC3.0 dataset used in this study

Gridded climatological ocean temperature from the Polar Hydrographic Climatology version 3.0 dataset (PHC3.0, Steele et al. 2001) is used as observational reference in this study. The PHC3.0 dataset has high quality in the Arctic Ocean because it is the combination of the World Ocean Atlas, Arctic Ocean Atlas, and some Canadian observations. In the Arctic Ocean, it mainly covers the period after 1950. Its horizontal resolution is 1° × 1°. To assess the long-term trend of AW temperature, the time series of observations from Polyakov et al. (2012) is also used in this study.

Unlike the Arctic surface, deep, and bottom waters, the AW is characterized by its high temperature (defined by T > 0 °C). In this paper, the AW core temperature (AWCT) and AW core depth (AWCD) during 1950 to 2005 from the CMIP5 models and their multimodel mean (MMM) are compared with those derived from the PHC3.0 dataset. Similar to previous studies (Polyakov et al. 2004; Li et al. 2012, 2014; Wang et al. 2018), the AWCT is defined as the maximum temperature below the halocline (150 m in this study) in the vertical profile, and the AWCD is defined as the depth of the AWCT. In our analysis, the Arctic deep basin (where the ocean is deeper than 500 m) is divided into the Eurasian and Canadian basins, separated by the Lomonosov Ridge, as shown in Fig. 1. Correlation analysis is used to understand the relationship of the AWCT bias to upstream conditions. The correlation coefficient used in this paper is Spearman's rho, and the p value is computed using large-sample approximations. If the p value is < 0.05, we consider the correlation to be significant.
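The AWCT and AWCD definitions above can be illustrated with a minimal sketch. The function name `aw_core`, its arguments, and the example profile are hypothetical and chosen only for illustration; the values are not taken from PHC3.0 or from any CMIP5 model, and the sketch uses plain Python lists rather than gridded model output.

```python
def aw_core(depths_m, temps_c, halocline_m=150.0):
    """Return (AWCT, AWCD) for one vertical profile.

    AWCT is the maximum temperature strictly below `halocline_m`;
    AWCD is the depth at which that maximum occurs.
    Returns (None, None) if no valid level lies below the halocline.
    """
    best_t, best_z = None, None
    for z, t in zip(depths_m, temps_c):
        # Skip levels at or above the halocline depth and missing values.
        if z <= halocline_m or t is None:
            continue
        if best_t is None or t > best_t:
            best_t, best_z = t, z
    return best_t, best_z


# Hypothetical profile (illustrative values only).
depths = [10, 50, 100, 200, 300, 400, 600, 1000, 2000]
temps = [-1.7, -1.5, -1.0, 0.4, 1.1, 0.9, 0.5, -0.2, -0.7]
awct, awcd = aw_core(depths, temps)
print(awct, awcd)
```

Restricting the search to levels below 150 m keeps a seasonally warm surface layer from being mistaken for the AW core.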

Fig. 1

Arctic Ocean bottom topography (unit: m). Red lines indicate the main Arctic gateways and the black line crossing the North Pole indicates the location of section S1 used in Fig. 5 and the supplementary Figs. S1 and S3

3 Results

The water mass in the Arctic Ocean can be roughly divided into three layers: cold Polar Surface Water that extends from the sea surface to the depth of about 200 m, the relatively warm AW layer in the depth range of about 200–800 m, and deep/bottom waters that lie below the AW layer (Talley et al. 2011). The focus of this study is on the Arctic AW layer.

3.1 Temperature and salinity profiles

Basin-mean vertical profiles of potential temperature and salinity in the Eurasian and Canadian basins are compared in Fig. 2. The CMIP5 models have relatively large intermodel spread in simulated temperature for the AW and deep/bottom waters (Fig. 2a, b). Most models have warm biases below the depth of 500 m. Below 1000 m, the CMIP5 MMM temperature is about 1 °C warmer than the observations, and the bias is larger than in the CORE-II forced-model simulations (Ilicak et al. 2016). The maximum value in the vertical profile of the basin-mean temperature in the PHC3.0 dataset is about 1.2 °C and 0.4 °C in the Eurasian and Canadian basins, respectively (Fig. 2a, b). However, the CMIP5 models have a large intermodel spread, with maximum values ranging from < −1 to > 5 °C (Fig. 2a, b).

Fig. 2

Profiles of basin mean potential temperature (a, b, unit: °C) and salinity (c, d, unit: psu) in the Eurasian and Canadian basins. Thin lines represent the 41 CMIP5 models, and the thick black and blue lines represent the PHC3.0 dataset and the CMIP5 MMM, respectively

The CMIP5 intermodel spread of simulated salinity is relatively small below the depth of 500 m, but large in the upper ocean (Fig. 2c, d). The basin-mean sea surface salinity ranges from < 30 to > 34 psu among the CMIP5 models. Compared with the CORE-II salinity simulations (Ilicak et al. 2016), the CMIP5 models have larger salinity biases. For the upper 200 m in the Eurasian Basin, the CMIP5 MMM salinity has a negative bias (−1.0 psu) relative to the PHC3.0 dataset. The CORE-II model results also show a negative bias in this depth range, but with a smaller magnitude (Ilicak et al. 2016). In the Canadian Basin, the CMIP5 MMM salinity has a positive bias in the upper 100 m and a negative bias between the depths of 100 and 500 m. Overall, the signs of the salinity biases (negative or positive) in different depth ranges and basins are similar to those of the standalone ocean-ice models shown by Ilicak et al. (2016), while the magnitudes of the biases are larger in the coupled climate models.

3.2 AWCD and AWCT

The observed AWCD in the Arctic Ocean is mainly in the depth range of 200–600 m (Ilicak et al. 2016). However, we find that the depth of the maximum temperature below the halocline in several CMIP5 models is much deeper than in the observations (Fig. 3a, b). Models with the maximum temperature located deeper than four times the observed depth in the Eurasian or Canadian basins are excluded from further analysis, because it is hard to properly classify the AW layer within a reasonable depth range. Following this criterion, nine CMIP5 models (ACCESS1.0, ACCESS1.3, BCC-CSM1.1, CanESM2, GISS-E2-H-CC, GISS-E2-H, MPI-ESM-LR, MPI-ESM-MR, and MRI-ESM1) are excluded (Fig. 3a, b). ACCESS1.0, ACCESS1.3, MPI-ESM-LR, MPI-ESM-MR, and MRI-ESM1 have a much deeper AWCD in the Canadian Basin, while BCC-CSM1.1, CanESM2, GISS-E2-H-CC, and GISS-E2-H have an AWCD that is too deep in both the Eurasian and Canadian basins (Fig. S1). Models with excessively large T/S biases, such as too low temperature (< −1 °C) or too high sea surface salinity (> 34 psu), are also excluded by this criterion.
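The exclusion criterion can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' code: the function name `excluded_models` and the model names and depths in `models` are hypothetical, and the observed reference depths are taken to be the PHC3.0 basin-mean AWCD values reported in this section (about 310 m and 460 m), which is an assumption about how "the observed depth" is evaluated.

```python
def excluded_models(model_awcd, obs_awcd, factor=4.0):
    """Return the sorted names of models whose AWCD exceeds
    `factor` times the observed AWCD in at least one basin.

    model_awcd: {model_name: {basin_name: depth_m}}
    obs_awcd:   {basin_name: depth_m}
    """
    flagged = []
    for name, basins in model_awcd.items():
        # A model is excluded if the criterion is met in either basin.
        if any(depth > factor * obs_awcd[basin] for basin, depth in basins.items()):
            flagged.append(name)
    return sorted(flagged)


# Observed basin-mean AWCD (m), as reported for PHC3.0 in the text.
obs = {"Eurasian": 310.0, "Canadian": 460.0}

# Hypothetical model values (illustrative only).
models = {
    "model_A": {"Eurasian": 700.0, "Canadian": 900.0},
    "model_B": {"Eurasian": 600.0, "Canadian": 2500.0},
    "model_C": {"Eurasian": 1500.0, "Canadian": 2100.0},
}
print(excluded_models(models, obs))
```

With these reference depths, the thresholds are 1240 m in the Eurasian Basin and 1840 m in the Canadian Basin.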

Fig. 3

a, b Atlantic Water core depth (AWCD; unit: m) and c, d Atlantic Water core temperature (AWCT; unit: °C) derived from the CMIP5 models and PHC3.0 dataset for the Eurasian and Canadian basins. Bars are for the individual CMIP5 models. White bars represent models unable to represent the Atlantic Water (AW) layer in one or two Arctic basins; see text for details. Blue and black lines represent the CMIP5 MMM and the PHC3.0 dataset, respectively. Shaded area represents MMM ± one standard deviation. The models unable to represent the AW layer in the Arctic basins (shown by white bars) are excluded in the MMM and other analysis, the same in other figures in the paper. The x axis shows the model number and the corresponding model name can be found in Table 1

Even after excluding these models, the MMM AWCD remains deeper than in the PHC3.0 dataset (Fig. 3a, b). The observed basin-mean AWCD in the PHC3.0 dataset is about 310 m and 460 m in the Eurasian and Canadian basins, respectively, whereas the CMIP5 MMM AWCD is 680 m and 830 m, respectively. Only three models (HadGEM2-AO, HadGEM2-CC, and HadGEM2-ES) in the Eurasian Basin and five models (GISS-E2-R-CC, GISS-E2-R, HadGEM2-AO, HadGEM2-CC, and HadGEM2-ES) in the Canadian Basin produced slightly shallower AWCD than the PHC3.0 dataset. Therefore, the problem of the AW layer being too deep, reported for AOMIP and CORE-II models (Holloway et al. 2007; Ilicak et al. 2016), also exists in the CMIP5 coupled models.

The AWCT from the CMIP5 models and PHC3.0 dataset in the two basins is shown in Fig. 3c, d, respectively. The MMM AWCT shows a warm bias compared to the observations. The basin-mean AWCT from the PHC3.0 dataset is about 1.1 °C and 0.5 °C in the Eurasian and Canadian basins, respectively. The CMIP5 MMM warm bias is about 0.2 °C and 0.5 °C in these two basins, respectively.

The CMIP5 models have very large intermodel spread in the simulated AWCD and AWCT (Fig. 3). The AWCD has a standard deviation of about 240 m and 300 m in the Eurasian and Canadian basins, respectively. The minimum and maximum AWCDs in the Eurasian Basin are 250 m (HadGEM2-CC) and 1220 m (MRI-CGCM3), respectively (after excluding the nine models mentioned above). In the Canadian Basin, the basin-mean AWCD is in the range 320–1370 m. For the AWCT, the standard deviation is about 0.9 °C and 1.0 °C in the Eurasian and Canadian basins, respectively. The IPSL-CM5B-LR model has the warmest AWCTs with basin-mean values of 4.8 °C and 4.2 °C in the Eurasian and Canadian basins, respectively. The GFDL-ESM2G and HadGEM2-ES models have the coldest AWCTs with values of about 0.1 °C and −0.2 °C in the Eurasian and Canadian basins, respectively.

The spatial patterns of AWCT and AWCD are related to the AW circulation pathways in the Arctic basins. In the Arctic Ocean, the AW follows a path that is mainly cyclonic along the continental slope and mid-ocean ridges, i.e., a topographically steered boundary current (Rudels et al. 1994). The AW from the Fram Strait flows eastward along the Eurasian slope, converges with the AW from the Barents Sea Opening (BSO) at the St. Anna Trough (Schauer et al. 2002a; Karcher and Oberhuber 2002), and then flows eastward along the continental slope. After passing the Laptev Sea slope, the AW circulation divides into two branches: one along the Lomonosov Ridge toward the Fram Strait and the other along the continental slope toward the Canadian Basin (Woodgate et al. 2001). Along the pathways, the AWCT decreases gradually. It is around 3 °C near the Fram Strait, and decreases gradually to about 0.8 °C near the Lomonosov Ridge and to about 0.4 °C in the Canada Basin (Fig. 4a). The AWCD deepens from the depth of 200 m in the Fram Strait to about 350 m near the Lomonosov Ridge, and to about 500 m in the Canada Basin (Fig. 4c). These changes are due to mixing from both above and below and with shelf waters (Talley et al. 2011).

Fig. 4

a, b AWCT (unit: °C) and c, d AWCD (unit: m) from PHC3.0 and CMIP5 MMM

Figure 4b shows that the CMIP5 MMM AWCT is high near the Fram Strait and low in the Canadian Basin, similar to the observations. However, the CMIP5 MMM AWCT is higher than observed in most regions of the Arctic Ocean, and the gradient from warm to cold regions differs from the observed one. For the simulated AWCD, the CMIP5 MMM is shallow in the Fram Strait and deep in the Canada Basin. However, it is much deeper than in the PHC3.0 dataset for almost the entire Arctic Ocean (Fig. 4d). The spatial patterns of both the AWCT and the AWCD have large biases in the Canadian Basin. In the PHC3.0 dataset, low AWCTs and deep AWCDs are found mainly in the southeastern Canadian Basin. In the CMIP5 MMM results, a tongue of relatively warm water extends directly from the Laptev Sea slope through the North Pole toward the area north of Greenland, implying that the simulated AW pathways do not follow the bottom topography well (Fig. 4b). The MMM AWCD is deepest in the region close to the East Siberian continental slope (Fig. 4d), unlike in the observations.

Inspecting individual models reveals that most of them cannot adequately reproduce the observed topography-following feature of the AW circulation (Fig. S2). This might be attributed to the unrealistic Arctic Circumpolar Boundary Current in the coarse-resolution models. The AW pathways are along topographically steered boundary currents, which follow continental slopes or the mid-ocean ridges. However, the nominal resolution of about 1° in most CMIP5 models is too coarse, and the spurious dissipation leads to a weak and wide Arctic Circumpolar Boundary Current (Fig. S2). It has been suggested that this problem could be alleviated by including the Neptune parameterization of eddy–topography interaction (Holloway 1986, 1987; Polyakov 2001; Golubeva and Platov 2007; Li et al. 2013) or by increasing model resolution (Li et al. 2013; Wang et al. 2018). However, the MIROC4h model (Sakamoto et al. 2012), which has the highest resolution among the CMIP5 models (0.28125° × 0.1875°), is also subject to this common problem (Fig. S2). This may indicate that higher (eddy-resolving) horizontal resolution is needed in the AW simulations, although other, as yet unidentified, causes cannot be excluded. The isopycnal-coordinate ocean model in NorESM1-M and NorESM1-ME produces better results, very likely because advection operators in isopycnal coordinates do not allow for spurious diapycnal mixing (Griffies 2004).

3.3 Temperature vertical section

To assess the AW layer thickness, a vertical section of potential temperature between 70°E and 110°W along section S1 (indicated in Fig. 1) from the PHC3.0 dataset and the CMIP5 MMM is shown in Fig. 5a, b, respectively. This section captures the spatial gradient of temperature in the Arctic Ocean. If the AW layer is defined as the layer with potential temperature > 0 °C (Holloway et al. 2007), the AW layer thickness in the PHC3.0 dataset is about 700 m (Fig. 5a). The observed upper boundary of the AW layer along section S1 is at the depth of about 200 m (slightly shallower in the Eurasian Basin and deeper in the Canadian Basin), and the lower boundary is at the depth of about 800 m.
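The thickness of the layer with potential temperature > 0 °C can be estimated from a single profile as sketched here. The function name `aw_layer_bounds`, the linear interpolation between levels, and the example profile are assumptions made for illustration; the text does not state how the 0 °C isotherm depths were obtained.

```python
def aw_layer_bounds(depths_m, temps_c, threshold_c=0.0):
    """Return (upper, lower) depths in metres of the first contiguous
    layer warmer than `threshold_c`, or None if no such layer exists.

    Boundaries are located by linear interpolation between levels.
    """
    # If the shallowest level is already warm, the layer starts there.
    upper = depths_m[0] if temps_c[0] > threshold_c else None
    lower = None
    for i in range(1, len(depths_m)):
        z0, z1 = depths_m[i - 1], depths_m[i]
        t0, t1 = temps_c[i - 1], temps_c[i]
        if upper is None and t0 <= threshold_c < t1:
            # Temperature rises through the threshold: upper boundary.
            upper = z0 + (threshold_c - t0) * (z1 - z0) / (t1 - t0)
        elif upper is not None and t0 > threshold_c >= t1:
            # Temperature falls back through the threshold: lower boundary.
            lower = z0 + (threshold_c - t0) * (z1 - z0) / (t1 - t0)
            break
    if upper is None:
        return None
    if lower is None:
        lower = depths_m[-1]  # warm layer extends to the deepest level
    return upper, lower


# Hypothetical profile (illustrative values only).
depths = [0, 100, 200, 300, 500, 700, 900]
temps = [-1.8, -1.0, -0.5, 0.5, 1.0, 0.5, -0.5]
upper, lower = aw_layer_bounds(depths, temps)
print(upper, lower, lower - upper)
```

A profile whose warm layer never cools below the threshold returns the deepest level as the lower boundary, which corresponds to the case of an AW layer reaching the ocean bottom.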

Fig. 5

Vertical section of water temperature (unit: °C) between 70°E and 110°W along section S1 (Fig. 1) from a the PHC3.0 dataset and b the CMIP5 MMM. Black line is the 0 °C isotherm

The CMIP5 MMM tends to produce an AW layer that is too thick in comparison with the PHC3.0 dataset (Fig. 5b). The depth of the upper boundary of the AW layer is reasonable, while the lower boundary of the AW layer is too deep and almost reaches the bottom of the Arctic Ocean. This is most likely the consequence of too much spurious diapycnal mixing associated with coarse-resolution models. Spurious mixing is expected to decrease with increasing resolution. By increasing horizontal resolution to 4.5 km in the Arctic Ocean while keeping vertical resolution the same, Wang et al. (2018) obtained a realistic AW layer thickness. The vertical section of potential temperature along section S1 for each CMIP5 model is shown in Fig. S3. Most CMIP5 models simulated an AW layer that is too thick. Only a few models (GISS-E2-R-CC, GISS-E2-R, HadGEM2-AO, HadGEM2-CC, NorESM1-M, and NorESM1-ME) produced a relatively reasonable AW layer thickness. These models also better simulated the AWCD (Fig. 3). The isopycnal-coordinate ocean model in NorESM1-M and NorESM1-ME produced better results in this respect too by eliminating spurious diapycnal mixing.

The problem of an AW layer that is too deep and too thick in the Arctic basins could also be partly attributed to the upstream condition, namely the AW properties in the Fram Strait. Figure S4 shows that the AW in the Fram Strait in most CMIP5 models is already too deep and too thick in comparison with the PHC3.0 dataset. These characteristics of the AW in the Fram Strait can propagate into the Arctic basins, explaining part of the biases inside the Arctic Ocean. This suggests that spurious mixing associated with coarse resolution both outside and inside the Arctic Ocean contributes to the biases found in the Arctic Ocean.

3.4 Relationship of AW temperature bias to upstream conditions

Analysis of forced ocean-ice models showed that the performance of the simulated AW inflow through the Arctic gates could explain some of the biases in the Arctic AW temperature (Ilicak et al. 2016). In the following we will also investigate possible causes of the biases in the simulated AWCT in the CMIP5 models.

AW enters the Arctic Ocean via warmer (Fram Strait) and cooler (Barents Sea) branches. Figure 6 shows the relationship of the AWCT bias in the Arctic basins to the AWCT bias in the Fram Strait in the CMIP5 models. Three models (HadGEM2-AO, HadGEM2-CC, and HadGEM2-ES), which cannot resolve the topography in the Fram Strait due to coarse resolution, are excluded in the plots (their model topography is about 1000 m instead of about 4000 m, see Fig. S4). Significant correlation is found between the AWCT bias in the Arctic basins and the AWCT bias in the Fram Strait in the CMIP5 models (Fig. 6), with the Spearman’s correlation coefficient for the Eurasian Basin (0.74) being slightly larger than for the Canadian Basin (0.61). Models with large AWCT bias are those with a large temperature bias in the Fram Strait.
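The significance test behind such correlations can be sketched with the standard library alone. The sketch computes Spearman's rho as the Pearson correlation of average ranks and, as an assumption, uses the normal approximation rho·sqrt(n − 1) ~ N(0, 1) for the two-sided p value; the text states only that a large-sample approximation is used, not which one. The function names and the bias values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Return (rho, p) with a two-sided normal-approximation p value.

    Inputs must not be constant, otherwise the rank variance is zero.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    rho = cov / (sx * sy)
    z = rho * math.sqrt(n - 1)               # large-sample normal statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p value
    return rho, p


# Hypothetical biases (deg C) for ten models (illustrative values only).
fram_bias = [-0.8, -0.3, 0.1, 0.4, 0.6, 0.9, 1.2, 1.5, 2.1, 3.0]
basin_bias = [-0.5, -0.4, 0.0, 0.2, 0.5, 0.3, 0.9, 1.1, 1.6, 2.4]
rho, p = spearman(fram_bias, basin_bias)
print(round(rho, 3), p < 0.05)
```

A rank-based coefficient is less sensitive than Pearson's r to a few models with very large biases, which is relevant given the intermodel spread reported above.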

Fig. 6

Relationship of Atlantic Water core temperature (AWCT) biases in the a Eurasian and b Canadian basins to AWCT biases in the Fram Strait. The models numbered 26–28 are not included. The latter models have too shallow ocean depth in the Fram Strait (see Fig. S4)

Another branch of AW flows into the Barents Sea through the BSO, where the water is exposed to the air above and vertically mixed and cooled very efficiently in winter time (Smedsrud et al. 2013). This cooled branch flows into the Arctic basins via the St. Anna Trough in the northern Kara Sea, where part of the AW converges with the warmer branch from the Fram Strait (Schauer et al. 2002a; Karcher and Oberhuber 2002). Figure 7 shows that there is significant correlation between the AWCT biases in the Arctic basins and the temperature biases in the whole Kara Sea in the CMIP5 climate models. The Spearman’s correlation coefficient for the Eurasian Basin (0.71) is also slightly larger than for the Canadian Basin (0.67). The high correlation reveals that the Arctic Ocean AWCT biases in the CMIP5 models are also related to temperature biases in the Barents Sea AW branch.

Fig. 7

Relationship of AWCT biases in a the Eurasian Basin and b the Canadian Basin to the Kara Sea temperature biases in the CMIP5 models

Cooling of the AW passing through the Barents Sea plays a crucial role in the ventilation of the Arctic Ocean (Schauer et al. 2002b). The winter cooling of the AW in the Barents Sea generates a relatively deep upper mixed layer and cold dense water, which passes the Kara Sea before penetrating into the Arctic deep basin. Significant correlations are found in the CMIP5 models between the Kara Sea temperature biases and the Barents Sea March mixed-layer depth (MLD) biases (Fig. 8a), and between the Barents Sea March MLD biases and the Barents Sea March sea ice extent (SIE) biases (Fig. 8b). Models with shallow wintertime MLD in the Barents Sea usually have warm biases in the Kara Sea. Too shallow wintertime MLD means that the AW in the Barents Sea is not cooled efficiently. Figure 8b further suggests that the models with too shallow wintertime MLD usually have greater SIE in winter. Increased sea ice coverage limits the air–sea heat exchange in these models and causes inefficient cooling of the AW in the Barents Sea. Thus, excessive heat is transported into the Kara Sea and then into the Arctic basins. Onarheim et al. (2015), Wang et al. (2016) and Li et al. (2017) indicate that the Barents Sea SIE is correlated with heat inflow through the BSO. Models with greater heat inflow via the BSO therefore usually have less sea ice coverage, release more heat into the air over the Barents Sea, and then have cold biases in the Arctic Ocean. Barents Sea cooling is thus also a crucial factor affecting the AWCT in climate models, similar to the finding based on forced ocean-ice models in Ilicak et al. (2016).

Fig. 8

Relationships among Kara Sea temperature biases, Barents Sea March mixed-layer depth (MLD) biases, and Barents Sea March sea ice extent (SIE) biases in the CMIP5 models

3.5 Variability and trend

Long-term observations during the twentieth century revealed that the AW layer of the Arctic Ocean exhibits low-frequency oscillations on a timescale of 50–80 years (Polyakov et al. 2004). Over the past several decades, the AW layer in the Arctic Ocean has experienced a warming trend (Polyakov et al. 2004, 2012). Based on analysis of an extensive array of observations in the Arctic deep basin obtained during 1950–2011, Polyakov et al. (2012) concluded that significant warming began in the late 1970s (Fig. 9).

Fig. 9

a Time series of Arctic mean AWCT anomalies over 1950–2005 and b the AWCT trends during 1980–2005 in the CMIP5 simulations and observations. Observations are from Polyakov et al. (2012). In a thin lines represent individual CMIP5 simulations and the thick black line represents the observations. Anomalies are referenced to the 1950 value. Bars in b represent the number of CMIP5 models that have the AWCT trend indicated by the x axis. Thick black line in b represents the observed AWCT trend

The CMIP5 models did not reproduce the observed warming trend of the Arctic AW layer (Fig. 9). The observed linear trend of AWCT is 0.66 °C decade⁻¹ during 1980–2005, whereas the CMIP5-simulated AWCT is very stable without significant trends. Figure 9b shows that 8 CMIP5 models produced slight cooling trends and 24 CMIP5 models produced very weak warming trends during 1980–2005. The HadGEM2-AO model produced the largest warming trend (0.074 °C decade⁻¹). This might be related to HadGEM2-AO having a more realistic AW layer thickness (Fig. S3), but its trend is still much smaller than the observed trend. Figure 9a also shows that interannual variability in the CMIP5 models is very small and much weaker than observed.
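A linear trend expressed in °C per decade can be obtained from an annual AWCT series as sketched here. Ordinary least squares is assumed, since the text says only "linear trend"; the function name `trend_per_decade` and the synthetic series are hypothetical and do not reproduce any observed or simulated values.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of `values` against `years`,
    converted from units per year to units per decade."""
    n = len(years)
    mean_y = sum(years) / n
    mean_v = sum(values) / n
    sxy = sum((y - mean_y) * (v - mean_v) for y, v in zip(years, values))
    sxx = sum((y - mean_y) ** 2 for y in years)
    return 10.0 * sxy / sxx


# Synthetic annual series for 1980-2005 warming by 0.03 deg C per year
# (illustrative only).
years = list(range(1980, 2006))
awct = [0.5 + 0.03 * (y - 1980) for y in years]
print(round(trend_per_decade(years, awct), 3))
```

The same function applies to each model's series and to the observations, so that the histogram of model trends and the observed trend are computed consistently.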

4 Discussions and conclusions

In this study we assessed the AW layer of the Arctic Ocean in 41 CMIP5 coupled climate models. Nine of the CMIP5 models did not reproduce a well-defined AW layer. Although the MMM results derived from the remaining 32 CMIP5 models can reproduce the main spatial patterns of the AWCT and the AWCD, the simulated AW layer is too deep and thick. This is also a common problem in state-of-the-art standalone ocean-ice models (Li et al. 2014; Ilicak et al. 2016). Spurious numerical dissipation was considered to be one of the key reasons for the deepening and thickening of the AW layer (Holloway et al. 2007).

The CMIP5 models show large intermodel spreads in the simulated Arctic hydrography, AWCT and AWCD. Our analysis indicates that the AWCT biases in the Arctic basins are related to the AW temperature bias in the Fram Strait and the ocean temperature in the Barents and Kara seas. We suggest that the performance in the simulation of sea ice coverage and surface cooling in the Barents Sea is one of the key factors that can influence the fidelity of the AW layer in the models.

The interannual variability of the AWCT in the CMIP5 models is much weaker than the observed. No CMIP5 model can reproduce the observed significant warming trend in the AW layer of the Arctic Ocean in the past decades. The recently observed “Atlantification” of the eastern Eurasian Basin implies increasing impacts of the AW layer on the sea ice state (Polyakov et al. 2017). Failing to simulate the warming trend of the AW layer (or delay in capturing the warming trend) can prevent climate models from being applied in studies on the impacts of “Atlantification” in the warming climate.

Many factors in models might affect the simulation of the AW layer in the Arctic Basin, e.g., model resolution, vertical mixing, eddy and eddy–topography interaction parameterizations, advection schemes, and representation of import of thermal and freshwater anomalies through Arctic gates (Morales Maqueda and Holloway 2006; Zhang and Steele 2007; Holloway et al. 2007; Li et al. 2011, 2013; Wang et al. 2018). Previous studies have shown that the common problems in simulating the AW layer in standalone ocean-ice models could be alleviated by incorporating high-order advection schemes (Morales Maqueda and Holloway 2006; Holloway et al. 2007), including eddy–topography interaction parameterization (e.g., Golubeva and Platov 2007; Holloway and Wang 2009; Li et al. 2013), tuning background vertical mixing coefficients (Zhang and Steele 2007), or increasing model horizontal resolution (e.g., Li et al. 2013; Wang et al. 2018). Our results show that the temperature and salinity biases in coupled climate models are larger than in standalone ice–ocean models. Therefore, to improve the representation of the AW layer in climate models, such measures should also be tested and studied by different model development groups.

In the coming CMIP6 phase, many modeling groups will use improved model versions and higher resolutions. We expect to see improvement in the model representation of the Arctic Ocean in the new simulations, which will need to be assessed when results become available.