A path towards uncertainty assignment in an operational cloud-phase algorithm from ARM vertically pointing active sensors

Knowledge of cloud phase (liquid, ice, mixed, etc.) is necessary to describe the radiative impact of clouds and their lifetimes, but is a property that is difficult to simulate correctly in climate models. One step towards improving those simulations is to make observations of cloud phase with sufficient accuracy to help constrain model representations of cloud processes. In this study, we outline a methodology using a basic Bayesian classifier to estimate the probabilities of cloud-phase class from Atmospheric Radiation Measurement (ARM) vertically pointing active remote sensors. The advantage of this method over previous ones is that it provides uncertainty information on the phase classification. We also test the value of including higher moments of the cloud radar Doppler spectrum than are traditionally used operationally. Using training data of known phase from the Mixed-Phase Arctic Cloud Experiment (M-PACE) field campaign, we demonstrate a proof of concept for how the method can be used to train an algorithm that identifies ice, liquid, mixed phase, and snow. Over 95 % of data are identified correctly for pure ice and liquid cases used in this study. Mixed-phase and snow cases are more problematic to identify correctly. When lidar data are not available, including additional information from the Doppler spectrum provides substantial improvement to the algorithm. This is a first step towards an operational algorithm and can be expanded to include additional categories such as drizzle with additional training data.


Introduction
Cloud feedbacks are one of the largest uncertainties in global climate model simulations of future climates, limited in part by a lack of observations with sufficient and known accuracy to constrain cloud microphysical parameterizations (Stephens, 2005; IPCC, 2013). Cloud hydrometeor phase is a radiatively important property of clouds (Sun and Shine, 1994; Shupe and Intrieri, 2004; Turner, 2005) that is difficult to accurately model and observe (e.g., Komurcu et al., 2014; Shupe et al., 2008; Cesana and Chepfer, 2013), and is also important for understanding cloud life cycle (Shupe et al., 2008). Quantitative microphysical retrievals make assumptions regarding cloud phase (ice, liquid, or mixed) before choosing appropriate forward models to use in algorithms (e.g., Zhao et al., 2012). Retrievals of cloud phase are a necessary first step towards improved retrievals of water contents and particle sizes.
The focus of this study is the development of an algorithm that identifies cloud phase from vertically pointing radars and lidars at the ARM (Atmospheric Radiation Measurement) Climate Research Facility (www.arm.gov) that also estimates the uncertainty of that identification. A number of methods have been developed previously to identify cloud phase using similar instrumentation. Shupe (2007) presented an algorithm that uses thresholds of lidar backscatter and depolarization ratio, and three moments of the radar Doppler spectrum (reflectivity, mean Doppler velocity, and spectrum width) along with temperature to classify six different hydrometeor types. A target classification algorithm developed for the CloudNet network of observation sites (Illingworth et al., 2007) uses lidar and radar scattering parameters to flag times when instruments indicate detection of small liquid drops, falling hydrometeors, and melting ice along with temperature information to give likely hydrometeor classifications (Hogan and O'Connor, 2006). Both of these decision-tree methods are based on well-established scientific understanding of instrument sensitivities to hydrometeors, but do not quantify the uncertainty of the phase assignment.
Published by Copernicus Publications. L. D. Riihimaki et al.: Cloud phase from active sensors
Lidar backscatter, especially paired with depolarization ratio, is a sensitive indicator of the presence of super-cooled liquid water (e.g., Sassen, 1991; Shupe, 2007), which exists at temperatures colder than 0 °C. A number of algorithms have been developed for ground- or space-based active sensors that use lidar backscatter thresholds or the attenuation of the lidar (Zhang et al., 2010; Choi et al., 2010; Hogan et al., 2003, 2004; Cesana and Chepfer, 2013) or lidar backscatter and depolarization ratio together (Cesana and Chepfer, 2013; Hogan et al., 2003) to identify liquid clouds.
Lidar data alone have two limitations in identifying cloud phase. First, because lidar data are more sensitive to high concentrations of small liquid droplets than to low concentrations of large ice crystals, they may fail to detect mixed-phase conditions. A recent study by Bühl et al. (2013) showed that using lidar measurements alone to detect mixed-phase clouds underestimated the fraction of mixed-phase clouds compared to combined lidar and radar methods when the concentration of ice crystals was very low compared to the number of liquid droplets. The radar wavelength, on the other hand, is much longer, so the strength of the signal is proportional to the particle diameter to the sixth power and thus is much more sensitive to a few large ice particles.
Another considerable limitation of lidar measurements for phase detection is that the lidar signal attenuates quickly in clouds with an optical depth greater than three, so lidars can only be used in optically thin clouds. In order to circumvent this limitation, Luke et al. (2010) trained a neural network on wavelet transforms of the full radar Doppler spectrum to emulate lidar backscatter and depolarization ratio measurements. Though noisier, the radar Doppler spectra generally did have sufficient information content to reproduce the phase information in optically thick clouds that would otherwise not have been retrievable from actual lidar observations. Yu et al. (2014) built on this work and used wavelet transforms to deconvolve liquid peaks in the Doppler spectra from other signals. These studies show that a good deal of information is available within the Doppler spectra to identify liquid within a cloud in addition to the high sensitivity to ice.
The goal of this study is to test the value of two potential improvements to previous decision-tree approaches to operational phase identification algorithms. First, the means and covariances of observational variables are used in a simple Bayesian classifier to estimate the probability that a given phase category describes a particular cloud volume. This gives an estimate of confidence in the phase identification, a first step towards quantifying the uncertainty of microphysical retrievals. Second, additional variables describing the radar Doppler spectra are used to test how much this information improves identification of liquid and mixed-phase cases, particularly in the absence of lidar measurements.
This study describes an algorithm proof of concept, using data from the Mixed-Phase Arctic Cloud Experiment (M-PACE) field campaign when aircraft in situ measurements are available along with vertically pointing lidar and radar measurements to help train and evaluate the algorithm. Section 2 documents the observational data used in the study. The ground-truth data set used to train and validate the algorithm is described in Sect. 3, followed by a description of the algorithm methodology in Sect. 4. Section 5 discusses algorithm validation. Finally, conclusions and a description of additional work needed to create an operational algorithm using these techniques are included in Sect. 6.

Remote sensing data
The time period of the M-PACE field campaign was chosen because of the simultaneous availability of data from the high spectral resolution lidar (HSRL), millimeter cloud radar (MMCR) Doppler spectra, and aircraft in situ measurements. The M-PACE field campaign took place in the fall of 2004 at the ARM Barrow, Alaska, site (Verlinde et al., 2007). Another advantage of using the M-PACE field campaign is that a number of studies have already been done interpreting M-PACE aircraft data (e.g., Klein et al., 2009; McFarquhar et al., 2007; Verlinde et al., 2013; Morrison et al., 2009), facilitating a quicker identification of a truth data set to train and test the algorithm.

HSRL
The University of Wisconsin HSRL was deployed at the Barrow, Alaska, ARM site during the M-PACE campaign. The lidar operates at a wavelength of 532 nm and independently measures molecular and particulate scattering based on the width of the frequency spectrum of the returned signal (Eloranta, 2005a). Additionally, the HSRL measures the depolarization of the returned signal, which helps distinguish spherical from nonspherical hydrometeor shapes. In this study, HSRL data are used from the product provided to the ARM archive by the University of Wisconsin lidar group directed by Ed Eloranta (Eloranta, 2005b), which contains averaged lidar profiles with a height resolution of 30 m and temporal resolution of 60 s. This study uses measurements of particulate backscatter cross section per unit volume, particulate extinction cross section per unit volume, and circular depolarization ratio to help identify signatures of cloud phase.

MMCR
The MMCR is a 35 GHz vertically pointing cloud radar that operated at the ARM Barrow site during the M-PACE campaign. The MMCR measures a spectrum of Doppler veloc- First, we used data from the active remote sensing of clouds (ARSCL) value added product (Clothiaux et al., 2001; Johnson and Jensen, 2009), traditionally the most accessible radar measurements from ARM. ARSCL assesses the quality of the radar measurements, identifies cloud boundaries, and calculates three moments of the radar Doppler spectra: radar reflectivity, mean Doppler velocity, and spectrum width.
If the spectra were Gaussian, three moments would be sufficient to describe their information. However, at the spatiotemporal resolution sampled by the MMCR, spectra are much more complicated when they represent scattering from a heterogeneous mixture of hydrometeors (e.g., liquid and ice particles) in a cloud volume, as illustrated in the schematic in Fig. 1 and the measured spectra in Fig. 2.
To include some of this additional information, a second, more recent ARM radar data processing methodology, MicroARSCL (Jensen et al., 2016), was also used in this study. MicroARSCL takes advantage of the significant increase in temporal resolution and the continuous recording of Doppler spectra made possible by upgrades to the MMCR hardware, starting in 2004 (Kollias et al., 2007). MicroARSCL extracts approximately 30, mainly objective, variables directly from the radar Doppler spectrum of each time and range gate, treating the primary peak and a possible weaker secondary to their dynamic range (i.e., height), velocities of their tails, and a left-slope and right-slope, indicating the steepness of a straight line extending from either of the tails to the spectrum peak. Within the primary peak, MicroARSCL identifies up to three local maxima (if present), reporting their dynamic ranges and modal velocities.
The decomposition of radar Doppler spectra into characteristics of their subpeaks potentially offers substantial information about underlying microphysics. However, the MMCR also suffered from the drawback of introducing an artificial subpeak into the Doppler spectrum, often referred to as a spectral image, during stronger power returns. An example of this artifact is shown in weak peaks around −2 m s⁻¹ in Fig. 2a and b. On the positive side, these unavoidable artifacts are weak and well characterized, thus predictable. On the negative side, while having a negligible impact on the lowest radar moments, their influence on the shape of the radar Doppler spectrum can be significant exactly where the often weak signal from small particles (e.g., cloud or drizzle) resides, introducing ambiguity into the detection and characterization of these small particles. Thus, a means of mitigating their effect is required for a study such as ours. In the case of the MMCR, it is known that the spectral images are manifested as a mirror, with opposite velocity sign, of any power in spectral bins exceeding about 30 dB. The strategy used to mitigate their effect is then to use an artificially raised noise floor when computing higher-order moments and other variables that are sensitive to their presence. This noise floor is held to be within 30 dB below the peak power in the spectrum. An example of data that have been corrected with this raised noise floor is shown in Fig. 2c. Effectively, this is a tradeoff in which low SNR (signal-to-noise ratio) features are discarded in favor of reliability during strong power return conditions. Sensitivity tests showed that this mitigation only had a substantial impact on one variable (primary peak maximum velocity) used in this study.
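The raised noise floor can be illustrated with a short sketch. It is not the MicroARSCL implementation: the function name `apply_raised_noise_floor`, the representation of a spectrum as a list of powers in dB, and the sample values are assumptions made for illustration. Only the 30 dB margin below the peak power is taken from the text.

```python
def apply_raised_noise_floor(spectrum_db, noise_floor_db, margin_db=30.0):
    """Mask spectral bins that fall below a raised noise floor.

    spectrum_db    : power in each Doppler velocity bin, in dB (assumed layout)
    noise_floor_db : ordinary noise floor estimate, in dB
    margin_db      : the floor is kept within this many dB below the peak
    Returns (raised_floor_db, masked_spectrum); masked bins are None.
    """
    peak_db = max(spectrum_db)
    # The floor is only ever raised, never lowered below the ordinary estimate.
    raised_floor_db = max(noise_floor_db, peak_db - margin_db)
    masked = [p if p >= raised_floor_db else None for p in spectrum_db]
    return raised_floor_db, masked


# Hypothetical spectrum: a strong peak at 20 dB and a weak feature at -15 dB.
spectrum = [-50.0, -10.0, 20.0, -15.0, -45.0]
floor, kept = apply_raised_noise_floor(spectrum, noise_floor_db=-48.0)

# A weak spectrum keeps its ordinary noise floor and loses nothing.
weak_floor, weak_kept = apply_raised_noise_floor([-40.0, -30.0, -45.0], -48.0)
```

In this sketch the weak feature at −15 dB is discarded when the peak is strong, which mirrors the tradeoff described above: low-SNR features are given up in exchange for robustness against spectral images.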

Merged data set
The HSRL, ARSCL, and MicroARSCL data sets have different temporal and spatial resolutions and must be merged to a common grid in order to create multi-instrument retrievals. For this study, all data were mapped into a common 10 s time (native ARSCL temporal resolution) and 45 m height (native MicroARSCL) resolution. This time and height grid was chosen in order to have the least impact on the values of the radar and lidar data. In particular, we wanted to change the radar (ARSCL and MicroARSCL) data as little as possible.
The nearest neighbor in time was used to merge the data sets onto the same time grid, so that each profile would remain intact as an individual measurement. That is, for a given time, the height profile with the closest time stamp is chosen. This choice subsamples the MicroARSCL data, since MicroARSCL processing retains the raw 3 s resolution data, but does not change any measurement values. Because the HSRL data used are a 60 s average, the same HSRL profile is assigned to six time stamps. Even though the HSRL data have a higher temporal resolution in raw form, raw lidar data can be quite noisy, so averaging is required to improve data quality.
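A minimal sketch of the nearest-neighbor time matching follows. The helper `nearest_profile_index`, the tie-breaking rule, and all time stamps are hypothetical; the sketch only illustrates how whole profiles are selected rather than averaged.

```python
from bisect import bisect_left


def nearest_profile_index(profile_times, target_time):
    """Index of the profile whose time stamp is closest to target_time.

    profile_times must be sorted in ascending order (seconds).
    """
    pos = bisect_left(profile_times, target_time)
    if pos == 0:
        return 0
    if pos == len(profile_times):
        return len(profile_times) - 1
    before, after = profile_times[pos - 1], profile_times[pos]
    # Ties go to the earlier profile (an arbitrary choice for this sketch).
    return pos - 1 if target_time - before <= after - target_time else pos


# Common 10 s grid and hypothetical instrument time stamps (seconds).
grid_times = [0, 10, 20, 30, 40, 50]
micro_times = list(range(0, 63, 3))  # 3 s stamps: 0, 3, ..., 60
hsrl_times = [30, 90]                # assumed centres of 60 s averages

micro_idx = [nearest_profile_index(micro_times, t) for t in grid_times]
hsrl_idx = [nearest_profile_index(hsrl_times, t) for t in grid_times]
```

With these illustrative stamps, the 3 s series is subsampled without altering any values, and one 60 s lidar profile is reused for six consecutive grid times.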
Linear interpolation was used to map the values of each profile to the common height grid. Since the native MicroARSCL height was used as the merged data grid, no interpolation is done to the MicroARSCL variables, the data stream we were most interested in preserving without averaging. The ARSCL data and HSRL data have height resolutions of 44 and 30 m respectively, leading to only minor changes when interpolating to a regular 45 m grid.
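The height mapping can be sketched as plain linear interpolation of one profile onto the common grid. The function name `interp_to_grid`, the handling of grid points outside the source range, and the sample heights and values are assumptions for illustration.

```python
def interp_to_grid(src_heights, src_values, grid_heights):
    """Linearly interpolate one profile onto grid_heights.

    Both height lists must be sorted ascending and src_heights needs at
    least two levels. Grid points outside the source range become None.
    """
    out = []
    j = 0
    for h in grid_heights:
        if h < src_heights[0] or h > src_heights[-1]:
            out.append(None)
            continue
        # Advance to the source interval [h0, h1] that contains h.
        while src_heights[j + 1] < h:
            j += 1
        h0, h1 = src_heights[j], src_heights[j + 1]
        w = (h - h0) / (h1 - h0)
        out.append((1.0 - w) * src_values[j] + w * src_values[j + 1])
    return out


# Hypothetical 30 m profile mapped onto a 45 m grid.
src_heights = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0]
src_values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
grid_heights = [0.0, 45.0, 90.0, 135.0, 180.0]
on_grid = interp_to_grid(src_heights, src_values, grid_heights)
```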
Once the data sets are merged onto the common grid, an algorithm to identify cloud layers is applied to the backscatter cross section measurements to create a lidar cloud mask (Wang and Sassen, 2001), including eliminating observations when the lidar is attenuated.The lidar mask is combined with the MicroARSCL radar cloud mask to create a merged mask that identifies each cloud point as being detected by lidar, radar, or both.
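As a sketch of the merged mask, the two boolean masks can be combined into a single integer code per pixel. The numeric codes and the name `merge_masks` are hypothetical and are not taken from the ARM data products.

```python
# Hypothetical integer codes for the merged mask.
NO_CLOUD, LIDAR_ONLY, RADAR_ONLY, BOTH = 0, 1, 2, 3


def merge_masks(lidar_mask, radar_mask):
    """Combine two boolean time-height masks into one coded mask."""
    return [
        [(LIDAR_ONLY if lid else 0) + (RADAR_ONLY if rad else 0)
         for lid, rad in zip(lidar_row, radar_row)]
        for lidar_row, radar_row in zip(lidar_mask, radar_mask)
    ]


# Two profiles with two height bins each (illustrative values).
lidar = [[True, False], [True, False]]
radar = [[True, True], [False, False]]
merged = merge_masks(lidar, radar)
```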

Identifying cases of known phase
In order to develop a phase detection algorithm with uncertainty estimates, some data of known phase must be available for training the algorithm. Four classifications were chosen in this study because sufficient data for these cases could be identified based on expert knowledge and in situ aircraft measurements. The time and height periods defining test data for each of these cases are given in Table 1, and the details of how they were chosen are described below. Though these categories describe a large fraction of the cloud phases observed during M-PACE, they are not exhaustive, which will be discussed in more detail in Sect. 5.
In situ measurements are particularly important for identifying known mixed-phase cases, with both ice and liquid present in a cloud volume, as these are harder to identify correctly than cloud volumes with a single hydrometeor type. During M-PACE several days of persistent, single-layer mixed-phase clouds were sampled by the University of North Dakota Citation aircraft (Verlinde et al., 2007). Measurements from multiple cloud probes were merged and processed by McFarquhar and Zhang (2007) using the method described in their paper (McFarquhar et al., 2007). The mixed-phase and snow training data in this study are obtained from 9 October 2004. This case has been studied extensively with aircraft and ground-based remote sensing instruments indicating a long-lived single-layer mixed-phase cloud with a thin liquid layer top and large ice particles falling out the base of the cloud (Forbes and Ahlgrimm, 2014; Klein et al., 2009; McFarquhar et al., 2007). Sampling differences between aircraft and remote sensing data introduce a significant challenge in using the in situ measurements to provide truth data for the remote sensing retrievals. For example, Fig. 3 shows the flight track of the airplane for the mixed-phase case used in this study, showing that the aircraft does not fly directly over the ground measurement station (yellow pin). This is true in all flights during the M-PACE campaign. Thus, the comparison between aircraft and remote sensing data must be done in a statistical sense rather than a direct point-by-point comparison. On 9 October 2004, a statistical comparison is reasonable because measurements were made in a long-lasting, relatively homogeneous cloud (Verlinde et al., 2007). Profiles of ice water mass fraction as a function of atmospheric temperature from the aircraft in situ measurements (Fig. 4a) show that, at temperatures between −14 and −16 °C, the cloud is composed primarily of liquid drops with a small amount of ice present (ice fraction near 0.0). Between −14 and −10 °C the cloud contains a substantial mass of both ice and liquid, with a gradual increase in ice mass fraction as temperature increases (altitude decreases). At roughly −10 °C the cloud has shifted to primarily ice (ice fraction near 1.0). Here ice fraction is calculated as F_ice = IWC/(LWC + IWC), where IWC is ice water content and LWC is liquid water content.
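The ice fraction definition above translates directly into code. In this sketch the function name, the guard for samples with no condensate, and the sample water contents are assumptions.

```python
def ice_mass_fraction(iwc, lwc):
    """F_ice = IWC / (LWC + IWC); both inputs in the same units.

    Returns None when no condensate is present (assumed convention).
    """
    total = iwc + lwc
    if total <= 0.0:
        return None
    return iwc / total


# Illustrative values: all liquid, an even mix, and all ice.
liquid_top = ice_mass_fraction(0.0, 0.2)
mixed_layer = ice_mass_fraction(0.1, 0.1)
ice_base = ice_mass_fraction(0.3, 0.0)
```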
Profiles of the lidar depolarization ratio (δ) during two different time periods of the aircraft flight (Fig. 4b and c) indicate a shift in δ from ∼10⁻² to ∼1.0 near −10 °C, which indicates a shift from mixed phase to ice. The large δ values at warmer temperatures indicate aspherical hydrometeors typical of snow or ice, whereas the smaller δ values at colder temperatures (higher in the cloud) indicate that spherical liquid droplets dominate the lidar signal. This also corresponds with the aircraft measurements of ice fraction showing mixed-phase conditions. At temperatures around −13.5 °C the lidar attenuates, and the data higher in the cloud are not used for training the phase algorithm. The lower horizontal line in Fig. 4b and c shows a temperature threshold of −9.2 °C. At temperatures warmer than this threshold, after 17:00 UTC, hydrometeors were identified as snow, as indicated by the white lines plotted in Fig. 5. This threshold was chosen from visual inspection of the depolarization ratios in Fig. 4, and was set to exclude cases that were not dominated by ice. The radar reflectivity measurements (Fig. 5) are relatively large (−10 to +10 dBZ), which indicates the hydrometeors are large, typical of snow. During the same time period, remote sensing data were considered mixed phase when temperatures were colder than −10 °C. A mix of high and low depolarization ratios is seen for temperatures colder than −10 °C (Fig. 4c), which may indicate the changing mixed-phase conditions that are dominated by ice or liquid as shown in the variability of the aircraft measurements (Fig. 4a). Finally, the time period 00:00-14:00 UTC (shown as a white box in Fig. 5) was also included as mixed phase in order to give sufficient data to train the algorithm. Since the liquid cloud base level occurs at a lower altitude (determined by HSRL extinction) and the depolarization ratios are not as high below the liquid level as later in the day (Fig. 5), no distinction was made between mixed phase and snow in this part of the cloud.
Lidar depolarization measurements are also used to distinguish liquid drops and ice crystals to identify training data cases consisting of all ice or all liquid. We use δ to identify two ice cases to use for training the ice phase for the detection algorithm (Figs. 6 and 7). Both cases have significant depolarization ratios (indicating aspherical hydrometeors with aspect ratios less than 1), extinction coefficients indicative of ice clouds (Young and Vaughan, 2009), and reflectivity values indicating the presence of large particles (Atlas et al., 1995). These factors, along with the macrophysical cloud structure and the fact that the lidar does not significantly attenuate, are typical of cirrus clouds. Additionally, at least part of the cloud on 22 October (Fig. 7) is colder than −38 °C, the temperature where hydrometeors freeze by homogeneous freezing mechanisms.
The liquid training case used in the algorithm is displayed in the white box in Fig. 8. Several measurements confirm our decision to identify this as a liquid cloud. First, the lidar backscatter reveals that the lidar beam is completely attenuated before reaching the cloud top, indicative of high drop concentrations (∼100 cm⁻³) typically found in Arctic liquid layer clouds (Rangno and Hobbs, 2001; Shupe et al., 2001). Second, the HSRL circular depolarization ratio remains below 0.09, which falls below the δ threshold for identifying liquid layers using lidar measurements. Note the change in δ from < 0.09 to > 1.0 as the below-cloud precipitation in Fig. 8 transitions from liquid to ice (or possibly mixed) phase just after 12:00 UTC. Finally, the radar reflectivity during this time period is less than −25 dBZ, indicating hydrometeors are small, also typical of liquid clouds. Aircraft measurements in Arctic non-precipitating liquid clouds show that drop sizes are generally < 9 µm, which corresponds to reflectivity less than −20 dBZ (Shupe et al., 2001).

Algorithm description
The cloud-phase identification problem was treated as a multivariate statistics classification problem: classifying volumes within the cloud into one of four possible cloud-phase populations: ice, liquid, mixed phase, and snow. Note that at our current resolution, each day of measurement data can have tens of thousands to a million pixels (depending on the occurrence of clouds during the day). For example, the mixed-phase case on 9 October has about 100 000 pixels of good data. Thus, to create an algorithm that can be run operationally on multiple years of data at multiple sites, we need a solution that is computationally efficient.

Algorithm theory
A simple-to-implement classifier was developed using Bayes' theorem. At every cloud volume (i.e., at pixel $j$), there is a vector of data $X_j$ and a discrete random variable $M_j$, which represents the cloud-phase population membership at that volume ($M_j = 1$ for "ice," $M_j = 2$ for "liquid," etc.). The likelihood functions of the data, $f(X_j \mid M_j)$, are assumed to be multivariate normal density functions with different mean vectors ($\mu_i$) and covariance matrices ($\Sigma_i$):

$$f(x \mid M_j = i) = (2\pi)^{-k/2} \, |\Sigma_i|^{-1/2} \exp\left[-\tfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)\right],$$

where $k$ is the number of variables in $x$. The uninformative prior probability distribution for $M_j$ is $\pi(M_j = i) = p_i = 0.25$. Thus, the posterior conditional distribution for $M_j$ given $X_j$ is

$$P(M_j = i \mid X_j = x) = \frac{p_i \, f(x \mid M_j = i)}{\sum_{l=1}^{4} p_l \, f(x \mid M_j = l)},$$

as described by Anderson (1958, Sect. 6.6). We use robust population parameter estimates in the computation of the posterior probabilities as described in Sect. 4.2 below. The algorithm is related to the naïve Bayesian classifier (Domingos and Pazzani, 1997), except we do not assume that the lidar and radar variables are independent.
A phase classification is assigned to a set of observations when the probability of a given phase is 60 % or greater. The 60 % threshold was used instead of choosing the phase with the highest posterior probability to remove cases when two classes had similar probabilities. If all phase likelihoods are smaller than 1 × 10⁻²⁰, the algorithm returns the prior probabilities and no phase assignment is made.
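The classifier described in this section can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' code: the two-variable population parameters in `PARAMS` are invented, the function names are hypothetical, and a small Cholesky routine stands in for a linear algebra library. The equal priors, the 60 % threshold, and the 1 × 10⁻²⁰ likelihood cutoff follow the text.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvn_likelihood(x, mean, cov):
    """Multivariate normal density evaluated at x."""
    k = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []  # forward substitution: solve low * z = diff
    for i in range(k):
        s = sum(low[i][m] * z[m] for m in range(i))
        z.append((diff[i] - s) / low[i][i])
    mahalanobis_sq = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    return math.exp(-0.5 * (k * math.log(2.0 * math.pi) + log_det + mahalanobis_sq))


def classify(x, params, threshold=0.6, min_likelihood=1e-20):
    """Return (label or None, posterior probabilities) using equal priors."""
    names = list(params)
    prior = 1.0 / len(names)
    likes = [mvn_likelihood(x, *params[name]) for name in names]
    if all(like < min_likelihood for like in likes):
        # No population explains the data: return the priors, assign no phase.
        return None, {name: prior for name in names}
    total = sum(prior * like for like in likes)
    post = {name: prior * like / total for name, like in zip(names, likes)}
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post


# Invented (mean vector, covariance matrix) pairs for two variables.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
PARAMS = {
    "ice": ([0.0, 0.0], IDENTITY),
    "liquid": ([4.0, 4.0], IDENTITY),
    "mixed": ([0.0, 4.0], IDENTITY),
    "snow": ([4.0, 0.0], IDENTITY),
}

label_clear, post_clear = classify([0.0, 0.0], PARAMS)    # near one population
label_tie, post_tie = classify([2.0, 2.0], PARAMS)        # equidistant from all
label_far, post_far = classify([100.0, 100.0], PARAMS)    # far from everything
```

In this toy setup a point near one population mean is assigned that phase, a point equidistant from all four means receives no assignment because no posterior reaches 60 %, and a point far from every population falls back to the priors.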

Population parameter estimation
Because the population parameters (mean vectors µ and covariance matrices Σ) are unknown, they must be estimated from measurements of known phase. The parameters of the four populations were robustly estimated using the training data described in Sect. 3. Robust estimators were used to account for possible errors in the identification of the training data since robust estimators will downweight the influence of the incorrectly labeled data as long as most of the data are correctly labeled. The population means were calculated using trimmed means, that is, trimming 15 % of the data from both extremes. The covariance matrices were robustly estimated using the method described by Croux et al. (2007) and implemented in R's pcaPP package (covPCAproj) by Heinrich Fritz and Peter Filzmoser (P.Filzmoser@tuwien.ac.at).
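The trimmed mean can be sketched as follows. The function names and the sample data are hypothetical, and the rule for how many points to trim (rounding down) is an assumption. The robust covariance estimate (covPCAproj) is not reproduced here.

```python
def trimmed_mean(values, trim_fraction=0.15):
    """Mean after discarding trim_fraction of the data from each extreme."""
    ordered = sorted(values)
    cut = int(trim_fraction * len(ordered))  # points dropped per tail (rounded down)
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


def trimmed_mean_vector(samples, trim_fraction=0.15):
    """Per-variable trimmed means; samples is a list of observation vectors."""
    return [trimmed_mean(column, trim_fraction) for column in zip(*samples)]


# Twenty illustrative observations, one of them grossly mislabeled.
values = list(range(1, 20)) + [1000]
plain = sum(values) / len(values)
robust = trimmed_mean(values)
mean_vector = trimmed_mean_vector([[v, 2 * v] for v in values])
```

The single outlier shifts the ordinary mean far from the bulk of the data, while the trimmed mean is unaffected, which is the behaviour wanted when a minority of the training data may be mislabeled.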

Variables included in parameter estimation
Two algorithms were created using two distinct collections of input variables. These variables and their originating data sets are listed in Table 2, and probability distribution functions of their values are plotted in Fig. 9. Note that temperature is not included in the retrieval algorithm in order to be able to study the statistical relationship between temperature and cloud phase. The 5-variable algorithm uses the three radar moments available in the ARSCL data set along with the attenuated backscatter and circular depolarization ratio from the HSRL, comparable to the information used in the phase-identification algorithm described by Shupe (2007).
The 10-variable algorithm includes 8 variables (see Table 2) from the MicroARSCL processing of the radar Doppler spectra along with the particulate extinction cross section and circular depolarization ratio from the HSRL. While additional radar Doppler spectra variables are available in the MicroARSCL processing, we chose these variables because they each gave some separation between the four cloud-phase populations, as can be seen in their probability distributions (Fig. 9). The distributions show that normal distributions are a reasonable approximation for the variables used in these algorithms. The logarithm of the lidar variables was taken in order to make the distributions of these variables closer to normal. The population parameters were only trained when all variable data were available, though they can be applied to any subset of available variables. This assumes that missing variables are randomly occurring and are not in themselves dependent on the cloud phase.
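Applying trained parameters to a subset of variables relies on a standard property of the multivariate normal distribution: the marginal distribution of a subset of variables is again normal, with the corresponding entries of the mean vector and the corresponding rows and columns of the covariance matrix. A minimal sketch follows; the function name, the use of None for missing values, and the numbers are assumptions.

```python
def select_available(x, mean, cov):
    """Drop missing variables (None) from x and the matching mean/cov entries."""
    idx = [i for i, value in enumerate(x) if value is not None]
    x_sub = [x[i] for i in idx]
    mean_sub = [mean[i] for i in idx]
    cov_sub = [[cov[i][j] for j in idx] for i in idx]
    return x_sub, mean_sub, cov_sub


# Three hypothetical variables; the second (e.g., a lidar variable) is missing.
x = [1.5, None, -0.3]
mean = [1.0, 2.0, 0.0]
cov = [[1.0, 0.2, 0.1],
       [0.2, 2.0, 0.3],
       [0.1, 0.3, 1.5]]
x_sub, mean_sub, cov_sub = select_available(x, mean, cov)
```

The reduced vectors and matrix can then be passed to the same likelihood calculation used when all variables are present.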
Figures 10-13 show the application of the 5- and 10-variable algorithms to the test cases. Posterior probabilities are plotted for each phase category in gray scale in percentages. A phase identification that uses a threshold of 60 % probability to define a phase category is plotted in color.

Cross-validation results
Cross-validation was used to test the accuracy of the algorithm. Half of the complete phase data set was randomly chosen to train the algorithm, and the other half was reserved to test how well the algorithm performed. Both the 10-variable (Table 3a) and 5-variable (Table 3b) algorithms identify the pure ice and liquid cases well, with over 94 % of liquid data identified correctly and 96 % of ice data identified correctly, indicating very distinct signatures of pure liquid or cirrus-type ice clouds. This clear identification of ice and liquid cases also corresponds to very high posterior probabilities of phase identification, as can be seen in Figs. 10-12, indicating a high degree of confidence in the ability of the algorithm to perform in these conditions. It is more difficult for the algorithms, however, to distinguish between the mixed-phase and snow cases on 9 October. The 10-variable algorithm identifies about 88 % of mixed and snow data as they were defined in the validation/training data set (Table 3a). About 3.5 % of the mixed-phase data are identified as liquid, which from Fig. 13 appears to be either at the top of the cloud or in bands such as that around 03:00 UTC that may be drizzle, or liquid sections of the cloud. These few liquid cases may indicate that the validation classification is incorrect rather than the phase algorithm, though this cannot be determined. The remaining misclassification of snow and mixed-phase cases reflects the uncertainty in distinguishing these two categories in the remote sensing data, as well as the difficulty in identification of mixed-phase and snow data in the validation data as described in Sect. 4.1. This uncertainty is largely captured in the variable posterior probabilities of mixed and snow identification over this day (note the speckled pattern in Fig. 13). The 5-variable phase algorithm shows similar results on 9 October, with 91 % of mixed-phase and snow cases identified correctly, a small fraction of data identified as liquid, and 6-8 % of data misclassified (Table 3b).
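The validation procedure (a random half for training, the other half for testing, and the percentage of each known phase assigned to each category) can be sketched as below. The function names, the fixed random seed, and the toy labels are assumptions; the labels are invented and do not reproduce Table 3.

```python
import random
from collections import Counter


def split_half(n, seed=0):
    """Randomly split indices 0..n-1 into a training half and a testing half."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    half = n // 2
    return indices[:half], indices[half:]


def percent_table(true_labels, predicted_labels):
    """Percentage of each true class assigned to each predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    totals = Counter(true_labels)
    return {
        true: {pred: 100.0 * c / totals[true]
               for (t, pred), c in counts.items() if t == true}
        for true in totals
    }


train_idx, test_idx = split_half(10)

# Invented labels for eight test pixels.
truth = ["ice"] * 4 + ["mixed"] * 4
predicted = ["ice", "ice", "ice", "ice", "mixed", "mixed", "snow", "mixed"]
table = percent_table(truth, predicted)
```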
The difference in the accuracy of the 5- and 10-variable phase algorithms is seen primarily in cases when HSRL data are not available. Comparing Table 3a and b with Table 3c and d shows that identification of ice and liquid cloud volumes does not significantly depend on the availability of lidar data; the percentage of points identified correctly does not drop when only examining data when the lidar is present. However, in mixed and snow cases the validation percent-  tions and models or using the classification in microphysical cloud property retrievals. Second, additional information was included from the MicroARSCL processing of the radar Doppler spectra. By and large, the three radar variables used in the Shupe (2007) algorithm contained sufficient information to classify liquid and ice phase clouds, but had difficulty in distinguishing mixed-phase and snow categories when lidar data were unavailable. Additional variables from the MicroARSCL processing of the radar Doppler spectra improved the ability of the algorithm to distinguish phase in cloud volumes without lidar measurements.
This phase classification method has a number of strengths. It is a relatively simple algorithm that is easy to parallelize and run operationally. The method is also very useful for understanding the sensitivity of the results to data input into the algorithm, as is seen in the comparison of results using 5- or 10-variable inputs and statistics when lidar measurements are or are not available. It includes the information from the covariance between multiple variables in a seamless way that can handle missing input variables. As long as the missing data are random with respect to phase, the loss of information content is reflected in the phase probability estimates. Since the mean and covariance matrices are trained with operational data, the algorithm also inherently includes random measurement uncertainties in the posterior probability estimates.
The primary limitation of this classification method is that it is only as good as the accuracy and representativeness of the data used to train it. This is of course a primary limitation in any attempt to estimate measurement or retrieval uncertainty, because without a valid standard of truth, there is no way to define uncertainty. This is a particularly difficult problem in cloud remote sensing retrievals. Aircraft in situ measurements are the typical truth data set used to validate remote sensing retrievals, but the challenges associated with collocation of aircraft and remote sensing data as well as sampling issues are not trivial. The proof of concept algorithm shown in this study uses training data for liquid, ice, mixed-phase, and snow cases that could fairly reliably be identified from a combination of aircraft in situ measurements and expert interpretation of remote sensing data. The 4 days used in this study are not representative of all hydrometeor conditions encountered at the Barrow site, however, and to create an operational retrieval additional categories and training data sets are needed. For example, one short period that was identified in the literature as a super-cooled drizzle case (Verlinde et al., 2013) was examined. The 10-variable algorithm identified that period either as liquid or no solution. These were reasonable results given the categories used to train the algorithm, and examination of observational values suggested that drizzle could be retrieved reliably if sufficient training data for known drizzle cases were available.
What is most needed to improve this retrieval algorithm is additional training data that are representative of the span of cloud conditions seen in the atmosphere. The ARM Airborne Carbon Measurements (ARM-ACME-V) field campaign based out of Deadhorse, AK, will have aircraft flights focused over Oliktok Point, AK, where an ARM Mobile Facility (AMF-3) is located, and Barrow, AK, where a fixed ARM site is located. The aircraft will fly transects between the two locations, creating an opportunity to collect more cloud condition data. ACME-V will occur during the summer of 2015 and will provide routine (2-3 flights per week) aircraft in situ cloud measurements over a 3-month period, with many over-flights of the two ground sites that will sample the same clouds as the lidar and radar instruments. This field experiment has the potential to provide an extensive truth data set to better train and evaluate the cloud-phase algorithm under a wider range of cloud conditions.
In addition, future development is planned to test and train the algorithm at different sites in the mid-latitudes and tropics, which may include higher updraft speeds and turbulence and thus affect the values of the radar Doppler spectra variables associated with hydrometeor classes. When a more complete training data set is available that describes the full space of potential atmospheric conditions, sensitivity tests will be performed to identify the set of radar Doppler spectra variables that gives sufficient accuracy at the lowest computational cost.

Data availability
All data in this paper are available at the ARM archive, as cited in the references (Eloranta, 2005b; McFarquhar and Zhang, 2007; Johnson and Jensen, 2009; Jensen et al., 2016).

Figure 1. Schematic of Doppler spectrum and MicroARSCL variables. Variables describing the primary peak (*_pri) and a secondary peak (*_sec) are labeled in the diagram. The variables used in this study are shown in red. Each radar time and height bin measured by the vertically pointing instrument returns a full spectrum of Doppler velocities.

Figure 2. Doppler spectra example from 9 October 2004 09:00-10:00 UTC for the 465 m height range gate (a). The data are also plotted in 3-D (b) and with the raised noise floor to correct for spectral image artifacts (c).

Figure 3. Flight track for the UND Citation on 9 October 21:46-22:07 UTC during the M-PACE campaign. The ARM ground-measurement site is marked with the yellow pin.

Figure 4. Profiles of (a) ice water fraction from aircraft, and HSRL depolarization ratios for (b) the time of the in situ measurements and (c) a longer period of time. Times listed are in UTC. Horizontal lines in (b) and (c) show the temperature thresholds used to divide data between snow and mixed phase, as well as indicating the region where the HSRL is attenuated and no longer gives a reliable signal.

Figure 5. Vertical profile data on 9 October 2004 from MMCR reflectivity, HSRL extinction, and HSRL circular depolarization ratio. Periods identified as snow and mixed phase are outlined in white.

Figure 8. As in Fig. 5, but for 4 October 2004. The white outlined area shows liquid cloud.

Figure 9. Histograms of algorithm training data set input variable values for four phase categories. Note that temperature is not included in the retrieval algorithm, but is included here for reference. Most clouds examined in this study are in the potentially mixed-phase temperature range (−40 to 0 °C).

Figure 10. Application of the 5-variable (left) and 10-variable (right) phase algorithms to the 1 October 2004 test case. Color panels show the algorithm identification when the probability of a given phase is greater than 60 %. Gray-scale panels show the probability of a given phase in percent.

Table 2. Variables used in the 5- and 10-variable algorithms. All variables from the MicroARSCL data set refer to the primary peak detected in the Doppler velocity spectrum.

Table 3. Cross-validation results given as the percentage of validation data in a given class. (a) 10 variable: complete testing data results (27.9 % of data). (b) 5 variable: complete data (52.2 % of data). (c) 10 variable: no lidar data results (71.8 % of the incomplete data). (d) 5 variable: no lidar data (92.7 % of the incomplete data).