Bile proteomic profiles differentiate cholangiocarcinoma from primary sclerosing cholangitis and choledocholithiasis

Early detection of malignant biliary tract diseases, especially cholangiocarcinoma (CC) in patients with primary sclerosing cholangitis (PSC), is very difficult and often comes too late to give the patient a therapeutic benefit. We hypothesize that bile proteomic analysis distinguishes CC from nonmalignant lesions. We used capillary electrophoresis mass spectrometry (CE‐MS) to identify disease‐specific peptide patterns in patients with choledocholithiasis (n = 16), PSC (n = 18), and CC (n = 16) in a training set. A model for differentiation of choledocholithiasis from PSC and CC (PSC/CC model) and another model distinguishing CC from PSC (CC model) were subsequently validated in independent cohorts (choledocholithiasis [n = 14], PSC [n = 18] and CC [n = 25]). Peptides were characterized by sequencing. Application of the PSC/CC model in the independent test cohort resulted in correct exclusion of 12/14 bile samples from patients with choledocholithiasis and identification of 40/43 patients with PSC or CC (86% specificity, 93% sensitivity). The corresponding receiver operating characteristic (ROC) analysis revealed an area under the curve (AUC) of 0.93 (95% confidence interval [CI]: 0.82‐0.98, P = 0.0001). The CC model succeeded in an accurate detection of 14/18 bile samples from patients with PSC and 21/25 samples with CC (78% specificity, 84% sensitivity) in the independent cohort, resulting in an AUC value of 0.87 (95% CI: 0.73‐0.95, P = 0.0001) in ROC analysis. Eight out of 10 samples of patients with CC complicating PSC were identified. Conclusion: Bile proteomic analysis discriminates benign conditions from CC accurately. This method may become a diagnostic tool in future as it offers a new possibility to diagnose malignant bile duct disease and thus enables efficient therapy particularly in patients with PSC. (HEPATOLOGY 2010;)

Cholangiocarcinoma (CC) is the second most common hepatobiliary cancer and arises from cholangiocytes of the intra- and extrahepatic biliary tract. 1 Unfortunately, CC is often detected at an unresectable stage and is therefore associated with a poor prognosis and a life expectancy of 6-12 months. 2 The diagnosis of CC is based on a combination of imaging techniques and tissue sampling. Tumor markers, such as serum carbohydrate antigen 19-9 (CA 19-9) and carcinoembryonic antigen (CEA), have a low specificity and sensitivity (<80%), which limits their potential to differentiate between benign and malignant bile duct stenosis. 1,3 Magnetic resonance cholangiopancreaticography and endoscopic retrograde cholangiopancreaticography (ERCP) provide imaging tools leading to a diagnostic accuracy of only 70%-80%. 4,5 Despite being more invasive, ERCP allows tissue sampling by brush cytology or biopsy of the suspected biliary stenosis. Unfortunately, ERCP-guided biopsies and brush cytology can provide a diagnosis in only 36%-46% of cases, although expert groups achieved a remarkably higher yield in diagnosing dysplasia and CC in patients with primary sclerosing cholangitis (PSC). 6,7 The main predisposing factor for the development of CC in Western countries is PSC. 3,8 PSC is associated with a >160-fold increased risk of developing hepatobiliary malignancies. 3,9 In patients with PSC, the differentiation between benign and malignant strictures is particularly difficult, because CC as well as chronic or acute inflammation frequently result in similar cholangiographic findings. 10 An interesting approach for the early detection of CC in PSC is the identification of markers in bile, as the development of carcinoma takes place at the biliary epithelium and tumor-related proteins become detectable in bile rather than serum. The measurement of established serum biomarkers in bile has been performed, but is only of limited clinical value. 11,12
A novel concept to detect CC is the analysis of protein patterns instead of focusing on single proteins. In a small number of preliminary descriptive studies, proteomic analyses with only a few samples were performed to evaluate the role of proteomics in bile without developing disease-specific models. 13-18 Proteomic analysis may be a valuable tool in the discovery and differentiation of various diseases, but has not been clinically evaluated in hepatobiliary diseases. 19,20 We hypothesize that proteomic analysis of bile can be applied to distinguish CC from nonmalignant biliary tract pathologies.

Patients and Methods
Patients. Bile samples were collected at the gastrointestinal endoscopy unit of the Hannover Medical School, Germany. We performed 142 endoscopic procedures and bile aspiration was successful in 75% of cases. From 94 consecutive patients included in the study, bile samples were successfully collected during 107 interventions (102 ERCs, five percutaneous transhepatic cholangiographies [PTC]). Indications for cholangiographic interventions were PSC, CC, and choledocholithiasis. Ten patients developed CC in addition to PSC. Clinical cholangitis was present preintervention in one patient with choledocholithiasis, in three patients with PSC, and in two patients with CC. We defined cholangitis as the presence of fever, elevated C-reactive protein (CRP), and alkaline phosphatase (AP). Antibiotic treatment before intervention was initiated in 30 out of 107 endoscopic procedures (seven choledocholithiasis, eight PSC, 15 CC). We performed microbiological bile analysis in 93/107 examinations and bile remained sterile in only 15% of cases (1/30 choledocholithiasis, 7/36 PSC, 8/41 CC). Coexistent bacterial infection (bacteriobilia with fever, elevated CRP, AP, and gamma-GT) was present in six patients. Detailed patient characteristics and laboratory data are given in Table 1.
The diagnosis of PSC was based on typical cholangiographic findings such as strictures or irregularity of intrahepatic or extrahepatic bile ducts after exclusion of secondary causes for sclerosing cholangitis. CC was proven histologically in 35 out of 38 patients. In three patients a definite histology could not be obtained, but clinical, laboratory, radiological, and ERCP findings were consistent with a diagnosis of CC. None of the patients with CC received chemotherapy before the cholangiographic intervention. The diagnosis of choledocholithiasis was based on ultrasound and/or endoscopic ultrasound and confirmed by ERC. The trial was approved by the local ethical committee of Hannover Medical School and written informed consent was obtained from all patients.
Collection of Bile. Bile collection was performed as previously described. 21 In brief, bile was aspirated by placing a 5F standard ERC catheter (without previous flushing) into the bile duct before contrast dye injection. Approximately 0.5 to 6 mL of bile (mean 2 mL) were collected and transferred into a sterile tube. In five patients (four patients with CC and one patient with choledocholithiasis) bile was aspirated during PTC directly after sonographic-guided percutaneous puncture of the bile ducts and before contrast injection. Bile samples were directly frozen at −80 °C and were thawed only once just before proteomic analysis. Bile samples were diluted in H2O to a final protein concentration of 1 mg/mL, as verified with the bicinchoninic acid assay (Interchim, Montluçon, France).
Bile Sample Preparation. For CE-MS analysis, 0.7 mL diluted bile was added to 0.7 mL n-butanol/iso-propyl ether 4:6 (v/v) and centrifuged for 10 minutes at 14,000 rpm and 4 °C. The lower aqueous phase was extracted and diluted with 0.5 mL of 8 M urea, followed by 1 mL H2O, and passed over a 10 kDa MWCO Centrisart ultrafilter (Sartorius, Goettingen, Germany) at 3,000 rpm until 1.4 mL filtrate was obtained. The filtrate was desalted on a PD-10 column (GE Healthcare, München, Germany) preequilibrated in 0.01% aqueous NH4OH (Roth, Karlsruhe, Germany). After elution with ammonium buffer, the sample was lyophilized, stored at 4 °C, and resuspended in CE-MS running buffer containing 20% acetonitrile and 1% formic acid before analysis.
Capillary Electrophoresis Mass Spectrometry Analysis. CE-MS analysis was performed as described using a P/ACE MDQ capillary electrophoresis system (Beckman Coulter, Fullerton, CA) coupled on-line to a Micro-TOF MS (Bruker Daltonic, Bremen, Germany). 19,22 The ESI sprayer (Agilent Technologies, Palo Alto, CA) was grounded, and the ion spray interface potential was set between −4.0 and −4.5 kV. Data acquisition and MS acquisition methods were automatically controlled by the CE via contact-close-relays. Spectra were accumulated every 3 seconds over a range of m/z 350 to 3,000. Details regarding accuracy, precision, selectivity, sensitivity, reproducibility, and stability of the CE-MS method have been described. 19
Data Processing. Mass spectral ion peaks representing identical molecules at different charge states were deconvoluted into single masses using MosaiquesVisu software. 23 Only signals with a charge >1 that were observed in a minimum of three consecutive spectra and had signal-to-noise ratios >4 were included. 24 The software employs probabilistic clustering and uses isotopic distribution and conjugated masses for charge-state determination of peptides/proteins. The resulting peak list characterizes each peptide by its molecular mass, CE-migration time, and ion signal intensity (amplitude). Because these parameters are influenced by the amount of salt and peptides in the sample, comparison of peptide spectra requires normalization. CE-migration time and MS-detected mass were normalized by the definition of 339 clusters of peptides covering a range of 19.39 to 37.93 minutes in CE-migration time and 0.830 to 6.456 kDa in molecular mass. Amplitude calibration was based on 38 peptides with >60% abundance, >100 counts ion signal intensity above baseline, and <130% amplitude deviation.
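The deconvolution and signal-filter steps described above can be illustrated with a minimal Python sketch. It is not the MosaiquesVisu algorithm, which relies on probabilistic clustering and isotopic distributions. It only shows, under the assumption of protonated ions of the form [M + zH]z+, how peaks observed at different charge states map back to a single neutral mass, and how the stated inclusion criteria (charge >1, at least three consecutive spectra, signal-to-noise ratio >4) could be expressed. The function names and the example peptide mass are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a peptide from the m/z of its [M + zH]z+ ion (assumed ion type)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS_DA)


def passes_signal_filter(charge: int, consecutive_spectra: int,
                         signal_to_noise: float) -> bool:
    """Inclusion criteria quoted in the text: charge >1, >=3 consecutive spectra, S/N >4."""
    return charge > 1 and consecutive_spectra >= 3 and signal_to_noise > 4


# Hypothetical 2,000 Da peptide observed as a doubly and a triply charged ion.
mz_doubly = (2000.0 + 2 * PROTON_MASS_DA) / 2
mz_triply = (2000.0 + 3 * PROTON_MASS_DA) / 3

# Both charge states deconvolute to the same single mass.
print(neutral_mass(mz_doubly, 2), neutral_mass(mz_triply, 3))
```

Both peaks collapse to one entry in the peak list, which is the property the subsequent normalization and matching steps depend on.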
Detected peptides were deposited, matched, and annotated in a Microsoft SQL database, allowing comparison of multiple samples (patient groups). Peptides were considered identical within different samples when mass deviation was lower than 50 ppm for small peptides and 75 ppm for large peptides and proteins. In the data clustering process, analyte diffusion was compensated by linearly increasing cluster widths over the entire electropherogram (19-45 minutes) from 2%-5%. After calibration, deviation of migration time had to be below 0.35 minutes.
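The matching rule above can be sketched in a few lines of Python. The 50 ppm and 75 ppm mass tolerances and the 0.35-minute migration-time limit are taken from the text. The boundary between "small" and "large" peptides is not stated in the paper, so the 4,000 Da cutoff used here is purely an illustrative assumption, as are the function names.

```python
def ppm_deviation(mass_a: float, mass_b: float) -> float:
    """Relative mass deviation between two masses in parts per million."""
    return abs(mass_a - mass_b) / mass_b * 1e6


def same_peptide(mass_a: float, time_a: float,
                 mass_b: float, time_b: float,
                 large_cutoff_da: float = 4000.0) -> bool:
    """Decide whether two calibrated signals represent the same peptide.

    large_cutoff_da is a hypothetical threshold; the source does not say
    where the 50 ppm tolerance ends and the 75 ppm tolerance begins.
    """
    tolerance_ppm = 75.0 if max(mass_a, mass_b) >= large_cutoff_da else 50.0
    mass_ok = ppm_deviation(mass_a, mass_b) < tolerance_ppm
    time_ok = abs(time_a - time_b) < 0.35  # minutes, after calibration
    return mass_ok and time_ok


# Masses in Da, migration times in minutes (invented values).
print(same_peptide(1500.00, 25.0, 1500.06, 25.2))  # about 40 ppm apart
print(same_peptide(1500.00, 25.0, 1500.09, 25.2))  # about 60 ppm apart
```

Expressing the mass tolerance in ppm rather than in daltons keeps the criterion proportional to the mass, which is why a larger absolute deviation is accepted for a 5 kDa peptide than for a 1 kDa peptide.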
Statistical Analysis. Sensitivity, specificity, and 95% confidence intervals (95% CI) were calculated based on receiver operating characteristic (ROC) analysis (MedCalc Software, Belgium). 25 ROC plots were obtained by plotting all sensitivity values (true-positive fraction) on the y axis against their equivalent (1-specificity) values (false-positive fraction) for all available thresholds on the x axis. The area under the ROC curve (AUC) was evaluated, as it provides the single best measure of overall accuracy independent of any threshold. 25 For biomarker discovery, P-values were calculated using the natural-logarithm transformed intensities and the Wilcoxon rank sum test.
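To make the threshold independence of the AUC concrete, the following Python sketch computes it as the fraction of case/control pairs in which the case receives the higher classification score (the Mann-Whitney formulation, with ties counted as one half), and computes sensitivity and specificity at one chosen cutoff. This is not the MedCalc implementation used in the study, and the scores are invented toy values.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Fractions of cases scoring above, and controls scoring at or below, the cutoff."""
    true_positives = sum(score > cutoff for score in case_scores)
    true_negatives = sum(score <= cutoff for score in control_scores)
    return (true_positives / len(case_scores),
            true_negatives / len(control_scores))


# Invented classification scores for three cases and three controls.
cases = [0.9, 0.4, 0.7]
controls = [-0.5, 0.5, -0.2]

print(roc_auc(cases, controls))
print(sensitivity_specificity(cases, controls, cutoff=0.0))
```

Moving the cutoff changes the sensitivity/specificity pair but leaves the AUC unchanged, which is why the AUC is reported alongside the values obtained at one specific cutoff.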
Classification. Disease-type specific peptide marker models were generated using the Support Vector Machine (SVM)-based MosaCluster software. 19 Sample classification was performed by determining the Euclidean distance of a particular dataset to the maximal margin of the SVM hyperplane and assigning a positive or negative value depending on which side of the hyperplane, case or control, the data point was located.
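The signed-distance rule described above can be sketched as follows. This is a simplified illustration for a linear hyperplane only. The kernel and parameters used by MosaCluster are not given in the text, and the weights, bias, and feature values below are invented for the example.

```python
import math


def signed_distance(features, weights, bias):
    """Signed Euclidean distance of a sample to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm


def classify(features, weights, bias, cutoff=0.0):
    """Label a sample 'case' if its signed distance exceeds the cutoff, else 'control'."""
    if signed_distance(features, weights, bias) > cutoff:
        return "case"
    return "control"


# Hypothetical two-peptide model: weights and bias are illustrative only.
weights = [3.0, 4.0]
bias = -5.0

print(signed_distance([3.0, 4.0], weights, bias), classify([3.0, 4.0], weights, bias))
print(signed_distance([0.0, 0.0], weights, bias), classify([0.0, 0.0], weights, bias))
```

The sign of the distance gives the class and its magnitude gives a continuous score, which is what allows a ROC curve to be drawn over all possible cutoffs.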
Sequencing of Peptides. Samples were stage tip-purified using Empore Disk C18 as described. 26 The peptides were analyzed by reversed phase chromatography-tandem MS using an LTQ Orbitrap XL (Thermo, Bremen, Germany) coupled to an Agilent 1200 nanoflow-HPLC (high-performance liquid chromatography) (Agilent, Waldbronn, Germany). HPLC-column tips (fused silica) with 75 μm inner diameter (New Objective, Woburn, MA) were self-packed with Reprosil-Pur 120 ODS-3 (Dr. Maisch, Ammerbuch, Germany) to a length of 20 cm. 27 Samples were applied directly onto the column without precolumn. The peptides were injected onto the separation column with a linear 140-minute gradient from 2%-80% B (0.5% acetic acid in 80% acetonitrile [LC-MS grade, Wako, Germany]) in solvent A (0.5% acetic acid [LGC Promochem, Wesel, Germany] in ddH2O). The flow rate was 250 nL/min for operation and 500 nL/min for sample application. The mass spectrometer was operated in the data-dependent mode and switched automatically between MS (maximum 1 × 10^6 ions, mass range m/z = 350 to 2,000, resolution 60,000) and MS/MS. Each MS scan was followed by a maximum of five MS/MS scans in the linear ion trap (collision energy 35%, target value 30,000). Singly charged parent ions and unassigned charge states were excluded from fragmentation. MS parameters were 2.3 kV spray voltage, no sheath and auxiliary gas flow, and 125 °C ion-transfer tube temperature.
Individual MS/MS spectra were searched against the IPI human database using the Proteome Discoverer 1.1.0.263 software and an in-house Mascot server (parent mass deviation 10 ppm, fragment ion mass deviation 0.6 Da, decoy database search activated: strict false-discovery rate [FDR] 0.01, relaxed FDR 0.05). An additional search was employed against the NCBI human nonredundant database using the Open Mass Spectrometry Search Algorithm.

Results
Biomarker Discovery. CE-MS measurements revealed high variability in the composition of the low molecular weight proteome in the range of 0.8 to 10 kDa. An average of 1,680 peptides (minimum 469, maximum 3,309) was detected in the 0.8-10 kDa mass range of 1 mg/mL-diluted bile by CE-MS. This high variability in peptide composition necessitates normalization of peptide amplitudes as described in Patients and Methods.
A training set of 50 samples from choledocholithiasis (n = 16), PSC (n = 18), and CC (n = 16) patients was used for the identification of differentially expressed peptides (Fig. 1). We evaluated the data with respect to marker candidates with Wilcoxon P-values < 0.05. This resulted in a list of 83 peptides for the differentiation of PSC/CC from choledocholithiasis and 90 for the differentiation of PSC from CC. On the basis of the two sets of preselected candidate peptide markers, peptide patterns were established to differentiate PSC and CC from choledocholithiasis (PSC/CC model), and in another model to distinguish PSC from CC (CC model). These two models were chosen to construct independent classification schemes for discrimination of sclerosing/malignant lesions from gallstones and of CC from PSC.
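The candidate preselection step (Wilcoxon P < 0.05 on natural-log transformed intensities) can be illustrated with a small Python sketch. It uses the normal approximation of the rank sum statistic with average ranks for ties and no tie correction of the variance, which may differ from the exact procedure used in the study. It also assumes strictly positive intensities, and the intensity values are invented.

```python
import math


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank sum P-value (normal approximation) on ln intensities."""
    log_a = [math.log(value) for value in group_a]
    log_b = [math.log(value) for value in group_b]
    pooled = sorted([(v, "a") for v in log_a] + [(v, "b") for v in log_b])

    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j for a tie block
        rank_sum_a += mean_rank * sum(1 for k in range(i, j) if pooled[k][1] == "a")
        i = j

    n_a, n_b = len(log_a), len(log_b)
    expected = n_a * (n_a + n_b + 1) / 2
    std_dev = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (rank_sum_a - expected) / std_dev
    return math.erfc(abs(z) / math.sqrt(2))


# Invented amplitudes of one peptide in two hypothetical groups.
group_cc = [820.0, 910.0, 1040.0, 1200.0, 1500.0]
group_psc = [110.0, 150.0, 240.0, 300.0, 410.0]

p_value = rank_sum_p_value(group_cc, group_psc)
print(p_value, p_value < 0.05)
```

Because the logarithm is monotonic, the transformation does not alter the ranks and therefore does not change the rank sum statistic itself; it mainly matters for the reported mean amplitudes and for any parametric summaries.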
PSC/CC Model to Differentiate Gallstones from PSC or CC. The PSC/CC model was constructed by selection of 18 out of the 83 PSC/CC peptide marker candidates (Table 2), yielding best classification performance on the training set. This PSC/CC model differentiates PSC and CC from choledocholithiasis with an AUC of 0.90 (95% CI: 0.79-0.97, P = 0.0001) in ROC analysis after total cross-validation of training set data (not shown). In Fig. 2A (Table 3) was defined with an AUC of 1.0 (95% CI: 0.9-1.0, P < 0.0001) on the training set after cross-validation. Figure 3A displays the compiled CE-MS profiles of the peptides in the CC model for the PSC and CC training set.
Applied to the independent set of samples, the CC model exhibited an AUC of 0.87 (95% CI: 0.73-0.95, P = 0.0001) in ROC analysis (Fig. 3B) and correctly identified 14 of 18 PSC and 21 of 25 CC patients at a cutoff of 0.008 (78% specificity, 84% sensitivity). Most notably, 8 out of 10 bile samples of patients with CC on top of PSC scored positive for the CC pattern.
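As a worked check of the arithmetic, the reported percentages follow directly from the counts given for the independent cohorts. The short Python sketch below recomputes them; the helper name is hypothetical, and the PSC/CC model counts (12/14 and 40/43) are those stated in the abstract.

```python
def percent(correct: int, total: int) -> int:
    """Proportion of correctly classified samples as a rounded percentage."""
    return round(100 * correct / total)


# CC model, independent cohort: PSC samples are controls, CC samples are cases.
cc_specificity = percent(14, 18)   # PSC correctly classified as not CC
cc_sensitivity = percent(21, 25)   # CC correctly classified as CC

# PSC/CC model, independent cohort: choledocholithiasis samples are controls.
psc_cc_specificity = percent(12, 14)
psc_cc_sensitivity = percent(40, 43)

print(cc_specificity, cc_sensitivity, psc_cc_specificity, psc_cc_sensitivity)
```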
Classification at different timepoints, with a minimum and maximum interval between the cholangiography dates of 1 week and 22 months, respectively, resulted in repeated correct classification in 7 of 9 cases (Supporting Information Table 2). Classification stability of both models was tested by evaluation of three independent measurements of one sample each from a patient with CC, a patient with PSC, and a patient with choledocholithiasis. As presented in Supporting Information Table 3, the classification results were not influenced by the number of detected peptides.
Table 2 (legend). For all sequence-identified peptides, the amino acid sequence, the name of the protein precursor, and the amino acid positions within the protein's primary sequence (according to the UniProt Knowledge Base) are presented. For accurate sequence annotation, only sequences with a mass tolerance of less than 10 ppm in MS/MS were accepted. In addition, the frequency and the mean amplitude in the choledocholithiasis control and in the PSC/CC case group of the training cohort, as well as the P and AUC values of the intergroup comparison of peptide distribution, are provided.
Peptide Sequencing Using Tandem Mass Spectrometry. To characterize biliary peptides with respect to their amino acid sequence, MS/MS peptide sequencing was applied. 28,29 The majority of sequence-identified peptides are fragments of hemoglobin alpha and beta chains (Supporting Information Table 1A), followed by peptides of serum albumin, pancreatic triacylglycerol lipase, and cytoplasmic actin 1 (at least 10 different peptides). Other peptides were assigned to structural proteins (e.g., keratins, histones), proteins involved in proteolysis and degradation of lipids and polysaccharides (e.g., proteasome subunits, carboxypeptidases, trypsin, alpha-amylase, bile salt-activated lipase), and proteins involved in immune responses (e.g., complement factors, immunoglobulins). A detailed list of the detected precursor proteins is provided in Supporting Information Table 1B. Correlation of obtained sequence data with the biomarker candidates included in both models revealed differential regulation or proteolysis of hemoglobin alpha and beta chains, cytoplasmic actin, keratins, 14-3-3 zeta/delta, and inter-alpha-trypsin inhibitor heavy chains (Tables 2, 3).

Discussion
The differentiation between benign and malignant bile duct diseases, particularly strictures, is a very demanding challenge even for specialists in this field.
This study shows, for the first time, that proteomic analysis of bile is able to differentiate malignant bile duct diseases from benign lesions and may become a diagnostic and screening tool in the future.
The analysis of bile to diagnose CC is of particular interest as tumor cells might release and/or shed proteins directly into the bile. Therefore, bile may contain higher levels of secreted or shed markers than serum. 30 Although bile is not easily accessible, it has been shown that bile aspiration during ERC is successful in over 70% of examinations. 21 However, although promising, none of the single markers found in bile has found its way into clinical routine so far. 31,32 A novel approach is the simultaneous analysis of a set of markers that form a specific pattern. The potential of proteomic analysis is obvious: pathological alterations in any organ will result in changes in extracellular proteins. Because such extracellular proteins can be found in the ambient body fluids, proteins in such fluids contain substantial information on the status of organs at the time of sampling. Although blood at first sight appears to be the most appropriate material, its use is associated with several difficulties. The 10 most abundant proteins in blood account for >90% of the total protein content. 33 These most abundant proteins contain little information regarding the status of an organ 33 and greatly inhibit the accurate detection of less abundant proteins that potentially contain more information. 33 A few preliminary proteomic bile analyses in single patients suggested that this method could be feasible. 13,14,16-18,30,34 Most of the studies applied sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE) and liquid chromatography-mass spectrometry (LC-MS). In our study, capillary electrophoresis (CE) coupled to mass spectrometry (MS, CE-MS) was used for the first time to study proteomic profiles in bile.
Table 3 (legend). For all sequence-identified peptides, the amino acid sequence, the name of the protein precursor, and the amino acid positions within the protein's primary sequence (according to UniProtKB) are presented. For accurate sequence annotation, only sequences with a mass tolerance of less than 10 ppm in MS/MS were accepted. In addition, the frequency and the mean amplitude in the PSC control and in the CC case group of the training cohort, as well as the P and AUC values of the intergroup comparison of peptide distribution, are provided.
Although CE and LC possess similar resolution characteristics, the absence of a sieving matrix in CE provides several advantages over LC. No buffer gradients that require ramping of ionization parameters are needed to ensure stable flow; sample migration can be controlled by varying the electric field strength. 35 The differentiation of PSC from CC was based on a model of 22 peptides. Twelve of those were identified by sequencing and are fragments of hemoglobin subunits, serum albumin, cytoplasmic actin, keratins, inter-alpha-trypsin inhibitor heavy chains, and the 14-3-3 zeta/delta protein (for details, see Table 3). Expression profiles of these peptide markers in PSC and CC patients may reflect changes in molecular pathways involved in proteolysis/protein catabolism, inflammation, apoptosis, and epithelial cell transformation. 36-38 The increased abundance of a 14-3-3 zeta/delta protein-derived peptide fragment (of which the annotated mass spectrum is presented in Supporting Information Fig. 1) in patients with CC is of special interest, because 14-3-3 proteins are involved in many cellular processes, e.g., actin cytoskeletal organization, mitogenesis, cell adhesion, and apoptosis prevention. Hermeking 39 described that cancer-associated downregulation or loss of 14-3-3 proteins leads to an increased or unscheduled cell-cycle progression. In connection with decreased levels of peptides from cytokeratins and increased levels of actin, 14-3-3 may also be implicated in epithelial-mesenchymal transition, 40,41 with the latter recently associated with CC and invasive CC tumor growth. 42,43 However, further analyses have to be performed to investigate the role of 14-3-3 proteins in the pathogenesis of CC.
A disease-specific biomarker panel to identify different bile-related diseases has not been evaluated so far. Our study focused on clinical diagnostic challenges such as the detection of malignant (CC) and premalignant (PSC) biliary lesions. First, we developed a marker model (PSC/CC model) comparing a PSC and CC cohort with a choledocholithiasis cohort. For biomarker discovery, the latter was chosen as a control group because the risk of postprocedural pancreatitis or cholangitis ethically bans ERC from application in healthy subjects. However, as the control group consists of patients with choledocholithiasis, proteomic analysis may reflect the difference between a relatively normal biliary tree and liver, and an inflamed, cholestatic liver, which can be expected in patients with PSC and patients with a dominant stenosis due to CC. The PSC/CC model was able to distinguish CC and PSC from nonmalignant lesions with an AUC of 0.93 (P = 0.0001), a sensitivity of 93%, and a specificity of 86% as validated in an independent cohort. These findings are of clinical significance, as in patients with suspected CC or PSC endoscopic procedures will be performed and thus bile becomes accessible. Furthermore, even in the presence of large masses suspicious of CC, a definite diagnosis often cannot be made.
The surveillance of patients with PSC is of crucial importance, as those patients have an increased risk to develop CC and curative treatment such as liver transplantation or radical resection can be performed only at an early stage. Therefore, our aim was to distinguish PSC from CC in a second model. This model was established using a training set consisting of 18 patients with PSC and 16 with CC. Applied to an independent validation set (18 PSC, 25 CC) it showed an AUC of 0.87, a specificity of 78%, and a high sensitivity of 84%. Our findings indicate a possible role of proteomic analysis for surveillance in patients with PSC. Nevertheless, PSC-associated CC may be of different origin than sporadic cholangiocarcinoma. Ten patients within the CC group developed CC in addition to PSC. Eight of those patients were identified positive for CC by proteomic analysis.
This proteomic model reaches a high sensitivity compared to single biochemical markers. Direct comparison with the widely used CA 19-9 tumor marker is impossible, because previous studies used different cutoff values in various study populations, leading to an enormous range of sensitivity (53%-92%) and specificity (50%-98%). 44 Our proposed model may be of clinical relevance in diagnosing CC in patients with PSC, especially as a supplement to other diagnostic methods, as a higher accuracy can be reached by a combination of different diagnostic tools. 45 All in all, proteomic analysis of bile as a diagnostic tool for surveillance of patients with PSC, alone or in combination with other methods, may provide an early and reliable diagnosis of CC.
In summary, our data indicate a possible role of proteomic analysis of bile to differentiate CC from PSC and benign lesions. Due to the cross-sectional type of the study with histologically and clinically well-defined CC, no information exists as to what timepoint the CC model is able to detect CC in addition to PSC during its evolution in the PSC patient. This important issue must be addressed in a future prospective multicenter clinical trial by inclusion of patients with PSC in whom ERCP is performed regularly to treat strictures and with a clinical follow-up of 1 year for CC diagnosis as requirement for reliability of the longitudinal analysis.