Electrostatic Modifications of the Human Leukocyte Antigen-DR P9 Peptide-Binding Pocket and Susceptibility to Primary Sclerosing Cholangitis

The strongest genetic risk factors for primary sclerosing cholangitis (PSC) are found in the human leukocyte antigen (HLA) complex at chromosome 6p21. Genes in the HLA class II region encode molecules that present antigen to T lymphocytes. Polymorphisms in these genes are associated with most autoimmune diseases, most likely because they contribute to the specificity of immune responses. The aim of this study was to analyze the structure and electrostatic properties of the peptide-binding groove of HLA-DR in relation to PSC. To this end, four-digit resolution HLA-DRB1 genotyping was performed in 356 PSC patients and 366 healthy controls. Sequence information was used to assign the amino acids encoded at each polymorphic position. In stepwise logistic regressions, variations at residues 37 and 86 were independently associated with PSC (P = 1.2 × 10−32 and P = 1.8 × 10−22 in single-residue models, respectively). Three-dimensional modeling was performed to explore the effect of these key residues on the HLA-DR molecule. This analysis indicated that residue 37 was a major determinant of the electrostatic properties of pocket P9 of the peptide-binding groove. Asparagine at residue 37, which was associated with PSC, induced a positive charge in pocket P9. Tyrosine, which protected against PSC, induced a negative charge in this pocket. Consistent with the statistical observations, variation at residue 86 also indirectly influenced the electrostatic properties of this pocket. DRB1*13:01, which was PSC-associated, had a positive P9 pocket, and DRB1*13:02, which was protective against PSC, had a negative P9 pocket. Conclusion: The results suggest that in patients with PSC, residues 37 and 86 of the HLA-DRβ chain critically influence the electrostatic properties of pocket P9 and thereby the range of peptides presented. (Hepatology 2011;53:1967-1976)

As for most HLA-associated diseases, a multitude of HLA class I and class II gene associations have been reported in PSC, most consistently for alleles that are components of the extended ancestral haplotypes AH8.1 (i.e., HLA-B*08-DRB1*03 [serological DR3]) and AH7.1 (i.e., HLA-B*07-DRB1*15 [serological DR2]), along with various less conserved HLA class II haplotypes, namely, DRB1*13:01, DRB1*04, and DRB1*07. 3-6 Genome-wide association studies 2,7 have found strong associations near HLA-C, HLA-B, and MICA, suggesting a role for these loci in modifying PSC risk. The mechanism could involve an effect of alleles carried by the AH8.1 and AH7.1 haplotypes on the activation level of natural killer cells and T cells. 8-11 However, HLA class II haplotypes appear to have a significant influence on PSC in addition to the effect of HLA class I. 7 The class II genes encode heterodimers consisting of an α and a β chain (e.g., the HLA-DR molecule is encoded by HLA-DRA and HLA-DRB1), which present peptides to CD4-positive T cells. The sequences encoded by the second exon of class II genes determine the properties of the peptide-binding groove. In several autoimmune diseases, HLA class II associations have been attributed to particular amino acids in the molecule that critically determine the binding of disease-specific antigen(s). One example is the protective effect in type 1 diabetes of HLA-DQβ1 chains with aspartic acid at residue 57, 12 which confers distinct characteristics on the peptide-binding groove of the HLA-DQ molecule. 13 Determining the structural and electrostatic properties of disease-associated molecules may help identify the disease mechanism. In primary biliary cirrhosis and autoimmune hepatitis, specific residues have been suggested to explain associations with HLA-DRB1 alleles. 14,15 In PSC, an association with leucine at residue 38 of the HLA-DRβ chain was proposed by Farrant et al., 16 whereas a later study considered residues 55 and 87 of the HLA-DQβ chain as more likely candidates. 3 A consistent peptide-binding motif for the class II molecules associated with PSC has not been defined, and no attempts have been made to model how specific amino acids affect the structure and the electrostatic properties of the peptide-binding groove.
The portal inflammation in PSC livers is dominated by T cells, which seem to exhibit a restricted T-cell receptor repertoire. 17 It would therefore be important to identify characteristics of the HLA molecules that determine the specificity of these T-cell responses. Strong linkage disequilibrium (LD) in the HLA class II region makes it difficult to determine at the genetic level which loci are most relevant. However, several observations suggest that HLA-DRB1 could be the determinant of PSC risk: (1) the HLA-DQA1 and DQB1 alleles encoded on the AH8.1 haplotype are associated with PSC only on this haplotype and not when encoded on other haplotypes; 4 (2) the protective DRB1*04 haplotypes may carry different DQB1 alleles; 4 and (3) a recent study in African-Americans confirms the association with DR13, 18 which in Northern Europe is carried on the DRB1*13:01-DQB1*06:03 haplotype, 16 whereas in African-Americans both DRB1*13:01-DQB1*06:03 and DRB1*13:01-DQB1*05:02 are common haplotypes. 19 The HLA-DRB1 association is also more consistent than the association with the closely related (paralogous) HLA-DRB3 gene; e.g., PSC-associated HLA-DRB1*13:01 haplotypes may carry either the HLA-DRB3*01:01 or the DRB3*02:02 allele. 4 Given this background, we aimed to explore how HLA-DRB1 variation affects the molecular characteristics of HLA-DR and susceptibility to PSC.

Materials and Methods
Subjects. Scandinavian PSC patients (n = 356, Table 1) were recruited from Oslo University Hospital, Rikshospitalet, Oslo, Norway, and Karolinska University Hospital, Huddinge, Stockholm, Sweden. Diagnosis of PSC was based on accepted criteria, including a typical cholangiographic appearance. Ethnically and gender-matched healthy controls (n = 366) were randomly selected from the Norwegian Bone Marrow Registry. All participants gave informed consent. The study was approved by the Regional Committee for Research Ethics in South-Eastern Norway and the Ethics Committee of Karolinska Institutet.
HLA-DRB1 Data. Four-digit HLA-DRB1 genotypes were available from a previous study. 20 Peptide sequences of all HLA-DRB1 alleles in IMGT/HLA database release 2.23 (October 2008) were aligned, and each individual was assigned two amino acids (one encoded by each chromosome) for each polymorphic residue.
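The encoding step can be illustrated with a minimal Python sketch. It assumes that the IMGT/HLA alignment has already been reduced to a lookup from allele to amino acid at each polymorphic residue. The table `RESIDUE_37` is a hypothetical excerpt that lists only alleles whose residue-37 amino acid is stated in this article. The function names are illustrative and are not taken from the authors' analysis. The sketch produces both the per-amino-acid counts used in the "allele dosage" model and the unordered amino acid pair used in the "genotype" model. It then pools pairs seen fewer than two times in cases or controls, following the grouping rule given under Statistical Methods.

```python
from collections import Counter

# Hypothetical excerpt of an aligned HLA-DRB1 table: allele -> one-letter amino
# acid at beta-chain residue 37. Only alleles whose residue-37 amino acid is
# stated in the article are listed; a real table would cover every residue.
RESIDUE_37 = {
    "03:01": "N", "09:01": "N", "13:01": "N", "13:02": "N", "14:02": "N",
    "04:01": "Y", "10:01": "Y", "11:01": "Y", "03:25": "Y",
}


def dosage(genotype, table):
    """Count copies (0, 1 or 2) of each amino acid carried by one individual."""
    return Counter(table[allele] for allele in genotype)


def genotype_label(genotype, table):
    """Unordered amino acid pair, e.g. 'N/Y', used by the 'genotype' model."""
    return "/".join(sorted(table[allele] for allele in genotype))


def pool_rare(labels_cases, labels_controls, min_count=2, pooled="other"):
    """Merge pairs observed fewer than min_count times in cases or in controls."""
    n_cases, n_controls = Counter(labels_cases), Counter(labels_controls)
    rare = {lab for lab in set(n_cases) | set(n_controls)
            if n_cases[lab] < min_count or n_controls[lab] < min_count}

    def relabel(lab):
        return pooled if lab in rare else lab

    return [relabel(l) for l in labels_cases], [relabel(l) for l in labels_controls]
```

For example, a DRB1*03:01/*04:01 heterozygote gets one copy each of Asn and Tyr at residue 37 (`dosage(("03:01", "04:01"), RESIDUE_37)`), and its "genotype" label is `"N/Y"`.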
Statistical Methods. Stepwise logistic regressions were performed in the statistical package R v2.10.0 (http://www.r-project.org/) assuming an "allele dosage" model, in which the count of each amino acid at a given residue was entered as a covariate. To check the validity of this model, a second model was applied in which all observed combinations of amino acids ("genotypes") at a given residue were entered as covariates. Some combinations of amino acids were rare. After testing several criteria, combinations with a frequency of n < 2 in cases or controls at a given residue were grouped to avoid empty cells. In both models the reference category was chosen at random, so no assumptions were made about which amino acid or pair of amino acids constituted high or low risk. Comparisons of allele and carrier frequencies were performed in Microsoft Excel (Redmond, WA) and PASW v. 18 (SPSS, Chicago, IL). P < 0.05 was considered statistically significant. P-values of novel HLA-DRB1 allele associations were Bonferroni corrected according to the number of alleles present in the dataset (n = 32).
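The authors ran these regressions in R. As an illustration only, the following pure-Python sketch fits a logistic regression by Newton-Raphson and computes the likelihood ratio statistic comparing a model with amino acid dosage covariates against an intercept-only model. It is not the authors' code, and the function names are hypothetical. The counts of all amino acids at a residue sum to 2 in every individual, so one amino acid must be left out as the reference category; otherwise the covariates are collinear with the intercept. The Bonferroni correction described above simply multiplies a P-value by the number of alleles (32), capped at 1.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(rows, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    rows: covariate vectors (an intercept column is prepended here);
    y: 1 for a PSC case, 0 for a control. Returns (coefficients, log-likelihood).
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                       # score vector X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]   # information matrix X^T W X
        for xi, yi in zip(X, y):
            mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        ll += math.log(mu) if yi else math.log(1.0 - mu)
    return beta, ll


def lrt_statistic(rows, y):
    """Likelihood ratio statistic: dosage model versus intercept-only model."""
    _, ll_full = fit_logistic(rows, y)
    _, ll_null = fit_logistic([[] for _ in y], y)
    return 2.0 * (ll_full - ll_null)
```

With a single binary covariate, the fitted probabilities equal the observed case fractions in each group, which makes a convenient sanity check. The statistic is then referred to a chi-square distribution with as many degrees of freedom as there are non-reference amino acids.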
3D Protein Structure Modeling of HLA-DR Molecules. The atomic coordinates of the most common HLA-DR molecules were predicted by comparative protein structure modeling by satisfaction of spatial restraints, as implemented in the MODELLER program. 21 HLA-DR proteins of known structure suitable as modeling templates were identified in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) and evaluated for structural quality. Seven structures were selected as templates (PDB entries: 1KLU, 2G9H, 1D5Z, 1D5M, 2Q6W, 1PYW, and 2IPK). The amino acid sequences of the target HLA-DR molecules were obtained from the IMGT/HLA database. Multiple sequence alignments were performed with CLUSTAL_X v.1.83 22 and manually corrected where necessary. The alignment files were then used as input to MODELLER. In brief, MODELLER generates the 3D atomic coordinates of the target sequences by satisfying spatial restraints obtained from the templates, together with CHARMM 23 energy terms enforcing proper stereochemistry. Optimization is then carried out using conjugate gradients and molecular dynamics with simulated annealing. 24 All calculations were performed in the absence of antigenic peptides to enable direct comparison of the structural and physicochemical characteristics of the peptide-binding groove among different molecules. The stereochemical quality of the modeled structures was verified using the PROCHECK 25 and WHAT_CHECK 26 programs and by assessment of Ramachandran plots. In addition, the structures were examined for protein folding quality using empirical energy potentials as implemented in ProSA. 27 Modeled coordinate sets are available upon request.
Electrostatic Potential Calculations. The electrostatic potential around the 3D structures was computed by numerically solving the Poisson-Boltzmann equation with the finite difference method implemented in the DelPhi program within Discovery Studio 2.1 (Accelrys, San Diego, CA). Essential hydrogens were added to the structures. To determine the protonation state of titratable amino acid side chains, titration curves and residue pKa values were calculated for each molecule (dielectric constant of 10 for the protein interior and 80 for the solvent), and titratable residues were protonated accordingly at pH 7.4. The protonated protein molecule was subsequently used to compute the electrostatic potential. The low-dielectric protein interior (dielectric constant of 2) was embedded in a high-dielectric continuum (water exterior, dielectric constant of 80). A solution with charged ions was simulated with an assigned ionic strength of 0.145, typical of physiological conditions. The dielectric boundary between the protein and the solvent was defined by the solvent-accessible surface generated by a rolling probe sphere of 1.4 Å radius. Atomic radii and partial atomic charges were taken from the CHARMM parameter set. 23 An ion exclusion layer (Stern layer) for the solvent ions was defined around the solvent-accessible surface using an ionic radius of 2 Å. This layer has an ionic strength of 0.0 and determines how closely an ion can approach the solvent-accessible surface. The system was mapped onto a 3D cubic grid, and the electrostatic potential at each grid point was calculated iteratively, starting from Debye-Hückel boundary conditions. The accuracy of the calculations was improved by grid focusing: in the first run the coarse grid was filled to 50% by the solute, and the calculated grid point potentials were used in the second run, where the fine grid was filled by
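A short Python sketch can check three quantities implied by these settings:

- the Debye screening length for the stated ionic strength (read here as 0.145 mol/L, an assumed unit) and a solvent dielectric constant of 80;
- the Henderson-Hasselbalch fraction of a titratable group that is protonated at pH 7.4;
- the size of the kT/e unit used for the potential maps, expressed in millivolts.

The temperature of 298.15 K is an assumption because none is stated. Treating a group as protonated when its fraction exceeds 0.5 (pKa > pH) is a simplification and not necessarily how Discovery Studio assigns protonation states.

```python
import math

# Physical constants (CODATA 2018 exact/recommended values, SI units).
E_CHARGE = 1.602176634e-19      # elementary charge, C
K_B = 1.380649e-23              # Boltzmann constant, J/K
N_A = 6.02214076e23             # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12        # vacuum permittivity, F/m


def debye_length_angstrom(ionic_strength_molar, eps_r=80.0, temp_k=298.15):
    """Debye screening length 1/kappa, with kappa^2 = 2 N_A e^2 I / (eps0 eps_r kB T)."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE**2 * ionic_strength_si
                / (EPS_0 * eps_r * K_B * temp_k))
    return 1e10 / math.sqrt(kappa_sq)  # metres -> angstroms


def fraction_protonated(pka, ph=7.4):
    """Henderson-Hasselbalch fraction of a titratable group carrying its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def kt_over_e_mv(temp_k=298.15):
    """Size of one kT/e potential unit in millivolts."""
    return 1000.0 * K_B * temp_k / E_CHARGE
```

Under these assumptions the screening length comes out at roughly 8 Å, and one kT/e corresponds to about 25.7 mV.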

Results
Statistical Modeling: Identification of Positions 37 and 86 in the DRβ1 Chain as PSC-Associated Residues. The amino acid sequence encoded by exon two of HLA-DRB1 was determined from the genotypes of each individual. Thirty residues were polymorphic, i.e., two or more different amino acids were observed at these positions. In the first step, a logistic regression was performed for each polymorphic residue. The counts (0, 1, 2) of the observed amino acids were included as covariates, and the overall effect of the residue was tested with a likelihood ratio test. The strongest PSC association was detected for residue 37 (P = 1.2 × 10−32, Table 2). In a second step, two-residue models containing the amino acid covariates of both the investigated residue and residue 37 were fitted and compared with the single-residue model of residue 37. The only residue that remained strongly associated with PSC in these two-residue models was 86 (Table 2). When a similar two-residue test was performed for additional effects on top of residue 86, several residues (including residue 37) were found to contribute significantly (Table 2). No other residues showed significant disease association when included in three-residue models with residues 37 and 86 (Table 2).
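One forward step of this procedure can be sketched in Python. The sketch assumes that the log-likelihoods of the current model and of each enlarged model have already been obtained, for instance from a logistic fit. The likelihood ratio statistic 2(ℓ₁ − ℓ₀) is referred to a chi-square distribution whose degrees of freedom equal the number of added amino acid covariates. The survival function below uses the closed forms that exist for integer degrees of freedom. The function names are hypothetical, and any log-likelihood values passed in are placeholders rather than results from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: Q(n, h) = exp(-h) * sum_{k<n} h^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: Q(n + 1/2, h) = erfc(sqrt(h)) + exp(-h) * sum_{k<n} h^(k+1/2) / Gamma(k+3/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for k in range((df - 1) // 2):
        total += math.exp(-h) * term
        term *= h / (k + 1.5)
    return total


def best_addition(base_loglik, candidates):
    """One forward step: likelihood ratio test of each candidate residue.

    candidates maps residue -> (log-likelihood of the enlarged model, extra df).
    Returns (residue, statistic, p-value) for the smallest p-value.
    """
    results = []
    for residue, (loglik, extra_df) in candidates.items():
        stat = 2.0 * (loglik - base_loglik)
        results.append((residue, stat, chi2_sf(stat, extra_df)))
    return min(results, key=lambda r: r[2])
```

In practice, the enlarged model would be refitted for each candidate residue, `best_addition` would pick the strongest one, and the procedure would stop once the best P-value no longer passed the chosen threshold.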
In the logistic models used above, the effect of a single amino acid was assumed to be additive on the log scale: the log-odds ratio of having PSC given two copies of the amino acid is twice the log-odds ratio given one copy. The advantage of this model is that it keeps the number of covariates to a minimum, leading to more powerful tests as long as the model assumptions are approximately true. To confirm the results obtained with this model, we also performed regressions in which each observed combination ("genotype") of amino acids was allowed to have its own effect. In these "genotype" model analyses, residue 37 remained the most significantly PSC-associated residue (P = 6.9 × 10−32, Table 3), and an independent contribution from residue 86 was still observed (P = 1.2 × 10−5). Several other residues contributed on top of residues 37 or 86 in two-residue models, as well as in three-residue models with both residues 37 and 86 included (Table 3). Inspection of the distribution of amino acid combinations ("genotypes") in the dataset showed that the additional associated residues 26, 70, 71, 73, 74, and 77 (Table 3) reflected a large number of patients homozygous for HLA-DRB1*03:01 (n = 62 patients versus n = 3 healthy controls). It was therefore not possible to determine which part of HLA-DRB1*03:01 confers this additional risk.
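The two model forms can be written out explicitly. The notation is introduced here for illustration: $d_a$ is the number of copies (0, 1, or 2) of amino acid $a$ carried at the residue, $a_{\mathrm{ref}}$ and $g_{\mathrm{ref}}$ are the reference categories, $G$ is the unordered amino acid pair ("genotype"), and $m$ is the number of distinct amino acids observed at the residue.

```latex
\begin{aligned}
\text{Dosage model:}\quad & \operatorname{logit}\Pr(\text{PSC}) = \beta_0 + \sum_{a \neq a_{\mathrm{ref}}} \beta_a d_a, \qquad d_a \in \{0, 1, 2\},\\
& \mathrm{OR}(d_a = 2 \text{ vs. } 0) = e^{2\beta_a} = \bigl(e^{\beta_a}\bigr)^2 = \mathrm{OR}(d_a = 1 \text{ vs. } 0)^2,\\
\text{Genotype model:}\quad & \operatorname{logit}\Pr(\text{PSC}) = \gamma_0 + \sum_{g \neq g_{\mathrm{ref}}} \gamma_g \,\mathbf{1}[G = g],\\
& \text{number of covariates} \le \tfrac{m(m+1)}{2} - 1 \quad \text{(dosage model: } m - 1\text{)}.
\end{aligned}
```

For the dimorphic residue 86 (Gly/Val), the dosage model has one covariate. The genotype model has three possible pairs (Gly/Gly, Gly/Val, Val/Val) and therefore two covariates. The corresponding likelihood ratio tests thus have 1 and 2 degrees of freedom, respectively, before any pooling of rare pairs. The odds ratios in the dosage model are relative to carrying the reference amino acid.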
In conclusion, residues 37 and 86 were consistent determinants of PSC susceptibility irrespective of statistical model, whereas additional risk associated with other parts of the β chain encoded by HLA-DRB1*03:01 could not be excluded.
The specificity of the peptide-binding groove on an HLA class II molecule is governed by the properties of pockets in the groove that accommodate the amino acid side chains of the bound peptide, typically pockets for peptide residues 1 (pocket P1), 4, 6, and 9. Residue 37 of the HLA-DRβ1 chain is integral to pocket P9. 28 Figure 1 shows the structural and electrostatic characteristics of pocket P9 on representative HLA-DR molecules. Notably, HLA-DR molecules carrying the risk residue Asn37 in the β chain (e.g., HLA-DRB1*03:01, *09:01, *13:01, *14:02; Fig. 1B) formed P9 pockets with similar structural architecture and consistently positive surface electrostatic potential (the only exception was HLA-DRB1*13:02, discussed further below). In contrast, HLA-DR molecules expressing the protective Tyr37 residue in the β chain (e.g., HLA-DRB1*04:01, *10:01, *11:01, *03:25; Fig. 1C) formed P9 pockets with consistently negative electrostatic potential. These distinct P9 electrostatic patterns were conserved both among molecules that differed at several amino acid positions and between structures in which residue 37 was the only difference (e.g., HLA-DRB1*03:01 and -DRB1*03:25). Interestingly, a database search for peptides eluted from HLA-DR molecules showed that the presence of Asn37 restricted the amino acid preferences at position 9. For example, only tyrosine, leucine, and phenylalanine are defined as P9 anchors in HLA-DRB1*03:01, whereas most amino acids may be P9 anchors in HLA-DRB1*04:01, which carries Tyr37 (www.syfpeithi.de). 29
Residue 86 Defines Opposite Effects of HLA-DRB1*13:01 and *13:02 on PSC Risk. At the dimorphic residue 86, carriers of valine (Val86) had increased risk (OR = 4.8, 95% CI 2.9-7.9), whereas glycine (Gly86) appeared protective (OR = 0.25, 95% CI 0.18-0.34). Residue 86 of the HLA-DRβ1 chain is integral to pocket P1. 28 In contrast to pocket P9, modeling of the P1 pocket of several HLA-DR molecules showed that the glycine/valine dimorphism at residue 86 had a minimal physicochemical effect. The majority of HLA-DR molecules examined had P1 pockets with an overall neutral charge (Fig. 2). Although a steric effect (i.e., an effect on pocket volume) imposed by the side chain of Val86 cannot be excluded, the present analysis argues against a significant role of residue 86 in the choice of peptide residue accommodated by pocket P1.
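The article reports carrier odds ratios with 95% confidence intervals but does not state how the intervals were computed. One common choice is Woolf's log-scale method, sketched below in Python for a 2×2 carrier table. Both the method and the cell layout are assumptions, and the sketch uses no counts from this study.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959963984540054):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: carrier cases, b: non-carrier cases,
    c: carrier controls, d: non-carrier controls.
    z defaults to the two-sided 95% normal quantile.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("the Woolf interval needs all four cells > 0")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)
```

Because the interval is symmetric on the log scale, a table with OR = 1 gives limits whose product is 1. Zero cells would require a continuity correction or an exact method instead.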
Further analysis, however, led to an interesting observation. As mentioned above, HLA-DR molecules expressing the risk residue Asn37 in their β chain possess electropositive P9 pockets, with the exception of HLA-DRB1*13:02, in which an electronegative P9 pocket was observed (Fig. 3). Notably, HLA-DRB1*13:02, as opposed to other Asn37-encoding alleles (such as the established PSC risk allele HLA-DRB1*13:01), was more frequent in healthy controls than in PSC patients (P corrected = 0.040, Table 5), suggesting that HLA-DRB1*13:02 may protect against PSC. This statistical observation agrees with the protective effect associated with HLA-DR molecules expressing electronegative P9 pockets, as shown above for Tyr37-encoding alleles. Intriguingly, HLA-DRB1*13:02 and DRB1*13:01 otherwise have similar overall structural architecture and electrostatic properties (Fig. 3), with the main disparity observed at pocket P9. Because the only amino acid sequence difference between these alleles is at position 86, the Gly86Val substitution may influence the choice of presented peptides through long-range electrostatic modification of pocket P9.
Fig. 1 (legend, partially recovered): The area within the frame is depicted in expanded form in (B,C). All structures were superimposed on HLA-DRB1*03:01 and therefore show the same view. HLA-DR molecules carrying the risk residue Asn37 in the β chain had P9 pockets (arrows) with positive charge (B), whereas molecules expressing Tyr37 had P9 pockets (arrows) with consistently negative charge (C). Potentials less than −5 kT/e are colored red, those greater than 5 kT/e blue, and neutral potentials (0 kT/e) white. Linear interpolation was used to produce the color for surface potentials between these values.
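The color scale described in the Figure 1 legend can be expressed as a small Python function: red at or below −5 kT/e, white at 0 kT/e, blue at or above +5 kT/e, and linear interpolation in between. Pure RGB red, white, and blue endpoints are an assumption, because the exact shades are not given. The function name is hypothetical. For scale, 1 kT/e corresponds to about 25.7 mV at 25 °C.

```python
def potential_to_rgb(phi, limit=5.0):
    """Map a surface potential (in kT/e) to an (r, g, b) tuple in [0, 1].

    Red at phi <= -limit, white at 0, blue at phi >= +limit, linear in between.
    """
    t = max(-1.0, min(1.0, phi / limit))  # clamp to [-1, 1]
    if t < 0:
        # Interpolate white (1, 1, 1) -> red (1, 0, 0).
        return (1.0, 1.0 + t, 1.0 + t)
    # Interpolate white (1, 1, 1) -> blue (0, 0, 1).
    return (1.0 - t, 1.0 - t, 1.0)
```

Clamping before interpolation means that all potentials beyond ±5 kT/e saturate to the end colors, matching the legend's description.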

Discussion
By exploring variation in the amino acid sequence of the HLA-DRβ1 chain in PSC, we show that residues 37 and 86 distinguish disease susceptibility alleles from protective alleles. Investigation of the HLA-DR molecular structure revealed that the electrostatic properties of pocket P9 are determined by residue 37 and, indirectly, by residue 86, suggesting that the P9 pocket is crucial for PSC risk.
In the HLA-DR molecule, residue 37 of the β chain appeared to be a key determinant of the electrostatic properties of pocket P9, which may be related to disease risk. The situation is reminiscent of type 1 diabetes. There, the disease-associated amino acids at residue 57 of the HLA-DQ β chain contribute to a larger volume and a positive charge of pocket P9, allowing, e.g., glutamate residues from insulin peptides at position 9. 30 In HLA-DR, Asn37 would restrict the range of amino acids at anchor position 9 of the peptide, and thereby which peptides may be presented. This is supported by data from peptide elution experiments, in which HLA-DR molecules with Asn37 and Tyr37 exhibit different ranges of amino acids at P9. 29 Direct experimental observations focusing on pocket P9 variation and T-cell responses are scarce. However, modifying residue 37 alone (on DR4 molecules) has been shown to be sufficient to alter recognition by the T-cell receptor, e.g., by neutralizing the T-cell-activating potential of the peptide-DR complex. 31,32 It therefore seems highly likely that characteristics of pocket P9 of the HLA-DR molecule facilitate particular immune responses.
Pocket P1 of HLA-DR was found to have an overall neutral electrostatic potential in the present study, irrespective of whether glycine or valine was present at position 86. This fits with the observation that this pocket prefers hydrophobic amino acid side chains and that the ranges of amino acids at position 1 of presented peptides largely overlap. 33,34 However, pocket P1 with Gly86 in the β chain (e.g., as encoded by HLA-DRB1*13:02) tends to accept larger (aromatic) side chains than pocket P1 with Val86 (e.g., as encoded by DRB1*13:01). This has been attributed to the lack of a side chain on glycine, which allows a larger pocket volume. 33,34 A more remarkable difference between the HLA-DR molecules encoded by HLA-DRB1*13:01 and DRB1*13:02 was that the amino acid substitution at residue 86 affected the electrostatic properties of pocket P9, in another part of the molecule. HLA-DRB1*13:02 was the only allele encoding asparagine at position 37 of the β chain that gave rise to an HLA-DR molecule with a negative P9 pocket. Intriguingly, this allele exhibited a significantly reduced frequency in PSC patients. Taken together, our findings suggest that the association of residues 37 and 86 with PSC primarily reflects the properties of pocket P9.
Although HLA-DRB1*13:01 is a well-established PSC risk allele, 4 this study is the first to identify HLA-DRB1*13:02 as a protective allele. This observation remained significant after correction for multiple comparisons. Interestingly, similar contrasting effects have been observed in autoimmune hepatitis in Latin America, where risk is associated with HLA-DRB1*13:01 and protection with DRB1*13:02. 15 HLA-DRB1*13:01 has also been associated with a protracted course of hepatitis A virus infection, which has been postulated to trigger autoimmune hepatitis. 35 Whether these parallel observations are relevant to the specificity of the immune response in PSC remains speculative.
Given the complexity of the HLA associations in diseases such as PSC, it is not unlikely that alleles other than the most strongly associated ones modify the disease risk. Two previous studies of HLA-DR in PSC evaluated selected residues encoded by disease-associated haplotypes 3,16 and suggested that the presence of leucine at position 38 (Leu38) of the β chain may confer risk. Leu38 is rarely present in DRβ1 (it is most often encoded by DRB1*12 alleles). One explanation for the conflicting results is that the previous studies included alleles not only at the HLA-DRB1 locus but also at other, paralogous, HLA-DRB loci. Several HLA haplotypes carry a second HLA-DRB gene besides HLA-DRB1:
- HLA-DRB1*03:01 and *13:01 haplotypes typically also carry an allele encoded by HLA-DRB3;
- DRB1*04 and *07:01 haplotypes carry an allele encoded by HLA-DRB4;
- the DRB1*15:01 haplotype carries an allele encoded by HLA-DRB5.
These β chains pair with DRα and also have a role in antigen presentation. 36 They are generally expressed at severalfold lower levels than DRβ1. 37,38 However, in diseases where the second DRB gene has been shown to be of actual relevance, the association seems to be specific to the gene in question and not due to shared sequence motifs with DRB1. 39-41 These facts, along with the more consistent PSC associations with HLA-DRB1 than with HLA-DRB3, 4 make it likely that the present focus on DRB1 is valid, even though an effect of other DRB loci cannot be formally ruled out at this stage. Given the LD in the HLA complex, we cannot exclude the possibility that causal variants at other loci are associated with the distribution of amino acids observed at given positions in HLA-DRβ1.
The strong LD is particularly important in relation to the neighboring HLA-DQ genes and HLA-DRB paralogs. It is also difficult to formally exclude an association with the nearby BTNL2 gene, which has been associated with inflammatory bowel disease, 42 or even with genetic variants further away. When a "genotype" model was applied, we could not exclude, in addition to residues 37 and 86, a residual association attributable to homozygosity for HLA-DRB1*03:01. This might reflect effects of a recessive variant outside HLA-DRB1, 4 potentially related to the AH8.1 haplotype, which is associated with multiple autoimmune diseases and probably contains several genetic variants in strong LD contributing to disease. 43
In conclusion, this study shows that variation in PSC-associated residues encoded by HLA-DRB1 imposes distinct structural and physicochemical characteristics on the HLA-DR peptide-binding groove, suggesting that PSC risk molecules likely present a restricted peptide repertoire. These findings are highly relevant to, and should be evaluated in, future experimental studies of antigen presentation in PSC. The amino acid sequence and structural observations did not apply uniformly to all PSC patients, suggesting multiple pathogenetic mechanisms, as might be expected for a disease as clinically heterogeneous as PSC.