CTx-648

Metagenomic characterization of lysine acetyltransferases in human cancer and their association with clinicopathologic features

Yuanyuan Jiang 1, Xuhui Guo 1 2, Lanxin Liu 1, Shomita Rode 1, Rui Wang 1 3, Hui Liu 1 4, Zeng-Quan Yang 1 5

Abstract
Lysine acetyltransferases (KATs) are a highly diverse group of epigenetic enzymes that play important roles in various cellular processes including transcription, signal transduction, and cellular metabolism. However, our knowledge of the genomic and transcriptomic alterations of KAT genes and their clinical significance in human cancer remains incomplete. We undertook a metagenomic analysis of 37 KATs in more than 10 000 cancer samples across 33 tumor types, focusing on breast cancer. We identified associations among recurrent genetic alteration, gene expression, clinicopathologic features, and patient survival. Loss-of-function analysis was carried out to examine which KAT has important roles in growth and viability of breast cancer cells. We identified that a subset of KAT genes, including NAA10, KAT6A, and CREBBP, have high frequencies of genomic amplification or mutation in a spectrum of human cancers. Importantly, we found that 3 KATs, NAA10, ACAT2, and BRD4, were highly expressed in the aggressive basal-like subtype, and their expression was significantly associated with disease-free survival. Furthermore, we showed that depletion of NAA10 inhibits basal-like breast cancer growth in vitro. Our findings provide a strong foundation for further mechanistic research and for developing therapies that target NAA10 or other KATs in human cancer.

1 INTRODUCTION
The dynamic and reversible acetylation of protein, catalyzed by lysine acetyltransferases (KATs, also known as HATs) and deacetylases, plays critical roles in a wide range of biological processes including transcriptional regulation, DNA damage repair, signal transduction, stem cell self-renewal, cellular metabolism, and biological rhythm. Lysine acetyltransferases are a highly diverse group of enzymes. Under the action of KATs, the acetyl group from acetyl coenzyme A is co- or posttranslationally attached to either the α-amino group of the N-terminus or the ε-amino group of lysine residues of proteins including both histone and nonhistone proteins.

To date, more than 30 KATs and KAT candidates have been identified.2-4 Based on their cellular localization, KATs are classified into types A and B: type A is mainly localized in the nucleus, and type B is localized in the cytoplasm.2, 5 The type A KATs can be further grouped into 5 major families: (i) the GNAT (the GCN5-related acetyltransferase) family that includes KAT2A, KAT2B, and NAT8; (ii) the CBP/p300 family that contains CREBBP and EP300; (iii) the MYST family that includes KAT5, KAT6A, KAT6B, KAT7, and KAT8; (iv) nuclear receptor coactivator family including NCOA1 and NCOA3; and (v) basal transcription factors such as TAF1 and TAF1L.2-4 In general, KATs and histone acetylation are functionally linked with open chromatin and transcriptional activation.4 In addition to histones, cytoskeletal proteins (eg actin and tubulin) are another crucial family targeted by KATs, including KAT2A and ATAT1. Furthermore, N-terminal acetylation, catalyzed by a family of N-terminal acetyltransferases, including NAA10, NAA50, and NAA60, can influence protein properties including folding, oligomerization, stabilization, and intermolecular interactions.6, 7

Alterations of several KATs due to gene mutations, amplifications, deletions, and translocations have been linked directly to devastating human diseases, notably developmental disorders and cancer.1, 4, 5, 8, 9 Mutations in CREBBP and EP300 cause Rubinstein-Taybi syndrome, which is characterized by mental retardation, growth retardation, and a particular dysmorphology.10, 11 Dominant mutations in KAT6A (also known as MOZ and MYST3) cause intellectual disability syndrome.12 The X-linked lethal human genetic disorder Ogden syndrome is caused by NAA10 mutations.13, 14 Previous studies also document the existence of a myriad of alterations of KAT genes in both blood and solid tumors. For example, KAT6A is recurrently rearranged and fused to CREBBP/EP300 and other partner genes in acute myeloid leukemia.15, 16 Recurrent amplification of the KAT6A and KAT6B genes has been identified in various solid tumors, including breast cancer, ovarian cancer, uterine cervix cancer, lung adenocarcinoma, colon and rectal adenocarcinomas, and medulloblastoma.9, 15, 17 Nuclear receptor coactivators, including NCOA1 and NCOA3, are overexpressed in breast, prostate, endometrial, and pancreatic cancers where they promote tumor growth, invasion, metastasis, and chemoresistance.18

The initiation and progression of hematological malignancies and solid tumors have been associated with dysregulation of several KATs. However, our knowledge of the genomic and transcriptomic alterations of KAT genes and the clinical significance of those alterations in human cancer remains incomplete. In the present study, we undertook a metagenomic analysis of KATs in more than 10 000 cancer samples across 33 tumor types. We then focused on human breast cancer, one of the most common cancers, resulting in more than 450 000 deaths each year worldwide. We investigated the associations between recurrent copy number alteration (CNA) and gene expression level of each KAT, clinicopathologic features, and disease-free survival of patients with breast cancer. Furthermore, loss-of-function assays identified which KAT has important roles in cancer cell growth and survival in vitro. Our studies prioritize a subset of KATs for future research focused on understanding the molecular mechanisms and therapeutic potential.

2 MATERIALS AND METHODS
2.1 Genomic and clinical data on TCGA and METABRIC cancer samples
Genetic and expression alteration data from 10 967 tumor samples spanning 33 tumor types in The Cancer Genome Atlas (TCGA) Pan-Cancer studies were obtained from the cBioPortal for Cancer Genomics19-22 (http://www.cbioportal.org). In the cBioPortal, the copy number for each KAT gene was generated by the Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm and categorized as copy number level per gene: −2, possible homozygous deletion; −1, heterozygous deletion; 0, diploid; 1, low-level gain; and 2, high-level amplification. The relative expression of an individual gene and the gene’s expression distribution in a reference population were analyzed in mRNA expression data. The reference population consists of tumors that are diploid for the gene in question. The Z score represents the number of standard deviations the expression of a gene is from the reference population gene expression. Somatic mutation data were obtained by exome sequencing.
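The Z score definition above can be illustrated with a minimal Python sketch. It assumes the reference population is the set of tumors with a GISTIC level of 0 for the gene and that the spread is the sample standard deviation; the function name, the input layout, and the toy values are hypothetical and are not taken from cBioPortal or TCGA.

```python
from statistics import mean, stdev


def expression_z_scores(expression, gistic_calls):
    """Z score of each tumor's expression relative to diploid tumors.

    expression   -- dict: sample id -> expression value for one gene
    gistic_calls -- dict: sample id -> GISTIC level (-2, -1, 0, 1, 2)
    """
    # Reference population: tumors whose GISTIC call for this gene is 0 (diploid).
    reference = [expression[s] for s in expression if gistic_calls.get(s) == 0]
    if len(reference) < 2:
        raise ValueError("need at least 2 diploid samples to estimate a spread")
    ref_mean = mean(reference)
    ref_sd = stdev(reference)  # sample standard deviation (an assumption)
    if ref_sd == 0:
        raise ValueError("diploid reference has zero variance")
    # Number of reference standard deviations each sample lies from the reference mean.
    return {s: (x - ref_mean) / ref_sd for s, x in expression.items()}


# Toy values for illustration only (not real data).
toy_expression = {"T1": 8.0, "T2": 10.0, "T3": 12.0, "T4": 16.0}
toy_calls = {"T1": 0, "T2": 0, "T3": 0, "T4": 2}
toy_z = expression_z_scores(toy_expression, toy_calls)
```

In this toy input the amplified tumor T4 is scored against the three diploid tumors only, so its own high expression does not inflate the reference mean or spread.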

Breast cancer subtype and clinicopathologic information were obtained from a previous publication and extracted through cBioPortal.19, 23 Among the 1084 breast cancer samples, 981 had intrinsic subtype data available, including 36 normal-like, 499 luminal A, 197 luminal B, 78 human epidermal growth factor receptor 2-enriched (HER2+), and 171 basal-like breast cancers.19, 22 A detailed description of the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset can be found in the original publication.24 The CNAs and normalized expression data from the METABRIC database were downloaded with permission from the European Genome-phenome Archive (https://www.ebi.ac.uk/ega) under accession number EGAC00000000005 as well as from the cBioPortal for Cancer Genomics.19 In the METABRIC dataset, 1974 samples had subtype data available, including 199 normal-like, 718 luminal A, 488 luminal B, 240 HER2+, and 329 basal-like breast cancers.24

2.2 Semiquantitative PCR reactions
To assess gene expression at the mRNA level, RNA was prepared from human breast cancer cell lines and the MCF10A cell line by using an RNeasy Plus Mini Kit (Qiagen).21 RNA was mixed with qScript cDNA SuperMix (Quanta Biosciences) and then converted to cDNA through an RT reaction. The cDNA was then used for real-time PCR reactions. Primer sets were obtained from Life Technologies. A PUM1 primer set was used as a control. Semiquantitative RT-PCR was carried out using the FastStart Universal SYBR Green Master (Roche Diagnostics) as described earlier.21, 22

2.3 Cell culture and growth assays
The SUM cell lines were obtained from Dr Stephen P. Ethier, and the remaining cell lines were obtained from ATCC and German Collection of Microorganisms and Cell Cultures. All cell lines were tested routinely and authenticated using cell morphology, proliferation rate, a panel of genetic markers, and contamination checks. To determine the effect of NAA10 overexpression on the growth of human breast cancer in vitro, NAA10 expression was knocked down using siRNA in human breast cancer cell lines. The siRNAs were purchased from Sigma-Aldrich. We used a MISSION siRNA Universal Negative Control for the negative control group. For the transfection procedure, cells were seeded in appropriate cell culture plates and maintained overnight under standard conditions. Plate sizes, cell densities, and siRNA quantities were dependent on the cell line and the experimental setup; siRNA was transfected using the MISSION siRNA transfection reagent according to the manufacturer’s protocol (Sigma-Aldrich). Expression levels of NAA10 mRNA and protein in knockdown cells were measured by quantitative (q)RT-PCR and western blot assays 48 and 72 hours after siRNA transfection, respectively. Five days after siRNA transfection, CellTiter-blue cell viability assays (Promega) were undertaken according to the manufacturer’s guideline.

2.4 Immunoblotting and Abs
Immunoblot assays were carried out as previously described.21, 22 Whole-cell lysates were prepared by scraping cells from the dishes into cold RIPA lysis buffer. After centrifugation, protein content was estimated by the Bradford method. A total of 20-50 μg of total cell lysate was resolved by SDS-PAGE and transferred onto a PVDF membrane. Antibodies used for the western blot in the study included anti-NAA10 (1:1000 GTX125971; GeneTex) and anti-β-actin (1:5000 A5441; Sigma-Aldrich).

2.5 Statistical analysis
Statistical analyses were undertaken using R software (http://www.r-project.org) and GraphPad Prism (version 6.03).21, 22 Statistical significance of the differences in mRNA expression level for each KAT among different subtypes, stages, and grades of breast cancer samples was determined using ANOVA and Welch’s t test as described earlier.21, 22 Spearman and Pearson correlation tests were used to correlate copy numbers and mRNA levels of each KAT from approximately 10 000 TCGA Pan-Cancer samples together, as well as 20 individual cancer types that contain at least 200 samples. We used the “cor” function in R for computation, specifying which type of test we wanted (Spearman or Pearson). Relationships between KAT mRNA expression and disease-free survival in METABRIC breast cancer were analyzed by equally dividing 1980 samples into high and low expression groups for each KAT based on mRNA expression Z scores. Thus, there were 990 samples each with high or low expression for each KAT gene. A similar statistical method was also used for survival analysis of individual subtypes of METABRIC cancer patients.
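The copy number versus mRNA correlation step can be sketched as follows. The study used the "cor" function in R; the Python code below is only an illustrative stand-in that computes the Pearson coefficient directly and the Spearman coefficient as the Pearson coefficient of average ranks (ties share the mean of their ranks). The function names and the toy vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy GISTIC levels and mRNA Z scores for one gene (illustration only).
toy_copy_number = [-1, 0, 0, 1, 2]
toy_mrna_z = [-0.8, -0.1, 0.3, 0.9, 4.0]
toy_r = pearson(toy_copy_number, toy_mrna_z)
toy_rho = spearman(toy_copy_number, toy_mrna_z)
```

Because GISTIC levels take only 5 values, ties are common, which is one reason to report the rank-based coefficient alongside the Pearson coefficient.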

3 RESULTS
3.1 Genomic alterations of KATs across 33 human tumor types
Genetic alterations, including CNA and somatic mutation, are a universal hallmark of cancer.25, 26 We hypothesized that KATs with recurrent genetic alterations and mRNA dysregulation play important roles in different types of human tumor, and hence could serve as diagnostic and therapeutic targets. Based on the current ChromoHub database, the human genome encodes 37 KAT proteins with demonstrated or predicted acetyltransferase activity (Figure S1 and Table S1).2, 3 We first analyzed CNAs and mutations in 37 KAT genes compiled from 10 967 TCGA samples (Pan-Cancer cohort) across 33 divergent tumor types.19, 20 These 33 tumor types include 3 lymphatic and hematologic tumors and 30 solid tumors (Table S2). The copy number for each KAT was generated by the copy number analysis algorithm GISTIC and categorized according to copy number level per gene as high-level amplification, low-level gain, diploid, shallow deletion (possibly a heterozygous deletion), and deep deletion (possibly a homozygous deletion). Somatic mutation data were obtained from exome sequencing.19, 20 We will use the term “alteration” henceforth for gene amplification, deep deletion, and mutations, including missense mutation, gene truncation, and fusion. In the Pan-Cancer cohort, 42% of the tumors contained an alteration in at least 1 of the 37 KAT genes. Among copy number alterations, KAT6A (3.18%) and NAA10 (1.78%) were the most frequently amplified, and ESCO2 (2.68%) and ELP3 (2.64%) were the most frequently deeply deleted (Figure 1A,B). Among mutations, 2 pairs of homologs, CREBBP (4.65%) and EP300 (4.22%), and TAF1 (3.46%) and TAF1L (4.48%), were the most mutated KATs in the TCGA Pan-Cancer cohort (Figure 1C).
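The alteration frequencies reported above follow from simple counting over per-sample calls. The Python sketch below assumes that "alteration" means a GISTIC level of 2 (amplification), a GISTIC level of −2 (deep deletion), or any listed mutation, as defined in the text; the function name, the input layout, and the toy samples are hypothetical, and the inputs are assumed to be restricted to the KAT gene panel.

```python
def alteration_summary(gistic, mutated):
    """Per-gene amplification and deep-deletion percentages, plus the
    percentage of samples with at least one alteration.

    gistic  -- dict: sample id -> dict of gene -> GISTIC level
    mutated -- dict: sample id -> set of mutated genes (may omit samples)
    """
    n = len(gistic)
    if n == 0:
        raise ValueError("no samples supplied")
    genes = sorted({g for calls in gistic.values() for g in calls})
    amplified = {g: 0 for g in genes}
    deep_deleted = {g: 0 for g in genes}
    samples_altered = 0
    for sample, calls in gistic.items():
        altered = bool(mutated.get(sample))  # any mutation counts as an alteration
        for gene, level in calls.items():
            if level == 2:       # high-level amplification
                amplified[gene] += 1
                altered = True
            elif level == -2:    # deep (possibly homozygous) deletion
                deep_deleted[gene] += 1
                altered = True
            # levels -1, 0, and 1 are not counted as alterations
        if altered:
            samples_altered += 1
    to_pct = lambda count: 100.0 * count / n
    return (
        {g: to_pct(c) for g, c in amplified.items()},
        {g: to_pct(c) for g, c in deep_deleted.items()},
        to_pct(samples_altered),
    )


# Toy calls for 4 samples (illustration only, not TCGA data).
toy_gistic = {
    "S1": {"KAT6A": 2, "ESCO2": 0},
    "S2": {"KAT6A": 0, "ESCO2": -2},
    "S3": {"KAT6A": 1, "ESCO2": -1},
    "S4": {"KAT6A": 0, "ESCO2": 0},
}
toy_mutated = {"S4": {"CREBBP"}}
toy_amp, toy_del, toy_any = alteration_summary(toy_gistic, toy_mutated)
```

Sample S3 carries only a low-level gain and a shallow deletion, so it does not contribute to any of the three summaries.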

Figure 1
Heatmap showing the frequencies of lysine acetyltransferase (KAT) (A) amplification (red), (B) deep deletion (blue), and (C) mutations (green) across 33 tumor types from The Cancer Genome Atlas (TCGA). Heatmap was generated using Morpheus software from the Broad Institute (https://software.broadinstitute.org/morpheus/). The 33 TCGA tumor types are: breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), ovarian serous cystadenocarcinoma (OV), uterine carcinosarcoma (UCS), uterine corpus endometrial carcinoma (UCEC), bladder urothelial carcinoma (BLCA), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), lung squamous cell carcinoma (LUSC), adrenocortical carcinoma (ACC), brain lower grade glioma (LGG), cholangiocarcinoma (CHOL), colon adenocarcinoma (COAD), glioblastoma multiforme (GBM), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), mesothelioma (MESO), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), testicular germ cell tumors (TGCT), thyroid carcinoma (THCA), uveal melanoma (UVM), acute myeloid leukemia (LAML), lymphoid neoplasm diffuse large B-cell lymphoma (DLBC), thymoma (THYM).

There is considerable variation in the KAT genes altered across different tumor types. We found that 3 KAT genes were amplified in more than 10% of individual TCGA solid tumors: KAT6A was amplified in 17.86% of uterine carcinosarcoma, BRD4 was amplified in 12.06% of ovarian cancer (OV), and ESCO1 was amplified in 10.38% of pancreatic cancer samples (Figure 1A and Table S3). The highest percentages of deep deletion were ESCO2 in prostate cancer (6.95%) and OV (6.47%) samples (Figure 1B and Table S4). Seven KAT genes (CREBBP, EP300, KAT6A, TAF1, TAF1L, MCM3AP, and CIITA) had mutation frequencies greater than 10% in at least 1 tumor type (Figure 1C and Table S5). Notably, CREBBP was mutated in more than 10% of 4 tumor types: bladder cancer (BLCA, 13.41%), uterine corpus endometrial cancer (UCEC, 12.77%), diffuse large B-cell lymphoma (12.20%), and melanoma (SKCM, 12.05%). The highest frequencies of mutation in TAF1, EP300, and TAF1L were in UCEC (20.12%), BLCA (15.85%), and SKCM (15.45%), respectively. Taken together, among 37 KATs, several KAT genes, including KAT6A, NAA10, CREBBP/EP300, TAF1/TAF1L, ESCO2, and ELP3, had relatively higher frequencies of genetic alterations in a spectrum of human tumors.

3.2 Mutation profiling of CREBBP/EP300 and TAF1/TAF1L
The CBP/p300 (KAT3) subfamily of KATs is composed of 2 homologous enzymes, CREBBP and EP300, both containing a well-conserved acetyltransferase domain and a number of protein interaction domains (eg bromodomain and plant homeodomain) that facilitate binding with over 400 proteins.27-29 CREBBP and EP300 are key transcriptional coactivators that are essential for a multitude of cellular processes.27 Both CREBBP and EP300 were highly mutated (more than 4%) in TCGA Pan-Cancer samples. In 10 436 TCGA samples with mutation data, there are 629 CREBBP mutations, including 451 missense, 148 truncating, 8 in-frame, and 22 fusion mutations. The most mutated site of CREBBP was R1446 (15 samples with missense mutation) which is located at the KAT enzymatic domain (Figure 2). For EP300, there are 556 mutations, including 378 missense, 159 truncating, 5 in-frame, and 14 fusion mutations. The most mutated residue of EP300 was D1399, which also is located at the KAT enzymatic domain (Figure 2).

Figure 2
Mutational spectra of CREBBP, EP300, TAF1, and TAF1L in The Cancer Genome Atlas (TCGA) Pan-Cancers. Images show protein domains and the positions of somatic mutations in CREBBP, EP300, TAF1, and TAF1L in TCGA Pan-Cancers. Red dot, nonsense mutation, frameshift deletion, insertion, or splice; green dot, missense mutation; black dot, in-frame insertion or deletion.

The TAF1 gene encodes the largest subunit of the transcription factor II D complex, which promotes transcriptional initiation and activation.30 The TAF1 homologue TAF1L has 95% amino acid identity with TAF1.31 In TCGA Pan-Cancer, there are 477 TAF1 mutations, including 418 missense, 54 truncating, 4 in-frame, and 1 fusion mutations. R869, which is located at the putative histone acetyltransferase domain of TAF1, was the most frequently mutated residue (9 mutations) (Figure 2). For TAF1L, there are 625 mutations, including 565 missense, 58 truncating, 1 in-frame, and 1 fusion mutations. R658, R1205, and R1252 have mostly missense mutations (Figure 2). Together, 4 KAT genes (CREBBP, EP300, TAF1, and TAF1L) had higher frequencies of somatic mutation, and enriched mutation sites likely have an effect on their KAT activities.

3.3 Copy number and expression profiling of KATs in different subtypes of breast cancer
Breast cancer is the most common cancer and one of the leading causes of cancer death among women. Using gene expression profiling, breast cancer has been classified into 5 intrinsic subtypes with distinct risks and underlying biology; these 5 subtypes are luminal A, luminal B, HER2+, basal-like, and normal-like breast cancers.23, 32 One of the KATs, MAPT, is in the PAM50 (prediction analysis for microarray 50) gene panel for classifying the 5 subtypes of breast cancer. To determine whether the genetic alteration or mRNA expression of other KATs is specific to a breast cancer subtype, we analyzed CNA and mRNA expression independently across 5 subtypes of breast cancer samples.20 The frequencies of high-level amplification, low-level gain, diploid, heterozygous deletions, homozygous deletions, and somatic mutation of 37 KAT genes in 5 TCGA breast cancer subtypes are shown in Table S6. Notably, KAT6A was amplified in 7.62% of luminal A, 9.64% of luminal B, 5.13% of HER2+, and 11.11% of basal subtypes of breast cancer. Additionally, ELP3 and ESCO2 showed the highest frequency of homozygous deletion in luminal B and basal-like subtypes (Table S6). We also noted that CREBBP had the highest frequency (7.02%) of mutation in basal-like breast cancer (Table S6).

Next, we analyzed the correlation between copy number and mRNA level of 37 KATs from TCGA samples (Table S7). In the Pan-Cancer analysis, 36 of 37 KATs had a positive correlation between mRNA and DNA copy number in both Spearman and Pearson tests, with 23 KATs having both Spearman and Pearson coefficients greater than 0.3. Based on the means of the Spearman coefficients of all KATs, we found that BRCA, OV, lung squamous cell carcinoma, and cervical squamous cell carcinoma had relatively higher positive correlation coefficients, whereas thyroid carcinoma and acute myeloid leukemia had lower positive correlation coefficients (Table S7 and Figure S2). These data suggested that CNA could be a major factor in the dysregulation of several KAT mRNAs in human cancer, including breast cancer. We also found that different subtypes of breast cancer had different patterns of expression for each KAT gene. Consistent with previous studies, we found that MAPT was significantly underexpressed, whereas BRD4 was significantly overexpressed in estrogen receptor-negative (ER-), basal-like breast cancer (Table S8). In addition to BRD4, we found that 8 KATs, including NAA10 and ACAT2, were also expressed at significantly higher levels (P < .05, mean Z score difference greater than 0.5) in basal-like compared with luminal subtypes, and 11 genes including KAT6A, KAT6B, KAT7, and NCOA3 showed lower expression in basal-like subtype breast cancer (Figure 3A and Table S8).

Figure 3
Expression levels of 4 lysine acetyltransferases (KATs), NAA10, ACAT2, KAT6A, and KAT8, in breast cancers grouped by subtype, grade, and Nottingham prognostic index (NPI). A, Expression levels of 4 KATs across 5 subtypes of The Cancer Genome Atlas (TCGA) breast cancer samples. B, Expression levels of 4 KATs across 5 subtypes of Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) breast cancer samples. NAA10 and ACAT2 were expressed at significantly higher levels (P < .05) in basal-like compared with luminal subtypes. C, Expression levels of 4 KATs in 3 grades of METABRIC breast cancer samples. D, Expression levels of 4 KATs in METABRIC patients with a poor prognosis (NPI > 3.4) and good prognosis (NPI ≤ 3.4) scores. High expression of NAA10 and ACAT2, but not KAT6A and KAT8, were significantly (P < .001) associated with high grade and poor prognosis of METABRIC breast cancers.

To validate our findings from the TCGA breast cancer dataset regarding KAT genetic alterations, we undertook an independent analysis using the METABRIC dataset, which contains 2509 breast cancer samples with long-term clinical follow-up data.24 We found that KAT6A had the highest frequency (10.81%) of high-level amplification in the METABRIC breast cancer samples (Table S9). Among 37 KAT genes, 3 genes, EP300, TAF1, and NCOA3, had been included in the mutation analysis, with mutation frequencies of 2.75%, 2.07%, and 1.63%, respectively, in METABRIC breast cancer samples.33 For mRNA expression, data for 33 KATs were available (mRNA data of BLOC1S1, CIITA, NAT8B, and NAT8L were not available) in the METABRIC dataset. Again, we found that 3 KATs (NAA10, ACAT2, and DLAT) had significantly higher expression (P < .05, mean Z score difference greater than 0.5), whereas 6 KATs (MAPT, CREBBP, EP300, KAT6B, KAT8, and NAA60) were underexpressed in METABRIC basal-like subtypes compared with luminal subtypes (Figure 3B and Table S10).
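The subtype comparison criterion (a Welch’s t test together with a mean Z score difference greater than 0.5) can be sketched in Python as below. This is a minimal illustration, not the analysis code of the study: the function name and toy Z scores are hypothetical, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed, because the P value requires a t distribution that the standard library does not provide.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (mean difference, Welch t statistic, Welch-Satterthwaite df)."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 observations")
    # Squared standard error of each group mean (unequal variances allowed).
    se2_a = variance(group_a) / na
    se2_b = variance(group_b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both groups are constant; t is undefined")
    diff = mean(group_a) - mean(group_b)
    t = diff / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return diff, t, df


# Toy mRNA Z scores for one gene (illustration only, not METABRIC data).
toy_basal = [1.2, 0.8, 1.5, 1.1]
toy_luminal = [0.1, -0.2, 0.3, 0.0, 0.2]
toy_diff, toy_t, toy_df = welch_t(toy_basal, toy_luminal)

# Effect-size filter used alongside the P value threshold in the text.
toy_passes_effect_size = abs(toy_diff) > 0.5
```

With a statistics package available, the P value could be obtained from a two-sample t test with unequal variances, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.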

3.4 Association of KAT gene expression with clinical features and survival of breast cancer patients
We next investigated the clinical relevance of KAT expressions in the METABRIC cohort, as the METABRIC breast cancers have long-term clinical follow-up data. We first examined expression levels of each KAT gene at different grades of METABRIC breast cancer samples. The means of Z score and P value for each KAT gene across grades 1-3 are shown in Table S11. We found that 3 genes, NAA10, ACAT2, and NCOA3 were significantly highly expressed in higher-grade breast cancers (t test: grade 3 vs 1 + 2; P < .001 and means difference greater than 0.4; Figure 3C and Table S11). The Nottingham prognostic index (NPI), a clinicopathologic classification system based on tumor size, histological grade, and lymph node status that is widely used in Europe for breast cancer prognostication, was also available in the METABRIC cohort.

Thus, we compared expression levels of KATs between patients with high NPI (greater than 3.4) and those with low NPI (3.4 or less). As shown in Figure 3D and Table S11, 3 KATs, NAA10, ACAT2, and NCOA3, were significantly overexpressed, whereas ELP3 and MAPT were underexpressed in samples with higher NPI (P < .001 and mean Z score difference greater than 0.4) compared with samples with lower NPI.35 Next, we analyzed the relationship between KAT mRNA expression and disease-free survival of METABRIC breast cancer patients. We found that higher mRNA levels of 4 KATs (ACAT2, BRD4, NAA10, and NCOA3) and lower expression of 5 KATs (MAPT, ACAT1, ELP3, KAT2A, and KAT8) were significantly associated with shorter disease-free survival in METABRIC breast cancer patients (P < .005) (Figures 4A and S3, and Table S12).35 We also undertook survival analysis of KATs in different subtypes of breast cancer. As shown in Table S12, we found that higher expression of BRD4 was significantly associated with shorter disease-free survival in basal-like and luminal B subtypes, whereas lower expression of MAPT and ACAT1 was significantly associated with shorter disease-free survival in luminal A patients (P < .05). In summary, we found that 3 KATs, NAA10, ACAT2, and BRD4, were highly expressed in the aggressive basal-like subtype, NCOA3 was highly expressed in the luminal B subtype, and several KATs (eg BRD4 and NAA10) were significantly associated with tumor grade and disease-free survival of breast cancers.
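The grouping used for the survival analysis (an equal split of samples into high and low expression groups by mRNA Z score) and a Kaplan-Meier estimate for one group can be sketched in Python as follows. The function names and toy follow-up data are hypothetical, and the sketch covers only the survival curve itself, not the significance test between the two groups.

```python
def split_high_low(z_scores):
    """Split samples into equal halves by Z score: returns (high, low) sets.

    z_scores -- dict: sample id -> mRNA expression Z score
    With an odd number of samples the high group receives the extra one.
    """
    ranked = sorted(z_scores, key=z_scores.get)
    half = len(ranked) // 2
    return set(ranked[half:]), set(ranked[:half])


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times  -- follow-up time per patient
    events -- 1 if relapse or death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# 1980 hypothetical samples split into two groups of 990, as in the text.
toy_scores = {f"sample_{i}": float(i) for i in range(1980)}
toy_high, toy_low = split_high_low(toy_scores)

# Toy follow-up times in months (illustration only).
toy_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

In the toy curve the censored patient at month 8 leaves the risk set without lowering the survival estimate, which is the property that distinguishes the Kaplan-Meier estimate from a simple proportion of relapse-free patients.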

Figure 4
Lysine acetyltransferases (KATs) associated with disease-free survival of breast cancer and contributed to growth and viability of tumor cells in vitro. A, Kaplan-Meier plots of disease-free survival associated with mRNA expression levels of 4 KATs (ACAT2, BRD4, NAA10, and NCOA3) in Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) breast cancers. For the survival analysis, we equally divided 1980 METABRIC samples into high and low expression groups for each KAT based on mRNA expression Z scores. B, Scatter plot showing mean of each KAT dependency score in genome-scale loss-of-function screens of 712 tumor lines.

3.5 Potential role of NAA10 in breast cancer proliferation
The availability of multiple datasets comprising genome-scale RNAi viability screens in hundreds of diverse cancer cell lines presents new opportunities for understanding cancer vulnerabilities.36, 37 To determine how endogenous KATs contribute to growth and viability of tumor cells, we analyzed a genome-wide loss-of-function shRNA screen in 712 tumor lines (https://depmap.org/portal/).36, 38, 39 We ranked average dependency (DEMETER2) scores of all 37 KATs and revealed that 6 of them, notably MCM3AP, NAA10 and BRD4, have average DEMETER2 scores of less than −0.5 (lower score means more dependency on target) (Figure 4B).40 Given that NAA10 was overexpressed in advanced stage, higher grade, and basal-like breast cancer, we undertook a more detailed analysis of NAA10 shRNA screen data in breast cancer cell lines. Among 80 breast cancer lines, 63 lines have DEMETER2 scores of less than −0.5, and among them, 10 lines have scores of less than −1.0 (Table S13).
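The ranking and threshold counts described above reduce to a mean per gene and a count of cell lines below each cutoff. The Python sketch below is an illustration under stated assumptions: the function names, the gene and cell line labels, and the scores are all hypothetical, and real DEMETER2 scores would be loaded from the DepMap portal rather than typed in.

```python
from statistics import mean


def rank_by_mean_dependency(scores_by_gene):
    """Genes sorted from most to least dependent (lowest mean score first).

    scores_by_gene -- dict: gene -> dict of cell line -> dependency score
    """
    means = {gene: mean(lines.values()) for gene, lines in scores_by_gene.items()}
    return sorted(means.items(), key=lambda item: item[1])


def count_below(scores_by_line, thresholds=(-0.5, -1.0)):
    """Number of cell lines whose score is below each threshold."""
    return {
        cutoff: sum(1 for score in scores_by_line.values() if score < cutoff)
        for cutoff in thresholds
    }


# Toy scores (illustration only; lower score means stronger dependency).
toy_scores = {
    "GENE_A": {"line_1": -1.2, "line_2": -0.7, "line_3": -0.2},
    "GENE_B": {"line_1": 0.1, "line_2": -0.1, "line_3": 0.0},
}
toy_ranking = rank_by_mean_dependency(toy_scores)
toy_counts = count_below(toy_scores["GENE_A"])
```

The strict inequality mirrors the wording "scores of less than −0.5" in the text; a line with a score of exactly −0.5 would not be counted.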

Next, we carried out qRT-PCR and western blot assays to measure NAA10 in 22 breast cancer cell lines. Compared with MCF10A, an immortalized but nontumorigenic breast epithelial cell line, NAA10 mRNA was highly expressed in 13 lines, particularly the basal-like subtype lines including SUM149, MDA-MB-468, and SUM1315 (Figure 5A). Expression levels of the NAA10 protein were also higher in basal-like breast cancer lines (Figure 5B). To confirm the results of genome-wide RNAi screens in cancer cell lines, we examined the effects of knocking down NAA10 in SUM149, MDA-MB-468, and SUM1315 basal-like breast cancer lines. We obtained 3 siRNAs targeting different regions of the NAA10 gene. Quantitative RT-PCR and western blot assays revealed that 2 siRNAs significantly decreased the expression of NAA10 at mRNA and protein levels (Figures 5C and S4). As shown in Figure 5C, NAA10 knockdown dramatically slowed SUM149 cell growth to approximately 40% of the growth of the nonsilenced control. NAA10 knockdown also moderately, but statistically significantly, slowed MDA-MB-468 and SUM1315 growth (Figure 5C). Our genomic, clinical, and in vitro cellular data suggest that NAA10 has a potential role in promoting breast cancer growth and progression.

Figure 5
Knockdown of NAA10 inhibits cell proliferation and survival in breast cancer cell lines. A, mRNA expression levels of NAA10, measured by quantitative RT-PCR, in a panel of 21 breast cancer cell lines. mRNA expression levels in the immortalized but nontumorigenic breast epithelial cell line MCF10A were arbitrarily set as 1. Cell lines: black, MCF10A; green, normal-like breast cancer line; blue, luminal breast cancer cell lines; pink, HER2+ breast cancer cell lines; and red, basal-like breast cancer cell lines. B, Immunoblot analysis of NAA10 expression in a panel of breast cancer cell lines as well as the MCF10A line. C, top panel: western blots showing successful knockdown of NAA10 protein with 2 different siRNAs. Bottom panel: bar graph shows relative cell growth after knocking down NAA10 in SUM149, MDA-MB-468, and SUM1315 breast cancer cells (*P < .05, **P < .01). Data are expressed as mean ± SD.

4 DISCUSSION
Here, we describe the genomic and functional analysis of 37 KATs in a large cohort of human cancer primary samples and cell lines. Our Pan-Cancer analysis showed that several KAT genes, including KAT6A, NAA10, CREBBP/EP300, and TAF1/TAF1L have relatively higher frequencies of genetic alterations in various human cancers. Next, we undertook integrated genomic and transcriptomic analyses of KAT genes in breast cancer with different molecular subtypes and clinicopathologic features. We identified that a subset of KAT genes, notably KAT6A and NAA10, are commonly amplified and overexpressed in breast cancer. Different subtypes of breast cancer had different patterns of copy number and expression of each KAT. Importantly, we found that 3 KATs, NAA10, ACAT2, and BRD4, were highly expressed in the aggressive basal-like subtype, and their expression was significantly associated with disease-free survival. Furthermore, we found that knockdown of NAA10 inhibits basal-like breast cancer growth, suggesting NAA10 has oncogenic potential in breast cancer.

Among 37 KATs, the KAT6A gene (located at 8p11.21) had the highest frequency of amplification in both TCGA Pan-Cancer and METABRIC breast cohorts. Notably, more than 5% of breast, uterine, lung squamous, esophageal, and bladder cancers have KAT6A amplification. These data agree with and consolidate prior reports on the genetic alterations and oncogenic potentials of KAT6A in human cancer.17, 41-43 For example, Turner-Ivey et al suggested that KAT6A is a candidate oncogene in luminal breast cancer.41 Furthermore, Yu et al found that KAT6A activates the ERα promoter through its histone acetyltransferase function.44 In this study, we also found that KAT6A was highly amplified and overexpressed in a subset of ER-, basal-like breast cancers. Very recently, Baell and colleagues identified novel competitive KAT6A inhibitors, WM-8014 and WM-1119, which induce cell cycle exit and senescence and are effective in preventing the progression of lymphoma in mice.45 It will be interesting to test KAT6A inhibitors in human cancer models with KAT6A amplification in the future.

CREBBP and EP300 (CBP/p300) are the most mutated KATs in human cancer. It is possible that compensatory mechanisms exist among different KATs in different cell types and functional states. However, previous studies on KAT-null and heterozygous mice have revealed highly specific functions of individual enzymes in development, physiology, and disease.5, 46 Studies using CBP/p300 conditional knockout mice reveal a distinct role for CREBBP and EP300 in defined cell lineages.47 Many studies have indicated that CBP/p300 function as tumor suppressors.27 Furthermore, CBP/p300 might contribute to DNA repair through histone acetylation and facilitate the recruitment of DNA repair factors to the site of damage.48 Our findings, together with those of others, indicate that the KAT domains of CREBBP and EP300 are hotspots for mutation in human cancer, suggesting critical roles of the KAT function of CREBBP/EP300 in regulating tumor growth and progression.27

Genome-scale shRNA viability screens in human cancer cell lines revealed that, among KATs, MCM3AP, NAA10, and BRD4 have the strongest preferential dependency (Figure 4B). The MCM3AP gene is mutated in 2.36% of TCGA samples, with the highest frequency of mutation (10.06%) in UCEC. The majority of MCM3AP mutations in UCEC tumors were missense. MCM3AP acetylates MCM3, a highly conserved minichromosome maintenance protein that is involved in the initiation of genome replication.49 Interestingly, 1 UCEC line, called EN, which carries an MCM3AP mutation, has the lowest DEMETER2 score in genome-scale shRNA viability screens of human cancer cell lines (https://depmap.org/portal/). Therefore, it is worth investigating in the future the functional roles of MCM3AP mutation in human cancer, particularly UCEC, and whether MCM3AP is a novel therapeutic target for UCEC. The functional roles of BRD4 in cancer have been intensively studied; BRD4 plays critical roles in superenhancer organization and the regulation of oncogene expression.50 It was recently reported that BRD4 has intrinsic acetyltransferase activity, although the significance of this activity for BRD4 function remains to be further investigated.51 Nevertheless, BRD4 is currently regarded as one of the most promising new drug targets for both hematological and solid tumors.50, 52
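The per-tumor-type mutation frequencies quoted above for MCM3AP (2.36% across TCGA samples, 10.06% in UCEC) are simple proportions of mutated samples within each group. The following is a minimal illustrative sketch of that calculation, not the pipeline used in this study; the function name, the record layout, and the toy records are all assumptions made for illustration and do not reproduce TCGA data.

```python
from collections import defaultdict


def mutation_frequency_by_type(records):
    """Return {tumor_type: percentage of samples carrying a mutation}.

    `records` is an iterable of (sample_id, tumor_type, is_mutated) tuples.
    This layout is a hypothetical simplification of a per-sample mutation table.
    """
    totals = defaultdict(int)   # number of samples seen per tumor type
    mutated = defaultdict(int)  # number of mutated samples per tumor type
    for _sample_id, tumor_type, is_mutated in records:
        totals[tumor_type] += 1
        if is_mutated:
            mutated[tumor_type] += 1
    return {t: 100.0 * mutated[t] / totals[t] for t in totals}


# Hypothetical toy records, NOT data from this study.
toy_records = [
    ("S1", "UCEC", True),
    ("S2", "UCEC", False),
    ("S3", "BRCA", False),
    ("S4", "BRCA", False),
    ("S5", "UCEC", False),
    ("S6", "UCEC", False),
]

freqs = mutation_frequency_by_type(toy_records)
# Tumor type with the highest mutation frequency in the toy data.
top_type = max(freqs, key=freqs.get)
```

Applied to a real per-sample mutation table, the same grouping identifies the tumor type in which a given KAT is most frequently mutated.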

N-terminal acetylation modulates protein-protein interactions, protein subcellular targeting, folding, stabilization, and degradation.7 NAA10 is the catalytic subunit of the N-terminal acetyltransferase NatA complex, which includes NAA10, NAA15, and NAA50.7 N-terminal acetylation activity by NAA10 was found to relate strongly to human genetic disease, notably Ogden syndrome.13, 14 Studies revealed that Naa10-null mice display partial embryonic lethality, growth retardation, brain disorders, and maternal effect lethality.53 In addition, NAA10 can act independently to posttranslationally acetylate a distinct set of substrates, including histone and actin.7 Previous studies also showed that NAA10 regulates cell proliferation and cancer formation, DNA damage response, hypoxia, bone formation, and neuronal development in enzymatic activity-dependent and -independent manners.54 Our functional study revealed that NAA10 is critical for cancer cell growth and survival, particularly in aggressive, basal-like breast cancer. In the future, it will be important to investigate NAA10 function in depth in different subtypes of breast cancer by overexpressing NAA10 in model cells.

In summary, we undertook a large-scale genomic analysis of 37 KATs in human cancer, focusing on breast cancer. We found that KAT6A and NAA10 were the most commonly amplified, and CREBBP/EP300 and TAF1/TAF1L the most often mutated, in a spectrum of human cancers. Integrated genomic, transcriptomic, and clinicopathologic data in breast cancers revealed that several KATs, including NAA10, ACAT2, and BRD4, were significantly associated with aggressiveness and poor prognosis of patients. Loss-of-function analysis revealed that NAA10 had important roles in promoting cancer cell growth and survival. Our findings provide a strong foundation for further mechanistic research and for developing therapies that target NAA10 or other KATs in human cancer. Given that many cellular factors can be regulated by KATs, and that KATs act in a highly context-specific and cell type-dependent manner, care is required when choosing KAT inhibitors against a specific type of cancer.

ACKNOWLEDGMENTS
This work was partially supported by grants from the Department of Defense (DoD) Breast Cancer Program BC161536, DoD Prostate Cancer Program PC130259, and the DMC Foundation to Dr Zeng-Quan Yang. We thank Dr Stephen P. Ethier for providing the SUM breast cancer cell lines. We thank Qianhui Huang for technical contributions, and Madison Bonahoom for proofreading.