Figure 1. Overview of the data analysis steps and methods utilized in this study.

Figure 2. Percentage of peak overlap for the 39 KZNFs present in both data sets: Trono original vs. Hughes (dark blue bars), Hughes vs. Trono original (light blue bars), Trono reprocessed vs. Hughes (dark red bars) and Hughes vs. Trono reprocessed (light red bars). Figure2 (.xlsx).
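
The paired bars in Figure 2 reflect that a percentage overlap depends on which peak set is the denominator, so "A vs. B" and "B vs. A" need not agree. Below is a minimal sketch of such a directional measure. It assumes BED-style half-open (chrom, start, end) peaks and counts a peak as overlapping if it shares at least 1 bp with any peak in the other set. This legend does not state the overlap criterion actually used, and `percent_overlap` and `merge_by_chrom` are hypothetical names.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open like BED (assumption)

def merge_by_chrom(peaks: Sequence[Peak]) -> Dict[str, List[List[int]]]:
    """Group peaks by chromosome and merge overlapping intervals into sorted, disjoint ones."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged: Dict[str, List[List[int]]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start < out[-1][1]:  # overlaps the last merged interval: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged

def percent_overlap(a_peaks: Sequence[Peak], b_peaks: Sequence[Peak]) -> float:
    """Percentage of peaks in a_peaks sharing >= 1 bp with any peak in b_peaks (a_peaks is the denominator)."""
    if not a_peaks:
        return 0.0
    merged = merge_by_chrom(b_peaks)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    hits = 0
    for chrom, start, end in a_peaks:
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged interval starting before this peak ends. Merged intervals are
        # sorted and disjoint, so it has the largest end of all such intervals and
        # checking it alone suffices.
        i = bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            hits += 1
    return 100.0 * hits / len(a_peaks)
```

Merging the reference set first keeps each lookup to one binary search per peak. For genome-scale files, a dedicated tool such as `bedtools intersect -u` would be the usual choice.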

Figure 3. Overview of ERE enrichment in the Hughes and Trono ChIP data. (A) Distribution of Pearson correlations between the Hughes and Trono reprocessed data for matched pairs (the same KZNF in both data sets; 39 pairs; red bars) and unmatched pairs (different KZNFs; 2964 comparisons; blue bars), showing the frequency of KZNF pairs at each correlation value. The arrow marks the correlation above which 82% of the matched pairs and 8% of the unmatched pairs lie. The percentages of peak overlap between Hughes and Trono reprocessed (yellow dots) and between Trono reprocessed and Hughes (green dots) at the corresponding correlations are also shown. (B) Fraction of the top 500 peaks of the overlapping KZNFs enriched in TEs (ERE instances and transposons). In total, 51 individual TE instances were enriched at a fraction > 0.1. O = Trono Original, R = Trono Reprocessed, H = Hughes. Figure3 (.xlsx).
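
The matched/unmatched comparison in panel A can be sketched as follows. This is a minimal sketch that assumes each KZNF is represented by a vector of ERE enrichment values over the same ordered list of TEs in both data sets. That representation, the pairing scheme for unmatched pairs, and all function names are assumptions, not the study's documented procedure. Matched pairs correlate a KZNF with itself across data sets, and unmatched pairs correlate it with other KZNFs.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)

def matched_unmatched(
    h: Dict[str, List[float]], r: Dict[str, List[float]]
) -> Tuple[List[float], List[float]]:
    """Correlate enrichment vectors for the same KZNF (matched) and for different KZNFs (unmatched)."""
    shared = sorted(set(h) & set(r))  # KZNFs profiled in both data sets
    matched = [pearson(h[k], r[k]) for k in shared]
    # Hypothetical pairing: every ordered (Hughes KZNF, Trono KZNF) pair with different names.
    unmatched = [pearson(h[a], r[b]) for a, b in product(shared, shared) if a != b]
    return matched, unmatched

def fraction_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of correlations strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)
```

Applying `fraction_above` to both lists with the same threshold gives the pair of fractions that the arrow in panel A summarizes (82% and 8% in the study).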

Figure 4. Similarity between motifs for the overlapping KZNFs from all sources (ChIP data and in vitro motifs). (A) Similarity between the Hughes and Trono motifs for ZIM3. The heat map on the left shows the MoSBAT similarity e-score for each compared pair of motifs; the motifs and the motif-finding methods are shown on the right. (B) MoSBAT e-scores between the Hughes motifs and the Trono original and Trono reprocessed motifs, with the corresponding aligned motifs, for the 39 overlapping KZNFs. (C) MoSBAT similarity e-scores for the 39 overlapping KZNFs between the Hughes motifs and the Trono original (blue) and Trono reprocessed (red) motifs. Dots indicate the percentage of overlap between Hughes and Trono original peaks (blue) and between Hughes and Trono reprocessed peaks (red). TO = Trono Original, TR = Trono Reprocessed, H = Hughes, R = RCADE, M = MEME, a = all peaks, nE = non-ERE peaks. Figure4C (.xlsx).

Figure 5. Predictive performance (AUROC) of the Hughes, Trono and external motifs for KZNFs present in either of the two data sets. The heat map shows the AUROC of each motif tested on the Hughes, Trono original or Trono reprocessed peaks. The top row indicates the source of the motif and the second row the test data set. TO = Trono Original, TR = Trono Reprocessed, H = Hughes. Figure5 (.xlsx).
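
An AUROC value can be read as the probability that a randomly chosen peak receives a higher motif score than a randomly chosen background sequence. Below is a minimal sketch using the Mann-Whitney rank formulation. It assumes that per-sequence motif scores have already been computed for the peaks and for a background set (for example, the best PWM match score in each sequence). This legend does not specify how the scores or background sequences were obtained, and `auroc` is a hypothetical name.

```python
from typing import Sequence

def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC = P(pos > neg) + 0.5 * P(pos == neg), computed from ranks (Mann-Whitney U)."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative score")
    # Pool all scores, remembering which came from peaks, and sort by score.
    labeled = sorted(
        [(s, True) for s in pos_scores] + [(s, False) for s in neg_scores],
        key=lambda t: t[0],
    )
    rank_sum_pos = 0.0
    i = 0
    while i < len(labeled):
        # Find the block of tied scores labeled[i:j].
        j = i
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for _, is_pos in labeled[i:j] if is_pos)
        i = j
    # Mann-Whitney U for the positives, normalised to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Ties contribute one half, which matches the trapezoidal area under the empirical ROC curve. scikit-learn's `roc_auc_score` should therefore give the same value for the same labels and scores.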

Figure 6. High-confidence reference motif set for the 242 KZNFs. (A) Number and percentage of motifs in each of classes A-F, and the median AUROC of each class. (B) Reference motif for each of the 242 KZNFs. Source indicates where the motif was originally obtained (TO = Trono Original; TR = Trono Reprocessed; H = Hughes; Naj = ChIP-seq (NAJAFABADI et al. 2015b); SM = SMiLE-seq (ISAKOVA et al. 2017); SelY = HT-SELEX (YIN et al. 2017); MSelY = Methyl-HT-SELEX (YIN et al. 2017); SelJ = HT-SELEX (JOLMA et al. 2013); EN = ENCODE; Trans = TRANSFAC; HM = HOCOMOCO). Group indicates the selection class of each motif.

Figure 7. Web portal page for ZNF549 showing all the analyses described. (A) Motifs for the same KZNF derived from different sources. (B) MoSBAT similarity heat maps between all motifs. (C) Overlap between Hughes peaks and Trono reprocessed (left) and Trono original (right) peaks, for all peaks and for the top 500 peaks. (D) ERE enrichment for the Hughes and Trono ChIP peaks.


Figure S1. Number of KZNFs in each data set and their overlap.

Figure S2. Ranking system used to select the reference motif, with motifs categorized into six classes (A-F).