Global identification of human transcribed sequences with genome tiling arrays


Recent large-scale transcript mapping experiments suggest that many more sequences are transcribed throughout the human genome than current gene annotation data indicate. The structure of typical mammalian genes, consisting of many small coding sequences interspersed with large introns, makes the task of finding novel transcribed sequences very difficult. Developing a comprehensive map of coding sequences therefore remains an outstanding problem in human biology.

This experiment was designed to identify transcribed sequences across the genome, particularly those distinct from previously annotated genes. To measure global transcriptional activity, a series of high-density oligonucleotide tiling arrays was constructed to represent the sense and antisense strands of the entire nonrepetitive sequence of the human genome (1.5 Gb). A total of 51,874,388 36mer oligonucleotide probes, positioned every 46 nt on average, were synthesized via maskless photolithography at a feature density of approximately 390,000 probes per slide. The arrays were hybridized to fluorescence-labeled cDNA reverse-transcribed from triple-selected poly(A)+ liver tissue RNA.
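The design figures quoted above imply some simple arithmetic, sketched here in Python. This is only an illustration, not the authors' design software: the function names are hypothetical, the 46 nt spacing is assumed to be measured from probe start to probe start, and the tiling helper uses a fixed spacing where the actual arrays are described only as averaging 46 nt.

```python
import math

# Figures taken from the text above.
PROBE_LENGTH_NT = 36        # 36mer probes
MEAN_SPACING_NT = 46        # average spacing (assumed start-to-start)
TOTAL_PROBES = 51_874_388
PROBES_PER_SLIDE = 390_000  # approximate feature density


def mean_gap_nt(probe_length: int, spacing: int) -> int:
    """Average untiled gap between consecutive probes on one strand."""
    return spacing - probe_length


def min_slides(total_probes: int, per_slide: int) -> int:
    """Smallest number of slides that can hold all probes."""
    return math.ceil(total_probes / per_slide)


def tile_starts(region_start: int, region_end: int,
                probe_length: int = PROBE_LENGTH_NT,
                spacing: int = MEAN_SPACING_NT) -> list[int]:
    """Start coordinates of probes tiled at a fixed spacing across the
    half-open region [region_start, region_end); every probe must fit
    entirely inside the region."""
    last_valid_start = region_end - probe_length
    return list(range(region_start, last_valid_start + 1, spacing))


if __name__ == "__main__":
    print("mean gap (nt):", mean_gap_nt(PROBE_LENGTH_NT, MEAN_SPACING_NT))
    print("minimum slides:", min_slides(TOTAL_PROBES, PROBES_PER_SLIDE))
    print("probe starts in a 200 nt region:", tile_starts(0, 200))
```

Under these assumptions, consecutive probes are separated by an untiled gap of about 10 nt. Because the per-slide density is approximate, the slide count is a rough lower bound and not a reported figure.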

In addition to identifying many known and predicted genes, we found over 10,000 novel transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.

P Bertone, V Stolc, TE Royce, JS Rozowsky, AE Urban, X Zhu, JL Rinn, W Tongprasit, M Samanta, S Weissman, M Gerstein, M Snyder. (2004)
Science 306:2242-2246.