Friday, February 11, 2011

Database Annotation in Molecular Biology







Contents
Preface
List of Contributors
1 Annotation and Databases: Status and Prospects
M. Hoebeke, H. Chiapello, J.-F. Gibrat, Ph. Bessières and J. Garnier
1.1 Introduction
1.2 Annotation of Genomic Data
1.3 Databases: Concepts and Definitions
1.4 Access to Annotation Databases
Glossary
References
I THE DATABANKS
2 Survey of Sequence Databases: Archival Projects
M. Magrane, M. Garcia-Pastor and R. Apweiler
2.1 Introduction
2.2 Nucleotide Sequence Databases
2.3 Swiss-Prot
2.4 TrEMBL
2.5 PIR
2.6 UniProt
References
3 Survey of Sequence Databases: Derived Databases
M. Pruess, N. Mulder and R. Apweiler
3.1 Introduction
3.2 Protein and Gene Family Databases
3.3 Discussion
References
4 Databanks of Macromolecular Structure
H. J. Bernstein and F. C. Bernstein
4.1 Introduction
4.2 Background
4.3 Archival Structural Databases Now
4.4 Contextual Databases
4.5 Derived Structural Data Databases
4.6 Summary and View of the Future
References
5 Gene Expression Databases
H. Parkinson
5.1 Introduction
5.2 What Do We Mean by Microarray Gene Expression Data?
5.3 Data Complexity
5.4 Minimum Information About a Microarray Experiment (MIAME)
5.5 Journals and MIAME
5.6 Storage and Exchange Formats: MAGE-OM and MAGE-ML
5.7 ArrayExpress
5.8 Annotation Tools
5.9 Curation
5.10 Standardization and Semantics
5.11 Public Microarray Databases
5.12 ArrayExpress, an Example of a Public Repository
5.13 Submissions to ArrayExpress
5.14 MIAMExpress and Other MIAME Compliant Annotation Systems
5.15 Databases of Protein Expression Patterns
5.16 The Gene Expression Database (GXD)
5.17 Conclusion
References
II THE BASIS OF ANNOTATION
6 Taxonomy: a Moving Target for Sequence Data
M. I. Krichevsky
6.1 Introduction
6.2 Nomenclature
6.3 Operational Definitions
6.4 Searching for the Taxonomic Gold Standard
6.5 Conclusions
References
7 Genomics and Proteomics: Design and Sources of Annotation
K. Mayer and G. Mannhaupt
7.1 Beyond the Sequence: the Challenge of Complete Genome Analysis
7.2 Extracting the Genes
7.3 Organism Specific Peculiarities
7.4 Topology of Genomes
7.5 Gene Extraction Pipelines
7.6 Added Value and Knowledge
7.7 Beyond the Parts List
References
8 Annotation of Protein Sequences
W. C. Barker and C. H. Wu
8.1 Introduction
8.2 What is Annotation?
8.3 UniProt: Universal Protein Resource
8.4 Protein Family Classification
8.5 InterPro: Integrated Resource of Protein Families, Domains and Sites
8.6 PIR Protein Families and Superfamilies
8.7 Ontologies
8.8 Protein Names, Source Information and Unique Identifiers
8.9 Common Identification Errors
8.10 Evidence Attribution
8.11 Position Specific Annotations
8.12 Rule-based Annotation
8.13 Conclusions
Acknowledgements
References
9 Issues in the Annotation of Protein Structures
G. J. Swaminathan, J. Tate, R. Newman, A. Hussain, J. Ionides, K. Henrick and S. Velankar
9.1 Data Harvesting
9.2 Identification of the Biologically Relevant Assembly
9.3 Taxonomy
9.4 Sequence Recognition and Cross-reference
9.5 Recognition of Secondary Structure Elements
9.6 Validation of Structures
9.7 Residue Identification
9.8 Hetgroup Identification
9.9 Solvent Handling
9.10 Miscellaneous Annotation Issues
9.11 Conclusions
References
10 Classification of Protein Function
A. M. Lesk, H. Parkinson and J. C. Whisstock
10.1 Introduction
10.2 Mechanisms of Divergence of Protein Function
10.3 Classification of Protein Functions
10.4 Methods for Assigning Protein Function
10.5 Applications of Full-organism Information: Inferences from Genomic Context and Protein Interaction Patterns
10.6 Conclusions
References
III DATABASE DESIGN AND INTEGRATION
11 Information Flow and Data Integration of Databanks
C. H. Wu and W. C. Barker
11.1 Introduction
11.2 Information Flow Among Databanks
11.3 Database Distribution Format
11.4 Genome Annotation Errors and Error Propagation
11.5 Data Integration and Knowledge Discovery: iProClass Case Study
11.6 Conclusions
Acknowledgements
References
12 Models of Database Interconnectivity
G. J. L. Kemp
12.1 Introduction
12.2 Heterogeneity in Bioinformatics Data Management
12.3 Data Models
12.4 Architectures for Data Integration
12.5 Implementing a Database Federation
12.6 Conclusions
References
13 The European Bioinformatics Institute Macromolecular Structure Relational Database Technology
H. Boutselakis, D. Dimitropoulos, K. Henrick, J. Ionides, M. John, P. A. Keller, P. McNeil, J. Pineda and A. Suarez-Uruena
13.1 Database Design Process
13.2 Loading and Exporting Data in mmCIF
13.3 Exporting mmCIFs or XML Files from the Deposition Database
13.4 Subtypes and ‘Leaf Views’
13.5 Maintenance Aspects
13.6 Data Clean-up
13.7 The Search Database
13.8 Transformation
13.9 Incremental Transformation
13.10 Replication
13.11 Oracle Cartridge Applications
13.12 Related Data Warehouse
Acknowledgements
References
IV CONCLUSIONS AND PROSPECTS
14 Looking Around, Looking Ahead
A. M. Lesk
Index

