2425-m1-geniomhe-group-6

RNAr library


Welcome to RNAr, a Python library for structural RNA!

Modeling a Library for the Manipulation of Ribonucleic Acids (RNAs)

The goal of this series of labs is to build a library that allows easy manipulation and study of RNA sequences.


Installation

This library can be installed directly from GitHub with pip:

pip install git+https://github.com/rna-oop/2425-m1-geniomhe-group-6.git

or simply clone the repository and install it locally:

git clone https://github.com/rna-oop/2425-m1-geniomhe-group-6
cd 2425-m1-geniomhe-group-6
pip install .

To make sure it’s installed correctly, you can run the following command in your terminal:

python -c "import RNAr; print(RNAr.__version__)"

Documentation of the various functions and classes can be found on github.io.

Labs

Lab1

description | report | contents

Lab2

description | report | contents

Lab3

description | report | contents

Lab4

description | report | contents

Overview of Library Functionalities

The library is designed to manipulate and study RNA sequences. It provides functionalities for:

Class Diagram

Each color in the class diagram represents a different module, and opacity variations indicate different submodules.

Object Diagram

[Figure: object diagram]

Library Structure

The classes are organized in modules and submodules as follows:

.
├── Families
│   ├── __init__.py
│   ├── clan.py
│   ├── family.py
│   ├── species.py
│   └── tree.py
├── IO
│   ├── RNA_IO.py
│   ├── __init__.py
│   ├── parsers
│   │   ├── PDB_Parser.py
│   │   ├── RNA_Parser.py
│   │   └── __init__.py
│   └── visitor_writers
│       ├── __init__.py
│       ├── pdb_visitor.py
│       ├── visitor.py
│       └── xml_visitor.py
├── Processing
│   ├── ArrayBuilder.py
│   ├── Builder.py
│   ├── Director.py
│   ├── ObjectBuilder.py
│   └── __init__.py
├── Structure
│   ├── Atom.py
│   ├── Chain.py
│   ├── Model.py
│   ├── RNA_Molecule.py
│   ├── Residue.py
│   ├── Structure.py
│   └── __init__.py
├── Transformations
│   ├── Pipeline.py
│   ├── __init__.py
│   └── transformers
│       ├── BaseTransformer.py
│       ├── Distogram.py
│       ├── Kmers.py
│       ├── Normalize.py
│       ├── OneHotEncoding.py
│       ├── SecondaryStructure.py
│       ├── TertiaryStructure.py
│       ├── Transformer.py
│       └── __init__.py
├── utils.py
└── viz.py

Overview of Modules Implementation and Design Patterns

Structure Module

The Structure module is responsible for representing the RNA molecule and its components. It contains the hierarchical structure of the RNA molecule, from models down to chains, residues, and atoms, and its classes work together to represent the complete RNA structure.
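To make the containment concrete, here is a minimal sketch of the hierarchy using Python dataclasses. The attribute names (`name`, `position`, `entry_id`, and so on) and the helper methods `sequence()` and `atom_count()` are assumptions for illustration, not the library's actual API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified stand-ins for the Structure classes:
# RNA_Molecule -> Model -> Chain -> Residue -> Atom.

@dataclass
class Atom:
    name: str          # atom name, e.g. "P" or "C1'"
    x: float
    y: float
    z: float

@dataclass
class Residue:
    name: str          # nucleotide: "A", "C", "G" or "U"
    position: int
    atoms: List[Atom] = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)

@dataclass
class Model:
    model_id: int
    chains: List[Chain] = field(default_factory=list)

@dataclass
class RNA_Molecule:
    entry_id: str
    models: List[Model] = field(default_factory=list)

    def sequence(self, model_index: int = 0) -> str:
        """Concatenate the residue names of every chain in one model."""
        model = self.models[model_index]
        return "".join(r.name for c in model.chains for r in c.residues)

    def atom_count(self) -> int:
        """Walk the whole hierarchy down to the atoms and count them."""
        return sum(len(r.atoms) for m in self.models
                   for c in m.chains for r in c.residues)

# Usage: a two-nucleotide chain in a single model
g = Residue("G", 1, [Atom("P", 0.0, 0.0, 0.0), Atom("C1'", 1.0, 0.5, 0.2)])
c = Residue("C", 2, [Atom("P", 5.9, 0.0, 0.0)])
mol = RNA_Molecule("TEST", [Model(0, [Chain("A", [g, c])])])
print(mol.sequence(), mol.atom_count())  # GC 3
```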

Structure class:

An interface for all the classes in the Structure module. It enforces the implementation of the accept method, which is part of the Visitor design pattern that we will discuss later.
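Python's `abc` module is one way to express such an interface. The sketch below assumes a visitor exposing a `visit_residue` method (a hypothetical name) and shows that a subclass which forgets to implement `accept` cannot be instantiated.

```python
from abc import ABC, abstractmethod

class Structure(ABC):
    """Interface shared by the structural classes (sketch)."""

    @abstractmethod
    def accept(self, visitor):
        """Hand this object to a visitor (double dispatch)."""

class Residue(Structure):
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        # Delegate to the visitor method dedicated to residues (hypothetical name)
        return visitor.visit_residue(self)

class BrokenResidue(Structure):
    pass  # does not implement accept()

class NameVisitor:
    def visit_residue(self, residue):
        return residue.name

try:
    BrokenResidue()
    instantiated = True
except TypeError:
    instantiated = False  # ABC refuses classes missing an abstract method

print(instantiated)                        # False
print(Residue("A").accept(NameVisitor()))  # A
```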

Common Implementation:

Atom class:

Residue:

Chain:

Model:

RNA_Molecule:


Families Module

The Families module represents the evolutionary and comparative relationships between RNA sequences. It is itself composed of several submodules, each containing the class of the same name (for example, Family in family.py).
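As an illustration of how these classes might fit together, here is a minimal sketch in which a `PhyloTree` is built from `TreeNode` objects whose leaves carry `Species`, and a `Clan` groups `Family` objects. All attribute and method names (`add_child`, `leaves`, `species_names`, and so on) and the identifiers in the usage are hypothetical, not the library's actual interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Species:
    name: str

@dataclass
class TreeNode:
    label: str
    species: Optional[Species] = None          # only set on leaves
    children: List["TreeNode"] = field(default_factory=list)

    def add_child(self, child: "TreeNode") -> "TreeNode":
        self.children.append(child)
        return child

    def leaves(self) -> List[Species]:
        """Depth-first collection of the species found at the tips."""
        if not self.children:
            return [self.species] if self.species is not None else []
        return [s for child in self.children for s in child.leaves()]

@dataclass
class PhyloTree:
    root: TreeNode

    def species_names(self) -> List[str]:
        return [s.name for s in self.root.leaves()]

@dataclass
class Family:
    family_id: str
    tree: Optional[PhyloTree] = None

@dataclass
class Clan:
    clan_id: str
    families: List[Family] = field(default_factory=list)

# Usage: ((Homo sapiens, Mus musculus), Escherichia coli)
root = TreeNode("root")
mammals = root.add_child(TreeNode("mammals"))
mammals.add_child(TreeNode("human", Species("Homo sapiens")))
mammals.add_child(TreeNode("mouse", Species("Mus musculus")))
root.add_child(TreeNode("ecoli", Species("Escherichia coli")))

clan = Clan("clan_example", [Family("family_example", PhyloTree(root))])
print(clan.families[0].tree.species_names())
# ['Homo sapiens', 'Mus musculus', 'Escherichia coli']
```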

Family:

Clan:

PhyloTree:

TreeNode:

Species:

IO Module

The IO module is responsible for reading and writing RNA structures from and to various file formats.
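Parsers such as PDB_Parser have to deal with the fixed-column layout of PDB ATOM records. The functions below are a minimal sketch (not the library's implementation) that slice the columns defined by the PDB format specification; the function names and the returned dictionary keys are hypothetical.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record; 1-based PDB columns in the comments."""
    return {
        "serial": int(line[6:11]),         # cols 7-11
        "atom_name": line[12:16].strip(),  # cols 13-16
        "res_name": line[17:20].strip(),   # cols 18-20
        "chain_id": line[21],              # col 22
        "res_seq": int(line[22:26]),       # cols 23-26
        "x": float(line[30:38]),           # cols 31-38
        "y": float(line[38:46]),           # cols 39-46
        "z": float(line[46:54]),           # cols 47-54
        "element": line[76:78].strip(),    # cols 77-78
    }

def read_atoms(lines):
    """Keep only ATOM records (skips HEADER, HETATM, TER, END, etc.)."""
    return [parse_atom_line(rec) for rec in lines if rec.startswith("ATOM")]

sample = [
    "ATOM      1  P     G A   1      10.000  20.500  -3.250  1.00 20.00           P",
    "TER",
    "END",
]
atoms = read_atoms(sample)
print(atoms[0]["res_name"], atoms[0]["chain_id"], atoms[0]["z"])  # G A -3.25
```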

RNA_IO class:

RNA_Parser class:

PDB_Parser class:

visitor_writers Module

This module implements the Visitor design pattern. It belongs to the IO subpackage because it is responsible for writing and exporting files from the RNA_Molecule object.

Visitor Design Pattern

[Figure: Visitor design pattern diagram]

The Visitor pattern is used to export an RNA molecule object into different file formats:

  1. PDBExportVisitor → Exports to PDB
  2. XMLExportVisitor → Exports to PDBML/XML (see the writing section of Lab3 for more about the format)

Visitor interface:

Structure interface:

PDBExportVisitor class:

XMLExportVisitor class:

This design separates export functionality from the RNA_Molecule class, ensuring modularity and flexibility.
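A minimal sketch of the pattern, flattened to a molecule that directly holds atoms to keep it short: each element's `accept` calls back the matching `visit_*` method, so adding a new export format only means adding a new visitor class. The method names, the `Atom` fields, and the XML tag names are assumptions; the XML produced here is a simplified illustration, not valid PDBML.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod

class Visitor(ABC):
    @abstractmethod
    def visit_molecule(self, molecule): ...

    @abstractmethod
    def visit_atom(self, atom): ...

class Atom:
    def __init__(self, serial, name, res_name, res_seq, x, y, z):
        self.serial, self.name = serial, name
        self.res_name, self.res_seq = res_name, res_seq
        self.x, self.y, self.z = x, y, z

    def accept(self, visitor):
        return visitor.visit_atom(self)

class RNA_Molecule:
    def __init__(self, entry_id, atoms):
        self.entry_id, self.atoms = entry_id, atoms

    def accept(self, visitor):
        return visitor.visit_molecule(self)

class PDBExportVisitor(Visitor):
    def visit_molecule(self, molecule):
        lines = [atom.accept(self) for atom in molecule.atoms]
        return "\n".join(lines + ["END"])

    def visit_atom(self, a):
        # Fixed-width ATOM record (chain "A", occupancy/B-factor columns omitted)
        return (f"ATOM  {a.serial:>5} {a.name:<4} {a.res_name:>3} A{a.res_seq:>4}"
                f"    {a.x:8.3f}{a.y:8.3f}{a.z:8.3f}")

class XMLExportVisitor(Visitor):
    def visit_molecule(self, molecule):
        root = ET.Element("molecule", id=molecule.entry_id)
        for atom in molecule.atoms:
            root.append(atom.accept(self))
        return ET.tostring(root, encoding="unicode")

    def visit_atom(self, a):
        return ET.Element("atom", serial=str(a.serial), name=a.name,
                          residue=a.res_name, x=f"{a.x:.3f}",
                          y=f"{a.y:.3f}", z=f"{a.z:.3f}")

# Usage: the same molecule exported by two different visitors
mol = RNA_Molecule("TEST", [Atom(1, "P", "G", 1, 10.0, 20.5, -3.25)])
print(mol.accept(PDBExportVisitor()))
print(mol.accept(XMLExportVisitor()))
```

Note that neither `Atom` nor `RNA_Molecule` knows anything about PDB or XML syntax; all format-specific code lives in the visitors.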

Advantages of the Visitor Pattern:

Disadvantages of the Visitor Pattern:

Processing Module

The Processing module is responsible for building RNA molecules and arrays from PDB files. It uses the Builder design pattern to create complex objects step by step.

Builder Design Pattern

The Builder pattern is used to construct different representations of an RNA molecule:

  1. Object-Oriented Representation (ObjectBuilder)
  2. NumPy Array Representation (ArrayBuilder)
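A minimal sketch of how a Director could drive either builder from the same parsed records. Here the object representation is a nested dictionary standing in for the RNA_Molecule hierarchy, and the array representation is an (N, 3) coordinate matrix. Both choices, the record layout, and the method names (`reset`, `add_atom`, `get_result`, `construct`) are assumptions for illustration.

```python
from abc import ABC, abstractmethod
import numpy as np

class Builder(ABC):
    @abstractmethod
    def reset(self): ...

    @abstractmethod
    def add_atom(self, chain_id, res_name, res_seq, atom_name, xyz): ...

    @abstractmethod
    def get_result(self): ...

class ObjectBuilder(Builder):
    """Nested dicts chain -> (res_seq, res_name) -> atom -> xyz (stand-in for objects)."""
    def reset(self):
        self._chains = {}

    def add_atom(self, chain_id, res_name, res_seq, atom_name, xyz):
        residues = self._chains.setdefault(chain_id, {})
        atoms = residues.setdefault((res_seq, res_name), {})
        atoms[atom_name] = tuple(xyz)

    def get_result(self):
        return self._chains

class ArrayBuilder(Builder):
    """Stacks coordinates into an (N, 3) float array, one row per atom."""
    def reset(self):
        self._rows = []

    def add_atom(self, chain_id, res_name, res_seq, atom_name, xyz):
        self._rows.append(xyz)

    def get_result(self):
        return np.asarray(self._rows, dtype=float).reshape(-1, 3)

class Director:
    """Fixes the construction steps; the builder decides the representation."""
    def __init__(self, builder: Builder):
        self.builder = builder

    def construct(self, records):
        self.builder.reset()
        for record in records:
            self.builder.add_atom(*record)
        return self.builder.get_result()

# Usage: (chain, residue name, residue number, atom name, coordinates)
records = [
    ("A", "G", 1, "P", (10.0, 20.5, -3.25)),
    ("A", "G", 1, "C1'", (12.0, 21.0, -2.0)),
    ("A", "C", 2, "P", (15.9, 20.5, -3.0)),
]
coords = Director(ArrayBuilder()).construct(records)
tree = Director(ObjectBuilder()).construct(records)
print(coords.shape)     # (3, 3)
print(list(tree["A"]))  # [(1, 'G'), (2, 'C')]
```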

Director class:

Builder class:

ObjectBuilder class:

ArrayBuilder class:

Disadvantages of the Builder Pattern:

Advantages of the Builder Pattern:


Transformations Module

The Transformations module is responsible for applying various transformations to RNA sequences and coordinates. It uses the Chain of Responsibility design pattern to handle a series of transformations in a flexible and extensible manner.

Chain of Responsibility Design Pattern

The Chain of Responsibility pattern allows multiple handlers to process a request without the sender needing to know which handler will ultimately handle it. In this module, each transformation is represented as a handler in the chain. Each transformer transforms the data and passes it to the next transformer in the chain.
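A minimal sketch of such a chain on plain sequences: `BaseTransformer` stores the link to the next handler and forwards whatever it receives, while each concrete handler transforms the data before calling `super().transform`. `Uppercase` and `DnaToRna` are hypothetical handlers, `Kmers` is only a simplified stand-in for the library's transformer of the same name, and the `set_next`/`transform` method names are assumptions.

```python
from abc import ABC, abstractmethod

class Transformer(ABC):
    """Handler interface of the chain."""
    @abstractmethod
    def set_next(self, transformer): ...

    @abstractmethod
    def transform(self, data): ...

class BaseTransformer(Transformer):
    """Keeps the link to the next handler and forwards the data to it."""
    def __init__(self):
        self._next = None

    def set_next(self, transformer):
        self._next = transformer
        return transformer          # enables a.set_next(b).set_next(c)

    def transform(self, data):
        # End of the chain: return the data as it is
        if self._next is not None:
            return self._next.transform(data)
        return data

class Uppercase(BaseTransformer):
    def transform(self, data):
        return super().transform(data.upper())

class DnaToRna(BaseTransformer):
    def transform(self, data):
        return super().transform(data.replace("T", "U"))

class Kmers(BaseTransformer):
    def __init__(self, k):
        super().__init__()
        self.k = k

    def transform(self, data):
        kmers = [data[i:i + self.k] for i in range(len(data) - self.k + 1)]
        return super().transform(kmers)

class Pipeline:
    """Links the transformers in order and sends the data to the head."""
    def __init__(self, transformers):
        if not transformers:
            raise ValueError("a pipeline needs at least one transformer")
        self.head = transformers[0]
        current = self.head
        for nxt in transformers[1:]:
            current = current.set_next(nxt)

    def transform(self, data):
        return self.head.transform(data)

pipeline = Pipeline([Uppercase(), DnaToRna(), Kmers(3)])
print(pipeline.transform("acgtu"))  # ['ACG', 'CGU', 'GUU']
```

Because `Kmers` turns the string into a list, placing it before `Uppercase` would fail (a list has no `upper` method), which illustrates why the order of transformers in a chain can matter.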

Pipeline class:

Transformer class:

BaseTransformer class:

Concrete Transformers:

Order Constraints:

For more details on each transformer, please refer to lab4/README.md.
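To give an idea of what some of these transformers compute, here is a NumPy sketch of a one-hot encoding and a distogram (the matrix of pairwise distances between residues, optionally binned). The function names, the `ACGU` alphabet ordering, the all-zero row for unknown bases, the use of one point per residue, and the binning via `np.digitize` are assumptions rather than the library's exact behavior.

```python
import numpy as np

NUCLEOTIDES = "ACGU"

def one_hot(sequence):
    """(L, 4) matrix with a single 1 per known base; unknown bases stay all-zero."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(sequence), len(NUCLEOTIDES)), dtype=np.float32)
    for pos, base in enumerate(sequence.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def distogram(coords, bins=None):
    """Pairwise Euclidean distances between residues, given one (x, y, z)
    point per residue; optionally mapped to bin indices."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]  # (L, L, 3) displacements
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (L, L), symmetric, zero diagonal
    return dist if bins is None else np.digitize(dist, bins)

print(one_hot("ACGUN").shape)                  # (5, 4)
print(distogram([[0, 0, 0], [3, 4, 0]])[0, 1])  # 5.0
```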

Disadvantages of the Chain of Responsibility Pattern:

Advantages of the Chain of Responsibility Pattern:


Visualizations

To help understand the data and how it can be analyzed and used, we have added a viz module at the root of the library. It contains functions to plot different representations of RNA, from the 3D object representation to the raw and processed array representations.
We have used more than five plotting libraries, mainly matplotlib, plotly, networkx, graphviz and pyvis, to generate spatial, interactive and network plots.

[Figure: data viz]

The image above hints at where this library is heading. Machine learning and deep learning have become major trends in bioinformatics over the last couple of years, particularly in structure prediction, and this library is a step in that direction. RNA structure prediction remains an open challenge, and many efforts are being made to provide different types of features or labels to prediction models. The transformations in this library offer a good starting point for training models on different kinds of features and for assessing how closely a model can reproduce one of the RNA structural representations.

Acknowledgements

This project was developed as part of the course OOP2 at Université Paris-Saclay, M1 GENIOMHE 2024/25.

Contributors

License

This project is licensed under the MIT License. See the LICENSE file for details. For errors, suggestions, or contributions, please open an issue.