2425-m1-geniomhe-group-6

Lab 1 Report

Class Diagram


According to the lab description, the first part concerns RNA sequences and their spatial conformations. From the details and examples provided, we assumed that each RNA sequence to be modeled has a structure, and that the purpose of the model is to manipulate this structure.

Entity Atom

Entity Residue

Entity Chain

Entity Model

Entity RNA_Molecule

Entity Family

Entity Clan

Entity PhyloTree

Entity TreeNode

[!NOTE] This class is a helper class for PhyloTree. Although it is not explicitly mentioned in the description, we included it in the model to represent the nodes of the tree. It was a crucial addition: it lets us describe the class as a tree data structure and implement the tree traversal methods.

The TreeNode class represents a node of a phylogenetic tree within PhyloTree, so PhyloTree has an attribute of type TreeNode. It is the fundamental unit of the nodes list attribute and stores information in a graph-based data structure, through recursive links to other nodes (parent and children) in its attributes. Each node holds an RNA type as its data attribute and maintains a list of child nodes; this is shown by the self-relationship in the class diagram, where a node can have multiple child nodes. A key attribute is branch_length, which represents the distance to the parent node.
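The recursive parent/children link described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the attribute names `name`, `parent` and `children` and the helpers `add_child` and `is_leaf` are hypothetical; only `branch_length` is named in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class TreeNode:
    """Minimal sketch of a phylogenetic tree node (attribute names are assumed)."""

    name: Optional[str] = None             # label of the RNA sequence, if any
    branch_length: Optional[float] = None  # distance to the parent node
    parent: Optional[TreeNode] = field(default=None, repr=False)
    children: list[TreeNode] = field(default_factory=list)

    def add_child(self, child: TreeNode) -> None:
        """Attach `child` below this node and set its parent link."""
        child.parent = self
        self.children.append(child)

    def is_leaf(self) -> bool:
        """A node without children is a leaf."""
        return not self.children


# A three-level chain: root -> internal node -> leaf.
root = TreeNode()
n1 = TreeNode(branch_length=0.03601)
leaf = TreeNode(name="c", branch_length=0.11049)
root.add_child(n1)
n1.add_child(leaf)
```

Keeping both `parent` and `children` on each node is what makes the structure navigable in either direction, which is useful for traversal methods.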

Species

The description states that species are the organisms containing RNA sequences that belong to a particular RNA family. A family can therefore be distributed across multiple species, and a species can contain multiple families. We did not model species as a separate class but as an attribute of RNA_Molecule, so the species distribution of a particular family can be obtained from the species attribute of the RNA_Molecule objects that are members of that family. In this way, the species description can also be used in the phylogenetic tree.
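As a rough illustration of this design choice, the species distribution of a family can be derived by counting the `species` attribute of its members. The stand-in `RNA_Molecule` below, its `entry_id` field, the function `species_distribution` and the placeholder values are assumptions made for the sketch, not the project's actual signatures or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RNA_Molecule:
    """Stand-in with only the fields needed here (hypothetical signature)."""

    entry_id: str
    species: str


def species_distribution(members):
    """Count how many member molecules of a family come from each species."""
    return Counter(molecule.species for molecule in members)


# Placeholder members; ids and species names are illustrative only.
members = [
    RNA_Molecule("mol1", "species A"),
    RNA_Molecule("mol2", "species A"),
    RNA_Molecule("mol3", "species B"),
]
distribution = species_distribution(members)
print(distribution)
```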

Object Diagram


This object diagram portrays the example found here. It shows the following objects:

| Object | Class | Description |
| --- | --- | --- |
| a1 | Atom | An oxygen atom labeled “OP3” with specific coordinates (the third oxygen of a phosphate group in residue r1, reachable through the link). |
| a2 | Atom | A phosphorus atom labeled “P” with specific coordinates. |
| a3 | Atom | An oxygen atom labeled “OP1” with specific coordinates. |
| a4 | Atom | An oxygen atom labeled “OP2” with specific coordinates. |
| a25 | Atom | A phosphorus atom labeled “P” positioned further in the structure. |
| a26 | Atom | An oxygen atom labeled “OP1” linked to atom a25. |
| a48 | Atom | A phosphorus atom labeled “P” in another part of the structure. |
| a49 | Atom | An oxygen atom labeled “OP1” linked to atom a48. |
| r1 | Residue | A guanine residue positioned first in the sequence (linked to atoms a1, a2, a3…). |
| r2 | Residue | A guanine residue positioned second in the sequence. |
| r3 | Residue | A cytosine residue positioned third in the sequence. |
| ch1 | Chain | A chain labeled “X” that links the residues together. |
| model | Model | The structural model, labeled with ID 0 (an X-ray structure normally contains a single model, numbered 0). |
| rna1 | RNA_Molecule | An RNA molecule identified by entry “7EAF”, studied via X-ray diffraction. |
| rna2 | RNA_Molecule | Another RNA molecule identified by entry “5KF6”, analyzed similarly. |
| sam_fam | Family | A SAM riboswitch family named “SAM”, to which the 7EAF molecule belongs (hence the link). |
| sam1_4_fam | Family | A SAM riboswitch family named “SAM-I/IV”, belonging to the SAM clan. |
| sam4_fam | Family | A riboswitch family named “SAM-IV”, belonging to the SAM clan. |
| sam_clan | Clan | A clan of riboswitch families named “SAM”, linked to the 3 families that belong to it. |
| tree1 | Phylotree | A phylogenetic tree of related sequences for the SAM family (hence the link). |
| root | TreeNode | The root node of the phylogenetic tree. It is an attribute of tree1 (hence the link with the PhyloTree object) and is connected to all other nodes through parent-child links. |
| n1 | TreeNode | An internal tree node, conceptually a common ancestor, at a specific branch length from the root (it is linked directly to the root). |
| n2 | TreeNode | A leaf node representing an RNA molecule of a particular species. |
| n3 | TreeNode | Another tree node representing an RNA molecule of a particular species. |
| n4 | TreeNode | Another leaf representing an RNA sequence labeled with accession “92. CP018551.1”. |
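The containment chain behind these objects (atoms in residues, residues in a chain, the chain in a model, the model in a molecule) can be sketched with plain dataclasses. Field names such as `coords`, `position` and `experiment` are assumptions for illustration, and the coordinates are placeholders rather than the real values of entry 7EAF.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    element: str
    coords: tuple  # (x, y, z)


@dataclass
class Residue:
    type: str
    position: int
    atoms: list = field(default_factory=list)


@dataclass
class Chain:
    id: str
    residues: list = field(default_factory=list)


@dataclass
class Model:
    id: int
    chains: list = field(default_factory=list)


@dataclass
class RNA_Molecule:
    entry_id: str
    experiment: str
    models: list = field(default_factory=list)


# Coordinates are placeholders, not the real values from entry 7EAF.
a1 = Atom("OP3", "O", (0.0, 0.0, 0.0))
a2 = Atom("P", "P", (0.0, 0.0, 0.0))
r1 = Residue("G", 1, atoms=[a1, a2])
r2 = Residue("G", 2)
r3 = Residue("C", 3)
ch1 = Chain("X", residues=[r1, r2, r3])
model = Model(0, chains=[ch1])
rna1 = RNA_Molecule("7EAF", "X-ray diffraction", models=[model])

# Walking the hierarchy recovers the sequence of the chain.
sequence = "".join(residue.type for residue in ch1.residues)
```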

Python Implementation

List of modules:

Class Atom

Class Residue

Class Chain

Class Model

Class RNA_Molecule

Class Family

Family Class Overview:

The Family class represents a family of RNA molecules, particularly those in the Rfam database. It ensures that each family is uniquely identified, maintains a list of RNA molecules as its members, and optionally includes a phylogenetic tree representation. The class prevents duplicate instances and provides structured methods for adding, removing, and retrieving RNA families.
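One way to prevent duplicate instances is a class-level registry keyed by the family id. The sketch below assumes that design; the names `_registry`, `get` and `add_member`, and the choice to raise `ValueError` on a duplicate, are hypothetical and may differ from the actual class. The id `RF00162` is borrowed from the example file names used in this report.

```python
class Family:
    """Sketch of a family that refuses duplicate ids (design is assumed)."""

    _registry = {}  # class attribute: id -> Family instance

    def __init__(self, id, name):
        if id in Family._registry:
            raise ValueError(f"Family {id!r} already exists")
        self.id = id
        self.name = name
        self.members = []  # RNA molecules belonging to this family
        self.tree = None   # optional phylogenetic tree
        Family._registry[id] = self

    @classmethod
    def get(cls, id):
        """Return the registered family with this id, or None."""
        return cls._registry.get(id)

    def add_member(self, molecule):
        """Add a molecule once; repeated additions are ignored."""
        if molecule not in self.members:
            self.members.append(molecule)

    def __repr__(self):
        return (
            f"Family(id={self.id!r}, name={self.name!r}, "
            f"members={len(self.members)})"
        )


sam = Family("RF00162", "SAM")
sam.add_member("7EAF")
sam.add_member("7EAF")  # ignored: already a member
```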

Key Features


Attributes

Class Attributes

Instance Attributes

helper methods (private):

dunders:

Class Clan

Clan Class Overview

The Clan class represents a group of RNA families that share common ancestry or biological significance. It ensures unique identification of clans, prevents duplicates, and provides structured methods for managing RNA families (Family objects).

Key Features

Class Attributes

Instance Attributes

Class Methods

Instance Methods

Private Methods

Magic Methods

Class PhyloTree and TreeNode

The tree.py module defines a TreeNode class and a Phylotree class for constructing and managing a phylogenetic tree.

In the object diagram, a Phylotree is directly linked to a single node, the root (portrayed as an attribute), and each node is recursively linked to other nodes through its attributes, as seen in the following implementation:

The TreeNode class represents a node in the tree with attributes:

Methods in TreeNode:

The Phylotree class represents a phylogenetic tree for RNA sequences, constructed using computational phylogenetics. It consists of:

Methods in Phylotree:

The module demonstrates tree construction using different input formats:

Example usage:

tree_dict = {
    "children": [
        {"name": "a", "branch_length": 0.05592},
        {"name": "b", "branch_length": 0.08277},
        {
            "children": [
                {"name": "c", "branch_length": 0.11049},
                {"name": "d", "branch_length": 0.31409}
            ],
            "branch_length": 0.340
        }
    ],
    "branch_length": 0.03601
}
tree = Phylotree.from_dict(tree_dict)
print(tree)

or

newick_str = '''
(87.4_AE017263.1/29965-30028_Mesoplasma_florum_L1[265311].1:0.05592,
_URS000080DE91_2151/1-68_Mesoplasma_florum[2151].1:0.08277,
    (90_AE017263.1/668937-668875_Mesoplasma_florum_L1[265311].2:0.11049,
    81.3_AE017263.1/31976-32038_Mesoplasma_florum_L1[265311].3:0.31409)
0.340:0.03601);
'''
tree = Phylotree.from_newick(newick_str)  # success

or from files:

tree = Phylotree.from_newick('lab1/examples/RF00162.nhx')  # Newick file
tree = Phylotree.from_json('lab1/examples/RF00162.json')  # JSON file
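The nested dictionary and the Newick string are two serialisations of the same recursive structure. As a minimal sketch of that correspondence, the hypothetical helper `dict_to_newick` below (not part of tree.py) converts a dictionary with the same keys as `tree_dict` into a Newick string by recursing over `children`.

```python
def dict_to_newick(node):
    """Recursively serialise a nested dict node to a Newick fragment."""
    label = node.get("name", "")
    children = node.get("children", [])
    if children:
        # Internal node: serialise the children first, then wrap them.
        inner = ",".join(dict_to_newick(child) for child in children)
        label = f"({inner}){label}"
    if "branch_length" in node:
        label += f":{node['branch_length']}"
    return label


tree_dict = {
    "children": [
        {"name": "a", "branch_length": 0.05592},
        {"name": "b", "branch_length": 0.08277},
        {
            "children": [
                {"name": "c", "branch_length": 0.11049},
                {"name": "d", "branch_length": 0.31409},
            ],
            "branch_length": 0.340,
        },
    ],
    "branch_length": 0.03601,
}

newick = dict_to_newick(tree_dict) + ";"
print(newick)
# (a:0.05592,b:0.08277,(c:0.11049,d:0.31409):0.34):0.03601;
```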

Functions to Extract Data from PDB Files to Create RNA_Molecule Object

Other utility functions

[!IMPORTANT] The utils.py file contains helper functions for database calls, data extraction, and file handling. It interacts with the Rfam database API to retrieve information about RNA families and phylogenetic trees, automates extraction, and manipulates various file formats such as Newick trees. If any files have to be downloaded as intermediate steps in this module, they are saved in a CACHE directory, which defaults to the hidden directory .rnalib_cache/ in the working directory, to avoid repeated downloads.
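The caching behaviour can be sketched as a small "download only on a miss" helper. The function name `cached` and its `fetch` parameter are assumptions made for illustration and are not taken from utils.py; only the default directory name `.rnalib_cache` comes from the text. The usage below swaps the real download for a fake one and writes to a temporary directory so that it runs offline.

```python
import tempfile
from pathlib import Path

CACHE_DIR = Path(".rnalib_cache")


def cached(filename, fetch, cache_dir=CACHE_DIR):
    """Return the cached path of `filename`, calling `fetch()` only on a miss.

    `fetch` is any zero-argument callable returning the file content as text,
    for example a function that performs the actual HTTP download.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    if not path.exists():
        path.write_text(fetch())
    return path


calls = []


def fake_download():
    """Stand-in for a real download; records each invocation."""
    calls.append(1)
    return "(a:0.1,b:0.2);"


with tempfile.TemporaryDirectory() as tmp:
    first = cached("example.nhx", fake_download, cache_dir=tmp)
    second = cached("example.nhx", fake_download, cache_dir=tmp)  # cache hit
    content = second.read_text()
```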

Python Code Examples

Kindly find an example of the implementation of the classes in this notebook

Small examples inside each class implementation:

Small Example to Manually create an RNA_Molecule

Small Example integrating RNA molecules and families

In this example we started from the RNA molecule 7EAF and retrieved its RNA family, the SAM family, by providing only SAM as the query; all information is then stored automatically within the Family object, including its own tree of type PhyloTree (see more in this notebook).

Small example automating creation of all RNA molecules found for a family

In this example, starting with only information about a family, we retrieved all family attributes from the database to obtain a fully functional Family object with name, id, type, and a tree of type PhyloTree. We also automated the extraction of all RNA molecules found for the family from the PDB database: we first retrieved all PDB ids through the Rfam API, then fetched them through Biopython’s PDB API and created the RNA_Molecule objects, which we added as members of the family. The result is a fully declared family with all its attributes (more here).
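The final step of this pipeline turns the records of a downloaded PDB file into atoms. The project relies on Biopython for this; the standard-library sketch below is a swapped-in illustration of what one fixed-width ATOM record contains, not the project's code. The function name `parse_atom_record` and the dictionary keys are hypothetical, and the coordinates in the sample record are placeholders.

```python
def parse_atom_record(line):
    """Extract the main fields of one fixed-width PDB ATOM/HETATM record."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "residue": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "position": int(line[22:26]),       # columns 23-26
        "coords": (
            float(line[30:38]),             # x, columns 31-38
            float(line[38:46]),             # y, columns 39-46
            float(line[46:54]),             # z, columns 47-54
        ),
        "element": line[76:78].strip(),     # columns 77-78
    }


# Build a record with the standard column widths; the coordinates are
# placeholders, not values taken from a real entry.
record = (
    "ATOM  {:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2}"
).format(1, "OP3", "", "G", "X", 1, "", 1.0, 2.0, 3.0, 1.0, 0.0, "O")

atom = parse_atom_record(record)
```

Each parsed record carries the atom name, its residue type and position, and the chain id, which is the information needed to place an atom in the Atom, Residue and Chain hierarchy.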