# Lab1: Modeling a Library for the Manipulation of Ribonucleic Acids (RNAs)

This notebook showcases the implementation and usage of our code, which follows the UML class diagram referenced in the README file.  
We present several usage examples, relying mainly on real-life data retrieved automatically from two public databases:

* [`Rfam`](https://rfam.xfam.org/)  
* [`PDB`](https://www.rcsb.org/)
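The retrieval helpers in `utils.py` are not shown in this notebook. As a minimal sketch of the PDB side only, assuming structures are fetched as plain files from the RCSB file server (`https://files.rcsb.org/download/<ID>.<format>`), a download URL can be built as below. `rcsb_download_url` and `fetch_structure` are hypothetical names, not functions from our code.

```python
import urllib.request

RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the RCSB download URL for one entry (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not pdb_id.isalnum():
        raise ValueError(f"unexpected PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    # PDB IDs are case-insensitive; this notebook uses both '7EAF' and '3skz'
    return f"{RCSB_DOWNLOAD}/{pdb_id.upper()}.{fmt}"


def fetch_structure(pdb_id: str, fmt: str = "cif") -> str:
    """Download the structure file as text (needs network access)."""
    with urllib.request.urlopen(rcsb_download_url(pdb_id, fmt)) as response:
        return response.read().decode("utf-8")
```

mmCIF (`cif`) is used as the default because very large entries may not be distributed in the legacy `.pdb` format.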

In [1]:
from RNA_Molecule import RNA_Molecule
from Model import Model
from Chain import Chain
from Residue import Residue
from Atom import Atom
from tree import Phylotree
from family import Family
from clan import Clan

from utils import *




## SAM-I riboswitch: lab1 example - manipulation of an RNA molecule

This example shows how to use our code to create an RNA molecule by hand. We use the SAM-I riboswitch example from the lab1 instructions.

In [2]:
#Creating an RNA molecule
rna_molecule = RNA_Molecule("1JAT", "X-RAY DIFFRACTION", "Homo sapiens")


In [3]:
#Creating a model
model1 = Model(1)

In [4]:
#Creating a chain
ch1 = Chain('A')

In [5]:
#Adding the model to the RNA molecule
rna_molecule.add_model(model1)

In [6]:
#Adding the chain to the model
model1.add_chain(ch1)

In [7]:
#Creating Residues
res1=Residue("A", 1)
res2=Residue("U", 2)
res3=Residue("C", 3)

In [8]:
#Adding Residues
ch1.add_residue(res1)
ch1.add_residue(res2)
ch1.add_residue(res3)

In [9]:
#Creating Atoms
a1=Atom("OP1", 0.1, 0.2, 0.3, "O")
a2=Atom("P", 0.4, -0.5, 0.6, "P")
a3=Atom("N1", 0.25, 0.54, 0.23, "N")
a4=Atom("C4", 0.21, 0.76, -0.93, "C")

In [10]:
#Adding Atoms
res1.add_atom(a1)
res2.add_atom(a2)
res3.add_atom(a4)
res3.add_atom(a3)

In [11]:
print(rna_molecule.get_models())
print(rna_molecule.get_models()[0].get_chains())
print(rna_molecule.get_models()[0].get_chains()[0].get_residues())
print(rna_molecule.get_models()[0].get_chains()[0].get_residues()[0].get_atoms())


[Model 1]
[A]
[A 1, U 2, C 3]
[OP1 0.1 0.2 0.3 O]


In [12]:
#Print the structure
rna_molecule.print_all()

1JAT X-RAY DIFFRACTION Homo sapiens
Model 1
ATOM 1 OP1 A A 1 0.1 0.2 0.3 O
ATOM 2 P U A 2 0.4 -0.5 0.6 P
ATOM 3 C4 C A 3 0.21 0.76 -0.93 C
ATOM 4 N1 C A 3 0.25 0.54 0.23 N
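The output above suggests that `print_all` walks the hierarchy (model, chain, residue, atom) and numbers atoms in insertion order, which is why `C4`, added before `N1`, gets serial 3. The sketch below reproduces that traversal with plain dataclasses. The `*Sketch` classes and `atom_lines` are hypothetical stand-ins, not our `Model`, `Chain`, `Residue` and `Atom` classes, and restarting the serial number in each model is an assumption borrowed from the PDB convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtomSketch:
    name: str
    x: float
    y: float
    z: float
    element: str


@dataclass
class ResidueSketch:
    name: str
    number: int
    atoms: List[AtomSketch] = field(default_factory=list)


@dataclass
class ChainSketch:
    chain_id: str
    residues: List[ResidueSketch] = field(default_factory=list)


@dataclass
class ModelSketch:
    number: int
    chains: List[ChainSketch] = field(default_factory=list)


def atom_lines(models: List[ModelSketch]) -> List[str]:
    """Flatten models/chains/residues/atoms into numbered ATOM lines."""
    lines = []
    for model in models:
        lines.append(f"Model {model.number}")
        serial = 1  # assumption: numbering restarts in each model
        for chain in model.chains:
            for residue in chain.residues:
                for atom in residue.atoms:  # insertion order decides the serial
                    lines.append(
                        f"ATOM {serial} {atom.name} {residue.name} {chain.chain_id} "
                        f"{residue.number} {atom.x} {atom.y} {atom.z} {atom.element}"
                    )
                    serial += 1
    return lines


# Rebuild the toy structure from the cells above
toy_model = ModelSketch(1, [ChainSketch("A", [
    ResidueSketch("A", 1, [AtomSketch("OP1", 0.1, 0.2, 0.3, "O")]),
    ResidueSketch("U", 2, [AtomSketch("P", 0.4, -0.5, 0.6, "P")]),
    ResidueSketch("C", 3, [AtomSketch("C4", 0.21, 0.76, -0.93, "C"),
                           AtomSketch("N1", 0.25, 0.54, 0.23, "N")]),
])])
print("\n".join(atom_lines([toy_model])))
```

This prints the same `Model 1` block as `print_all` above, without the molecule header line.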



## SAM-I riboswitch: lab1 example - integration with family

In the example below, we show how to use the code to represent the SAM-I riboswitch RNA molecule with PDB ID `7EAF`: how to load it and store it in an object, create the `SAM` family, retrieve its phylogenetic tree, and add the family to the clan it belongs to.

In [13]:
# --get family from rfam
sam1_fam=Family.from_rfam('SAM')
print(sam1_fam)

Family created successfully
-- Family: RF00162 (SAM)
	Type: Cis-reg; riboswitch;
------------------------------------------------------------------------------------
				Tree:
- Internal (Branch Length: 0)
  - 0.970 (Branch Length: 0.16391)
    - 87_CP001818.1/1579334-1579230_Thermanaerovibrio_acida..[525903].1 (Branch Length: 0.07672)
    - 0.910 (Branch Length: 0.13628)
      - 88.4_ABTR02000001.1/1201751-1201854_Dethiosulfovibrio_pepti..[469381].1 (Branch Length: 0.06581)
      - _ACJX02000007.1/318260-318153_Anaerobaculum_hydrogeni..[592015].1 (Branch Length: 0.23502)
  - 0.280 (Branch Length: 0.00724)
    - 98.1_CP000141.1/2169978-2169874_Carboxydothermus_hydrog..[246194].1 (Branch Length: 0.07766)
    - 82.5_AP006840.1/278272-278389_Symbiobacterium_thermop..[292459].3 (Branch Length: 0.28187)
  - 0.900 (Branch Length: 0.06451)
    - 0.800 (Branch Length: 0.06816)
      - _CP000232.1/1354284-1354167_Moorella_thermoacetica_..[264732].2 (Branch Length: 0.15036)
      - 0.660 (Branch

In [14]:
# -- user can also create families manually

sam_1_4_fm=Family(id='RF01725',name='SAM-I-IV',type='riboswitch')


Family created successfully


In [15]:
print(sam_1_4_fm)

-- Family: SAM-I-IV (RF01725)
	Type: riboswitch



In [16]:
# -- create a clan
sam_clan=Clan(id='clanCL00012',name='SAM riboswitches')

sam_clan.add_family(sam1_fam)
sam_clan.add_family(sam_1_4_fm)

print(sam_clan)

Clan created successfully
Clan clanCL00012: SAM riboswitches
	Members: RF00162, SAM-I-IV
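In the output above, the member list mixes an accession (`RF00162`) and a name (`SAM-I-IV`). One way to keep clan membership consistent is to key the clan's families by their Rfam accession, which also makes adding the same family twice harmless. The sketch below assumes that design; `FamilySketch` and `ClanSketch` are hypothetical classes, not our `Family` and `Clan`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FamilySketch:
    accession: str  # Rfam accession, e.g. "RF00162"
    name: str       # short name, e.g. "SAM"


@dataclass
class ClanSketch:
    accession: str
    name: str
    # keyed by accession, so adding the same family twice has no effect
    families: Dict[str, FamilySketch] = field(default_factory=dict)

    def add_family(self, family: FamilySketch) -> None:
        self.families.setdefault(family.accession, family)

    def __str__(self) -> str:
        members = ", ".join(self.families)  # iterating a dict yields its keys
        return f"Clan {self.accession}: {self.name}\n\tMembers: {members}"


clan = ClanSketch("clanCL00012", "SAM riboswitches")
clan.add_family(FamilySketch("RF00162", "SAM"))
clan.add_family(FamilySketch("RF01725", "SAM-I-IV"))
clan.add_family(FamilySketch("RF00162", "SAM"))  # duplicate, ignored
print(clan)  # clan header followed by "Members: RF00162, RF01725"
```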


In [17]:
sam_riboswitch_actino=create_RNA_Molecule("7EAF")
sam_riboswitch_actino.species='Actinomyces'

sam_riboswitch_actino.print_all()

Downloading PDB structure '7eaf'...
7EAF X-RAY DIFFRACTION Actinomyces
Model 0
ATOM 1 OP3 G A 1 -9.698 3.426 -31.854 O
ATOM 2 P G A 1 -8.782 4.433 -32.44 P
ATOM 3 OP1 G A 1 -8.042 5.072 -31.321 O
ATOM 4 OP2 G A 1 -8.0 3.751 -33.502 O
ATOM 5 O5' G A 1 -9.668 5.577 -33.115 O
ATOM 6 C5' G A 1 -11.071 5.663 -32.885 C
ATOM 7 C4' G A 1 -11.77 6.41 -33.998 C
ATOM 8 O4' G A 1 -10.772 6.981 -34.881 O
ATOM 9 C3' G A 1 -12.656 5.57 -34.912 C
ATOM 10 O3' G A 1 -13.97 5.412 -34.396 O
ATOM 11 C2' G A 1 -12.616 6.33 -36.238 C
ATOM 12 O2' G A 1 -13.555 7.397 -36.237 O
ATOM 13 C1' G A 1 -11.206 6.926 -36.221 C
ATOM 14 N9 G A 1 -10.212 6.15 -36.997 N
ATOM 15 C8 G A 1 -9.009 5.69 -36.511 C
ATOM 16 N7 G A 1 -8.285 5.053 -37.384 N
ATOM 17 C5 G A 1 -9.052 5.1 -38.537 C
ATOM 18 C6 G A 1 -8.781 4.568 -39.821 C
ATOM 19 O6 G A 1 -7.779 3.941 -40.184 O
ATOM 20 N1 G A 1 -9.817 4.828 -40.716 N
ATOM 21 C2 G A 1 -10.969 5.518 -40.414 C
ATOM 22 N2 G A 1 -11.862 5.673 -41.404 N
ATOM 23 N3 G A 1 -11.229 6.022 -39.216 N
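`create_RNA_Molecule` downloads and parses the structure inside `utils.py`, which is not shown here. If the structure is read from a legacy PDB-format file (an assumption; it may equally be mmCIF), every `ATOM` record stores its fields in fixed columns. The sketch below extracts the fields that `print_all` displays; `AtomRecord` and `parse_atom_record` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_record(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM line using the fixed PDB column layout."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    # PDB columns are 1-based and inclusive; Python slices are 0-based and
    # end-exclusive, so e.g. columns 7-11 (serial) become line[6:11].
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),       # columns 13-16, e.g. "O5'"
        res_name=line[17:20].strip(),   # columns 18-20, e.g. "G"
        chain_id=line[21],              # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
        element=line[76:78].strip(),    # columns 77-78 (may be blank in old files)
    )
```

Fixed columns matter because adjacent fields can touch (for example two negative coordinates with no space between them), so splitting on whitespace is not reliable.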

In [18]:
# -- example: retrieve the tree for the RF01725 family, since the manually created family does not have one

sam1_4_nwk_str=get_tree_newick_from_fam('RF01725')
sam1_4_tree=Phylotree.from_newick(sam1_4_nwk_str)
print(sam1_4_tree)

- Internal (Branch Length: 0)
  - 0.980 (Branch Length: 0.00053)
    - _URS0000AB9575_12908/1-93_unclassified_sequences[12908].285 (Branch Length: 0.00055)
    - _URS0000AB3446_12908/1-93_unclassified_sequences[12908].290 (Branch Length: 0.00055)
  - 0.580 (Branch Length: 0.00899)
    - _URS0000ABC8FD_12908/1-93_unclassified_sequences[12908].284 (Branch Length: 0.00054)
    - 0.810 (Branch Length: 0.00902)
      - 0.900 (Branch Length: 0.01807)
        - 0.880 (Branch Length: 0.01753)
          - 0.620 (Branch Length: 0.00932)
            - 0.940 (Branch Length: 0.02784)
              - _URS0000AB6E63_12908/1-93_unclassified_sequences[12908].236 (Branch Length: 0.00055)
              - 0.840 (Branch Length: 0.00506)
                - _URS0000AB9FAA_12908/1-93_unclassified_sequences[12908].261 (Branch Length: 0.01732)
                - _URS0000AB9EA1_12908/1-93_unclassified_sequences[12908].262 (Branch Length: 0.00511)
            - 0.860 (Branch Length: 0.00054)
              - _URS000

## RF01510 family example

In this example, we create an `RNA_Molecule` object for every structure of the RF01510 family listed in the Rfam database, using automated functions from `utils.py`. We then create the family object, add the molecules (collected in a dictionary) to it, and build its phylogenetic tree from a Newick string.

_The goal of this notebook is to show the variety of ways the code can be used, and the benefit of automating data retrieval through APIs._

In [19]:
fam_RF01510=Family.from_rfam('RF01510')

Family created successfully


In [20]:
fam_RF01510


        Family(
            id=2dG-I, 
            name=RF01510, 
            type=Cis-reg; riboswitch;, 
            members=[],
            tree=- Internal (Branch Length: 0)
  - 87.4_AE017263.1/29965-30028_Mesoplasma_florum_L1[265311].1 (Branch Length: 0.05592)
  - _URS000080DE91_2151/1-68_Mesoplasma_florum[2151].1 (Branch Length: 0.08277)
  - 0.340 (Branch Length: 0.03601)
    - 90_AE017263.1/668937-668875_Mesoplasma_florum_L1[265311].2 (Branch Length: 0.11049)
    - 81.3_AE017263.1/31976-32038_Mesoplasma_florum_L1[265311].3 (Branch Length: 0.31409)

        )

In [21]:
pdb_ids=get_pdb_ids_from_fam('RF01510')
pdb_ids

['3skz',
 '3slq',
 '3skt',
 '3skw',
 '3skl',
 '3skr',
 '3skl',
 '3slm',
 '3skw',
 '3ski',
 '3slm',
 '3ski',
 '3slq',
 '3skr',
 '3skz',
 '3skt']
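`get_pdb_ids_from_fam` returns every ID twice, so the loop in the next cell downloads each structure twice, and the dictionary silently keeps only the second copy. An order-preserving deduplication before the loop avoids the repeated downloads. `unique_ids` is a hypothetical helper, not part of `utils.py`.

```python
def unique_ids(ids):
    """Drop repeated PDB IDs (case-insensitive), keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(pdb_id.lower() for pdb_id in ids))


pdb_ids = ['3skz', '3slq', '3skt', '3skw', '3skl', '3skr', '3skl', '3slm',
           '3skw', '3ski', '3slm', '3ski', '3slq', '3skr', '3skz', '3skt']
print(unique_ids(pdb_ids))
# ['3skz', '3slq', '3skt', '3skw', '3skl', '3skr', '3slm', '3ski']
```

Looping over `unique_ids(pdb_ids)` would download each of the 8 structures once, and the resulting dictionary keeps the same key order as shown below.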

In [22]:
molecules_dict={}
for pdb in pdb_ids:
    rna_mol=create_RNA_Molecule(pdb)
    molecules_dict[pdb]=rna_mol

molecules_dict
    

Downloading PDB structure '3skz'...
Downloading PDB structure '3slq'...
Downloading PDB structure '3skt'...
Downloading PDB structure '3skw'...
Downloading PDB structure '3skl'...
Downloading PDB structure '3skr'...
Downloading PDB structure '3skl'...
Downloading PDB structure '3slm'...
Downloading PDB structure '3skw'...
Downloading PDB structure '3ski'...
Downloading PDB structure '3slm'...
Downloading PDB structure '3ski'...
Downloading PDB structure '3slq'...
Downloading PDB structure '3skr'...
Downloading PDB structure '3skz'...
Downloading PDB structure '3skt'...


{'3skz': 3skz X-RAY DIFFRACTION NA,
 '3slq': 3slq X-RAY DIFFRACTION NA,
 '3skt': 3skt X-RAY DIFFRACTION NA,
 '3skw': 3skw X-RAY DIFFRACTION NA,
 '3skl': 3skl X-RAY DIFFRACTION NA,
 '3skr': 3skr X-RAY DIFFRACTION NA,
 '3slm': 3slm X-RAY DIFFRACTION NA,
 '3ski': 3ski X-RAY DIFFRACTION NA}

In [23]:
list_of_molecules=list(molecules_dict.values())
list_of_molecules

[3skz X-RAY DIFFRACTION NA,
 3slq X-RAY DIFFRACTION NA,
 3skt X-RAY DIFFRACTION NA,
 3skw X-RAY DIFFRACTION NA,
 3skl X-RAY DIFFRACTION NA,
 3skr X-RAY DIFFRACTION NA,
 3slm X-RAY DIFFRACTION NA,
 3ski X-RAY DIFFRACTION NA]

In [24]:
for molecule in list_of_molecules:
    fam_RF01510.add_RNA(molecule)

print(fam_RF01510)

-- Family: RF01510 (2dG-I)
	Type: Cis-reg; riboswitch;
	Members:
		3skz X-RAY DIFFRACTION NA
		3slq X-RAY DIFFRACTION NA
		3skt X-RAY DIFFRACTION NA
		3skw X-RAY DIFFRACTION NA
		3skl X-RAY DIFFRACTION NA
		3skr X-RAY DIFFRACTION NA
		3slm X-RAY DIFFRACTION NA
		3ski X-RAY DIFFRACTION NA
------------------------------------------------------------------------------------
				Tree:
- Internal (Branch Length: 0)
  - 87.4_AE017263.1/29965-30028_Mesoplasma_florum_L1[265311].1 (Branch Length: 0.05592)
  - _URS000080DE91_2151/1-68_Mesoplasma_florum[2151].1 (Branch Length: 0.08277)
  - 0.340 (Branch Length: 0.03601)
    - 90_AE017263.1/668937-668875_Mesoplasma_florum_L1[265311].2 (Branch Length: 0.11049)
    - 81.3_AE017263.1/31976-32038_Mesoplasma_florum_L1[265311].3 (Branch Length: 0.31409)



In [25]:
# Take the first molecule of the dictionary (insertion order); despite the name, this is not a random pick
random_rna_molecule_object=next(iter(molecules_dict.values()))

random_rna_molecule_object

3skz X-RAY DIFFRACTION NA
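Despite its name, `random_rna_molecule_object` is the first entry of the dictionary. If a genuinely random member is wanted, Python's `random` module can pick one, and a fixed seed makes the choice reproducible across runs. `pick_random_molecule` is a hypothetical helper.

```python
import random
from typing import Any, Dict, Optional


def pick_random_molecule(molecules: Dict[str, Any], seed: Optional[int] = None) -> Any:
    """Return one value of `molecules`, chosen uniformly at random."""
    if not molecules:
        raise ValueError("no molecules to choose from")
    # A dedicated generator: the same seed always yields the same pick
    rng = random.Random(seed)
    return rng.choice(list(molecules.values()))
```

In this notebook, `pick_random_molecule(molecules_dict, seed=42)` could replace the first-item lookup.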

In [26]:
random_rna_molecule_object.print_all()

3skz X-RAY DIFFRACTION NA
Model 0
ATOM 1 P G A 23 9.757 5.938 -71.994 P
ATOM 2 OP1 G A 23 10.429 6.308 -70.726 O
ATOM 3 OP2 G A 23 9.165 4.577 -72.145 O
ATOM 4 O5' G A 23 8.708 7.058 -72.412 O
ATOM 5 C5' G A 23 8.98 8.426 -72.154 C
ATOM 6 C4' G A 23 7.965 9.338 -72.806 C
ATOM 7 O4' G A 23 7.872 9.052 -74.232 O
ATOM 8 C3' G A 23 6.528 9.215 -72.321 C
ATOM 9 O3' G A 23 6.295 9.853 -71.072 O
ATOM 10 C2' G A 23 5.759 9.833 -73.48 C
ATOM 11 O2' G A 23 5.898 11.248 -73.469 O
ATOM 12 C1' G A 23 6.545 9.278 -74.677 C
ATOM 13 N9 G A 23 5.977 7.995 -75.157 N
ATOM 14 C8 G A 23 6.48 6.721 -74.978 C
ATOM 15 N7 G A 23 5.738 5.763 -75.484 N
ATOM 16 C5 G A 23 4.657 6.44 -76.04 C
ATOM 17 C6 G A 23 3.51 5.945 -76.735 C
ATOM 18 O6 G A 23 3.2 4.776 -77.023 O
ATOM 19 N1 G A 23 2.667 6.983 -77.108 N
ATOM 20 C2 G A 23 2.891 8.316 -76.857 C
ATOM 21 N2 G A 23 1.962 9.172 -77.304 N
ATOM 22 N3 G A 23 3.948 8.79 -76.221 N
ATOM 23 C4 G A 23 4.791 7.809 -75.842 C
ATOM 24 P C A 24 5.283 9.2 -69.994 P
ATOM 25 OP1 C A

In [27]:
newick_str = '''(87.4_AE017263.1/29965-30028_Mesoplasma_florum_L1[265311].1:0.05592,_URS000080DE91_2151/1-68_Mesoplasma_florum[2151].1:0.08277,(90_AE017263.1/668937-668875_Mesoplasma_florum_L1[265311].2:0.11049,81.3_AE017263.1/31976-32038_Mesoplasma_florum_L1[265311].3:0.31409)0.340:0.03601); '''
tree=Phylotree.from_newick(newick_str)
print(tree)

- Internal (Branch Length: 0)
  - 87.4_AE017263.1/29965-30028_Mesoplasma_florum_L1[265311].1 (Branch Length: 0.05592)
  - _URS000080DE91_2151/1-68_Mesoplasma_florum[2151].1 (Branch Length: 0.08277)
  - 0.340 (Branch Length: 0.03601)
    - 90_AE017263.1/668937-668875_Mesoplasma_florum_L1[265311].2 (Branch Length: 0.11049)
    - 81.3_AE017263.1/31976-32038_Mesoplasma_florum_L1[265311].3 (Branch Length: 0.31409)
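`Phylotree.from_newick` turns a Newick string into the indented tree printed above. As a rough sketch of how such a conversion can work, a small recursive-descent parser is enough for the trees in this notebook. Assumptions: square brackets are kept as part of labels (standard Newick treats `[...]` as a comment, but the Rfam labels here carry taxon IDs in brackets), an internal label such as `0.340` is printed as the node name, unlabeled internal nodes are shown as `Internal`, and branch lengths are kept as text. `NewickNode`, `parse_newick` and `format_tree` are hypothetical names, not our `Phylotree` API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewickNode:
    name: str = "Internal"  # unlabeled internal nodes are shown as "Internal"
    length: str = "0"       # branch length kept as text, e.g. "0.05592"
    children: List["NewickNode"] = field(default_factory=list)


def parse_newick(text: str) -> NewickNode:
    """Recursive-descent parser for nesting, labels and branch lengths."""
    s = text.strip()
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def read_token() -> str:
        # A label or length runs until the next structural character.
        # Square brackets are deliberately NOT treated as comments here.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos].strip()

    def parse_node() -> NewickNode:
        nonlocal pos
        node = NewickNode()
        if pos < len(s) and s[pos] == "(":
            pos += 1                                   # consume "("
            node.children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1                               # consume ","
                node.children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                                   # consume ")"
        label = read_token()                           # leaf name or support value
        if label:
            node.name = label
        if pos < len(s) and s[pos] == ":":
            pos += 1
            node.length = read_token()
        return node

    root = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def format_tree(node: NewickNode, depth: int = 0) -> List[str]:
    """Indented text view: two spaces per level, as in the trees above."""
    lines = [f"{'  ' * depth}- {node.name} (Branch Length: {node.length})"]
    for child in node.children:
        lines.extend(format_tree(child, depth + 1))
    return lines


print("\n".join(format_tree(parse_newick("((A:0.1,B:0.2)0.9:0.3,C:0.4);"))))
```

Run on the RF01510 Newick string from the cell above, `format_tree(parse_newick(newick_str))` yields the same six lines as `print(tree)`.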

