Difference between revisions of "HowTo"
From salvaEwiki
Latest revision as of 21:38, 6 June 2018
Successful tiny examples.
In bash, at Moria
Generate the MD5 checksums of the files in a folder (macOS md5; the -q option prints only the checksum, without the file name):
for file in /Volumes/MBL21/A_TREASURY/012_A_TRASURY_ThiobiosGenomes/* ; do md5 -q "$file" >> resultsQ.out ; done
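Because resultsQ.out holds only the checksums, it is easy to lose track of which file each one belongs to. A minimal sketch (the function name md5_table is hypothetical) that keeps the file name next to each checksum, using GNU md5sum when it is installed and falling back to md5 -q otherwise:

```shell
# Hypothetical helper: print "<md5><TAB><file name>" for every regular file in a folder,
# so each checksum stays attached to the file it belongs to.
md5_table() {
  local dir=$1 file sum
  for file in "$dir"/*; do
    [ -f "$file" ] || continue                     # skip sub-folders and an unmatched glob
    if command -v md5sum >/dev/null 2>&1; then
      sum=$(md5sum "$file" | awk '{print $1}')     # GNU coreutils (Linux)
    else
      sum=$(md5 -q "$file")                        # BSD/macOS, as in the loop above
    fi
    printf '%s\t%s\n' "$sum" "$(basename "$file")"
  done
}
```

For example, md5_table /Volumes/MBL21/A_TREASURY/012_A_TRASURY_ThiobiosGenomes > md5.tsv (md5.tsv is only an example name).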
Generate the sizes, in bytes, of the files in a folder (wc -c prints the byte count followed by the file name):
for file in /Volumes/MBL21/A_TREASURY/012_A_TRASURY_ThiobiosGenomes/* ; do wc -c "$file" >> sizes.out ; done
Extract the first column (the byte counts):
awk '{print $1}' sizes.out > sizesQ.out
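Reading a file on standard input makes wc -c print the count alone, so the separate awk step can be skipped. A sketch, assuming a hypothetical helper called file_sizes that writes the same single column as sizesQ.out:

```shell
# Hypothetical helper: print only the byte count of each regular file in a folder,
# i.e. the same single column that the wc + awk pair above produces.
file_sizes() {
  local dir=$1 file
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    # With input on stdin, wc prints the count without a file name;
    # tr removes the padding spaces that BSD/macOS wc adds.
    wc -c < "$file" | tr -d '[:space:]'
    echo
  done
}
```

Usage: file_sizes /Volumes/MBL21/A_TREASURY/012_A_TRASURY_ThiobiosGenomes > sizesQ.out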
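Get the lengths of the reads in a FASTQ file, where the sequence is the second line of every four-line record (adapted from http://onetipperday.sterding.com/2012/05/simple-way-to-get-reads-length.html). The output is a plain list of lengths, one per read:
awk 'NR%4==2 {print length($1)}' /Users/Moria/Desktop/g043.refined.1.fq > /Users/Moria/Desktop/g043.refined.1.EXTRACT.fq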
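A sketch that summarises the same lengths instead of listing them. The function name read_length_summary is hypothetical, and like the one-liner it assumes plain four-line FASTQ records with no wrapped sequence lines:

```shell
# Hypothetical helper: number of reads plus min / max / mean read length of a FASTQ file.
# Assumes 4 lines per record, so every line with NR % 4 == 2 is a sequence.
read_length_summary() {
  awk 'NR % 4 == 2 {
         len = length($1); n++; sum += len
         if (n == 1 || len < min) min = len
         if (len > max) max = len
       }
       END { if (n) printf "reads=%d min=%d max=%d mean=%.2f\n", n, min, max, sum / n }' "$1"
}
```

Usage: read_length_summary /Users/Moria/Desktop/g043.refined.1.fq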
At CUBE
GC content of the files in a folder, then extract the fourth column:
for file in /proj/genomes/Thiobios/data/ThiobiosMAGs/* ; do gc "$file" >> gc.out ; done
awk '{print $4}' gc.out > gcQ.out
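Where the gc command is not available, a rough stand-in is to count bases with awk. This is a sketch, not a reproduction of gc (whose output is only known here to carry the value of interest in column 4): the hypothetical gc_percent pools all records of each FASTA file and ignores N and other ambiguity codes:

```shell
# Hypothetical stand-in for `gc`: GC content (%) of each FASTA file in a folder.
# All records of a file are pooled; only A/C/G/T are counted (N etc. are ignored).
gc_percent() {
  local file
  for file in "$1"/*; do
    [ -f "$file" ] || continue
    awk -v name="$(basename "$file")" '
      /^>/ { next }                      # skip header lines
      {
        seq = toupper($0)
        gc += gsub(/[GC]/, "", seq)      # gsub returns how many bases it removed
        at += gsub(/[AT]/, "", seq)
      }
      END {
        total = gc + at
        printf "%s\t%.2f\n", name, (total ? 100 * gc / total : 0)
      }' "$file"
  done
}
```

Usage: gc_percent /proj/genomes/Thiobios/data/ThiobiosMAGs > gc_awk.out (the output name is only an example).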
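Assess the completeness, contamination and strain heterogeneity of the genomes in a folder with CheckM (-t 8 uses 8 threads; --tab_table with --file writes the results as a tab-separated table to therest.checkm.out):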
checkm lineage_wf -t 8 /proj/genomes/Thiobios/results/2017_08_24_checkM/data /proj/genomes/Thiobios/results/2017_08_24_checkM/therest.checkm --tab_table --file therest.checkm.out
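The table can then be trimmed to the values of interest. A sketch with a hypothetical helper, checkm_summary, that looks the columns up by header name rather than by position; it assumes the header contains the names Bin Id, Completeness and Contamination, as CheckM's tab table does:

```shell
# Hypothetical helper: keep bin id, completeness and contamination from a CheckM
# --tab_table file, locating the columns by their header names.
checkm_summary() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Bin Id" in col) || !("Completeness" in col) || !("Contamination" in col)) {
        print "checkm_summary: expected columns not found" > "/dev/stderr"
        exit 1
      }
      print "Bin Id", "Completeness", "Contamination"
      next
    }
    { print $(col["Bin Id"]), $(col["Completeness"]), $(col["Contamination"]) }' "$1"
}
```

Usage: checkm_summary therest.checkm.out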
Check for tRNAs in the genomes in a folder (tRNAscan-SE -B searches for bacterial tRNAs):
cd /proj/genomes/Thiobios/data/ThiobiosMAGs
for file in * ; do tRNAscan-SE -B "$file" -o "/proj/genomes/Thiobios/results/2017_08_25_tRNAscan-SE/$file.tRNAscan-SE.out" ; done
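To see how many tRNAs were found per genome, a sketch with a hypothetical count_trnas. It assumes the default tabular tRNAscan-SE output, in which a line of dashes closes the header and every following non-empty line is one predicted tRNA; check this against your own files before relying on it:

```shell
# Hypothetical helper: number of predicted tRNAs in each tRNAscan-SE output file.
# Assumes the header ends with a line starting with dashes and each later line is one hit.
count_trnas() {
  local out
  for out in "$@"; do
    awk -v name="$(basename "$out" .tRNAscan-SE.out)" '
      body && NF  { n++ }               # count non-empty lines after the header
      /^--------/ { body = 1 }          # the dashed line marks the end of the header
      END { printf "%s\t%d\n", name, n }' "$out"
  done
}
```

Usage: count_trnas /proj/genomes/Thiobios/results/2017_08_25_tRNAscan-SE/*.tRNAscan-SE.out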
With R (and RStudio)
Extract a part of a sequence (positions 44214 to 47213), using ape; c2s() and s2c() come from seqinr:
library(ape)
library(seqinr)
s2c(c2s(as.matrix(g43Z2[1])[44214:47213]))
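The same slice can be taken in the shell without loading the sequence into R. A sketch under stated assumptions: the function extract_region is hypothetical, the sequence is the first record of a FASTA file, and positions are 1-based and inclusive, as in R's 44214:47213:

```shell
# Hypothetical helper: print positions FROM..TO (1-based, inclusive)
# of the first record in a FASTA file.
extract_region() {
  awk -v from="$2" -v to="$3" '
    /^>/ { if (seen++) exit; next }     # stop at the second header: use record 1 only
    { seq = seq $0 }                    # join wrapped sequence lines
    END { print substr(seq, from, to - from + 1) }' "$1"
}
```

Usage: extract_region genome.fa 44214 47213 (genome.fa is a placeholder name).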
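Build a maximum likelihood tree with ape and phangorn (model chosen by BIC, 1000 bootstrap replicates):
library(ape)
library(phangorn)
# rows 2-10 then row 1 of the alignment, restricted to the columns in segSitB
ali.16SB<-as.phyDat(ssuAlignB[c(2:10,1),segSitB])
dist.16SB<-dist.ml(ali.16SB)                    # maximum likelihood distances
tree.16S.njB<-root(NJ(dist.16SB),10)            # NJ tree rooted on tip 10 (row 1 of ssuAlignB)
mod.16SB<-modelTest(ali.16SB,model="all",multicore=TRUE)   # compare substitution models
env.16SB<-attr(mod.16SB,"env")
fitStart.16SB<-eval(get(mod.16SB$Model[which.min(mod.16SB$BIC)],env.16SB),env.16SB) # best model by BIC; here it was "K80", used below
fitNJ.16SB<-pml(tree.16S.njB,ali.16SB)          # likelihood of the NJ tree
fit.16SB<-optim.pml(fitNJ.16SB,rearrangement="stochastic",model="K80",optInv=FALSE,optGamma=FALSE)   # optimise under K80, no +I or +G
bs.16SB<-bootstrap.pml(fit.16SB,bs=1000,optNni=TRUE,multicore=TRUE)   # 1000 bootstrap replicates
plotBS(fit.16SB$tree,bs.16SB,p=50,type="p",bs.adj=c(1.2,-.7))        # p=50 is the support threshold for plotted values
add.scale.bar()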