[[!meta title="Finding out Cytobands/Idiograms for assemblies"]]
In many organisms it is common to use
[idiograms](https://secure.wikimedia.org/wikipedia/en/wiki/Cytogenetics#Advent_of_banding_techniques)
or cytobands to describe approximately where something is located on a
chromosome relative to the chromosome's larger structure, especially
when exact locations are not required.
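Band names such as `4p16.3` follow the ISCN convention: the chromosome (`4`), the arm (`p` for the short arm, `q` for the long arm), a region digit (`1`), a band digit (`6`) and, after the dot, a sub-band (`3`). The minimal Perl sketch below splits such a name into those parts. `parse_cytoband` is a hypothetical helper, and its regular expression only handles the common human form.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ISCN-style band name such as "4p16.3" into
# chromosome, arm, region, band and optional sub-band. Only the common human
# form is handled: chromosome 1-22, X or Y; arm p or q; one region digit;
# one band digit; and an optional ".sub-band".
sub parse_cytoband {
    my ($name) = @_;
    my ($chr, $arm, $region, $band, $sub) =
        $name =~ /^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$/
        or return;    # not a band name in this simple form
    return {
        chr     => $chr,
        arm     => $arm,      # p = short arm, q = long arm
        region  => $region,
        band    => $band,
        subband => $sub,      # undef when there is no sub-band
    };
}

my $parts = parse_cytoband('4p16.3');
printf "chr %s, arm %s, region %s, band %s, sub-band %s\n",
    @{$parts}{qw(chr arm region band subband)};
# prints: chr 4, arm p, region 1, band 6, sub-band 3
```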
Until recently, I didn't know where [NCBI](http://ncbi.nlm.nih.gov)
kept its idiogram annotations, which made my
[[mirror of dbsnp|genetics/dbsnp_mirror/]] (which I use to annotate my
whole-genome analyses) slightly less useful than it could have been.
But after a bit of searching through NCBI's FTP site, I found them in the
[MapView directory](ftp://ftp.ncbi.nlm.nih.gov/genomes/MapView/Homo_sapiens/objects/current/initial_release):
`ideogram_9606_GCF_000001305.13_850_V1`.
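The file is a plain-text table with one band per line. Below is a minimal Perl sketch for reading it. It assumes tab-separated fields, `#`-prefixed header lines, and the column order `chromosome, arm, band, iscn_start, iscn_stop, bp_start, bp_stop, stain, density`. That layout is an assumption, so check it against the header of the copy you download. `read_ideogram` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed layout of the NCBI ideogram file (verify against the header line of
# your copy): tab-separated, '#'-prefixed header, and these columns in order.
my @IDEOGRAM_COLUMNS = qw(chromosome arm band iscn_start iscn_stop
                          bp_start bp_stop stain density);

# Hypothetical helper: read every band from an open filehandle into a list of
# hashes keyed by the column names above.
sub read_ideogram {
    my ($fh) = @_;
    my @bands;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;    # skip header and blank lines
        my %band;
        # -1 keeps trailing empty fields (e.g. an empty density column)
        @band{@IDEOGRAM_COLUMNS} = split /\t/, $line, -1;
        push @bands, \%band;
    }
    return \@bands;
}

# Example usage on a local copy of the downloaded file:
#   open my $fh, '<', 'ideogram_9606_GCF_000001305.13_850_V1' or die "open: $!";
#   my $bands = read_ideogram($fh);
```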
Then, after a quick bit of work with SQL, I had the following schema:
```sql
CREATE TABLE idiogram (
    idiogram TEXT NOT NULL,
    -- I think these are related to recombination rates, but I'm not sure
    -- I believe this indicates whether the band is black or white
CREATE UNIQUE INDEX ON idiogram(chr,pq,idiogram);
CREATE UNIQUE INDEX ON idiogram(chr,start);
CREATE UNIQUE INDEX ON idiogram(chr,stop);
```
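One way to fill the table is to map each band from the NCBI file onto the `chr`, `pq`, `idiogram`, `start` and `stop` columns that the indexes and lookup query refer to. The sketch below assumes each parsed band is a hash keyed by the file's column names (`chromosome`, `arm`, `band`, `bp_start`, `bp_stop`). The names `idiogram_bind_values` and `$INSERT_SQL` are hypothetical, and the remaining columns are left out because their names are not shown here.

```perl
use strict;
use warnings;

# Hypothetical INSERT covering only the five columns known from the schema.
my $INSERT_SQL =
    'INSERT INTO idiogram (chr, pq, idiogram, start, stop) VALUES (?, ?, ?, ?, ?)';

# Hypothetical helper: turn parsed bands (hashes keyed by the assumed NCBI
# column names) into one array of bind values per row, in $INSERT_SQL order.
sub idiogram_bind_values {
    my ($bands) = @_;
    return [ map { [ @{$_}{qw(chromosome arm band bp_start bp_stop)} ] } @$bands ];
}

# With DBI, the rows could then be loaded through one prepared statement:
#   my $sth = $dbh->prepare($INSERT_SQL);
#   $sth->execute(@$_) for @{ idiogram_bind_values($bands) };
```

Keeping the band (`16.3`) separate from the chromosome and arm matches the lookup below, which rebuilds the full name with `CONCAT(chr,pq,idiogram)`.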
and an additional bit of SQL in my SNP annotation Perl script:
```sql
SELECT CONCAT(chr,pq,idiogram) AS idiogram
FROM idiogram
WHERE idiogram.chr = ? AND idiogram.start <= ? AND idiogram.stop >= ? LIMIT 1;
```
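The lookup is meant to return the band on the given chromosome whose inclusive range `[start, stop]` contains the position. The in-memory Perl sketch below computes the same answer. It uses a binary search because bands on a chromosome are sorted and do not overlap. The `$bands_by_chr` structure and `band_for_position` are hypothetical, and a simple linear scan would also work.

```perl
use strict;
use warnings;

# Hypothetical in-memory equivalent of the lookup query.
# $bands_by_chr: chr => [ [start, stop, pq, band], ... ], sorted by start,
# with non-overlapping, inclusive ranges.
# Returns e.g. "4p16.3" (chr . pq . band), or 'NA' when nothing matches.
sub band_for_position {
    my ($bands_by_chr, $chr, $pos) = @_;
    my $bands = $bands_by_chr->{$chr} or return 'NA';
    my ($lo, $hi) = (0, $#$bands);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($start, $stop, $pq, $band) = @{ $bands->[$mid] };
        if    ($pos < $start) { $hi = $mid - 1 }    # band lies after $pos
        elsif ($pos > $stop)  { $lo = $mid + 1 }    # band lies before $pos
        else                  { return "$chr$pq$band" }  # start <= pos <= stop
    }
    return 'NA';    # position falls outside every band
}
```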
```perl
# chr, then the same position twice (start <= pos and stop >= pos)
my $rv = $param{sth}->execute($param{chr},$param{pos},$param{pos}) //
    die "Unable to execute statement properly: ".$param{sth}->errstr;
# fetchall_arrayref([0]) returns only the first column of each row
my ($idiogram) = map { ref $_ ? @{$_} : () } map { ref $_ ? @{$_} : () } $param{sth}->fetchall_arrayref([0]);
if ($param{sth}->err) {
    print STDERR $param{sth}->errstr;
}
return $idiogram // 'NA';
```
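`fetchall_arrayref([0])` returns a reference to an array of rows, each row being an array reference holding only the first column (for example `[ ['4p16.3'] ]`). The two nested `map`s in the snippet above flatten that structure. The right-hand `map` dereferences the outer array into row references, and the left-hand one dereferences each row into its values. Anything that is not a reference, such as `undef`, becomes an empty list, so the result falls back to `'NA'`. The standalone Perl sketch below isolates that logic without DBI. `first_value` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the flattening used above.
sub first_value {
    my ($result) = @_;
    my ($value) = map { ref $_ ? @{$_} : () }   # row ref    -> column values
                  map { ref $_ ? @{$_} : () }   # result ref -> row refs
                  $result;
    return $value // 'NA';
}

print first_value([ ['4p16.3'] ]), "\n";   # prints 4p16.3
print first_value([]), "\n";               # prints NA (no rows)
print first_value(undef), "\n";            # prints NA (not a reference)
```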
| id         | chr | pos      | idiogram | ref | alt | orig_id    | gene   | [...] |
|------------|-----|----------|----------|-----|-----|------------|--------|-------|
| rs10000010 | 4   | 21618674 | 4p16.3   | T   | C   | rs10000010 | KCNIP4 | [...] |
Now I have idiograms for every SNP.
[[!tag genetics snp biology tech]]