X-Git-Url: https://git.donarmstrong.com/?p=don.git;a=blobdiff_plain;f=genetics%2Fdbsnp_mirror.mdwn;h=5264a9078866b7ea5b39d033ca4017d9c395baf0;hp=a56c0adc0b63ce9f3a63b7ff0f5d420b8c8adf2b;hb=HEAD;hpb=b8e4f2b614cc4e34e90152b918ada47494b71da4 diff --git a/genetics/dbsnp_mirror.mdwn b/genetics/dbsnp_mirror.mdwn index a56c0ad..5264a90 100644 --- a/genetics/dbsnp_mirror.mdwn +++ b/genetics/dbsnp_mirror.mdwn @@ -5,12 +5,12 @@ Getting Files Assuming you are interested in humans (like I am) the following will download the databases and schemas for the current release of dbsnp. -For humans, this is currently ≈60G, and takes about a while to retrieve. - - for a in {organism,shared}_{data,schema}; do - lftp -c "open ftp.ncbi.nlm.nih.gov; cd /snp/organisms/human_9606/database/; mirror $a"; - done; +For humans, this is currently ≈60G, and takes a while to retrieve +(about 24 hours or so). + rsync -rvP --include 'organism_**' --exclude '**' rsync://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/ . + rsync -rvP rsync://ftp.ncbi.nlm.nih.gov/snp/database/shared_data . + rsync -rvP rsync://ftp.ncbi.nlm.nih.gov/snp/database/shared_schema . Preparing SQL Schemas --------------------- @@ -78,6 +78,45 @@ and constraints. It looks like the following: ${UTIL_DIR}/human_gty1_indexes_creation.pl index |psql snp; ) +Permissions on the database +--------------------------- + +Since I have my database on a server separate from the workstations +(and other machines) that I often do work on, I need remote access to +the database. To make this easy (and avoid having to hard code +database details into the few dozen scripts I use), I created a +postgresql service called *snp*. + +An entry like this: + + [snp] + dbname=snp + user=snpuser + password=somepassword + port=9212 + host=snpdb.donarmstrong.com + +in the pg_service.conf file (in `/etc/postgresql-common` or +`PGSYSCONFDIR`) will configure the service. + +You then need to make sure that the database server is listening on +the appropriate ip address (edit the database's `postgresql.conf` +file), and that *snpuser* has select privileges and can connect. +A line like the following in pg_hba.conf + + host snp snpuser 192.168.0.0/24 md5 + +and the following sql will set that up for you. + + CREATE USER snpuser WITH PASSWORD ('somepassword'); + GRANT SELECT ON ALL TABLES IN SCHEMA public TO snpuser; + +Then, to test, you should be able to run: + + psql "service=snp" -c 'SELECT * FROM snp LIMIT 5'; + +On another machine. + Querying the database --------------------- @@ -85,24 +124,24 @@ Once the process above has finished, you can actually query the database. For example, to select information about rs17849502 with its chromosome and position, you do something like the following: - SELECT scpr.snp_id AS snp_id, - scpr.chr AS chr, - scpr.pos AS pos, - scpr.orien AS orien, - scl.allele AS ref, - uv.var_str AS var_str, - ruv.var_str AS rev_var_str, - ml.locus_id AS uid, - ml.locus_symbol AS symbol, - gitn.gene_name AS description - FROM snp s - JOIN b135_snpchrposonref_37_3 scpr ON s.snp_id=scpr.snp_id - JOIN b135_snpcontigloc_37_3 scl ON scpr.snp_id=scl.snp_id - JOIN b135_contiginfo_37_3 ci ON scl.ctg_id = ci.ctg_id - LEFT OUTER JOIN b132_snpcontiglocusid_37_1 ml ON s.snp_id=ml.snp_id - LEFT OUTER JOIN geneidtoname gitn ON ml.locus_id=gitn.gene_id - JOIN univariation uv ON s.univar_id=uv.univar_id - JOIN univariation ruv ON uv.rev_univar_id=ruv.univar_id - WHERE ci.group_term LIKE 'GRCh%' AND s.snp_id='17849502'; + SELECT scpr.snp_id AS snp_id, + scpr.chr AS chr, + scpr.pos AS pos, + scpr.orien AS orien, + scl.allele AS ref, + uv.var_str AS var_str, + ruv.var_str AS rev_var_str, + ml.locus_id AS uid, + ml.locus_symbol AS symbol, + gitn.gene_name AS description + FROM snp s + JOIN b135_snpchrposonref_37_3 scpr ON s.snp_id=scpr.snp_id + JOIN b135_snpcontigloc_37_3 scl ON scpr.snp_id=scl.snp_id + JOIN b135_contiginfo_37_3 ci ON scl.ctg_id = ci.ctg_id + LEFT OUTER JOIN b132_snpcontiglocusid_37_1 ml ON s.snp_id=ml.snp_id + LEFT OUTER JOIN geneidtoname gitn ON ml.locus_id=gitn.gene_id + JOIN univariation uv ON s.univar_id=uv.univar_id + JOIN univariation ruv ON uv.rev_univar_id=ruv.univar_id + WHERE ci.group_term LIKE 'GRCh%' AND s.snp_id='17849502'; [I personally use this very query in a program called snp_info, which I'll probably share later.]