+ \item Spaced seeds are longer seeds in which only a subset of the
+ positions (the 1s in the seed shape) is used
+ \item For example, if
+ \begin{itemize}
+ \item the sequence is \texttt{ABCDEFGHI}
+ \item the seed shape is \texttt{11100010} (1 = position used, 0 = position ignored)
+ \item then you query the index with \texttt{ABCG} (positions 1, 2, 3 and 7)
+ \end{itemize}
+ \item Originally presented in PatternHunter~\cite{Ma.ea2002:PatternHunterfasterandmore}
+ \item Why is this better than consecutive seeds?
+\end{itemize}
+\end{frame}
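
The lookup on the slide above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from PatternHunter or DIAMOND; the function name `spaced_key` and its `offset` parameter are hypothetical.

```python
def spaced_key(sequence, shape, offset=0):
    """Return the index key for `sequence` at `offset` under a spaced seed.

    `shape` is a string of '1' (position used) and '0' (position ignored).
    """
    window = sequence[offset:offset + len(shape)]
    if len(window) < len(shape):
        raise ValueError("seed does not fit in the sequence at this offset")
    # Keep only the characters that sit under a '1' in the seed shape.
    return "".join(ch for ch, bit in zip(window, shape) if bit == "1")


key = spaced_key("ABCDEFGHI", "11100010")
print(key)
```

For the slide's sequence and shape this yields `ABCG`; the three skipped positions (D, E, F) and the trailing H do not contribute to the key.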
+
+\begin{frame}{Consecutive Seeds vs Spaced Seeds}
+ \begin{itemize}
+ \item Target Sequence: ABCDEFGHIJK
+ \item Sequenced Sequence: ABC\textcolor{red}{Z}EF\textcolor{red}{Y}HI\textcolor{red}{X}K
+ \item Spaced seed shape: 111001 (weight 4) and consecutive seed: 1111 (weight 4)
+ \end{itemize}
+% ABCZEFGHIYK
+% ABCDEFGHIJK
+% 11000010---
+% -11000010--
+% --11000010-
+% ---11000010
+
+ \begin{columns}
+ \column{0.6\textwidth}
+\begin{block}{Pathological example}
+ \begin{tabular}{c c c}
+ Shift & Spaced & Consecutive \\
+ 0 & ABCF$=$ABCF & ABCD$\neq$ABCZ \\
+ 1 & BCDG$\neq$BCZY & BCDE$\neq$BCZE \\
+ 2 & CDEH$\neq$CZEH & CDEF$\neq$CZEF \\
+ 3 & DEFI$\neq$ZEFI & DEFG$\neq$ZEFY \\
+ 4 & EFGJ$\neq$EFYX & EFGH$\neq$EFYH \\
+ 5 & FGHK$\neq$FYHK & FGHI$\neq$FYHI \\
+ 6 & & GHIJ$\neq$YHIX \\
+ 7 & & HIJK$\neq$HIXK \\
+ \end{tabular}
+\end{block}
+ \column{0.4\textwidth}
+ \begin{itemize}
+ \item Spaced seed matches once
+ \item Consecutive seed never matches
+ \item Consecutive seed does more comparisons (8 shifts vs.\ 6) and,
+ where it does hit, may match repeatedly at neighbouring shifts
+ \end{itemize}
+\end{columns}
+\end{frame}
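
The table in the frame above can be reproduced with a short Python sketch. It assumes the spaced column uses positions 0, 1, 2 and 5 of each window (shape `111001`), which is what the four-letter keys in the table correspond to; the helper name `seed_hits` is hypothetical.

```python
def seed_hits(target, read, shape):
    """Return the shifts at which target and read agree on every '1' of shape."""
    care = [i for i, bit in enumerate(shape) if bit == "1"]
    last_shift = min(len(target), len(read)) - len(shape)
    hits = []
    for shift in range(last_shift + 1):
        # A seed hits when all sampled positions are identical.
        if all(target[shift + i] == read[shift + i] for i in care):
            hits.append(shift)
    return hits


target = "ABCDEFGHIJK"
read = "ABCZEFYHIXK"   # substitutions at positions 3, 6 and 9

spaced_hits = seed_hits(target, read, "111001")
consecutive_hits = seed_hits(target, read, "1111")
print("spaced:", spaced_hits)
print("consecutive:", consecutive_hits)
```

With a substitution every third position, no window of four consecutive letters is error-free, so the consecutive seed finds nothing, while the spaced seed skips over the first substitution and hits at shift 0.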
+
+\begin{frame}{Optimal Spaced Seed}
+ \begin{itemize}
+ \item Fewest overlaps with shifted copies of itself
+ \item At equivalent weight, longer seeds are better
+ \item Use dynamic programming to calculate the optimal seed for a given
+ length and weight
+ \end{itemize}
+ \begin{columns}
+ \column{0.6\textwidth}
+ \begin{block}{DIAMOND Seeds (Fast)}
+ \begin{itemize}
+ \item 111101011101111 (12)
+ \item 111011001100101111 (12)
+ \item 1111001001010001001111 (12)
+ \item 111100101000010010010111 (12)
+ \end{itemize}
+ \end{block}
+\end{columns}
+\end{frame}
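
One way to make "fewest overlaps with shifted seed" concrete is to count, for each shift, how many 1 positions a seed shares with a shifted copy of itself. The sketch below is an illustration under that reading, not the dynamic-programming search mentioned on the slide; `overlap_profile` is a hypothetical helper name.

```python
def overlap_profile(shape):
    """For each shift s = 1 .. len(shape)-1, count the positions where
    both the seed and its copy shifted by s have a '1'."""
    ones = [bit == "1" for bit in shape]
    n = len(ones)
    return [
        sum(ones[i] and ones[i + s] for i in range(n - s))
        for s in range(1, n)
    ]


consecutive = overlap_profile("1111")
spaced = overlap_profile("11100010")
print("1111     ->", consecutive)
print("11100010 ->", spaced)

# Shift-by-one overlap for a weight-12 consecutive seed and the first
# DIAMOND seed listed on the slide.
print(overlap_profile("1" * 12)[0], overlap_profile("111101011101111")[0])
```

Every pair of 1s contributes to exactly one shift, so the counts always sum to w(w-1)/2 for weight w. What a good spaced seed changes is the distribution: fewer shared positions at small shifts, so hits at neighbouring offsets are less redundant.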
+
+\begin{frame}{Double Indexing}
+ \begin{itemize}
+ \item BLASTX indexes the database
+ \item BLASTX runs the queries in input order
+ \item DIAMOND indexes both the database and the queries
+ \item DIAMOND runs queries in index order
+ \item Why is this faster?
+ \end{itemize}
+\end{frame}
+
+\begin{frame}{Double Indexing: Why it's faster}
+ \begin{itemize}
+ \item Cache architecture
+ \begin{itemize}
+ \item Per-core CPU cache (typically): L1, L2
+ \item Shared CPU cache: L3
+ \item Much faster than main memory
+ \end{itemize}
+ \item Each last-level cache miss must go to main memory, which has
+ significantly more latency than cache and takes on the order of
+ hundreds of cycles
+ \item Dictionary Example: Is it faster to look up
+ \begin{itemize}
+ \item “apple”, “xylophone”, “appliance”, “xylem”
+ \item or “apple”, “appliance”, “xylem”, “xylophone”?
+ \end{itemize}
+ \end{itemize}
+\end{frame}
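
The dictionary analogy can be sketched as a sort-then-merge join: sort the seeds from both the queries and the reference, then walk the two lists in lockstep so each side is read sequentially. This is a minimal Python illustration of the idea, not DIAMOND's actual implementation; the function name and the `(seed, location)` tuple layout are assumptions.

```python
def sorted_merge_join(ref_seeds, query_seeds):
    """Match (seed, location) pairs from two lists by walking both in sorted order."""
    ref = sorted(ref_seeds)
    qry = sorted(query_seeds)
    i = j = 0
    hits = []
    while i < len(ref) and j < len(qry):
        if ref[i][0] < qry[j][0]:
            i += 1
        elif ref[i][0] > qry[j][0]:
            j += 1
        else:
            # Collect the run of equal seeds on each side, then pair them up.
            key = ref[i][0]
            i_end, j_end = i, j
            while i_end < len(ref) and ref[i_end][0] == key:
                i_end += 1
            while j_end < len(qry) and qry[j_end][0] == key:
                j_end += 1
            for _, ref_loc in ref[i:i_end]:
                for _, qry_loc in qry[j:j_end]:
                    hits.append((key, ref_loc, qry_loc))
            i, j = i_end, j_end
    return hits


reference = [("xylem", 0), ("apple", 1), ("appliance", 2)]
queries = [("apple", 10), ("xylophone", 11), ("appliance", 12), ("xylem", 13)]
hits = sorted_merge_join(reference, queries)
print(hits)
```

Both lists are scanned front to back exactly once, which is the access pattern caches and prefetchers handle well; looking up the queries in input order would instead jump between the "a" and "x" regions of the index.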
+
+\section{Usage}
+
+\begin{frame}{DIAMOND Usage}
+ \begin{itemize}
+ \item Make the DIAMOND database:
+ \texttt{diamond makedb --in foo.fasta --db foo.dmnd}
+ \item Run the DIAMOND query:
+ \texttt{diamond blastx --db foo.dmnd --threads 24 --query bar.fasta --daa bar\_diamond.txt}
+ \end{itemize}