update kinetic formalism with program notes and add omer points

[ool/lipid_simulation_formalism.git] / kinetic_formalism.Rnw
diff --git a/kinetic_formalism.Rnw b/kinetic_formalism.Rnw

index afd5984a9bc79ceee84952b0f991e12ac80d50ee..c3541f44b57fbf7cb75b290ae7213cb4387140ad 100644 (file)
--- a/kinetic_formalism.Rnw
+++ b/kinetic_formalism.Rnw
@@ -81,9 +81,10 @@ $\left[C_{i_\mathrm{monomer}}\right]$, the surface area of the vesicle
  $S_\mathrm{ves}$, the base backwards kinetic parameter for the $i$th
  specie $k_{bi}$ which is also dependent on lipid type, its adjustment
  parameter $k_{bi\mathrm{adj}}$ (see~\fref{eq:kb_adj}), and the molar
-concentration of the $i$th specie in the vesicle $C_{i_\mathrm{ves}}$,
-the change in concentration of the $i$th specie in the vesicle per
-change in time $\frac{d C_{i_\mathrm{ves}}}{dt}$ can be calculated:
+concentration of the $i$th specie in the vesicle
+$\left[C_{i_\mathrm{ves}}\right]$, the change in concentration of the
+$i$th specie in the vesicle per change in time $\frac{d
+  C_{i_\mathrm{ves}}}{dt}$ can be calculated:
  
  \begin{equation}
    \frac{d C_{i_\mathrm{ves}}}{dt} = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves} -
@@ -111,6 +112,7 @@ $\mathrm{n}$, Thus, we have
  The 1000 isn't in \fref{eq:state} above, because it is unit-dependent.
  
  \subsection{Forward adjustments ($k_{fi\mathrm{adj}}$)}
+\label{sec:forward_adj}
  
  The forward rate constant adjustment, $k_{fi\mathrm{adj}}$ takes into
  account unsaturation ($un_f$), charge ($ch_f$), curvature ($cu_f$),
@@ -639,6 +641,175 @@ rm(grid)
  @ 
  
  
+\section{Simulation Methodology}
+
+\subsection{Overall Architecture}
+
+The simulation is currently run by single program written in perl
+using various different modules to handle the subsidiary parts. It
+produces output for each generation, including the step immediately
+preceeding and immediately following a vesicle split (and optionally,
+each step) that is written to a state file which contains the state of
+the vesicle, environment, kinetic parameters, program invocation
+options, time, and various other parameters necessary to recreate the
+state vector at a given time. This output file is then read by a
+separate program in perl to produce different output which is
+generated out-of-band; output which includes graphs and statistical
+analysis is performed using R (and various grid graphics modules)
+called from the perl program.
+
+The separation of simulation and output generation allows refining
+output, and simulations performed on different versions of the
+underlying code to be compared using the same analysis methods and
+code. It also allows later simulations to be restarted from a specific
+generation, utilizing the same environment.
+
+Random variables of different distributions are calculated using
+Math::Random, which is seeded for each run using entropy from the
+linux kernel's urandom device.
+
+\subsection{Environment Creation}
+
+\subsubsection{Components}
+The environment contains different concentrations of different
+components. In the current set of simulations, there are
+\Sexpr{1+4*5*7} different components, consisting of PC, PE, PS, SM,
+and CHOL, with all lipids except for CHOL having 5 possible
+unsaturations rangiong from 0 to 4, and 7 lengths from $12,14,...,22$
+($7\cdot 5\cdot4+1=\Sexpr{1+4*5*7}$). In cases where the environment
+has less than the maximum number of components, the components are
+selected in order without replacement from a randomly shuffled deck of
+component (with the exception of CHOL) represented once until the
+desired number of unique components are obtained. CHOL is over
+representated $\Sexpr{5*7}$ times to be at the level of other lipid
+types, ensures that the probability of CHOL being asbent in the
+environment is the same as the probability of one of the other lipid
+types (PS, PE, etc.) being entirely absent. This reduces the number of
+simulations with a small number of components which are entirely
+devoid of CHOL.
+
+\subsubsection{Concentration}
+Once the components of the environment have been selected, the
+concentration of thoes components is determined. In experiments where
+the environmental concentration is the same across all lipid
+components, the concentration is set to $10^{-10}\mathrm{M}$. In other
+cases, the environmental concentration is set to a random number from
+a gamma distribution with shape parameter 2 and an average of
+$10^{-10}\mathrm{M}$. The base concentration ($10^{-10}\mathrm{M}$)
+can also be altered in the initialization of the experiment to
+specific values for specific lipid types or components.
+
+\subsection{Initial Vesicle Creation}
+
+Initial vesicles are seeded by selecting lipid molecules from the
+environment until the vesicle reaches a specific starting size. The
+vesicle starting size has gamma distribution with shape parameter 2
+and a mean of the experimentally specified starting size, with a
+minimum of 5 lipid molecules. Lipid molecules are then selected to be
+added to the lipid membrane according to four different methods. In
+the constant method, molecules are added in direct proportion to their
+concentration in the environment. The uniform method adds molecules in
+proportion to their concentration in the environment scaled by a
+uniform random value, whereas the random method adds molecules in
+proportion to a uniform random value. The final method is a binomial
+method, which adjust the porbability of adding a molecule of a
+specific component by the concentration of that component, and then
+adds the molecules one by one to the membrane. This final method is
+also used in the first three methods to add any missing molecules to
+the starting vesicle which were unallocated due to the requirement to
+add integer numbers of molecules.
+
+\subsection{Simulation Step}
+
+Once the environment has been created and the initial vesicle has been
+formed, molecules join and leave the vesicle based on the kinetic
+parameters and state equation discussed until the vesicle splits
+forming two daughter vesicles, one of which the program continues to
+follow.
+
+\subsubsection{Calculation of Vesicle Properties}
+
+\label{sec:ves_prop}
+$S_\mathrm{ves}$ is the surface area of the vesicle, and is the sum of
+the surface area of all of the individual lipid components; each lipid
+type has a different surface area; we na\"ively assume that the lipid
+packing is optimal, and there is no empty space.
+
+\subsubsection{Joining and Leaving of Lipid Molecules}
+
+Determining the number of molecules to add to the lipid membrane
+($n_i$) requires knowing $k_{fi\mathrm{adj}}$, the surface area of the
+vesicle $S_\mathrm{ves}$ (see~\fref{sec:ves_prop}), the time interval
+$dt$ during which lipids are added, the base $k_{fi}$, and the
+concentration of the monomer in the environment
+$\left[C_{i\mathrm{ves}}\right]$ (see~\fref{eq:state}).
+$k_{fi\mathrm{adj}}$ is calculated (see~\fref{eq:kf_adj}) based on the
+vesicle properties and their hypothesized effect on the rate (in as
+many cases as possible, experimentally based)
+(see~\fref{sec:forward_adj} for details). $dt$ can be varied
+(see~\fref{sec:step_duration}), but for a given step is constant. This
+leads to the following:
+
+$n_i = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves}dt\mathrm{NA}$
+
+In the cases where $n_i > 1$, the integer number of molecules is
+added. Fractional $n_i$ or the fractional remainder after the addition
+of the integer molecules are the probability of adding a specific
+molecule, and are compared to a uniformly distributed random value
+between 0 and 1. If the random value is less than or equal to the
+fractional part of $n_i$, an additional molecule is added.
+
+Molecules leaving the vesicle are handled in a similar manner, with 
+
+$n_i = k_{bi}k_{bi\mathrm{adj}}C_{i_\mathrm{ves}}dt\mathrm{NA}$.
+
+While programatically, the molecule removal happens after the
+addition, the properties that each operates on are the same, so they
+can be considered to have been added and removed at the same instant.
+[This also avoids cases where a removal would have resulted in
+negative molecules for a particular type, which is obviously
+disallowed.]
+
+\subsubsection{Step duration}
+\label{sec:step_duration}
+
+$dt$ is the time taken for each step of joining, leaving, and checking
+split. In the current implementation, it starts with a value of
+$10^{-6}\mathrm{s}$ but this is modified in between each step if the
+number of molecules joining or leaving is too large or small. If more
+than half of the starting vesicle size molecules join or leave in a
+single step, $dt$ is reduced by half. If less than the starting
+vesicle size molecules join or leave in 100 steps, $dt$ is doubled.
+This is necessary to curtail run times and to automatically adjust to
+different experimental runs.
+
+(In every run seen so far, the initial $dt$ was too small, and was
+increased before the first generation occured; at no time was $dt$ too
+large.)
+
+\subsubsection{Vesicle splitting}
+
+If a vesicle has grown to a size which is more than double the
+starting vesicle size, the vesicle splits. More elaborate mechanisms
+for determining whether a vesicle should split are of course possible,
+but not currently implemented. Molecules are assigned to the daughter
+vesicles at random, with each daughter vesicle having an equal chance
+of getting a single molecule. The number of molecules to assign to the
+first vesicle has binomial distribution with a probability of an event
+in each trial of 0.5 and a number of trials equal to the number of
+molecules.
+
+\subsection{Output}
+
+The environment, initial vesicle, and the state of the vesicle
+immediately before and immediately after splitting are stored to disk
+to produce later output.
+
+\section{Analyzing output}
+
+\subsection{PCA plots}
+
+\subsection{Carpet plots}