From 0aaf1bc43b61120a6f93ebc334942780e497e24d Mon Sep 17 00:00:00 2001
From: don <don@25fa0111-c432-4dab-af88-9f31a2f6ac42>
Date: Mon, 9 Aug 2010 06:34:10 +0000
Subject: [PATCH] update kinetic formalism with program notes and add omer
 points

git-svn-id: svn+ssh://hemlock.ucr.edu/srv/svn/misc/trunk/origins_of_life@595 25fa0111-c432-4dab-af88-9f31a2f6ac42
---
 kinetic_formalism.Rnw | 177 +++++++++++++++++++++++++++++++++++++++++-
 omer_points.txt       |  28 +++++++
 2 files changed, 202 insertions(+), 3 deletions(-)
 create mode 100644 omer_points.txt

diff --git a/kinetic_formalism.Rnw b/kinetic_formalism.Rnw
index afd5984..c3541f4 100644
--- a/kinetic_formalism.Rnw
+++ b/kinetic_formalism.Rnw
@@ -81,9 +81,10 @@ $\left[C_{i_\mathrm{monomer}}\right]$, the surface area of the vesicle
 $S_\mathrm{ves}$, the base backwards kinetic parameter for the $i$th
 specie $k_{bi}$ which is also dependent on lipid type, its adjustment
 parameter $k_{bi\mathrm{adj}}$ (see~\fref{eq:kb_adj}), and the molar
-concentration of the $i$th specie in the vesicle $C_{i_\mathrm{ves}}$,
-the change in concentration of the $i$th specie in the vesicle per
-change in time $\frac{d C_{i_\mathrm{ves}}}{dt}$ can be calculated:
+concentration of the $i$th specie in the vesicle
+$\left[C_{i_\mathrm{ves}}\right]$, the change in concentration of the
+$i$th specie in the vesicle per change in time $\frac{d
+  C_{i_\mathrm{ves}}}{dt}$ can be calculated:
 
 \begin{equation}
   \frac{d C_{i_\mathrm{ves}}}{dt} = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves} -
@@ -111,6 +112,7 @@ $\mathrm{n}$, Thus, we have
 The 1000 isn't in \fref{eq:state} above, because it is unit-dependent.
 
 \subsection{Forward adjustments ($k_{fi\mathrm{adj}}$)}
+\label{sec:forward_adj}
 
 The forward rate constant adjustment, $k_{fi\mathrm{adj}}$ takes into
 account unsaturation ($un_f$), charge ($ch_f$), curvature ($cu_f$),
@@ -639,6 +641,175 @@ rm(grid)
 @ 
 
 
+\section{Simulation Methodology}
+
+\subsection{Overall Architecture}
+
+The simulation is currently run by single program written in perl
+using various different modules to handle the subsidiary parts. It
+produces output for each generation, including the step immediately
+preceeding and immediately following a vesicle split (and optionally,
+each step) that is written to a state file which contains the state of
+the vesicle, environment, kinetic parameters, program invocation
+options, time, and various other parameters necessary to recreate the
+state vector at a given time. This output file is then read by a
+separate program in perl to produce different output which is
+generated out-of-band; output which includes graphs and statistical
+analysis is performed using R (and various grid graphics modules)
+called from the perl program.
+
+The separation of simulation and output generation allows refining
+output, and simulations performed on different versions of the
+underlying code to be compared using the same analysis methods and
+code. It also allows later simulations to be restarted from a specific
+generation, utilizing the same environment.
+
+Random variables of different distributions are calculated using
+Math::Random, which is seeded for each run using entropy from the
+linux kernel's urandom device.
+
+\subsection{Environment Creation}
+
+\subsubsection{Components}
+The environment contains different concentrations of different
+components. In the current set of simulations, there are
+\Sexpr{1+4*5*7} different components, consisting of PC, PE, PS, SM,
+and CHOL, with all lipids except for CHOL having 5 possible
+unsaturations rangiong from 0 to 4, and 7 lengths from $12,14,...,22$
+($7\cdot 5\cdot4+1=\Sexpr{1+4*5*7}$). In cases where the environment
+has less than the maximum number of components, the components are
+selected in order without replacement from a randomly shuffled deck of
+component (with the exception of CHOL) represented once until the
+desired number of unique components are obtained. CHOL is over
+representated $\Sexpr{5*7}$ times to be at the level of other lipid
+types, ensures that the probability of CHOL being asbent in the
+environment is the same as the probability of one of the other lipid
+types (PS, PE, etc.) being entirely absent. This reduces the number of
+simulations with a small number of components which are entirely
+devoid of CHOL.
+
+\subsubsection{Concentration}
+Once the components of the environment have been selected, the
+concentration of thoes components is determined. In experiments where
+the environmental concentration is the same across all lipid
+components, the concentration is set to $10^{-10}\mathrm{M}$. In other
+cases, the environmental concentration is set to a random number from
+a gamma distribution with shape parameter 2 and an average of
+$10^{-10}\mathrm{M}$. The base concentration ($10^{-10}\mathrm{M}$)
+can also be altered in the initialization of the experiment to
+specific values for specific lipid types or components.
+
+\subsection{Initial Vesicle Creation}
+
+Initial vesicles are seeded by selecting lipid molecules from the
+environment until the vesicle reaches a specific starting size. The
+vesicle starting size has gamma distribution with shape parameter 2
+and a mean of the experimentally specified starting size, with a
+minimum of 5 lipid molecules. Lipid molecules are then selected to be
+added to the lipid membrane according to four different methods. In
+the constant method, molecules are added in direct proportion to their
+concentration in the environment. The uniform method adds molecules in
+proportion to their concentration in the environment scaled by a
+uniform random value, whereas the random method adds molecules in
+proportion to a uniform random value. The final method is a binomial
+method, which adjust the porbability of adding a molecule of a
+specific component by the concentration of that component, and then
+adds the molecules one by one to the membrane. This final method is
+also used in the first three methods to add any missing molecules to
+the starting vesicle which were unallocated due to the requirement to
+add integer numbers of molecules.
+
+\subsection{Simulation Step}
+
+Once the environment has been created and the initial vesicle has been
+formed, molecules join and leave the vesicle based on the kinetic
+parameters and state equation discussed until the vesicle splits
+forming two daughter vesicles, one of which the program continues to
+follow.
+
+\subsubsection{Calculation of Vesicle Properties}
+
+\label{sec:ves_prop}
+$S_\mathrm{ves}$ is the surface area of the vesicle, and is the sum of
+the surface area of all of the individual lipid components; each lipid
+type has a different surface area; we na\"ively assume that the lipid
+packing is optimal, and there is no empty space.
+
+\subsubsection{Joining and Leaving of Lipid Molecules}
+
+Determining the number of molecules to add to the lipid membrane
+($n_i$) requires knowing $k_{fi\mathrm{adj}}$, the surface area of the
+vesicle $S_\mathrm{ves}$ (see~\fref{sec:ves_prop}), the time interval
+$dt$ during which lipids are added, the base $k_{fi}$, and the
+concentration of the monomer in the environment
+$\left[C_{i\mathrm{ves}}\right]$ (see~\fref{eq:state}).
+$k_{fi\mathrm{adj}}$ is calculated (see~\fref{eq:kf_adj}) based on the
+vesicle properties and their hypothesized effect on the rate (in as
+many cases as possible, experimentally based)
+(see~\fref{sec:forward_adj} for details). $dt$ can be varied
+(see~\fref{sec:step_duration}), but for a given step is constant. This
+leads to the following:
+
+$n_i = k_{fi}k_{fi\mathrm{adj}}\left[C_{i_\mathrm{monomer}}\right]S_\mathrm{ves}dt\mathrm{NA}$
+
+In the cases where $n_i > 1$, the integer number of molecules is
+added. Fractional $n_i$ or the fractional remainder after the addition
+of the integer molecules are the probability of adding a specific
+molecule, and are compared to a uniformly distributed random value
+between 0 and 1. If the random value is less than or equal to the
+fractional part of $n_i$, an additional molecule is added.
+
+Molecules leaving the vesicle are handled in a similar manner, with 
+
+$n_i = k_{bi}k_{bi\mathrm{adj}}C_{i_\mathrm{ves}}dt\mathrm{NA}$.
+
+While programatically, the molecule removal happens after the
+addition, the properties that each operates on are the same, so they
+can be considered to have been added and removed at the same instant.
+[This also avoids cases where a removal would have resulted in
+negative molecules for a particular type, which is obviously
+disallowed.]
+
+\subsubsection{Step duration}
+\label{sec:step_duration}
+
+$dt$ is the time taken for each step of joining, leaving, and checking
+split. In the current implementation, it starts with a value of
+$10^{-6}\mathrm{s}$ but this is modified in between each step if the
+number of molecules joining or leaving is too large or small. If more
+than half of the starting vesicle size molecules join or leave in a
+single step, $dt$ is reduced by half. If less than the starting
+vesicle size molecules join or leave in 100 steps, $dt$ is doubled.
+This is necessary to curtail run times and to automatically adjust to
+different experimental runs.
+
+(In every run seen so far, the initial $dt$ was too small, and was
+increased before the first generation occured; at no time was $dt$ too
+large.)
+
+\subsubsection{Vesicle splitting}
+
+If a vesicle has grown to a size which is more than double the
+starting vesicle size, the vesicle splits. More elaborate mechanisms
+for determining whether a vesicle should split are of course possible,
+but not currently implemented. Molecules are assigned to the daughter
+vesicles at random, with each daughter vesicle having an equal chance
+of getting a single molecule. The number of molecules to assign to the
+first vesicle has binomial distribution with a probability of an event
+in each trial of 0.5 and a number of trials equal to the number of
+molecules.
+
+\subsection{Output}
+
+The environment, initial vesicle, and the state of the vesicle
+immediately before and immediately after splitting are stored to disk
+to produce later output.
+
+\section{Analyzing output}
+
+\subsection{PCA plots}
+
+\subsection{Carpet plots}
 
 
 
diff --git a/omer_points.txt b/omer_points.txt
new file mode 100644
index 0000000..d5ed105
--- /dev/null
+++ b/omer_points.txt
@@ -0,0 +1,28 @@
+
+
+Points for Pap318 comparing R-GARD to A-GARD
+
+0. A main goal is to compare the two abovementioned GARD models and point
+out similarities and differences. We note the constraints on such a
+comparison, as the models are inherently different in some respects. But a
+later part of the paper will also deal with analyses and properties of R-
+GARD irrespective of such comparison.
+
+1. To better compare R-GARD with A- GARD, IL will generate A-GARD carpets
+using Euclidean distance instead of the dot product (H), which we currently
+use.  We will also compare internally the difference between these two
+metrics when applied to the same data.
+
+2.  We will also calculate the compotype counts of A-GARD with kmeans
+clustering by using Euclidean distance instead of dot product.  We should
+check how this change affects the resulting number of compotypes. If for
+some reason this does not work easily, we will try a more phenomenological
+comparison.
+
+3.  Choose carpets from R-Gard results for us to compare with similar
+carpets from A-GARD.  These carpets should exhibit repeating compotypes,
+drift, and composomes.
+
+4.  Calculate and display statistics to emphasize the dependence of various
+results on the parameters of R-GARD.  Example:  number of compotypes
+dependency on split size.
-- 
2.39.2