This product includes software developed by the Apache Software Foundation (http://www.apache.org).

This product includes software developed by the University of California, Berkeley and its contributors.

Arachne 2.0 Manual

Overview
About your data
Installation
Preparing your data for assembly
Running Arachne
Output
Generating ace files
Contacting us

Overview

Arachne is a tool for assembling genome sequence from whole-genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends.

As input, Arachne requires the base calls and associated quality scores of each read (as produced by most base-calling software, such as PHRED), as well as ancillary information about each read (in a standard format described herein).

As output, Arachne produces a list of supercontigs ("scaffolds") -- each of which consists of an ordered list of contigs, all forward-oriented, and estimates for the gaps between them within the supercontig. Base calls and quality scores are provided for each contig, along with the approximate locations of the reads used to build it. Arachne also produces a summary and brief analysis of the assembly.

Many of Arachne's algorithms are described in "ARACHNE: A Whole-Genome Shotgun Assembler", Genome Research, January 2002, and "Whole-Genome Sequence Assembly for Mammalian Genomes: ARACHNE 2", Genome Research, January 2003.

About your data

We explain here the assumptions Arachne makes about your sequence reads.

These reads come from the entire genome of an organism or a cloned fragment thereof (but not both simultaneously), which we call the target. The target should be from a single haplotype: Arachne does not support the assembly of polymorphic data at this time.

Regardless of the target, Arachne assumes most sequence reads have been obtained randomly from it, either as single reads ("unpaired production reads") or as clone-end pairs ("paired production reads"). The bulk of reads provided to Arachne should be pairs from the latter category, because this pairing information is needed to assemble correctly.

To use Arachne, make a sensible division of your sequence reads into libraries (to use the term loosely), preferably placing similar reads together. Thus, there may be several libraries of paired production reads, each having different insert length statistics (mean and standard deviation, which must be the same for all reads in a single library). These library statistics must be provided to Arachne via the ancillary data files or the configuration file, as will be described later.

Arachne does have limited support for transposon-based read pairs, but does not handle other types of finishing reads at this time. If such reads are presented to the program as unpaired production reads, the performance may be acceptable, but Arachne will treat the reads as though they were obtained randomly from the genome.

Installation

Arachne is available as compiled binaries for a single platform: Compaq Alpha hardware, running Tru64 Unix, operating system version 5.1. This exact platform is required.

The Arachne source code, while unsupported, is available at ftp://ftp.broad.mit.edu/pub/wga/Arachne/Arachne_src.tar.gz.

Here is the procedure for installing Arachne from the compiled binaries:

Step 1: Install prerequisite software

The LaTeX suite of text-processing software, including executables for latex and dvips. These items may already be on your system. Otherwise, install the LaTeX suite, for example by downloading one of the free implementations from http://www.tug.org. We use the web2c implementation.
The compression utility gzip and the traceback utility addr2line (from the binutils package), provided by the Free Software Foundation (http://www.gnu.org).

Step 2: Create the Arachne binary directory

Pick a location on your system for the Arachne binaries, then get and unpack ftp://ftp.broad.mit.edu/pub/wga/Arachne/Arachne_bin.tar.gz into that location.

Step 3: Create the main Arachne data directory

Pick a location on your system where your data will go, then get and unpack ftp://ftp.broad.mit.edu/pub/wga/Arachne/Arachne_data.tar.gz into that location. All users of Arachne must set the environment variable ARACHNE_PRE to the full path of Arachne_data.

Step 4: Supplement the vector file

In the main Arachne data directory, the file vector/contigs.fasta includes a broad selection of vector sequences. Be sure to add your sequencing vectors to this file if they are not already in it.

Step 5: Test Arachne on a small mouse project

Go to the Arachne binary directory and type

Assemble DATA=mouse_example RUN=run

Wait until it finishes. A report about the project should be produced as mouse_example/run/assembly.ps, inside the main Arachne data directory. View this file to confirm that the assembly yielded one supercontig, consisting of 52 contigs, one of which is misassembled. Verify that mouse_example/run/assembly.log ends with a message regarding normal termination.

This does not guarantee that Arachne is installed correctly; however, it should reveal any major problems.

Note. If typing Assemble produces an error message about "command not found", you probably need to add "." to your Unix path. You can see your path by typing "set | grep path".

Note. If typing Assemble produces an error message from the loader, you may not have version 5.1 of the operating system. Type "uname -r" to find out.

Preparing your data for assembly

Data and run directories

For each sequencing project to be assembled, create a subdirectory (referred to hereafter as DATA) of the main Arachne data directory, which contains the source data for the project. Inside DATA, create a subdirectory for each particular assembly (referred to as RUN), into which assembly output files are to be placed. DATA and RUN are always specified relative to their parent directories. For example, if the main Arachne data directory is /seq/Arachne_stuff, and DATA=sequoia and RUN=May1, then DATA really refers to /seq/Arachne_stuff/sequoia, and RUN really refers to /seq/Arachne_stuff/sequoia/May1.

Notes: DATA is simply a partial path to a subdirectory of the main Arachne data directory, so nesting is allowed. For example, if in the above example we were to specify DATA=projects/yeast, it would refer to /seq/Arachne_stuff/projects/yeast. Also, soft linking is allowed and probably necessary for very large sequencing projects. Finally, Arachne automatically creates the RUN directory if it does not already exist.
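To make the path rules concrete, here is a minimal Python sketch that composes the two directories. The helper name resolve_dirs is hypothetical (Arachne is driven from the command line, not through a Python API); it simply joins ARACHNE_PRE, DATA, and RUN as described above and does not resolve soft links.

```python
import os


def resolve_dirs(arachne_pre, data, run):
    """Return the absolute DATA and RUN directories (hypothetical helper).

    DATA may be a nested relative path such as "projects/yeast"; it is
    interpreted relative to the main Arachne data directory (ARACHNE_PRE).
    RUN is interpreted relative to DATA.
    """
    data_dir = os.path.join(arachne_pre, data)
    run_dir = os.path.join(data_dir, run)
    return data_dir, run_dir


# The example from the text:
print(resolve_dirs("/seq/Arachne_stuff", "sequoia", "May1"))
# ('/seq/Arachne_stuff/sequoia', '/seq/Arachne_stuff/sequoia/May1')
```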

Data files

The data files provided to Arachne as input reside in the DATA directory:

read sequence files in fasta format. Any and all files of the form reads.fasta, reads.fasta.gz, fasta/fasta.*, fasta/*.fasta, or fasta/*.fasta.gz will be used.
read quality score files in fasta format. Any and all files of the form reads.qual, reads.qual.gz, qual/qual.*, qual/*.qual, or qual/*.qual.gz will be used. The quality score files must match the read sequence files on a file-by-file basis.
XML ancillary files: any files of the form traceinfo/*xml* are used. The requirements regarding these files are described later.
reads_config.xml: the assembly configuration file, described later.
reads.to_exclude: the optional read exclusion file, described later.
genome.size: a text file containing the estimated genome size on the first line, and nothing else.
nhaplotypes: an optional text file containing the number of haplotypes in the data set on the first line, and nothing else (allowed values: 1 or 2; default value: 2).
mitochondrial.fasta (optional): a fasta file containing sequence contigs for the mitochondrial genome of the organism being sequenced. Sequence reads matching these contigs are not used in the assembly.
contigs.fasta (optional, but highly recommended): sequence contigs, called known contigs, which Arachne uses in report generation to evaluate the quality of the assembly by aligning its contigs against the known contigs.
defaults (optional): arguments that are always to be fed to the Assemble executable (as described below), but which may be overridden by command-line arguments given directly to it.
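As a pre-flight check, the file-name patterns above can be turned into a small Python script that lists what Arachne would pick up from DATA. This is an illustrative sketch, not part of Arachne: the function names are hypothetical, and comparing file counts is only a rough proxy for the file-by-file correspondence between sequence and quality files that Arachne requires.

```python
import glob
import os

# Patterns copied from the list above, relative to DATA.
READ_PATTERNS = ["reads.fasta", "reads.fasta.gz", "fasta/fasta.*",
                 "fasta/*.fasta", "fasta/*.fasta.gz"]
QUAL_PATTERNS = ["reads.qual", "reads.qual.gz", "qual/qual.*",
                 "qual/*.qual", "qual/*.qual.gz"]


def find_inputs(data_dir, patterns):
    """Return the sorted, de-duplicated files under data_dir matching any pattern."""
    found = set()
    for pattern in patterns:
        found.update(glob.glob(os.path.join(data_dir, pattern)))
    return sorted(found)


def check_read_files(data_dir):
    """Collect sequence and quality files and check that their counts agree.

    Equal counts are necessary, not sufficient, for the file-by-file
    correspondence Arachne requires.
    """
    reads = find_inputs(data_dir, READ_PATTERNS)
    quals = find_inputs(data_dir, QUAL_PATTERNS)
    if not reads:
        raise FileNotFoundError(f"no read sequence files found in {data_dir}")
    if len(reads) != len(quals):
        raise ValueError(f"{len(reads)} sequence files but {len(quals)} quality files")
    return reads, quals
```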

XML ancillary files

The files DATA/traceinfo/*xml* contain ancillary data about the reads, which is in the Trace Archive XML format (http://www.ncbi.nlm.nih.gov/Traces/TraceArchiveRFC.html). As described in the next section, this ancillary data may be modified and supplemented with the aid of the configuration file.

We use only a subset of the fields specified in the Trace Archive XML definition:

trace_name: The name of the read, which must be unique. Required.
plate_id: The name of the plate on which the read resides. For paired production reads, it is normal practice to designate the same plate_id for two physical plates, one having the forward reads and the other having the reverse reads. All reads with a given plate_id must be in the same library. Required.
well_id: The well on the plate that the read came from. Required.
template_id: The name of the template (insert). Arachne identifies forward-reverse read pairs as those sharing the same template_id. Required for reads designated "paired production" or "transposon". The concept of a template for a transposon here is simply a kludge to associate a pair of transposon reads from the same transposon event, so there should be a different template id for each transposon event. Reads with the same template_id must be in the same library.
insert_size: The estimated insert size, in bases, for paired production reads. The estimated separation, in bases, for transposon reads. Required to be non-zero for reads designated as "paired production" or "transposon" (see the type field below). Moreover, for paired production reads the insert_size must be at least 400; this requirement is intended to catch situations where, e.g., a length of 2 was used to mean 2000.
insert_stdev: The standard deviation of the insert size, in bases, for paired production reads. The standard deviation of the separation, in bases, for transposon reads. Required to be non-zero for reads designated "paired production" or "transposon".
trace_end: The direction of the read on its insert (either F for forward or R for reverse). Required for reads designated "paired production" or "transposon".
seq_lib_id: The name of the library containing the read. Some centers instead use the field library_id for this, and Arachne will look for a library_id if no seq_lib_id is specified. Required.
center_name: The center from which the read came. Optional.
ti: The trace archive number. Optional.

In addition, we use a field that is not part of the Trace Archive Format and therefore must be set using the configuration file (described in the next section).

type: "paired_production", "unpaired_production", or "transposon". Required.
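The requirements listed above can be collected into a checker you might run over your ancillary data before assembling. This Python sketch is hypothetical: it represents each read as a plain dict of field names to strings (after any configuration-file rules, so that type is present), encodes only the rules stated in this section, and is not Arachne's own validation code.

```python
VALID_TYPES = {"paired_production", "unpaired_production", "transposon"}
PAIRED_TYPES = {"paired_production", "transposon"}


def validate_read(rec):
    """Return a list of problems found in one read's fields (empty if none)."""
    problems = []
    for field in ("trace_name", "plate_id", "well_id"):
        if not rec.get(field):
            problems.append(f"missing {field}")
    # Arachne falls back to library_id when seq_lib_id is absent.
    if not (rec.get("seq_lib_id") or rec.get("library_id")):
        problems.append("missing seq_lib_id (or library_id)")

    rtype = rec.get("type")
    if rtype not in VALID_TYPES:
        problems.append(f"missing or invalid type: {rtype!r}")
    if rtype in PAIRED_TYPES:
        for field in ("template_id", "trace_end"):
            if not rec.get(field):
                problems.append(f"missing {field}")
        if rec.get("trace_end") and rec["trace_end"] not in ("F", "R"):
            problems.append("trace_end must be F or R")
        size = int(rec.get("insert_size") or 0)
        stdev = int(rec.get("insert_stdev") or 0)
        if size == 0:
            problems.append("insert_size must be non-zero")
        if stdev == 0:
            problems.append("insert_stdev must be non-zero")
        # Catches e.g. "2" written where 2000 was meant.
        if rtype == "paired_production" and 0 < size < 400:
            problems.append(f"insert_size {size} is below the minimum of 400")
    return problems
```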

Important: Every read must appear in exactly three places in the Arachne input files: in a read sequence file, in a read quality score file, and in an XML ancillary file. Read identities are defined by read names, and read names are determined as follows. For read sequence and read quality score files, we take the rightmost white-space-free string on a ">" line. For example, ">gnl|ti|3 G10P69425RH3.T0" would yield the read name "G10P69425RH3.T0". For XML ancillary files, read names are defined by the TRACE_NAME field.
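The naming rule and the three-places requirement can be sketched in Python as follows. The function names are hypothetical; the sketch assumes you have already extracted the TRACE_NAME values from the XML files, and it checks only that every name appears in all three sources and is not duplicated within one.

```python
from collections import Counter


def read_name(header):
    """Read name from a fasta '>' line: its rightmost whitespace-free string."""
    if not header.startswith(">"):
        raise ValueError(f"not a fasta header: {header!r}")
    tokens = header[1:].split()
    if not tokens:
        raise ValueError("empty fasta header")
    return tokens[-1]


def names_in_fasta(lines):
    """Read names from every '>' line of a sequence or quality file."""
    return [read_name(line) for line in lines if line.startswith(">")]


def incomplete_reads(seq_names, qual_names, xml_names):
    """Names present in some, but not all, of the three input sources."""
    seq, qual, xml = set(seq_names), set(qual_names), set(xml_names)
    return (seq | qual | xml) - (seq & qual & xml)


def duplicated(names):
    """Names that occur more than once within one source."""
    return {name for name, n in Counter(names).items() if n > 1}


print(read_name(">gnl|ti|3 G10P69425RH3.T0"))  # G10P69425RH3.T0
```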

Configuration file

The configuration file (reads_config.xml) allows you to correct and augment the information presented to Arachne via the XML files.

For example, if any of the required fields are missing from your XML ancillary files, you do not need to modify the XML file itself before running Arachne. You can simply write a configuration file that will provide the missing information to Arachne.

Also, the configuration file allows for an easy way to set parameters that are common to a group of reads. For instance, below we demonstrate how to set insert size and insert size standard deviation for a particular library.

Note that you must use the configuration file to set the type field. If you try to include the type field in the XML file, Arachne will fail because the XML file will not conform to the Trace Archive Format specification (see above).

Here we give a somewhat informal explanation of how the configuration files are constructed. However, as an XML file, the configuration file has a formal "document type" definition, that can be found in the file dtds/configuration.dtd (in the main Arachne data directory).

Begin the configuration file with the following text:

<?xml version="1.0"?> 
<!DOCTYPE configuration SYSTEM "configuration.dtd"> 
<configuration>

and end it with:

</configuration>

The types of constructs that can be put in between are comments, macros, and most importantly, rules.

Comments are in the standard XML format, for example:

	<!-- ******** Some contaminated reads, to be tossed ******** -->

Macros facilitate abbreviation, pointless or otherwise, for example:

	<macro name="gh">gringlehopper</macro>

would change every subsequent occurrence of the string $gh to the string gringlehopper. Any text could have been used in place of gringlehopper.

Rules require more explanation, because they have nontrivial syntax. For example,

        <rule>
             <name> exclude probable human reads </name>
             <match>
                  <match_field>plate_id</match_field>
                  <literal>G10P6007</literal>
             </match>
             <match>
                  <match_field>plate_id</match_field>
                  <regex>^G10P613[01]$</regex>
             </match>
             <action><remove /></action>
        </rule>

would cause all reads having plate_id G10P6007, G10P6130, or G10P6131 to be ignored by Arachne.

More generally, a rule is defined by three fields:

<name>: Explanatory title. One per rule or none at all.
<match>: Defines which reads are affected by the rule. One or more per rule.
<action>: Defines what happens to those reads affected by the rule, namely those reads specified in one or more of the match fields. Exactly one per rule.

A given rule can have more than one <match> tag. If there is more than one, then the rule is applied to reads that match any of those <match> tags.

Each <match> tag contains the names of the fields it checks (in <match_field> tags) and the values it expects in those fields (in <literal> or <regex> tags). If all of the specified fields match the expected values, the <match> is made and the rule's action is applied.

A <match> tag requires one or more <match_field><literal> or <match_field><regex> pairs.

<match_field>: The field from the XML ancillary data to test for a matching read, in lower-case. This should be followed by either a <literal> tag or a <regex> tag.
<literal>: A literal string to match against the contents of the specified <match_field>.
<regex>: A regular expression to match against the contents of the specified <match_field>.
Since it is a regular expression, be aware that a regex like <regex>0</regex> will match not only zero, but any string that contains a zero, such as "500" or "asdf0qwerty". If your intention is to match exactly some value, use the <literal> tag or use the start- and end-of-line markers ("^" and "$") to restrict your regular expression to match exactly the entire string in that field, e.g. <literal>0</literal> or <regex>^0$</regex>.
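The matching semantics described above (fields within one <match> are combined with AND, separate <match> tags with OR, <literal> is an exact comparison, <regex> is an unanchored search) can be modelled in a few lines of Python. This is a sketch under assumptions: Python's re module stands in for whatever regular-expression engine Arachne actually uses, rules are transcribed by hand into lists of (field, kind, pattern) triples rather than parsed from XML, and a missing field is treated as an empty string.

```python
import re


def condition_holds(value, kind, pattern):
    """One <match_field> test: exact comparison for <literal>,
    unanchored regular-expression search for <regex>."""
    if kind == "literal":
        return value == pattern
    if kind == "regex":
        return re.search(pattern, value) is not None
    raise ValueError(f"unknown test kind {kind!r}")


def match_holds(read, conditions):
    """A single <match>: every listed field must match (logical AND).
    Missing fields are treated as empty strings (an assumption)."""
    return all(condition_holds(read.get(field, ""), kind, pattern)
               for field, kind, pattern in conditions)


def rule_applies(read, matches):
    """A rule fires if any one of its <match> tags holds (logical OR)."""
    return any(match_holds(read, conditions) for conditions in matches)


# The "exclude probable human reads" rule, transcribed by hand.
human_rule = [
    [("plate_id", "literal", "G10P6007")],
    [("plate_id", "regex", r"^G10P613[01]$")],
]
for plate in ["G10P6007", "G10P6130", "G10P6131", "G10P6132"]:
    print(plate, rule_applies({"plate_id": plate}, human_rule))
```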

The <action> tag requires one of the following sub-tags. Only one type of sub-tag is allowed for a single <action> tag, though multiple <set> tags are allowed in a single <action> tag.

<remove />: Remove any matching read. Only one <remove /> tag is allowed in each <action> tag.
<unpair />: Remove any pairing information for matching reads. Only one <unpair /> tag is allowed in each <action> tag.
<set>: Set a field for any matching read. The syntax is
```
        <set>
             <set_field> ... </set_field>
             <value> ... </value>
        </set>
```
but there may be more than one set tag within a given <action>.

Values of other fields associated with the matching read may be referred to in <value> tags by prepending the name of the field with an "@". Also, integer arithmetic evaluation will occur when setting numeric fields such as insert_size and insert_stdev. For example,

      <set>
         <set_field> insert_stdev </set_field>
         <value>@insert_size/10</value>
      </set>

will set insert_stdev to 10% of insert_size.
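The following Python sketch illustrates the idea of @field substitution followed by integer arithmetic. It is not Arachne's evaluator: the supported operators (+, -, *, /) and the choice of floor division for "/" are assumptions, and for negative operands floor division differs from C-style truncation.

```python
import ast
import operator
import re

# Only + - * / on integers; "/" is taken to be integer (floor) division.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.floordiv}


def _eval(node):
    """Evaluate a parsed expression, allowing only integer arithmetic."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def expand_fields(value, read):
    """Substitute each @field with that field's value for the current read."""
    return re.sub(r"@(\w+)", lambda m: str(read[m.group(1)]), value)


def evaluate_numeric(value, read):
    """Expand @field references, then evaluate the result as integer arithmetic."""
    expr = expand_fields(value, read).strip()
    return _eval(ast.parse(expr, mode="eval"))


print(evaluate_numeric("@insert_size/10", {"insert_size": "4000"}))  # 400
```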

Here is another example of a rule that sets the insert statistics for all reads whose names begin with G20:

      <rule> 
         <name> set insert stats for G20 reads </name> 
         <match>
            <match_field>trace_name</match_field>
            <regex>^G20</regex>
         </match> 
         <action> 
            <set>
               <set_field>insert_size</set_field>
               <value>4000</value>
            </set> 
            <set>
               <set_field>insert_stdev</set_field>
               <value>400</value>
            </set> 
         </action> 
      </rule>

Finally, we give an example that shows how to designate every read as being a paired production read:

      <rule> 
         <name> all reads are paired production reads </name> 
         <match>
            <match_field>trace_name</match_field>
            <regex>.</regex>
         </match> 
         <action> 
            <set>
               <set_field>type</set_field>
               <value>paired_production</value>
            </set> 
         </action> 
      </rule>

Rules are applied in the order in which they appear in the configuration file. Interactions between the rules are possible, so the order in which the rules appear may matter.

Additionally, one may provide an exclusion file, a list of read names to be excluded from the assembly. This file should be named "reads.to_exclude" and should be located in the DATA directory. The reads in this file will be excluded prior to the application of any rules.
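To see why rule order matters, here is a hypothetical Python model of the processing order described above: the exclusion list is applied first, then each rule in file order. Only the remove and set actions are modelled; unpair is deliberately left out because its exact effect on the stored fields is not specified here. Predicates are plain Python functions standing in for <match> tags.

```python
def apply_config(reads, excluded, rules):
    """Return the reads that survive, with fields updated by the rules.

    reads:    read name -> dict of fields (not modified; copies are returned)
    excluded: names from the exclusion file; dropped before any rule runs
    rules:    list of (predicate, action) in configuration-file order, where
              action is ("remove",) or ("set", [(field, value), ...])
    """
    current = {name: dict(fields) for name, fields in reads.items()
               if name not in excluded}
    for predicate, (kind, *args) in rules:
        for name in list(current):          # copy keys: "remove" mutates the dict
            fields = current[name]
            if not predicate(fields):
                continue
            if kind == "remove":
                del current[name]
            elif kind == "set":
                for field, value in args[0]:
                    fields[field] = value
            else:
                raise ValueError(f"action {kind!r} is not modelled in this sketch")
    return current


reads = {n: {"trace_name": n, "plate_id": p}
         for n, p in [("r1", "P1"), ("r2", "P2"), ("r3", "P1")]}
everyone_paired = (lambda f: True, ("set", [("type", "paired_production")]))
p2_unpaired = (lambda f: f["plate_id"] == "P2",
               ("set", [("type", "unpaired_production")]))

a = apply_config(reads, {"r3"}, [everyone_paired, p2_unpaired])
b = apply_config(reads, {"r3"}, [p2_unpaired, everyone_paired])
print(a["r2"]["type"], b["r2"]["type"])  # unpaired_production paired_production
```

With the two rules swapped, the catch-all rule runs last and overwrites the plate-specific setting, which is exactly the kind of interaction to watch for.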

Running Arachne

An Arachne assembly must be initiated from the Arachne binary directory, by invoking the main Arachne executable, Assemble, as follows:

Assemble DATA=the_project_directory RUN=the_results_directory

where "the_project_directory" is the name of the data directory and "the_results_directory" is the name of the run directory.

Note. Before running Assemble on your own data, be sure to test your installation by running it on mouse_example, as per the instructions given earlier.

Note. Some experimentation is needed to determine how much memory and disk space are needed for any given Arachne assembly. Running out of either will have unpredictable consequences.

Note. Simultaneous Assemble processes can share the same data directory, but not the same run directory.

Assemble accepts a number of additional command-line arguments, all optional:

Input Processing Options:

num_cpus: The number of cpus Arachne will try to multithread over. In the present release, this only affects the early stages of read processing, and only if the reads and quality scores have been distributed over multiple files. The default value is 1.
config_file: The name of an alternate configuration file (in place of the default file name reads_config.xml).
exclusion_file: The name of an alternate exclusion file (in place of the default file name reads.to_exclude).

Assembly Quality Options:

FAST_RUN=True: By default, Arachne assembles in two passes: the first pass combines some reads together (to form larger "reads"), and then the second pass assembles these. This two-pass approach will usually produce a better assembly, but will also cause Arachne to use more time and memory. If you want to turn off the two-pass feature, put "FAST_RUN=True" on the command line. This is what we have done for mouse assemblies.
maxcliq1: Arachne builds read-read alignments via seed sequences of length 24. If a seed sequence occurs more than maxcliq1 times in the reads, it is ignored. If maxcliq1 is too small, Arachne will not see valid alignments. If it is too large, Arachne will be overwhelmed by alignments between repeat sequences (and consequently, be slow and use more memory). A reasonable value would be 5 to 10 times the coverage of the genome by the reads, although we have used larger values. The default value is 50.
maxcliq2: This is like maxcliq1, but governs the second pass of Arachne assembly if FAST_RUN=False (the default). The default value is 50.
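The role of maxcliq1 can be illustrated with a toy seed counter. This Python sketch counts every 24-base substring on the forward strand only and discards seeds seen more than maxcliq times; Arachne's real implementation (which also has to consider reverse complements and works at genome scale) is not described here, so treat this purely as an illustration of the filtering rule and of the 5x to 10x coverage rule of thumb.

```python
from collections import Counter

SEED_LEN = 24  # seed length stated above


def seed_counts(reads, k=SEED_LEN):
    """Count every length-k substring (seed) across all reads, forward strand only."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def usable_seeds(reads, maxcliq, k=SEED_LEN):
    """Seeds seen at most maxcliq times; more frequent (repeat) seeds are ignored."""
    return {seed for seed, n in seed_counts(reads, k).items() if n <= maxcliq}


def suggested_maxcliq(coverage):
    """The 5x to 10x coverage rule of thumb, as a (low, high) pair."""
    return 5 * coverage, 10 * coverage
```

For example, at 8-fold coverage the rule of thumb suggests a maxcliq1 between 40 and 80, which brackets the default of 50.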

Assembly Improvement Options:


ENLARGE_CONTIGS=True: Attempt to extend contigs by creating new contigs using unplaced partners of placed reads and by merging contigs whose linking information indicates an overlap. This can cause Arachne to be slow and use more memory.
IMPROVE_SUPERS=True: Attempt to extend and improve existing supercontigs, using a variety of strategies described in the Genome Research papers cited in the Overview. There are two sub-options, described below, that affect the algorithms used in the IMPROVE_SUPERS section.
A pair of these parameters affects the "positive breaking" algorithm, where Arachne attempts to find evidence that indicates that two supercontigs should be broken and one piece from each joined together instead. This evidence takes the form of long links from the middle of one supercontig to the middle of another.
Arachne requires that there be a minimum number of these links and that the links be spread over some minimum distance in each supercontig. These parameters are specified on the Assemble command line as min_cluster_size_to_break and min_cluster_spread_to_break, with default values of 5 and 50000, respectively. Both can be any positive (non-zero) value, though we recommend that the min_cluster_spread_to_break should be a significant fraction of the long links' estimated insert size.
Note that this code was designed for whole genome shotgun assemblies, and may not be applicable to smaller assemblies, such as BACs.
PATCH_GAPS=True: Attempt to cross gaps between neighboring contigs by placing partners of reads in those contigs and reads to which those partners align into those gaps. This option may cause Arachne to run more slowly and use more memory. There are two sub-options described below that adjust the parameters of the gap-patching algorithm.
The first of these sub-options is patch_gaps_loops1, which affects how inclusive the algorithm is in selecting reads that might patch a gap. The higher the value of this parameter, the larger this set of reads will be and the better chance you have of collecting a set of reads that will successfully patch a gap. However, using a higher value also increases the possibility of performing an incorrect patch, and the runtime and memory usage of the gap-patching modules will increase. Conversely, by decreasing the value of this parameter, you decrease the chances of successfully patching gaps, but you will reduce runtime and memory usage. The default value of patch_gaps_loops1 is 5, but any non-negative integer is valid.
The second of these sub-options is patch_gaps_max_deviance, which affects how closely a possible patch must correlate to the linking information in that region. If a prospective patch of a gap would stretch the links over that gap too much, the patch is abandoned. The lower this value, the stricter the correspondence must be. The default value of patch_gaps_max_deviance is 4.0, but any non-negative floating point number is valid.
Note that this code may produce assemblies with multiply-placed reads, i.e. reads which are placed into more than one contig, though not twice in the same supercontig. Arachne attempts to resolve as many of these as it can, but in some cases it is not clear which location is better, and both placements are left untouched.
PLACE_BAC_ENDS=True: Attempt to place as-yet-unplaced long insert ends and use this new linking information to extend and improve existing supercontigs. These reads are placed only if there appears to be an unambiguous location for them in the assembly. The command-line option min_bac_insert_size is used to specify which reads to attempt to place. Reads with estimated insert sizes of greater than min_bac_insert_size are considered. The default value is 100000, though any non-zero positive value is accepted.
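As an illustration of the positive-breaking criterion, the sketch below checks a cluster of long links against min_cluster_size_to_break and min_cluster_spread_to_break, using the documented defaults. How Arachne measures "spread" is an assumption here: we take it to be the distance between the outermost link positions within each supercontig and require that distance in both supercontigs. The function name is hypothetical.

```python
def supports_break(links, min_cluster_size_to_break=5,
                   min_cluster_spread_to_break=50000):
    """Decide whether a cluster of long links is strong enough evidence.

    links: one (position in supercontig A, position in supercontig B) pair
    per long link joining the middles of the two supercontigs.
    """
    if len(links) < min_cluster_size_to_break:
        return False
    spread_a = max(a for a, _ in links) - min(a for a, _ in links)
    spread_b = max(b for _, b in links) - min(b for _, b in links)
    # The links must be spread out in each of the two supercontigs.
    return min(spread_a, spread_b) >= min_cluster_spread_to_break
```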

Output Options:

REINDEX_SUPERS=True: Reorder the supercontigs in the assembly by their estimated size (starting at 0) and reorder the contigs by their occurrence in those reordered supercontigs (also starting at 0). For example, the largest supercontig will have id 0 and could contain contigs 0 through 9, the second largest supercontig will have id 1 and could contain contigs 10 to 17, and so on.
ACE=True: Automatically generate ace files for all supercontigs in the assembly (as could be done by CreateAceFile, described below), placing them in the subdirectory acefiles of the run directory. If the option one_ace_file=True is used, one ace file containing all the contigs in the assembly will be generated.
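The renumbering performed by REINDEX_SUPERS=True can be sketched in Python as follows. This is a hypothetical reconstruction from the description above: tie-breaking between supercontigs of equal estimated size is not documented, and the sketch simply keeps their original relative order.

```python
def reindex_supers(supers):
    """Sketch of REINDEX_SUPERS-style renumbering.

    supers: list indexed by old supercontig id; each entry is
            (estimated_size, [old contig ids in order along the supercontig]).
    Returns (order, contig_map): order[new_id] is the old supercontig id,
    and contig_map maps old contig id -> new contig id.
    """
    # Largest first; sorted() is stable, so equal sizes keep their old order.
    order = sorted(range(len(supers)), key=lambda i: supers[i][0], reverse=True)
    contig_map = {}
    for old_super in order:
        for old_contig in supers[old_super][1]:
            contig_map[old_contig] = len(contig_map)  # next unused new id
    return order, contig_map


order, contig_map = reindex_supers([(1000, [7, 8]), (5000, [3, 4, 5]), (2000, [0])])
print(order)       # [1, 2, 0]
print(contig_map)  # {3: 0, 4: 1, 5: 2, 0: 3, 7: 4, 8: 5}
```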

Output

The output of the assembly consists of the following files, found in the RUN directory:

assembly.ps: A report about the assembly, in postscript form.
assembly.log: The main log file for Arachne. The last item written to the file should describe how Arachne terminated.
assembly.bases.gz: The fasta file containing the sequence of bases for the contigs, gzipped.
assembly.quals.gz: The fasta file containing the sequence of quality scores for the contigs, gzipped.

assembly.links: A file describing the supercontigs in the assembly.

This tab-delimited file has the following fields, one row per contig, ordered by supercontig id and the ordinal number of the contig in the supercontig:

Type	Meaning
Integer	Id of the supercontig containing this contig
Integer	Length of the supercontig containing this contig (including estimated gap sizes)
Integer	Number of contigs in the supercontig containing this contig
Integer	Ordinal number of this contig in the supercontig
Integer	Id of this contig
Integer	Length of this contig
Integer	Estimated length of gap before this contig (zero if first contig in supercontig)
Integer	Estimated length of gap after this contig (zero if last contig in supercontig)
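Because assembly.links is plain tab-delimited text, it is easy to load for your own analysis. The Python sketch below is an example under stated assumptions: the field names are our own labels, any line whose first field is not an integer is skipped (in case a header or comment line is present), and rows are grouped by supercontig. The implied_length helper recomputes a supercontig's length from its contigs and the gaps between them; given that the second column includes estimated gap sizes, we would expect the two to agree.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LinkRow:
    super_id: int
    super_length: int
    num_contigs: int
    ordinal: int
    contig_id: int
    contig_length: int
    gap_before: int
    gap_after: int


def parse_links(lines):
    """Group assembly.links rows by supercontig id, in ordinal order."""
    supers = defaultdict(list)
    for fields in csv.reader(lines, delimiter="\t"):
        # Skip blank or header-like lines (an assumption about the file).
        if not fields or not fields[0].strip().isdigit():
            continue
        row = LinkRow(*(int(x) for x in fields[:8]))
        supers[row.super_id].append(row)
    for rows in supers.values():
        rows.sort(key=lambda r: r.ordinal)
    return dict(supers)


def implied_length(rows):
    """Contig lengths plus the estimated gaps between consecutive contigs."""
    return (sum(r.contig_length for r in rows)
            + sum(r.gap_after for r in rows[:-1]))
```

Typical use is `with open("assembly.links") as f: supers = parse_links(f)`.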

assembly.reads: A file describing the placement of reads in the assembly.

This tab-delimited file has the following fields, one row per placed read, ordered by the id of the contig containing the read and the approximate coordinate of the first base of the trimmed read in the contig:

Type	Meaning
String	Name of read
String	Status of read
Integer	Untrimmed read length
Integer	Coordinate of first base of trimmed read in untrimmed read (zero-based)
Integer	Length of trimmed read in untrimmed read
Integer	Id of contig containing read
Integer	Length of contig containing read
Integer	Approximate coordinate of first base of trimmed read in contig (zero-based)
Integer	Approximate coordinate of last base of trimmed read in contig (zero-based)
'+' or '-'	Strand (orientation of read on contig)
String	Name of this read's partner (empty if unpaired)
String	Status of this read's partner
Integer	Id of the contig containing this read's partner (empty if unpaired or partner unplaced)
Integer	Observed insert size (empty if unpaired, partner unplaced, or partner in different supercontig)
Integer	Given insert size (empty if unpaired)
Integer	Given insert size standard deviation (empty if unpaired)
Float	Observed insert size deviation measure (empty if observed insert size is empty)
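Similarly, assembly.reads can be read with a short Python generator. The column names below are our own labels for the seventeen documented fields, not names used by Arachne; empty numeric fields become None, quoting is disabled so that read names are taken verbatim, and lines starting with "#" are skipped as a precaution.

```python
import csv

# Our own labels for the 17 documented columns, in order, with their types.
COLUMNS = [
    ("name", str), ("status", str), ("untrimmed_length", int),
    ("trim_start", int), ("trim_length", int), ("contig_id", int),
    ("contig_length", int), ("contig_start", int), ("contig_end", int),
    ("strand", str), ("partner", str), ("partner_status", str),
    ("partner_contig", int), ("observed_insert", int),
    ("given_insert", int), ("given_stdev", int), ("deviation", float),
]


def parse_reads(lines):
    """Yield one dict per assembly.reads row; empty numeric fields become None."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):
            continue
        # Tolerate rows whose trailing empty fields were not written out.
        fields += [""] * (len(COLUMNS) - len(fields))
        record = {}
        for (key, cast), raw in zip(COLUMNS, fields):
            if cast is str:
                record[key] = raw
            else:
                record[key] = cast(raw) if raw.strip() else None
        yield record
```

Status flags can then be tested with, for example, `"M" in record["status"]`.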

The status of the read is a set of characters used to flag conditions of note. Currently that field will either be empty or contain one or more of the following one-letter codes: M, S, and T.

If a read is multiply placed, its status will include M, and no observed insert size or deviation measure will be given for that pairing.

If a read's partner is multiply placed, the partner's status will include M, and no contig will be given for the partner, and no observed insert size or deviation measure will be given for that pairing.

If a read and its partner are on the same supercontig and have the same orientation, the status of both will include S, and no observed insert size or deviation measure will be given for that pairing.

If a read is a transposon read, its status will include T, and the observed insert size will be the observed separation between the transposon read and its partner.

Note that the observed insert size may include estimated gap sizes between contigs unless the read and its partner are located in the same contig.

The observed insert size deviation measure field contains the result of the calculation:

$$\text{deviation measure} = \frac{\text{observed insert size} - \text{given insert size}}{\text{given insert size standard deviation}}$$

This gives you a signed measure of the observed insert size relative to the given insert size.
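The deviation measure, together with the status rules above that suppress it, can be summarised in Python. Both functions are illustrative only: insert_is_reported condenses the conditions described in this section into one boolean, with the hypothetical parameter same_supercontig standing for "the read is paired and its partner is placed in the same supercontig".

```python
def deviation_measure(observed, given, stdev):
    """(observed - given) / stdev, or None when no observed insert size exists."""
    if observed is None:
        return None
    return (observed - given) / stdev


def insert_is_reported(status, partner_status, same_supercontig):
    """Whether an observed insert size (and deviation) should be present."""
    if not same_supercontig:      # unpaired, partner unplaced, or other supercontig
        return False
    if "M" in status or "M" in partner_status:   # multiply placed read or partner
        return False
    if "S" in status:             # same orientation on the same supercontig
        return False
    return True


print(deviation_measure(3900, 4000, 400))  # -0.25
```

A value of -0.25 means the pair sits a quarter of a standard deviation closer together than the library statistics predict; large magnitudes point to possible misassemblies or mis-specified library statistics.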

assembly.unplaced: An accounting of unplaced reads.

This tab-delimited file contains the names of the reads that were not placed in the assembly and why.

Each row contains a read name and a keyword indicating the reason the read was not placed in the assembly. The values of that field can be:

Value	Meaning
deliberate	excluded by configuration file
low_quality	nothing left after quality-based trimming
vector_or_host	matches vector or bacterial host sequence
mitochondrial	matches mitochondrial sequence
other_contaminant	matches sequence in DATA/contaminants.fasta
same_name	had the same name as some other read
no_metainfo	had no metainformation in the XML files
chimera	suspected of being chimeric
unplaced	no problem with read, but not placed in contig
other	some other reason
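For a quick overview of why reads were left out, the reason column can be tallied. This Python sketch assumes each row holds the read name and the keyword separated by a tab, and it reports any keyword not listed in the table above (for instance, from a different Arachne release).

```python
import csv
from collections import Counter

# Reason keywords listed in the table above.
KNOWN_REASONS = {"deliberate", "low_quality", "vector_or_host", "mitochondrial",
                 "other_contaminant", "same_name", "no_metainfo", "chimera",
                 "unplaced", "other"}


def tally_unplaced(lines):
    """Count assembly.unplaced rows by reason keyword."""
    counts = Counter()
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) >= 2:
            counts[fields[1].strip()] += 1
    return counts


def unknown_reasons(counts):
    """Keywords that do not appear in the documented table."""
    return set(counts) - KNOWN_REASONS
```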

The RUN directory also contains a subdirectory "work", in which large numbers of internal assembly files reside.

Generating ace files

Ace files are the main input files for Consed, a tool for viewing assemblies by graphically showing the aligned reads on a contig-by-contig basis. To get ace files from an Arachne assembly, either specify ACE=True on the Assemble command line or invoke the tool CreateAceFile manually from the Arachne binary directory. This will provide enough data to run Consed, although its functionality will be greater if the .scf and .phd files which PHRED produces are available. The ace files created by Arachne have been tested only with Consed releases 7.52 and 12.

A typical use of CreateAceFile would be

  CreateAceFile  DATA=... RUN=... ACEDIR=... AceFile=ace_file_name  Type=Index  Index='[1-3,5]'

where DATA and RUN are set in the same manner as with Assemble, except that "/work" should be appended to the RUN parameter. This command would produce four .ace files in the ACEDIR directory (where ACEDIR is a subdirectory of DATA): ace_file_name.1, ace_file_name.2, ace_file_name.3, and ace_file_name.5, corresponding to supercontigs 1, 2, 3, and 5.

If "Type=Index" is changed to "Type=All" and "Index=..." is omitted, then ace files will be generated for all supercontigs, in multiple files as in the example. Alternatively, ace files for the n largest supercontigs can be produced by using "Type=Top TopN=n", where n is a positive integer. In all cases, an additional argument of the form "Cutoff=k" will cause the omission of ace files for supercontigs whose constituent contigs are all shorter than k bases.
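The Index and Cutoff arguments can be mimicked in Python to check which supercontigs a given command line would cover. This sketch is an assumption based only on the example above: it accepts comma-separated integers and a-b ranges inside optional brackets, and the exact syntax CreateAceFile accepts may be broader. Both function names are hypothetical.

```python
def parse_index(spec):
    """Expand an Index value like '[1-3,5]' into [1, 2, 3, 5]."""
    ids = []
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            ids.extend(range(first, last + 1))
        else:
            ids.append(int(part))
    return ids


def keep_for_cutoff(contig_lengths, cutoff):
    """Cutoff=k drops a supercontig only if all of its contigs are shorter than k."""
    return any(length >= cutoff for length in contig_lengths)


print(parse_index("[1-3,5]"))  # [1, 2, 3, 5]
```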

If ONE_FILE=True is used, CreateAceFile will place all the contigs in the assembly in one ace file.

Contacting us

We would like to hear from you! You may contact us at wga@broad.mit.edu.

If you experience difficulty while running Arachne, please send us a description of the problem encountered along with the assembly.log file from the relevant RUN directory. You may find it helpful to look at the list of Frequently Asked Questions.
