Editseq user guide

#EDITSEQ USER GUIDE FULL#

SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

SeqKit uses the pgzip package to read and write gzip files, and the output gzip file is slightly larger than files generated by GNU gzip. SeqKit writes gzip files very fast, much faster than the multi-threaded pigz, so there is no need to pipe the result to gzip/pigz.

Available commands:

  amplicon         extract amplicon (or specific region around it) via primer(s)
  bam              monitoring and online histograms of BAM record features
  common           find common sequences of multiple files by id/name/sequence
  completion       generate the autocompletion script for the specified shell
  concat           concatenate sequences with same ID from multiple files
  convert          convert FASTQ quality encoding between Sanger, Solexa and Illumina
  faidx            create FASTA index file and extract subsequence
  fish             look for short sequences in larger sequences using local alignment
  fx2tab           convert FASTA/Q to tabular format (and length, GC content, average quality...)
  genautocomplete  generate shell autocompletion script (bash|zsh|fish|powershell)
  grep             search sequences by ID/name/sequence/sequence motifs, mismatch allowed
  head-genome      print sequences of the first genome with common prefixes in name
  locate           locate subsequences/motifs, mismatch allowed
  mutate           edit sequence (point mutation, insertion, deletion)
  pair             match up paired-end reads from two fastq files
  range            print FASTA/Q records in a range (start:end)
  replace          replace name/sequence by regular expression
  restart          reset start position for circular genome
  rmdup            remove duplicated sequences by ID/name/sequence
  sample           sample sequences by number or proportion

The subcommands sample and shuffle use a random function. The random seed can be given with the flag -s (--rand-seed), which ensures that a sampling result can be reproduced in different environments with the same random seed.

Subcommands that support two-pass reading use the FASTA index for rapid access to sequences and to reduce memory usage.
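The reproducibility point above can be illustrated with a minimal Python sketch. This is not SeqKit's implementation: the function name `sample_by_proportion` and the toy records are hypothetical, and the sketch only shows why a fixed seed yields the same subset on every run.

```python
import random


def sample_by_proportion(records, proportion, seed):
    """Keep each record with probability `proportion`, using a seeded RNG."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < proportion]


# Toy (id, sequence) records; hypothetical data for illustration only.
records = [(f"seq{i}", "ACGT" * 3) for i in range(1000)]

first = sample_by_proportion(records, 0.1, seed=11)
second = sample_by_proportion(records, 0.1, seed=11)

# Same seed and same input order give the same subset.
print(first == second)  # prints True
```

Changing the seed, or the order of the input records, would generally change which records are kept.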

Some subcommands, including sample, split, shuffle and sort, can either read all records into memory or read the files twice with the flag -2 (--two-pass).

Most of the subcommands do not read whole FASTA/Q records into memory, including stat, fq2fa, fx2tab, tab2fx, grep, locate and replace, among others.

Note that when using subseq --gtf | --bed, memory usage increases if the GTF/BED files are too large. You can use --chr to specify chromosomes and --feature to limit features.

For most commands, FASTA/Q reading and writing is the performance bottleneck, and using more threads will not increase the speed. Only a few commands benefit from more than four threads.
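The idea of reading a file twice can be sketched in Python as follows. This is an illustration under stated assumptions, not SeqKit's index format or code: the helper names `build_offset_index` and `read_record` are hypothetical. The first pass stores only the byte offset of each record, and the second pass seeks to the records in whatever order is needed, so the sequences never have to be held in memory together.

```python
import os
import random
import tempfile


def build_offset_index(path):
    """First pass: map each full header line to the byte offset of its record."""
    index = {}
    with open(path, "rb") as fh:
        offset = fh.tell()
        line = fh.readline()
        while line:
            if line.startswith(b">"):
                index[line[1:].rstrip(b"\r\n").decode()] = offset
            offset = fh.tell()
            line = fh.readline()
    return index


def read_record(path, offset):
    """Second pass: seek to one record and read only that record."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        header = fh.readline().rstrip(b"\r\n").decode()[1:]
        parts = []
        for line in fh:
            if line.startswith(b">"):
                break  # the next record starts here
            parts.append(line.strip().decode())
    return header, "".join(parts)


# Hypothetical two-record FASTA file written to a temporary location.
fasta = b">chr1 first\nACGT\nTTGA\n>chr2 second\nGGCC\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".fa") as tmp:
    tmp.write(fasta)
    path = tmp.name

index = build_offset_index(path)
order = list(index)
random.Random(11).shuffle(order)  # only the small list of names is shuffled
shuffled = [read_record(path, index[name]) for name in order]
os.remove(path)

print(shuffled)
```

The trade-off is extra I/O for the second pass in exchange for memory use that depends on the number of records rather than on their total length.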

Validation of sequence bases and the complement process of sequences are parallelized. Parsing of line-based files, including BED/GFF files and ID list files, is also parallelized. The pgzip package reads and writes gzip files in parallel.

Parallelization is implemented with multiple goroutines in Go. The concurrency number is configurable with the global flag -j (--threads). Four threads are fast enough for most commands.
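As a rough illustration of chunked, parallel validation of sequence bases, here is a Python sketch. SeqKit itself uses goroutines in Go; the thread pool below is only a stand-in for that structure, and in CPython this pure-Python check is unlikely to run faster with threads. The alphabet `VALID_DNA`, the function names and the chunk size are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed alphabet for this sketch; a real tool would support more symbols.
VALID_DNA = frozenset("ACGTNacgtn")


def chunk_is_valid(chunk):
    """Return True if every base in the chunk belongs to the alphabet."""
    return all(base in VALID_DNA for base in chunk)


def validate_sequence(seq, threads=4, chunk_size=1000):
    """Split the sequence into chunks and validate them with a worker pool."""
    chunks = [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return all(pool.map(chunk_is_valid, chunks))


print(validate_sequence("ACGT" * 5000))        # prints True
print(validate_sequence("ACGT" * 5000 + "X"))  # prints False
```

The `threads` parameter plays the role of a configurable concurrency number: the work is divided into independent chunks, and the pool size bounds how many are processed at once.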


Subcommands by category:

Format conversion: fq2fa, fx2tab, tab2fx, ...
Searching: grep, locate, amplicon, fish
Edit: concat, replace, restart, mutate, ...

Technical details and guides for use

FASTA/Q format parsing and writing

SeqKit uses the author's lightweight and high-performance bioinformatics package for FASTA/Q parsing and writing.

seqtk+gzip: seqtk pipes data to the single-threaded gzip.
seqtk+pigz: seqtk pipes data to the multithreaded pigz, which uses 4 threads here.

SeqKit accepts input data from standard input (STDIN) and from plain or gzip-compressed files. Files can be given via positional arguments or the flag --infile-list.

For NCBI-style sequence headers, the sequence ID parsing regular expression can be set with the global flag --id-regexp "\|([^\|]+)\| ", or simply with the flag --id-ncbi. To use the gi number as the ID, use --id-regexp "^gi\|([^\|]+)\|".

For some commands, including subseq, split, sort and shuffle, a FASTA index can be used when the input files are (plain or gzipped) FASTA files, for rapid access to sequences and reduced memory usage.

ATTENTION: the .seqkit.fai file created by SeqKit is a little different from a conventional .fai file. SeqKit uses the full sequence header instead of just the ID as the key.
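To see what the two ID-parsing expressions capture, here is a small Python sketch. It assumes the capture group in both patterns is `([^\|]+)`, uses a made-up NCBI-style header, and the helper `parse_id` is hypothetical. It mimics the matching with Python's `re` module and is not SeqKit's own parser.

```python
import re


def parse_id(header, pattern):
    """Return the first capture group, or the whole header if nothing matches.

    Falling back to the full header is a choice made for this sketch only.
    """
    match = re.search(pattern, header)
    return match.group(1) if match else header


# Made-up NCBI-style header for illustration.
header = "gi|12345|ref|XX_000001.1| Example organism chromosome"

accession = parse_id(header, r"\|([^\|]+)\| ")  # field just before "| "
gi_number = parse_id(header, r"^gi\|([^\|]+)\|")  # field right after "gi|"

print(accession)  # prints XX_000001.1
print(gi_number)  # prints 12345
```

The first pattern requires a space after the closing pipe, so it skips the earlier fields and captures the last pipe-delimited field before the description.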