Finding AdK sequences

Sequence of AdK

  1. Start with PDB entry 1AKE in the Protein Data Bank.

  2. Find the UniProtKB accession number, P69441, on the PDB page.

  3. Find P69441 in UniProt (P69441 (KAD_ECOLI)).

  4. Browse the KAD_ECOLI page and find the KAD_ECOLI sequence:

    >sp|P69441|KAD_ECOLI Adenylate kinase OS=Escherichia coli (strain K12) GN=adk PE=1 SV=1
    MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVT
    DELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDVPDELIVDRI
    VGRRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKDDQEETVRKRLVEYHQMTAPLIG
    YYSKEAEAGNTKYAKVDGTKPVAEVRADLEKILG
    

    (This is the sequence in FASTA format, one of the common sequence formats.)
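As a minimal sketch of how the FASTA record above could be read in a script, the following Python snippet splits the header from the sequence and extracts the accession number. The function name `parse_fasta` and the variable names are illustrative assumptions, not part of any library; for real work a dedicated parser (for example, the one in Biopython) is likely the better choice.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts; store the previous one, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


KAD_ECOLI_FASTA = """\
>sp|P69441|KAD_ECOLI Adenylate kinase OS=Escherichia coli (strain K12) GN=adk PE=1 SV=1
MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVT
DELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDVPDELIVDRI
VGRRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKDDQEETVRKRLVEYHQMTAPLIG
YYSKEAEAGNTKYAKVDGTKPVAEVRADLEKILG
"""

header, sequence = parse_fasta(KAD_ECOLI_FASTA)[0]
# The first header token has the form db|accession|entry_name
db, accession, entry_name = header.split()[0].split("|")
print(accession, entry_name, len(sequence))
```

The header line (starting with ">") carries the identifiers, and all following lines up to the next ">" are joined into one sequence string.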

Finding other AdK sequences with BLAST

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences, which can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Use blastp (Protein BLAST) with the following settings (everything else can be left at defaults):

Enter Query Sequence:

  • Enter accession number(s), gi(s), or FASTA sequence(s): paste the KAD_ECOLI FASTA sequence into the search box

Choose Search Set:

  • Database: Non-redundant protein sequences (nr)

Program Selection:

  • Algorithm: blastp (protein-protein BLAST)
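The same settings can also be written down as a request to NCBI's BLAST URL API instead of being entered in the web form. The sketch below only builds the form-encoded request body and does not send anything. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`) are given here as I understand them from the NCBI URL API and should be checked against the current NCBI documentation; the helper name `build_blastp_request` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST web service (assumed; verify against NCBI docs)
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

KAD_ECOLI_FASTA = (
    ">sp|P69441|KAD_ECOLI\n"
    "MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVT"
    "DELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDVPDELIVDRI"
    "VGRRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKDDQEETVRKRLVEYHQMTAPLIG"
    "YYSKEAEAGNTKYAKVDGTKPVAEVRADLEKILG"
)


def build_blastp_request(query, database="nr"):
    """Return the form-encoded body for submitting a blastp search."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein-protein BLAST
        "DATABASE": database,  # non-redundant protein sequences
        "QUERY": query,        # FASTA text or accession number
    }
    return urlencode(params)


body = build_blastp_request(KAD_ECOLI_FASTA)
print(BLAST_URL)
print(body[:60] + "...")
```

Submitting the request, polling for completion and downloading the hits are left out here; NCBI asks automated clients to respect its usage guidelines, so the web form remains the simplest route for a single search.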

This search finds many sequences that are almost identical to the query (99% to 100% identity, E-values around 1e-150). To obtain a useful data set, one would have to remove some of these nearly identical sequences, but that is beyond the scope of this introduction.
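Purely to illustrate the idea of removing near-duplicates, here is a toy sketch of a greedy filter: a sequence is kept only if it is no more than a chosen percentage identical to every sequence kept so far. It assumes the sequences are already aligned to equal length, uses a simple position-by-position identity, and runs on made-up ten-residue fragments rather than real BLAST hits. The function names and the 90% threshold are assumptions for this example; dedicated clustering tools would be used in practice.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # gap characters ("-") never count as a match
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def filter_redundant(records, max_identity=90.0):
    """Greedily keep records that are <= max_identity % identical to all kept ones."""
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, k_seq) <= max_identity for _, k_seq in kept):
            kept.append((name, seq))
    return kept


# Toy, hypothetical fragments (not real database hits)
toy_hits = [
    ("seq_a", "MRIILLGAPG"),
    ("seq_b", "MRIILLGAPG"),  # identical to seq_a, so it is dropped
    ("seq_c", "MRIVLLGPPG"),  # differs from seq_a at two positions
    ("seq_d", "MKLVFIGSAG"),  # differs at most positions
]

representatives = filter_redundant(toy_hits, max_identity=90.0)
print([name for name, _ in representatives])
```

The result depends on the input order, because the first member of each group of similar sequences becomes its representative.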

Pfam

Use the KAD_ECOLI sequence to search Pfam.

Find clan CL0023 (bit score 204.7, E-value 6.8e-61) and, within it, the family ADK (PF00406).

  • Domain organization: There are 2742 sequences with the following architecture: ADK, ADK_lid

Even the ADK_lid Pfam family (PF05191) is still too large. The view of PDB structures is useful (and could be used with MultiSeq).