UniProt Query

Query UniProt database for protein sequences, metadata, and search by criteria. Use this skill when: (1) Looking up protein information by UniProt accession...
hollyya
Uncategorized clawhub v1.0.0 1 version 100000 Key: not required

Overview

Query the UniProt knowledgebase for comprehensive protein information.

When to Use

  • Look up protein by UniProt accession (e.g., P00533 for EGFR)
  • Search proteins by gene name, organism, or keywords
  • Retrieve protein metadata: function, domains, diseases, PTMs
  • Get protein sequences and structural annotations

Workflow

Use Case 1: Protein Lookup by ID

Fetch complete protein information including metadata.

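from open_biomed.tools.tool_registry import TOOLS
import requests

accession = "P0DTC2"  # SARS-CoV-2 Spike

# Get the protein sequence with the existing tool
tool = TOOLS["protein_uniprot_request"]
proteins, _ = tool.run(accession=accession)
protein = proteins[0]

# Fetch the full entry from the UniProt REST API
url = f"https://rest.uniprot.org/uniprotkb/{accession}?format=json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body

# parse_uniprot_entry is not defined in this snippet; see examples/lookup_by_id.py
metadata = parse_uniprot_entry(response.json())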

See examples/lookup_by_id.py for complete implementation.
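
The snippet above calls parse_uniprot_entry without showing it. The following is a minimal sketch of what such a helper might look like, not the implementation shipped in examples/lookup_by_id.py. It assumes the nested layout of a UniProtKB JSON entry (primaryAccession, uniProtkbId, proteinDescription, genes, comments, features, keywords), and the sample entry is a small hand-written fragment rather than real API output.

```python
from typing import Any


def parse_uniprot_entry(entry: dict) -> dict:
    """Flatten a UniProtKB JSON entry into a compact metadata dict (illustrative sketch)."""
    comments = entry.get("comments", [])
    genes = entry.get("genes", [])
    first_gene = genes[0] if genes else {}
    organism = entry.get("organism", {})
    sequence = entry.get("sequence", {})

    # The recommended name can be absent, so every level is looked up defensively.
    description = entry.get("proteinDescription", {})
    protein_name = description.get("recommendedName", {}).get("fullName", {}).get("value")

    function_texts = [
        text["value"]
        for comment in comments
        if comment.get("commentType") == "FUNCTION"
        for text in comment.get("texts", [])
    ]
    locations = [
        loc["location"]["value"]
        for comment in comments
        if comment.get("commentType") == "SUBCELLULAR LOCATION"
        for loc in comment.get("subcellularLocations", [])
    ]
    domains = []
    for feature in entry.get("features", []):
        if feature.get("type") in {"Domain", "Region"}:
            start = feature["location"]["start"]["value"]
            end = feature["location"]["end"]["value"]
            domains.append({
                "type": feature["type"],
                "description": feature.get("description", ""),
                "location": f"{start}-{end}",
            })

    return {
        "accession": entry.get("primaryAccession"),
        "uniProtId": entry.get("uniProtkbId"),
        "protein": {"name": protein_name},
        "gene": {
            "primary": first_gene.get("geneName", {}).get("value"),
            "synonyms": [s["value"] for s in first_gene.get("synonyms", [])],
        },
        "organism": {
            "scientific_name": organism.get("scientificName"),
            "taxon_id": organism.get("taxonId"),
        },
        "sequence": {"length": sequence.get("length"), "mass": sequence.get("molWeight")},
        "function": function_texts,
        "domains": domains,
        "keywords": [k["name"] for k in entry.get("keywords", [])],
        "subcellular_location": locations,
    }


# Hand-written fragment shaped like a UniProtKB entry (not real API output).
sample_entry: Any = {
    "primaryAccession": "P0DTC2",
    "uniProtkbId": "SPIKE_SARS2",
    "proteinDescription": {"recommendedName": {"fullName": {"value": "Spike glycoprotein"}}},
    "genes": [{"geneName": {"value": "S"}}],
    "organism": {"taxonId": 2697049},
    "sequence": {"length": 1273, "molWeight": 141178},
    "comments": [
        {"commentType": "FUNCTION", "texts": [{"value": "Attaches the virion to host receptor..."}]},
    ],
    "features": [
        {"type": "Domain", "description": "RBD",
         "location": {"start": {"value": 319}, "end": {"value": 541}}},
        {"type": "Chain", "description": "Spike glycoprotein",
         "location": {"start": {"value": 13}, "end": {"value": 1273}}},
    ],
    "keywords": [{"name": "Glycoprotein"}],
}

metadata = parse_uniprot_entry(sample_entry)
```

Using .get with empty defaults at each level keeps the helper from raising on sparse entries, at the cost of returning None for missing values.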

Use Case 2: Search by Criteria

Search UniProt by gene name, organism, keywords, or disease.

import requests

base_url = "https://rest.uniprot.org/uniprotkb/search"

# Example queries and what they return:
queries = {
    "gene_exact:EGFR AND organism_id:9606": "Human EGFR",
    "gene_exact:S AND organism_id:2697049": "SARS-CoV-2 Spike",
    "keyword:Kinase AND organism_id:9606": "Human kinases",
    "diabetes AND organism_id:9606": "Diabetes-related proteins",
}

# Restrict to reviewed (Swiss-Prot) entries and request only the needed fields
params = {
    "query": "gene_exact:EGFR AND organism_id:9606 AND reviewed:true",
    "fields": "accession,gene_primary,protein_name,organism_name,length",
    "format": "json",
    "size": 10,
}
response = requests.get(base_url, params=params, timeout=30)
response.raise_for_status()  # surface HTTP errors such as a malformed query
results = response.json()["results"]

See examples/search_by_criteria.py for complete implementation.
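
As a rough illustration of the step after the request, the sketch below turns a search response body into a flat summary. The function name summarize_search_response is hypothetical, and the code assumes that each item in the "results" array keeps the nested entry layout (primaryAccession, genes, proteinDescription, organism, sequence). The payload here is hand-written, not real API output.

```python
from typing import Optional


def summarize_search_response(query: str, payload: dict, total: Optional[int] = None) -> dict:
    """Reduce a UniProt search payload to one flat row per entry (hypothetical helper)."""
    rows = []
    for entry in payload.get("results", []):
        genes = entry.get("genes", [])
        gene = genes[0].get("geneName", {}).get("value") if genes else None
        name = (
            entry.get("proteinDescription", {})
            .get("recommendedName", {})
            .get("fullName", {})
            .get("value")
        )
        rows.append({
            "accession": entry.get("primaryAccession"),
            "gene": gene,
            "protein_name": name,
            "organism": entry.get("organism", {}).get("scientificName"),
            "length": entry.get("sequence", {}).get("length"),
        })
    # Without an explicit total, fall back to the number of rows on this page.
    return {
        "query": query,
        "total_results": total if total is not None else len(rows),
        "results": rows,
    }


# Hand-written payload shaped like a search response (assumption, not real output).
payload = {
    "results": [
        {
            "primaryAccession": "P0DTC2",
            "genes": [{"geneName": {"value": "S"}}],
            "proteinDescription": {"recommendedName": {"fullName": {"value": "Spike glycoprotein"}}},
            "sequence": {"length": 1273},
        }
    ]
}
summary = summarize_search_response("gene_exact:S AND organism_id:2697049", payload)
```

If the overall hit count is needed rather than the page size, it can be passed in through total; the service appears to report it in an X-Total-Results response header, but treat that header name as an assumption to verify against the API documentation.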

Query Syntax Reference

| Field | Example | Description |
|-------|---------|-------------|
| gene_exact | gene_exact:EGFR | Exact gene name match |
| gene | gene:BRCA | Gene name (partial match) |
| organism_id | organism_id:9606 | Organism by TaxID |
| organism_name | organism_name:"Homo sapiens" | Organism by name |
| protein_name | protein_name:kinase | Protein name search |
| keyword | keyword:Kinase | UniProt keyword |
| cc_disease | cc_disease:diabetes | Disease association |
| reviewed | reviewed:true | Swiss-Prot only (curated) |

Common Organism IDs: Human (9606), Mouse (10090), SARS-CoV-2 (2697049), E. coli K-12 (83333)

Combine queries: Use AND, OR to combine criteria:

  • gene_exact:EGFR AND organism_id:9606 AND reviewed:true
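
Combined queries like the one above can also be assembled programmatically. The helper below is a small sketch, with build_query as a hypothetical name; it only joins field:value terms with AND, lowercases booleans, and quotes values that contain spaces. It does not cover OR, grouping, or escaping of special characters.

```python
def build_query(**criteria) -> str:
    """Join field:value pairs with AND (hypothetical helper, AND-only)."""
    terms = []
    for field, value in criteria.items():
        if isinstance(value, bool):
            # UniProt expects lowercase true/false, e.g. reviewed:true
            text = "true" if value else "false"
        else:
            text = str(value)
            if " " in text:
                # Multi-word values need quoting to stay a single term
                text = f'"{text}"'
        terms.append(f"{field}:{text}")
    return " AND ".join(terms)


egfr_query = build_query(gene_exact="EGFR", organism_id=9606, reviewed=True)
spike_query = build_query(protein_name="Spike glycoprotein", organism_id=2697049)
```

Keyword arguments keep their order, so egfr_query comes out as the same string shown in the bullet above and can be passed directly as the "query" parameter.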

Expected Outputs

Metadata JSON (lookup_by_id)

{
  "accession": "P0DTC2",
  "uniProtId": "SPIKE_SARS2",
  "protein": {"name": "Spike glycoprotein"},
  "gene": {"primary": "S", "synonyms": []},
  "organism": {"scientific_name": "...", "taxon_id": 2697049},
  "sequence": {"length": 1273, "mass": 141178},
  "function": ["Attaches the virion to host receptor..."],
  "domains": [{"type": "Domain", "description": "RBD", "location": "319-541"}],
  "keywords": ["Glycoprotein", "Transmembrane", "Viral attachment"],
  "subcellular_location": ["Virion membrane"]
}

Text Report

======================================================================
UNIPROT PROTEIN REPORT
======================================================================
Accession:     P0DTC2
Protein:       Spike glycoprotein
Gene:          S
Organism:      Severe acute respiratory syndrome coronavirus 2
Length:        1273 aa

FUNCTION
Attaches the virion to the cell membrane by interacting with
host receptor ACE2...

DOMAINS
• Domain: BetaCoV S1-NTD (14-303)
• Region: Receptor-binding domain (319-541)
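
For reference, a report in this layout could be rendered from the metadata dictionary roughly as follows. This is a sketch only: format_report is a hypothetical name, the 15-character label column and 70-character rule are read off the sample above, and the wrap width of 64 is an arbitrary choice.

```python
import textwrap

RULE = "=" * 70


def format_report(metadata: dict) -> str:
    """Render a metadata dict as a plain-text report (hypothetical helper)."""

    def field(label: str, value) -> str:
        # Pad "Label:" to 15 characters so the values line up in one column.
        return (label + ":").ljust(15) + str(value)

    lines = [
        RULE,
        "UNIPROT PROTEIN REPORT",
        RULE,
        field("Accession", metadata["accession"]),
        field("Protein", metadata["protein"]["name"]),
        field("Gene", metadata["gene"]["primary"]),
        field("Organism", metadata["organism"]["scientific_name"]),
        field("Length", f"{metadata['sequence']['length']} aa"),
    ]
    if metadata.get("function"):
        lines += ["", "FUNCTION"]
        for paragraph in metadata["function"]:
            lines.append(textwrap.fill(paragraph, width=64))
    if metadata.get("domains"):
        lines += ["", "DOMAINS"]
        for domain in metadata["domains"]:
            lines.append(f"• {domain['type']}: {domain['description']} ({domain['location']})")
    return "\n".join(lines)


# Values copied from the sample outputs in this document.
example = {
    "accession": "P0DTC2",
    "protein": {"name": "Spike glycoprotein"},
    "gene": {"primary": "S"},
    "organism": {"scientific_name": "Severe acute respiratory syndrome coronavirus 2"},
    "sequence": {"length": 1273},
    "function": ["Attaches the virion to host receptor..."],
    "domains": [
        {"type": "Region", "description": "Receptor-binding domain", "location": "319-541"},
    ],
}
report = format_report(example)
```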

Search Results JSON

{
  "query": "gene_exact:S AND organism_id:2697049",
  "total_results": 10,
  "results": [
    {"accession": "P0DTC2", "gene": "S", "protein_name": "Spike glycoprotein", ...}
  ]
}

Error Handling

| Error | Solution |
|-------|----------|
| Accession not found | Verify UniProt ID format (e.g., P00533, not EGFR) |
| No search results | Broaden query, remove reviewed:true, check organism ID |
| Timeout | Reduce size parameter, simplify query |
| Rate limited | Wait and retry; UniProt allows 10 requests/second |
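
Two of these remedies lend themselves to a short sketch: checking the accession format before sending a request, and retrying with a growing delay when rate limited or timed out. The regular expression is the accession pattern published in the UniProt help pages; is_valid_accession, RateLimited, and call_with_retry are hypothetical names introduced here, and the delays are arbitrary defaults.

```python
import re
import time

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(text: str) -> bool:
    """Return True if text has the shape of a UniProt accession."""
    return ACCESSION_RE.fullmatch(text) is not None


class RateLimited(Exception):
    """Raised by the caller's fetch function on an HTTP 429 response (hypothetical)."""


def call_with_retry(fetch, max_attempts=4, base_delay=1.0,
                    retry_on=(RateLimited, TimeoutError), sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults


# Usage with a stand-in fetch function that fails twice before succeeding.
attempts = {"count": 0}
delays = []


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("simulated 429")
    return "ok"


result = call_with_retry(flaky_fetch, sleep=delays.append)
```

With requests, the fetch function would raise RateLimited when response.status_code is 429, and requests.exceptions.Timeout would need to be added to retry_on, since it does not derive from the built-in TimeoutError.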

Available Tools

| Tool | Purpose |
|------|---------|
| protein_uniprot_request | Fetch protein sequence by accession (existing) |

The workflows in this skill extend the basic tool with full metadata retrieval via the UniProt REST API.

References

  • references/query_fields.md - Complete query field reference
  • references/metadata_fields.md - Available metadata fields
  • UniProt API Docs: https://www.uniprot.org/api-documentation

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 14:43 Security: safe
