Avitai Knowledge

Multi-Modal Knowledge Extraction

Extract, structure, and query knowledge from scientific literature, databases, and experimental data. Multi-modal AI for biological understanding.


Repository Coming Soon

This project is under active development and will be open-sourced soon

Overview

Avitai Knowledge is a platform for extracting, structuring, and querying scientific knowledge. In an era where millions of papers are published annually and biological databases contain petabytes of data, finding relevant information has become a major bottleneck in research. Avitai Knowledge addresses this bottleneck by turning papers, database records, and experimental data into structured, searchable knowledge.

The platform uses cutting-edge multi-modal AI to understand scientific content in all its forms. It doesn't just read text – it interprets figures, parses tables, recognizes chemical structures, and understands the relationships between different types of biological entities. This deep understanding enables powerful capabilities like semantic search, knowledge graph construction, and automated synthesis of findings.

What makes Avitai Knowledge particularly powerful is its ability to work with both public and private data. While it can mine knowledge from PubMed and other public sources, it can also process your internal documents, experimental data, and proprietary information. This creates a unified knowledge base that combines what's known publicly with your organization's unique insights.

The platform is designed for both programmatic access and interactive exploration. Researchers can use natural language queries to find information, browse knowledge graphs visually, or integrate the platform's APIs into computational workflows. Whether you're conducting a literature review, validating a hypothesis, or exploring a new research area, Avitai Knowledge helps you find what you need faster.

Avitai Knowledge also powers the Research foundation model in our main platform, providing the broad biological knowledge required for AI-assisted scientific discovery. By open-sourcing the core technology, we aim to enable the research community to build their own knowledge extraction systems tailored to specific domains.

Key Features

Literature Mining

Extract structured knowledge from millions of scientific papers. Identify entities, relationships, and claims with state-of-the-art NLP models.
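As a rough illustration of what "identifying entities" means, the sketch below tags entity mentions with a tiny dictionary lookup. It is not the Avitai Knowledge pipeline, which is described as using NLP models. The lexicon, the labels, and the `tag_entities` helper are all hypothetical and exist only for this example.

```python
import re

# Hypothetical mini-lexicon mapping surface forms to entity types.
# A production system would use trained NER models, not a fixed dictionary.
LEXICON = {
    "aspirin": "COMPOUND",
    "gefitinib": "COMPOUND",
    "egfr": "PROTEIN",
    "lung cancer": "DISEASE",
}


def tag_entities(sentence):
    """Return (mention, label) pairs in order of appearance."""
    found = []
    for term, label in LEXICON.items():
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), label))
    # Sort by character offset so the output follows reading order.
    return [(text, label) for _, text, label in sorted(found)]


sentence = "Gefitinib inhibits EGFR in lung cancer."
entities = tag_entities(sentence)
print(entities)
```

A dictionary lookup misses synonyms and unseen names, which is the gap that model-based extraction is meant to close.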

Multi-Modal Learning

Understand scientific content across text, tables, figures, and chemical structures. Unified representation of diverse data types.

Knowledge Graphs

Build and query biological knowledge graphs connecting genes, proteins, pathways, diseases, and compounds with rich metadata.
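To make the idea concrete, here is a minimal sketch of a knowledge graph stored as subject-relation-object triples with attached metadata. It is a toy in-memory structure, not the platform's implementation (the "Built With" list suggests a graph database is used in practice). The `TinyKnowledgeGraph` class, the relation names, and the `source` values are assumptions made for illustration.

```python
from collections import defaultdict


class TinyKnowledgeGraph:
    """Toy in-memory triple store; illustrative only."""

    def __init__(self):
        # subject -> list of (relation, object, metadata dict)
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj, **metadata):
        """Store one triple together with arbitrary metadata."""
        self._edges[subject].append((relation, obj, metadata))

    def neighbors(self, subject, relation=None):
        """Return objects linked to `subject`, optionally filtered by relation."""
        return [
            obj
            for rel, obj, _ in self._edges.get(subject, [])
            if relation is None or rel == relation
        ]


kg = TinyKnowledgeGraph()
# The `source` values are placeholders, not real record identifiers.
kg.add("aspirin", "inhibits", "PTGS1", source="hypothetical-record-1")
kg.add("aspirin", "inhibits", "PTGS2", source="hypothetical-record-2")
kg.add("PTGS2", "participates_in", "prostaglandin biosynthesis")

targets = kg.neighbors("aspirin", relation="inhibits")
print(targets)
```

Keeping metadata on each edge is what allows a query result to be traced back to the paper or database record it came from.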

Semantic Search

Find relevant information using natural language queries. Search across literature, databases, and internal experimental data.
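The ranking step behind this kind of search can be sketched as follows: embed the query and each document as vectors, then sort documents by cosine similarity. The example below substitutes plain word counts for learned embeddings so that it runs with the standard library alone. Real semantic search would use a language model and a vector index, and the `embed`, `cosine`, and `search` functions here are hypothetical names, not part of the Avitai Knowledge API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts standing in for a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, documents, top_k=2):
    """Return the top_k (score, document) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


documents = [
    "Aspirin irreversibly inhibits the cyclooxygenase enzymes COX-1 and COX-2.",
    "EGFR is a receptor tyrosine kinase frequently mutated in lung cancer.",
    "Metformin is a first-line treatment for type 2 diabetes.",
]
results = search("Which enzymes does aspirin inhibit?", documents)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

Word counts only match exact tokens ("inhibit" and "inhibits" do not match here), which is the limitation that learned embeddings are intended to overcome.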

Document Understanding

Process complex scientific documents including PDFs, supplementary materials, and patents. Extract methods, results, and conclusions.

Database Integration

Connect to major biological databases (UniProt, PDB, ChEMBL, etc.) and integrate external knowledge into your workflows.

Use Cases

Automated literature review for drug target identification

Extract experimental protocols from published papers

Build company-specific knowledge bases from internal documents

Find similar experiments or compounds across literature

Track research trends and emerging technologies in real time

Generate research hypotheses by connecting disparate findings

Question answering over scientific literature and databases

Prior art searches for patent applications

Installation

# Install from PyPI
pip install avitai-knowledge

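# With all NLP models (requires significant disk space)
# Quoting the extra keeps shells such as zsh from expanding the brackets
pip install "avitai-knowledge[full]"

# Or install from source (once the repository is public)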
git clone https://github.com/avitai/avitai-knowledge.git
cd avitai-knowledge
pip install -e .

Quick Start

from avitai_knowledge import KnowledgeExtractor, KnowledgeGraph
from avitai_knowledge.search import SemanticSearch

# Extract knowledge from papers
extractor = KnowledgeExtractor()
knowledge = extractor.process_papers([
    "path/to/paper1.pdf",
    "path/to/paper2.pdf"
])

# Build a knowledge graph
kg = KnowledgeGraph()
kg.add_knowledge(knowledge)

# Semantic search
search = SemanticSearch(kg)
results = search.query(
    "What are the protein targets of aspirin?"
)

# Or query the knowledge graph
proteins = kg.find_proteins_interacting_with("EGFR")

Built With

Transformers, LangChain, spaCy, SciBERT, PubMedBERT, Neo4j, Elasticsearch, FAISS, RDKit, BioPython

Ready to Get Started?

Explore the documentation, try examples, or contribute to the project.
