At TUM Venture Labs Software & AI, we are committed to empowering the next generation of entrepreneurs. From students to researchers, we offer the resources, knowledge, and community needed to turn innovative ideas into reality. As a team of entrepreneurs, our mission is to help students and researchers make the transition to entrepreneurship. That is why we are building the ecosystem we would have wanted when we first started thinking about entrepreneurship.
Paper Pulse is an early-stage venture spun out of UTUM that is developing a Software-as-a-Service (SaaS) platform to modernize technology scouting. The platform helps organizations screen and evaluate scientific literature for its commercialization potential. A minimum viable product is already in use at the Technical University of Munich (TUM), and the venture has attracted significant interest from other leading institutions.
Track 1: Schematic Knowledge
- Primary Objective: Construct and analyze a massive-scale knowledge graph from 20 million research and patent documents, thereby enabling the discovery of non-obvious connections and emerging technological trends.
- Design of a comprehensive graph data model (ontology) to formally represent the complex relationships between papers, patents, authors, organizations, and technical concepts.
- Implementation of a data pipeline to populate a graph database (e.g., Neo4j) with over 20 million nodes and their corresponding semantically defined relationships.
- Development and empirical testing of novel graph algorithms to analyze the network's topological structure for strategic insights, such as identifying influential researchers or detecting emerging technology clusters.
- Creation of sophisticated queries (e.g., using the Cypher query language) to investigate complex hypotheses regarding the innovation landscape, such as identifying researchers who bridge disparate fields of study.
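To make the kind of hypothesis in the last bullet more concrete, here is a minimal, hypothetical sketch in plain Python rather than Neo4j and Cypher. The node labels (author, paper, concept, field), the relationship names (AUTHORED, MENTIONS, IN_FIELD), the sample data, and the bridging criterion (an author whose papers touch concepts from at least two distinct fields) are illustrative assumptions, not the project's actual ontology.

```python
from collections import defaultdict

# Hypothetical toy ontology stored as (source, relationship, target) triples.
# Node IDs are prefixed with their label for readability.
EDGES = [
    ("author:ana", "AUTHORED", "paper:p1"),
    ("author:ana", "AUTHORED", "paper:p2"),
    ("author:ben", "AUTHORED", "paper:p2"),
    ("author:ben", "AUTHORED", "paper:p3"),
    ("author:cho", "AUTHORED", "paper:p3"),
    ("paper:p1", "MENTIONS", "concept:graph_neural_networks"),
    ("paper:p2", "MENTIONS", "concept:protein_folding"),
    ("paper:p3", "MENTIONS", "concept:battery_chemistry"),
    ("concept:graph_neural_networks", "IN_FIELD", "field:computer_science"),
    ("concept:protein_folding", "IN_FIELD", "field:biology"),
    ("concept:battery_chemistry", "IN_FIELD", "field:materials_science"),
]


def build_index(edges):
    """Index outgoing edges as {(source, relationship): [targets]}."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def fields_of_author(index, author):
    """Follow the path author -AUTHORED-> paper -MENTIONS-> concept -IN_FIELD-> field."""
    fields = set()
    for paper in index.get((author, "AUTHORED"), []):
        for concept in index.get((paper, "MENTIONS"), []):
            fields.update(index.get((concept, "IN_FIELD"), []))
    return fields


def bridging_authors(edges, min_fields=2):
    """Return {author: sorted fields} for authors spanning >= min_fields fields."""
    index = build_index(edges)
    authors = {src for src, rel, _ in edges if rel == "AUTHORED"}
    result = {}
    for author in sorted(authors):
        fields = fields_of_author(index, author)
        if len(fields) >= min_fields:
            result[author] = sorted(fields)
    return result


if __name__ == "__main__":
    for author, fields in bridging_authors(EDGES).items():
        print(author, fields)
```

In a graph database, the same traversal would typically be a single Cypher `MATCH` over the author-paper-concept-field path with an aggregation over distinct fields; the in-memory version only illustrates the logic.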
Track 2: Data Enrichment
- Primary Objective: Design, implement, and validate a scalable data pipeline for the transformation of millions of unstructured scientific documents into a clean, structured, and annotated dataset suitable for advanced artificial intelligence analysis.
- Design and implementation of a scalable Extract, Transform, Load (ETL) pipeline capable of ingesting and processing millions of documents from diverse sources.
- Development of an intelligent PDF parsing system that dynamically routes documents with different structures (e.g., text-heavy versus formula-heavy) to specialized parsing tools (such as PyMuPDF or Nougat) to maximize extraction quality.
- Use of Large Language Models (LLMs) for advanced Named Entity Recognition (NER) to identify and extract key semantic information, such as technologies, methodologies, and equipment, from unstructured text.
- Creation of a "human-in-the-loop" smart annotation workflow designed to efficiently validate AI-generated labels, thereby constructing a high-quality dataset for subsequent machine learning model training.
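The document-routing idea from Track 2 can be illustrated with a simple heuristic. The example below is a minimal, hypothetical sketch, not the project's design. It assumes a quick text-layer sample of each page is already available, scores that text by the share of math-like characters and LaTeX-like tokens, and returns the name of the parser to use ("pymupdf" or "nougat") instead of calling either library. The character set, token list, weighting, threshold, and the rule that sends pages without a text layer to Nougat are all illustrative assumptions.

```python
import re

# Hypothetical signals of mathematical content in a page's text layer.
MATH_CHARS = set("∑∫∂√≤≥≈≠±×÷∞αβγδεθλμπσφω=^_")
LATEX_TOKEN = re.compile(r"\\(frac|sum|int|alpha|beta|mathbf|left|right)\b")


def formula_density(text):
    """Share of non-whitespace characters that look mathematical,
    with an assumed bonus weight of 5 per LaTeX-like token."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    math_chars = sum(1 for c in chars if c in MATH_CHARS)
    latex_hits = len(LATEX_TOKEN.findall(text))
    return (math_chars + 5 * latex_hits) / len(chars)


def choose_parser(page_text, threshold=0.05):
    """Return the parser name for one page (threshold is an assumption)."""
    if not page_text.strip():
        # Assumed rule: no extractable text layer (e.g. a scanned page),
        # so fall back to the model that reads the rendered page.
        return "nougat"
    if formula_density(page_text) >= threshold:
        return "nougat"
    return "pymupdf"


if __name__ == "__main__":
    print(choose_parser("Machine learning methods screen large collections of papers."))
    print(choose_parser(r"L = \sum_i \frac{1}{2} (y_i - \hat{y}_i)^2"))
```

In practice, the threshold would need to be calibrated on a labelled sample of pages, and a learned classifier could replace the hand-written score.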
Application Project for Master's Students in Data Engineering and Analytics!
Start: Winter semester 2025/26; Application deadline: October 1, 2025; Registration deadline: set by CIT, see https://www.cit.tum.de/en/cit/studies/degree-programs/master-data-engineering-and-analytics/ (section ‘During the degree program: Application Project’)
Duration: 6 months
Supervisor ERI: Milena Barg
Application: Interested candidates are invited to submit their curriculum vitae, a current transcript of records, and a brief statement of motivation to bastian.burger(at)unternehmertum.de