GPUStack

University-internal "LLM-as-a-Service" platform with which AI language models can be easily used via a standardised API for applications, agents, chats or data pipelines.

GPUStack management

Picture: gpustack.ai

With GPUStack, we offer researchers centralised, low-threshold access to powerful open-weight models for AI-supported applications. This instance bundles the GPU computing power available at the University Computer Centre and makes it available as a high-performance, scalable AI engine. This enables researchers and students to use complex language and analysis models (LLMs) flexibly for their work without having to deal with the administration of servers or the management of graphics card resources.

Service URLExternal link (use with university login via sso.hs-itz.de)
DocumentationExternal link

Functionalities

Our GPUStack instance permanently provides various AI language models (Large Language Models, LLMs) and can serve as a local, high-performance and data protection-compliant alternative to OpenAI, Microsoft Azure or Anthropic. An intuitive user interface lists all currently deployed models and enables their direct use, minimising the technical overhead for scientific staff.

You can do the following things with this instance:

Seamlessly integrate with existing tools (OpenAI drop-in): Thanks to standard OpenAI-compatible APIs, the GPUStack instance can be integrated directly into existing scripts, Jupyter Notebooks or frameworks (such as LangChain, LlamaIndex, Dify) without any code adjustments. All you have to do is replace the OpenAI URL with the GPUStack address and your own API key in the code to immediately use the permanently provided LLMs in your AI pipelines.
Operation of AI assistants and chat interfaces: As the LLMs are permanently ready for use, you can effortlessly connect your own customised user interfaces. In the development area, plugins such as Continue.dev or Cody enable the integration of AI assistants for code autocompletion directly in VS Code or JetBrains IDEs. For general use, data protection-compliant chat applications can be realised via open source interfaces such as Open WebUI, LibreChat or NextChat, which transparently access the internal GPUStack models in the background.
Set up RAG systems (knowledge management): Thanks to the LLMs and embedding models provided, you can build your own Retrieval Augmented Generation (RAG) systems to retrieve internal knowledge. PDFs, wikis (e.g. Confluence/Notion), SharePoint or databases can be seamlessly connected. As all data remains within the university's own GPUStack instance, even highly sensitive research and administrative documents can be analysed in full compliance with data protection regulations.
Automation and batch processing: Thanks to efficient request bundling and dynamic load distribution across the available GPUs, the instance is optimised for the automation of data-intensive background processes. This allows you to automatically analyse unstructured data volumes - such as thousands of emails or support tickets - and convert them into structured JSON formats. Large-scale text generation, translations or summaries can be realised just as efficiently in a batch process.

Information

GPUStack is currently in a test or setup phase and we are awaiting your feedback for improvements. It is planned to transfer this instance into a regular URZ or HS-ITZ service. In any case, your data will remain locally on university servers.