
Watercloud Global Infrastructure

Safest, Fastest GPU Honeycomb Infrastructure

Frequently Asked Questions

Please reach us at admin@watercloud1.com if you cannot find an answer to your question.

Inference is the process of using a trained AI model to produce results in real time.


In operational terms, inference is the live production workload running inside the data centre. It is the activity that generates revenue from AI infrastructure.


Training builds the model.
Inference runs the service.


Inference runs continuously and consumes the majority of compute resources.

Most AI data centres operate with:

  • 80–95% inference workloads
  • 5–20% training workloads

Inference determines:

  • revenue generation
  • power demand
  • rack utilisation
  • customer performance
  • infrastructure scaling


Acceleration refers to specialised hardware or software designed to increase the speed and efficiency of compute workloads.

Accelerators perform complex calculations significantly faster than standard processors.

Acceleration reduces:

  • processing time
  • energy consumption
  • cost per computation


Inference is the workload.
Acceleration is the technology that enables the workload to run efficiently.

Inference generates output.
Acceleration improves performance.


An accelerator is a processor designed specifically for high-performance computing tasks such as artificial intelligence, machine learning, and data analytics.

Common accelerator types include:

  • GPU (Graphics Processing Unit)
  • NPU (Neural Processing Unit)
  • TPU (Tensor Processing Unit)
  • ASIC (Application-Specific Integrated Circuit)
  • FPGA (Field-Programmable Gate Array)


GPU utilisation is the percentage of time a GPU is actively processing workloads.


Example:

  • 100 GPUs installed
  • 80 GPUs running workloads
  • GPU utilisation: 80%

High utilisation indicates efficient infrastructure usage and strong revenue performance.
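
As a rough illustration, the calculation above can be expressed in a few lines of Python (a minimal sketch; the figures match the example):

  def gpu_utilisation(active_gpus: int, installed_gpus: int) -> float:
      """Return utilisation as a percentage of installed GPUs."""
      return 100.0 * active_gpus / installed_gpus

  print(gpu_utilisation(80, 100))  # 80.0, as in the example above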


Latency is the time required for a system to process a request and return a response.

Latency is measured in milliseconds.

Lower latency results in:

  • faster response time
  • improved user experience
  • higher service reliability
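
A minimal sketch of how latency is typically measured (the request being timed here is a stand-in, not a specific API):

  import time

  def latency_ms(handle_request) -> float:
      """Time one request and return the latency in milliseconds."""
      start = time.perf_counter()
      handle_request()  # stand-in for the real inference call
      return (time.perf_counter() - start) * 1000.0

  print(f"{latency_ms(lambda: sum(range(100_000))):.2f} ms")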


Throughput is the total amount of work processed by a system within a specific time period.

Examples:

  • requests per second
  • tokens per second
  • transactions per second
  • images per second

Higher throughput indicates higher system capacity and productivity.
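
Throughput follows from a count of completed work over a time window; a minimal sketch with illustrative numbers:

  def throughput_per_second(completed_requests: int, window_seconds: float) -> float:
      """Return requests processed per second over the measurement window."""
      return completed_requests / window_seconds

  print(throughput_per_second(12_000, 60.0))  # 200.0 requests per second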


Rack density is the amount of electrical power consumed by equipment installed in a single rack.

Rack density is measured in kilowatts (kW) per rack.

Typical ranges:

  • Traditional data centre: 5–10 kW per rack
  • AI data centre: 30–80 kW per rack
  • Next-generation AI infrastructure: 100–150 kW per rack
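
For illustration only (the server count and per-server draw below are assumptions, not vendor figures), rack density follows directly from what is installed in the rack:

  def rack_density_kw(servers_per_rack: int, kw_per_server: float) -> float:
      """Return total rack power draw in kilowatts."""
      return servers_per_rack * kw_per_server

  print(rack_density_kw(8, 10.0))  # 80.0 kW per rack, at the top of the AI range above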


Performance per watt measures the amount of computing work delivered for each unit of electricity consumed.

This metric is critical because electricity is the largest operating cost in modern data centres.

Higher performance per watt results in:

  • lower operating costs
  • improved efficiency
  • increased profitability
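
A minimal sketch of the metric (tokens per second is used as the unit of work purely as an example):

  def performance_per_watt(tokens_per_second: float, power_watts: float) -> float:
      """Return work delivered per watt of electricity consumed."""
      return tokens_per_second / power_watts

  print(performance_per_watt(50_000, 10_000))  # 5.0 tokens per second per watt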


Revenue per rack is determined by three primary factors:

Inference throughput × Utilisation rate × Service pricing

Supporting factors include:

  • accelerator performance
  • network capacity
  • power availability
  • cooling efficiency
  • workload demand
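
A sketch of how the three primary factors combine, with entirely hypothetical throughput and pricing figures:

  def revenue_per_rack_hour(tokens_per_second: float,
                            utilisation: float,
                            price_per_million_tokens: float) -> float:
      """Estimate hourly revenue for one rack: throughput x utilisation x pricing."""
      tokens_per_hour = tokens_per_second * utilisation * 3600
      return tokens_per_hour / 1_000_000 * price_per_million_tokens

  # 50,000 tokens/s, 80% utilisation, $2 per million tokens (illustrative only)
  print(f"${revenue_per_rack_hour(50_000, 0.80, 2.0):.2f} per rack-hour")  # $288.00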


An inference cluster is a group of servers dedicated to running AI workloads in production.

Typical components include:

  • accelerator-equipped servers
  • high-speed networking
  • storage systems
  • load balancing software
  • monitoring systems

Inference clusters operate continuously and support real-time applications.


Electricity is the largest operating cost.

Typical operating cost distribution:

  • Power: 40–60%
  • Cooling: 20–30%
  • Hardware depreciation: 15–25%
  • Operations and staffing: 5–10%


Accelerators generate significant heat during operation.

Effective cooling prevents:

  • performance degradation
  • hardware failure
  • energy inefficiency
  • service interruption

Modern AI facilities commonly use:

  • liquid cooling
  • direct-to-chip cooling
  • hot aisle containment
  • rear door heat exchangers


An inference-optimised data centre is designed to support continuous, high-volume production workloads.

Key characteristics include:

  • high accelerator density
  • low latency networking
  • scalable power capacity
  • efficient thermal management
  • high system reliability


Scaling refers to increasing infrastructure capacity to support additional workloads.

Two scaling methods:

Vertical scaling:

Increasing the performance of a single system.

Horizontal scaling:

Adding more systems to the environment.

Modern AI data centres rely primarily on horizontal scaling.
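
A minimal sketch of why horizontal scaling dominates: capacity grows roughly linearly with the number of identical systems added (the figures are illustrative):

  def horizontal_capacity(nodes: int, throughput_per_node: float) -> float:
      """Total capacity when scaling out: more systems, same per-node performance."""
      return nodes * throughput_per_node

  print(horizontal_capacity(10, 200.0))  # 2000.0 requests/s
  print(horizontal_capacity(20, 200.0))  # 4000.0 requests/s: doubling nodes doubles capacity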


Capacity is the maximum infrastructure capability.

Utilisation is the percentage of that capacity currently in use.

Example:

  • Installed capacity: 100 GPUs
  • Active usage: 70 GPUs
  • Utilisation: 70%


SLA stands for Service Level Agreement.

An SLA defines the performance and reliability commitments of a service provider.

Typical SLA metrics include:

  • system availability
  • response time
  • recovery time
  • support levels
  • uptime guarantee

Standard enterprise SLA target: 99.99% uptime
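
As a quick check on what 99.99% uptime implies (simple arithmetic, not a contractual figure):

  def allowed_downtime_minutes_per_year(uptime_percent: float) -> float:
      """Convert an uptime guarantee into allowed downtime per year."""
      minutes_per_year = 365 * 24 * 60
      return minutes_per_year * (1 - uptime_percent / 100)

  print(f"{allowed_downtime_minutes_per_year(99.99):.1f} minutes per year")  # ~52.6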


Networking determines how quickly data moves between systems within the data centre.

High-performance networking enables:

  • low latency communication
  • high throughput data transfer
  • reliable system scaling

Common data centre networking technologies include:

  • 100G Ethernet
  • 400G Ethernet
  • 800G Ethernet
  • InfiniBand
  • RDMA


Consistent infrastructure utilisation is the key driver of profitability.

High utilisation ensures:

  • predictable revenue
  • efficient resource usage
  • lower cost per workload
  • sustainable operations

Idle infrastructure generates cost but no revenue.


Compute capacity is directly limited by available electrical power.

More power enables:

  • more servers
  • more accelerators
  • higher rack density
  • increased workload capacity

Power is the primary constraint in modern AI infrastructure.
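
A simple capacity-planning sketch (the site power budget, per-accelerator draw, and overhead fraction are assumptions for illustration):

  def accelerators_within_budget(site_power_kw: float,
                                 kw_per_accelerator: float,
                                 overhead_fraction: float = 0.3) -> int:
      """Estimate how many accelerators a power budget supports,
      reserving a share for cooling and other overhead."""
      usable_kw = site_power_kw * (1 - overhead_fraction)
      return int(usable_kw // kw_per_accelerator)

  print(accelerators_within_budget(5_000, 1.0))  # 3500 accelerators on a 5 MW site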


CPU workloads handle general-purpose computing tasks.

Examples:

  • web services
  • databases
  • application hosting

Accelerator workloads handle high-performance computing tasks.

Examples:

  • artificial intelligence
  • machine learning
  • image processing
  • data analytics


A high-performance AI data centre delivers:

  • high accelerator utilisation
  • low latency response
  • high throughput capacity
  • stable power supply
  • efficient cooling systems
  • scalable infrastructure design


The operational objective is to maximise compute output while minimising energy consumption and downtime.

This is achieved through:

  • efficient acceleration
  • reliable power infrastructure
  • intelligent workload scheduling
  • continuous system monitoring
  • proactive maintenance


The core principle is:

Inference drives revenue.
Acceleration drives efficiency.
Utilisation drives profitability.


Copyright © 2026 Watercloud International (Singapore) Pte. Ltd. All Rights Reserved.

Powered by WAFEnity™

