Inference is the process of using a trained AI model to produce results in real time.
In operational terms, inference is the live production workload running inside the data centre. It is the activity that generates revenue from AI infrastructure.
Training builds the model.
Inference runs the service.
Inference runs continuously and consumes the majority of compute resources.
Most AI data centres operate with:
80–95% inference workloads
5–20% training workloads
Inference determines:
Revenue generation
Response latency and user experience
Overall infrastructure utilisation
Acceleration refers to specialised hardware or software designed to increase the speed and efficiency of compute workloads.
Accelerators perform complex calculations significantly faster than standard processors.
Acceleration reduces:
Processing time per request
Energy consumed per unit of work
Cost per inference
Inference is the workload.
Acceleration is the technology that enables the workload to run efficiently.
Inference generates output.
Acceleration improves performance.
An accelerator is a processor designed specifically for high-performance computing tasks such as artificial intelligence, machine learning, and data analytics.
Common accelerator types include:
GPUs (graphics processing units)
TPUs (tensor processing units)
FPGAs (field-programmable gate arrays)
Custom AI ASICs
GPU utilisation is the percentage of time a GPU is actively processing workloads.
Example:
100 GPUs installed
80 GPUs running workloads
GPU utilisation: 80%
High utilisation indicates efficient infrastructure usage and strong revenue performance.
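The same calculation, as a minimal Python sketch (the function and variable names are illustrative, not from any specific monitoring tool):

    def gpu_utilisation(active_gpus: int, installed_gpus: int) -> float:
        """Percentage of installed GPUs actively processing workloads."""
        return 100.0 * active_gpus / installed_gpus

    # The example above: 80 of 100 installed GPUs are running workloads.
    print(gpu_utilisation(80, 100))  # 80.0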
Latency is the time required for a system to process a request and return a response.
Latency is measured in milliseconds.
Lower latency results in:
Faster responses to user requests
Better user experience
Higher service quality
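As a hedged illustration, latency can be measured around any request in Python; handle_request here is a hypothetical stand-in for a real inference call:

    import time

    def handle_request() -> None:
        """Hypothetical stand-in for a real inference call."""
        time.sleep(0.05)

    start = time.perf_counter()
    handle_request()
    latency_ms = (time.perf_counter() - start) * 1000
    print(f"latency: {latency_ms:.1f} ms")  # roughly 50 ms for this placeholder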
Throughput is the total amount of work processed by a system within a specific time period.
Examples:
Requests served per second
Tokens generated per second
Images processed per minute
Higher throughput indicates higher system capacity and productivity.
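Throughput is simply work divided by time. A minimal sketch with illustrative numbers:

    requests_completed = 120_000  # illustrative: requests served in the window
    window_seconds = 60
    throughput = requests_completed / window_seconds
    print(f"{throughput:.0f} requests/second")  # 2000 requests/second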
Rack density is the amount of electrical power consumed by equipment installed in a single rack.
Measured in kilowatts (kW) per rack.
Typical ranges:
Traditional data centre: 5–10 kW per rack
AI data centre: 30–80 kW per rack
Next-generation AI infrastructure: 100–150 kW per rack
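Rack density follows directly from the equipment installed. A sketch, assuming an illustrative configuration of 8 multi-GPU servers per rack:

    servers_per_rack = 8   # assumption: dense AI servers in one rack
    kw_per_server = 10.0   # assumption: one multi-GPU server draws ~10 kW
    rack_density_kw = servers_per_rack * kw_per_server
    print(f"{rack_density_kw:.0f} kW per rack")  # 80 kW, the top of the AI range above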
Performance per watt measures the amount of computing work delivered for each unit of electricity consumed.
This metric is critical because electricity is the largest operating cost in modern data centres.
Higher performance per watt results in:
Lower electricity cost per unit of work
Higher profit margins
More compute output within a fixed power budget
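The metric itself is a simple ratio. A hedged sketch comparing two illustrative accelerators (figures are assumptions, not vendor specifications):

    def perf_per_watt(throughput: float, power_watts: float) -> float:
        """Computing work delivered per watt of electricity consumed."""
        return throughput / power_watts

    old_gen = perf_per_watt(throughput=10_000, power_watts=400)  # 25.0
    new_gen = perf_per_watt(throughput=30_000, power_watts=700)  # about 42.9
    print(f"improvement: {new_gen / old_gen:.1f}x")              # 1.7x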
Revenue per rack is determined by three primary factors:
Inference throughput × Utilisation rate × Service pricing
Supporting factors include:
Rack density
Performance per watt
Cooling efficiency
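A hedged worked example of the formula above (all throughput figures, rates, and prices are illustrative assumptions):

    # Revenue per rack = inference throughput x utilisation rate x service pricing
    tokens_per_second = 50_000        # assumption: rack-level inference throughput
    utilisation = 0.80                # assumption: share of capacity monetised
    price_per_million_tokens = 2.00   # assumption: service pricing in USD

    seconds_per_month = 30 * 24 * 3600
    tokens_sold = tokens_per_second * utilisation * seconds_per_month
    revenue = tokens_sold / 1_000_000 * price_per_million_tokens
    print(f"${revenue:,.0f} per rack per month")  # $207,360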
An inference cluster is a group of servers dedicated to running AI workloads in production.
Typical components include:
Accelerator servers
High-speed networking
Shared storage
Load balancing and orchestration software
Inference clusters operate continuously and support real-time applications.
Electricity is the largest operating cost.
Typical operating cost distribution:
Power: 40–60%
Cooling: 20–30%
Hardware depreciation: 15–25%
Operations and staffing: 5–10%
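A hedged sketch applying approximate midpoints of these ranges (normalised to 100%) to an illustrative monthly budget:

    monthly_opex = 1_000_000  # assumption: illustrative monthly operating budget (USD)
    # Approximate midpoints of the ranges above, normalised to sum to 100%.
    shares = {"Power": 0.50, "Cooling": 0.25,
              "Hardware depreciation": 0.20, "Operations and staffing": 0.05}
    for item, share in shares.items():
        print(f"{item}: ${monthly_opex * share:,.0f}")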
Accelerators generate significant heat during operation.
Effective cooling prevents:
Overheating and thermal throttling
Reduced hardware lifespan
Unplanned downtime
Modern AI facilities commonly use:
Direct-to-chip liquid cooling
Rear-door heat exchangers
Immersion cooling
An inference-optimised data centre is designed to support continuous, high-volume production workloads.
Key characteristics include:
High rack density
High-performance networking
Efficient cooling
Consistently high utilisation
Scaling refers to increasing infrastructure capacity to support additional workloads.
Two scaling methods:
Vertical scaling: increasing the performance of a single system.
Horizontal scaling: adding more systems to the environment.
Modern AI data centres rely primarily on horizontal scaling.
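A minimal sketch of why horizontal scaling dominates: capacity grows linearly as identical systems are added (the per-server figure is an assumption):

    server_throughput = 2_000  # assumption: requests/second per identical server

    def horizontal_capacity(servers: int) -> int:
        """Total capacity grows linearly as identical systems are added."""
        return servers * server_throughput

    for n in (10, 20, 40):
        print(f"{n} servers -> {horizontal_capacity(n):,} requests/second")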
Capacity is the maximum infrastructure capability.
Utilisation is the percentage of that capacity currently in use.
Example:
Installed capacity: 100 GPUs
Active usage: 70 GPUs
Utilisation: 70%
SLA stands for Service Level Agreement.
An SLA defines the performance and reliability commitments of a service provider.
Typical SLA metrics include:
Uptime percentage
Latency targets
Incident response times
Standard enterprise SLA target: 99.99% uptime
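What an uptime percentage means in practice can be computed directly; a short sketch:

    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    for uptime in (0.999, 0.9999):
        downtime = minutes_per_year * (1 - uptime)
        print(f"{uptime:.2%} uptime allows {downtime:.0f} minutes of downtime per year")
    # 99.90% allows about 526 minutes; 99.99% allows about 53 minutes.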
Networking determines how quickly data moves between systems within the data centre.
High-performance networking enables:
Low-latency communication between servers
High throughput for distributed workloads
Efficient horizontal scaling across the cluster
Common data centre networking technologies include:
High-speed Ethernet (100–800 GbE)
InfiniBand
Consistent infrastructure utilisation is essential.
High utilisation ensures:
Steady revenue from installed hardware
Efficient return on capital investment
Idle infrastructure generates cost but no revenue.
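A hedged sketch of the economics (all costs and rates are illustrative assumptions):

    gpus = 100
    monthly_cost_per_gpu = 1_500            # assumption: power, cooling, depreciation
    monthly_revenue_per_active_gpu = 2_500  # assumption: service revenue per busy GPU

    for utilisation in (0.5, 0.7, 0.9):
        revenue = gpus * utilisation * monthly_revenue_per_active_gpu
        cost = gpus * monthly_cost_per_gpu  # paid whether GPUs are busy or idle
        print(f"{utilisation:.0%} utilisation -> profit ${revenue - cost:,.0f}/month")
    # 50% loses money, 70% is profitable, 90% triples the margin.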
Compute capacity is directly limited by available electrical power.
More power enables:
More accelerators per rack
Higher rack density
Greater total compute output
Power is the primary constraint in modern AI infrastructure.
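A minimal sketch of how a facility power budget caps compute capacity (all figures are illustrative assumptions):

    facility_power_kw = 10_000  # assumption: total power available for IT load
    rack_density_kw = 50        # within the 30-80 kW AI range above
    gpus_per_rack = 32          # assumption

    racks = facility_power_kw // rack_density_kw
    print(f"{racks} racks, about {racks * gpus_per_rack:,} accelerators maximum")
    # 200 racks and roughly 6,400 accelerators at this power budget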
CPU workloads handle general-purpose computing tasks.
Examples:
Web serving and APIs
Databases
Orchestration and general application logic
Accelerator workloads handle high-performance computing tasks.
Examples:
AI inference
Model training
Large-scale data analytics
A high-performance AI data centre delivers:
High throughput
Low latency
High utilisation
Strong performance per watt
The operational objective is to maximise compute output while minimising energy consumption and downtime.
This is achieved through:
Hardware acceleration
Consistently high utilisation
Efficient cooling and power management
Dense, scalable infrastructure
The core principle is:
Inference drives revenue.
Acceleration drives efficiency.
Utilisation drives profitability.