# Metrics Reference Complete catalog of metrics collected and exported by Telegen. ## Self-Telemetry Metrics Metrics about Telegen's own operation. ### Collection Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_spans_collected_total` | Counter | signal_type | Total spans collected | | `telegen_spans_exported_total` | Counter | signal_type, endpoint | Spans exported successfully | | `telegen_spans_dropped_total` | Counter | reason | Spans dropped | | `telegen_metrics_collected_total` | Counter | - | Metrics collected | | `telegen_metrics_exported_total` | Counter | endpoint | Metrics exported | | `telegen_logs_collected_total` | Counter | - | Logs collected | | `telegen_logs_exported_total` | Counter | endpoint | Logs exported | | `telegen_profiles_collected_total` | Counter | - | Profiles collected | ### eBPF Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_ebpf_programs_loaded` | Gauge | program_type | Number of eBPF programs | | `telegen_ebpf_maps_created` | Gauge | map_type | Number of eBPF maps | | `telegen_ebpf_map_entries` | Gauge | map_name | Entries in each map | | `telegen_ebpf_ringbuf_events_total` | Counter | - | Ring buffer events received | | `telegen_ebpf_ringbuf_lost_total` | Counter | - | Ring buffer events lost | | `telegen_ebpf_perf_events_total` | Counter | cpu | Perf buffer events | | `telegen_ebpf_perf_lost_total` | Counter | cpu | Perf buffer events lost | ### Export Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_export_requests_total` | Counter | endpoint, status | Export requests | | `telegen_export_errors_total` | Counter | endpoint, error_type | Export errors | | `telegen_export_latency_seconds` | Histogram | endpoint | Export latency | | `telegen_export_batch_size` | Histogram | signal_type | Batch sizes | | `telegen_export_queue_size` | Gauge | signal_type | Current queue depth | ### Process Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_process_cpu_seconds_total` | Counter | - | CPU time used | | `telegen_process_resident_memory_bytes` | Gauge | - | Memory usage | | `telegen_process_virtual_memory_bytes` | Gauge | - | Virtual memory | | `telegen_process_open_fds` | Gauge | - | Open file descriptors | | `telegen_process_max_fds` | Gauge | - | Max file descriptors | | `telegen_process_start_time_seconds` | Gauge | - | Process start time | | `telegen_go_goroutines` | Gauge | - | Number of goroutines | | `telegen_go_gc_duration_seconds` | Summary | - | GC pause duration | --- ## Node Metrics (node_exporter Compatible) When Node Exporter Fusion is enabled, Telegen exports Prometheus node_exporter compatible metrics. ### CPU Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_cpu_seconds_total` | Counter | cpu, mode | CPU time per mode | | `node_cpu_guest_seconds_total` | Counter | cpu, mode | Guest CPU time | | `node_cpu_frequency_hertz` | Gauge | cpu | CPU frequency | | `node_cpu_frequency_max_hertz` | Gauge | cpu | Max CPU frequency | | `node_cpu_frequency_min_hertz` | Gauge | cpu | Min CPU frequency | ### Memory Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_memory_MemTotal_bytes` | Gauge | - | Total memory | | `node_memory_MemFree_bytes` | Gauge | - | Free memory | | `node_memory_MemAvailable_bytes` | Gauge | - | Available memory | | `node_memory_Buffers_bytes` | Gauge | - | Buffer memory | | `node_memory_Cached_bytes` | Gauge | - | Cached memory | | `node_memory_SwapTotal_bytes` | Gauge | - | Total swap | | `node_memory_SwapFree_bytes` | Gauge | - | Free swap | | `node_memory_SwapCached_bytes` | Gauge | - | Cached swap | | `node_memory_Active_bytes` | Gauge | - | Active memory | | `node_memory_Inactive_bytes` | Gauge | - | Inactive memory | | `node_memory_Slab_bytes` | Gauge | - | Slab memory | ### Disk Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_disk_reads_completed_total` | Counter | device | Read operations | | `node_disk_writes_completed_total` | Counter | device | Write operations | | `node_disk_read_bytes_total` | Counter | device | Bytes read | | `node_disk_written_bytes_total` | Counter | device | Bytes written | | `node_disk_read_time_seconds_total` | Counter | device | Read time | | `node_disk_write_time_seconds_total` | Counter | device | Write time | | `node_disk_io_time_seconds_total` | Counter | device | Total I/O time | | `node_disk_io_now` | Gauge | device | I/Os in progress | | `node_disk_discards_completed_total` | Counter | device | Discard operations | ### Filesystem Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_filesystem_size_bytes` | Gauge | device, fstype, mountpoint | Total size | | `node_filesystem_free_bytes` | Gauge | device, fstype, mountpoint | Free space | | `node_filesystem_avail_bytes` | Gauge | device, fstype, mountpoint | Available space | | `node_filesystem_files` | Gauge | device, fstype, mountpoint | Total inodes | | `node_filesystem_files_free` | Gauge | device, fstype, mountpoint | Free inodes | | `node_filesystem_readonly` | Gauge | device, fstype, mountpoint | Read-only flag | ### Network Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_network_receive_bytes_total` | Counter | device | Bytes received | | `node_network_transmit_bytes_total` | Counter | device | Bytes transmitted | | `node_network_receive_packets_total` | Counter | device | Packets received | | `node_network_transmit_packets_total` | Counter | device | Packets transmitted | | `node_network_receive_errs_total` | Counter | device | Receive errors | | `node_network_transmit_errs_total` | Counter | device | Transmit errors | | `node_network_receive_drop_total` | Counter | device | Receive drops | | `node_network_transmit_drop_total` | Counter | device | Transmit drops | | `node_network_up` | Gauge | device | Interface up status | | `node_network_speed_bytes` | Gauge | device | Link speed | | `node_network_mtu_bytes` | Gauge | device | MTU | ### Load Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_load1` | Gauge | - | 1-minute load average | | `node_load5` | Gauge | - | 5-minute load average | | `node_load15` | Gauge | - | 15-minute load average | ### System Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `node_boot_time_seconds` | Gauge | - | Boot time | | `node_context_switches_total` | Counter | - | Context switches | | `node_forks_total` | Counter | - | Forks | | `node_intr_total` | Counter | - | Interrupts | | `node_procs_running` | Gauge | - | Running processes | | `node_procs_blocked` | Gauge | - | Blocked processes | | `node_uname_info` | Gauge | sysname, release, version, machine, nodename, domainname | System info | --- ## GPU Metrics NVIDIA GPU metrics when AI/ML observability is enabled. | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_gpu_utilization_ratio` | Gauge | gpu, uuid | GPU utilization (0-1) | | `telegen_gpu_memory_used_bytes` | Gauge | gpu, uuid | Memory used | | `telegen_gpu_memory_total_bytes` | Gauge | gpu, uuid | Total memory | | `telegen_gpu_memory_utilization_ratio` | Gauge | gpu, uuid | Memory utilization (0-1) | | `telegen_gpu_temperature_celsius` | Gauge | gpu, uuid | GPU temperature | | `telegen_gpu_power_watts` | Gauge | gpu, uuid | Power usage | | `telegen_gpu_power_limit_watts` | Gauge | gpu, uuid | Power limit | | `telegen_gpu_clock_graphics_hertz` | Gauge | gpu, uuid | Graphics clock | | `telegen_gpu_clock_sm_hertz` | Gauge | gpu, uuid | SM clock | | `telegen_gpu_clock_memory_hertz` | Gauge | gpu, uuid | Memory clock | | `telegen_gpu_pcie_tx_bytes_total` | Counter | gpu, uuid | PCIe TX bytes | | `telegen_gpu_pcie_rx_bytes_total` | Counter | gpu, uuid | PCIe RX bytes | | `telegen_gpu_ecc_errors_total` | Counter | gpu, uuid, type | ECC errors | | `telegen_gpu_nvlink_tx_bytes_total` | Counter | gpu, uuid, link | NVLink TX | | `telegen_gpu_nvlink_rx_bytes_total` | Counter | gpu, uuid, link | NVLink RX | | `telegen_gpu_compute_processes` | Gauge | gpu, uuid | Compute processes | | `telegen_gpu_graphics_processes` | Gauge | gpu, uuid | Graphics processes | --- ## LLM Inference Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_llm_requests_total` | Counter | model, endpoint | Inference requests | | `telegen_llm_tokens_input_total` | Counter | model | Input tokens | | `telegen_llm_tokens_output_total` | Counter | model | Output tokens | | `telegen_llm_time_to_first_token_seconds` | Histogram | model | TTFT latency | | `telegen_llm_tokens_per_second` | Gauge | model | Token generation rate | | `telegen_llm_batch_size` | Histogram | model | Batch sizes | | `telegen_llm_cache_hit_ratio` | Gauge | model | KV cache hit ratio | | `telegen_llm_queue_depth` | Gauge | model | Request queue | --- ## Network Flow Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_flow_bytes_total` | Counter | src, dst, protocol, direction | Flow bytes | | `telegen_flow_packets_total` | Counter | src, dst, protocol, direction | Flow packets | | `telegen_flow_connections_total` | Counter | protocol | Connection count | | `telegen_flow_active_connections` | Gauge | protocol | Active connections | | `telegen_flow_rtt_seconds` | Histogram | src, dst | Round-trip time | | `telegen_flow_retransmits_total` | Counter | src, dst | Retransmissions | --- ## Database Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_db_queries_total` | Counter | db_type, operation | Query count | | `telegen_db_query_duration_seconds` | Histogram | db_type, operation | Query latency | | `telegen_db_connections_active` | Gauge | db_type, host | Active connections | | `telegen_db_errors_total` | Counter | db_type, error_type | Database errors | | `telegen_db_rows_affected_total` | Counter | db_type, operation | Rows affected | --- ## Connection Statistics Metrics Emitted when a TCP connection closes, providing per-connection byte throughput data. | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen.connection.bytes_sent` | Counter | src_ip, dst_ip, dst_port, protocol | Bytes sent over the connection lifetime | | `telegen.connection.bytes_received` | Counter | src_ip, dst_ip, dst_port, protocol | Bytes received over the connection lifetime | These metrics complement per-request traces, giving aggregate throughput even for protocols that are not fully parsed. --- ## Kafka Consumer Group Metrics Kafka spans with consumer group context include the following additional span attribute: | Attribute | Description | |-----------|-------------| | `messaging.kafka.consumer.group.id` | Consumer group identifier, extracted from JoinGroup and SyncGroup Kafka protocol events | This attribute appears on spans emitted for Fetch requests and group management operations (JoinGroup, SyncGroup). Use it to filter group-specific traces and correlate consumer lag: ```promql # Count Fetch spans by consumer group count(telegen_spans_collected_total{ messaging_system="kafka", messaging_kafka_consumer_group_id=~".+" }) by (messaging_kafka_consumer_group_id) ``` --- ## SNMP Metrics SNMP metrics use the MIB object names with `snmp_` prefix. ### Interface Metrics (IF-MIB) | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `snmp_ifHCInOctets` | Counter | ifIndex, ifDescr | Input octets | | `snmp_ifHCOutOctets` | Counter | ifIndex, ifDescr | Output octets | | `snmp_ifHCInUcastPkts` | Counter | ifIndex, ifDescr | Input packets | | `snmp_ifHCOutUcastPkts` | Counter | ifIndex, ifDescr | Output packets | | `snmp_ifOperStatus` | Gauge | ifIndex, ifDescr | Operational status | | `snmp_ifHighSpeed` | Gauge | ifIndex, ifDescr | Interface speed | | `snmp_ifInErrors` | Counter | ifIndex, ifDescr | Input errors | | `snmp_ifOutErrors` | Counter | ifIndex, ifDescr | Output errors | ### System Metrics (SNMPv2-MIB) | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `snmp_sysUpTime` | Gauge | - | System uptime | | `snmp_sysName` | Info | sysName | System name | | `snmp_sysDescr` | Info | sysDescr | System description | --- ## Storage Array Metrics ### Common Storage Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_storage_capacity_bytes` | Gauge | array, pool | Total capacity | | `telegen_storage_used_bytes` | Gauge | array, pool | Used capacity | | `telegen_storage_free_bytes` | Gauge | array, pool | Free capacity | | `telegen_storage_iops_read` | Counter | array, volume | Read IOPS | | `telegen_storage_iops_write` | Counter | array, volume | Write IOPS | | `telegen_storage_throughput_read_bytes` | Counter | array, volume | Read throughput | | `telegen_storage_throughput_write_bytes` | Counter | array, volume | Write throughput | | `telegen_storage_latency_read_seconds` | Histogram | array, volume | Read latency | | `telegen_storage_latency_write_seconds` | Histogram | array, volume | Write latency | | `telegen_storage_controller_status` | Gauge | array, controller | Controller health | | `telegen_storage_disk_status` | Gauge | array, disk | Disk health | --- ## Security Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_security_events_total` | Counter | event_type, severity | Security events | | `telegen_security_syscall_total` | Counter | syscall, comm | Syscall counts | | `telegen_security_file_access_total` | Counter | path, operation | File access | | `telegen_security_process_exec_total` | Counter | binary | Process executions | | `telegen_security_network_connections_total` | Counter | process, direction | Network connections | | `telegen_security_privilege_escalation_total` | Counter | type | Privilege escalations | --- ## Kubernetes Metrics | Metric | Type | Labels | Description | |--------|------|--------|-------------| | `telegen_k8s_pods_discovered` | Gauge | namespace | Pods discovered | | `telegen_k8s_services_discovered` | Gauge | namespace | Services discovered | | `telegen_k8s_deployments_discovered` | Gauge | namespace | Deployments discovered | --- ## Metric Labels ### Common Labels Applied to most metrics: | Label | Description | Example | |-------|-------------|---------| | `host.name` | Hostname | `node-1` | | `service.name` | Service name | `my-service` | | `service.namespace` | Namespace | `production` | | `k8s.pod.name` | Pod name | `my-pod-abc123` | | `k8s.namespace.name` | K8s namespace | `default` | | `k8s.node.name` | K8s node | `node-1` | | `k8s.deployment.name` | Deployment | `my-deployment` | | `container.id` | Container ID | `abc123...` | --- ## Metric Naming Conventions Telegen follows these conventions: 1. **Prefix**: `telegen_` for Telegen-specific metrics 2. **node_exporter**: `node_` prefix for compatibility 3. **SNMP**: `snmp_` prefix with MIB object names 4. **Units**: Suffix with unit (`_bytes`, `_seconds`, `_total`) 5. **Type**: Counter ends with `_total` --- ## Next Steps - {doc}`semantic-conventions` - OpenTelemetry naming - {doc}`../operations/monitoring` - Using these metrics - {doc}`../features/node-exporter-fusion` - node_exporter compatibility