Oracle ASM (Automatic Storage Management): Disk Groups, Rebalance and Best Practices

Nasir Uddin Khan · OCP · Oracle DBA 18+ Years · July 2026 · 17 min read · ASM, Storage, Disk Groups, Rebalance, RAC

Storage is where a lot of Oracle performance and availability problems are quietly born - fragmented files, uneven I/O across disks, painful storage migrations that need downtime. Oracle ASM (Automatic Storage Management) exists to make those problems go away by acting as Oracle's own volume manager and file system, purpose-built for database files. If you run RAC you already depend on it, and even on single instances it is Oracle's recommended way to lay out storage. This guide explains what ASM actually is, how disk groups and redundancy work, how to add and drop storage online with rebalance, and how to monitor and troubleshoot it in production.

Key Takeaways

ASM is Oracle's built-in volume manager and file system for database files - it stripes data across all disks in a disk group and removes manual file placement.
A disk group is the core unit: a pool of disks that ASM treats as one, with a redundancy level of external, normal, or high.
Failure groups tell ASM which disks share a point of failure, so normal/high redundancy can mirror across them and survive a disk or enclosure loss.
Adding or dropping disks triggers an online rebalance - storage changes with no downtime, and REBALANCE POWER controls how fast versus how disruptively it runs.
asmcmd and the V$ASM views are your everyday tools for space, rebalance progress, and disk status.
In RAC, ASM runs under Grid Infrastructure, and Flex ASM lets database instances keep running even if a local ASM instance fails.

1. What ASM Actually Is

Automatic Storage Management is a lightweight volume manager and file system that Oracle provides specifically for database files - datafiles, control files, redo logs, archive logs, and RMAN backups. Instead of placing files on a traditional file system and worrying about which disk holds what, you give ASM a set of raw disks and it manages the layout for you.

ASM does three things a normal file system does not do well for databases. It stripes every file evenly across all the disks in a group, so I/O is balanced automatically. It can mirror data across disks for redundancy. And it rebalances online when you add or remove storage, moving data with no downtime. It also avoids the fragmentation that plagues ordinary file systems under heavy database write activity.

2. Why Use ASM Instead of a File System?

The honest answer is balance, simplicity, and online change. On a file system, a DBA has to decide which datafile goes on which mount point and hope the I/O works out evenly - it rarely does, and hot files create hot disks. ASM spreads every file across every disk automatically, so the load is even by design.

The second reason is online storage operations. Need more space? Add disks and ASM rebalances live. Retiring an old SAN? Add the new disks, drop the old ones, and ASM migrates the data while the database stays open. Doing that on a file system usually means downtime. For RAC, ASM is effectively required, because it provides the shared storage all nodes see identically.

3. Disk Groups, Redundancy and Failure Groups

The central concept is the disk group - a set of disks ASM manages as a single pool. Most sites run at least two: one for data (often called DATA) and one for the Fast Recovery Area (FRA). Each disk group has a redundancy level chosen when you create it:

External redundancy: ASM does no mirroring; you rely on the storage array's own RAID. Simple and space-efficient when you trust the SAN.
Normal redundancy: ASM keeps two copies of each extent, mirrored across failure groups, so the group survives losing one failure group. It needs at least two failure groups.
High redundancy: ASM keeps three copies of each extent, so the group survives losing two failure groups. It needs at least three failure groups and is used where the storage itself is not otherwise protected.

A failure group is how you tell ASM which disks share a risk - for example, all disks in one storage enclosure or on one controller. With normal or high redundancy, ASM guarantees the mirror copies land in different failure groups, so a whole enclosure can fail without data loss.
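To check how an existing disk group's disks are actually spread across failure groups, a query along these lines can help. It is a minimal sketch that assumes the standard V$ASM_DISK and V$ASM_DISKGROUP views and is run from the ASM instance.

```sql
-- Disks and raw capacity per failure group, for each mounted disk group
SELECT g.name          AS diskgroup,
       g.type          AS redundancy,
       d.failgroup,
       COUNT(*)        AS disks,
       SUM(d.total_mb) AS total_mb
FROM   v$asm_disk d
JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
GROUP  BY g.name, g.type, d.failgroup
ORDER  BY g.name, d.failgroup;
```

If you never named failure groups, ASM puts each disk in its own failure group, so the output shows one row per disk. Failure groups of very different sizes are worth a second look, because mirrored capacity is limited by the smaller side.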

4. Creating and Managing Disk Groups

You can manage ASM through the asmca GUI, but the SQL and command-line paths are what you script and automate. Disk group DDL runs from an ASM instance connection (sqlplus / as sysasm).

-- Create a normal-redundancy disk group with two failure groups
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/sdb1' NAME data_0001
  FAILGROUP fg2 DISK '/dev/sdc1' NAME data_0002
  ATTRIBUTE 'au_size' = '4M',
            'compatible.asm' = '19.0.0.0.0',
            'compatible.rdbms' = '19.0.0.0.0';

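The allocation unit (au_size) is the unit in which ASM allocates space and stripes files across the disks; 4M is a common choice for large databases. It is fixed when the disk group is created and cannot be changed afterwards. Once the group exists, the database simply references files by disk group name, for example +DATA, and ASM handles physical placement.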
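As an illustration of referencing a disk group by name, the following sketch creates a tablespace from the database instance (not the ASM instance). The tablespace name app_data and the sizes are hypothetical; when only '+DATA' is given, ASM generates the file name and directory itself.

```sql
-- Run in the database instance; app_data and the sizes are illustrative only
CREATE TABLESPACE app_data
  DATAFILE '+DATA' SIZE 1G
  AUTOEXTEND ON NEXT 256M MAXSIZE 10G;

-- See the name and location ASM chose for the new datafile
SELECT file_name, bytes/1024/1024 AS size_mb
FROM   dba_data_files
WHERE  tablespace_name = 'APP_DATA';
```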

5. Adding and Dropping Disks: Online Rebalance

This is ASM's best feature in daily operations. Storage changes happen while the database is open.

-- Add a new disk - ASM automatically starts a rebalance
ALTER DISKGROUP data ADD DISK '/dev/sdd1' NAME data_0003;

-- Retire an old disk - data migrates off it, then it drops
ALTER DISKGROUP data DROP DISK data_0001;

-- Control how aggressively the rebalance runs (0 pauses it; higher is faster, up to 1024 when compatible.asm is 11.2.0.2 or later, otherwise up to 11)
ALTER DISKGROUP data REBALANCE POWER 4;

-- Watch rebalance progress
SELECT operation, state, power, sofar, est_work, est_minutes
FROM   v$asm_operation;

The REBALANCE POWER setting is the practical lever. A high power finishes faster but consumes more I/O, so during business hours you keep it modest and raise it in quiet windows. Do not physically remove a dropped disk until its rebalance completes and V$ASM_DISK no longer lists it as a member of the disk group (its header_status changes to FORMER). Pulling it early puts data at risk, because some extents may not yet have been copied to the remaining disks.
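Before anyone pulls hardware, it is worth confirming the disk's state from the ASM instance. This is a minimal sketch assuming the standard V$ASM_DISK columns: a disk that is still being emptied shows state DROPPING, and once the drop has completed its header is marked FORMER.

```sql
-- Disks still being drained: not safe to pull yet
SELECT name, path, state, header_status
FROM   v$asm_disk
WHERE  state = 'DROPPING';

-- Disks whose drop has completed: header marked FORMER
SELECT path, header_status, mount_status
FROM   v$asm_disk
WHERE  header_status = 'FORMER';
```

FORMER also matches disks left over from other disk groups, so match on the device path rather than assuming the only FORMER disk is the one you just dropped.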

6. Monitoring ASM Day to Day

Two toolsets cover almost everything: the asmcmd command line and the V$ASM views.

# asmcmd: quick space and file checks (run from the OS shell as the grid user)
asmcmd lsdg              # disk groups with free/used space
asmcmd lsdsk -k          # disks and their sizes
asmcmd du +DATA/ORCL     # space used by a database's files

-- SQL: free space and disk health
SELECT name, type, total_mb, free_mb,
       ROUND(free_mb/NULLIF(total_mb,0)*100,1) AS pct_free  -- NULLIF avoids divide-by-zero for dismounted groups
FROM   v$asm_diskgroup;

SELECT group_number, name, path, mount_status, mode_status, state
FROM   v$asm_disk ORDER BY group_number;

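The one number to alert on is free space per disk group. With normal or high redundancy, raw free_mb overstates what you can actually use, because every extent is stored two or three times. You must also keep enough free space to survive a disk failure (the space needed to re-mirror the lost data), which ASM reports as required mirror free space in the REQUIRED_MIRROR_FREE_MB column of V$ASM_DISKGROUP. The USABLE_FILE_MB column shows what is safely left after that reserve and the mirroring overhead. Running a mirrored disk group too full removes its ability to self-heal.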
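A query like the following puts the mirroring reserve next to the raw numbers. It is a sketch built on standard V$ASM_DISKGROUP columns; the alert threshold you apply to usable_file_mb is a site decision.

```sql
-- Space that is safely usable after mirroring and the failure reserve
SELECT name,
       type,
       total_mb,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb
FROM   v$asm_diskgroup
ORDER  BY usable_file_mb;
```

A negative usable_file_mb is the case to act on first: the group still works, but after a disk failure there would not be enough room to restore full redundancy. For external redundancy there is no mirror reserve, so usable_file_mb simply tracks free_mb.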

7. ASM in RAC and Flex ASM

In a RAC cluster, ASM runs as part of Grid Infrastructure and provides the shared storage every node sees. Historically each node ran its own ASM instance, and if that instance died the database instance on that node went down with it.

Flex ASM removed that tight coupling. Database instances connect to ASM over the network, and a configurable number of ASM instances (three by default) serve the whole cluster. If a node's ASM instance fails, its databases simply reconnect to an ASM instance on another node and keep running. This makes the storage layer far more resilient, and it is the default modern configuration for RAC.
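One way to see which database instances a given ASM instance is currently serving is V$ASM_CLIENT. This is only a sketch: run it on each ASM instance in turn, and expect the exact columns available to vary a little by release.

```sql
-- Database instances connected to this ASM instance
-- (one row per open disk group, hence DISTINCT)
SELECT DISTINCT instance_name, db_name, status, software_version
FROM   v$asm_client
ORDER  BY db_name, instance_name;
```

With Flex ASM, the list on one ASM instance can include database instances running on other nodes, which is the decoupling described above.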

8. Common Issues and How to Handle Them

Disk group filling up: the most frequent ASM alert. Add disks (online rebalance) or reclaim space in the FRA. Do not let a mirrored group run so full it loses its rebalance headroom.
Rebalance taking too long or hurting performance: lower the REBALANCE POWER during business hours, raise it overnight. It is fully online and resumable.
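ORA-15041 (diskgroup space exhausted) / ORA-15040 (diskgroup is incomplete): the first means the group has run out of allocatable space, the second that a disk is missing when the group is mounted. Check V$ASM_DISK for a disk in an unexpected state and confirm the OS actually presents all the disks.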
Disk not discovered: ASM only sees disks matching its discovery string and with correct OS permissions. Verify the device permissions for the grid user and the asm_diskstring parameter.
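For the discovery problem in particular, two quick checks from the ASM instance narrow it down. This is a minimal sketch using the standard asm_diskstring parameter and V$ASM_DISK view.

```sql
-- Current discovery string (empty means the platform default path is searched)
SELECT value
FROM   v$parameter
WHERE  name = 'asm_diskstring';

-- Disks ASM can see that do not belong to any mounted disk group
SELECT path, header_status, mount_status, os_mb
FROM   v$asm_disk
WHERE  group_number = 0
ORDER  BY path;
```

A new device that shows up here with header_status CANDIDATE or PROVISIONED is ready to add. A device that does not appear at all points to the discovery string or to OS permissions, not to ASM itself.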

9. A Real Migration: Swapping the SAN With Zero Downtime

A pharmaceutical client had to move a production database off an ageing storage array onto a new one, and the validated system could not take an outage. On a file system this would have meant a maintenance window and a risky copy.

What we did: presented the new array's LUNs to the server, added them to the DATA and FRA disk groups, and let ASM rebalance the data onto the new disks with a modest REBALANCE POWER during the day and a higher one overnight. Once the new disks were fully populated, we dropped the old ones - ASM migrated the last extents off them online - and had them removed only after V$ASM_OPERATION showed the rebalance complete. The database never closed, application users never noticed, and the validation record showed a clean, controlled change. This pairs naturally with a regular health check and sensible performance tuning, since balanced storage removes a whole class of I/O bottlenecks.
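The sequence above can be sketched as follows. The device paths, disk names, and power values are hypothetical placeholders, not the ones used on that system, and the same steps would be repeated for the FRA group.

```sql
-- 1. Add the new array's LUNs; keep the daytime impact low
ALTER DISKGROUP data
  ADD DISK '/dev/mapper/new_lun01' NAME data_new_01,
           '/dev/mapper/new_lun02' NAME data_new_02
  REBALANCE POWER 2;

-- 2. Overnight, speed up the rebalance that is already running
ALTER DISKGROUP data REBALANCE POWER 8;

-- 3. Once v$asm_operation is empty, drop the old array's disks
ALTER DISKGROUP data
  DROP DISK data_0001, data_0002
  REBALANCE POWER 2;

-- 4. Unpresent the old LUNs only when this returns no rows
SELECT operation, state, power, est_minutes
FROM   v$asm_operation;
```

Adding and dropping in a single ALTER DISKGROUP statement is also possible and needs only one rebalance pass. Splitting it in two, as here, costs a second rebalance but lets you confirm the new disks are healthy before anything leaves the group.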

10. When a Disk Actually Fails: What Really Happens at 2 AM

Redundancy design stays theoretical until a disk dies under load. A few years back a monitoring alert woke me for a normal-redundancy DATA group on a manufacturing system: one disk had gone offline with I/O errors. The database never blinked - ASM was serving every read from the surviving mirror copies - but the clock was already running.

-- First thing I check when a storage alert fires
SELECT name, path, state, mode_status, repair_timer
FROM   v$asm_disk
WHERE  mode_status != 'ONLINE';

That repair_timer, which counts down in seconds, is the part many DBAs discover the hard way. The disk_repair_time attribute is how long ASM waits for an offline disk to come back. 3.6 hours is the long-standing default, but the value varies by release and can be changed per disk group, so check V$ASM_ATTRIBUTE for yours. Bring the disk back inside the window and fast mirror resync copies only the extents that changed - minutes of work. Miss the window and ASM force-drops the disk and starts a full rebalance of everything it held, which on a large group can grind for hours.
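To see what window you actually have, and to widen it before the next incident, something like this can be run from the ASM instance. It is a sketch: the 8h value is purely illustrative, and V$ASM_ATTRIBUTE only shows attributes for groups with compatible.asm of 11.1 or later.

```sql
-- Current repair window per disk group
SELECT g.name  AS diskgroup,
       a.value AS disk_repair_time
FROM   v$asm_attribute a
JOIN   v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  a.name = 'disk_repair_time';

-- Widen the window for future outages (8h is an illustrative value)
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8h';

-- Seconds left for disks that are already offline
SELECT name, mode_status, repair_timer
FROM   v$asm_disk
WHERE  mode_status = 'OFFLINE';
```

Changing the attribute does not reset the timer of a disk that is already offline. For that case, re-issuing the offline with an explicit deadline, for example ALTER DISKGROUP data OFFLINE DISK data_0002 DROP AFTER 8h, is the usual way to buy time.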

My replacement drill, in order:

Confirm from the OS and array which physical disk failed, and check repair_timer to see how much window is left.
If the failure was transient (a controller reset, a path flap), fix the path and run ALTER DISKGROUP data ONLINE DISK data_0002; - the resync is quick.
If the disk is genuinely dead, add the replacement LUN and drop the failed disk with DROP DISK ... FORCE in one statement, so a single rebalance covers both.
Watch v$asm_operation until it returns no rows, then re-check that required_mirror_free_mb headroom is restored before closing the ticket.
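In SQL, the drill looks roughly like this. The two ALTER DISKGROUP statements are alternatives, not a sequence, and the replacement device path, new disk name, and power value are hypothetical. The replacement goes into the same failure group as the disk it replaces (fg2 in the earlier CREATE DISKGROUP example).

```sql
-- Option A, transient failure: bring the disk back, fast mirror resync does the rest
ALTER DISKGROUP data ONLINE DISK data_0002;

-- Option B, dead disk: add the replacement and force-drop the failed disk
-- in one statement, so a single rebalance covers both
ALTER DISKGROUP data
  ADD FAILGROUP fg2 DISK '/dev/sde1' NAME data_0004
  DROP DISK data_0002 FORCE
  REBALANCE POWER 4;

-- Close the ticket only when no operation is running...
SELECT operation, state, est_minutes
FROM   v$asm_operation;

-- ...and the mirror headroom is back
SELECT name, required_mirror_free_mb, usable_file_mb
FROM   v$asm_diskgroup
WHERE  name = 'DATA';
```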

That last check matters: until the rebalance finishes, a second disk failure in another failure group can cost you the disk group. The incident is not over when the database looks fine - it is over when redundancy is fully re-established.

11. My Honest Redundancy Advice - and the asmcmd Habits That Pay Off

What do I actually choose in production? On a proper enterprise SAN with battery-backed RAID and dual controllers, I use external redundancy for DATA and FRA - mirroring twice (once in ASM, once in the array) wastes half the storage for little gain. On local NVMe, cheap iSCSI, or anywhere the array's protection is doubtful, I use normal redundancy and define failure groups deliberately. The Grid Infrastructure group holding OCR and voting files gets normal or high regardless, because losing quorum takes the whole cluster down.

Day to day, three asmcmd habits catch problems before they page me:

asmcmd lsdg                          # weekly: free space + required mirror free
asmcmd iostat -G DATA 5              # live per-disk I/O when chasing a slow spell
asmcmd md_backup /backup/dg_meta.bkp # monthly: disk group metadata backup

The md_backup one is insurance almost nobody takes. If a disk group is ever destroyed by an admin error, md_restore rebuilds its exact structure - names, attributes, directories - so RMAN can restore into it immediately instead of you reconstructing DDL from memory during the worst hour of your year.

Frequently Asked Questions

What is Oracle ASM and why use it?

ASM (Automatic Storage Management) is Oracle's own volume manager and file system for database files. It automatically stripes data across all disks in a disk group for balanced I/O, can mirror for redundancy, and rebalances online when storage changes. It removes manual file placement and is the recommended storage layout, and effectively required for RAC.

What is a disk group and what redundancy should I choose?

A disk group is a pool of disks ASM manages as one. External redundancy does no ASM mirroring and relies on the array's RAID; normal redundancy keeps two mirrored copies across failure groups; high redundancy keeps three. Choose external when you trust the SAN's protection, and normal or high when you want ASM itself to protect against disk or enclosure loss.

Can I add or remove storage without downtime?

Yes - this is one of ASM's biggest strengths. Adding a disk triggers an online rebalance that spreads data onto it, and dropping a disk migrates data off it first, all while the database stays open. REBALANCE POWER controls how fast the operation runs versus how much I/O it consumes.

What does REBALANCE POWER do?

It sets how aggressively ASM moves data during a rebalance. A higher power finishes sooner but uses more I/O and can affect performance; a lower power is gentler but slower. A common practice is a modest power during business hours and a higher one overnight. Rebalance is online and resumable.

What is Flex ASM in a RAC cluster?

Flex ASM lets database instances connect to ASM over the network rather than depending on a local ASM instance on the same node. If a node's ASM instance fails, its databases reconnect to an ASM instance on another node and keep running, making the storage layer far more resilient. It is the default modern RAC configuration.

About the Author

Nasir Uddin Khan
Senior IT Consultant · Oracle DBA · ERP & AI Specialist
OCP · Red Hat Certified · MBA · CSV · 18+ Years Experience

Nasir is an Oracle Certified Professional and CSV-certified IT consultant based in Dhaka, Bangladesh. He has 18+ years of hands-on experience in Oracle database administration (RAC, Data Guard, RMAN), WebLogic middleware, ERP system design, and AI integration for manufacturing, pharmaceutical, banking, and healthcare organisations worldwide.
