Even ACEs Make Mistakes : You never forget the first time you drop the production database!

At Oracle OpenWorld 2018 a few of the EMEA Oracle ACEs got together to do a lighthearted joint presentation, where we had four minutes each to speak about a mistake we had made and what we learned from it. Not surprisingly the session was called, “Even the ACEs make Mistakes – What did they learn?” Some of us said we would blog our stories, so here’s mine…

I had performed some maintenance on a production database, moving some objects into new tablespaces and purging a lot of old data. It meant that some tablespaces now contained a lot of free space, that would never be reused, so I decided to do a manual tablespace reorganisation to reclaim the unused disk space.

Having moved all the objects out of one tablespace I preceded to drop it, but forgot to add the “including contents and datafiles” clause. That meant I had to manually delete the datafiles from the file system. The only problem was I deleted the wrong ones! Here’s a graphical representation of what happened.

Luckily Oracle has a really good backup/recovery tool called Recovery Manager (RMAN), and I had backups of the database from before I messed up, so I restored and recovered the missing datafiles and everything was working again.

What did I learn?

  • Always have a backup! Luckily I did.
  • If you are going to drop a tablespace, make sure it is the correct one (I did) and make sure you add the “including contents and datafiles” clause (I didn’t), so Oracle will delete the datafiles for you. That way you won’t delete the wrong files by accident.
  • Don’t be me. I’m a dick! 🙂

Cheers

Tim…

Author: Tim...

DBA, Developer, Author, Trainer.

5 thoughts on “Even ACEs Make Mistakes : You never forget the first time you drop the production database!”

  1. Eric: This was back in the days before most of the move operations could be (easily) performed online, so it was in maintenance mode, but that doesn’t stop you panicking when something goes wrong. 🙂

  2. hahahah

    Not an ACE, but for now I only did one big “mistake”. It was like 5 years ago and still I don’t know what happened. Our best guess, we hit a bug.

    I was performing some similar operation, moving objects to a fresh new tablespace to claim some waste space in a production 11.2.0.1 database. I did it step by step manually, moving lobs, moving indexes and tables. Once everything was on the new tablespace, I triple checked the old tablespace was empty, so I first deleted the datafiles from the tablespace with “ALTER TABLESPACE … DROP DATAFILE …” to ensure there’s nothing on them (Oracle will stop you with a ORA-03262 if is not empty) until there’s only one, and then I drop the tablespace with “include contents and datafiles” clause. I made a check and everything was OK until this point. So, for guidelines, we decided to rename the new tablespace back to the old one… “ALTER TABLESPACE … RENAME TO … ” … and then BAM the shit hit the fan. Alert.log spitting ORA-600 warnings, some tables where missing, others got corrupted data, etcetc… Aftermath: 8 hours downtime while restoring and recovering from backup.

    Counterpart: Everybody learned a lesson. Since then, our bosses bought more disk space for Flashback (was disabled) and data (I was trying to claim the maximum possible space because our bosses told so. All the risk was taken for claiming barely 20GB in a 4TB storage), they was more open to buy the Partition license (they though we didn’t need it) and I learned to test EVERY OPERATION in advance, even if you did it thousands of times in different situations.

  3. I’ve often wondered why Oracle doesn’t allow for the “including datafiles” clause of drop tablespace without stating “contents”, so “drop tablespace including datafiles”, allowing the tablespace and datafiles to be dropped only if the tablespace is empty. Given that drop tablespace fails with ORA-01549 if segments are present, it seems sensible for the same safeguard to be present when you want to include the datafiles as well.

    I mean checking is everything but as per your article, in the heat of a change window sometimes the unexpected can happen.

    A mistake I made that springs to mind came while preparing for a change which involved shutting the database down. My workplace at the time required us to include all commands in a change document which could then be reviewed by a peer ahead of implementation. In preparing the document in MS Word I hadn’t realised that I’d copied the majority of the script onto my clipboard buffer and whilst checking something on the database server I unintentionally pasted into the command line.

    Result, that horrible feeling in the stomach as the database shutdown during peak hours. What I learned was to always include a ‘#’ or the ‘SQL>’ prompt at the beginning of any documentation of commands which would cause them to either be ignored by Linux or fail in SQL*Plus. Also I always try and clear my clipboard after copying commands into it.

Comments are closed.

Exit mobile version