GroupWise (GWP2) service interruption – Sept. 12, 2012 2:00 AM – 6:45 AM

A nightly backup job that normally runs between 2 AM and 6 AM on GWP2 failed to complete and prevented the post office agent from starting. The server was power cycled and services were restored around 6:45 AM. The affected users included various administrative staff groups.

We apologize for the service disruption.

– Computing and Communications Services

Posted in GroupWise

Scheduled Wireless Service Disruption – Saturday, September 8, 2012 between 1 AM and 1:05 AM

At approximately 1 AM on Saturday, September 8, 2012, wireless service across campus will be disrupted for about 5 minutes.

CCS will be adding a wireless network in the new MAC building.

We apologize in advance for any inconvenience this will cause.

– Computing and Communications Services

Posted in All Categories, wireless

Wireless service disruption: Friday, September 7, 2012 7:20 AM – 7:45 AM

The wireless network was unavailable in the Library, Podium and Jorgenson Hall buildings this morning. The issue was first reported at 7:20 AM, and Network Services restored service by 7:45 AM. The outage was caused by an unresponsive system process.

We apologize for the service disruption.

– Computing and Communications Services

Posted in All Categories, wireless

Academic wireless outage: Tuesday, Sept. 4, 2012 7:30 AM – 9:10 AM

A problem with the wireless infrastructure caused users to receive authentication failures this morning, starting at approximately 7:30 AM. Services were restored at 9:10 AM.

We apologize for the service disruption.

-Computing and Communications Services

Posted in All Categories, wireless

Scheduled wireless security certificate update, Friday, August 31, 2012 between 6 am and 8 am

Wireless security certificates will be updated on August 31, 2012 between 6 am and 8 am. Existing users of Mac and iOS devices who have trouble connecting to the RU-Secure wireless network after the update should follow the instructions available at http://www.ryerson.ca/ccs/connecting/oncampus/wireless/how/newCertificate.html.
– Computing and Communications Services

Posted in All Categories, wireless

Rmail, my.ryerson outages – Mon. August 27, 2012

Early this morning a device on our storage area network (SAN) failed. As a result, disk storage for Rmail, my.ryerson, Blackboard, and other systems went offline. (Other services such as RAMSS and eHR are not reachable when my.ryerson is down.) We are investigating the cause and working to restore service. Updates to follow.

Update 8:20 AM: A storage appliance that manages the storage connections to the Rmail, my.ryerson and other servers failed. A second storage appliance is available as a backup, but for some reason the servers were not able to tolerate the loss of storage connectivity in the interval before the second device took over.

Update 9:36 AM: The problem seems to be related to a failed controller on one of the storage appliances that provide storage virtualization. The vendors are working on the problem and replacement parts are being sent to us.

In the meantime, we’re also working on bringing up services on the backup appliance, with the idea that we will be able to “fail over” to the other appliance (in a controlled way) when it is working properly.

Update 12:37 PM: With the help of the vendor, whose staff are on site, we are in the process of restoring/rebuilding the failed storage appliance. We estimate that the earliest we can be back on line is some time after 2:40 PM. It may take significantly longer. In parallel with the work to restore the failed device we are also preparing another appliance which we can reconfigure to replace the failed unit.

Update 2:21 PM: We have run into other problems rebuilding the storage appliance that failed and now think we will have a replacement appliance ready before we can repair the unit that failed. This pushes back the earliest possible time to fix the problem until after 5 PM. Once again, please accept my apologies for the prolonged outage.

Update 5:40 PM: The new storage appliance is in place but it will be many hours before all services are restored. The appliance must go through a diagnostic process, test its connections to the storage devices, and be reconfigured to provide storage for each server. Initial estimates on how long that would take are proving to be optimistic. Once everything is working again we must start the process of validating that all files and databases have been recovered correctly. From what we have seen in the past this will also take hours for some services like Rmail. Based on what we’ve seen over the last hour we can’t provide a good estimate of how long it will take to get on line this evening. CCS staff will work as long as necessary to recover all services.

Update 9:30 PM: At roughly 7:30 PM we experienced a setback with restoring the configuration of the appliance. The problem means we do not have network access to the device and we are working with the vendor to continue recovering the device’s configuration. We have lost at least two hours because of this.

I know many people would like to know when the system will be restored. We are still working towards it being back online in the late evening or early hours in the morning, but at this time, we cannot predict when service will be restored.
-Brian

Update 10:54 PM: With the help of the vendor’s third-level support staff we have finally completed the automatic recovery process and are beginning the final configuration steps of the storage appliance. Network connectivity has also been established.

Update 11:17 PM: Both storage appliances are now restored and working correctly. Staff are now checking the databases and file systems that were impacted by the outage.

Update 12:40 AM Tuesday, Aug. 28: A few systems have now been recovered and their services are available. CAS and uPortal services were restored around 12:20 AM, and Blackboard around 12:25 AM. Rmail was restored around 12:45 AM. The GroupWise web interface had to be restarted and was unavailable between approximately 11:00 PM and 11:40 PM. The ColdFusion services were restored around 12:35 AM. We are continuing the recovery of other systems.

Update 1:30 AM Tuesday, Aug. 28: Alfresco services were restored at approximately 1:10 AM. We are continuing to work on the recovery of the RAMSS and HR systems.

Update 2:20 AM Tuesday, Aug. 28: The RAMSS services were restored at approximately 2:00 AM. We are continuing to work on the recovery of the HR system and expect it to be available shortly.

Update 2:45 AM Tuesday, Aug. 28: The HR system is available as of 2:45 AM. All services have been restored.

Update 11:25 AM Tuesday, Aug. 28: During the outage, the front-end Rmail servers that receive email continued to function while the back-end systems that store mail for each email account were offline. Before service was restored, the front-end servers queued over 100,000 messages for later delivery. This morning the queues cleared and all pending mail was delivered. We do not believe any mail was lost during the outage.
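For readers curious how mail can survive an outage like this, the queue-then-deliver behaviour described above can be sketched roughly as follows. This is only an illustration, not the Rmail implementation: the `FrontEnd` class, its attributes, and the sample recipients are all hypothetical.

```python
from collections import deque


class FrontEnd:
    """Hypothetical front-end relay: accepts mail and holds it while the back end is offline."""

    def __init__(self):
        self.queue = deque()        # messages waiting for delivery
        self.backend_online = True  # assumed flag standing in for a real health check
        self.mailboxes = {}         # stand-in for the back-end mail store

    def receive(self, recipient, message):
        # Always accept the message, then try to deliver whatever is pending.
        self.queue.append((recipient, message))
        self.flush()

    def flush(self):
        # Deliver queued mail in arrival order; do nothing while the back end is down.
        if not self.backend_online:
            return 0
        delivered = 0
        while self.queue:
            recipient, message = self.queue.popleft()
            self.mailboxes.setdefault(recipient, []).append(message)
            delivered += 1
        return delivered


relay = FrontEnd()
relay.backend_online = False             # back-end storage goes offline
relay.receive("alice", "msg 1")
relay.receive("bob", "msg 2")
queued_during_outage = len(relay.queue)  # mail is held, not rejected

relay.backend_online = True              # storage restored
delivered_after_restore = relay.flush()  # the queue drains and mail is delivered
```

The point of the design is that accepting mail and storing mail are separate steps, so a failure in the second does not cause the first to drop messages.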

We apologize for the service disruption.

-Computing and Communications Services

Posted in Blackboard, eHR, my.ryerson, RMail

Intermittent my.ryerson.ca login problem – Friday, Aug. 24, 2012

One of the Central Authentication Service (CAS) servers that allow you to log in to my.ryerson.ca and other systems experienced a problem this morning. As a result, some users received an authentication failure when attempting to log in to my.ryerson.ca or other systems that rely on CAS.

The affected server has been taken out of the load-balanced group until the problem is investigated and resolved. Services are working normally as of 8:55 AM.
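As a rough illustration of what taking a server out of a load-balanced group means, the sketch below rotates requests across a pool and skips any server marked as disabled. It is not the actual load balancer configuration: the `ServerPool` class and the server names `cas1`, `cas2` and `cas3` are hypothetical.

```python
class ServerPool:
    """Hypothetical round-robin pool that skips servers taken out of rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.disabled = set()  # servers removed from rotation
        self._next = 0         # index of the next server to try

    def disable(self, server):
        # Take a server out of rotation without removing it from the pool.
        self.disabled.add(server)

    def enable(self, server):
        # Return a repaired server to rotation.
        self.disabled.discard(server)

    def pick(self):
        # Try each server at most once, starting where the last pick left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server not in self.disabled:
                return server
        raise RuntimeError("no servers in rotation")


pool = ServerPool(["cas1", "cas2", "cas3"])
pool.disable("cas2")                     # the faulty server stops receiving logins
picks = [pool.pick() for _ in range(4)]  # requests alternate between the healthy servers
```

Because the faulty server is only flagged rather than deleted, it can be returned to rotation with `enable` once the problem is resolved.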

Please accept our apologies for the problems the service disruption caused.

-Computing and Communications Services

Posted in Blackboard, eHR, eResources, my.ryerson, RAMSS

GroupWise web service interruption & mail delivery delay: Thursday, Aug. 23, 2012 7:45 PM – 8:17 PM

A GroupWise server hung at approximately 7:45 PM on Thursday, Aug. 23, causing a short delay in mail delivery and making web client access to GroupWise unavailable. The affected server (GW1) was rebooted and full services were restored around 8:17 PM.

We apologize for the service disruption.

-Computing and Communications Services

Posted in GroupWise

GroupWise (GWPOS2) service interruption – Aug. 21, 2012 5:00 AM – 7:05 AM

The web interface for the GroupWise server GWPOS2 was not available this morning from 5 AM to 7:05 AM. A forced reboot was carried out and full services were restored at approximately 7:05 AM.

The outage affected e-mail services for CCS staff.

We apologize for the service disruption.

– Computing and Communications Services

Posted in All Categories

Interruption to Wireless Access – 12:30 PM, August 20

Wireless access was interrupted for all locations except TRSM.

One of the wireless controllers had to be rebooted to clear the issue. Service was restored at 1:20 PM on August 20.

We apologize for the interruption.

Computing and Communications Services

Posted in All Categories