CAS, RAMSS, eHR outage: Saturday, Feb. 16 2013 5:04 AM to 8:31 AM

A network device failed at 5:04 AM this morning, making Ryerson’s Central Authentication System (CAS) unavailable, along with RAMSS and eHR. Service was restored by 8:31 AM.

Please accept our apologies for the outage.

For those interested in a more detailed technical explanation of this outage, here is a little more information:

The Virtual Router Redundancy Protocol (VRRP) instance on the Alteon load balancer that serves the SSL accelerators’ subnet stopped working this morning at 5:04 AM. As a result, the SSL accelerators could not communicate with hosts outside their subnet, so the applications that are SSL-offloaded by this load balancer (eHR, RAMSS and CAS) were not working. All of the health checks on the load balancer reported good status, and there were no log entries pointing to the problem. This time the SSL accelerators were not at fault; the problem was the Alteon load balancer itself.

The VRRP instance was disabled to get the load balancer working again. This is the first time we have encountered this type of problem.
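The tricky part of this failure mode is that local health checks all pass while off-subnet traffic silently fails. A minimal sketch (purely illustrative; this is not the monitoring CCS actually runs) of how probing from two vantage points separates the two failure modes:

```python
def diagnose(backend_reachable: bool, off_subnet_reachable: bool) -> str:
    """Classify a two-probe health check.

    backend_reachable:    a TCP probe to the real server on its own subnet
                          (the kind of check the load balancer passed).
    off_subnet_reachable: the same probe routed through the VRRP virtual
                          gateway to a host outside the subnet.
    """
    if backend_reachable and off_subnet_reachable:
        return "healthy"
    if backend_reachable and not off_subnet_reachable:
        # The Feb. 16 failure mode: local checks pass, but the
        # gateway (VRRP instance) is not forwarding traffic.
        return "gateway failure: check VRRP on the load balancer"
    return "backend failure: check the server itself"

print(diagnose(True, False))
```

A check like this, run from inside the accelerators’ subnet, would have flagged the gateway rather than the (healthy) SSL accelerators.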

-Computing and Communications Services

my.ryerson service disruption for users with applicant role: Monday Feb. 11, 2013

CCS became aware on Monday morning that users who have only an applicant role at Ryerson had been unable to log in to the my.ryerson portal since Friday, Feb. 8, when the 2-factor authentication service was released.

Upon investigation, a logic error was discovered in the web service that CAS calls for 2-factor authentication.  The problem was resolved on Monday around noon.
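The post doesn’t describe the logic error itself, but a common shape for this kind of bug is a role check that treats an unrecognized (applicant-only) role set as an error instead of simply skipping 2-factor authentication. A purely illustrative sketch — the role names and functions here are assumptions, not Ryerson’s actual code:

```python
KNOWN_2FA_ROLES = {"employee", "student"}  # assumed role names, for illustration

def requires_2fa_buggy(roles):
    # The bug: a role set with no 2FA-eligible role (e.g. applicant-only)
    # falls through to an exception, which the caller treats as a failed
    # login instead of "no 2FA required".
    for role in roles:
        if role in KNOWN_2FA_ROLES:
            return True
    raise ValueError(f"unrecognized role set: {roles!r}")

def requires_2fa_fixed(roles):
    # The fix: unknown or applicant-only role sets simply skip 2FA.
    return bool(KNOWN_2FA_ROLES & set(roles))
```

With the fixed version, an applicant-only user logs in without 2FA instead of being rejected outright.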

We apologize for the inconvenience.

-Computing and Communications Services

RMS Administration, Activation and Self Service outage: Wednesday, Feb. 6 2013 11:10 AM – 11:15 AM

A back-end RMS application failed at approximately 11:10 AM this morning. It was restarted at 11:15 AM. During the roughly five-minute outage, the following services were unavailable: RMS Administration, Activation and Self Service.

The cause of the problem is still being investigated.

We apologize for the inconvenience.

-Computing and Communications Services

VAPPS service disruption: Monday, Feb. 4 2013 10:18 AM – 10:47 AM

The virtual applications (VAPPS) service was not available between 10:18 AM – 10:47 AM.   Technical staff are investigating the root cause of the problem and working with the vendor to resolve the issue.

We apologize for the inconvenience.

– Computing and Communications Services

VAPPS service disruption: Saturday, Feb. 2 2013 8:38 AM

The virtual applications (VAPPS) service has not been available since approximately 8:38 AM.   The problem seems to be related to a network load balancer.   Network staff are investigating the root cause of the problem and working with the vendor to resolve the issue.

Update 2:10 PM: Service was restored at approximately 1:10 PM.

We apologize for the inconvenience.

– Computing and Communications Services

VAPPS service disruption: Thursday, Jan. 31 2013 02:45 AM – 09:50 AM

Virtual applications (VAPPS) experienced intermittent service disruptions this morning, starting at approximately 2:45 AM and ending around 9:50 AM.   The problem seems to be related to a network load balancer issue.   Technical staff are investigating the root cause of the problem.

We apologize for the inconvenience.

– Computing and Communications Services

Confluence and Jira outage: Tuesday, Jan. 22 2013 11:15 AM – noon

We will be doing emergency maintenance on Confluence and Jira from 11:15 AM to noon today. During this period, these services will not be available.

We apologize for the inconvenience.

– Computing and Communications Services

FAX / XMedius service outage: Monday, Jan. 21 2013 9:00 AM

The XMedius FAX service has been unavailable since 9:00 AM due to performance issues on a new RAID storage system. We are planning to migrate the server’s storage to another RAID system overnight to correct the problem.

We apologize for any inconvenience this problem may cause.

– Computing and Communications Services

Alfresco service outage: Sunday, Jan. 20 2013 8:53 PM – 9:35 PM

While staff were troubleshooting a storage performance problem, the Alfresco database became unavailable at approximately 8:53 PM. Service was restored around 9:35 PM. The outage affected the Admissions and the Campus Facilities and Sustainability document management systems.

Update: A problem with the storage provided to Alfresco caused database crashes around 1:19 AM. Staff are looking into the problem.

We apologize for any inconvenience this problem may cause.

– Computing and Communications Services

VMware performance degradation – Sunday, Jan. 20 2013

VMware has been experiencing performance degradation due to the migration of storage from one RAID array to another. Technical staff are working with the vendor to determine the cause of the problem. Some virtual servers have been migrated to another RAID array. Updates will be provided periodically until the problem is resolved.

Update 1:  We continue to experience performance degradation.  A few virtual machines have been migrated to a different storage RAID system, and we will continue this process overnight.

Update 2 (Monday, Jan 21, 11:30pm): We are still experiencing performance issues, but a number of virtual servers have been migrated to a different RAID storage system where performance is good. The migration process is slow because reads off the current storage are slow. We’re scheduling a number of batch migrations overnight.
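As a back-of-envelope illustration of why slow reads pace the whole migration (all figures below are assumed for illustration; the incident reports give no throughput numbers):

```python
def migration_hours(vm_size_gb: float, read_mb_per_s: float) -> float:
    """Rough wall-clock time to copy one VM off the slow array,
    assuming the read side of the source array is the bottleneck."""
    seconds = (vm_size_gb * 1024) / read_mb_per_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600

# e.g. a 200 GB VM at an assumed 30 MB/s effective read rate
print(round(migration_hours(200, 30), 1))  # ~1.9 hours
```

At rates like these, moving dozens of VMs necessarily stretches across several overnight batches.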

Update 3 (Tuesday, Jan 22, 11:20pm):  Some performance issues are still present, but we have migrated many virtual servers (VMs) from the affected RAID system to a different RAID system where the performance is good.   The migration process is continuing overnight.  The vendors continue to investigate and are narrowing down the source of the problem.  We expect the impact to be minor on Wednesday morning.

Final update (Friday, Jan 25, 8:45 am): By Tuesday evening, overall performance had improved. We continued the migration of virtual machines overnight, and services behaved normally throughout the day. The migration of the virtual machines completed last night. A failing IO module on the disk array was replaced on Wednesday, but vendor support is continuing its investigation. However, the VMware infrastructure will stay on the new storage array system and will not be migrated back.

We apologize for any inconvenience this problem caused.

– Computing and Communications Services
