In September 2023, Nyla Technology Solutions sent three members of the Data Science Team to the annual AnitaB.org Grace Hopper Celebration (GHC) in Orlando, Florida. GHC is the largest gathering of women and non-binary technologists in the world.
The conference was a chance for members of industry, academia, and government to convene and discuss the latest developments in technology. It covered a wide range of topics, including Artificial Intelligence, with notable presentations on Large Language Models. Attendees participated in engaging discussions centered on the challenges that women and non-binary technologists face today, from pay gaps to visibility in healthcare. Networking opportunities, like Brain Dates, fueled collaboration, honest conversations, and the exchange of ideas, fostering an environment for growth and innovation. Finally, volunteering opportunities gave GHC attendees greater access to experts, speakers, and fellow attendees to connect on a more personal level.
Below, our Nyla attendees reflect on several different elements of the conference: Sessions, Brain Dates, and Being a Volunteer.
GHC made it easy to view and pick sessions using its Agenda Builder. Sessions were organized by tracks, from General Session to Artificial Intelligence to Career Development and more. There was a track for everyone. In this section, Data Scientist Jaysha Camacho-Irizarry summarizes sessions from three tracks: Artificial Intelligence, Data Science, and Software Engineering.
Applications and Tools
In software development, we use CI/CD pipelines to automate repetitive tasks in the software development lifecycle. For example, we automate tasks such as code integration, testing, building, and deployment. CI/CD pipelines greatly enhance efficiency, reduce human error, and accelerate the development process. In the ML and AI space, we use MLOps to manage the end-to-end lifecycle of ML and AI models, from development and training to deployment and monitoring.
This section is about tools and resources we can leverage to develop better software and machine learning models by automating or speeding up repetitive tasks. For example, we can use tools like Prefect and Legend to implement MLOps. Tools like Review Copilot can help us create labeled datasets, a crucial component of supervised machine learning. Finally, we can use techniques like concurrency to speed up tasks amenable to parallelism.
- Prefect – Prefect is an open-source library and “workflow orchestration tool that lets developers build, observe, and react to data pipelines.” In Friend Up Your Cash App Game: Build a Machine Learning Pipeline, the speakers explained how to integrate Prefect with Google Cloud Platform to recommend friends to Cash App members.
- Legend – Legend is an open-source platform by Goldman Sachs that enables the creation and “usage of data models in numerous data-driven business flows, and a robust Change Management (SDLC) process, which ensures the safe collaboration on and stability.” In Lead with Data: Using Goldman Sachs’ Open-Sourced Legend Platform to Unlock Sustainability Insights, the speakers showed how to leverage Legend to perform data exploration, define relationships between variables to create a shareable data model, and execute queries to generate insights.
- Review Copilot – Review Copilot is an AI-assisted content moderation tool. It highlights contextual information to help moderators make decisions about content. In AI-Assisted Human Content Moderation – Balancing Efficiency and Fairness During Decision Making, the speaker provided examples of how Review Copilot shifts feature extraction and contextualization from humans to systems to improve the accuracy and speed of data labeling.
- Concurrency – In Relay Race: Concurrent Software Development in Python, the speaker shared how to determine when to use asynchronous programming over synchronous programming, and which solution to use (multiprocessing, multithreading, or asynchronous I/O) if concurrency and parallelism are necessary.
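As a rough illustration of the concurrency choices mentioned above, the sketch below runs the same I/O-style task sequentially, with threads, and with asynchronous I/O. It is not taken from the talk: the task, the `DELAY` value, and all function names are hypothetical, and `time.sleep` simply stands in for a network or disk wait. CPU-bound work is not shown; in Python that is usually where multiprocessing is the better fit.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # seconds; hypothetical stand-in for a network or disk wait


def fetch_sync(item):
    """Blocking I/O-style task: wait, then return a result."""
    time.sleep(DELAY)
    return item * 2


async def fetch_async(item):
    """Non-blocking version of the same task for asyncio."""
    await asyncio.sleep(DELAY)
    return item * 2


def run_sequential(items):
    # One task at a time: total time is roughly len(items) * DELAY.
    return [fetch_sync(i) for i in items]


def run_threads(items):
    # Threads overlap the waiting time of blocking calls.
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        return list(pool.map(fetch_sync, items))


async def _gather(items):
    return await asyncio.gather(*(fetch_async(i) for i in items))


def run_asyncio(items):
    # A single thread interleaves many coroutines while each one awaits.
    return asyncio.run(_gather(items))


def timed(fn, items):
    """Return (result, elapsed_seconds) for one strategy."""
    start = time.perf_counter()
    result = fn(items)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = [1, 2, 3, 4]
    for fn in (run_sequential, run_threads, run_asyncio):
        result, elapsed = timed(fn, items)
        print(f"{fn.__name__}: {result} in {elapsed:.2f}s")
```

All three strategies return the same results; only the elapsed time differs, which is the trade-off to measure before choosing one.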
Who hasn’t heard of ChatGPT? In recent years, Large Language Models (LLMs) have taken the world by storm, revolutionizing natural language processing (NLP) through advanced deep learning architectures, leading to breakthroughs in text generation, translation, and conversational AI.
Beyond NLP, researchers have also made great advancements in the field of reinforcement learning (RL). In reinforcement learning, agents learn by interacting with their environment and receiving feedback in the form of rewards and penalties. AWS is on a mission to make reinforcement learning more accessible with AWS DeepRacer, a fully autonomous 1/18th scale race car driven by AI.
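To make the reward-and-penalty loop concrete, here is a minimal sketch of tabular Q-learning on a toy five-position straight track. It is an illustration only and is unrelated to how AWS DeepRacer is implemented; the track, the reward values, and the hyperparameters are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5            # positions 0..4 on a straight track; 4 is the finish line
ACTIONS = ("left", "right")
STEP_PENALTY = -0.01    # assumed small penalty for every move that does not finish
GOAL_REWARD = 1.0       # assumed reward for reaching the finish line


def step(state, action):
    """Apply an action index and return (next_state, reward, done)."""
    move = 1 if ACTIONS[action] == "right" else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (GOAL_REWARD if done else STEP_PENALTY), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                # Exploit the current estimates, breaking ties at random.
                action = max(range(len(ACTIONS)),
                             key=lambda a: (q[state][a], rng.random()))
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[next_state])
            # Move the estimate toward reward + discounted best future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action name for each non-terminal position."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[s][a])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent is never told that moving right is correct; it discovers this from the feedback it receives after each action.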
This section gives a high-level overview of the LLM and RL presentations featured at GHC.
- Large Language Models
  - In From Words to Understanding: A Beginner’s Guide for Demystifying ML Models and NLP, the speakers defined terms like artificial intelligence, machine learning, and deep learning. They described the different types of large language models (LLMs). For example, LLMs based on transformer decoders, like Generative Pre-trained Transformer (GPT), are generative models, while LLMs based on transformer encoders, like Bidirectional Encoder Representations from Transformers (BERT), are discriminative models.
  - In Is Your Large Language Model Biased? Here’s How to Fix It!, the speakers evaluated an LLM for gender bias using three metrics: toxicity, regard, and honesty. Then, they explained how to use Counterfactual Data Augmentation (CDA) and Low-Rank Adaptation (LoRA) for preprocessing and fine-tuning, respectively.
  - In Natural Language Understanding: Create Your Own Language Model, the speakers showed how to preprocess text for fine-tuning a pre-trained GPT model using PyTorch.
- Reinforcement Learning – In AWS DeepRacer: Get Hands-On with Machine Learning, solutions architects from AWS provided an overview of reinforcement learning and provisioned AWS resources so that attendees could rapidly train an agent to race around a track using AWS DeepRacer.
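The Counterfactual Data Augmentation step mentioned in the bias talk can be sketched in its simplest form as word-pair swapping: each training sentence gets a counterpart with gendered terms exchanged, so the model sees both versions. This is a generic illustration under stated assumptions, not the speakers' implementation; the `SWAPS` list is hypothetical and deliberately tiny, and it skips ambiguous words such as "her", which real word lists have to handle.

```python
import re

# Hypothetical, deliberately tiny swap list; real CDA word lists are much larger.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Match whole words only, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAPS) + r")\b",
    re.IGNORECASE,
)


def _swap(match):
    word = match.group(0)
    replacement = SWAPS[word.lower()]
    # Preserve a leading capital letter, e.g. at the start of a sentence.
    return replacement.capitalize() if word[0].isupper() else replacement


def counterfactual(sentence):
    """Return the sentence with each listed gendered word swapped."""
    return _PATTERN.sub(_swap, sentence)


def augment(corpus):
    """Keep every original sentence and add its counterfactual when it differs."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["She is an engineer.", "The code compiled."]):
        print(line)
```

Sentences without any listed word pass through unchanged, so the augmented corpus only grows where a counterfactual actually exists.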
Research and Experimentation
Machine learning is rapidly evolving. So, as practitioners, it is important to stay abreast of the latest research, methodologies, and best practices in our fields. For example, we can use A/B testing to make data-driven decisions by objectively comparing the impact of changes on a target variable. In cybersecurity, we can leverage the MITRE ATT&CK framework to test our cyber defenses and simulate a specific threat or adversary. Finally, in education, we can mine student data to improve learning outcomes.
This section highlights some of the frameworks and methodologies featured at GHC that cut across various disciplines such as marketing, cybersecurity, and education.
- A/B Testing – In Racing for Innovation: How Experimentation Speeds ESPN Data Science Teams to Success, the speakers shared how they use A/B testing, power analysis, key performance indicators (KPIs), and hypothesis testing to evaluate new product changes.
- MITRE – The MITRE ATT&CK framework is a cybersecurity “knowledge base of adversary tactics and techniques based on real-world observations.” In Effective Cyber Attack Detection for a Safer Digital Landscape, the speaker walked through how to mine data sets labeled by tactics and techniques to build anomaly detection models.
- Educational Data Mining – Tiffany Barnes, a distinguished professor of Computer Science at N.C. State University, gave an overview of her award-winning research, The Q-Matrix Method: Mining Student Response Data for Knowledge. The Q-matrix method can be applied to large sets of student data to diagnose student misconceptions and guide student knowledge remediation.
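For the A/B testing item above, the sketch below shows one common way to pair power analysis with hypothesis testing when the metric is a conversion rate: estimate the sample size needed per group, then compare the two groups with a two-proportion z-test. It is a textbook-style illustration, not the ESPN team's method; the function names, the default `alpha` and `power`, and the example numbers are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_base -> p_target."""
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = _NORMAL.inv_cdf(power)           # desired statistical power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - _NORMAL.cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 10% baseline conversion, hoping to reach 13%.
    print("users per group:", sample_size_per_group(0.10, 0.13))
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Running the sample size calculation before the experiment helps avoid stopping a test too early and reading noise as a real effect.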
GHC offered opportunities to meet one on one or in small groups with other attendees to discuss any topic. These Brain Dates were new this year: any attendee could post a Brain Date topic and meeting time, and other attendees could sign up to join. Some topics focused on a leader wanting to share their experience and expertise with a small group. Others focused on networking with professionals from the same city. Still others focused on specific work environment issues, where the creator of the Brain Date wanted to learn how others had handled them. The topics varied widely, and some Brain Dates filled quickly. Lauren Clerkin, Data Scientist, attended two Brain Dates and summarizes the experiences and topics below:
What are some of the challenges in the tech workplace at mid-career level?
Janaki Ganapathi, a Software Engineer at Capital One, led the conversation by asking everyone (four women/non-binary attendees, including myself) where they worked and what they found challenging. It was interesting to hear how similar our challenges were even though our fields of expertise were different. None of us initially brought up technical challenges; the challenges that first came to mind were related to our work environments. Janaki, for example, had recently moved into the Software Engineering role and faced challenges managing other software engineers while she was still new to the role herself. We talked about how she was hired into that position based on the abilities she has now, and how she can learn in her new role while supporting her team. Two people in our group were early in their careers and looking for advice on how to move around: when is a good time to change, and whether to move within or outside their company. We all took turns sharing our experiences and advice. One important piece of advice I took away was not to wait until you feel stuck in a job to start thinking about why you like the job. We shared stories of loving a job because of the people we worked with, and how the job was not the same once those people moved on to other opportunities. We talked about what we can do to maintain a breadth of skills and remain marketable while not actively looking for a job. I stressed how important it is to know that your job’s leadership supports training outside your specific role, and that it matters to them that you do not get stuck in one job. It was a really great discussion. I went in thinking we would talk about the challenges we face as women and non-binary technologists, but based on our conversation, the challenge that came to mind first for us was how to avoid getting stuck.
Data Science Networking (Mid-Senior Level)
Ekta Kapoor, a Data Scientist at BENlabs, led the conversation by asking everyone (four women/non-binary attendees, including myself) what our specific daily tasks were under the title of Data Scientist. We each had varying skills with different data formats and databases. Some of us spend the majority of our time on ETL (Extract, Transform, Load), and some of us spend more time running statistical models. Ekta worked primarily with video datasets, developing metrics on interactions with videos; her daily tasks focused on influencers’ YouTube videos. It was really interesting to hear about her datasets because they showed how widely applicable ETL skills are and how data in any format can be turned into numbers. Some of the conversation also touched on the importance of gaining skills outside of Python and R. I shared how my job required me to learn Docker and OpenStack to create virtualized environments for myself and colleagues, and how these non-data-science skills still directly helped grow my abilities and made me more marketable when I was ready to change jobs.
Both Brain Dates I attended were great for networking and creating opportunities to connect with people outside the industry I work in. They gave all of us different perspectives on the same type of jobs. It was interesting to listen and kind of motivating to experience enthusiasm from strangers wanting to hear my stories. I recommend future GHC attendees try a Brain Date or two.
Volunteering at GHC
In late June/early July, GHC provided the opportunity to apply to volunteer at the conference. The application process was relatively quick and easy, with some basic questions about the applicant’s role and experience level, demographic questions, and a brief narrative about why the applicant is interested in volunteering. Stephanie Beben, Nyla’s Chief Data Scientist, was selected as a volunteer, also known as a ‘Hopper.’ She provides an overview of her experience below.
Volunteers were notified of selection just before conference registration opened and were given the opportunity to register with conference fees covered. A small amount of preparation information was shared with volunteers before the conference, most importantly the volunteer schedule. With appropriate notice and a valid reason, such as a travel schedule conflict, assignments could be switched, but volunteers were asked to attend for the full duration of the conference.
Volunteers primarily served as Greeters, Session Chairs, or general helpers for networking sessions. Things were a bit disorganized the first day since the lead for the volunteers was also new to her role, but everyone I encountered as a volunteer was friendly and willing to pitch in. As a session chair, I really enjoyed meeting the other volunteers (typically two of us per session) and also sitting in on talks that I never would have selected for myself. For example, I supported two hands-on workshops related to Hardware Engineering, something I had no prior experience with. I would strongly recommend applying for a volunteer position given the savings on conference fees and the opportunity to give back to this event and community. I also felt that volunteering left ample time for me to enjoy the conference as an attendee.
GHC 2023 featured a variety of opportunities to gain different perspectives. Like many conferences, it offered opportunities to learn new technologies and to network to advance your career. What is unique to GHC is the range of activities that encourage discussion after the main talks and the opportunities to hear from attendees with different career levels, technology experiences, and cultural backgrounds. The team recommends the conference to all technologists who want to broaden their perspectives on advances in technology and how it is used.
Stephanie Beben, Lauren Clerkin & Jaysha Camacho-Irizarry
“Convince Your Boss.” Grace Hopper Celebration, 8 Aug. 2023, ghc.anitab.org/convince-your-boss/.
“Agenda Builder.” Grace Hopper Celebration 2023, gracehoppercelebration.com/embed/?_gl=1%2A1tdhoj2%2A_ga%2AMTkxMTQ1NDUwMS4xNjk1MDc3MzA4%2A_ga_370K3Z436K%2AMTY5NjgwNDgwOC4zLjEuMTY5NjgwNDgyNi4wLjAuMA.. Accessed 17 Jan. 2024.
Prefect Open Source, www.prefect.io/opensource. Accessed 17 Jan. 2024.
Finos. “Legend Project Overview.” Legend Project Overview, www.finos.org/legend. Accessed 17 Jan. 2024.
Racing Simulator Software – AWS DeepRacer – AWS, aws.amazon.com/deepracer/. Accessed 17 Jan. 2024.
The Q-Matrix Method: Mining Student Response Data for Knowledge, www.semanticscholar.org/paper/The-Q-matrix-Method%3A-Mining-Student-Response-Data-Barnes/38a97cb33582f14b09c856c4d0d69f8a19f0f6fa. Accessed 17 Jan. 2024.