Data Pipeline Engineer Job at Flow Labs, San Francisco, CA

  • Flow Labs
  • San Francisco, CA

Job Description

Flow Labs is Leading the AI Revolution in Transportation

At Flow Labs, we're harnessing AI and big data to transform how the world moves. Our platform captures massive quantities of real-time transportation data, feeding powerful blended data sets and AI models that optimize and automate how traffic systems operate. We're not just theorizing about the future; we're building it now. The Flow Platform is already deployed in dozens of cities, optimizing thousands of traffic signals and managing hundreds of thousands of miles of roadway. And we're scaling fast.

Our AI-driven platform enables transportation agencies to proactively monitor, analyze, and optimize their roadways, making real-time decisions that reduce congestion, lower emissions, and save lives. The results? Travel times cut by 24%, emissions reduced by 21%, and a path to a more efficient, sustainable, and intelligent transportation network.

This is a once-in-a-generation opportunity to redefine an industry. If you're a software engineer who wants to build at scale, tackle complex AI and big data challenges, and make a real-world impact, we want you on our team.

About You

We're looking for an experienced data pipeline engineer to join our team and help us build the future of traffic management. The ideal candidate:

  • Is excited about making an impact with their work.
  • Is looking for an environment where difficult real-life problems are being tackled.
  • Enjoys working autonomously, with supportive and collaborative teammates from different disciplines.
  • Has experience solving large-scale data ingestion and transformation problems, ideally ingesting petabytes of data.
  • Has experience designing robust, flexible systems for batch and real-time data ingestion.
  • Has experience building services to store and manage massive data sets, leveraging object storage (e.g., S3, Ceph).
  • Has set up, managed, and used distributed messaging systems (Pulsar, Kafka, or similar).
  • Has set up, managed, and used distributed databases like ClickHouse.
  • Has worked with watermarking solutions to combine multiple data streams for real-time processing.
  • Has a strong understanding of statistics and data science.

Role and Responsibilities

We are a small, nimble team tackling challenging problems, so there will be many opportunities to build and take ownership of complex systems that are integral to our platform. Your primary responsibilities will include:

  • Designing and implementing large-scale data ingestion workflows
      • Ingest data from multiple providers, some in real time and others in large batches.
      • Develop a "homegrown Hudi-like" layer that manages data in object storage (S3, Ceph).
  • Implementing data watermarking and fusing logic
      • Detect when a complete set of data is available.
      • Trigger data fusion and ensure data is reliable and consistent before it's aggregated.
  • Building and maintaining aggregation services
      • Aggregate and bucket massive data sets into partitioned ClickHouse tables for front-end consumption.
      • Share aggregation logic/functions with our backend API server to enable on-the-fly or dynamic aggregations when needed.
  • Optimizing for performance, scalability, and flexibility
      • Work with Pulsar or similar to handle real-time streaming workloads.
      • Work with Airflow or similar to handle batch-processing workloads.
      • Ensure that the system can manage petabytes of data efficiently and cost-effectively.
  • Collaborating with cross-functional teams
      • Work closely with backend and DevOps engineers to ensure seamless data flow and easy maintainability.
      • Align with business stakeholders to deliver insights and features needed by end users.

Desired Qualifications

  • Extensive experience with data pipelines handling large volumes (tens of terabytes to petabytes).
  • Proficiency with Python and/or Golang and a high degree of professional software development experience building backend/middleware services.
  • Familiarity with object storage (S3, Ceph) and data formats/frameworks (e.g., Hudi, Parquet).
  • ClickHouse or other partitioned database expertise (designing partitioning, indexing, SQL, materialized views).
  • Knowledge of Pulsar or similar messaging systems for real-time event streaming.
  • Understanding of best practices in data modeling, aggregation, and OLAP workloads.
  • High degree of professional software development experience.

Why Flow Labs?

  • Work with cutting-edge technologies.
  • Learn about and solve sophisticated engineering, scientific, and mathematical problems.
  • Contribute to solving a global problem that impacts climate, sustainability, safety, health, economic vitality, equity, and resilient cities.
  • Work with a collaborative, supportive, and diverse team that prioritizes meritocracy, taking action, teaching co-workers new skills, work/life balance, flexibility, and humor.
  • Get ample opportunities to grow, learn, and contribute.

Bonus Points!

Technologies and techniques you're likely to work on are listed below. It's helpful (though optional) if you have experience with some of these as well:

  • Kubernetes
  • Experience working with geospatial or time-series data sets
  • On-prem object storage solutions like Ceph

Job Tags

Flexible hours
