Data Pipeline Engineer Job at Flow Labs, San Francisco, CA

  • Flow Labs
  • San Francisco, CA

Job Description

Flow Labs is Leading the AI Revolution in Transportation

At Flow Labs, we're harnessing AI and big data to transform how the world moves. Our platform captures massive quantities of real-time transportation data, feeding powerful blended data sets and AI models that optimize and automate how traffic systems operate. We're not just theorizing about the future; we're building it now. The Flow Platform is already deployed in dozens of cities, optimizing thousands of traffic signals and managing hundreds of thousands of miles of roadway. And we're scaling fast. Our AI-driven platform enables transportation agencies to proactively monitor, analyze, and optimize their roadways, making real-time decisions that reduce congestion, lower emissions, and save lives. The results? Travel times cut by 24%, emissions reduced by 21%, and a path to a more efficient, sustainable, and intelligent transportation network. This is a once-in-a-generation opportunity to redefine an industry. If you're a software engineer who wants to build at scale, tackle complex AI and big data challenges, and make a real-world impact, we want you on our team.

About You

We're looking for an experienced data pipeline engineer to join our team and help us build the future of traffic management. The ideal candidate:
  • Is excited about making an impact with their work.
  • Is looking for an environment where difficult real-life problems are being tackled.
  • Enjoys working autonomously, with supportive and collaborative teammates from different disciplines.
  • Has experience solving large-scale data ingestion and transformation problems, ideally ingesting petabytes of data.
  • Has experience designing robust, flexible systems for batch and real-time data ingestion.
  • Has experience building services to store and manage massive data sets, leveraging object storage (e.g., S3, Ceph).
  • Has set up, managed, and used distributed messaging systems (Pulsar, Kafka, or similar).
  • Has set up, managed, and used distributed databases like ClickHouse.
  • Has worked with watermarking solutions to combine multiple data streams for real-time processing.
  • Has a strong understanding of statistics and data science.

Role and Responsibilities

We are a small, nimble team tackling challenging problems, so there will be many opportunities to build and take ownership of complex systems that are integral to our platform. Your primary responsibilities will include:
  • Designing and implementing large-scale data ingestion workflows: ingest data from multiple providers, some in real time and others in large batches, and develop a "homegrown Hudi-like" layer that manages data in object storage (S3, Ceph).
  • Implementing data watermarking and fusing logic: detect when a complete set of data is available, then trigger data fusion and ensure data is reliable and consistent before it's aggregated (see the sketch after this list).
  • Building and maintaining aggregation services: aggregate and bucket massive data sets into partitioned ClickHouse tables for front-end consumption, and share aggregation logic/functions with our backend API server to enable on-the-fly or dynamic aggregations when needed.
  • Optimizing for performance, scalability, and flexibility: work with Pulsar or similar for real-time streaming workloads, work with Airflow or similar for batch-processing workloads, and ensure that the system can performantly and cost-effectively manage petabytes of data.
  • Collaborating with cross-functional teams: work closely with backend and DevOps engineers to ensure seamless data flow and easy maintainability, and align with business stakeholders to deliver insights and features needed by end users.
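As a rough illustration of the watermarking and fusion responsibility above, here is a minimal sketch in Python. Everything in it is hypothetical: the provider names, record shape, five-minute bucket size, and fuse() callback are invented for illustration and are not taken from the Flow Platform. The idea is simply to track each provider's watermark (the latest event time it has delivered) and release a time bucket for fusion only once every provider has advanced past that bucket's end.

# Minimal, hypothetical sketch of watermark-driven fusion: buffer records per
# five-minute bucket, track each provider's watermark (the latest event time it
# has delivered), and fuse a bucket only once every provider has moved past its end.
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable

BUCKET = timedelta(minutes=5)

@dataclass
class FusionBuffer:
    providers: set[str]                           # every provider we expect data from
    fuse: Callable[[datetime, list[dict]], None]  # callback invoked per complete bucket
    watermarks: dict[str, datetime] = field(default_factory=dict)
    buckets: dict[datetime, list[dict]] = field(default_factory=lambda: defaultdict(list))

    def ingest(self, provider: str, event_time: datetime, record: dict) -> None:
        """Buffer one record and advance the provider's watermark."""
        bucket_start = event_time.replace(minute=event_time.minute - event_time.minute % 5,
                                          second=0, microsecond=0)
        self.buckets[bucket_start].append({"provider": provider, "time": event_time, **record})
        self.watermarks[provider] = max(self.watermarks.get(provider, datetime.min), event_time)
        self._flush_complete_buckets()

    def _flush_complete_buckets(self) -> None:
        """Fuse every bucket whose end all providers have passed."""
        if set(self.watermarks) < self.providers:
            return  # at least one provider has sent nothing yet
        global_watermark = min(self.watermarks.values())
        for bucket_start in sorted(self.buckets):
            if bucket_start + BUCKET <= global_watermark:
                self.fuse(bucket_start, self.buckets.pop(bucket_start))
            else:
                break

# Example: both providers must advance past 00:05 before the 00:00 bucket is fused.
buf = FusionBuffer(providers={"provider_a", "provider_b"},
                   fuse=lambda start, rows: print(f"fusing {len(rows)} record(s) for {start}"))
buf.ingest("provider_a", datetime(2024, 1, 1, 0, 1), {"speed_kph": 42.0})
buf.ingest("provider_b", datetime(2024, 1, 1, 0, 7), {"speed_kph": 40.5})
buf.ingest("provider_a", datetime(2024, 1, 1, 0, 6), {"speed_kph": 39.0})  # triggers fusion of the 00:00 bucket

In production this kind of logic would typically sit behind a Pulsar consumer or an Airflow task with durable state rather than an in-memory buffer; the sketch only shows the bookkeeping that decides when a complete set of data is available.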
Desired Qualifications

  • Extensive experience with data pipelines handling large volumes (tens of terabytes to petabytes).
  • Proficiency with Python and/or Golang and a high degree of professional software development experience building backend/middleware services.
  • Familiarity with object storage (S3, Ceph) and data formats/frameworks (e.g., Hudi, Parquet).
  • ClickHouse or other partitioned database expertise (designing partitioning, indexing, SQL, materialized views; see the sketch at the end of this description).
  • Knowledge of Pulsar or similar messaging systems for real-time event streaming.
  • Understanding of best practices in data modeling, aggregation, and OLAP workloads.

Why Flow Labs?

  • Work with cutting-edge technologies.
  • Learn about and solve sophisticated engineering, scientific, and mathematical problems.
  • Contribute to solving a global problem that impacts climate, sustainability, safety, health, economic vitality, equity, and resilient cities.
  • Work with a collaborative, supportive, and diverse team that prioritizes meritocracy, taking action, teaching co-workers new skills, work/life balance, flexibility, and humor.
  • Get ample opportunities to grow, learn, and contribute.

Bonus Points!

Technologies and techniques you're likely to work on are listed below. It's helpful (though optional) if you have experience with some of these as well:
  • Kubernetes
  • Experience working with geospatial or time-series data sets
  • On-prem object storage solutions like Ceph
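On the ClickHouse side, the partitioned rollups called out in the qualifications might look roughly like the following sketch. It assumes the open-source clickhouse-driver Python package and a local server; the traffic database, the raw_observations and segment_speed_5m tables, and all column names are hypothetical.

# Hypothetical sketch: a partitioned ClickHouse rollup table plus a materialized
# view that buckets raw observations into 5-minute aggregates as rows arrive.
# Assumes the clickhouse-driver package; schema and names are illustrative only.
from clickhouse_driver import Client

client = Client(host="localhost")
client.execute("CREATE DATABASE IF NOT EXISTS traffic")

# Raw, high-volume observations as they arrive from providers.
client.execute("""
    CREATE TABLE IF NOT EXISTS traffic.raw_observations (
        segment_id  UInt64,
        provider    LowCardinality(String),
        observed_at DateTime,
        speed_kph   Float32
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(observed_at)
    ORDER BY (segment_id, observed_at)
""")

# Pre-aggregated table partitioned by month; sums merge correctly, and the
# average speed is computed at query time as speed_sum / sample_count.
client.execute("""
    CREATE TABLE IF NOT EXISTS traffic.segment_speed_5m (
        segment_id   UInt64,
        provider     LowCardinality(String),
        bucket_start DateTime,
        speed_sum    Float64,
        sample_count UInt64
    )
    ENGINE = SummingMergeTree
    PARTITION BY toYYYYMM(bucket_start)
    ORDER BY (segment_id, provider, bucket_start)
""")

# Materialized view that populates the rollup as raw rows are inserted.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS traffic.segment_speed_5m_mv
    TO traffic.segment_speed_5m AS
    SELECT
        segment_id,
        provider,
        toStartOfFiveMinute(observed_at) AS bucket_start,
        sum(speed_kph)                   AS speed_sum,
        count()                          AS sample_count
    FROM traffic.raw_observations
    GROUP BY segment_id, provider, bucket_start
""")

Partitioning by month keeps data lifecycle operations cheap, while the summing rollup lets front-end queries read pre-bucketed data and derive averages on the fly.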

Job Tags

Flexible hours
