Some notes to myself on my first Apache Airflow, Amazon S3 and AWS Redshift project.
Tips to avoid spending too much time Googling!
Notes:
Airflow's PostgresOperator will accept Redshift SQL statements as long as they are syntactically correct. Successful execution of the task only means the Python code did not throw an error. In my case, it did not mean the SQL insert actually succeeded. Had to figure this out the hard way!
You have to pass autocommit=True to the operator.
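A minimal sketch of what that looks like in a DAG, assuming a Postgres-style Airflow connection named redshift and made-up table names (the import path is the Airflow 2.x Postgres provider; in 1.10 the operator lives in airflow.operators.postgres_operator):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

# Hypothetical DAG; "redshift" is an Airflow connection pointing at the cluster.
with DAG(
    dag_id="redshift_load_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    load_app_users = PostgresOperator(
        task_id="load_app_users",
        postgres_conn_id="redshift",
        sql="""
            INSERT INTO app_users (user_id, first_name, last_name, level)
            SELECT userid, firstname, lastname, level
            FROM staging_events
            WHERE page = 'NextSong';
        """,
        # Per the note above: without autocommit=True the task can go green
        # even though no rows actually landed in the table.
        autocommit=True,
    )
```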
The dev database on Redshift (the default database when you create a free cluster) already has a table named users. This can trip you up if you want to work with your own table named users. Again, learned this the hard way!
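A quick way to see where a conflicting users table lives, sketched with PostgresHook against the same assumed redshift connection:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Hypothetical check: list every schema in the current database that already
# contains a table named "users", so a name collision is visible up front.
hook = PostgresHook(postgres_conn_id="redshift")
rows = hook.get_records(
    """
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_name = 'users';
    """
)
for schema, table in rows:
    print(f"{schema}.{table}")
```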
INSERT statements should provide explicit column lists, because:
Using this statement without a column list is error-prone because of the default behaviour when the value list does not match the column structure. If the value list is shorter than the column list, Redshift will try to insert the values into the first n columns and will not raise an error as long as the data types are compatible.
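For illustration, with made-up staging and target table names, the safe and risky forms look like this:

```python
# Hypothetical table names, purely for illustration.
# Safe form: the column list makes any mismatch fail loudly.
insert_with_columns = """
    INSERT INTO app_users (user_id, first_name, last_name, level)
    SELECT userid, firstname, lastname, level
    FROM staging_events
    WHERE page = 'NextSong';
"""

# Risky form: no column list. If the SELECT yields fewer columns than the
# table has, Redshift fills the leading columns and raises no error as long
# as the data types happen to be compatible.
insert_without_columns = """
    INSERT INTO app_users
    SELECT userid, firstname, lastname
    FROM staging_events;
"""
```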
If you can’t seem to find your AWS cluster, check the region. Perhaps the region got switched and your cluster is not in the region currently selected in the console.
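A boto3 sketch (assuming credentials are already configured) that scans every Redshift region for clusters, which is handy when the console region selector has been switched on you:

```python
import boto3

# Hypothetical sketch: scan every Redshift region for clusters, in case the
# cluster was created while a different region was selected in the console.
session = boto3.Session()
for region in session.get_available_regions("redshift"):
    client = session.client("redshift", region_name=region)
    try:
        clusters = client.describe_clusters()["Clusters"]
    except Exception:
        continue  # region not enabled for this account, throttling, etc.
    for cluster in clusters:
        print(region, cluster["ClusterIdentifier"], cluster["ClusterStatus"])
```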
I felt Airflow had some weird bugs, at least in the version I worked with. On two occasions a total reset fixed issues I could not otherwise explain.